We have seen in the previous chapters why the study of steganography needs to be closely linked to that of the channels supposed to cover steganograms and the interpretation of the usual cover-datagrams.
The structure of this section is aligned along traditional linguistic lines of layers accounting for atomic symbols, syntax relating the symbols and semantics expressing their meanings, approached via lexical, grammatical and ontological models.
Since language is essentially redundant, it will carry information that is irrelevant for understanding its meaning. In the context of steganographic embedding, a good model for redundant information in language suitable for steganography is meaning-preserving substitution. Depending on the approach we employ, the term ``meaning-preserving'' has different interpretations.
The most straightforward subliminal channel in natural language is probably the choice of words. On the word-level, meaning is traditionally linked to the lexical relation of synonymy. For example, consider the following set of covers:

We can use synonymy to encode ten states in the above sentences, since
two information-carrying symbols are available, one of which can take
on five values, whereas the other one can take on two values.
The first is the choice for either wonderful, decent, nice,
fine or great and the other is for either city or town:
We call sets of words that can be used interchangeably synonymy sets or synsets for short and denote them in standard set-notation as

A mixed-radix number is one way to encode a secret state in
a sentence, if we think of state in numeric terms. Mixed-radix
numbers can be written using positional notation
in the following way (, , p. 208):
Given the bases of an
-digit mixed-radix
number
, we can always use
this numerical identity to uniquely write each number
in the range from
to
as a sequence
where
for each
.
Since, analogously, each sequence of bits of a length
can always
be thought of as a binary number in that range, we can
always find a unique mixed-radix representation for it, and
vice-versa. In practice such a conversion can be
done efficiently using Horner's rule. (, p. 635)
gives some examples.
To establish a direct correspondence between a text and a mixed-radix number
encoding document state, let's establish the convention that the leftmost
word in a text always corresponds to the most significant digit in the mixed-radix
representation of the text and words from the text that do not fall into any
synset can be ignored for this purpose. If a text contains
words
that could be replaced from synsets
we represent state for the subliminal channel by an
-digit mixed-radix
number with bases
.
We can then interpret a mixed-radix digit
in the range
as the choice for the
-th word in synset
with respect to alphabetic
order.
Turning back to the example, we would encode one of the ten possible states
by a mixed-radix number in the range from zero to nine:
However, mixed-radix-conversion is computationally rather intensive,
given that all we want to achieve is a uniquely decodable
representation for a bitstring.
A more efficient approach would be to reconstruct the secret message
by concatenating
codewords that are directly associated with a word-choice, instead
of using mixed-radix conversion. The most simplistic approach
would be to restrict synsets to cardinalities that are powers
of two. For example:
The fact that nice is never chosen by the stegosystem,
although it is sometimes chosen by native speakers,
could be security-relevant. It would, in such circumstances be
better to allow a random choice between two synonymous words
that decode to the same bitstring.
The restriction to block-codes clearly reduces capacity, since fewer states can be represented by the same elements. () uses variable-length codes, the Huffman code in particular, for his mimic functions. Not only do variable-length codes lift this restriction, they also make it possible to control the probabilities with which symbols are chosen, as opposed to block-codes where all choices are equally probable:
).
Recall that, given any set of symbols, a binary prefix-code can be
constructed by arranging the symbols as leafs in a tree in which
each node has two branches. Thinking of them as labeled either with
0 or 1, we can assign a binary codeword to each
symbol by concatenating all the bits along the path from the root
to the symbol. Prefix-codes have the property that they never assign a
codeword that is the prefix of another codeword, which is important because
otherwise a string resulting from
concatenation of such codewords could
not be uniquely deciphered.
If
The code's impact on the distribution of words in the cover
follows intuitively. In one of two cases
(0, 1), a random bitstring will start with 0.
Only in one out of four cases (00, 01, 10, 11),
it will start with 10, etc.
The probabilities will, of course, sum up to one, because some choice
is made in any case, and since we are using the binary system,
the probabilities are restricted to negative powers of two. Figure
demonstrates the use of relative entropy for
quantifying the security of such a system. It can be seen that
the restriction to negative powers of two is theoretically exploitable,
however, () shows some approaches that make more precise mimicry
possible.
All approaches we have seen so far have one basic idea in common:
transforming a sequence of symbols
Here
is some code that reinterprets single symbols, and
is some operation that reconstructs the secret message.
We have seen three examples for
: a binary block-code, a
binary variable-length code (in particular the Huffman-code)
and numbers with respect to alphabetic order. The operation
most commonly used for ``
'' is string-concatenation, since
it is simple and computationally efficient. However,
other operations are possible, such as mixed-radix conversion.
() demonstrated a more sophisticated approach with his context-free mimic functions. It provides higher capacities, since it uses not only the choice for a word for steganographic purposes, but also the choice for a syntactic structure, relating the words. His approach closely mimics the real structure of natural language, since it generates context-free structures, just like natural languages do.
Recall that the Kleene-closure
of a set of symbols
is the set of finite sequences that can be constructed by
choosing each of its elements from
(including the empty
sequence).
Further, recall that a context-free grammar
consists of
a set of variables
, a set of terminals
,
a finite set
of productions and a designated start-symbol
. Following (), if
is a
production of
and
and
are strings in
, we define a relation
as
,
(
directly derives
).
If
is the reflexive and transitive closure
of
, then the language characterized by the
grammar
can be defined as
A probabilistic context-free grammar (PCFG) associates a
probability
with each production
to be chosen. If
is a fixed non-terminal variable,
then all productions
are
mutually exclusive and the probabilities of these
productions sum up to one.
Wendy might be a computer using such a grammar to observe
a cover-channel. A datagram
will be suspicious if
and if
, where
is the set of all
productions occuring in the cover-channel and
is the set of productions the stegosystem actually uses
to derive
from
.4.1
Then, if Alice wants to send a message
to Bob, she first
has to randomize her message to get a bitstring
with
bits uniformly distributed.
Of course she can not transmit a 0/1-coded random bitstring over a
channel, arbitrated by Wendy, unless
,
but she can use
to generate messages that will be in
.
To do so, she starts with
, and applies productions to non-terminal
symbols until she is left with a grammatically correct message.
Whenever she expands a non-terminal symbol
, and when there
are several productions expanding
, she can use these ``degrees of
freedom'' to encode some bits from the message.
In fact, the approach is very similar to that presented in the
previous section. Alice constructs a sequence
that
serves a dual purpose.
First,
will be a sequence of productions
deriving a message
from
, i.e.
If Alice and Bob agree on some parsing-conventions,
this sequence will be uniquely recoverable, given only
the steganogram
and the grammar
, and therefore
the bitstring will be recoverable as well.
On the other hand, Alice and Bob can be sure that
is innocuous to Wendy, and therefore they have exploited
a subliminal channel.
Let's consider the example used by ():
Figure
shows a PCFG. The probabilities are
given in parentheses. We could have measured them by counting
the occurences of the productions in a treebank (a statistically
representative collection of syntax-trees). This grammar will,
for example, derive a cover
and

shows the bitstrings, assigned by the
Huffman trees for each of the non-terminals.
We can now go through a complete example. The ``
''
will be used to denote the current state in reading the secret
bitstring and in grammatical production.
), use it to expand
).
Note that, as opposed to replacement of lexically synonymous words, mimicry completely disregards semantic aspects of language. Data is not hidden in linguistic ambiguity, but in semantically significant parts of language. Therefore mimic-functions, as studied by Wayner, can only fool machines, not humans.
Of the techniques considered herein, the ontological one is the most sophisticated approach with respect to modelling semantics. Instead of implicitly leaving semantics intact by replacing only synonymous words while embedding information into an innocuous text, an explicit model for ``meaning'' is used to evaluate equivalence between texts.
Consider the following examples:
For this to be successful in the above case, we would have to be able to translate between English and ANL, between German and ANL, we would have to understand sentences in active-voice and passive-voice, and we would have to make use of some knowledge about the world under discussion, such as the fact that Steve's guitar is named ``Evo''.
One formalism traditionally considered a good candidate for an ANL is First-Order Predicate-Calculus. The meaning of all of the four above sentences could be represented by

With respect to this model, the above expression formalizes the meaning
of the sentence by the assertion
``There is an
, such that
is the event of playing something,
the actor taking part in
is
,
has the name
,
the experiencer taking part in
is
,
is a
,
and
is called
.''
Linguists call the ``semantic'' model, just presented ``deep structure''. It is a very vague way of modeling the semantics behind a natural language sentence, and therefore it might be questioned whether one can talk about a meaning-representation in the context of deep structure. However, it demonstrates what it takes for a model to go beyond purely syntactic issues.
Natural language generation is the problem of translating such
an ANL representation to a natural language. In the course of
this translation many decisions must be made. Formalizing these
decisions made in the course of generating a sentence is the basic
idea of systemic grammar. One
could say that systemic grammars are somehow anchored in the
``semantic'' view of language while context-free grammars take
a ``more syntactic'' point of view. Therefore systemic grammars
are often the linguistic model of choice, when it comes to automatically
generating sentences. Figure
shows an example.
In this example, generating natural language sentences from meaning, amounts to making the following decisions:
If we have a specific meaning ``in mind'' (or rather represented in ANL), we can make different sequences of choices, as we traverse this network, that do not affect meaning. For example, the decision voice in the grammar tells us, that we can go for active, which means that the semantic actor will syntactically be in subject position, and the semantic experiencer will syntactically end up in object position. This would yield Steve plays the guitar. The grammar also offers passive, which means that the semantic actor will end up in object position, and the semantic experiencer will end up in subject position. Then, the very same meaning will be expressed as The guitar is played by Steve. Almost all such grammatical choices are instances of ambiguity, where one meaning is expressible in many ways. Of course mood is not the only example, transitivity could also be used, by generating two sentences Steve plays. The guitar is played.
If we did not have an ANL-representation of a sentence's meaning available, we could never judge whether or not two sequences of grammatical choices are equivalent in meaning. Therefore, this basic approach was presented as the ``ontological'' one, since it relies on a formal model of the semantics behind the text that should be generated.
Note that the symbolic approach is not symbolic because it replaces symbols, but because it replaces them in such a way, that symbolic ``correctness'' is preserved, and analogously the syntactic approach is not syntactic because it replaces syntactic structures, but because it preserves syntactic correctness. This is crucial to the understanding of the semantic approach, which does not change semantics, but rather preserves a specific formal interpretation of the semantics behind a natural language text.