If we are not satisfied with establishing robustness implicitly, similarly to how Atallah did, error-correction quickly becomes an issue. Error-correcting codes would enable us to actively reconstruct data after receiving it incorrectly (as a result of Wendy, the active warden, attacking a steganogram by substituting words used for coding the secret) as long as there are not too many errors.
However, the automatic construction of error-correcting codes is traditionally
studied only in the context of q-ary symmetric channelschannel@q-ary.
Generally we can think of steganograms and covers as datagrams of a fixed length
represented by
-tuples
.
Datagrams transmitted by
-ary channels are restricted to choosing each
from a
fixed alphabet
of cardinality
, so that
, i.e.
datagrams that can be transmitted over a
-ary channel are chosen from
.
However, this is not the case in lexical steganography, if we wish to code the
data into word-replacements
, where each
is chosen
from a different synset with a different cardinality, so that
.
As opposed to
, we wish to choose our datagrams from
.
We will not deal with the construction of
-ary error-correcting codes in detail.
For readers unfamiliar with the topic of error-correction, it might be helpful to
think of the simplest form of error-correction which is a repetition-code. For example,
in order to encode a value
for submission over a channel with
elements,
we could choose the configuration
from
. If
an error happens (i.e. Wendy changes a value
to
), then
might be incorrectly received. We can then correct it by assuming
instead, because one substitution would suffice to replace
, whereas two substitutions would be necessary
to replace
.
Assuming that a channel is more likely to transmit each value correctly than
erroneously (since Wendy is trying not to completely destroy the datagram),
we can safely decode
to
.
In order to make the well-known techniques for automatic construction of
error-correcting codes for
-ary channels applicable in this scenario,
we will compose the word-choices into blocks, and apply error-correction at
the block-level rather than to word-choices.
We will define a blocking
as a set partition over the index-set
for word-choices in the datagram. Formally,

Again, we will use the numeric properties of word-choices as digits of mixed-radix
numbers for coding, by defining
, the numeric value of a word-choice
, as
, if and only if,
is the
-th replaceable word
in
in alphabetic order. The correspondence between a block's numeric
value
, and the values of each of the assigned word-choices can be
defined by the numeric value of a mixed-radix number, the digits
of which are represented by the word-choices assigned to the block.
This property of a block to
behave as an abstract element reduces the problem of finding a configuration for
the word-choices
to finding a
configuration for the blocks' numeric values
and determining the values
of the word-choices by expressing them as mixed-radix numbers.
Figure
shows this graphically. In this example, there are
word-choices assigned to
blocks. The word-choices
are chosen from synsets
,
,
,
and
.
We can think of this as a mixed-radix
number
We already mentioned that the reason why we compose the word-choices into
blocks is because we wish to make error-correcting codes for
-ary channels
applicable in this scenario. Two methods to achieve this come to mind.
Here
is a variable summand which allows for some deviation
in the blocks' capacities. It is desireable to keep
as small as possible, however, it will be necessary to allow for some deviation, to
allow for efficient assignment of word-choices to blocks, according to Method II,
and to compensate for word-choices of capacities that appear only once for
Method I.
Figures
and
show examples of how the two
methods would choose the blocks.
.
For example, instead of
one single choice
In response to a code, constructed with blocking by Method I, Wendy would pick out exactly one block to attack, since one block is sufficient for the data to be unrecoverable, and then attack as many units within this block as possible, maximizing the probability to succeed in making the whole block unrecoverable. In response to a code constructed with blocking by Method II, she would attack as many blocks as possible, maximizing the probability of breaking the block-error-correction, but would attack no more than one unit within each block since one element is sufficient for the whole block to be unrecoverable.
A simplistic approach to making sure Wendy cannot make use of such strategic considerations, is to make sure she doesn't know the blocking. A straightforward way of achieving this would be to have the encoder ``flip a coin'' to decide which strategy to use. That way, Wendy could not rely on any single strategy to be most successful. However, she will still benefit from choosing one of the above attack-strategies, since she knows it must be one of the two available strategies, or she could use a clever combination like picking one block, in which to attack many units and attacking only one unit in each of the remaining blocks, which would maximize her probability of succeeding over both strategies.
A more sophisticated approach might be to partition the sequence of
units in two areas and handle one area by Method I
and the other by Method II, so that Wendy never knows
how the areas are composed. We could assign units to two areas,
for example, by seeding a random-number generator with a secret key
to generate a bitstring
of length
. Random-number generators
have the important property that they generate a bitstring
in such a way that there are no statistically significant correlations
between any two bits in the string
and
. The value of
could be used to decide upon a blocking-method to use. For example,
a value
could be interpreted by the encoding/decoding-mechanisms
as ``use Method I for unit
'' and
as ``use Method II for
unit
'', as depicted in Figure
.
We can think of a coding-scheme organized in six layers as
shown in Figure
.
all elements
The repetition-code was used in the above setup only as a primitive example of
error-correction. It has many disadvantageous side-effects. For example
is left unassigned in Figure
, and would have to be initialized
randomly. The code is very inefficient, since it needs vast amounts of redundancy
and corrects only a comparatively small number of errors. From a strategic point of
view, it is not optimal either.
Consider, for instance, the consequences of an error in word-choice
. It would
result in altering both
and
, leading the error-correction
astray when reconstructing
.
All of these limitations could be overcome by using more sophisticated codes, for
example Hamming codes. However, we would like to draw attention to the multi-layered
blocking-scheme, rather than to the details of error-correction. The blocking scheme
introduced above, incorporates some degree of security, since it will
not easily be possible to extract the secret from the word-choices without predicting
the pseudorandom numbers assigning elements to coding-strategies. It also provides for
robustness against some attacks, overcoming the problematic requirement of
error-correcting codes to operate on
-ary channels.
,
if we wanted to encode the value
, by:
Figure
shows the impact of an error. If Wendy tries to
destroy the secret, by changing element
from
to
and element
from
to
, we could still recover the
secret.
The error propagates through all the levels of coding, until it is
corrected. Since
is
, instead of
,
gets
, instead
of
. This propagates down to the block-level (since this element happens
to be handled by Method II), where the repetition code would expect
. The decoder now finds the values
instead. Since
was transmitted correctly
twice,
is more likely to be the element which is in error, so the decoder
can assume
.
Similarly, the error at word-choice
changes
from
to
and
propagates only down to the unit-level
(since this element happens to be handled by Method I) where the repetition
code would expect
. The decoder now
finds the values
instead. Since
was
transmitted correctly twice,
is more likely to be the unit which
is in error, so the decoder can assume
, thereby correcting the
error and successfully defending against Wendy's attack.