[Figure: (a) Forward ambiguity; (b) Backward ambiguity]
A lexical steganography system deals with two kinds of sense ambiguity. We refer to the sense ambiguity an encoder is confronted with when deciding which synset the replacements of a specific word should be chosen from as forward ambiguity, and to the sense ambiguity a decoder is confronted with when deciding which synset a replacement was originally chosen from as backward ambiguity.
Let $W$ be the set of words and $S$ be the set of synsets in a lexicon. We require that $W$ is enumerable and $S$ is a set of synsets with words from $W$.
In accordance with chapter
, we use a function $\mathrm{lex}(w)$ to denote the lexical evidence of a word $w$, which is nothing but the set of synsets that contain $w$. We use a function $\mathrm{con}(w)$ to denote the contextual evidence of an occurrence of $w$ in a text under investigation. We can think of $\mathrm{con}(w)$ as a set of contexts, but we will not be concerned with the actual data structure, since it depends on the model employed by the actual disambiguation system. The disambiguation system will herein be a function $\mathrm{wsd}$, so that $s = \mathrm{wsd}(\mathrm{lex}(w), \mathrm{con}(w))$ and $w' \in s$ implies that $w'$ is a correct replacement for $w$ in the specific context $\mathrm{con}(w)$.
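The notion of lexical evidence can be illustrated with a minimal Python sketch. The toy lexicon below, the name `SYNSETS`, and the helper `is_ambiguous` are assumptions made purely for illustration; they are not taken from any real lexicon or from the system discussed here.

```python
from typing import FrozenSet, Set

Synset = FrozenSet[str]

# Hypothetical toy lexicon: these synsets are invented for illustration only.
SYNSETS: Set[Synset] = {
    frozenset({"move", "run", "go"}),
    frozenset({"move", "motion"}),
    frozenset({"move", "strike"}),
    frozenset({"run", "test"}),
}


def lex(word: str) -> Set[Synset]:
    """Lexical evidence of a word: the set of synsets that contain it."""
    return {synset for synset in SYNSETS if word in synset}


def is_ambiguous(word: str) -> bool:
    """A word is sense-ambiguous if more than one synset contains it."""
    return len(lex(word)) > 1


print(len(lex("move")))    # 3
print(is_ambiguous("go"))  # False
```

Representing synsets as frozen sets keeps them hashable, so the lexical evidence itself can be an ordinary set of synsets.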
A text in which a secret is to be embedded could, for example, contain the word move. When the encoder looks up the lemma move in its dictionary, it will find three synsets: one that also contains run, one that also contains motion, and one that also contains strike. These make up the lexical evidence $\mathrm{lex}(\text{move})$. Since there are several alternatives from which to choose, we call move forward-ambiguous. The disambiguator is needed to decide upon the correct synset from which the replacements can be chosen. If it chooses the synset containing run from $\mathrm{lex}(\text{move})$, we can replace the original word move by a word from that synset determined for coding purposes, for example run.
When decoding the secret again, the decoder would look up run in its dictionary and find several synsets, among them the one it shares with move and one that also contains test, which make up the lexical evidence $\mathrm{lex}(\text{run})$. Since there are several alternatives from which to choose, we call run backward-ambiguous. At this point, the decoder would have to employ a disambiguator to decide upon the sense the given replacement was originally chosen from. If it chooses the synset shared with move from $\mathrm{lex}(\text{run})$, we can interpret run as a replacement from that synset, therefore correctly decoding the data again.
However, we have to be aware of the fact that the disambiguator might just
as well choose a different sense, a problem we will deal with in the next
section.
The problem of forward ambiguity is security-relevant, since an incorrect identification of the replacements will produce unnatural text. Backward ambiguity is relevant for the decoder, since an incorrect identification of the synset the replacements were originally chosen from will result in incorrectly decoding the data carried by the replacement.
If we employ an automated scheme, we cannot do much against the consequences of forward ambiguity other than use a highly precise word-sense disambiguator. However, we could get better results if we let a human judge whether the disambiguator's decision was correct. As long as the human judges on the transmitting and receiving ends agree, this will not affect the performance of the code. The first drawback of this scheme is that they might not always agree; the second is that it could take many such decisions, and the additional effort would discourage users from adopting practical systems that implement such a scheme.
In the presence of backward ambiguity, it is crucial not to see a word-sense disambiguator as a black box where we put in a word and get out an identification of the set containing the universally correct replacements. Given the context of a lexeme, $\mathrm{con}(w)$, and the lexical evidence about this lexeme, $\mathrm{lex}(w)$, the disambiguator simply estimates which $s \in \mathrm{lex}(w)$ best fits the context $\mathrm{con}(w)$. If we replace $w$ by $w'$, we do not change the context, so $\mathrm{con}(w')$ will always equal $\mathrm{con}(w)$, but we have to be aware that $\mathrm{lex}(w')$ is not necessarily equal to $\mathrm{lex}(w)$, and we have to think of the consequences.
For example, the disambiguator employed in the encoder, deciding which synset the replacements for move should come from, might choose the synset that also contains the word run, because both motion and strike are very unlikely to appear in the context. The problem is that, if we substitute run for move, we change the lexical evidence. If the decoder now blindly used a disambiguator to pick the most probable synset for run, this synset might well be the one containing test instead of the one containing move, because the context might happen to give such evidence.
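The disagreement just described can be reproduced with a small Python sketch. Everything in it is an assumption made for illustration: the synsets, the cue words attached to them, the sample context, and the overlap-counting disambiguator `wsd`, which merely stands in for whatever model a real word-sense disambiguator would use.

```python
from typing import Dict, FrozenSet, Set

Synset = FrozenSet[str]

# Hypothetical synsets, invented for illustration only.
MOTION = frozenset({"move", "run", "go"})
GESTURE = frozenset({"move", "motion"})
IMPRESS = frozenset({"move", "strike"})
TRIAL = frozenset({"run", "test"})

# Hypothetical cue words that make each sense more plausible in a context.
CUES: Dict[Synset, Set[str]] = {
    MOTION: {"train", "track"},
    GESTURE: {"hand", "wave"},
    IMPRESS: {"emotion", "tears"},
    TRIAL: {"engineers", "prototype", "track"},
}


def lex(word: str) -> Set[Synset]:
    """Lexical evidence: all synsets containing the word."""
    return {synset for synset in CUES if word in synset}


def wsd(lexical_evidence: Set[Synset], context: Set[str]) -> Synset:
    """Toy disambiguator: pick the synset whose cues overlap the context most."""
    candidates = sorted(lexical_evidence, key=sorted)  # deterministic tie-break
    return max(candidates, key=lambda synset: len(CUES[synset] & context))


# The context stays the same when run is substituted for move.
context = {"engineers", "prototype", "track"}

encoder_choice = wsd(lex("move"), context)  # forward disambiguation
decoder_choice = wsd(lex("run"), context)   # backward disambiguation

print(sorted(encoder_choice))  # ['go', 'move', 'run']
print(sorted(decoder_choice))  # ['run', 'test']
```

Although the context is identical on both ends, the changed lexical evidence lets a different synset win on the decoding side, which is exactly the failure mode at issue.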
We can resolve the problem of backward ambiguity by letting an encoder analyze the lexicon and decide in advance whether a word should be chosen for coding purposes or not. It needs to decide, for each possible $w'$ that could be substituted for $w$, whether or not this replacement would lead to backward ambiguity. Looking only at the lexicon, this could potentially be the case whenever $w'$ belongs to more than one synset, i.e. $|\mathrm{lex}(w')| > 1$, which is the reason why Winstein and Chapman required their synsets to be disjoint. However, if we bring a sense disambiguator into the picture, a replacement of $w'$ for $w$ involves problematic ambiguity only if the disambiguators resolving the forward and backward ambiguity disagree about the word sense. Formally,
\[
\mathrm{wsd}(\mathrm{lex}(w), \mathrm{con}(w)) \neq \mathrm{wsd}(\mathrm{lex}(w'), \mathrm{con}(w')) \tag{8.1}
\]
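An encoder could apply condition (8.1) as a filter over candidate replacements, as in the following hedged Python sketch. The function names `is_problematic` and `safe_replacements`, the two toy synsets, and the stand-in disambiguator `toy_wsd` are all hypothetical; the lexicon lookup and the disambiguator are passed in as parameters precisely because their real implementations are outside the scope of this discussion.

```python
from typing import Callable, FrozenSet, Set

Synset = FrozenSet[str]
Context = FrozenSet[str]
Lex = Callable[[str], Set[Synset]]
Wsd = Callable[[Set[Synset], Context], Synset]


def is_problematic(w: str, w_prime: str, context: Context,
                   lex: Lex, wsd: Wsd) -> bool:
    """Condition (8.1): forward and backward disambiguation disagree.

    The context is the same for w and w_prime, because substituting one
    word for the other does not change the surrounding text.
    """
    return wsd(lex(w), context) != wsd(lex(w_prime), context)


def safe_replacements(w: str, context: Context, lex: Lex, wsd: Wsd) -> Set[str]:
    """Words from the synset chosen for w that the decoder maps back to it."""
    chosen = wsd(lex(w), context)
    return {c for c in chosen - {w}
            if not is_problematic(w, c, context, lex, wsd)}


# Hypothetical two-synset lexicon and stand-in disambiguator for the demo.
A = frozenset({"move", "run", "go"})
B = frozenset({"run", "test"})


def toy_lex(word: str) -> Set[Synset]:
    return {synset for synset in (A, B) if word in synset}


def toy_wsd(evidence: Set[Synset], context: Context) -> Synset:
    # Stand-in model: prefers synset B whenever B is among the candidates.
    return B if B in evidence else A


print(safe_replacements("move", frozenset(), toy_lex, toy_wsd))  # {'go'}
```

Here run is rejected because the decoder would resolve it to the other synset, while go is kept because it occurs in only one synset and therefore cannot be misread.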
It might be desirable to avoid using such replacements, so that we can be sure the decoder will be able to pick the right synset. However, if a human were able to pick the correct synset and only a computer would pick the wrong one, then we might even want to provoke this situation, because it constitutes an HIP, giving an arbitrator a hard time trying to analyze the text automatically.
Let $R(w)$ denote the replacements for a word $w$. It can easily be determined from the synset that the disambiguator selects for $w$.
We can now distinguish between
Building upon this classification of replacements we can distinguish
We can basically think of three different scenarios for using these replacements in a coding strategy: