Throughout the previous sections we have discussed and evaluated different theoretical approaches and practical implementations of stegosystems, taking into account many different criteria. In this section, we will make these criteria explicit, by first proposing objectives of the functionality we expect of natural language stegosystems, and then giving advice on design considerations to keep in mind for their construction and evaluation. We will motivate them by theoretical results we have discussed so far, and by practical lessons learned from the stegosystems presented in the previous sections.
Analyzing security often involves establishing a way to measure a ``degree
of fulfillment'' for the above proposition.
We have seen the approach of (), measuring the security via
the Kullback-Leibler distance
between the
distribution of covers
and the distribution of steganograms
.
Furthermore, we have seen that it measures
security only from an information-theoretic point of view, and that it is
necessary to take into account many other constraints. These could, for example,
arise from syntactic and semantic restrictions imposed by the usual interpretation
of covers. We have discussed why this gets especially difficult to put in formal
terms if this usual interpretation is carried out by humans, as is the case with
natural language steganography.
Furthermore, we have seen that the ``impossibility'' of telling a steganogram from a real cover, without knowing the secret key, is achieved by constructing the stegosystem in such a way that testing a datagram for secret messages involves solving a hard problem. Such a problem could be expressed in terms of its computational complexity or in terms of information-theoretic considerations about ``guessing'' a key. We have seen that an HIP can be such a problem, contributing to a steganogram's security by making it more difficult to handle by automated systems.
This property of a stegosystem is especially important in the presence of an active warden. An active warden will try to prevent subliminal communication, by imposing small changes on all datagrams that pass through a gateway under its control. Robust steganograms will make it more difficult for an active warden to do so.
For watermarking systems, this feature is absolutely imperative, since watermarks must not be removable. Watermarking is probably one of the more realistic applications of natural language steganography. Natural language channels do not offer as much redundancy as other media, so we cannot expect natural language steganograms to offer high capacities for storing secret messages. This is not usually a problem in the context of watermarking, since watermarks are not usually required to be very large. However, for other applications of steganography, capacity usually is an issue. This is another reason why robustness is an important feature for natural language steganography systems.
|
[disjunct synsets]
[natural synsets]
|
Atallah's system is the only one where we cannot disregard the possibility that it could fulfill the ``wishlist'' presented in the previous section. Adapting Winstein's approach to take into account the above considerations would be possible as well.
Robustness is an issue considered only by Atallah. His system is the only one that fulfills (2), by selectively picking the sentences in which transformations should be applied. This way of accounting for robustness is similar to the famous ``battleship'' game. If the active warden happens to correctly guess the position of a data-carrying sentence, and applies transformations to it that affect its ANL-representation, there is no way to recover the data.
Limitations in the sense of (4) are especially relevant for Atallah's system. Natural language systems that rely heavily on ontologic resources have traditionally suffered from the fact that they could not scale out of the microworld-domains their ontologies were designed for and tested in. As a result of (3) however, such restriction to a microworld-domain is not acceptable for natural language steganography. The problem is that, currently, there is no common-sense ontology. The effort of () is a notable exception, however, even their system is far from the vision of truly capturing all of the common sense necessary to understand natural language. Therefore, the question whether Atallah's approach will ``scale out of the laboratory'' is still open.
Wayner's system is the only one that takes (7) into account. Winstein's system relies on mixed-radix coding, which chooses each digit of a mixed-radix number and therefore each word at the same probability. Chapman's system uses a binary block-code. The problem with these approaches is that the distribution of word-choices can be traced back to the secret. If the bits in the secret are uniformly distributed, then the word-choices will be uniformly distributed as well. () give an example where the generator replaces the word what by whichsoever. Clearly, texts where whichsoever occurs just as frequently as what will be suspicious.
Overgeneration of language, in the sense of (8), can occur in Wayner's system, depending on the grammar and in Winstein's system, depending on the lexicon. However, their systems do not overgenerate as heavily as Chapman's. Constructing lexica, style-templates or context-free grammars that don't overgenerate is not usually a problem. However, it is a problem not to make the system heavily undergenerate as a result, therefore violating (9).
For example, Winstein's system undergenerates natural language because
it does not make use of replacements that are sometimes made by human
speakers, due to its restriction to disjunct interchangeability sets.
Figure
shows the impact of disjunct interchangeability sets.
This could be exploited, for example, by analyzing a cover for word-clusters.
Word-clustering methods are well-known techniques, usually used in the
context of statistic natural language learning, to determine words that
commonly appear interchangeably in a given context.
If these clusters provide evidence for strictly disjunct synsets, the cover
is clearly suspicious. Neverthless, such evidence is less significant
in Winstein's embedding-framework than in Chapman's generation-framework,
since the restriction to disjunct interchangeability sets applies only for
a small portion of the words in a text.
A similar phenomenon appears in the context of Wayner's system. It is manifested in the fact that all of the demonstration-grammars used with Wayner's system produce rather repetitive text, simply due to the fact that the demonstration PCFGs are far too small. This weakness could be exploited by analyzing a steganogram using techniques for grammatical rule inference. If it indicates that comparatively small CFGs could have produced the text, and the text never contains non-context-free linguistic constructs, this is strong evidence to suggest a hidden message.
We cannot disregard the possibility that similar approaches might make it possible to exploit Atallah's limited ontology. We have already shown why an ontology needs to be comparatively limited, as opposed to the ``ontology'' used by humans.
From what we have seen so far, the basic approach that is most promising for building a secure and robust natural language steganography system in the near future is a lexical replacement system, similar in principle to Winstein's.
Winstein's system fulfills (4), since it relies only on lexical resources. Since large-scale lexica are available, covering a significant portion of the English language, these resources do not significantly limit the scalability of the whole system. In accordance with (5), it is based on embedding, which makes undergeneration in the sense of (9) less significant a topic, makes it possible to use the system for watermarking, and is a more realistic scenario to produce text which will be innocuous, even to human arbitrators in the sense of (3).
To fulfill (6), it would perhaps be necessary to filter out all terms from the lexicon that appear very seldom in natural text. (For example, we have seen an interchangeability set where three is replaced by the Roman number iii).
In order to fulfill (7), one would have to integrate some of Wayner's ideas into the system, for example, using a variable-length code to mimic the distribution of word-choices.
The linguistic model would have to be more accurate, in terms of (8) and (9). We have to keep in mind that if WordNet contains a synset, then this synset is relative to a linguistic context, so if we find a replacement for a word in the lexicon, then all this has to say is that there exists a context in which we could replace the word. When we don't examine the actual context of the word in the text where we make the substitution, the system will overgenerate as a result of substituting the word, although the context doesn't permit doing so. As a result of the limitation to disjunct synsets, the system will undergenerate because there will be replacements a human would apply that would violate the stegosystem's constraint of keeping synsets disjunct.
In order to make the linugistic model more accurate, one would have to lift the restriction to disjunct interchangeability sets and integrate a statistic word-sense disambiguator instead. That these techniques cannot always resolve sense-ambiguities has three interesting side-effects.
First of all, during embedding, whenever the ambiguity cannot be resolved, this is strong evidence that the context does not allow any substitution without significantly changing the meaning of the overall text. Therefore the system can skip such words, not making any replacement. We could not get such evidence, from a lexicon with disjunct synsets.
Secondly, during extraction, it will not always be possible to tell which synset a replacement was originally chosen from. Chapman and Winstein saw this as a disadvantage, because it is a significant obstacle when it comes to automatically extracting the secret again. I would seriously like to question this. The fact that it is very difficult in this situation for an automatic analyzer to extract the secret is actually an HIP keeping an arbitrator from analyzing the secret underlying the text, therefore providing additional security in the sense of (1). An extractor could be constructed in such a way that a human would have to choose the senses of ``problematic'' words. This will not be a problem for extracting the secret from a cover that is known to be a steganogram by its legitimate receiver. However, it will be a problem for an arbitrator when automatically analyzing large amounts of text.
We have already mentioned the need to choose words according to their probabilities of actually occuring in the text. Instead of relying on each replacement for a word to be equally probable, we need a model capturing the probability of all the possible replacements in a given linguistic context, so a variable-length code, as mentioned before, can be constructed. This is the third interesting side-effect, since models for use with statistic WSD will usually incorporate knowledge about the probability of a word appearing in the given context.