next up previous contents
Next: 7. Towards Secure and Up: Towards Linguistic Steganography: A Previous: 5. Systems For Natural   Contents

Subsections

6. Lessons Learned

6.1 Objectives for Natural Language Stegosystems

Objectives

Throughout the previous sections we have discussed and evaluated different theoretical approaches and practical implementations of stegosystems, taking into account many different criteria. In this section, we will make these criteria explicit, by first proposing objectives of the functionality we expect of natural language stegosystems, and then giving advice on design considerations to keep in mind for their construction and evaluation. We will motivate them by theoretical results we have discussed so far, and by practical lessons learned from the stegosystems presented in the previous sections.

6.1.0.0.1 Functional objectives

(1)
Security: It must not be possible to distinguish between a steganogram and a naturally occuring cover, without knowing a secret key.

Analyzing security often involves establishing a way to measure a ``degree of fulfillment'' for the above proposition. We have seen the approach of (), measuring the security via the Kullback-Leibler distance $D(C \vert\vert E)$ between the distribution of covers $P(C)$ and the distribution of steganograms $P(E)$. Furthermore, we have seen that it measures security only from an information-theoretic point of view, and that it is necessary to take into account many other constraints. These could, for example, arise from syntactic and semantic restrictions imposed by the usual interpretation of covers. We have discussed why this gets especially difficult to put in formal terms if this usual interpretation is carried out by humans, as is the case with natural language steganography.

Furthermore, we have seen that the ``impossibility'' of telling a steganogram from a real cover, without knowing the secret key, is achieved by constructing the stegosystem in such a way that testing a datagram for secret messages involves solving a hard problem. Such a problem could be expressed in terms of its computational complexity or in terms of information-theoretic considerations about ``guessing'' a key. We have seen that an HIP can be such a problem, contributing to a steganogram's security by making it more difficult to handle by automated systems.

(2)
Robustness: It should not be possible to manipulate a steganogram in such a way that the secret cannot be extracted any more.

This property of a stegosystem is especially important in the presence of an active warden. An active warden will try to prevent subliminal communication, by imposing small changes on all datagrams that pass through a gateway under its control. Robust steganograms will make it more difficult for an active warden to do so.

For watermarking systems, this feature is absolutely imperative, since watermarks must not be removable. Watermarking is probably one of the more realistic applications of natural language steganography. Natural language channels do not offer as much redundancy as other media, so we cannot expect natural language steganograms to offer high capacities for storing secret messages. This is not usually a problem in the context of watermarking, since watermarks are not usually required to be very large. However, for other applications of steganography, capacity usually is an issue. This is another reason why robustness is an important feature for natural language steganography systems.

6.1.0.0.2 Design Considerations

(3)
Kerckhoffs' principle: It is of central importance never to make assumptions about the ``enemy'', except that he does not know the secret key.
This important design consideration comes from experience with cryptosystems in military applications and has proven to be a valuable lecture for engineering security systems. In natural language steganography, we have to be aware of the strategically problematic consequences of propositions of the form, ``suppose the arbitrator is using linguistic model $X$...''. Whenever the arbitrator actually uses a better linguistic model, in terms of more accurately capturing human usage of natural language, the stegosystem is worthless. This has the consequence that the only reliable benchmark by which a system's linguistic performance is to be measured is human ability to understand natural language, as opposed to any formal model. Therefore a steganogram's property to be indistinguishable from real covers will have to be evaluated empirically by humans.

(4)
The state of the art in natural language processing is a significant limiting factor, determining what we can expect a linguistic stegosystem to do.
We have seen that models for true natural language understanding in the context of a symbolic system are still in their infancy, and have discussed the implications on Atallah's system, which relies on an explicit meaning-representation. We have mentioned the inadequacy of context-free grammars as models for natural language and discussed the implications on Wayner's system, which relies on PCFGs to characterize innocuous covers.

(5)
Systems following an approach of generating innocuous text, rather than embedding secrets in given texts, are unlikely to yield practically applicable results in the near future.
Partially motivated by applicability to watermarking, and as a result of (4), we believe that embedding is the right paradigm for constructing practical systems for natural language steganography. None of the generation-based systems constructed so far could fool a human into thinking their steganograms were innocuous text. The authors of these systems argued that their systems targeted automated arbitrators operating according to known formal models. However, this argument is problematic in the sense of (3).

(6)
Every symbol chosen by the stegosystem for coding-purposes that can be directly observed in a steganogram (e.g. word-choices), or that can be derived from analyzing it (e.g. context-free productions, deriving a steganogram from a given grammar), is a potential ``clue'' for detecting hidden messages. A steganogram must not contain a clue that is never observed in real covers.
Examining a single clue, a steganalyst will always suspect the steganogram to contain a hidden message, if he knows that it can never occur in innocuous covers. This is, of course, a direct consequence of (1).

(7)
A distribution of clues observable in a steganogram $P(E)$ must not be statistically significant evidence for distinguishing it from naturally occuring covers.
A reasonable approach might be to formulate $P(C)$, the distribution of clues observable in natural covers, and construct the stegosystem in such a way that the Kullback-Leibler distance $D(C \vert\vert E)$ is as small as possible.

(8)
Linguistic models for use in natural language stegosystems must not overgenerate.
Accuracy of linguistic models is usually analyzed by two properties, their behavior of overgeneration and undergeneration. If a system produces texts a human speaker would usually not produce, we say it overgenerates. If a system never produces text a human speaker would normally produce, we say it undergenerates. Clearly, as a result of (6) and (7), a natural language stegosystem must not overgenerate, since this would make a steganogram suspicious to an arbitrator.

(9)
Linguistic models for use in natural language stegosystems should not undergenerate.
(10)
If there is a tradeoff between making a system overgenerate and making it undergenerate, it is preferable to make it undergenerate.
Natural language stegosystems that heavily undergenerate text will produce texts that significantly deviate from naturally occuring text, therefore violating (7). However, the significance of this exploit depends on what portion of the text a stegosystem ``touches'' at all, and how many transformations are applied, so (9) will usually be less important than (8), and the impact of (9) will be less significant for embedding systems.

6.2 Comparison and Evaluation of Current Systems

Comparison and Evaluation

Figure: Comparison of schemes.
\begin{figure}\begin{center}
\begin{tabular}{r\vert llll}
& Winstein & Chapman ...
... block & Huffman & quadratic residues \\
\end{tabular}
\end{center}\end{figure}

Figure: The impact of disjunct synsets on the lexical account for the senses of move.
[disjunct synsets]\includegraphics[scale=.9]{img/ambiguity2.eps} [natural synsets]\includegraphics[scale=.9]{img/ambiguity.eps}

Atallah's system is the only one where we cannot disregard the possibility that it could fulfill the ``wishlist'' presented in the previous section. Adapting Winstein's approach to take into account the above considerations would be possible as well.

Robustness is an issue considered only by Atallah. His system is the only one that fulfills (2), by selectively picking the sentences in which transformations should be applied. This way of accounting for robustness is similar to the famous ``battleship'' game. If the active warden happens to correctly guess the position of a data-carrying sentence, and applies transformations to it that affect its ANL-representation, there is no way to recover the data.

Limitations in the sense of (4) are especially relevant for Atallah's system. Natural language systems that rely heavily on ontologic resources have traditionally suffered from the fact that they could not scale out of the microworld-domains their ontologies were designed for and tested in. As a result of (3) however, such restriction to a microworld-domain is not acceptable for natural language steganography. The problem is that, currently, there is no common-sense ontology. The effort of () is a notable exception, however, even their system is far from the vision of truly capturing all of the common sense necessary to understand natural language. Therefore, the question whether Atallah's approach will ``scale out of the laboratory'' is still open.

Wayner's system is the only one that takes (7) into account. Winstein's system relies on mixed-radix coding, which chooses each digit of a mixed-radix number and therefore each word at the same probability. Chapman's system uses a binary block-code. The problem with these approaches is that the distribution of word-choices can be traced back to the secret. If the bits in the secret are uniformly distributed, then the word-choices will be uniformly distributed as well. () give an example where the generator replaces the word what by whichsoever. Clearly, texts where whichsoever occurs just as frequently as what will be suspicious.

Overgeneration of language, in the sense of (8), can occur in Wayner's system, depending on the grammar and in Winstein's system, depending on the lexicon. However, their systems do not overgenerate as heavily as Chapman's. Constructing lexica, style-templates or context-free grammars that don't overgenerate is not usually a problem. However, it is a problem not to make the system heavily undergenerate as a result, therefore violating (9).

For example, Winstein's system undergenerates natural language because it does not make use of replacements that are sometimes made by human speakers, due to its restriction to disjunct interchangeability sets. Figure [*] shows the impact of disjunct interchangeability sets. This could be exploited, for example, by analyzing a cover for word-clusters. Word-clustering methods are well-known techniques, usually used in the context of statistic natural language learning, to determine words that commonly appear interchangeably in a given context. If these clusters provide evidence for strictly disjunct synsets, the cover is clearly suspicious. Neverthless, such evidence is less significant in Winstein's embedding-framework than in Chapman's generation-framework, since the restriction to disjunct interchangeability sets applies only for a small portion of the words in a text.

A similar phenomenon appears in the context of Wayner's system. It is manifested in the fact that all of the demonstration-grammars used with Wayner's system produce rather repetitive text, simply due to the fact that the demonstration PCFGs are far too small. This weakness could be exploited by analyzing a steganogram using techniques for grammatical rule inference. If it indicates that comparatively small CFGs could have produced the text, and the text never contains non-context-free linguistic constructs, this is strong evidence to suggest a hidden message.

We cannot disregard the possibility that similar approaches might make it possible to exploit Atallah's limited ontology. We have already shown why an ontology needs to be comparatively limited, as opposed to the ``ontology'' used by humans.

6.3 Possible Improvements and Future Directions

Future Directions

From what we have seen so far, the basic approach that is most promising for building a secure and robust natural language steganography system in the near future is a lexical replacement system, similar in principle to Winstein's.

Winstein's system fulfills (4), since it relies only on lexical resources. Since large-scale lexica are available, covering a significant portion of the English language, these resources do not significantly limit the scalability of the whole system. In accordance with (5), it is based on embedding, which makes undergeneration in the sense of (9) less significant a topic, makes it possible to use the system for watermarking, and is a more realistic scenario to produce text which will be innocuous, even to human arbitrators in the sense of (3).

To fulfill (6), it would perhaps be necessary to filter out all terms from the lexicon that appear very seldom in natural text. (For example, we have seen an interchangeability set where three is replaced by the Roman number iii).

In order to fulfill (7), one would have to integrate some of Wayner's ideas into the system, for example, using a variable-length code to mimic the distribution of word-choices.

The linguistic model would have to be more accurate, in terms of (8) and (9). We have to keep in mind that if WordNet contains a synset, then this synset is relative to a linguistic context, so if we find a replacement for a word in the lexicon, then all this has to say is that there exists a context in which we could replace the word. When we don't examine the actual context of the word in the text where we make the substitution, the system will overgenerate as a result of substituting the word, although the context doesn't permit doing so. As a result of the limitation to disjunct synsets, the system will undergenerate because there will be replacements a human would apply that would violate the stegosystem's constraint of keeping synsets disjunct.

In order to make the linugistic model more accurate, one would have to lift the restriction to disjunct interchangeability sets and integrate a statistic word-sense disambiguator instead. That these techniques cannot always resolve sense-ambiguities has three interesting side-effects.

First of all, during embedding, whenever the ambiguity cannot be resolved, this is strong evidence that the context does not allow any substitution without significantly changing the meaning of the overall text. Therefore the system can skip such words, not making any replacement. We could not get such evidence, from a lexicon with disjunct synsets.

Secondly, during extraction, it will not always be possible to tell which synset a replacement was originally chosen from. Chapman and Winstein saw this as a disadvantage, because it is a significant obstacle when it comes to automatically extracting the secret again. I would seriously like to question this. The fact that it is very difficult in this situation for an automatic analyzer to extract the secret is actually an HIP keeping an arbitrator from analyzing the secret underlying the text, therefore providing additional security in the sense of (1). An extractor could be constructed in such a way that a human would have to choose the senses of ``problematic'' words. This will not be a problem for extracting the secret from a cover that is known to be a steganogram by its legitimate receiver. However, it will be a problem for an arbitrator when automatically analyzing large amounts of text.

We have already mentioned the need to choose words according to their probabilities of actually occuring in the text. Instead of relying on each replacement for a word to be equally probable, we need a model capturing the probability of all the possible replacements in a given linguistic context, so a variable-length code, as mentioned before, can be constructed. This is the third interesting side-effect, since models for use with statistic WSD will usually incorporate knowledge about the probability of a word appearing in the given context.


next up previous contents
Next: 7. Towards Secure and Up: Towards Linguistic Steganography: A Previous: 5. Systems For Natural   Contents
Richard Bergmair 2005-01-31