next up previous contents
Next: 3. Lexical Language Processing Up: Towards Linguistic Steganography: A Previous: 1. Introduction   Contents

Subsections

2. Steganographic Security

Cryptography is sometimes referred to as the art and science of secure communication. Usually this is achieved by relying on the security of some other communication system, a system that takes care of distributing a key, which is a piece of information that makes some communication-endpoints ``more privileged'' than others. Based on such a setup, communication channels not assumed to be secure (e.g. a channel where we cannot disregard the possibility of an eavesdropper intercepting the messages) are secured, by making them dependent on communication channels we can safely assume to be secure (e.g. a key distribution system we can trust).

It is important for cryptographers to bear in mind that every piece of information not explicitly defined as a key is available to everybody. Kerckhoffs' principle (, ) states that the cryptologic methods used should be assumed common wisdom.

One approach to security is to represent information in such a way that the resulting datagram will be easily interpretable by privileged endpoints, i.e. ones that have the right key, while interpretation of the same data by non-privileged endpoints poses a serious problem, usually incorporating vast computational effort. Systems implementing such security are called cryptosystems. The study of how these systems can be constructed is referred to as cryptography, while the study of solving the interpretation-problems posed by cryptosystems is referred to as cryptanalysis.

Another approach to security takes into account the awareness of the very existence of a datagram, as opposed to the ability of interpreting a given datagram. Here information is represented in such a way that the resulting datagram will be known to contain secret information only by privileged endpoints (i.e. ones that have been told where to expect hidden information), while testing whether a given datagram does or does not contain secret information poses a serious problem for non-privileged endpoints. Analogously, systems implementing such security are called stegosystems, the study of their construction is called steganography and the study of testing whether or whether not a given datagram contains a secret message is called steganalysis.

2.1 A Framework for Secure Communication

The Framework

Figure: The cryptographic scenario. Information is ``locked inside a safe''.
\includegraphics[scale=0.6]{img/block-crypto.eps}

The purely cryptographic scenario is depicted in Figure [*]. Alice wants to send a message to Bob, and she wants to do so via an insecure channel, i.e. a channel Eve the eavesdropper has access to. One has to assume that whatever Alice submits over this channel will be received by Bob and will also be intercepted by Eve. Alice and Bob want to make sure that Bob will be able to interpret the message, and Eve will not. Therefore, they rely on a trusted key-distribution facility, that will equip both Alice and Bob, but not Eve, with random pieces of information - keys. Using the key and the message that is to be transmitted, Alice computes a cryptogram, she encrypts the message. The properties of the cryptogram make sure that, after transmitting it over the channel, there will be a simple way for Bob to decrypt the message again (using the key). However, there will not be a simple way for Eve to break the cryptogram, i.e. reconstruct the secret message, given only the cryptogram but not the key.

Figure: The steganographic scenario. Information has to be read ``between the lines''.
\includegraphics[scale=0.6]{img/block-stego.eps}

The steganographic scenario is depicted in Figure [*]. Instead of Eve, the eavesdropper, Alice's and Bob's problem is that they are now in prison, and their messages are arbitrated by Wendy the warden. Alice and Bob want to develop an escape-plan, but Wendy must not ``see'' anything but harmless communication between two well-behaved prisoners. (, )

Again Alice wants to submit a message $m \in M$ chosen from the message-space $M$ to Bob, and again a secure key-distribution facility makes sure Bob has an advantage over Wendy when it comes to reconstructing this message. That is, Bob and Alice know exactly which key $k$ in the key-space $K$ is used (they could have agreed on one before imprisonment), while Wendy only knows that $k$ must be chosen in one of the $\vert K\vert$ possible ways.

Wendy has a set $C$, usually disjunct from $M$, of possible coverscover that she knows are harmless, e.g. the set of English greetings. For example, let

\begin{displaymath}
C=\lbrace
\mbox{\textsf{Hi!}},
\mbox{\textsf{Good morning!}},
\mbox{\textsf{How are you?}}
\rbrace
\end{displaymath}

and

\begin{displaymath}
M=\lbrace
\mbox{\textsf{Escape tonight!}},
\mbox{\textsf{D...
... tonight!}},
\mbox{\textsf{Can we escape tonight?}}
\rbrace .
\end{displaymath}

If Alice sends Hi! to Bob, they can be sure Wendy will not suspect any escape-plans being developed, but under no circumstances can they send Escape tonight!, since Wendy will immediately put them into a high-security prison no one has ever escaped from.

How can Alice and Bob exploit this communication system? A basic idea due to () is that of a subliminal channel. We can abuse a cover channel to submit information (it is not supposed or even allowed to submit) by shifting the interpretation of the signals sent over the channel. Channels operating under such a shifted interpretation are called subliminal. A first approach might be to use an invertible function $e:M \mapsto C$. Then, Alice can map a message $m$ to a steganogram $c$, using $e(m)=c$. Since $c \in C$, Wendy will not find it suspicious, and since the function is invertible, Bob will be able to compute $e^{-1}(c)=m$ in order to reconstruct the original message. In the simplest case this function could be expressed by a table:

\begin{eqnarray*}
e( \mbox{\textsf{Escape tonight!}} ) &=& \mbox{\textsf{Hi!}} \...
...xtsf{Can we escape tonight?}} ) &=& \mbox{\textsf{How are you?}}
\end{eqnarray*}

Here $e$ itself would have to act as a key, since if Wendy knows $e^{-1}$, she can, just like Bob, check whether or not $e^{-1}(c)$ is a message she should worry about. For example, if Wendy knows that $e^{-1}(\textsf{Hi!})=\textsf{Escape tonight!}$, then she can break the stegosystem by observing whether there is a correlation between Alice greeting Bob with $\textsf{Hi!}$ and attempts to escape that night.

A second approach might be to use a non-invertible function $e:M \times K \mapsto C$, to encode a message and a function $d:C \times K \mapsto M$ to decode it again (for example assuming $d(e(m,k),k)=m$). This approach has the advantage that, following Kerckhoffs' principle, $e$ and $d$ can safely be assumed public knowledge. At this point, one might see steganography merely as a special kind of cryptography, where we deal with ordinary cryptograms, but have to use special representations for them, in particular ones that will not arouse Wendy's suspicion. This is, of course, only feasible if we have a precise idea about what will and what will not be suspicious to Wendy. In other words, we need a model characterizing $C$. However such a model will usually only be available in very restricted cases, for example, when Wendy is known to be a computer behaving according to a known formal model.

A core problem of steganography is therefore the semantic component that enters the scene when we try to formalize what it means for a steganogram to be innocuous, i.e. when we try to determine $C$. For example, steganography systems are often concerned with the set of all digital images. In this work we will be concerned with the set of all natural language texts. Of course, images where random pixels have been inverted in color or the like give rise to the suspicion that some unusual digital manipulation has occurred. A sentence like, Hi Bob! Let's break out tonight!, is perfect natural language, but it will clearly not be innocuous. In fact, steganography systems need to be somewhat more selective about the set of possible covers, e.g. ``the set of all digital images, that could have originated from a digital camera'' or ``the set of all natural language texts that could have appeared in a newspaper''. As a result, a steganography system dealing with JPEG images needs a model far more sophisticated than the definition of the JPEG-file-format and, analogously, it is crucial for natural language steganography systems to take semantic aspects into account.

A general design principle for steganography, following from these observations is that we assume that Alice only uses a subset $C' \subseteq C$ of covers. For example, she could actually take a picture with her digital camera, or she could cut out an article from today's newspaper. Then, using the cover $c' \in C'$, she performs some operation $e: C' \times M \times K \mapsto E$ called embedding, to map a message $m \in M$ to a steganogram $e \in E$ in the set of all possible steganograms $E$, using a key $k \in K$. This operation is subject to some constraints which make up a model for perceptual similarity. We assume that there is some function2.1 $\textit{sim}_d(c',e)$ which can be used to determine the perceptual distortion between a cover $c'$ and a steganogram $e$. Wendy will see $e$ as innocuous as long as $\textit{sim}_d(c',e) \leq \delta$, i.e. as long as $c'$ and $e$ differ only in some fixed amount of distortion $\delta$ which cannot be perceived by Wendy. The design goal by which the embedding function must be defined is that, given a message $m$ that is to be transmitted using a key $k$, Alice can select a $c'$ from the set of covers she actually has available $C'$ in such a way that, if $e(m,c',k)$ maps to $x$, there will be a $c$ in the set of all covers $C$, which is indistinguishable by Wendy from $x$, in terms of the perceptual distance $\textit{sim}_d$. Formally,

\begin{displaymath}
\forall m \in M ~ \forall k \in K ~ \exists c' \in C' ~ \exists c \in C: ~~ \textit{sim}_d(c,e(c',m,k))
\leq \delta.
\end{displaymath} (2.1)

We adopt this approach because a model characterizing $C$, i.e. a system capable of generating innocuous covers in the first place, is often difficult or impossible to construct, whereas a model capturing what deviations from a given innocuous cover will make it suspicious, is often available.

Of course, there must be a way for Bob to extract the message again. Most commonly this is done using a function $d : E \times K \mapsto M$, the extraction-function. Some stegosystems need the original cover available for extraction. This could be viewed as a special case of the system defined so far by letting $K = K' \times C'$, i.e. there is a set $K'$, the random keys are chosen from, and a key from the actual keyspace of the stegosystem is constructed by choosing a $k' \in K'$, and by choosing a $c' \in C'$.2.2 In such a system it is necessary to view the choice of a cover, as part of the key, since it will be significantly easier for a warden to detect hidden information, given the original cover. Therefore the choice of a cover (or the cover itself) should in such systems always be transmitted over secure channels.

2.2 Information Theory: ``A Probability Says it All.''

Information Theory

Where do security systems get their security from? What does it mean for a cryptosystem to be perfectly secure? How can a stegosystem ever be secure in the sense that it is equally difficult to break, than to break a cryptosystem? How can the ``amount of security'' we can expect from a security system be measured, when it is not perfectly secure?

Figure: Two kinds of weak cryptosystems.
[exploitable keys] \includegraphics{img/it-bad2-crypto.eps} [exploitable messages] \includegraphics{img/it-bad-crypto.eps}

The information-theoretic idea behind a cryptosystem could informally be stated as ``message - key = interceptible datagram''. The information theory behind cryptanalysis, on the other hand is ``intercepted datagram + educated guessing = message''. Whenever it takes less cryptanalytic guessing than it would take to guess the message in the first place, the system is, theoretically2.3 exploitable. Note that the information theoretic point of view depends heavily on probabilistic models being available, characterizing the choice of a message and the choice of a key. We saw in the diary-example why it is reasonable to assume such models for simple cryptosystems.

Figure [*] shows two cryptosystems. Messages $M_1,...,M_6$ and a probability-distribution $P(M_i)$ are given. The system depends on two keys $K_1,K_2$ chosen with probabilities $P(K_i)$. By deterministic processing, based only on the message and the key, we obtain cryptograms $E_1,\ldots,E_6$, with probabilities $P(E_i \vert K_i \wedge M_i)$ depending only on the key and the message.

Figure [*] shows a very weak cryptosystem. When cryptogram $E_1$ is intercepted, one can tell that the message this cryptogram originated from is most likely $M_1$ rather than $M_2$, since the key transforming $M_1$ into $E_1$ is more likely to be chosen than the key transforming $M_2$ into $E_1$. The impact of this possible exploit is measured by () by the key-equivocation2.4

\begin{displaymath}
H(K\vert E) = \sum_{K,E} P(K \wedge E) \log \frac{1}{P(K\vert E)}.
\end{displaymath} (2.2)

In the example, Eve exploited the fact that the substitution-table was not completely random. Instead of randomly permuting the alphabet, the alphabet had only been shifted and reversed.

Figure [*] shows another kind of weakness a cryptosystem could have. In this system, all keys are equally probable but the messages are not. If message $E_1$ is intercepted, there is no way to tell whether the key generating $E_1$ from $M_1$ is more or less likely than the key generating $E_1$ from $M_2$, but since $M_2$ is, per se, more likely than $M_1$, $M_2$ will possibly be the solution to this cryptogram. This exploit is quantified by () as the message-equivocation

\begin{displaymath}
H(M\vert E) = \sum_{M,E} P(M \wedge E) \log
\frac{1}{P(M\vert E)}.
\end{displaymath} (2.3)

In the example, Eve exploited the fact that Alice had encrypted English-language-text, so she knew some probabilities of the message underlying the cryptogram.

Therefore the most desirable cryptosystem is one with keys equally probable and with messages equally probable. () shows, in detail, why perfectly secure cryptography can only be achieved if we allow at least as many keys as there are messages. For our purposes, the intuitive picture shall suffice. When there are more messages than there are keys, it will always be possible, by simply guessing the keys, to determine the message (however, by possibly using vast computational resources). Since guessing the key amounts to less information than guessing the message, this is considered a weakness, from the information theoretic point of view.

What we have considered so far is the upper triangle ( $\overline{MKE}$) of Figure [*], respectively that which is labelled $R$ in Figure [*]. Each arc in the relation $R$ in Figure [*] corresponds to the choice of one of six equally probable keys. (Keys were not labelled with their probabilities here for the sake of clarity). From what was defined so far, $R$ is a perfect cryptosystem, if its input is uniformly distributed. As a result, its output will be uniformly distributed as well.

For analyzing the impact of non-uniformly distributed messages, it might be helpful to view the input of this cryptosystem as originating from a relation $Q$, which provides perfect compression. So, given that $R$ is a perfect cryptosystem, $Q \circ R$ offers perfect secrecy, if $Q$ offers perfect compression.

Turning back to Figure [*], there is one influence on $E$ we have not yet considered. A secrecy system that takes into account the influence from $C$ to $E$, follows the basic idea of mimicry (, ,). Here $C$ is a set of possible covers, in the sense of a steganography system, and we are given the probabilities $P(C_i)$ for innocuous covers to occur.

If the probabilities of our cryptosystem's output $E$, given by $P(E_i)$, which depends only on $P(M_i)$ and $P(K_i)$, are different from the probabilities of innocuous covers $P(C_i)$, then a one-to-one correspondence between cryptograms $E$ and suspectedly innocuous covers $C$ will clearly be exploitable, since covers will occur with ``unnatural'' probabilities. This could be quantified by what one would be tempted to call the cover-equivocation, although this term is not commonly used:

\begin{displaymath}
H(C\vert E) = \sum_{C,E} P(C \wedge E) \log \frac{1}{P(C\vert E)}.
\end{displaymath} (2.4)

() goes yet a bit further and uses the relative entropy $D(C \vert\vert E)$, also called Kullback-Leibler distance, to investigate, from a statistical point of view, a steganalyst's hypothesis-testing-problem of trying to find out whether or not covers have originated from a stegosystem. For this purpose we need two distributions $P_C(c)$ and $P_E(c)$, where the former is the probability of a cover being produced ``naturally'' and the latter is the probability of a steganogram being produced from the stegosystem. (Both distributions are over all datagrams that can be submitted over the channel, e.g. $C \cup E$):
\begin{displaymath}
D(C\vert\vert E) = \sum_{c \in C} P_C(c) \log \frac{P_C(c)}{P_E(c)}.
\end{displaymath} (2.5)

This measure is not a metric in the mathematical sense, but it has the important property that it is a nonnegative convex function of $P_C(c)$ and is zero if, and only if, the distributions are equal. The larger this measure gets, the less security we can expect from the stegosystem.

For analyzing the impact of the cover-distribution, it is convenient to view the output of a perfect cryptosystem (such as $R$) as the input to a relation $S$ providing mimicry. Given that $R$ is a perfect cryptosystem, $R \circ S$ will be a perfect stegosystem, if $S$ is the inverse of perfect compression, i.e. perfect mimicry. As can be seen in Figure [*], mimicry is basically defined as a relation transforming a small message space with equally probable messages into a larger message space with messages distributed according to cover-characteristics. The exact opposite is compression, which is supposed to transform large non-uniformly distributed message spaces into small ones.

Figure: Message, key, steganogram, cover, and how they relate to each other
\includegraphics{img/it-influence.eps}

Figure: Mimicry as the inverse of compression.
[compression]\includegraphics{img/it-perf-comp.eps} [mimicry]\includegraphics{img/it-perf-rcomp.eps}

Figure: A perfect stegosystem.
\includegraphics{img/it-complete.eps}

Considering the parts of Figure [*], there is no commonly agreed upon notion of what deserves to be called steganography. () emphasizes the importance of what we have called $S$ as the very core of strong theoretical steganography, while () considers $R \circ S$ in his information theoretic model for steganography, demonstrating the impact of the cryptographic aspects of a stegosystem. Of course, reversing the mimicry on a cover that has not actually originated from a stegosystem will produce ``garbage''. A basic requirement is that it should not be possible to distinguish this ``garbage'' from what comes out when reversing the mimicry on a cover that has originated from a stegosystem.

2.3 Ontology: ``We need Models!''

Ontology

Recalling the idea behind practical steganographic covers (``images that could have originated from a digital camera'', ``natural language texts that could have appeared as newspaper-articles''), the first problem of the information theoretic approach gets obvious: that of finding a probabilistic model measuring probabilities of such covers. What is the probability of a yellow smiley face on blue background? What is the probability of Steve plays the guitar brilliantly? Theoretically, whenever a steganalyst has such a model, then this model can be used in steganography as well, to construct a stegosystem where probabilities arising from this model are not exploitable. In practice, however, the idea of ``public wisdom'', when it comes to knowledge about steganalytic activities, should be doubted.

The second problem was already mentioned briefly. There is no point in producing digital images, where the statistical distribution of colors of pixels matches that of digital images taken from a digital camera, if the resulting steganogram is not even syntactically correct JPEG, and there is no point in producing character-sequences with characters distributed as in English text, if the characters do not even make up correct words.

The problem goes even beyond purely syntactic issues, into a semantic realm. A stegosystem that produces covers that are suspicious under a cover's usual interpretation will clearly be insecure, no matter how low the relative entropy is. We can say, relative entropy (equation [*], in particular) is a ``degree of fulfillment'' for equation [*] from an information theoretic point of view, but it will be necessary to enforce the fulfillment also from the point of view of a model that takes into account this ``usual interpretation'' of a cover.

Such models are available for many kinds of steganography and watermarking systems, since they can usually rely on simple measurements. In image-based steganography, for example, one can compare the deviation in color of a pixel, resulting from the embedding, to the deviation in color that will be perceivable to a human observer.

[p51] Color values can, for instance, be stored according to their Euclidean distance in RGB space:

\begin{displaymath}
d = \sqrt{R^2 + G^2 + B^2}.
\end{displaymath}

Since the human visual system is more sensitive to changes in the luminance of a color, another (probably better) approach would be sorting the pallette entries according to their luminance component. [p44]

\begin{displaymath}
Y = 0.299 R + 0.587 G + 0.114 B
\end{displaymath}

(, )
Here formulae are known that capture human perception from a physiologic point of view, based on simple measurements. Clearly a computer has certain advantages over a human when it comes to measuring whether or not the color of a pixel is 1 degree in 256 more red than blue. Since 2004, the ACM even publishes a periodical called ``ACM Transactions on Applied Perception''.

In linguistic steganography this semantic requirement is probably the most difficult problem that has to be tackled, since we cannot rely on simple measurements.

``A semantic theory must describe the relationship between the words and syntactic structures of natural language and the postulated formalism of concepts and operations on concepts.'' (, )
However, there is currently no such formalism that operates on all the concepts understood by humans as the meaning of natural language. If we do not wish to resolve these problems we have to draw back to the pragmatic approach Winograd used, concentrating on a few specific aspects, when we go about postulating such formalisms, yet have to remain aware of the criticism brought forth by () about such approaches:
``Thus, much of the ``I'' in these ``AI'' programs is in the eye - and ``I'' - of the beholder.'' (, )

2.4 AI: ``What if there are no Models?''

Artificial Intelligence

We saw earlier that breaking a cryptogram should, by definition, amount to solving a hard problem, such as the information-theoretic problem of ``guessing'' a solution, or the problem of finding an efficient algorithm that makes a solution feasible with limited computational resources. The AI-community knows many problems a computer cannot easily solve, therefore posing problems that are not merely difficult to solve within a given formalism, but that are difficult to solve due to the very fact that we do not know any formalism in which they could be solved at all. The value of such problems from a cryptographic point of view has recently been discovered to tell computers and humans apart.

Generally, such a cryptosystem is called Human Interactive Proof, HIP for short (, ,). The most prominent characterization of an HIP is the Completely Automated Public Turing Tests to Tell Computers and Humans Apart, CAPTCHA for short, as described by (). The name refers to Turing's Test (, ), as the basic scenario. Humans and computers are ``sitting in'' black-boxes of which nothing but an interface is known. This interface can equally be used by computers or humans, which makes it difficult to tell computers and humans apart. However, the scenario differs from the original Turing-Test in that it is ``completely automated'', which means that the judges cannot be humans themselves. Therefore the scenario is sometimes referred to as a Reverse Turing Test. The requirement for the test to be ``public'' refers to Kerckhoffs' principle.

The most prominent HIPs are image-based techniques, employed, for example, in the web registration forms of Yahoo!, Hotmail, PayPal, and many others. In order to prevent automated robots from subscribing for free email accounts at Yahoo!, the registration form relies on having the user recognize a text appearing in a heavily distorted image. There is simply no technique known to carry out such advanced optical character recognition, as it would take to automatically recognize the text. However, humans seem to have no problem with this kind of recognition. Since the distortion of these images can be done automatically, such methods can safely regard their image-databases, lexica, and distortion-mechanisms as public knowledge. In the end, security relies on the private randomness used by the distortion-filters, and since the space of possible transformations is large enough, this method can provide solid security.

The problem is closely linked to linguistic steganography. If natural language steganograms could be constructed in such a way that they cannot be analyzed fully automatically, it would make an arbitrator's job much more difficult. A great advantage of linguistic steganography over other forms of steganography arises from the large amounts of data coded in natural language. Arbitrating such large amounts of data is nearly impossible, and even more so if we manage to prevent computers from doing the job. One of the highlights of the method presented herein is a layer of security that arises from such considerations.

The creation of a true CAPTCHA in a text-domain, in the sense of an HIP that does not rely on any private resources however, is still an open problem. It was motivated by () by the need for CAPTCHAs that can be used also by visually impaired people. Human-aided recognition of text in the sense of an HIP had already been under investigation in the context of this project, when Luis von Ahn published the problem-statement in Communications of the ACM in February 2004. () give a partial solution, an HIP which relies on the linguistic problem of lexical word-sense disambiguation. The approach cannot claim to provide a fully ``public'' solution, since it relies on a private repository of linguistic knowledge. However, it has the ability to learn its language, therefore this database can be viewed as a dynamic resource. The assumption that, based on an initial private ``seed'' of linguistic knowledge, this dynamic resource grows faster than that of any enemy is not unreasonable, and therefore the impact of the approach to rely on a private resource is limited. Eliminating the need for such a private database would be desirable, but remains an open problem.

Figure: A tough question for a computer.
\fbox{
\begin{minipage}{8.5cm}
Which of the following are meaningful replacement...
...$\bigcirc$] She walked home alone in the nighttime.
\end{itemize}\end{minipage}}

The basic setup that allows distinguishing computers and humans in a lexical domain is a lexicon's inability to truly represent a word's meaning. Linguists have found out that it is hardly possible to ``define'' a word in a lexicon, or in any other formal system, in such a way, that a word's ``meaning'' would not change with the syntactic and semantic context it is used in.

The creators of the most prominent lexical database WordNet, saw meaning closely related to the linguistic concept of synonymy. By their definition ``two expressions are synonymous in a linguistic context $C$ if the substitution of one for the other in $C$ does not alter the truth value'' (, ). A linguistic context might for example be a set of sentences. Observing a set of sentences and their truth values, if we find that the sentences' truth values never change, when a specific word is substituted for another, then the two words are synonymous.

Therefore we can never define what it means for a word to be synonymous to dark. The best we can do is to state that there exists a linguistic context in which dark can be interchanged by black or sinister, and there exists a context in which dark can be interchanged by night or nighttime. Consider, for example, the sentence She walked home alone in the dark. A native speaker would probably accept She walked home alone in the night or She walked home alone in the nighttime but not She walked home alone in the black or She walked home alone in the sinister. On the other hand, consider the sentence Don't play with dark powers. Here Don't play with black powers or Don't play with sinister powers would be correct, but Don't play with night powers or Don't play with nighttime powers would not. Therefore the question in Figure [*] will be very difficult to answer for a computer relying on a lexicon while it is trivial for a human.


next up previous contents
Next: 3. Lexical Language Processing Up: Towards Linguistic Steganography: A Previous: 1. Introduction   Contents
Richard Bergmair 2005-01-31