Cryptography is sometimes referred to as the art and science of secure communication. Usually this is achieved by relying on the security of some other communication system, a system that takes care of distributing a key, which is a piece of information that makes some communication-endpoints ``more privileged'' than others. Based on such a setup, communication channels not assumed to be secure (e.g. a channel where we cannot disregard the possibility of an eavesdropper intercepting the messages) are secured, by making them dependent on communication channels we can safely assume to be secure (e.g. a key distribution system we can trust).
It is important for cryptographers to bear in mind that every piece of information not explicitly defined as a key is available to everybody. Kerckhoffs' principle (, ) states that the cryptologic methods used should be assumed common wisdom.
One approach to security is to represent information in such a way that the resulting datagram will be easily interpretable by privileged endpoints, i.e. ones that have the right key, while interpretation of the same data by non-privileged endpoints poses a serious problem, usually incorporating vast computational effort. Systems implementing such security are called cryptosystems. The study of how these systems can be constructed is referred to as cryptography, while the study of solving the interpretation-problems posed by cryptosystems is referred to as cryptanalysis.
Another approach to security takes into account the awareness of the very existence of a datagram, as opposed to the ability of interpreting a given datagram. Here information is represented in such a way that the resulting datagram will be known to contain secret information only by privileged endpoints (i.e. ones that have been told where to expect hidden information), while testing whether a given datagram does or does not contain secret information poses a serious problem for non-privileged endpoints. Analogously, systems implementing such security are called stegosystems, the study of their construction is called steganography and the study of testing whether or whether not a given datagram contains a secret message is called steganalysis.
The purely cryptographic scenario is depicted in
Figure
. Alice wants to send a message
to Bob, and she wants to do so via an insecure channel,
i.e. a channel Eve the eavesdropper has access to. One has to assume that
whatever Alice submits over this channel will be received by
Bob and will also be intercepted by Eve.
Alice and Bob want to make sure that Bob will be able to
interpret the message, and Eve will not. Therefore, they
rely on a trusted key-distribution facility, that will equip
both Alice and Bob, but not Eve, with random pieces of
information - keys.
Using the key and the message that is to be
transmitted, Alice computes a cryptogram, she
encrypts the message. The properties of the
cryptogram make sure that, after transmitting it
over the channel, there will be a simple way
for Bob to decrypt the message again (using
the key). However, there will not be a simple way for Eve to
break the cryptogram, i.e. reconstruct the secret
message, given only the cryptogram but not the key.
The steganographic scenario is depicted in
Figure
.
Instead of Eve, the eavesdropper, Alice's and Bob's problem
is that they are now in prison, and their messages are arbitrated
by Wendy the warden. Alice and Bob want to develop an escape-plan,
but Wendy must not ``see'' anything but harmless communication
between two well-behaved prisoners. (, )
Again Alice wants to submit a message
chosen from
the message-space
to Bob, and
again a secure key-distribution facility makes sure
Bob has an advantage over Wendy when it comes to reconstructing
this message. That is, Bob and Alice know exactly which key
in the key-space
is used (they could have agreed on one before imprisonment),
while Wendy only knows that
must be chosen in one of the
possible ways.
Wendy has a set
, usually disjunct from
, of possible
coverscover that she knows are harmless, e.g. the set
of English greetings. For example, let
How can Alice and Bob exploit this communication system?
A basic idea due to () is that of
a subliminal channel. We can abuse a cover channel
to submit information (it is not supposed or even allowed
to submit) by shifting the interpretation of the signals
sent over the channel. Channels operating under such a
shifted interpretation are called subliminal.
A first approach might be to use
an invertible function
. Then, Alice can
map a message
to a steganogram
, using
.
Since
, Wendy will not find it suspicious,
and since the function is invertible, Bob will be able
to compute
in order to reconstruct the original
message.
In the simplest case this function could be expressed
by a table:

A second approach might be to use a non-invertible function
, to encode a message and a
function
to decode it again (for example
assuming
).
This approach has the advantage that, following Kerckhoffs'
principle,
and
can safely be assumed public knowledge.
At this point, one might see steganography merely as a special
kind of cryptography, where we deal with ordinary cryptograms,
but have to use special representations for them, in particular
ones that will not arouse Wendy's suspicion. This is, of course,
only feasible if we have a precise idea about what will and what
will not be suspicious to Wendy. In other words, we need a model
characterizing
. However such a model will usually only be
available in very restricted cases, for example, when Wendy is
known to be a computer behaving according to a known formal model.
A core problem of steganography is therefore the semantic component
that enters the scene when we try to formalize what it means for
a steganogram to be innocuous, i.e. when we try
to determine
.
For example, steganography systems are often concerned
with the set of all digital images. In this work we will
be concerned with the set of all natural language texts.
Of course, images where random pixels have been inverted in
color or the like give rise to the suspicion that some
unusual digital manipulation has occurred.
A sentence like, Hi Bob! Let's break out tonight!,
is perfect natural language, but it will clearly not be innocuous.
In fact, steganography
systems need to be somewhat more selective about the set
of possible covers, e.g. ``the set of all digital images,
that could have originated from a digital camera'' or
``the set of all natural language texts that could have
appeared in a newspaper''. As a result, a steganography
system dealing with JPEG images needs a model far more
sophisticated than the definition of the JPEG-file-format
and, analogously, it is crucial for natural language
steganography systems to take semantic aspects into account.
A general design principle for steganography, following from
these observations is that we assume that Alice only
uses a subset
of covers.
For example, she could actually take a picture with her digital
camera, or she could cut out an article from today's
newspaper. Then, using the cover
, she performs
some operation
called
embedding, to map a message
to a steganogram
in the set of all possible steganograms
,
using a key
.
This operation is subject to some constraints which make up a
model for perceptual similarity. We assume that there is some
function2.1
which can be used to
determine the perceptual distortion between a cover
and a steganogram
. Wendy will see
as innocuous as long as
, i.e. as long as
and
differ only in some fixed amount of distortion
which cannot be perceived by Wendy.
The design goal by which the embedding function must be defined is
that, given a message
that is to be transmitted using
a key
, Alice can select a
from the set of covers she
actually has available
in such a way that, if
maps to
, there will be a
in the set
of all covers
, which is indistinguishable by Wendy from
,
in terms of the perceptual distance
. Formally,
Of course, there must be a way for Bob to extract the message
again. Most commonly this is done using a function
, the extraction-function.
Some stegosystems need the
original cover available for extraction. This could be
viewed as a special case of the system defined so far
by letting
, i.e. there is a set
,
the random keys are chosen from, and a key from
the actual keyspace of the stegosystem is constructed
by choosing a
, and by choosing a
.2.2 In such a system it is
necessary to view the choice of a cover, as part of the key,
since it will be significantly easier for a warden to
detect hidden information, given the original cover.
Therefore the choice of a cover (or the cover itself) should in
such systems always be transmitted over secure channels.
Where do security systems get their security from? What does it mean for a cryptosystem to be perfectly secure? How can a stegosystem ever be secure in the sense that it is equally difficult to break, than to break a cryptosystem? How can the ``amount of security'' we can expect from a security system be measured, when it is not perfectly secure?
The information-theoretic idea behind a cryptosystem could informally be stated as ``message - key = interceptible datagram''. The information theory behind cryptanalysis, on the other hand is ``intercepted datagram + educated guessing = message''. Whenever it takes less cryptanalytic guessing than it would take to guess the message in the first place, the system is, theoretically2.3 exploitable. Note that the information theoretic point of view depends heavily on probabilistic models being available, characterizing the choice of a message and the choice of a key. We saw in the diary-example why it is reasonable to assume such models for simple cryptosystems.
Figure
shows two cryptosystems. Messages
and a probability-distribution
are given. The
system depends on two keys
chosen with probabilities
. By deterministic processing, based only on the message
and the key, we obtain cryptograms
, with
probabilities
depending only on the
key and the message.
Figure
shows a very weak cryptosystem.
When cryptogram
is intercepted, one can tell that the
message this cryptogram originated from is most likely
rather than
, since the key transforming
into
is more likely to be chosen than the key transforming
into
. The impact of this possible exploit is measured
by () by the key-equivocation2.4
![]() |
(2.2) |
Figure
shows another kind of weakness
a cryptosystem could have. In this system, all keys are
equally probable but the messages are not. If message
is intercepted, there is no way to tell whether
the key generating
from
is more or less
likely than the key generating
from
, but
since
is, per se, more likely than
,
will
possibly be the solution to
this cryptogram. This exploit is quantified by () as
the message-equivocation
![]() |
(2.3) |
Therefore the most desirable cryptosystem is one with keys equally probable and with messages equally probable. () shows, in detail, why perfectly secure cryptography can only be achieved if we allow at least as many keys as there are messages. For our purposes, the intuitive picture shall suffice. When there are more messages than there are keys, it will always be possible, by simply guessing the keys, to determine the message (however, by possibly using vast computational resources). Since guessing the key amounts to less information than guessing the message, this is considered a weakness, from the information theoretic point of view.
What we have considered so far is the upper triangle
(
) of Figure
,
respectively that which is labelled
in Figure
. Each
arc in the relation
in Figure
corresponds to the choice of one of
six equally probable keys. (Keys were not labelled
with their probabilities here for the sake of
clarity). From what was defined so far,
is a perfect cryptosystem, if its input is uniformly
distributed. As a result, its output will be
uniformly distributed as well.
For analyzing the impact of non-uniformly distributed
messages, it might be helpful to view the input of this
cryptosystem as originating from a relation
, which
provides perfect compression. So, given that
is a
perfect cryptosystem,
offers perfect secrecy,
if
offers perfect compression.
Turning back to Figure
, there is one
influence on
we have not yet considered. A secrecy
system that takes into account the influence from
to
, follows the basic idea of mimicry (, ,).
Here
is a set of possible covers, in the sense
of a steganography system, and we are given the
probabilities
for innocuous covers to occur.
If the probabilities of our cryptosystem's output
,
given by
, which depends only on
and
, are different from the probabilities of
innocuous covers
, then a
one-to-one correspondence between cryptograms
and
suspectedly innocuous covers
will clearly be
exploitable, since covers will occur with ``unnatural''
probabilities. This could be quantified by what
one would be tempted to call the
cover-equivocation, although this term
is not commonly used:
For analyzing the impact of the cover-distribution,
it is convenient to view
the output of a perfect cryptosystem (such as
)
as the input to a relation
providing mimicry.
Given that
is a perfect cryptosystem,
will be a perfect stegosystem, if
is the inverse
of perfect compression, i.e. perfect mimicry. As can
be seen in Figure
, mimicry is basically
defined as a relation transforming a small message
space with equally probable messages into a larger message
space with messages distributed according to
cover-characteristics. The exact opposite is
compression, which is supposed to transform
large non-uniformly distributed message spaces
into small ones.
Considering the parts of Figure
,
there is no commonly agreed upon notion of what
deserves to be called steganography.
() emphasizes the importance of what
we have called
as the very core of strong
theoretical steganography, while () considers
in his information theoretic model
for steganography, demonstrating the impact of the
cryptographic aspects of a stegosystem.
Of course, reversing the mimicry on a cover that
has not actually originated from a stegosystem
will produce ``garbage''.
A basic requirement is that it should not be possible
to distinguish this ``garbage'' from what comes out when reversing
the mimicry on a cover that has originated from
a stegosystem.
Recalling the idea behind practical steganographic covers (``images that could have originated from a digital camera'', ``natural language texts that could have appeared as newspaper-articles''), the first problem of the information theoretic approach gets obvious: that of finding a probabilistic model measuring probabilities of such covers. What is the probability of a yellow smiley face on blue background? What is the probability of Steve plays the guitar brilliantly? Theoretically, whenever a steganalyst has such a model, then this model can be used in steganography as well, to construct a stegosystem where probabilities arising from this model are not exploitable. In practice, however, the idea of ``public wisdom'', when it comes to knowledge about steganalytic activities, should be doubted.
The second problem was already mentioned briefly. There is no point in producing digital images, where the statistical distribution of colors of pixels matches that of digital images taken from a digital camera, if the resulting steganogram is not even syntactically correct JPEG, and there is no point in producing character-sequences with characters distributed as in English text, if the characters do not even make up correct words.
The problem goes even beyond purely syntactic issues, into a
semantic realm. A stegosystem that produces covers that are suspicious
under a cover's usual interpretation will clearly be insecure, no matter
how low the relative entropy is.
We can say, relative entropy (equation
, in particular)
is a ``degree of fulfillment'' for equation
from an information theoretic point of view, but it will
be necessary to enforce the fulfillment also from the point of view
of a model that takes into account this ``usual interpretation'' of
a cover.
Such models are available for many kinds of steganography and watermarking systems, since they can usually rely on simple measurements. In image-based steganography, for example, one can compare the deviation in color of a pixel, resulting from the embedding, to the deviation in color that will be perceivable to a human observer.
[p51] Color values can, for instance, be stored according to their Euclidean distance in RGB space:
Since the human visual system is more sensitive to changes in the luminance of a color, another (probably better) approach would be sorting the pallette entries according to their luminance component. [p44]
(, )Here formulae are known that capture human perception from a physiologic point of view, based on simple measurements. Clearly a computer has certain advantages over a human when it comes to measuring whether or not the color of a pixel is 1 degree in 256 more red than blue. Since 2004, the ACM even publishes a periodical called ``ACM Transactions on Applied Perception''.
In linguistic steganography this semantic requirement is probably the most difficult problem that has to be tackled, since we cannot rely on simple measurements.
``A semantic theory must describe the relationship between the words and syntactic structures of natural language and the postulated formalism of concepts and operations on concepts.'' (, )However, there is currently no such formalism that operates on all the concepts understood by humans as the meaning of natural language. If we do not wish to resolve these problems we have to draw back to the pragmatic approach Winograd used, concentrating on a few specific aspects, when we go about postulating such formalisms, yet have to remain aware of the criticism brought forth by () about such approaches:
``Thus, much of the ``I'' in these ``AI'' programs is in the eye - and ``I'' - of the beholder.'' (, )
We saw earlier that breaking a cryptogram should, by definition, amount to solving a hard problem, such as the information-theoretic problem of ``guessing'' a solution, or the problem of finding an efficient algorithm that makes a solution feasible with limited computational resources. The AI-community knows many problems a computer cannot easily solve, therefore posing problems that are not merely difficult to solve within a given formalism, but that are difficult to solve due to the very fact that we do not know any formalism in which they could be solved at all. The value of such problems from a cryptographic point of view has recently been discovered to tell computers and humans apart.
Generally, such a cryptosystem is called Human Interactive Proof, HIP for short (, ,). The most prominent characterization of an HIP is the Completely Automated Public Turing Tests to Tell Computers and Humans Apart, CAPTCHA for short, as described by (). The name refers to Turing's Test (, ), as the basic scenario. Humans and computers are ``sitting in'' black-boxes of which nothing but an interface is known. This interface can equally be used by computers or humans, which makes it difficult to tell computers and humans apart. However, the scenario differs from the original Turing-Test in that it is ``completely automated'', which means that the judges cannot be humans themselves. Therefore the scenario is sometimes referred to as a Reverse Turing Test. The requirement for the test to be ``public'' refers to Kerckhoffs' principle.
The most prominent HIPs are image-based techniques, employed, for example, in the web registration forms of Yahoo!, Hotmail, PayPal, and many others. In order to prevent automated robots from subscribing for free email accounts at Yahoo!, the registration form relies on having the user recognize a text appearing in a heavily distorted image. There is simply no technique known to carry out such advanced optical character recognition, as it would take to automatically recognize the text. However, humans seem to have no problem with this kind of recognition. Since the distortion of these images can be done automatically, such methods can safely regard their image-databases, lexica, and distortion-mechanisms as public knowledge. In the end, security relies on the private randomness used by the distortion-filters, and since the space of possible transformations is large enough, this method can provide solid security.
The problem is closely linked to linguistic steganography. If natural language steganograms could be constructed in such a way that they cannot be analyzed fully automatically, it would make an arbitrator's job much more difficult. A great advantage of linguistic steganography over other forms of steganography arises from the large amounts of data coded in natural language. Arbitrating such large amounts of data is nearly impossible, and even more so if we manage to prevent computers from doing the job. One of the highlights of the method presented herein is a layer of security that arises from such considerations.
The creation of a true CAPTCHA in a text-domain, in the sense of an HIP that does not rely on any private resources however, is still an open problem. It was motivated by () by the need for CAPTCHAs that can be used also by visually impaired people. Human-aided recognition of text in the sense of an HIP had already been under investigation in the context of this project, when Luis von Ahn published the problem-statement in Communications of the ACM in February 2004. () give a partial solution, an HIP which relies on the linguistic problem of lexical word-sense disambiguation. The approach cannot claim to provide a fully ``public'' solution, since it relies on a private repository of linguistic knowledge. However, it has the ability to learn its language, therefore this database can be viewed as a dynamic resource. The assumption that, based on an initial private ``seed'' of linguistic knowledge, this dynamic resource grows faster than that of any enemy is not unreasonable, and therefore the impact of the approach to rely on a private resource is limited. Eliminating the need for such a private database would be desirable, but remains an open problem.
The basic setup that allows distinguishing computers and humans in a lexical domain is a lexicon's inability to truly represent a word's meaning. Linguists have found out that it is hardly possible to ``define'' a word in a lexicon, or in any other formal system, in such a way, that a word's ``meaning'' would not change with the syntactic and semantic context it is used in.
The creators of the most prominent lexical database WordNet, saw
meaning closely related to the linguistic concept of synonymy.
By their definition ``two expressions are synonymous in a
linguistic context
if the substitution of one for the other
in
does not alter the truth value'' (, ).
A linguistic context might for example be a set of sentences.
Observing a set of sentences and their truth values, if we find that
the sentences' truth values never change, when a specific word is
substituted for another, then the two words are synonymous.
Therefore we can never define what it means for a word to be
synonymous to dark. The best we can do is to state that there
exists a linguistic context in which dark can be interchanged
by black or sinister, and there exists a context in which
dark can be interchanged by night or nighttime.
Consider, for example, the sentence She walked home alone in
the dark. A native speaker would probably accept
She walked home alone in the night or
She walked home alone in the nighttime but not
She walked home alone in the black or
She walked home alone in the sinister.
On the other hand, consider the sentence
Don't play with dark powers. Here
Don't play with black powers or
Don't play with sinister powers would
be correct, but
Don't play with night powers or
Don't play with nighttime powers
would not.
Therefore the question in Figure
will be very
difficult to answer for a computer relying on a lexicon while it is
trivial for a human.