Next: 1. Introduction Up: Towards Linguistic Steganography: A Previous: List of Figures   Contents

Dear Diary,

Jan-07: Eve's Diary

Dear Diary, What am I doing here? Why exactly do they allow people to encrypt messages in the first place, if they employ people like myself, who have nothing better to do than to break these codes again?

Today they sent me another ciphertext to break by email, but since I don't trust this modern stuff, I quickly scribbled down the text on to a piece of paper (Figure [*]).

I soon realized that breaking this code could not be much of a challenge. The alphabet was pretty limited. Only the 26 English alphabetic characters occurred. Whoever had used this code was naive enough to provide me with a nicely laid out word-length encipherment, so that blanks appeared in the enciphered message as cleartext, giving me valuable information about where single words began and ended.

The first thing I usually do in such cases is to check the unilateral frequency distribution. So I wrote down the characters and counted how often each of them appeared in the ciphertext. I also wrote it down as a percentage (Figure [*]).
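The counting Eve describes can be sketched in a few lines of Python. This is only an illustration: the function name `unilateral_frequencies` and the sample string are assumptions made for this sketch, and the sample is a made-up fragment, not the ciphertext from the figure.

```python
from collections import Counter

def unilateral_frequencies(text):
    """Return {letter: (count, percent)} for the letters A-Z found in text."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    # most_common() orders the letters from most to least frequent
    return {ch: (n, 100.0 * n / total) for ch, n in counts.most_common()}

sample = "TB XP JGFX"  # hypothetical fragment, blanks left in cleartext
for letter, (count, percent) in unilateral_frequencies(sample).items():
    print(f"{letter}: {count:2d}  {percent:5.1f}%")
```

Blanks and punctuation are skipped, so the percentages refer to letters only.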

Figure: Unilateral frequency distribution for the ciphertext.

The characteristics I found confirmed my suspicion that there was a natural language directly underlying the character-sequence, because some characters occurred much more frequently than others. One classic method for encryption is substitution, which is simply to replace letters in the cleartext by other letters, according to some systematic scheme, or to a table (the code), to make it unreadable. Another method is transposition, which is to change the order in which the letters appear in the cleartext.

Figure: The ciphertext that is to be broken.

I could quickly disregard the possibility of a transposition system, since the letter E did not appear. The letter E is, in many languages, used quite frequently. A transposition only reorders the letters of the cleartext, so if this ciphertext had been produced by a transposition system, the author would have had to avoid using this letter altogether. This is why it was far more reasonable to assume it had originated from some form of substitution.
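Eve's argument rests on the fact that a transposition leaves the letter counts untouched. A minimal sketch, assuming a simple columnar transposition as the example system (the diary does not name a particular one, and `columnar_transpose` is a hypothetical helper):

```python
from collections import Counter

def columnar_transpose(text, columns):
    """Write the text row by row into `columns` columns, read it out column by column."""
    return "".join(text[start::columns] for start in range(columns))

plain = "WEKNOWTHEREAREKNOWNKNOWNS"
scrambled = columnar_transpose(plain, 4)

# The order of the letters changes, their counts do not.
print(scrambled)
print(Counter(plain) == Counter(scrambled))
```

A cleartext containing E would therefore still contain E after any transposition.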

The letter G could, for example, have originated from substitution for E, since it is most frequently used in the ciphertext. Yet one has to remain cautious, since the text is very short, and the statistics are therefore rather insignificant. In the army they had taught me that it would suffice to remember SENIORITA, a word containing the most frequently used letters in English.

I decided to assume a monoalphabetic substitution first. In this cipher, each letter of the cleartext is always replaced by the same ciphertext letter, but how could I crack that code? What substitution had produced this ciphertext? What did this table look like? After having a closer look at the frequencies, I recognized a pattern that I had seen before.

I took today's issue of the Washington Post that was still lying around on my office desk and counted some letters in English plaintext, and there it was: the pattern I had been looking for.

Figure: Unilateral frequency distribution of English plaintext.

The frequencies for the letters E, F, G, and H in English plaintext were 16%, 3%, 1%, and 5%. The frequencies for the letters P, O, N, and M (note the reverse alphabetical order) in the cipher I wanted to break were 14%, 0%, 1%, and 5%. I admit, there could have been better evidence, but this similarity was all I had.

I wrote down an alphabet, associated E with P, F with O, G with N, and H with M. Thereby, I had in fact broken four codewords. I simply continued the alphabet in the same direction, writing down letters and their substitution in a table that I thought could have been the code (Figure [*]). I used that table to replace characters in the ciphertext and realized that it made perfect sense (Figure [*]). Well, it made some sense at least, but it was undoubtedly English plaintext.
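Continuing the alphabet "in the same direction" from the four matched pairs gives a reversed cipher alphabet. The sketch below rebuilds such a table from the single pair E/P. The function names are illustrative assumptions, and the short test phrase is taken from the cleartext rather than from the ciphertext figure.

```python
import string

ALPHABET = string.ascii_uppercase

def reversed_alphabet_table(plain_anchor="E", cipher_anchor="P"):
    """Map each cleartext letter to a cipher letter, with the cipher
    alphabet running backwards from the anchored pair."""
    total = ALPHABET.index(plain_anchor) + ALPHABET.index(cipher_anchor)
    return {p: ALPHABET[(total - i) % 26] for i, p in enumerate(ALPHABET)}

def substitute(text, table):
    """Replace every letter via the table; blanks pass through unchanged."""
    return "".join(table.get(ch, ch) for ch in text.upper())

table = reversed_alphabet_table()
print([table[ch] for ch in "EFGH"])   # ['P', 'O', 'N', 'M']

cipher = substitute("AS WE KNOW", table)
print(cipher)                         # TB XP JGFX
print(substitute(cipher, table))      # AS WE KNOW
```

Because the table is its own inverse, the same table serves for enciphering and deciphering.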

Figure: Two similar patterns.

Figure: The cleartext.
Donald H. Rumsfeld
Feb. 12, 2002, Department of Defense news briefing

Jan-11: Alice's Diary

Dear Diary, Today, another one of my messages to Bob was intercepted and decoded. The EESA (Eve's Evil Security Agency) called me up this morning and warned me that I would get in serious trouble if I didn't stop wasting their time by sending around encrypted Rumsfeld quotes. Apart from the question of whether or not Rumsfeld quotes constitute a serious threat to national security, I asked myself how I could overcome this weakness in my cipher.

Figure: A code for a homophonic cipher.

                                                    934
                                                    863
                                                    822
                617                                 348
                217                                 435
                978                                 769
                132                                 195 239
                242                                 368 773                             437
                406                                 896 301                             259
                276                                 279 790                             991
                311                                 122 110                             475
                148                                 405 802                             154
                238                                 076 210                             571
                362                                 581 517                             744
                364                                 843 626                 537         443
                092                     145         740 928                 341         833
                913                     780         119 910                 086         187
                485                     444         569 897             776 861         530
                591                     363         173 003             212 550         915
034             662                     588         963 941         261 178 890         169
121             722         630         243         719 093         801 245 430         126
369             199         179         474         346 635         168 163 075         803
857             248         417         919         968 104         837 912 929         712
511             095         370         411         618 125         300 693 796         050
533             755         355         705         359 760         384 083 634         628
241             315         167         479         920 783         531 449 674 636     373
082         166 345         298 720     158         052 436         313 434 738 812     033
458         478 921     196 360 408     989 621     974 800         289 516 170 513     365
469 251     037 937     302 551 186     498 642 942 016 514         772 156 204 975     647     529

 A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z

Somehow, I had to get rid of the statistical "fingerprints" my substitution cipher left in the frequency distribution. Then I had a simple idea, depicted in Figure [*]. Once again, I took my favourite piece of plaintext (Figure [*]). Instead of counting the number of times each of the letters occurred, making a mark for each occurrence, I would allocate a unique number each time a letter occurred.
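Alice's allocation can be sketched as follows. The diary does not say how she picked her numbers, so drawing distinct three-digit codewords at random is an assumption of this sketch, as is the name `build_codebook`.

```python
import random
from collections import defaultdict

def build_codebook(reference_text, rng=None):
    """Allocate one unique three-digit codeword per letter occurrence."""
    rng = rng or random.Random()
    letters = [ch for ch in reference_text.upper() if "A" <= ch <= "Z"]
    # sample() draws without replacement, so no codeword is issued twice
    numbers = rng.sample(range(1000), len(letters))
    codebook = defaultdict(list)
    for letter, number in zip(letters, numbers):
        codebook[letter].append(f"{number:03d}")
    return dict(codebook)

codebook = build_codebook("AS WE KNOW", random.Random(0))
for letter in sorted(codebook):
    print(letter, codebook[letter])
```

A letter that occurs often in the reference text ends up with proportionally many codewords, which is what flattens the frequency distribution later on.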

Then I submitted Figure [*] to Bob over a trusted channel. The good thing was that, using this code, I could send messages to Bob, without having to repeat myself. For example, if I wanted to send the message

    HELLO

I could simply look up one of the codewords I had allocated to H, e.g. 551, then one I had allocated to E, e.g. 617, then 642 for L. Then the letter L would appear again. The whole idea of this so-called homophonic cipher is that it is better, in this situation, not to use the codeword 642 again, since this would leave an observable pattern in the frequency distribution. But, since I had allocated several codewords, all of which decode to L, I could simply use a different one, namely 621, so the second occurrence of L would be encoded differently, and finally the O could be encoded to 514. What Eve and Bob would be left with is the ciphertext,

    551 617 642 621 514

since I didn't have a suitable alphabet for encoding numbers that large. (Maybe I could have used traditional Mandarin signs.)
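The encoding step can be sketched like this. The codebook excerpt holds only the five codewords named in the example above. Taking the first unused codeword for each letter, rather than a random one, is a simplification made here so that the output is reproducible.

```python
def encode(message, codebook):
    """Encode each letter with a codeword that has not been used yet."""
    unused = {letter: list(words) for letter, words in codebook.items()}
    out = []
    for ch in message.upper():
        if not ("A" <= ch <= "Z"):
            continue  # skip blanks and punctuation
        if not unused.get(ch):
            raise ValueError(f"no unused codeword left for {ch!r}")
        out.append(unused[ch].pop(0))
    return " ".join(out)

# Excerpt of the code table: only the entries used in the example
excerpt = {"H": ["551"], "E": ["617"], "L": ["642", "621"], "O": ["514"]}
print(encode("HELLO", excerpt))  # 551 617 642 621 514
```

Once the codewords for a letter run out, the sketch raises an error instead of reusing one, since reuse is exactly what would bring back an observable pattern.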

By allowing this kind of ambiguity, I had actually added random information to the cryptogram, since each choice among the codewords for a letter was random. If Bob wanted, he could theoretically have tried to interpret these choices in some way. Practically, there was, by definition, no interpretation, so Bob could safely disregard this information as useless noise, while, for Eve, this information would seriously add to the complexity of the decoding process.
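On Bob's side the ambiguity disappears, because every codeword belongs to exactly one letter. A minimal sketch of the decoding, using the same hypothetical codebook excerpt as an assumption:

```python
def invert(codebook):
    """Turn {letter: [codewords]} into {codeword: letter}."""
    return {word: letter for letter, words in codebook.items() for word in words}

def decode(ciphertext, codebook):
    """Look up each codeword; which homophone was chosen does not matter."""
    lookup = invert(codebook)
    return "".join(lookup[word] for word in ciphertext.split())

excerpt = {"H": ["551"], "E": ["617"], "L": ["642", "621"], "O": ["514"]}
print(decode("551 617 642 621 514", excerpt))  # HELLO
print(decode("551 617 621 642 514", excerpt))  # HELLO
```

Both ciphertexts decode to the same message, which is the sense in which the extra random choice is noise for Bob.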

Figure: The same cleartext, encoded with the homophonic code.

   A    S    W    E    K    N    O    W    T    H    E    R    E    A    R
 469  156  647  937  498  016  514  365  204  551  921  772  345  458  289

   E    K    N    O    W    N    K    N    O    W    N    S    T    H    E
 315  989  974  800  033  052  158  920  436  373  359  516  170  360  755

   R    E    A    R    E    T    H    I    N    G    S    W    E    K    N
 313  095  082  531  248  738  298  186  618  302  434  628  199  479  968

   O    W    W    E    K    N    O    W    W    E    A    L    S    O    K
 783  050  712  722  705  346  760  803  126  662  241  642  449  125  411

   N    O    W    T    H    E    R    E    A    R    E    K    N    O    W
 719  104  169  674  167  591  384  485  533  300  913  919  963  635  915

   N    U    N    K    N    O    W    N    S    T    H    A    T    I    S
 173  975  569  474  119  093  530  740  083  634  355  511  796  408  693

   T    O    S    A    Y    W    E    K    N    O    W    T    H    E    R
 929  941  912  857  529  187  092  243  843  003  833  075  370  364  837

   E    A    R    E    S    O    M    E    T    H    I    N    G    S    W
 362  369  168  238  163  897  942  148  430  417  720  581  196  245  443

   E    D    O    N    O    T    K    N    O    W    B    U    T    T    H
 311  037  910  076  928  890  588  405  626  744  251  513  550  861  179

   E    R    E    A    R    E    A    L    S    O    U    N    K    N    O
 276  801  406  121  261  242  034  621  178  517  812  122  363  279  210

   W    N    U    N    K    N    O    W    N    S    T    H    E    O    N
 571  896  636  368  444  195  802  154  769  212  086  630  132  110  435

   E    S    W    E    D    O    N    T    K    N    O    W    W    E    D
 978  776  475  217  478  790  348  341  780  822  301  991  259  617  166

   O    N    T    K    N    O    W
 773  863  537  145  934  239  437

Then I did the same with the longer text (Figure [*]). Since I had used this text to determine how many codewords I would need for each letter, I didn't have to use any codeword twice. The ciphertext was therefore simply a sequence of codewords with a completely uniform distribution, i.e. with every codeword appearing exactly once. Perfect security!
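The uniformity Alice is pleased about is easy to check. In this sketch the helper name `is_flat` is an assumption, and the sample is the first eight codewords of the encoded text.

```python
from collections import Counter

def is_flat(ciphertext):
    """True if every codeword in the ciphertext occurs exactly once."""
    counts = Counter(ciphertext.split())
    return all(n == 1 for n in counts.values())

print(is_flat("469 156 647 937 498 016 514 365"))  # True
print(is_flat("551 617 642 642 514"))              # False, 642 is reused
```

A frequency count over such a ciphertext gives the same value, one, for every symbol, so the kind of analysis Eve used on the substitution cipher has nothing to work with.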

Jan-13: Eve's Diary

Dear Diary, Today they sent me another code to break. Once again Alice had sent a message to Bob, not wanting the EESA to know what was inside that message. Apparently, Alice has something to hide, and Bob is somehow involved.

Figure: The pure ciphertext.

469 156  647 937  498 016  514 365  204 551  921 772  345 458  289 315
989 974  800 033  052 158  920 436  373 359  516 170  360 755  313 095
082 531  248 738  298 186  618 302  434 628  199 479  968 783  050 712
722 705  346 760  803 126  662 241  642 449  125 411  719 104  169 674
167 591  384 485  533 300  913 919  963 635  915 173  975 569  474 119
093 530  740 083  634 355  511 796  408 693  929 941  912 857  529 187
092 243  843 003  833 075  370 364  837 362  369 168  238 163  897 942
148 430  417 720  581 196  245 443  311 037  910 076  928 890  588 405
626 744  251 513  550 861  179 276  801 406  121 261  242 034  621 178
517 812  122 363  279 210  571 896  636 368  444 195  802 154  769 212
086 630  132 110  435 978  776 475  217 478  790 348  341 780  822 301
991 259  617 166  773 863  537 145  934 239  437

I had a look at the ciphertext she had sent this time (Figure [*]) and quickly recognized that trying to break that code would be pointless. How can I find patterns in a sequence of numbers, each of which occurs exactly once? Of course, I could always imagine patterns in there. The point is that, whatever interpretation I came up with, there would be just as many possible interpretations as there are plaintext messages this ciphertext could have originated from, so I might just as well guess the contents of the message.

I simply reported to my officer that there was no way I could ever break that code.

Jan-13: Alice's Diary

Dear Diary, I am currently in prison. How did I get here? I don't know. "You asked for it" was their comment when they dragged me into a police car and brought me here. And guess who was already waiting there when I arrived in prison? Bob. "You and Alice, you two, apparently have something to hide, and if you don't want us to know, then it must be something evil!" was what they had told him. That's what I call irony. If we had, in fact, hidden anything, we would obviously never have landed in prison, would we? ...but this gave me an idea.


Richard Bergmair 2005-01-31