Dear Diary, What am I doing here? Why exactly do they allow people to encrypt messages in the first place, if they employ people like myself, who have nothing better to do than to break these codes again?
Today they emailed me another ciphertext to break, but
since I don't trust this modern stuff, I quickly scribbled
the text down on a piece of paper (Figure).
I soon realized that breaking this code could not be much of a challenge. The alphabet was pretty limited. Only the 26 English alphabetic characters occurred. Whoever had used this code was naive enough to provide me with a nicely laid out word-length encipherment, so that blanks appeared in the enciphered message as cleartext, giving me valuable information about where single words began and ended.
The first thing I usually do in such cases is to check
the unilateral frequency distribution. So I wrote down
the characters and counted how often each of them
appeared in the ciphertext. I also wrote it down as a percentage
(Figure).
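The counting step can be sketched in a few lines of Python. This is only a minimal illustration: the function name `unilateral_frequencies` and the sample string are assumptions made for the example, and the sample is not the ciphertext from the figure.

```python
from collections import Counter
import string


def unilateral_frequencies(text):
    """Return {letter: (count, percentage)}, most frequent letter first.

    Blanks and punctuation are ignored, so only the 26 letters are counted.
    """
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    total = len(letters)
    return {ch: (n, 100.0 * n / total) for ch, n in Counter(letters).most_common()}


# Hypothetical sample text, standing in for the scribbled ciphertext.
sample = "GSV JFRXP YILDM ULC"
for letter, (count, percent) in unilateral_frequencies(sample).items():
    print(f"{letter}: {count:2d}  {percent:5.1f}%")
```

On a text as short as the sample, the percentages are rough at best.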
The characteristics I found confirmed my suspicion that there was a natural language directly underlying the character-sequence, because some characters occurred much more frequently than others. One classic method for encryption is substitution, which is simply to replace letters in the cleartext by other letters, according to some systematic scheme, or to a table (the code), to make it unreadable. Another method is transposition, which is to change the order in which the letters appear in the cleartext.
I could quickly disregard the possibility of a transposition system, since the letter E did not appear. The letter E is, in many languages, used quite frequently. A transposition only rearranges the letters of the cleartext, so if this ciphertext had been produced by a transposition system, the author would have had to avoid using this letter altogether. This is why it was far more reasonable to assume it had originated from some form of substitution.
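The argument rests on the fact that a transposition leaves the letter counts untouched. A small sketch makes this visible, assuming a simple columnar transposition as the example system. The text does not say which transposition was considered, so the function `columnar_transposition` and the sample message are purely illustrative.

```python
from collections import Counter


def columnar_transposition(plaintext, columns):
    """Write the text row by row into `columns` columns, then read column by column."""
    return "".join(plaintext[start::columns] for start in range(columns))


# Hypothetical message, chosen only because it contains several E's.
message = "MEETMEATTHEBRIDGE"
scrambled = columnar_transposition(message, 4)

# The order changes, but every letter occurs exactly as often as before.
assert Counter(scrambled) == Counter(message)
print(scrambled)
```

A ciphertext without any E would therefore require a cleartext without any E, which is unlikely for English.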
The letter G could, for example, have originated from substitution for E, since it is most frequently used in the ciphertext. Yet one has to remain cautious, since the text is very short, and the statistics are therefore rather insignificant. In the army they had taught me that it would suffice to remember SENIORITA, a word containing the most frequently used letters in English.
I decided to assume a monoalphabetic substitution first. In this cipher only one letter is substituted at a time, but how could I crack that code? What substitution had produced this ciphertext? What did this table look like? After having a closer look at the frequencies, I recognized a pattern that I had seen before.
I took today's issue of the Washington Post that was still lying around on my office desk and counted some letters in English plaintext, and there it was: the pattern I had been looking for.
The frequencies for the letters E, F, G, and H in English plaintext were 16%, 3%, 1%, and 5%. The frequencies for the letters P, O, N, and M (note the reverse alphabetic order) in the cipher I wanted to break were 14%, 0%, 1%, and 5%. I admit, there could have been better evidence, but this similarity was all I had.
I wrote down an alphabet, associated E with P,
F with O, G with N, and
H with M. Thereby, I had in fact broken four
codewords. I simply continued the alphabet in the same direction,
writing down letters and their substitution in a table
that I thought could have been the code (Figure).
I used that table to replace characters in the ciphertext
and realized that it made perfect sense (Figure).
Well, it made some sense at least, but it was undoubtedly
English plaintext.
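The table can be rebuilt mechanically from the four broken codewords. The sketch below assumes that "continuing in the same direction" means the plaintext alphabet runs forward while the cipher alphabet runs backward, wrapping around at the ends. The names `build_reversed_table` and `decipher` and the short ciphertext `MPIIF` are hypothetical and do not come from the figures.

```python
import string

ALPHABET = string.ascii_uppercase


def build_reversed_table(plain_anchor="E", cipher_anchor="P"):
    """Map each plaintext letter to its cipher letter.

    Starting from one known pair, the plaintext alphabet advances while the
    cipher alphabet steps backward, so E->P, F->O, G->N, H->M, and so on.
    """
    offset = ALPHABET.index(plain_anchor) + ALPHABET.index(cipher_anchor)
    return {p: ALPHABET[(offset - i) % 26] for i, p in enumerate(ALPHABET)}


def decipher(ciphertext, table):
    """Replace every cipher letter by its plaintext letter; blanks stay as they are."""
    inverse = {c: p for p, c in table.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext.upper())


table = build_reversed_table()
print(decipher("MPIIF", table))
```

Any one of the four known pairs can serve as the anchor, because all four lie on the same reversed alphabet.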
Dear Diary, Today, another one of my messages to Bob was intercepted and decoded. The EESA (Eve's Evil Security Agency) called me up this morning and warned me that I would get in serious trouble if I didn't stop wasting their time by sending around encrypted Rumsfeld quotes. Apart from the question of whether or not Rumsfeld quotes constitute a serious threat to national security, I asked myself how I could overcome this weakness in my cipher.
Somehow, I had to get rid of the statistical "fingerprints"
my substitution cipher left in the frequency distribution.
Then I had a simple idea, depicted in Figure
.
Once again, I took my favourite piece of plaintext
(Figure). Instead of counting the number of times
each of the letters occurred, making a mark for each occurrence,
I would allocate a unique number each time a letter occurred.
Then I submitted Figure
to Bob over a trusted
channel. The good thing was that, using this code, I could
send messages to Bob, without having to repeat myself. For
example, if I wanted to send the message
HELLO
I could simply look up one of the codewords I had allocated to H, e.g. 551, then one I had allocated to E, e.g. 617, then 642 for L. Then the letter L would appear again. The whole idea of this so-called homophonic cipher is that it is better, in this situation, not to use the codeword 642 again, since this would leave an observable pattern in the frequency distribution. But, since I had allocated several codewords, all of which decode to L, I could simply use a different one, namely 621, so the second occurrence of L would be encoded differently, and finally the O could be encoded as 514. What Eve and Bob would be left with is the ciphertext,
551 617 642 621 514
since I didn't have a suitable alphabet for encoding numbers that large. (Maybe I could have used traditional Chinese characters.)
By allowing this kind of ambiguity, I had actually added random information to the cryptogram, since I could make a random choice and, if Bob wanted, he could theoretically have interpreted it in some way. Practically, there was, by definition, no interpretation, so Bob could safely disregard this information as useless noise, while, for Eve, this information would seriously add to the complexity of the decoding process.
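The random choice among codewords can be sketched as follows. The table `HOMOPHONES` contains only the five codewords quoted in the text for H, E, L, and O; the complete table from the figure is not reproduced here. The function names, and the rule of falling back to reuse once a letter's codewords run out, are assumptions made for this illustration.

```python
import random

# Only the codewords quoted in the text; the full table is not available here.
HOMOPHONES = {"H": [551], "E": [617], "L": [642, 621], "O": [514]}


def encode(message, homophones, rng):
    """Pick a random unused codeword for each letter of the message.

    Assumption: once all codewords of a letter are used up, reuse is allowed.
    Letters missing from the table raise a KeyError.
    """
    unused = {letter: list(codes) for letter, codes in homophones.items()}
    ciphertext = []
    for letter in message:
        pool = unused[letter] or homophones[letter]
        code = rng.choice(pool)
        if code in unused[letter]:
            unused[letter].remove(code)
        ciphertext.append(code)
    return ciphertext


def decode(ciphertext, homophones):
    """Bob's side: every codeword decodes to exactly one letter."""
    lookup = {code: letter for letter, codes in homophones.items() for code in codes}
    return "".join(lookup[code] for code in ciphertext)


ciphertext = encode("HELLO", HOMOPHONES, random.Random())
print(ciphertext, "->", decode(ciphertext, HOMOPHONES))
```

Whichever of 642 and 621 is drawn first, decoding gives the same result, which is why Bob can ignore the choice as noise.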
Then I did the same with the longer text (Figure).
Since I had used this text to determine how many codewords I would
need for each letter, I didn't have to use any codeword twice. The
ciphertext was therefore simply a sequence of codewords with a completely
uniform distribution, i.e. with every codeword
appearing exactly once. Perfect security!
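Why every codeword appears exactly once can be checked with a short sketch. It assumes that codewords are consecutive numbers handed out in shuffled order, one per letter occurrence in the reference text. The text does not say how the numbers were chosen, and the reference text, starting number, and function names below are hypothetical.

```python
import random
from collections import Counter


def allocate_codewords(reference_text, rng, first_code=500):
    """Give every letter occurrence in the reference text its own unique number."""
    letters = [ch for ch in reference_text.upper() if ch.isalpha()]
    codes = list(range(first_code, first_code + len(letters)))
    rng.shuffle(codes)  # assumption: numbers are handed out in random order
    table = {}
    for letter, code in zip(letters, codes):
        table.setdefault(letter, []).append(code)
    return table


def encode_once(message, table):
    """Encode a message, consuming each codeword at most once.

    Raises KeyError or IndexError if the message needs more codewords
    for some letter than the table provides.
    """
    remaining = {letter: list(codes) for letter, codes in table.items()}
    return [remaining[ch].pop() for ch in message.upper() if ch.isalpha()]


# Hypothetical stand-in for the longer text in the figure.
reference = "A LONGER PIECE OF PLAINTEXT"
table = allocate_codewords(reference, random.Random())
ciphertext = encode_once(reference, table)

# Every codeword occurs exactly once: the frequency distribution is flat.
assert set(Counter(ciphertext).values()) == {1}
print(ciphertext)
```

The flat distribution holds only because the encoded message is the same text the table was built from; any other message can use at most as many occurrences of a letter as the reference text contained.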
Dear Diary, Today they sent me another code to break. Once again Alice had sent a message to Bob, not wanting the EESA to know what was inside that message. Apparently, Alice has something to hide, and Bob is somehow involved.
I had a look at the ciphertext she had sent this time
(Figure) and quickly recognized that trying
to break that code would be pointless. How can I find patterns in
a sequence of numbers, each of which occurs exactly once? Of course
I could always imagine some pattern in there. The point is, no matter
what interpretation I would come up with, I could make up just as
many interpretations in my mind as there are plaintext messages
this ciphertext could have originated from, so I might just as well
guess the contents of the message.
I simply reported to my officer that there was no way I could ever break that code.
Dear Diary. I am currently in prison. How did I get here? I don't know. "You asked for it", was their comment when they dragged me into a police car and brought me here. And guess who was already waiting there when I arrived in prison? Bob. "You and Alice, you two, apparently have something to hide, and if you don't want us to know, then it must be something evil!", was what they had told him. That's what I call irony. If we had, in fact, hidden anything, we would obviously never have landed in prison, would we? ...but this gave me an idea.