CAB340讲解、辅导data留学生、Java编程语言讲解、辅导Python,c++
讲解SPSS|解析Haskell程序
CAB340 Assessment 1 Historical cryptanalysis,
Probability, Information theory
This is an individual assignment. It is acceptable to discuss the general approach with your fellow students and
the unit staff, but your answers should be your own.
Submission
There are 30 marks available in total for this assignment with part marks as shown, plus 5 marks of extra credit.
Your answers should be written in a report and submitted by the due date of Friday 11 September, 2019 at
11:59pm. Submission should be made online via Blackboard under Assessment > Assignment 1 submission.
Your submission must be in PDF format. If you are submitting a scan, make sure that it is easy to read.
1 Cryptanalysis of historical ciphers
This part of the assignment involves practical experiments with historical ciphers. It is allowed and expected
that you will make use of cryptanalytic software, such as the CrypTool1 and CrypTool2 packages, to perform
the experiments.
Although no other tools should be necessary, you are free to use any other software or even to write your
own little programs or shell scripts to solve the tasks at hand. Note that different versions of CrypTool have
slightly different capabilities; and in particular the java-based JCrypTool and browser-based CrypTool-Online
are unfortunately more limited than the Windows-only CrpyTool1 and CrypTool2.
For each experiment, describe clearly what outputs you found and explain why you think these results occurred
using the theory that we have studied in the lectures. Use tables and figures as appropriate.
Obtaining data
You need to obtain your individual folder of 4 ciphertext files. To do this, go to Blackboard and navigate to
Assessment > Assignment 1 and download the ZIP archive Assignment1-ciphertexts.zip (the archive is located
right next to the present PDF). Unzip the archive. It extracts to a folder containing 200 sub-folders of the form
. . .-Student, numbered from 000-Student to 199-Student.
Next, identify the last two digits of your QUT Student ID. If abcdefxy is your 8-digit QUT ID, then xy are
the last two digits. Your individual sub-folder is at your option either the one named 0xy-Student or the
one named 1xy-Student. Be sure to select either one of those two folders only. (You may delete all the other
sub-folders, which are useless to you.) Within your folder, you will find four files, 0.txt, 1.txt, 2.txt,
3.txt. These are the files that you will use for the questions in this task.
Be sure to write your full Student ID and the full name of the actual folder you used on your
report. No mix-and-match: you must use one folder only for the entire task, and that folder must be named
wxy-Student where xy equal the last two digits of your QUT ID and you get to choose w ∈ {0, 1}.
Cryptanalysis tasks
This problem solving tasks concern cryptanalysis of historical ciphers. You are given four ciphertext files. The
plaintexts were all written in English with a similar style of text. Each text is different and each is encrypted
with a different cipher. The four ciphers used, in random order in your set, are:
• random simple substitution cipher
• Vigen`ere cipher
• transposition cipher
• 2 × 2 Hill cipher
All plaintexts are written in English and use upper and lower case letters.
a. (3 marks) For each ciphertext, using either a table or a histogram, present the following characteristics:
• the 10 most common single-character frequencies;
• the 10 most common digram frequencies;
• the autocorrelation (as a function of the shift up to 10).
b. (3 marks) Explain briefly but clearly how each of the three characteristics measured in the preceding
part is expected to look for the ciphertext from each of the four cipher algorithms used.
Use the measured values to argue which ciphertext comes from which of the four ciphers.
c. (4 marks) Explain, with direct reference to the statistics that you found, the procedure you can use to
cryptanalyse each of the 4 ciphers. You are not expected to perform the cryptanalysis in this part — you
need to explain how it can be done in principle for each type of cipher and how the measured statistics
can help.
d. (4 marks) Now do the cryptanalysis for TWO ciphertexts of your choice, amongst the ones which you’ll
have identified as: (1) random simple substitution, (2) Vigen`ere, and (3) transposition — i.e., leave out
the significantly harder (4) Hill cipher as extra credit for the next question.
You can use the automatic tools in CrypTool or other where applicable, or write your own. If you are
using a tool, reference it. If you’re working by hand, you may need to calculate (for your own use) more
than the 10 most common frequencies that you wrote down in Task 1.
No need to decrypt the entire ciphertexts, especially if working by hand, but provide enough decrypted
text to convince us that your cryptanalyses succeeded.
e. Optional: extra credit (5 marks)
If you’re in for a little extra challenge, attempt to cryptanalyse the 2 × 2 Hill cipher as follows. Recall
that the attack on the Hill cipher we’ve seen in class is, at its core, a known-plaintext attack. Since we
don’t know the plaintext, we have to guess it, to make this into a ciphertext-only attack.
i. Determine the non-overlapping digram frequencies. Note that using the n-gram tool in CrypTool
directly will taint the statistics because it counts overlapping bigrams, which can start at every
position. (In the 2-by-2 Hill cipher, the relevant blocks start at every other position.) The nonoverlapping
digram frequencies can be obtained in CrypTool by:
Page 2
• defining the alphabet to include uppercase and lowercase letters with no space;
• using the format text document tool to remove spaces;
• using the format text document tool to split into blocks of size 2;
• using the n-gram analysis tool to count the bigrams.
(On MacOS and Linux, if you have a little bit of experience with basic Unix tools such as sed, grep
and sort, you can probably do it faster with a small shell script or directly on the command line.)
ii. Next, try to find the bigram which maps to ‘th’. This should be one of the most common of your
non-overlapping bigrams. Because spaces in the plaintext are preserved in the ciphertext you can
use the word structure of the original ciphertext to rule out most options. For example, ‘th’ will not
occur on its own as a word and you are likely to see it at the start of some 3-letter words.
iii. Next, try to find at least two more likely candidates for bigram plaintext/ciphertext pairs. Possible
plaintext bigrams you might search for are:
• ‘ti’ - this bigram cannot usually end a word or be a word on its own.
• ‘on’ - this can be a word on its own or part of another word.
• ‘he’ - this is likely to occur at the end of common 3-letter words and could be a word on its own.
iv. Once you have identified likely candidates for two bigrams, perform the known-plaintext attack that
we covered in the lecture, under the assumption that your guesses are correct.
You can get the full “extra credit” marks for this question without finding the plaintext as long as you
complete the method described above and show that at least two different reasonable candidate pairs do
not result in a sensible plaintext.
In any case, this “extra credit” question is time-consuming and entirely optional, so I recommend you
leave it until the end.
Practical hints
• The Hill cipher used here assumes that the character ‘A’ maps to the number 0. This is not the default
in CrypTool1. You can change this by clicking the “Further Hill options” button.
• The transposition cipher tool in both versions of CrypTool allow you to set various things to rows or
columns. The encryption was done reading in by rows, permuting by columns, and reading out by rows.
This is equivalent to the method used in class.
• CrypTool can be installed without special permissions, so you should be able to install it on lab computers
if needed.
• The java-based multi-platform JCrypTool and the platform-independent CrypTool-Online do not provide
all the tools or functionalities you’d want to use to complete all the tasks. If you use those versions,
be prepared to spend more time working out certain transformations by hand, or writing some basic
processing functions in your favourite programming or scripting language.
2 Probability
Imagine that you are playing the following game, involving three closed doors and a prize behind one of them.
You may have heard of this game before, perhaps presented to you in the form of a paradox, as some aspects
Page 3
of its strategies can be counter-intuitive. In this question, we’ll use probability theory to get to the bottom of
it and shine some bright light on any paradox.
In front of you are three closed doors, numbered 1, 2, 3. Behind one of the doors, call it D ∈ {1, 2, 3}, is a shiny
diamond (as you know: a highly valued form of pure carbon). Behind the two other doors are small lumps of
coal (also pure carbon, but considered much less valuable). You get to select one door, let’s call it S ∈ {1, 2, 3}.
You get to keep what is behind the door you open.
Clearly, the outcome you seek is S = D. You have control over S, but have no idea whatsoever how D was
selected. D could be uniformly random; D could be the constant 2; or D could be equivalent to the room
temperature modulo 3, you just don’t know. The only thing that you know is that D does not change once the
game has started, and in particular D is not affected by your choice of S.
a. (3 marks) Is there a strategy you can employ to choose S, so that, no matter how D was chosen, you
have a well defined prior probability of getting the diamond? (Hint: always picking the same door is not
the answer, and what you’re thinking next probably is.)
Justify your strategy as follows.
First write down the 3 × 3 joint probability table for Pr(D, S). Be sure to use placeholder mathematical
symbol(s) for any numerical probability(ies) which you don’t know — e.g., don’t make unstated assumptions
such as taking the prior distribution of D to be
); write it as (p1, p2, p3) with pi ≥ 0 and
p1 + p2 + p3 = 1 but otherwise unknown.
Then, using the table, determine as precisely as you can Pr(D = S), i.e., the marginal or prior probability
of the event of interest “D = S”.
Does your strategy induce an exact numerical probability for this event, even if there were numerical
unknowns in the joint table?
After you announce your choice of S, but before you can open that door, you are shown one of the other two
doors: call it B where B 6= S. You are told, truthfully, that door B is bad (B 6= D, i.e., it has coal behind it).
You are then given an opportunity to switch your selection from S to to the third door: call it T, where T 6= R
and T 6= B. Should you accept the offer to switch? In other words, should you open S or T? Let’s find out.
b. (3 marks) Assume for now that this “unexpected twist” was actually totally expected: you knew in
advance, maybe from the rules of the game, that — regardless of D and for any choice of S you make —
you would be shown a bad door B (B 6= S and B 6= D) and given the option to switch.
Under this rule of the game, write down the joint distribution Pr(D, S, B). Since this is a 3-dimensional
table, for clarity make a 2-dimensional table with axes D and S as you did above, and in each cell consider
all the values of B. Use as many additional undetermined placeholder symbols as you need to denote new
unknown probabilities. (Hint: you don’t know how D and B are chosen, but there will be constraints,
e.g., on what combinations of D, S, B can have non-zero probability, etc.)
Then, using the table, condition on the event that B = b for some specific value of b ∈ {1, 2, 3}: how do
the posterior probabilities of the respective events (D = S|B = b) and (D = T|B = b) compare?
Conclude whether or not you should be accepting the offer to switch.
(Hint: there are several ways to attack this problem; all will lead to the same conclusion, but some
more easily than others. Here I suggest that you investigate the effects of conditioning on B = b only.
“Without loss of generality”, you can even condition on B = b for b = 1. Now, you might think you should
be conditioning not just on B = b but on (S = s∧B = b), since by the time you’re told that B = b you have
Page 4
already announced S = s, but if you do that you’ll need to do some contortions to reach the conclusion
you want. Intuitively, you want to condition on B to see how learning B affects the probabilities that
D = S and D = T. However you don’t need or want to condition on S, as doing do might just unravel
the very benefit of the strategy for S which you studied in part (a) of this problem. One of the great
powers of rigorous probability theory is that you can make sound inferences while conditioning or not on
anything you want, but it takes a bit of flair (or trials) to figure out what inferences you want to make...)
c. (2 marks) Assume, contrarily to part (b), that the game host is adversarial and that any information
they tell you, while true, is trying to “manipulate” you in the worst possible way. Specifically, now the
host has discretion to tell you about B and offer you to switch — or not — based, e.g., on whether you
picked the good door or not.
With those slightly different rules, should you switch — as in: “definitely yes”, “neither here nor there”,
or “absolutely not”?
Briefly justify your answer in any way you see fit.
3 Information theory
AES128 is a symmetric cipher which uses a 128-bit key, which Kerchhoff’s principle dictates should be keyed
as uniformly at random as possible.
Due to a design flaw bug, the hardware random number generator we use turns out to have a (not so subtle)
bias. Specifically, when we use it to generate an AES128 key K, with probability 1
4
the generator gets stuck
and output a key made of 128 zero bits, 000 . . . 02. When the bug is not triggered, the generator functions
normally and outputs a uniformly distributed 128-bit random key (including possibly the all-zero key as one of
its random outputs).
a. (2 marks) Calculate the entropy of (the distribution of) the key K as generated by this generator.
b. (2 marks) Now, let F ∈ {0, 1} be a binary random variable such that F = 1 when the generator’s flaw
is triggered, and F = 0 when the generator is working properly.
Calculate the conditional entropy H(F|K) of F given K, and in one short sentence interpret the result.
c. (2 marks) Is this key generator secure (in the natural sense that: will an encryption system built that
way be “almost always” hard to break)? Briefly argue why or why not.
d. (1 mark) From the above, would you say that generating keys of sufficiently high entropy is: necessary;
sufficient; both; or neither ; for the cryptographic security of a system based on them?
e. (1 mark) Can you think of a possibly better metric than entropy to capture the security of a non-uniform
key distribution, but ideally with the same additive properties as entropy when combining independent
variables? (Hint: recall that the entropy is defined as a weighted average of something, but perhaps
averages are not the most suitable for this purpose?)
Page 5