Coursework Assessment Pro-forma
Module Code: CMT309
Module Title: Computational Data Science
Assessment Title: CMT309 Programming Exercises
Assessment Number: 3
Date set: 06-03-2020
Submission date and time: 08-05-2020 at 9:30 am
Return date:
This assignment is worth 40% of the total marks available for this module. If coursework is
submitted late (and where there are no extenuating circumstances):
1 - If the assessment is submitted no later than 24 hours after the deadline, the
mark for the assessment will be capped at the minimum pass mark;
2 - If the assessment is submitted more than 24 hours after the deadline, a mark
of 0 will be given for the assessment.
Your submission must include the official Coursework Submission Cover sheet, which can
be found here:
https://docs.cs.cf.ac.uk/downloads/coursework/Coversheet.pdf
Submission Instructions
Your coursework should be submitted via Learning Central by the above deadline. You have
to upload the following files:
Description                  Type                               Name
Cover sheet                  Compulsory, one PDF (.pdf) file    Student_number.pdf
Your solution to question 1  Compulsory, one Python (.py) file  Q1.py
Your solution to question 2  Compulsory, one Python (.py) file  Q2.py
Your solution to question 3  Compulsory, one Word (.docx) file  Q3.docx
For the filename of the Cover Sheet, replace ‘Student_number’ by your student number, e.g.
“C1234567890.pdf”. Make sure to include your student number as a comment in all of the
Python files! Any deviation from the submission instructions (including the number and types
of files submitted) may result in a reduction of marks for the assessment or question part.
You can submit multiple times on Learning Central. ONLY files contained in the last attempt
will be marked, so make sure that you upload all files in the last attempt.
Assignment
Start by downloading the following files from Learning Central:
• Q1.py
• acronym_example1.txt, acronym_example2.txt, acronym_example3.txt,
acronym_example4.txt, acronym_tuples.txt
• Q2.py
• Q3.py
• Q3.docx
Then answer the following questions. You can use any Python expression or package that was
used in the lectures. Additional packages are not allowed unless instructed in the question.
Question 1 - What is the long form of the acronym? (Total 35 marks)
In this question, your task is to implement several functions that together parse a text
for acronyms and find their long forms. Acronyms are abbreviations typically formed from
the initial letters of multiple words and pronounced as a word. For instance, the acronym
"GPU" stands for the long form "graphics processing unit". In this question, an acronym is
defined as a character sequence of at least two successive capital letters.
As an example text, let us define the string
s = ("A GPU, which stands for graphics processing unit, is different "
     "from CPUs, says the IT expert. For some operations, a GPU is faster "
     "than a CPU. GPUs are not always faster though.")
Q1 a) Parse acronyms (10 marks)
Write a function read_file(filename) that receives as input a filename. The filename
includes the filepath. The function returns the entire content of the file as a single string.
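As a rough illustration of what read_file is asked to do, a minimal sketch could look like the one below. The UTF-8 encoding is an assumption; the assignment does not state how the example files are encoded.

```python
def read_file(filename):
    """Return the entire content of the file at `filename` as one string."""
    # Assumption: the text files are UTF-8 encoded (not stated in the task).
    with open(filename, encoding="utf-8") as f:
        return f.read()
```

The with statement closes the file even if reading fails, and f.read() without arguments returns the whole file, line breaks included.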
Write a function find_acronyms(s) that receives as input a string s representing the text.
The function returns a list of acronyms. For our example above, find_acronyms(s) returns
the list ['GPU', 'CPU', 'IT']. Note: It is not important in which order the acronyms
appear in the returned list.
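The definition used in this question (at least two successive capital letters) maps naturally onto a regular expression. The following is only a sketch of one possible approach, not a model solution. It assumes that the standard-library re module may be used, which the assignment does not state, and it removes duplicates while keeping the order of first appearance, although that order is not required.

```python
import re

def find_acronyms(s):
    """Return each run of two or more capital letters in s, once."""
    # [A-Z]{2,} matches two or more successive capital letters,
    # so "CPUs" contributes "CPU" and a lone "A" is ignored.
    matches = re.findall(r"[A-Z]{2,}", s)
    # dict.fromkeys drops duplicates and preserves insertion order.
    return list(dict.fromkeys(matches))

s = ("A GPU, which stands for graphics processing unit, is different "
     "from CPUs, says the IT expert. For some operations, a GPU is faster "
     "than a CPU. GPUs are not always faster though.")

print(find_acronyms(s))  # ['GPU', 'CPU', 'IT']
```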
Q1 b) Find the long forms (15 marks)
In this question the hard work is done: given the acronyms, your task is to find their long
form in the text. To this end, write a function find_long_forms(s, acronyms). It receives
as input a string s representing the text and a Python list of acronyms. The function returns
a dictionary d with key-value pairs, where the key is the acronym and the value is its long
form. For instance, in our example above the output is the dictionary
d = {'GPU': 'graphics processing unit', 'CPU': None, 'IT': None}.
You can make the following assumptions:
• The long form is found in the same sentence as the acronym itself.
• If the acronym occurs multiple times in a text, its long form is found in the first
sentence that contains the acronym.
• Every '.' (dot) marks the end of a sentence. Sentences like "I talked to the Dr. and
raised my concerns." where dots are contained within the sentence will not occur.
• The first letter of the acronym is the same letter as the first letter of the first word of
the long form. All of the letters in the acronym need to appear in the long form.
• If no long form can be found for an acronym, its value is set to None (Python's
None object, not the string 'None') as in the dictionary above.
Four examples for texts with acronyms are given in the example files
acronym_example1.txt, acronym_example2.txt etc. The corresponding tuples
of (acronym, long form) are specified in the file acronym_tuples.txt.
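The assumptions listed above suggest a sentence-by-sentence search. The sketch below is a deliberately simplified heuristic, not a reference solution. It looks for consecutive words whose initials spell the acronym, which is stricter than the stated rule that the acronym's letters only need to appear in the long form, so it may miss some long forms in the example files. It also assumes that the standard-library re module may be used.

```python
import re

def find_long_forms(s, acronyms):
    """Map each acronym to a candidate long form, or None if none is found."""
    sentences = s.split(".")  # assumption from the task: every '.' ends a sentence
    d = {}
    for acronym in acronyms:
        d[acronym] = None
        # Only the first sentence containing the acronym is searched.
        sentence = next((t for t in sentences if acronym in t), None)
        if sentence is None:
            continue
        words = re.findall(r"[A-Za-z]+", sentence)
        n = len(acronym)
        # Slide a window of n words over the sentence.
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            # Skip windows that contain the acronym itself (e.g. "CPUs").
            if any(acronym in w for w in window):
                continue
            initials = "".join(w[0] for w in window)
            if initials.lower() == acronym.lower():
                d[acronym] = " ".join(window)
                break
    return d

s = ("A GPU, which stands for graphics processing unit, is different "
     "from CPUs, says the IT expert. For some operations, a GPU is faster "
     "than a CPU. GPUs are not always faster though.")

print(find_long_forms(s, ['GPU', 'CPU', 'IT']))
# {'GPU': 'graphics processing unit', 'CPU': None, 'IT': None}
```

Long forms that contain extra words (for example connecting words such as "of") or that take more than one letter from a single word are not found by this window approach and would need a more flexible matching rule.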
Q1 c) Replace acronyms by long forms (10 marks)
Assume we want to make the document more self-explanatory and replace its acronyms with
their corresponding long forms. To this end, write a function replace_acronyms(s, d). It
receives as input a string s representing the text, and a dictionary d which contains