CSCI 1100 — Computer Science 1 Homework 6

Sets and Files

Overview

This homework is worth 100 points toward your overall homework grade, and is due Thursday, November 7, 2019 at 11:59:59 pm. It has two parts, each worth 50 points. Please download hw06_files.zip and unzip it into the directory for your HW 6. You will find multiple data files to be used in both parts.

The goal of this assignment is to work with sets and files. Most of the file processing is pretty simple; the actual work lies in manipulating the data from files using sets. You must use sets in both parts, both to get points and to make sure your program runs in a reasonable amount of time. An incorrect implementation that does not use sets, or one that inadvertently converts sets to lists, will consume unnecessary amounts of processing power on Submitty and may not complete, especially for Part 1.

Please remember to name your files hw6_part1.py and hw6_part2.py.

As always, make sure you follow the program structure guidelines. You will be graded on program correctness as well as good program structure.

Remember as well that we will continue to test homework assignments for similarity, so follow our guidelines for the acceptable levels of collaboration. You can download the guidelines from the Course Materials section on Submitty if you need a refresher. Note that the guidelines also forbid using someone else’s code from a previous semester. Make sure the code you submit is truly your own.

Autocorrect fail...

How does autocorrect work? We use it almost every day, and the idea behind it is very simple. If you type a word that is not in my dictionary, I go through all possible ways you could have misspelled that word (within reason) and check whether any of the corrected spellings is in my dictionary. If so, I return it as my prediction.

Fair warning: we will solve a simplified version of autocorrect in this part. You must use sets. We will revisit the same problem in HW 7 with a more complex solution using dictionaries. You will be able to reuse most of what you write here at that time, so take the time to structure your code well, and use functions to make it more modular. Think about the future you of two weeks from now trying to read and modify your code, and be nice to that person!

To solve this problem, your program will read the names of two files: the first contains a dictionary of words and the second contains a list of words to autocorrect. Both files have a single word per line.

Your program will read the words from the dictionary into a set. All operations that check whether words are in the dictionary, or that find the words in common with the dictionary, must be done with sets.
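A minimal sketch of loading the dictionary into a set, assuming one word per line and that surrounding whitespace (including the newline) should be stripped. The function name load_dictionary is hypothetical; it accepts any iterable of lines, so an open file object works as well as a list.

```python
def load_dictionary(lines):
    """Build a set of words from an iterable of lines (e.g. an open file).

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word:
            words.add(word)
    return words


# With a real file (the filename here is hypothetical):
# with open("words.txt") as f:
#     dictionary = load_dictionary(f)
dictionary = load_dictionary(["quincunx\n", "serendipity\n", "sockdolager\n"])
print("quincunx" in dictionary)  # True
```

Membership tests on a set take constant time on average, which is why the handout insists on sets rather than lists for the dictionary.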

Your program will then go through every single word in the input file and autocorrect each word. To correct a single word, you will consider the following:

FOUND If the word is in the dictionary, it is correct. There is no need for a change. Print it as found, and go on to the next word.

DROP If the word is not found, consider all possible ways to drop a single letter from the word. If any of them are in the dictionary, then print the word as corrected with drop, and stop. For example, quinecunx can be changed to quincunx by dropping e.

SWAP If the word is not yet corrected, consider all possible ways to swap two consecutive letters in the word. If any of these is in the dictionary, then print the word as corrected with swap, and stop. For example, serednipity can be transformed to serendipity by swapping the letters d and n.

REPLACE If the word is not yet corrected, consider all possible ways to change a single letter in the word to any other letter from the alphabet. You can hardcode a list of all letters in the alphabet for this part:

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

If any of these are in the dictionary, then print it as corrected with replace, and stop. For example, sockpolager can be changed to sockdolager by replacing p with d.

NO MATCH If the word is not corrected by any of the above steps, print NO MATCH.
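The three correction steps above can each be expressed as a function that returns a set of candidate strings; intersecting that set with the dictionary set then gives every valid correction for the step at once. This is a minimal sketch with hypothetical function names; it uses string.ascii_lowercase, which holds the same 26 letters as the hardcoded list.

```python
import string

LETTERS = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'


def drop_candidates(word):
    """All strings formed by deleting exactly one letter from word."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def swap_candidates(word):
    """All strings formed by swapping one pair of adjacent letters in word."""
    return {word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)}


def replace_candidates(word):
    """All strings formed by changing one letter of word to a different letter."""
    return {word[:i] + c + word[i + 1:]
            for i in range(len(word))
            for c in LETTERS
            if c != word[i]}


dictionary = {"quincunx", "serendipity", "sockdolager"}
print(drop_candidates("quinecunx") & dictionary)       # {'quincunx'}
print(swap_candidates("serednipity") & dictionary)     # {'serendipity'}
print(replace_candidates("sockpolager") & dictionary)  # {'sockdolager'}
```

Building the candidates as sets and using & keeps every lookup a set operation, as the handout requires.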

If there are multiple matches in any step, return the match that is smallest lexicographically. This is not super meaningful, but it will do for now. Assume words contain only English letters: no hyphens, quotation marks, or other non-alphabetical symbols.

Implement the potential ways to do autocorrection in the order given to you: Match, Drop, Swap, and Replace. You will get plenty of partial credit for all of your possible matches. When printing words and matches, format each of them to a width of 15 characters.
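One way to enforce the strict step order and the "smallest lexicographically" rule is a driver that tries each step in turn and stops at the first one with any match, picking min() of the matching set. This is a sketch under assumptions: the names autocorrect and steps, and the returned label strings, are hypothetical, and the exact printed wording should be taken from the example output file. Only a DROP step is wired in here to keep the block self-contained.

```python
def autocorrect(word, dictionary, steps):
    """Return (label, correction) for word.

    steps is a list of (label, candidate_function) pairs tried in order;
    each candidate_function maps a word to a set of candidate strings.
    Within the first successful step, the lexicographically smallest
    match is chosen with min().
    """
    if word in dictionary:
        return ("FOUND", word)
    for label, candidate_function in steps:
        matches = candidate_function(word) & dictionary  # set intersection
        if matches:
            return (label, min(matches))
    return ("NO MATCH", None)


def drop_candidates(word):
    """Strings formed by deleting one letter (the only step used in this demo)."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


dictionary = {"car", "cat"}
steps = [("DROP", drop_candidates)]  # SWAP and REPLACE steps would follow in order
print(autocorrect("cat", dictionary, steps))   # ('FOUND', 'cat')
print(autocorrect("cart", dictionary, steps))  # ('DROP', 'car')
print(autocorrect("dog", dictionary, steps))   # ('NO MATCH', None)
```

Here "cart" yields both "car" and "cat" by dropping a letter, and min() returns "car". For the 15-character width, a format spec such as f"{word:15}" pads on the right (left-aligned) and f"{word:>15}" pads on the left; which alignment is expected should be checked against the example output.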

An example of the program run (how it will look when you run it using the Spyder IDE) is provided in file hw6_part1_output_01.txt. (To access this file, download hw06_files.zip from the Course Materials section of Submitty and unzip it into your directory for HW 6.) Note that this example output uses a specific dictionary file. We will use a more extensive dictionary on Submitty, so your results will vary during submission.

When you are sure your homework works properly, submit it to Submitty. Your program must be named hw6_part1.py to work correctly.

To finish up, you can probably see problems with this approach. For some words, a later match may have been better than an earlier one, but we considered the steps in strict order. We also did not consider some potential misspellings, like adding a letter or phonetic misspellings of werdz! We would also like to limit the potential changes to the most likely ones, for example by keeping in mind what keyboard people are using or which words are more common within the language. We will address some of these in the next homework by adding to this solution.

Fantastic beasts in the wizarding world of CS1...

In this part, you will use a file containing various beasts that were featured in different titles in the Harry Potter series of books and movies. We give you a simple data file called titles.txt to test your program, but we will be working with a more comprehensive one on Submitty.

Each line of the file contains a title (movie or book), followed by all the beasts featured in this title. For example, the following line:

Harry Potter and the Goblet of Fire|Hippocampus|Merpeople|Niffler

mentions that the title Harry Potter and the Goblet of Fire featured three beasts: Hippocampus, Merpeople, and Niffler. You can assume that the data is valid and that splitting on the '|' string will correctly parse the lines in titles.txt.
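A minimal sketch of parsing one line into a title and a set of beasts, plus loading all lines into a list so that "the first title in the list" keeps the file's order. The function names are hypothetical, and skipping blank lines is an assumption.

```python
def parse_title_line(line):
    """Split 'Title|Beast1|Beast2|...' into (title, set_of_beasts)."""
    fields = line.strip().split("|")
    return fields[0], set(fields[1:])


def load_titles(lines):
    """Return a list of (title, beasts) pairs in file order, skipping blank lines."""
    return [parse_title_line(line) for line in lines if line.strip()]


title, beasts = parse_title_line(
    "Harry Potter and the Goblet of Fire|Hippocampus|Merpeople|Niffler\n")
print(title)           # Harry Potter and the Goblet of Fire
print(sorted(beasts))  # ['Hippocampus', 'Merpeople', 'Niffler']
```

With the data file, this would be used as `with open("titles.txt") as f: entries = load_titles(f)`.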

Your program must do the following:

• Read the name of a title from the user. The user may enter any part of the title, in any case (upper or lower). Find the first title in the list that contains the input string. For example, the user may enter FIRE, which will match the above line.

If no match is found, print a message saying so.

If a match is found, then do the following:

– Print all the beasts that were featured in this title in lexicographical order.

– Print all the other titles that have at least one beast in common with this title, in lexicographical order.

– Print all the beasts that were featured only in this title (i.e., no other title has these beasts), again in lexicographical order.

Consider which set operations will help with the last two tasks...

• Once you have finished processing the input (match or no match), ask for another input, until the user enters stop.
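The matching and the last two tasks map directly onto set operations: a non-empty intersection means two titles share a beast, and subtracting the union of every other title's beasts leaves the beasts unique to this title. This is a sketch assuming each title appears only once in the list; find_title and compare_with_others are hypothetical names, and two of the three titles in the toy data are made up for illustration.

```python
def find_title(query, entries):
    """Return the first (title, beasts) pair whose title contains query,
    ignoring case, or None if there is no such title."""
    query = query.lower()
    for title, beasts in entries:
        if query in title.lower():
            return title, beasts
    return None


def compare_with_others(title, beasts, entries):
    """Return (titles sharing at least one beast, beasts found only in title)."""
    shared_titles = set()
    beasts_elsewhere = set()
    for other_title, other_beasts in entries:
        if other_title == title:
            continue
        if beasts & other_beasts:          # non-empty intersection: a beast in common
            shared_titles.add(other_title)
        beasts_elsewhere |= other_beasts   # union of every other title's beasts
    only_here = beasts - beasts_elsewhere  # difference: beasts unique to this title
    return shared_titles, only_here


# Toy data: the first entry matches the example line above; the others are hypothetical.
entries = [
    ("Harry Potter and the Goblet of Fire", {"Hippocampus", "Merpeople", "Niffler"}),
    ("Example Title 2", {"Niffler", "Hippogriff"}),
    ("Example Title 3", {"Hippogriff"}),
]
title, beasts = find_title("FIRE", entries)
shared, only_here = compare_with_others(title, beasts, entries)
print(sorted(shared))              # ['Example Title 2']
print(sorted(only_here))           # ['Hippocampus', 'Merpeople']
print(find_title("xyz", entries))  # None
```

In the full program these calls would sit inside a loop that reads input until the user enters stop; the prompts and messages should follow the example output file.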

Your output will likely contain long lists of values. It must be formatted nicely! If your output is correct but is not formatted properly, you will lose some points but not all. However, Submitty may not catch this, and you may only get points back when your TA reads your homework. In either case, your formatting must be identical to that expected in Submitty to get full credit.

The items in each list you print should be separated by a comma and a space. You will find that achieving this with print() alone is difficult. It is easier to construct a string with the information you want, for example: line = val1 + ", " + val2, and then print the line.
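For a whole collection of values, Python's built-in str.join produces the same comma-and-space string without a trailing separator, and sorted() gives lexicographical order. A minimal sketch; the helper name format_list is hypothetical.

```python
def format_list(items):
    """Join items in lexicographical order, separated by a comma and a space."""
    return ", ".join(sorted(items))


line = format_list({"Niffler", "Hippocampus", "Merpeople"})
print(line)  # Hippocampus, Merpeople, Niffler
```

The resulting string can then be passed to textwrap.wrap if it is too long for one line.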

If you have a very long line, you can use the textwrap module to break it into a list of multiple lines. Here is an example use of this module.

>>> import textwrap
>>> par = "this is a very long line with many unnecessary words using it to show how textwrap works"
>>> lines = textwrap.wrap(par)
>>> lines
['this is a very long line with many unnecessary words using it to show', 'how textwrap works']
>>> for line in lines:
...     print(line)
...
this is a very long line with many unnecessary words using it to show
how textwrap works

An example of the program run (how it will look when you run it using the Spyder IDE) is provided in file hw6_part2_output_01.txt.

When you are sure your homework works properly, name it hw6_part2.py and submit it to Submitty.
