CSCI 1300
Summer 2019
Assignment 4
Task 1
Task 1.1
Write a function called read_data that takes the following
inputs:
● An array of doubles -- call it A
● The size of the array -- call it size
This function will read up to size numbers from a file called numbers.txt. As the name
suggests, this file contains a space separated list of numbers. Each number read from the file
will be stored in A. The function will continue reading numbers from numbers.txt until either it
reaches the end of the file or fills the array. The function will return how many numbers it
was able to read.
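As a rough illustration of the behavior described above, here is a minimal sketch. It assumes the return type is int and that a missing file or a non-numeric token should simply stop the reading; the assignment does not specify either point.

```cpp
#include <fstream>

// Reads up to `size` whitespace-separated numbers from numbers.txt into A.
// Returns how many values were actually stored (assumed return type: int).
int read_data(double A[], int size) {
    std::ifstream in("numbers.txt");
    int count = 0;
    // Stop when the array is full or extraction fails (end of file,
    // a file that could not be opened, or a token that is not a number).
    while (count < size && in >> A[count]) {
        ++count;
    }
    return count;
}
```

In main, the returned count can then stand in for the array size, for example `int n = read_data(values, 100);` followed by a loop over the first n elements.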
Task 1.2
Write your main. Your main will test out the functionality of your read_data function by filling
an array with double values. Use the returned value from read_data to determine how many
numbers were read into the input array. Use that value as the new size of the array. Call your
file task1.cpp
Task 2
Task 2.1
Write a function called count_words that takes a single argument, a string
filename.
● The function will open a file with the provided filename for reading.
● The goal of this function is to count the number of occurrences of each word
in the file
○ We will define a “word” as a sequence of characters
separated by a whitespace character.
● We will accomplish this using an unordered_map data structure from the
C++ standard library (keyed by a string, storing an integer value).
○ As we read each word in the file, we can count its occurrences in the
following way:
    // unordered_map keyed by string, storing an integer
    unordered_map<string, int> word_counter;
    // adds one to the count of the string "some_word"
    word_counter["some_word"] += 1;
● Once all words have been read and counted, we will then return the
unordered_map.
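The steps above might be put together roughly as follows. This is only a sketch: it assumes the filename is passed as a const reference and that a file that cannot be opened should yield an empty map, neither of which the assignment states.

```cpp
#include <fstream>
#include <string>
#include <unordered_map>

// Counts how often each whitespace-separated word appears in the file.
std::unordered_map<std::string, int> count_words(const std::string& filename) {
    std::unordered_map<std::string, int> word_counter;
    std::ifstream in(filename);
    std::string word;
    // operator>> skips whitespace, so each extraction yields one "word".
    while (in >> word) {
        word_counter[word] += 1;  // a key seen for the first time starts at 0
    }
    return word_counter;
}
```

Because operator[] value-initializes a missing entry to 0, no separate check for a first occurrence is needed.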
Task 2.2
In your main, we wish to count the occurrences of words in
two files:
● trainneg.txt
● trainpos.txt
We will use the count_words function to return two unordered_maps -- one for
each file. We will then write to two files:
● count_neg.txt
● count_pos.txt
Each file will contain the list of unique words in trainneg.txt and
trainpos.txt, respectively, along with their word counts on each line.
Format your files in the following way:
● WORD
● NUM_OCCURRENCES
Name your file task2.cpp
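One possible way to write a map out is sketched below. The helper name write_counts is hypothetical, and the sketch assumes the intended layout is a word, a single space, and its count on each line; the assignment leaves the exact separator open.

```cpp
#include <fstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: writes one "WORD NUM_OCCURRENCES" pair per line.
// Returns false if the output file could not be opened.
bool write_counts(const std::unordered_map<std::string, int>& counts,
                  const std::string& filename) {
    std::ofstream out(filename);
    if (!out) {
        return false;
    }
    for (const auto& entry : counts) {
        out << entry.first << ' ' << entry.second << '\n';
    }
    return true;
}
```

Note that an unordered_map has no defined iteration order, so the lines will not come out sorted.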
Extra Credit 2 -- 30pts
Our trainneg.txt/trainpos.txt files are more than simply collections of words. Each
line in these files contains a movie review. trainneg.txt contains negative reviews and
trainpos.txt contains positive reviews. So in our previous task, we basically counted the
occurrences of words in both negative and positive movie reviews. We can use this data
to predict whether a particular movie review is positive or negative depending on the words
used in the review! Your task is to write a program that does the following:
main:
● Reads two files produced from our previous task:
○ count_neg.txt
○ count_pos.txt
● Stores the words and their respective counts
into two unordered_maps
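Loading the counts back could look something like the sketch below. The helper name read_counts is hypothetical, and it assumes each line of the count file holds a word followed by whitespace and an integer count.

```cpp
#include <fstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: loads "WORD COUNT" lines back into a map.
// Assumes each word is followed by whitespace and an integer.
std::unordered_map<std::string, int> read_counts(const std::string& filename) {
    std::unordered_map<std::string, int> counts;
    std::ifstream in(filename);
    std::string word;
    int count = 0;
    // Reading stops at end of file or at the first malformed pair.
    while (in >> word >> count) {
        counts[word] = count;
    }
    return counts;
}
```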
A separate function:
● This function will take the following inputs:
○ a string that represents a single movie review
○ The two unordered_maps that contain the word counts for
words in positive/negative reviews
● The function will be used to classify this movie review as positive or
negative (what return type do you think you would want for such a
function?).
● How do we classify a movie review given our word
counts?
○ We can use a simple probabilistic model called the Naive Bayes
Classifier to accomplish this task.
○ I recommend reading the following book chapter to learn about the Naive
Bayes Classifier https://web.stanford.edu/~jurafsky/slp3/6.pdf
○ TL;DR: We can use the word counts for positive/negative reviews along with
a strong simplifying assumption to calculate the probability that a review is positive
or negative.
○ For those of you unfamiliar with basic probability notation,
here is a short primer:
■ Let P(x) denote the probability that ‘x’ occurs.
■ Let P(x | y) denote the probability that ‘x’ occurs given that
‘y’ has occurred.
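Using this notation, Naive Bayes treats the words of a review as independent given the class c, so P(c | review) is proportional to P(c) times the product of P(w | c) over the words w in the review. The sketch below is one way this could be coded, not a prescribed solution. It makes several assumptions that the assignment does not state: equal priors for the two classes, add-one (Laplace) smoothing for unseen words, sums of logarithms instead of products to avoid underflow, and a bool return value. The names WordCounts, log_likelihood, and is_positive are hypothetical.

```cpp
#include <cmath>
#include <sstream>
#include <string>
#include <unordered_map>

using WordCounts = std::unordered_map<std::string, int>;

// Sum of log P(word | class) over the words of a review, with add-one
// (Laplace) smoothing so that unseen words do not produce log(0).
double log_likelihood(const std::string& review, const WordCounts& counts,
                      double total_words, double vocab_size) {
    std::istringstream words(review);
    std::string word;
    double score = 0.0;
    while (words >> word) {
        auto it = counts.find(word);
        double count = (it == counts.end()) ? 0.0 : it->second;
        score += std::log((count + 1.0) / (total_words + vocab_size));
    }
    return score;
}

// Returns true when the review looks positive, false when negative.
// Assumption: equal priors P(pos) = P(neg), so the prior terms cancel.
bool is_positive(const std::string& review, const WordCounts& pos,
                 const WordCounts& neg) {
    double pos_total = 0.0;
    double neg_total = 0.0;
    for (const auto& e : pos) pos_total += e.second;
    for (const auto& e : neg) neg_total += e.second;

    // Vocabulary size = number of distinct words seen in either class.
    double vocab = static_cast<double>(pos.size());
    for (const auto& e : neg) {
        if (pos.find(e.first) == pos.end()) vocab += 1.0;
    }

    return log_likelihood(review, pos, pos_total, vocab) >=
           log_likelihood(review, neg, neg_total, vocab);
}
```

This version recomputes the totals and vocabulary size on every call to keep the example short; when classifying many reviews it would be cheaper to compute them once and pass them in.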
Back in main:
● Read two files:
○ testpos.txt
○ testneg.txt
● These files will contain movie reviews for us to test the effectiveness of our
classifier.
● For each review in these files, classify it as either positive or negative.
● Count the number of correct classifications and determine the accuracy of your
classifier.
● See if you can get greater than 60% classification accuracy.
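The bookkeeping for the accuracy could be sketched as follows. It assumes the test files hold one review per line, as the training files do, and it takes the classifier as a callable so the example stays independent of any particular implementation. Tally, score_file, and accuracy are hypothetical names.

```cpp
#include <fstream>
#include <functional>
#include <string>

// Running totals for one or more test files.
struct Tally {
    int correct = 0;
    int total = 0;
};

// Hypothetical helper: classifies every non-empty line of `filename` and
// counts how many results match `expected_positive`.
Tally score_file(const std::string& filename, bool expected_positive,
                 const std::function<bool(const std::string&)>& classifier) {
    Tally tally;
    std::ifstream in(filename);
    std::string review;
    while (std::getline(in, review)) {
        if (review.empty()) continue;  // skip blank lines
        ++tally.total;
        if (classifier(review) == expected_positive) ++tally.correct;
    }
    return tally;
}

// Overall accuracy across both files; 0.0 when no reviews were read.
double accuracy(const Tally& a, const Tally& b) {
    int total = a.total + b.total;
    if (total == 0) return 0.0;
    return static_cast<double>(a.correct + b.correct) / total;
}
```

Calling score_file once for testpos.txt with expected_positive set to true and once for testneg.txt with it set to false, then passing both tallies to accuracy, gives a single fraction to compare against the 60% target.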
Name your file task2E.cpp
Zip all your files and submit them to Moodle.
