
CS602 - Data-Driven Development with Python Fall 2019 Programming Assignment 6
Programming Assignment 6
Getting started
Review class handouts and examples, work on the reading and practice assignments posted on the course
schedule. This assignment is designed to practice data manipulation with Pandas and plotting with
Matplotlib.
Programming Project: Plotting (worth 25 points)
Create a plot and barcharts visualizing hotel ratings.
Data and program overview
In this assignment you will be working with data on hotel reviews. The task will be to create a plot showing
mean ratings and number of reviews for a selection of hotels in a chosen state, and a barchart that shows
percentage of reviews.
The following data will be provided using csv files:
• A table with information on hotel location (hotels.csv); we will call this the hotel data.
• A table with records of customer reviews of their stays in the hotels (hotelreviews.csv); referred to
henceforth as the reviews data.
Each review references the hotel name and city; these parameters uniquely identify the
corresponding hotel in the hotel data.
The files are supplied in a zip file that creates a data subfolder when unpacked. Review the data
before you read the rest of this handout.
Overview of the program
The program should work as follows.
1. Ask the user for the subfolder and names of the two data files (see interaction).
2. Ask the user to enter a state and verify that it is one of the states for which hotel
information is available in the hotel data. If the input is not found in the appropriate
column of the hotel data, repeat the prompt until a valid state is entered.
3. Identify all cities with a hotel in the state, based on the hotel data. Provide a numbered sequence of
cities in the specified state and ask the user to enter up to four numbers from the list. Input should
be repeated until the user enters one to four numbers from the numbered list. You may assume the
user will be entering numbers only.
4. Identify all hotels that are located in the selected city or cities (you can assume city names
are unique across all states). Display the names of these hotels.
5. Display a hotel reviews plot (described below), and save the plot as plot1.jpg file using
plt.savefig() function.
6. For the three highest rated hotels among those selected, display a rating percentage barchart
showing the percentage of reviews with each specific rating (described below). Save these plots as
barchart1.jpg, barchart2.jpg, barchart3.jpg.
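Steps 2 and 3 above could be sketched as follows. This is only an illustration, not a reference solution: the column names 'province' and 'city' are taken from the sample output, while the function names ask_state and cities_in_state and the read parameter (which stands in for input() so the loop can be exercised without a keyboard) are hypothetical. It also assumes the state and city columns contain no missing values.

```python
import pandas as pd

# Column names as they appear in the sample output of this handout.
PROVINCE = "province"
CITY = "city"

def ask_state(hotels, read=input):
    """Prompt until the reply exactly matches a value in the state column."""
    valid = set(hotels[PROVINCE])
    while True:
        state = read("Please enter state, e.g. MA: ").strip()
        if state in valid:
            return state
        print("We have no data on hotels in", state)

def cities_in_state(hotels, state):
    """Return the state's distinct cities, sorted and numbered from 1."""
    names = sorted(hotels.loc[hotels[PROVINCE] == state, CITY].unique())
    return pd.Series(names, index=range(1, len(names) + 1))
```

The comparison is case-sensitive, which matches the sample interaction where "az" is rejected and "AZ" is accepted.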
Sample interactions: user input appears in boldface with the generated plots shown after the text of the
interaction.
Please enter names of the subfolder and files: data hotels.csv hotelreviews.csv
Please enter state, e.g. MA: MA
1 Auburn
2 Boston
3 Brockton
4 Cambridge
5 Fitchburg
6 West Springfield
dtype: object
Select cities from above list by entering up to four indices on the same line: 2 4
You have selected the following cities:
city
2 Boston
4 Cambridge
Displaying rating information for the following hotels:
name city province
0 The Inn @ St. Botolph Boston MA
1 40 Berkeley Hostel Boston MA
2 A Bed & Breakfast In Cambridge Cambridge MA
3 Holiday Inn Express Hotel and Suites Cambridge Cambridge MA
Exiting...
The following interaction demonstrates how invalid input should be handled and the messages to be
shown for invalid input (highlighted). The graphs are omitted.
Please enter names of the subfolder and files: data hotels.csv hotelreviews.csv
Please enter state, e.g. MA: Provence
We have no data on hotels in Provence
Please enter state, e.g. MA: az
We have no data on hotels in az
Please enter state, e.g. MA: AZ
1 Eloy
2 Glendale
3 Mesa
4 Payson
5 Phoenix
6 Prescott Valley
7 Tucson
8 Wellton
dtype: object
Select cities from above list by entering up to four indices on the same line: 5 6 8 10
Selection must range from 1 to 8
Select cities from above list by entering up to four indices on the same line: 1 2 3 4 5 6 7
You selected 7 items, must select up to four
Select cities from above list by entering up to four indices on the same line: 5 7
You have selected the following cities:
city
5 Phoenix
7 Tucson
Displaying rating information for the following hotels:
name city province
0 La Quinta Inn and Suites Tucson - Reid Park Tucson AZ
1 La Posada Lodge & Casitas, An Ascend Hotel Col... Tucson AZ
2 Residence Inn By Marriott Tucson Williams Centre Tucson AZ
3 Holiday Inn Express & Suites Phoenix Downtown ... Phoenix AZ
4 Park Terrace Suites Phoenix AZ
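The validation of the city selection shown in this interaction might be checked with a small helper like the one below. It is a sketch under assumptions: the name parse_selection and its return convention are hypothetical, the input is assumed to contain numbers only (as the handout allows), and the handout does not say which message should win when a reply breaks both rules, so the count is checked first here.

```python
MAX_PICKS = 4  # the handout allows one to four indices

def parse_selection(line, n_choices):
    """Return (indices, None) if the reply is valid, else (None, message)."""
    picks = [int(token) for token in line.split()]
    if not 1 <= len(picks) <= MAX_PICKS:
        return None, f"You selected {len(picks)} items, must select up to four"
    if any(p < 1 or p > n_choices for p in picks):
        return None, f"Selection must range from 1 to {n_choices}"
    return picks, None
```

A caller would repeat the prompt, printing the returned message, until the first element of the result is not None.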
Overview of plots
1. Hotel reviews plot is the first plot shown in the interaction.
For each of the hotels in the selected cities, this plot visualizes the number of reviews as a
coordinate on the x axis and the average rating, as a coordinate on the y axis. Hotels must be
displayed as colored points, annotated with the hotel name, and (for full credit) using a color
corresponding to a city, as shown in the plot legend. Axes must be clearly labeled as shown, and the
title should be as shown.
Do not worry about the hotel names overlapping due to placement of annotations, or extending
beyond the plot boundaries.
2. Rating percentage barchart is generated for each of the three highest rated hotels. Each barchart
displays a bar graph produced using the Matplotlib function plt.bar(), showing what percentage
of all reviews have the specific rating (1 through 5). This percentage is computed by calculating
the total number of reviews with the rating and dividing it by the total number of reviews for the
hotel.
The percentage value must be displayed clearly on top of the bar. Axes must be clearly marked and
labeled as shown, and the title of the chart should include the name of the hotel, its city and state
information as well as the total number of the reviews of the hotel.
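The percentage computation and the labeled bars described in item 2 could look roughly like this. It is a sketch, not the expected solution: it assumes ratings are whole numbers from 1 to 5 held in a pandas Series, and the function names, axis labels and title format are placeholders rather than the ones shown in the handout's figures. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to show plots on screen
import matplotlib.pyplot as plt
import pandas as pd

def rating_percentages(ratings):
    """Percentage of reviews at each rating 1..5 (0 where a rating never occurs)."""
    counts = ratings.value_counts().reindex(range(1, 6), fill_value=0)
    return counts / len(ratings) * 100

def rating_barchart(ratings, title, filename):
    """Draw one bar per rating, print the percentage above it, and save the figure."""
    pct = rating_percentages(ratings)
    plt.figure()
    bars = plt.bar(pct.index, pct.values)
    for bar, value in zip(bars, pct.values):
        x_center = bar.get_x() + bar.get_width() / 2
        plt.text(x_center, bar.get_height(), f"{value:.1f}%", ha="center", va="bottom")
    plt.xlabel("Rating")
    plt.ylabel("Percentage of reviews")
    plt.title(f"{title} ({len(ratings)} reviews)")
    plt.savefig(filename)
    plt.close()
```

Reindexing over 1 to 5 keeps a bar position for ratings that no review used, so every chart has the same x axis.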
Required Functions
Include
• main() to read the location of the input files and call other functions to run the whole program;
• function pickStateAndCities() that will run the state and city selection procedure and return
the user-chosen state and all cities in it;
• function selectHotelReviews() to select and return reviews for the hotels in the selected cities, so
that the Hotel reviews plot can be generated;
• functions reviewsRatingsPlot() and ratingPercentageBarchart() to generate and save the
appropriate plots.
Pick function parameters and return values as you see fit, and define other functions as needed.
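As one possible shape for the data that selectHotelReviews() hands to the plotting functions, the sketch below reduces the reviews to one row per hotel with its mean rating and review count. The column name 'rating', the helper names summarize_reviews and top_rated, and the result column names are all assumptions; only 'name' and 'city' appear in the sample output.

```python
import pandas as pd

NAME = "name"
CITY = "city"
RATING = "rating"  # assumed name of the rating column in the reviews data

def summarize_reviews(reviews, cities):
    """One row per (hotel, city) in the chosen cities: mean rating and review count."""
    selected = reviews[reviews[CITY].isin(cities)]
    return (selected.groupby([NAME, CITY])[RATING]
            .agg(mean_rating="mean", n_reviews="count")
            .reset_index())

def top_rated(summary, k=3):
    """The k hotels with the highest mean rating."""
    return summary.nlargest(k, "mean_rating")
```

In this layout n_reviews would supply the x coordinate and mean_rating the y coordinate of the hotel reviews plot, and grouping on both name and city follows the handout's statement that the pair identifies a hotel.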
General Requirements
• You can assume that the provided files will have all of the columns involved in the required
computations, but the number and content of records and order of columns may be different.
• Your program should have no code outside of function definitions, except for a single call to main()
and global variables described in the next bullet.
• In order to make the code easier to modify for a different set of column names, define global variables
that store the names of columns that your program uses (e.g. CITY = 'city') and use the global
variables throughout your code.
• All file related operations must use device-independent handling of paths (use os.getcwd() and
os.path.join() functions to create paths, instead of hardcoding them).
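The path requirement in the last bullet can be illustrated with a short sketch. The helper name data_paths is hypothetical, and it assumes the reply to the first prompt has exactly three whitespace-separated parts, as in the sample interaction.

```python
import os

def data_paths(reply):
    """Split 'subfolder hotels_file reviews_file' and return the two full paths."""
    subfolder, hotels_file, reviews_file = reply.split()
    base = os.path.join(os.getcwd(), subfolder)
    return os.path.join(base, hotels_file), os.path.join(base, reviews_file)
```

For example, data_paths("data hotels.csv hotelreviews.csv") yields two paths under the data subfolder of the current working directory, using whatever separator the operating system expects.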
Submission and Grading
Submit your code along with the image files that your program will generate for the input data contained in
the first sample interaction. Grading will be based on the accuracy (conforming to all the requirements and
format of the interaction), generality of code and the appropriate use of pandas/numpy/matplotlib resources
(data structures and functions). Two points will be awarded for programming style.
Created by Tamara Babaian on November 23, 2019

 
