CSCI 4146 - The Process of Data Science - Fall 2020
Assignment 1
The submission must be done through Brightspace.
Due date and time as shown on Brightspace under Assignments.
● To prepare your assignment solution use the assignment template notebook available
on Brightspace.
● The detailed requirements for your writing and code can be found in the evaluation rubric
document on Brightspace.
● Questions will be marked individually with a letter grade. Their weights are shown in
parentheses after the question.
● Assignments can be done by a pair of students, or individually. If the submission is by a
pair of students, only one of the students should submit the assignment on Brightspace.
● We will use plagiarism tools to detect any type of cheating and copying (your code and
PDF).
● Your submission is a single Jupyter notebook and a PDF (with the compiled results
generated by your Jupyter notebook). File names should be:
○ A1--.ipynb
○ A1--.pdf
● Failing to submit both files results in a mark of 0 for both students.
In this assignment, you will need to build a model to predict the price of an Airbnb listing.
1. Data understanding and preprocessing (0.1)
a. Build the data quality report
b. Identify data quality issues and build the data quality plan
c. Preprocess your data according to the data quality plan
i. What is the neighbourhood with the highest average rating?
ii. What are the major characteristics of this neighbourhood (e.g., type of
listing, host rating, etc)?
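As a rough illustration of what a data quality report for item 1a might contain, the sketch below summarises each column by count, percentage missing, cardinality, and basic statistics. It uses only the Python standard library so that it runs anywhere; in the notebook a dataframe library would be the more usual tool. The function name `data_quality_report`, the column names, and the toy rows are all hypothetical and are not taken from the assignment dataset.

```python
from collections import Counter
from statistics import mean, median


def data_quality_report(rows, columns):
    """Summarise each column: count, % missing, cardinality, and basic stats."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing (an assumption for this sketch).
        present = [v for v in values if v is not None and v != ""]
        summary = {
            "count": len(values),
            "missing_pct": 100.0 * (len(values) - len(present)) / len(values) if values else 0.0,
            "cardinality": len(set(present)),
        }
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if present and len(numeric) == len(present):
            # Continuous feature: report range and central tendency.
            summary.update(min=min(numeric), max=max(numeric),
                           mean=mean(numeric), median=median(numeric))
        elif present:
            # Categorical feature: report the most frequent level and its share.
            mode_value, mode_count = Counter(present).most_common(1)[0]
            summary.update(mode=mode_value, mode_pct=100.0 * mode_count / len(present))
        report[col] = summary
    return report


# Hypothetical toy rows; the real columns come from the assignment dataset.
listings = [
    {"neighbourhood": "A", "price": 100.0, "room_type": "Entire home"},
    {"neighbourhood": "B", "price": None, "room_type": "Private room"},
    {"neighbourhood": "A", "price": 250.0, "room_type": "Entire home"},
    {"neighbourhood": "A", "price": 80.0, "room_type": ""},
]
report = data_quality_report(listings, ["neighbourhood", "price", "room_type"])
for column, summary in report.items():
    print(column, summary)
```

Columns with a high missing percentage or an unexpected cardinality are the natural candidates to list in the data quality plan for item 1b.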
2. Spatial data (0.2)
a. Plot listings on the city map with different colours corresponding to the listing’s
neighbourhood
b. Mark the “State station” (lat, long = 42.3570174,-71.071191) subway station on
the city map.
c. Plot the distance between the closest and most distant listings to State station.
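One possible way to obtain the distances needed for items 2c and 3b is the haversine (great-circle) formula, sketched below. This is only an assumption about how distance could be measured; the assignment does not prescribe a method. The station coordinates are the ones given above, while the listing identifiers and coordinates are hypothetical placeholders.

```python
import math

# (lat, long) of State station, as given in the assignment text.
STATE_STATION = (42.3570174, -71.071191)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points in degrees."""
    radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical listings: id -> (lat, long). Real values come from the dataset.
listing_coords = {
    "a": (42.3601, -71.0589),
    "b": (42.3570, -71.0700),
    "c": (42.2800, -71.1500),
}

distances = {
    listing_id: haversine_km(lat, lon, *STATE_STATION)
    for listing_id, (lat, lon) in listing_coords.items()
}
closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)
print(f"closest: {closest} ({distances[closest]:.2f} km)")
print(f"farthest: {farthest} ({distances[farthest]:.2f} km)")
```

The same per-listing distance can be stored as a new column and carried into the feature set required in item 3b.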
3. Build a model to forecast the price of a listing (0.7)
a. Explain what task you are solving (e.g., supervised vs. unsupervised,
classification vs. regression vs. clustering vs. similarity matching, etc.)
b. Use a feature selection method to select the features to build a model. Include in
the resulting dataset the distance from State station and exclude the free-text
(such as descriptions, reviews) and rating features.
c. Select the evaluation metric. Justify your choice.
d. Build a baseline model
i. Perform hyperparameter tuning if applicable.
ii. Train and evaluate your model
iii. How do you make sure not to overfit?
iv. Plot the learning curve
v. Analyze the results
e. Build a candidate final model (can be repeated for multiple models but only
include the final selection)
i. Perform hyperparameter tuning if applicable.
ii. Train and evaluate your model
iii. How do you make sure not to overfit?
iv. Plot the learning curve
v. Analyze the results
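Items d and e both ask for training, evaluation, and a check against overfitting. A minimal sketch of one way to do this is k-fold cross-validation, comparing the error on the training folds with the error on the held-out fold. The sketch assumes RMSE as the metric and a mean-price predictor as the baseline; both are illustrative assumptions, not choices the assignment prescribes, and the prices are hypothetical toy values. A learned model would replace the `mean(train)` step.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate_mean_baseline(prices, k=5, seed=0):
    """Return per-fold (train RMSE, held-out RMSE) for a mean-price predictor."""
    train_scores, test_scores = [], []
    for test_idx in k_fold_indices(len(prices), k, seed):
        held_out = set(test_idx)
        train = [p for i, p in enumerate(prices) if i not in held_out]
        test = [prices[i] for i in test_idx]
        prediction = mean(train)  # "fit" step: the baseline only learns the mean
        train_scores.append(rmse(train, [prediction] * len(train)))
        test_scores.append(rmse(test, [prediction] * len(test)))
    return train_scores, test_scores


# Hypothetical toy prices; the real target comes from the dataset.
prices = [50.0 + 10.0 * i for i in range(20)]
train_scores, test_scores = cross_validate_mean_baseline(prices, k=5)
print("mean train RMSE:", round(mean(train_scores), 2))
print("mean held-out RMSE:", round(mean(test_scores), 2))
```

A held-out error that is much larger than the training error is one common sign of overfitting. The per-fold scores also give paired samples for each model, which is the kind of input a later model comparison can use.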
f. Compare the two models with a statistical significance test. Use a box-plot to