BUSS6002 2020S1 1 
BUSS6002 Group Assignment 
Due Date: Wednesday 3 June 2020 
Value: 25% of the total mark 
Rationale 
 
This group assignment has been designed to allow students to apply their data science 
skills on a real-world problem in business domains, as well as to help students develop 
collaborative skills when working in a team. 
Instructions 
1. Required submission items via Canvas:
   1. ONE written report (PDF format).
      • Assignments > Report Submission
   2. ONE Jupyter Notebook (.ipynb).
      • Assignments > Upload Your Code File
   3. ONE csv file of test results.
      • Assignments > Submit Your Test Results
2. The assignment is due at 17:00 AEST on Wednesday, 3 June 2020. The late
penalty is 5% of the assigned mark per day, starting after
17:00 on the due date. The closing date, Wednesday, 10 June 2020 at 17:00
AEST, is the last date on which an assessment will be accepted for marking.
3. As per anonymous marking policy, please include the Group ID and Student IDs of 
all group members. Do NOT include names. The name of the report and code file 
must follow: GroupID_BUSS6002_2020S1, and the name of test results must 
follow: GroupID_Test_Results.csv. 
4. Your analyses and answers should be provided as a final report that gives full 
explanation and interpretation of any results you obtain. Output without 
explanation will receive zero marks. You are required to also submit your code that 
can reproduce your reported results, as reproducibility is a key component to data 
science. Not submitting your code will lead to a loss of 50% of the mark. 
5. Be warned that plagiarism between individuals is always obvious to the markers of 
the assignment and can be easily detected by Turnitin. 
6. Presentation of the assignment is part of the assignment. 10% of the marks
are allocated to the presentation of your final report and/or code.
7. Numbers with decimals should be reported to three decimal places.
Meeting Minutes and Peer Review 
 
1. Each group is required to submit at least 3 sets of meeting minutes as an
appendix to the final report. A template will be provided for preparing meeting
minutes. You may use the template provided or a template of your choice.
2. We may ask for peer review from each student within a group. The instructions 
about how to do this will be released later. 
3. Each group will be awarded a group mark as per the marking criteria. Individual 
adjustments to grades may be made if there is a dispute in a group or the 
quality/quantity of contributions made by individuals are significantly different. In 
such a case the unit coordinator will seek meeting minutes and peer review reports 
from individuals within a group to decide on individual marks. 
4. If you encounter any issues with your group members, please report and discuss 
with your unit coordinator as early as possible. 
 
Group Competition 
 
A competition will be run among groups to rank the performance of their models on the
test data provided. The top 5 groups will be awarded bonus marks on top of their
overall assignment mark: the top 3 groups will receive an extra 5 marks, and the 4th and 5th
groups will receive an extra 3 marks.
Project Description and Dataset 
Nowadays, e-commerce has revolutionized the way companies do business and consumers 
make purchasing decisions. It has become common practice for consumers to use online 
reviews to inform their decision making and give opinions about their buying experience. 
Companies and individuals are increasingly using such data to better understand their
audience and make better decisions. By analyzing consumer opinions of their
products, companies can develop comprehensive insights into customers’ experience, and
use these to improve their offering, build a better brand and improve their business.
Individual consumers can check the opinions of existing users of a product to help them 
make wiser purchase decisions. 
 
Suppose you are now working in a Data Science Team for an online clothing retailer. The 
company has noticed a recent decline in their net promoter score which measures the share 
of customers who would recommend the company to a friend or colleague. Management 
suspects that this is the result of a recent change in their procurement strategy for some of
their departments, and they have tasked you with understanding what customers think about the
current collection. To facilitate this, you have been provided with a dataset that consists of
detailed product descriptions and classifications of recently sold items and the reviews 
written by customers. Your team is tasked to analyze this dataset and report your findings 
to assist the company in improving its appeal to consumers, with the following research 
objectives: 
 
• Describe how recommendation and rating patterns are affected across departments 
and product types. 
• Understand the shopping behavior of consumers and assess how age would affect 
the buying and reviewing behavior. 
• Conduct an analysis and build a predictive model to understand what influences a 
customer’s decision to recommend a product. 
 
There are two data files provided: product_train.csv and product_test.csv. 
Only product_train.csv contains the target variable: Recommended, where 1 
indicates that the customer recommends the product and 0 indicates he/she does not 
recommend the product. The details of the features presented in the above datasets are 
given in dictionary.csv. As it may not be feasible to directly use some of these 
features (in particular, reviews represented as raw text) to build a model, one of your tasks 
is to carefully extract or construct meaningful features as input to your analysis. 
 
Tasks 
Data Understanding: Conduct a thorough EDA to gain a better understanding of the
given data and business objectives. This includes, but is not limited to: checking for and
dealing with missing data and outliers, if any; the most popular items sold and their
characteristics; recommendation and rating patterns across departments and product types;
buying and reviewing behavior of different age groups, etc. Carefully present your analysis
and findings in your report.
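
As a rough illustration of the kind of summaries this task asks for, the sketch below checks missing values and compares recommendation rates across departments and age groups. It runs on a small made-up table; the column names Department, Age and Rating, and the age bin edges, are assumptions for illustration only (the real names are listed in dictionary.csv).

```python
import pandas as pd

# Toy stand-in for product_train.csv. Only "Recommended" is named in the
# assignment; "Department", "Age" and "Rating" are assumed column names.
df = pd.DataFrame({
    "Department": ["Tops", "Tops", "Dresses", "Dresses", "Bottoms", "Bottoms"],
    "Age": [25, 41, 33, None, 58, 62],
    "Rating": [5, 2, 4, 5, 3, 1],
    "Recommended": [1, 0, 1, 1, 1, 0],
})

# Count missing values per column before deciding how to handle them.
missing = df.isna().sum()

# Number of reviews, recommendation rate and mean rating per department,
# reported to three decimal places.
by_dept = (
    df.groupby("Department")
      .agg(n_reviews=("Recommended", "size"),
           rec_rate=("Recommended", "mean"),
           mean_rating=("Rating", "mean"))
      .round(3)
)

# Bucket ages into groups (bin edges are an arbitrary choice) and compare
# recommendation rates between groups. Rows with missing age are dropped
# from this summary automatically.
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 30, 45, 60, 120],
                        labels=["<=30", "31-45", "46-60", "60+"])
by_age = df.groupby("AgeGroup", observed=True)["Recommended"].mean().round(3)

print(missing)
print(by_dept)
print(by_age)
```

On the real data, the same pattern would start from pd.read_csv("product_train.csv") instead of the toy table.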
 
Build a Benchmark Model to Predict Recommendation: Build a simple logistic 
regression model to assess the feasibility of recommendation prediction and establish a 
baseline model. For this task, you are required to build your baseline model using bag of 
words of the review text only. Use scikit-learn’s logistic regression model with “solver” 
set to ‘liblinear’ and all other parameters set to default. Use scikit-learn’s CountVectorizer 
with “max_features” set to 500 and all other parameters set to default. You need to choose 
appropriate evaluation metrics and model evaluation strategies to validate your model. 
Present your analysis and discuss your findings. 
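
A minimal sketch of the benchmark configuration described above is given here for orientation. The CountVectorizer and LogisticRegression settings follow the task description; the toy reviews, the 3-fold stratified cross-validation and the ROC AUC metric are assumptions made for illustration, since the choice of metric and validation strategy is left to each group.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Made-up reviews standing in for the review text in product_train.csv.
texts = [
    "love this dress fits perfectly",
    "great quality and very comfortable",
    "beautiful colour love the fabric",
    "perfect fit would buy again",
    "soft comfortable and flattering",
    "love it great for work",
    "poor quality fabric returned it",
    "too small and itchy returned",
    "cheap material very disappointed",
    "runs large poor fit",
    "disappointed colour faded quickly",
    "itchy and unflattering sent back",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Benchmark as specified: bag of words limited to 500 features, logistic
# regression with the liblinear solver, everything else left at defaults.
benchmark = make_pipeline(
    CountVectorizer(max_features=500),
    LogisticRegression(solver="liblinear"),
)

# Putting the vectorizer inside the pipeline means the vocabulary is
# rebuilt on each training fold, so no information leaks from the
# validation fold. The fold count and metric are illustrative choices.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(benchmark, texts, labels, cv=cv, scoring="roc_auc")

print("ROC AUC per fold:", scores.round(3))
print("Mean ROC AUC:", round(scores.mean(), 3))
```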
Improving Your Benchmark Model: You are required to make attempts to improve the 
performance of your benchmark model as much as you can. You should consider using 
more advanced feature engineering techniques and adding extra features to rebuild your 
model. Your decisions should be justified based on evidence from the data
and accompanied by detailed explanation. You must properly validate your model and
optimize appropriate hyperparameters that apply. Simply building a model without any 
consideration of validation and optimisation does not meet the minimum requirements. 
You should demonstrate evidence of your efforts and you will be assessed based on the 
depth of your exploration. Provide a summary of what has worked and what has not. 
Report on your improved models and make comparisons with the benchmark model. 
Note: You must use logistic regression and no other models are allowed for this task. 
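
One possible way to combine validation with hyperparameter optimisation, while staying within logistic regression, is a cross-validated grid search over a pipeline. The sketch below is only an example under stated assumptions: the TF-IDF features, the parameter grid, the ROC AUC metric and the toy reviews are all illustrative choices, not requirements of the task.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Made-up reviews standing in for the real training text.
texts = [
    "love this dress fits perfectly",
    "great quality and very comfortable",
    "beautiful colour love the fabric",
    "perfect fit would buy again",
    "soft comfortable and flattering",
    "love it great for work",
    "poor quality fabric returned it",
    "too small and itchy returned",
    "cheap material very disappointed",
    "runs large poor fit",
    "disappointed colour faded quickly",
    "itchy and unflattering sent back",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The model is still logistic regression; only the features and the
# regularisation strength change relative to the benchmark.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(solver="liblinear")),
])

# Candidate settings: unigrams vs unigrams + bigrams, and three values of
# the inverse regularisation strength C. The grid is an assumption.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="roc_auc")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print("Best cross-validated ROC AUC:", round(search.best_score_, 3))
```

The search.cv_results_ attribute holds the score of every combination tried, which can support the required summary of what worked and what did not.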
Interpreting Results: Decide on your best model and provide analysis and interpretation 
of its behavior. For example, you may report on the features associated with 
positive/negative recommendation. For your interpretation, you should focus on 
identifying general rules that might be useful for the company to improve its business in 
the future. 
Final Test Results: Finally, apply your best model on the test data. You are asked to 
report the classification results on the test data. Save your results into a csv file containing 
two columns, one for the Review Index (ID from product_test.csv) and the other 
column Recommended for the predicted labels (1’s or 0’s). An example file of test results 
test_results_example.csv is also provided. Name your file as 
GroupID_Test_Results.csv. The results on the test data will be assessed to decide your 
group performance among the entire class (group competition!). 
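
A hedged sketch of writing the results file follows. The column header ID, the example ID values and the helper build_results are assumptions for illustration; the exact headers should be copied from test_results_example.csv, and GroupID in the file name is a placeholder for the real group ID.

```python
import pandas as pd


def build_results(ids, predictions):
    """Assemble the two-column results table (hypothetical helper)."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    if not set(predictions) <= {0, 1}:
        raise ValueError("predictions must be 0 or 1")
    # "ID" is an assumed header; match it to test_results_example.csv.
    return pd.DataFrame({"ID": list(ids), "Recommended": list(predictions)})


# Made-up values. In practice the ids would come from product_test.csv
# and the labels from the chosen model's predict() method.
test_ids = [101, 102, 103, 104]
predicted = [1, 0, 1, 1]

results = build_results(test_ids, predicted)

# index=False keeps the pandas row index out of the file, so the csv
# contains exactly the two required columns.
results.to_csv("GroupID_Test_Results.csv", index=False)
print(results)
```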
 
Presentation 
• The assignment material to be submitted will consist of a final report that: 
 
1) Takes the form of a research article, with sections
such as introduction, methodology, experiment results,
findings/interpretation, and conclusion. All references should be properly
cited in full bibliographical format. Here are a few examples:
http://cs229.stanford.edu/proj2015/007_report.pdf 
http://cs229.stanford.edu/proj2015/188_report.pdf 
http://cs229.stanford.edu/proj2015/031_report.pdf 
 
2) Details ALL steps and decisions taken by the group regarding requirements 
above. 
3) Demonstrates an understanding of the problem being addressed and the 
relevant principles of data science techniques used. 
4) Clearly and appropriately presents any relevant graphs and tables. 
 
• The report should be NO more than 20 pages with font size no smaller than
11pt, including everything like text, figures, tables, small sections of inserted code,
etc., but excluding the cover page and the appendix containing the meeting 
minutes. Think about the best and most structured way to present your work, 
summarise the procedures implemented, support your results/findings and prove 
the originality of your work. 
• Your code submission has no length limit; however, make sure your code is as
concise as possible and add comments when necessary to explain the functionality
of your code segments.
• Your group is required to submit at least 3 sets of meeting minutes. Your group may use
the provided template for preparing meeting minutes. Documentation should
include attendance, discussion points, actions decided, etc. You may use your own
form or find something online.
• You, as a member of a group, may be also required to submit your peer review. 
Please use the provided criteria sheet for this purpose. You will be advised how to 
use an online form when it becomes available. 
 
 