BUSS6002 - Individual Assignment
Semester 1, 2024
Due Date
• Due: before 23:59¹ on Wednesday 22 May 2024 (week 13).
• A late penalty of 5% per day applies if you submit your assignment late without a successful
special consideration or simple extension.
Rubric Overview
This assignment is worth 30% of the unit’s marks. The assessment is designed to test your
technical ability and statistical knowledge in modelling a real-world dataset, as well as your
communication skills in writing a concise and coherent report presenting your approach and
results. Refer to the Rubric later in this document for specific details.
Submission Instructions
You must submit:
• a written report (.PDF) with the following filename format, replacing 1234134 with your
own student ID: BUSS6002 Report SID1234134.pdf.
• a Jupyter Notebook (.ipynb) file with the following filename format, replacing 12341234
with your own student ID: BUSS6002 Notebook SID12341234.ipynb.
You may submit multiple times before the due date. Your latest submission before the due
date will be marked². If you wish to re-submit after the due date, please send an email to
buss6002.admin@sydney.edu.au so that markers are notified of your new submission.
¹ You may submit up to 30 minutes late without penalty.
² The filename on Canvas will change to include “-n” where n is the submission number. You can ignore this.
Overview
On September 19, 2023, Twitter user @purplepingers (Jordan van den Berg) launched shitrentals.org.
The site allows tenants to submit testimonies about landlords, property managers and rental
properties. The reviews are then publicly viewable and searchable with the address of the
property and the name of the agency visible. You have been given access to the data from
shitrentals.org³.
As a data-scientist-in-training, your task is to create a publishable research
report that investigates and reports on the factors that drive the perceived quality
of a rental property. The effect of each factor must be captured in a Generalized
Linear Model (GLM) of your choice. All analysis and model building must be performed
using Python and collated into a single Jupyter Notebook, which is to be submitted at the same
time as your report.
Report
Sections
A template for the report is provided in the assignment pack. Your report must contain:
• Abstract
• Introduction
• Methods
• Results and Discussion
• Conclusion and Limitations
• Bibliography
You may also include Appendices with additional details, figures and tables.
Requirements
• There is a limit of 2500 words for the report excluding tables, captions, bibliography and
appendices.
• Assume the reader of your report is a competent and trained data scientist or analyst.
They are familiar with the content of BUSS6002.
• All plots, computational tasks, and results must be completed using Python.
• Do not include any Python code as part of your report.
• All figures must be appropriately sized and have readable axis labels and legends (where
applicable).
LaTeX
Using LaTeX is highly recommended, though not required for this assignment. If you do not
have LaTeX installed locally we recommend that you use overleaf.com. All students can sign
up for an Overleaf Pro+ account via the resource portal. If you’re new to Overleaf and LaTeX, help
is available via their free introductory course and tutorial video.
³ The context for this assignment is real but the data is fake.
Notebook
The submitted .ipynb file must
• contain all the code used in the development of your report,
• be runnable on an Ed environment, and
• be free of any errors.
Data Description
The dataset contains 1000 property reviews collected between 1/1/2023 and 31/12/2023. To
simplify analysis the properties have been restricted to:
• Flats and Units
• 1 and 2 bedroom properties
• 3 suburbs close to the university (Camperdown, Redfern and Newtown).
Refer to the data dictionary for descriptions of the variables.
File Pack
A link to download the BUSS6002 Assignment Pack.zip is provided on Canvas. The pack
contains:
• report/
– BUSS6002 Report SID1234134.tex (LaTeX template)
– IEEEtran.cls (LaTeX style file)
– references.bib (BibTeX file)
• analysis/
– shitrentals.csv
– shitrentals dictionary.csv
– BUSS6002 Notebook SID12341234.ipynb (Jupyter Notebook Template)
Hints
The following resources may be useful:
• https://www.statsmodels.org/stable/examples/notebooks/generated/ordinal_regression.html
• https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/
Rubric
Criteria FA PS CR DI HD
Abstract and Introduction (10%)
FA: The abstract is uninformative and does not give readers a clear understanding of the paper’s content. It is missing one or more of the following: clear summary of purpose, methods and results. The introduction does not expand on the abstract by providing a description of the context of the paper or motivation.
PS: The abstract is informative, giving readers some understanding of the paper’s content. It contains a mostly clear summary of the purpose, description of methods and results. The introduction expands on the abstract by providing a brief or vague description of the context of the paper and motivation.
CR: The abstract is informative, giving readers a clear understanding of the paper’s content. It contains a mostly clear summary of purpose, methods and results. The introduction expands on the abstract by providing a brief description of the context of the paper and motivation.
DI: The abstract is concise and informative, giving readers a clear understanding of the paper’s content. It contains a summary of topics, purpose, description of methods and results. The introduction expands on the abstract by providing a description of the context of the paper and motivation.
HD: The abstract is concise, informative, and engaging, giving readers a clear understanding of the paper’s content and significance. It contains a summary of topics, purpose, description of methods and results. The introduction expands on the abstract by providing a thorough description of the context of the paper and convincing motivation, both of which are supported by evidence from literature.
Methods (40%)
FA: The description of the model is either absent or severely lacking, making it difficult to understand its implementation or rationale. Decision making lacks any meaningful support from evidence, with little to no reference to data-based exploration (EDA) or established best practices. External resources are either not cited or improperly integrated into the discussion. The presented model, if any, demonstrates a significant mismatch with the problem context:
- the choice of the model is inappropriate or irrelevant to the problem context
- variables are either poorly selected or not utilized at all
- there is no effort to control for variables not of direct interest to the study
- overfitting is not addressed
PS: A rudimentary description of the model is provided. Decision making attempts to be supported by evidence, but the justification is minimal and may rely more on intuition than on data-based exploration (EDA) or established best practices. External resources are cited sporadically, with limited integration into the discussion. The presented model:
- is appropriate for the problem context but lacks explanation and justification
- uses only a few variables to enhance predictive performance, with significant missed opportunities or variables not adequately leveraged
- makes limited effort to control for variables not of direct interest to the study
- gives minimal attention to overfitting, with little validation or discussion provided
CR: A description of the model is provided, though it may lack depth or thoroughness. Decision making attempts to be supported by evidence, but the justification may be limited or not fully grounded in data-based exploration (EDA) or established best practices. External resources are cited, but the integration may be less seamless or comprehensive. The presented model:
- is appropriate for the problem context
- is generally plausible for the problem context, but there may be some gaps in explanation or justification
- uses some variables to enhance predictive performance, but there may be missed opportunities or variables not fully leveraged
- attempts to control for variables not of direct interest to the study, though some could be more rigorously addressed
- attempts to not overfit to the data, but lacks thoroughness
DI: A detailed description of the model is provided. Decision making is supported by evidence, through data-based exploration (EDA) or reference to course materials or external resources, although some areas may lack thorough justification. External resources are cited appropriately. The presented model:
- is appropriate for the problem context
- is plausible based on the problem context, though some aspects may require further justification
- uses most variables to enhance predictive performance, though there may be some missed opportunities or oversights (model selection)
- controls for variables not of direct interest to the study, though some could be more rigorously addressed
- is shown to not overfit to the data (validation)
HD: A comprehensive and detailed description of the model is provided. Decision making is thoroughly justified through data-based evidence (EDA) or established best practice provided by course material or external resources. External resources are cited appropriately. The presented model:
- is appropriate for the problem context
- is plausible based on problem context
- makes full use of all variables to maximise predictive performance (model selection)
- controls for variables that are not of interest to the study
- is shown to not overfit to the data (validation)
Results (20%)
FA: The results section inadequately presents findings from the research, with minimal to no discussion of model outputs. The interpretation demonstrates a lack of understanding of the implications within the problem context, with little to no attempt to relate findings to the context.
PS: The results section presents basic findings from the research, with limited discussion of model outputs. The interpretation demonstrates a rudimentary understanding of the implications within the problem context, but lacks depth or thorough exploration.
CR: The results section presents an analysis of the research findings, accompanied by a discussion of model outputs. The interpretation demonstrates a basic understanding of the implications within the problem context, though there may be limitations in depth or clarity.
DI: A detailed analysis of the research findings is provided, accompanied by a substantive discussion of model outputs. The interpretation demonstrates a solid understanding of the implications within the problem context, though there may be areas where further depth or clarity could enhance the analysis.
HD: A comprehensive and insightful analysis of the research findings is provided, including a thorough discussion of model outputs. The interpretation shows a good understanding of implications within the problem context.
Conclusion and Limitations (10%)
FA: Fails to effectively summarise the key findings of the study, providing readers with a vague or incomplete overview of the research outcomes. Limitations, if acknowledged, are addressed inadequately or may be entirely omitted, demonstrating a lack of awareness or understanding of the study's constraints. Suggestions for future research directions are absent or poorly articulated.
PS: A rudimentary summary of the key findings of the study is provided, offering readers a basic overview of the research outcomes. Limited acknowledgments of limitations may be included, indicating some recognition of the study's constraints, though they may lack thorough exploration. Suggestions for addressing these limitations in future research, if present, are brief and may lack specificity.
CR: A basic summary of the key findings of the study is provided, offering readers a general overview of the research outcomes. Some acknowledgments of limitations are included, indicating a basic awareness of the study's constraints. Suggestions for addressing these limitations in future research are briefly mentioned, but they may lack depth or specificity.
DI: An effective summary of the key findings of the study is presented, providing readers with a clear overview of the research outcomes. Limitations are acknowledged, indicating an understanding of the study's constraints and challenges. Suggestions for addressing these limitations in future research are presented, although they may lack in-depth exploration or specificity.
HD: An effective summary of the key findings of the study is presented, providing readers with a clear overview of the research outcomes. Limitations are acknowledged, indicating an understanding of the study's constraints and challenges. Suggestions for addressing these limitations in future research are provided.
Report Presentation (10%)
FA: Formatting is unclear, illogical or inconsistent. The figures produced are of substandard quality. Grammar and spelling pose significant barriers to the reader’s comprehension. References do not follow a consistent format, e.g. APA 6th or 7th.
PS: Formatting is mostly clear, logical and consistent. Writing presents a small barrier to the reader’s comprehension. Visuals produced are of poor visual quality. References mostly follow a consistent format, e.g. APA 6th or 7th.
CR: Formatting is clear, logical and consistent. Writing contains some grammatical or spelling errors, but none that pose any significant barrier to reader comprehension. Visuals produced are of average visual quality. References mostly follow a consistent format, e.g. APA 6th or 7th.
DI: Formatting mostly follows best practice of a research paper, i.e. uses LaTeX or similar professional typesetting. Writing demonstrates outstanding precision, clarity, and concision. Visuals produced are of good visual quality and easy to read. References follow a consistent format, e.g. APA 6th or 7th.
HD: Formatting follows best practice of a research paper, i.e. uses LaTeX or similar professional typesetting. Writing demonstrates outstanding precision, clarity, and concision. Visuals are of high quality and increase the reader’s understanding. References follow a consistent format, e.g. APA 6th or 7th.
Notebook (10%)
FA: Features of the Jupyter Notebook are not used appropriately. The notebook is incoherent. Code is hard to read, poorly or inconsistently formatted and no attempt has been made at documentation.
PS: Features of the Jupyter Notebook are used mostly appropriately. The notebook is formatted poorly. Code is reasonably clear with inconsistencies in places.
CR: Features of the Jupyter Notebook have been used appropriately. The notebook is formatted acceptably. Code is clear with some evidence of best practices. Jupyter Notebook is runnable without error.
DI: Features of the Jupyter Notebook have been used appropriately. The notebook is well laid out and formatted. Code is clear, consistent and follows the majority of best practices.
HD: Features of the Jupyter Notebook have been used to pleasing effect. The notebook is well laid out and formatted. Code is clear, consistent and follows best practices, and descriptions and comments are excellent.