
Department of Computer Science

Summative Coursework Set FrontPage

Module Title

Applied Data Science with Python

Module Code

CSMAD

Type of Assignment

(e.g., technical report, set exercise, in-class test)

Set exercise 2 of 2

Individual or Group Assignment

Individual

Weighting of the Assignment

50%

Word count/page limit

Approximately 1,500 words, excluding code, code comments, captions and tables.

Expected hrs spent for the assignment (set by lecturer)

20

Items to be submitted

A single .zip archive, containing:

1. All final project code.

2. One fully executed Jupyter Notebook file (.ipynb), displaying code, figures, and explanations (as Markdown).

3. One HTML file (.html), exported from the above Jupyter Notebook.

Work to be submitted on-line via Blackboard Learn by

Monday, 27 January 2025, 12:00 noon

Work will be marked and returned by

Friday, 14 February 2025

1. Assessment classifications

This coursework assesses your ability to:

•    acquire, and be able to apply, statistical, programming, and machine learning techniques in Python for data science tasks;

•    evaluate, select and use state-of-the-art Python tools and platforms for solving data science problems;

•    design, implement, and execute solutions in Python for data science problems; and

•    evaluate data science solutions in Python, including their outcomes, efficacy, constraints, and uncertainty.

You will gain credit for:

•    preparing and submitting required files as requested;

•    successful implementation of the requested coding tasks;

•    writing efficient, functional code;

•    providing thoughtful, clear, well-structured written analysis.

Your assignment will be marked according to the marking schemes provided below. The schemes are designed so that the collectively weighted assignment mark will correspond to the following qualitative master’s degree classification descriptions. The table below describes what is typically expected of the work to obtain a given mark.

Classification Range

Typically, the work should meet these requirements

Distinction (>=70%)

Outstanding/excellent work with correct code and results. Outstanding work should demonstrate coding proficiency with high efficiency, based on advanced techniques. Written analyses demonstrate exceptional understanding and application of the related concepts and techniques, with focused attention to the details of the results. The work exhibits originality and includes critical analysis.

Merit (60-69%)

Good work with mostly correct results: most of the work has been carried out correctly, but some tasks have not been carried out or are not completely correct. Coding shows average efficiency. Written analyses show a strong understanding of the subject, with clear application of sensibly chosen concepts and techniques. The work includes some critical evaluation and broad generalization of the results.

Pass (50-59%)

Achievement of the minimum requirements. Some significant part of the assignment is missing and/or has only partially correct results. Coding lacks efficiency. Written analyses meet the basic requirements, demonstrating adequate understanding and application of key concepts, but the work may lack depth, contain technical errors, omit specific discussion of the results, or use improperly selected techniques.

Fail (<50%)

Incomplete solutions to a limited part of the assignment. Most tasks have not been carried out with sufficient accuracy. Results may not be correct or technically sound. Coding is inefficient. Written analyses do not meet the required standards, demonstrating insufficient understanding. The work does not consider the specific results and is missing key components.

2. Assignment description

Data Description

The data for this coursework are available in a single CSMAD_CW2_data.zip file on the CSMAD Blackboard space, under the Assessment heading, Coursework 2 of 2. You MUST use this version of the datasets. The archive is organized as shown below:

CSMAD_CW2_data.zip
└── data/
    ├── traffic/
    │   ├── DailyStandard_Report_1_19078_01_01_2021 … .csv
    │   ├── ... intermediate files ...
    │   ├── DailyStandard_Report_1_19124_01_01_2024 … .csv
    │   └── TRIS+-+User+Guide+r3.pdf
    └── weather/
        ├── 03761099999_2021.csv
        ├── 03761099999_2022.csv
        ├── 03761099999_2023.csv
        ├── 03761099999_2024.csv
        ├── CSV_HELP.pdf
        └── isd-format-document.pdf
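Because the brief requires this exact archive, one option is to read the CSV files directly from it rather than from a manually extracted copy. The sketch below uses only Python's standard library (`zipfile`, `csv`, `io`); the function names are hypothetical, and the UTF-8 encoding is an assumption that should be checked against the actual files.

```python
import csv
import io
import zipfile
from collections import defaultdict


def list_csv_members(zip_source):
    """Group the CSV members of the archive by their parent folder.

    zip_source: a path or binary file-like object for CSMAD_CW2_data.zip.
    PDFs and other non-CSV members are skipped.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                parts = name.split("/")
                # e.g. "data/weather/03761099999_2021.csv" -> "weather"
                folder = parts[-2] if len(parts) > 1 else ""
                groups[folder].append(name)
    return {folder: sorted(names) for folder, names in groups.items()}


def read_csv_member(zip_source, member):
    """Read one CSV member from the archive into a list of row dictionaries."""
    with zipfile.ZipFile(zip_source) as archive:
        # archive.open() yields bytes; TextIOWrapper decodes them for csv
        with archive.open(member) as raw, io.TextIOWrapper(
            raw, encoding="utf-8-sig", newline=""
        ) as text:
            return list(csv.DictReader(text))
```

Grouping by folder keeps the traffic and weather files separate from the start, which makes it straightforward to hand each group to its own cleaning function.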

Vehicle Traffic Data: traffic/

The traffic directory contains eight CSV files. These Daily Standard Reports describe the flow of traffic past two Motorway Incident Detection and Automated Signalling (MIDAS) observing stations on the M4, south of Reading. Most column headers are self-explanatory. To clarify the others: columns with cm units record the count of vehicles whose size (length) falls within the stated range; columns with mph ranges have no recorded data; Avg mph is the recorded average speed of vehicles; and Total Volume is the count of all vehicles during the preceding 15 minutes.
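Since the prediction target is the combined count at both sites, a minimal sketch of building that target on a regular 15-minute grid follows. It assumes each site's report has already been parsed into a dictionary mapping each interval-end timestamp to its Total Volume; the parsing step and the exact date/time column names are left out as assumptions, and `combined_volume` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Each Total Volume covers the preceding 15 minutes, so timestamps are interval ends
INTERVAL = timedelta(minutes=15)


def combined_volume(site_a: dict, site_b: dict, start: datetime, end: datetime,
                    step: timedelta = INTERVAL) -> list:
    """Sum two sites' Total Volume on a regular grid of interval-end timestamps.

    site_a, site_b: dict mapping interval-end datetime -> vehicle count (or None).
    Returns a list of (timestamp, total) pairs covering start..end inclusive.
    total is None when either site is missing that interval, so gaps stay
    visible instead of being silently treated as zero traffic.
    """
    grid = []
    t = start
    while t <= end:
        a, b = site_a.get(t), site_b.get(t)
        total = a + b if a is not None and b is not None else None
        grid.append((t, total))
        t += step
    return grid
```

Counting the `None` entries in the result gives a quick measure of how much imputation (or exclusion) the combined series will need.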

The TRIS+-+User+Guide+r3.pdf file describes the datasets from which the CSV data are sourced. While it is included largely for informational purposes, you will need to refer to its definition of Day Type for some parts of this coursework.

Source: https://webtris.highwaysengland.co.uk/
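When Day Type is added as a predictor, it is categorical, so treating its codes as numbers would impose a false ordering. A common approach, sketched below, is one-hot encoding with the first category dropped as the reference level, which avoids perfect collinearity with the intercept in a linear regression. The integer codes used in the test are placeholders only; the real Day Type values and their meanings must be taken from the TRIS user guide, and `one_hot_day_type` is a hypothetical helper.

```python
def one_hot_day_type(codes, categories):
    """One-hot encode Day Type codes, dropping the first category as the reference.

    codes: sequence of Day Type values, one per observation.
    categories: every Day Type value the model should know about.
    Returns one dict per observation, e.g. {"day_type_1": 0, "day_type_2": 1}.
    """
    categories = list(categories)
    others = categories[1:]  # categories[0] is the reference level (all zeros)
    rows = []
    for code in codes:
        if code not in categories:
            # Fail loudly rather than silently encoding an unknown Day Type as the reference
            raise ValueError(f"unexpected Day Type {code!r}")
        rows.append({f"day_type_{c}": int(code == c) for c in others})
    return rows
```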

Weather Data: weather/

The weather directory contains four CSV files. These describe common weather observations at a location near the above MIDAS observing stations. The meaning of the data in these files is NOT made obvious by the column headers. Instead, elements of this coursework will require that you use the CSV_HELP.pdf and isd-format-document.pdf files to understand and decode the existing data representations. Note in particular that you will need to choose between the FM-12 and FM-15 report types.

Source: https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00532/html
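The encoded weather fields pack a value, a scaling and quality information into a single string. The sketch below assumes the ISD conventions for two common fields: TMP as a signed value in tenths of a degree Celsius followed by a quality code (e.g. `+0056,1`), and WND as direction, direction quality, type, speed in tenths of m/s, and speed quality (e.g. `160,1,N,0046,1`), with runs of 9s marking missing values. These layouts, and the `REPORT_TYPE` column name, are assumptions to verify against isd-format-document.pdf and CSV_HELP.pdf; quality codes are ignored here for brevity, and all function names are hypothetical.

```python
def decode_scaled(field, scale=10, missing=9999):
    """Decode an ISD 'value,quality' field such as TMP '+0056,1' -> 5.6.

    Returns None for the missing-value sentinel. Assumes the value is the
    first comma-separated part of the field.
    """
    value = int(field.split(",")[0])  # int() accepts the leading '+' or '-'
    return None if abs(value) == missing else value / scale


def decode_wind(field):
    """Decode an ISD WND field 'ddd,q,t,ssss,q' -> (direction_deg, speed_m_s)."""
    dir_raw, _dir_q, _type, speed_raw, _speed_q = field.split(",")
    direction = None if int(dir_raw) == 999 else int(dir_raw)
    speed = None if int(speed_raw) == 9999 else int(speed_raw) / 10
    return direction, speed


def keep_report_type(rows, report_type):
    """Keep only rows of one report type, e.g. 'FM-12' (SYNOP) or 'FM-15' (METAR)."""
    # strip() guards against fixed-width padding in the raw column
    return [r for r in rows if r["REPORT_TYPE"].strip() == report_type]
```

Filtering to a single report type before decoding avoids mixing two observation streams with different timings; which of FM-12 and FM-15 to keep remains an analytical choice to justify.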

Coursework Task: Analysing and Predicting Traffic Flow Using Regression and Time Series Models

In this coursework, students will develop a data science project aimed at predicting traffic flow using historical traffic and weather data spanning multiple years with high-frequency observations. The coursework emphasizes the application of multiple regression formats and time series models to forecast traffic patterns. You will process encoded weather data, select appropriate predictors, and handle data preparation and analysis through custom Python modules. This will enable you to critically interpret results and develop a deeper understanding of analytical tools and Python programming concepts.
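Weather and traffic are recorded at different frequencies, so they must be aligned before modelling. One option, sketched below, is an "as-of" join: each 15-minute traffic timestamp takes the most recent weather observation at or before it, so no future weather leaks into a prediction. The one-hour tolerance is an arbitrary assumption and `asof_join` is a hypothetical helper; for sorted DataFrames, `pandas.merge_asof` with `direction="backward"` and a `tolerance` provides equivalent behaviour.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def asof_join(traffic_times: list, weather: list,
              tolerance: timedelta = timedelta(hours=1)) -> list:
    """Attach the latest weather observation at or before each traffic timestamp.

    traffic_times: sorted list of datetimes (e.g. 15-minute interval ends).
    weather: sorted list of (datetime, value) pairs, e.g. decoded temperatures.
    Observations older than `tolerance` are not carried forward (result None),
    so long weather outages show up as missing rather than as stale values.
    """
    obs_times = [t for t, _ in weather]
    joined = []
    for t in traffic_times:
        i = bisect_right(obs_times, t) - 1  # index of the last observation <= t
        if i >= 0 and t - obs_times[i] <= tolerance:
            joined.append((t, weather[i][1]))
        else:
            joined.append((t, None))
    return joined
```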

Note: The models are NOT required to exhibit excellent performance, but they need to be sensibly constructed and evaluated.
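For time-ordered data, sensible evaluation usually means the test set lies strictly after the training set rather than being sampled at random. A minimal standard-library sketch of a chronological hold-out split and three common regression metrics (MAE, RMSE, R²) follows; the function names are hypothetical, and scikit-learn's `TimeSeriesSplit`, `mean_absolute_error` and `r2_score` offer more complete equivalents.

```python
import math


def chronological_split(series, test_size):
    """Split a time-ordered list into train/test without shuffling.

    The last `test_size` observations form the test set, so every model is
    evaluated on data strictly later than anything it was trained on.
    """
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired sequences of actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sse / n)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    # R^2 is undefined when the actual values are constant
    r2 = 1 - sse / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

At 15-minute resolution, holding out the final week would correspond to `test_size = 7 * 96 = 672` intervals; the horizon itself is a design choice to justify.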

Key Objectives:

1.   Decode and pre-process (e.g., set regular time intervals, handle missing values, identify trends) the weather and traffic datasets provided. You will likely not need all of the weather data.

2.   Briefly explore the data to demonstrate understanding of its contents (e.g., statistical reporting and a few visualisations, maximum 5).

3.   Select relevant predictors for modelling, justifying your choices based on data exploration and domain knowledge. Hint: this will likely involve feature engineering (e.g., to represent cyclical characteristics).

4.   Apply regression and time series models of multiple types and designs to predict the combined number of vehicles passing the two sites; these model types are limited to those described in the module. Design sensible testing targets (e.g., a chosen time horizon, data splits).

5.   Thoroughly evaluate the performance of the models with appropriate metrics. Note any features of particular importance in prediction.

6.   Compare the  above  models to additional implementations that include a representation of the Day Type to explore the effect of its inclusion.

7.   Present findings with textual, tabular, and visual code outputs.

8.   Explain and justify the chosen methodologies and results through Markdown annotations in a Jupyter Notebook, ensuring you tell a coherent story about the process of your analysis.
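Objective 3 hints at representing cyclical characteristics. A standard technique, sketched below, maps time of day and day of week onto the unit circle with sine and cosine, so that 23:45 and 00:00 (or Sunday and Monday) end up close together as features. The 96-slot day assumes the 15-minute intervals described in the traffic data; `time_features` is an illustrative helper, not a prescribed feature set.

```python
import math
from datetime import datetime


def cyclical(value, period):
    """Map a periodic quantity onto the unit circle as (sin, cos).

    Values either side of the wrap-around end up close together, which a raw
    integer hour or slot index cannot express.
    """
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(ts: datetime) -> dict:
    """Illustrative cyclical features for one 15-minute interval ending at `ts`."""
    slot = ts.hour * 4 + ts.minute // 15  # 0..95 within the day
    tod_sin, tod_cos = cyclical(slot, 96)
    dow_sin, dow_cos = cyclical(ts.weekday(), 7)  # Monday = 0
    return {"tod_sin": tod_sin, "tod_cos": tod_cos,
            "dow_sin": dow_sin, "dow_cos": dow_cos}
```

Both the sine and the cosine are needed: either one alone maps two different times of day to the same value.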

Technical Code Requirements:

1.   The submitted notebook will contain only the minimal amount of code necessary to execute the analysis.

2.   Analysis code will reside in self-designed external modules or a package directory to promote exploration of code reusability and modularity. Only the code necessary to operate the module code and display results will be shown in the notebook.

a.   In terms of code, an exemplary submission might import your coded module(s) and execute individual functions to load, clean, and prepare the data, display statistics and explanatory data visualisations, engineer features, select predictors, specify model parameters, train models, and display model performance results.

3.   Modules, classes, methods, and functions will include complete and explanatory docstrings. The existence of these should be demonstrated at least once within the notebook via use of the help() function.

4.   Code should be commented to enhance code readability and explain complex logic or important steps.

5.   Code should contain at least 3 instances of formal error handling; many specific applications are possible and acceptable (e.g., try-except blocks for data loading or model fitting errors).
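The code requirements above (external modules, complete docstrings, `help()`, and formal error handling) can meet in a single module function. The sketch below is one possible shape, assuming a hypothetical module `traffic_tools.py` and a CSV column named `Total Volume`; in the notebook, `import traffic_tools` followed by `help(traffic_tools.load_volumes)` would print the docstring. It illustrates three forms of error handling: a re-raised `FileNotFoundError`, an explicit `KeyError`, and a caught `ValueError`.

```python
"""traffic_tools: hypothetical example module for the coursework analysis code."""

import csv
from pathlib import Path


def load_volumes(path, column="Total Volume"):
    """Load one traffic CSV and return its vehicle counts as integers.

    Parameters
    ----------
    path : str or pathlib.Path
        Location of a Daily Standard Report CSV.
    column : str
        Name of the count column to read.

    Returns
    -------
    list of int or None
        One entry per row; None where the count is blank or non-numeric.

    Raises
    ------
    FileNotFoundError
        If `path` does not exist.
    KeyError
        If the file has no column called `column`.
    """
    path = Path(path)
    try:
        with path.open(newline="", encoding="utf-8-sig") as handle:
            reader = csv.DictReader(handle)
            if reader.fieldnames is None or column not in reader.fieldnames:
                raise KeyError(f"{path.name} has no column {column!r}")
            raw = [row[column] for row in reader]
    except FileNotFoundError:
        # Re-raise with a message naming the file the notebook asked for
        raise FileNotFoundError(f"traffic file not found: {path}") from None

    volumes = []
    for value in raw:
        try:
            volumes.append(int(value))
        except (TypeError, ValueError):
            volumes.append(None)  # keep the row so timestamps stay aligned
    return volumes
```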

Written Requirements:

1.   Organise the notebook into clearly defined and logically ordered sections.

2.   Provide any necessary instructions for setting up the environment and dependencies, possibly including a basic requirements file.

3.   Provide a description of your external module/package organisation and how its elements relate to the larger analysis.

4.   Explain where and why formal error handling has been incorporated to enhance your code’s robustness.

5.   Before execution of code in a notebook cell, include a statement describing its purpose and intent. This may also include justifications for chosen methodologies, where applicable.

6.   After execution of code in a notebook cell, describe its outputs and discuss their implications for your analysis.

7.   Following any series of related actions (e.g., execution of multiple models), provide a critical comparison of their features, strengths, and weaknesses.

8.   Conclude the notebook with a recap of the key insights gained from the analyses, highlighting the effectiveness of different modelling approaches, acknowledging any data or methodological limitations, and suggesting potential future improvements.


