ELEC2103/9103: Simulation and Numerical Solutions in Engineering
School of Electrical and Information Engineering, The University of Sydney
Assignment Description
Modelling, predicting, and verifying the accuracy of models are vital skills in engineering and other fields.
This assignment will assess your ability to develop and validate statistical and machine learning models using
MATLAB, and in particular, the Statistics and Machine Learning Toolbox.
1 Key information
For this assignment, you need to complete two steps:
1. Complete the MATLAB Machine Learning Onramp and upload your certificate to the assignment
box. This is an individual assignment and each person needs to complete the training individually. This
is worth 5% of your total mark [1].
2. Perform statistical analysis and machine learning on the given dataset, write a report and submit it to
the assignment box. This is group work, and you will complete it with the same group members as in your
lab. Each group only needs to submit one report. This is worth 20% of your total
mark.
2 Background
This assignment asks you to explore and analyse a publicly available data set of your choice from the following
list.
1. Ausgrid distribution zone substation data: Ausgrid operates a network with over 180 zone substations.
These substations form the boundary between the sub-transmission network and the distribution (11kV)
network. Ausgrid is making available historical interval demand data (in Megawatts) for all zone
substations not subject to third party privacy concerns.
The dataset is available here:
https://www.ausgrid.com.au/Industry/Our-Research/Data-to-share/Distribution-zone-substation-data
2. World Bank Education Statistics: The World Bank EdStats Query holds around 2,500 internationally
comparable education indicators for access, progression, completion, literacy, teachers, population, and
expenditures. The indicators cover the education cycle from pre-primary to tertiary education. The
query also holds learning outcome data from international learning assessments (PISA, TIMSS, etc.),
equity data from household surveys, and projection data to 2050.
The dataset is available here:
https://databank.worldbank.org/source/education-statistics-%5e-all-indicators
[1] Please follow this link to access the MATLAB Machine Learning Onramp course:
https://matlabacademy.mathworks.com/details/machine-learning-onramp/machinelearning
3. World Bank Health Nutrition and Population Statistics: World Bank key health, nutrition and
population statistics gathered from a variety of international sources.
The dataset is available here:
https://databank.worldbank.org/source/health-nutrition-and-population-statistics
4. Australian Bureau of Statistics: Causes of Death, Australia: Statistics on the number of deaths, by
sex, selected age groups, and cause of death classified to the International Classification of Diseases
(ICD).
The dataset is available here:
https://www.abs.gov.au/statistics/health/causes-death/causes-death-australia/2020#data-download
5. Kaggle: Retail Analysis with Walmart Sales Data: Historical sales data for 45 Walmart stores
located in different regions are available. Certain events and holidays affect sales
on each day. The business faces unforeseen demand and sometimes runs out of stock
because its machine learning algorithm is unsuitable. Walmart would like to predict
sales and demand accurately. An ideal ML algorithm will predict demand accurately and take into
account factors such as economic conditions, including CPI and the Unemployment Index.
The dataset is available here:
https://www.kaggle.com/rutuspatel/retail-analysis-with-walmart-sales-data
3 The assignment task
You are to explore and analyse some or all of the data files in one of the datasets above. You are to complete
your analysis using MATLAB, and present your analysis as a report contained in a script and other files that
can be published as an HTML report using MATLAB’s Publish feature. You are encouraged to share ideas,
but your submitted assignment must be uniquely your own.
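As a rough illustration of what a publishable script can look like, the sketch below uses MATLAB's section markers (`%%`) so that Publish turns each section into a heading, a paragraph of text, and any figures or printed output. The section titles, variable names, and data are all made up for illustration; this is not the provided stub.

```matlab
%% Hypothetical report title
% Comment lines directly under a section title are rendered as a
% paragraph of text in the published report.

%% Example section: a simple plot
% Synthetic data only, used as a stand-in for a real dataset.
t = (0:23)';                        % hour of day
demand = 10 + 3*sin(2*pi*t/24);     % made-up demand curve (MW)

figure;
plot(t, demand, '-o');
xlabel('Hour of day');
ylabel('Demand (MW)');
title('Synthetic demand profile');

%% Example section: printed output
% Output written to the Command Window is captured in the report.
fprintf('Peak synthetic demand: %.2f MW\n', max(demand));
```

Assuming the script is saved as elec2103a.m, running `publish('elec2103a.m', 'html')` from the Command Window (or using the Publish tab in the Editor) should generate the report in an html subfolder.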
3.1 The data
The data is mostly contained in CSV files and might be separated by financial year or month. You need to
fully understand the attributes in each dataset and be able to explain them in your report. You can also draw
on other data sources to inform your analysis (see the section on higher grades below). If you have something
particular in mind, I can advise you on whether it is freely available and where to find it, but the Australian
Bureau of Statistics (ABS) or the Bureau of Meteorology (BOM) are good places to consider.
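Because the data may be split across several CSV files, one possible approach is to read each file into a table and stack the tables. The sketch below is self-contained: it first writes two small synthetic files to a temporary folder, so the folder name, file names, and column names (Year, Demand) are all hypothetical and would be replaced by those of the dataset you choose.

```matlab
% Create two small synthetic CSV files so the sketch is self-contained.
% Real data files would replace this step.
dataDir = fullfile(tempdir, 'elec2103_demo');   % hypothetical folder name
if ~exist(dataDir, 'dir')
    mkdir(dataDir);
end
T1 = table([2019; 2019], [10.5; 11.2], 'VariableNames', {'Year', 'Demand'});
T2 = table([2020; 2020], [9.8; NaN],   'VariableNames', {'Year', 'Demand'});
writetable(T1, fullfile(dataDir, 'demand_2019.csv'));
writetable(T2, fullfile(dataDir, 'demand_2020.csv'));

% Read every matching CSV file into a cell array of tables.
files = dir(fullfile(dataDir, 'demand_*.csv'));
parts = cell(numel(files), 1);
for k = 1:numel(files)
    parts{k} = readtable(fullfile(files(k).folder, files(k).name));
end

% Stack the tables row-wise; this assumes all files share the same columns.
allData = vertcat(parts{:});

% Drop rows that contain missing values before modelling.
cleanData = rmmissing(allData);
fprintf('Loaded %d rows, %d remain after removing missing values.\n', ...
    height(allData), height(cleanData));
```

Whether to drop, fill, or interpolate missing values depends on the dataset, and that choice is worth explaining in the report.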
3.2 Submission requirements
Your assignment will be submitted via Canvas in the form of a .zip file named in the following format:
Group_<Group Number>.zip.
Your .zip file must contain:
1. Your main file, called elec2103a.m (regardless of whether you are an undergraduate or postgraduate student). You are
provided with a MATLAB script stub to get you started, which is available on Canvas.
2. Any custom functions that you write.
3. The data that is needed to complete your analysis.
4. A PDF file including your answer to Part 1, the published version of your main file (Part 2), and your
answers to Part 3.
4 Assignment criteria and grades
The assignment will be given a grade out of 20. Marks will be allocated in three parts, as follows:
4.1 Part 1 (Total: 4 marks)
Here, you need to clearly explain the dataset and the problem that you want to solve.
1. Problem Statement and Background (2 marks): A high-level statement of the problem you
intend to address/business case study. Give a clear and complete statement of the problem.
2. Resources (2 marks): Where do the data come from, and what are their characteristics?
(a) The data source(s), and
(b) characteristics of the data you intend to use (e.g. attributes, data types, etc.)
Marks for part 1: Completing this part reasonably well will earn you 4 marks. Here, “reasonably”
means more than copying and pasting information already available on the websites you are downloading
the dataset from. You need to provide detailed evidence that you have understood the dataset you
are working on and the problem you are going to address.
4.2 Part 2 (Total: 6 marks)
The minimum requirements of this assignment are to:
1. Write a sub-routine to load some or all of the data (from one of the datasets) into a useable format in
MATLAB.
2. Write a sub-routine to analyse the data in MATLAB by modelling/fitting it, using regression, classification,
ANOVA or other machine learning methods. You may wish to pre-process the data in order to extract
some interesting values or variables of merit. Briefly explain your model.
3. Write a sub-routine that makes some assessment of the statistical errors or goodness-of-fit of your model
and returns or prints them in your report. Explain these figures.
4. Make appropriate use of plots and/or charts in your report.
5. In your main script, include a call to at least one custom function that you have written in a separate
m-file.
6. Put your analysis in a publishable MATLAB script that runs without errors. Build on the provided m-file
stub elec2103a.m.
Marks for part 2: Satisfying each of the minimum requirements 1-6 above will earn you 1 mark each.
“Satisfying” means more than joining two points at different times with a straight line, and will
be satisfied if you make proper use of a tool in the Statistics and Machine Learning Toolbox. In
other words, you need to provide evidence that you have learnt how to use some new MATLAB tools.
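To give a feel for requirements 2 and 3, here is a minimal sketch of fitting a regression model and reporting goodness-of-fit with the Statistics and Machine Learning Toolbox. The data are synthetic, and the variable names (temperature, demand) are assumptions made for illustration only. It shows one possible tool (`fitlm`), not the required method.

```matlab
% Synthetic data standing in for a real dataset (illustrative only).
rng(1);                                        % reproducible random numbers
n = 200;
temperature = 15 + 10*rand(n, 1);              % hypothetical predictor (deg C)
demand = 5 + 0.8*temperature + randn(n, 1);    % hypothetical response (MW)
tbl = table(temperature, demand);

% Fit a linear regression model using a formula in Wilkinson notation.
mdl = fitlm(tbl, 'demand ~ temperature');

% Goodness-of-fit measures stored on the fitted model object.
fprintf('R-squared:          %.3f\n', mdl.Rsquared.Ordinary);
fprintf('Adjusted R-squared: %.3f\n', mdl.Rsquared.Adjusted);
fprintf('RMSE:               %.3f\n', mdl.RMSE);

% 95% confidence intervals for the intercept and slope.
ci = coefCI(mdl);
disp(ci);

% Residuals against fitted values, a quick visual check of the fit.
figure;
plotResiduals(mdl, 'fitted');
```

In a submission, the loading, fitting, and error-assessment steps could each be moved into their own function in a separate m-file and called from the main script, which would also address requirement 5.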
4.3 Part 3 (Total: 10 marks)
To earn higher grades, you need to complete Part 2 extremely well (4 marks), and
include one advanced form of statistical analysis and/or prediction, performed on the same data
set, with justification for your choice (6 marks).
You may consider one of the following advanced sub-routines:
1. Make a prediction using your model, perhaps into the future (for time series data) or across a new subset.
Discuss your prediction, including making an assessment of the reliability of the prediction.
2. Complete a formal statistical comparison of more than one model or method of analysis.
3. Make use of advanced statistical analysis testing the assumptions of your modelling choice, such as tests
of heteroscedasticity, multicollinearity, etc.
4. Bootstrapping, jackknifing, k-folds or some other resampling-based validation of the predictive ability of
your model.
5. Sophisticated use of more than one data set (i.e. incorporating additional data beyond the dataset you
chose in your Part 2 analysis).
6. Use of an advanced statistical estimation or machine learning technique, with justification. This could
include:
(a) Using MATLAB’s neural network tools (which don’t take much effort);
(b) Estimating a stochastic volatility or hidden Markov model;
(c) Using Bayesian models;
(d) Advanced clustering and/or hierarchical analysis;
(e) If you have an interest in signal processing, you could investigate non-parametric kernel estimators
(akin to kernel smoothing techniques), principal component analysis, or apply a series of bandpass
filters over a time series and see what you get.
Marks for part 3: You will need to include detailed discussions and justifications to obtain full marks.
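As one example of the resampling-based validation mentioned in option 4, the sketch below estimates out-of-sample error with k-fold cross-validation using `cvpartition`. The data are synthetic, and the choice of five folds and a linear model is an assumption made for illustration; other models and fold counts may suit your dataset better.

```matlab
% Synthetic data standing in for a real dataset (illustrative only).
rng(1);                                % reproducible random numbers
n = 200;
x = 10*rand(n, 1);
y = 2 + 1.5*x + randn(n, 1);

% Split the observations into k folds.
k = 5;
cvp = cvpartition(n, 'KFold', k);

% Train on k-1 folds, test on the held-out fold, and record the test RMSE.
foldRMSE = zeros(k, 1);
for i = 1:k
    trainIdx = training(cvp, i);       % logical index of training rows
    testIdx  = test(cvp, i);           % logical index of held-out rows
    mdl  = fitlm(x(trainIdx), y(trainIdx));
    yhat = predict(mdl, x(testIdx));
    foldRMSE(i) = sqrt(mean((y(testIdx) - yhat).^2));
end

fprintf('Mean out-of-sample RMSE over %d folds: %.3f\n', k, mean(foldRMSE));
fprintf('Standard deviation across folds:      %.3f\n', std(foldRMSE));
```

Comparing the cross-validated RMSE with the in-sample RMSE of a model fitted to all the data is one way to discuss how reliable the model's predictions are likely to be.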
4.4 Assignment length
There are no minimum or maximum lengths to the submission, but treat this like you are trying to
convince a busy person that you have something important to say. Being terse and direct is not a bad thing in
engineering and business communication.
4.5 Late submission penalties
Late assignments will be penalised by deducting 2 marks and by reducing the maximum grade achievable by 2 marks for
each 24 hours overdue, including weekends. Don’t be late!
5 Useful Resources
The following online tutorials and resources will help you learn how to use MATLAB’s Statistics and Machine Learning Toolbox.
1. Introducing Machine Learning
https://www.mathworks.com/content/dam/mathworks/ebook/gated/machine-Learning-ebook.pdf
2. MATLAB for Machine Learning
https://au.mathworks.com/solutions/machine-learning.html
3. Mastering Machine Learning: A Step-by-Step Guide with MATLAB
https://au.mathworks.com/content/dam/mathworks/ebook/gated/machine-learning-workflow-ebook.pdf
4. Applied Machine Learning
https://au.mathworks.com/videos/series/applied-machine-learning.html
5. What Is Deep Learning? 3 things you need to know
https://au.mathworks.com/discovery/deep-learning.html
6. Predictive Analytics: 3 Things You Need to Know
https://au.mathworks.com/discovery/predictive-analytics.html