MULTIVARIATE ANALYSIS
Assignment 1 – Due 12 noon, Friday 2nd November 2018
Late work will be subject to University policy.
This assignment is worth 10% of the overall module mark for ST3MVA/ST4MVA.
You must create one Word document or PDF file of your work, and include your
student number(s) at the top of the first page, but not your name (to preserve
anonymity).
Student numbers
You MUST include your student number(s) in the title or subtitle of every graph
that you include in your submission (otherwise that graph will receive a mark of 0).
For example, if you are working in a group of two, with student numbers
12345678 and 87654321, then add ‘12345678 and 87654321’ to the main title or
subtitle of your plot.
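As a minimal sketch of one way to do this in base R graphics, the student numbers can be passed through the sub argument of plot(). The x and y values below are made up purely so that the code runs on its own, and the student numbers are the example ones quoted above.

```r
# Made-up values, used only so that there is something to plot
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 8.1)

# Example student numbers from the brief (replace with your own)
student_ids <- "12345678 and 87654321"

# 'main' sets the main title, 'sub' sets the subtitle under the x-axis
plot(x, y,
     main = "Example scatter plot",
     sub  = student_ids)
```

Most base R plotting functions accept main, and many also accept sub, so the same pattern should carry over to other plot types.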
Working as a group
Where the work is completed in a pre-approved group of two, each member of
the group will receive an identical mark for the assignment. It is up to you as a
group to ensure that you work together fairly and equally. Only groups that have
been declared (in class or by email) before Friday 26th October are allowed –
otherwise you have chosen to complete the work on your own.
Assessment criteria
You will be assessed on the correct and full application (in R or RStudio) of the
multivariate methods that you use, and correct and full interpretation of all
results.
Department of
Mathematics and Statistics
The data
The dataset ‘Glass – Students.csv’ available to download from Blackboard
contains attribute information regarding the composition of a number of
samples of glass. The variables in the dataset are:
• ID – a sample identifier
• RI – refractive index
• Na – Sodium
• Mg – Magnesium
• Al – Aluminium
• Si – Silicon
• K – Potassium
• Ca – Calcium
• Ba – Barium
• Fe – Iron
The variables Na to Fe are measured as the weight percent of the
corresponding oxide.
Your task
1)
Conduct a principal component analysis on the data. You should perform a full
analysis. In your write-up you should state the aim(s) of your analysis, alongside
interpretations of your results, justifications of any analytical decisions you make,
and your conclusions.
[58 marks]
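The following is a rough sketch of how a principal component analysis might be set up in R using prcomp(). It is not a model answer. The data frame below is simulated so that the code runs on its own, and it is NOT the real dataset. Dropping the ID column and setting scale. = TRUE (which bases the analysis on the correlation matrix) are assumptions made for illustration, and any such choice would need to be justified in your write-up.

```r
# Simulated stand-in for the real data (assumption: with the real file you
# might instead use read.csv() on the CSV downloaded from Blackboard)
set.seed(1)
glass <- data.frame(
  ID = 1:20,
  RI = rnorm(20),
  Na = rnorm(20),
  Mg = rnorm(20),
  Al = rnorm(20)
)

# Drop the identifier column, then run the PCA.
# scale. = TRUE standardises each variable (an illustrative choice only).
pca <- prcomp(glass[, -1], center = TRUE, scale. = TRUE)

summary(pca)    # standard deviations and proportion of variance explained
pca$rotation    # loadings of each variable on each component

# Scree plot, with the example student numbers in the title
screeplot(pca, type = "lines",
          main = "Scree plot - 12345678 and 87654321")
```

The component scores are stored in pca$x and could be plotted against each other to look for structure among the samples.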
2)
You have now been informed that the samples of glass within the dataset are
from different types of glass, for example from building windows, headlamps,
tableware etc. What would be the aim of performing cluster analysis on this
dataset?
[2 marks]
Part of cluster analysis involves calculating the distance between two units.
Using the first two observations from the dataset, illustrate how to calculate the
Euclidean and Manhattan distance between two samples of glass (show full
workings).
[5 marks]
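To make the two definitions concrete, here is a small sketch using two made-up vectors (NOT the first two observations of the dataset). The Euclidean distance is the square root of the sum of squared differences, and the Manhattan distance is the sum of absolute differences. The vectors a and b are hypothetical names chosen for illustration.

```r
# Two made-up samples measured on three variables (not from the dataset)
a <- c(1, 4, 2)
b <- c(2, 1, 4)

# Differences are (-1, 3, -2)
euclidean <- sqrt(sum((a - b)^2))   # sqrt(1 + 9 + 4) = sqrt(14)
manhattan <- sum(abs(a - b))        # 1 + 3 + 2 = 6

euclidean
manhattan

# Cross-check against R's built-in dist() function
dist(rbind(a, b), method = "euclidean")
dist(rbind(a, b), method = "manhattan")
```

For the assignment itself the same steps would be written out by hand across all of the variables for the two samples concerned.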
The dendrogram overleaf is the result of a complete linkage cluster analysis of
the data using Euclidean distances. Briefly interpret this dendrogram. Where
possible, link your interpretations to your results from part 1): do the two analyses
agree?
[9 marks]
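For reference, a dendrogram of this kind can be produced in R along the following lines. This is only a sketch on simulated data. The matrix X, the choice of k = 3 groups and the absence of any standardisation are all assumptions made for illustration, and the brief does not state how the data were prepared for the dendrogram overleaf.

```r
# Simulated data: 10 made-up samples on 4 variables (not the glass data)
set.seed(2)
X <- matrix(rnorm(40), nrow = 10,
            dimnames = list(paste0("s", 1:10), NULL))

d  <- dist(X, method = "euclidean")    # pairwise Euclidean distances
hc <- hclust(d, method = "complete")   # complete linkage clustering

# Dendrogram, with the example student numbers as the subtitle
plot(hc,
     main = "Complete linkage, Euclidean distance",
     sub  = "12345678 and 87654321")

# Cut the tree into a chosen number of groups (k = 3 is arbitrary here)
groups <- cutree(hc, k = 3)
table(groups)
```

Under complete linkage the distance between two clusters is the largest distance between any pair of their members, so the height at which two branches merge reflects how dissimilar the least similar members of the merged cluster are.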
Finally, produce a series of glyphs for the data using R (or RStudio). You may
choose your preferred glyphs, as there are many options available. Highlight at
least two examples where your glyphs agree with the conclusions you have
drawn from the dendrogram.
[6 marks]
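As one possible starting point, star glyphs can be drawn with the stars() function in base R graphics. The sketch below uses simulated values (not the glass data), and the matrix X and its row labels are hypothetical. Other glyph types are available and may suit the data better.

```r
# Simulated data: 6 made-up samples on 4 of the variables named in the brief
set.seed(3)
X <- matrix(runif(24), nrow = 6,
            dimnames = list(paste0("s", 1:6),
                            c("Na", "Mg", "Al", "Si")))

# One star per sample; each ray's length reflects one variable,
# rescaled to the range 0 to 1 by default (scale = TRUE)
stars(X,
      main = "Star glyphs - 12345678 and 87654321")
```

Samples with similar star shapes have similar profiles across the variables, which is what would be compared against the groupings suggested by the dendrogram.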