MULTIVARIATE ANALYSIS 
Assignment 1 – Due 12 noon, Friday 2nd November 2018 
Late work will be subject to University policy.  
This assignment is worth 10% of the overall module mark for ST3MVA/ST4MVA. 
You must create one Word document or PDF file of your work, and include your 
student number(s) at the top of the first page, but not your name (to preserve 
anonymity).  
Student numbers 
You MUST include your student number(s) within the title/subtitle of all graphs 
that you include in your submission (else a mark of 0 will be given for the graph). 
For example, if you are working in a group of two with student numbers 
12345678 and 87654321, add ‘12345678 and 87654321’ to the main title or 
subtitle of your plot.  
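A minimal sketch of one way to meet this requirement in base R is given here. The x and y values are made up purely for illustration and are not the glass data; `ids` and `id_label` are hypothetical names.

```r
# Student numbers from the example above; replace with your own.
ids <- c("12345678", "87654321")
id_label <- paste(ids, collapse = " and ")

# Made-up values purely for illustration (NOT the glass data).
x <- c(1.2, 2.4, 3.1, 4.8, 5.0)
y <- c(2.0, 2.9, 4.2, 4.4, 6.1)

# 'main' sets the title and 'sub' sets the subtitle of a base R plot.
plot(x, y,
     main = "Illustrative scatter plot",
     sub  = id_label)
```

The same label could equally be pasted into `main` instead of `sub`.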
Working as a group 
Where the work is completed in a pre-approved group of two, each member of 
the group will receive an identical mark for the assignment. It is up to you as a 
group to ensure that you work together fairly and equally. Only groups that have 
been declared (in class or by email) before Friday 26th October are allowed – 
otherwise you have chosen to complete the work on your own.  
Assessment criteria  
You will be assessed on the correct and full application (in R, or R Studio) of the 
multivariate methods that you use, and correct and full interpretation of all 
results.  
Department of  
Mathematics and Statistics 
The data 
The dataset ‘Glass – Students.csv’, available to download from Blackboard, 
contains attribute information regarding the composition of a number of 
samples of glass. The variables in the dataset are: 
• ID – a sample identifier 
• RI – refractive index 
• Na – Sodium 
• Mg – Magnesium 
• Al – Aluminium 
• Si – Silicon 
• K – Potassium 
• Ca – Calcium 
• Ba – Barium 
• Fe – Iron 
The measurement unit for the variables Na to Fe is the weight percent in the 
corresponding oxide. 
Your task 
1) 
Conduct a principal component analysis on the data. You should perform a full 
analysis. In your write-up you should state the aim(s) of your analysis, alongside 
interpretations of your results, justifications of any analytical decisions you make, 
and your conclusions.  
[58 marks] 
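As a starting point only, the mechanics of a principal component analysis in R might look like the sketch below. It assumes base R's `prcomp` and runs on randomly generated numbers that merely reuse the variable names listed above; `glass_demo` is a hypothetical stand-in, not the real dataset, and the ID column is left out because it is an identifier rather than a measurement.

```r
set.seed(1)  # reproducible made-up data
n <- 20
vars <- c("RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe")

# Hypothetical stand-in for the real file: random numbers, same column names.
glass_demo <- as.data.frame(
  matrix(rnorm(n * length(vars)), nrow = n, dimnames = list(NULL, vars))
)

# scale. = TRUE standardises each variable, so the PCA is based on the
# correlation matrix. Whether that is appropriate for the real data is an
# analytical decision that needs its own justification.
pca <- prcomp(glass_demo, center = TRUE, scale. = TRUE)

summary(pca)          # proportion of variance explained by each component
pca$rotation[, 1:2]   # loadings of the first two components
screeplot(pca, type = "lines", main = "Scree plot (made-up data)")
```

The output here carries no meaning because the inputs are random; the sketch shows only which objects hold the variances, loadings and scores.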
2) 
You have now been informed that the samples of glass within the dataset are 
from different types of glass, for example from building windows, headlamps, 
tableware etc.  What would be the aim of performing cluster analysis on this 
dataset?   
[2 marks] 
Part of cluster analysis involves calculating the distance between two units.  
Using the first two observations from the dataset, illustrate how to calculate the 
Euclidean and Manhattan distance between two samples of glass (show full 
workings). 
[5 marks] 
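The two distance measures can be checked in R along the following lines. The vectors `a` and `b` are made-up three-variable observations chosen so that the arithmetic is easy to follow by hand; they are not taken from the glass dataset.

```r
# Two made-up observations (NOT rows of the glass data).
a <- c(1, 2, 3)
b <- c(4, 6, 3)

# Euclidean distance: square root of the sum of squared differences.
# Differences are (-3, -4, 0), so sqrt(9 + 16 + 0) = 5.
euclid <- sqrt(sum((a - b)^2))

# Manhattan distance: sum of absolute differences, 3 + 4 + 0 = 7.
manhattan <- sum(abs(a - b))

# Cross-check against base R's dist().
euclid_check    <- as.numeric(dist(rbind(a, b), method = "euclidean"))
manhattan_check <- as.numeric(dist(rbind(a, b), method = "manhattan"))

print(c(euclid = euclid, manhattan = manhattan))
```

Writing the calculation out by hand and then confirming it with `dist()` is one way to present full workings alongside a software check.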
The dendrogram overleaf is the result of a complete linkage cluster analysis of 
the data using Euclidean distances. Briefly interpret this dendrogram. Where 
possible, link your interpretations to your results from part 1). Do the two 
analyses agree? 
[9 marks] 
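For reference, a dendrogram of this kind is typically produced in R with `dist` and `hclust`. The sketch below uses ten made-up observations (`demo` is a hypothetical object, not the glass data), and the choice of three clusters in `cutree` is arbitrary and for illustration only.

```r
set.seed(2)  # reproducible made-up data
demo <- matrix(rnorm(10 * 3), nrow = 10,
               dimnames = list(paste0("s", 1:10), c("v1", "v2", "v3")))

d  <- dist(demo, method = "euclidean")   # pairwise Euclidean distances
hc <- hclust(d, method = "complete")     # complete linkage clustering

# The height at which two branches join is the complete-linkage distance
# between the clusters being merged.
plot(hc, main = "Complete linkage (made-up data)", xlab = "", sub = "")

# Cut the tree into k groups; k = 3 is an arbitrary illustrative choice.
groups <- cutree(hc, k = 3)
table(groups)
```

Cluster memberships from `cutree` can be compared with positions on a principal component score plot when judging whether two analyses agree.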
Finally, produce a series of glyphs for the data using R (or R Studio). You can 
choose your preferred glyphs, as there are lots of options available. Highlight at 
least two examples of where your glyphs agree with the conclusions you have 
drawn from the dendrogram.  
[6 marks]
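One of the many glyph options is the star plot available in base R as `stars`. The sketch below is an assumption about how such a plot might be set up, using twelve made-up observations that only borrow the variable names from the dataset; `demo` and `locs` are hypothetical names.

```r
set.seed(3)  # reproducible made-up data
vars <- c("RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe")
demo <- matrix(runif(12 * length(vars)), nrow = 12,
               dimnames = list(paste0("s", 1:12), vars))

# Each observation becomes one star; each ray's length reflects one variable.
# By default stars() rescales every column to the range [0, 1].
locs <- stars(demo,
              labels = rownames(demo),
              main   = "Star glyphs (made-up data)",
              sub    = "12345678 and 87654321")  # example student numbers
```

Observations with similarly shaped stars have similar profiles across the variables, which is what makes glyphs useful for comparison with cluster membership.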