Visual Analytics Coursework Specification
Spring 2024
1. Overview
This coursework aims to give you experience of the whole lifecycle of carrying out a full
visual analytics project.
Your goals are:
• To follow a sound visual analytics process
• To develop a visualisation that displays important features of a dataset
• To write a clear report on your findings.
The outputs from this work should be
1. a Tableau dashboard and associated worksheets (as a packaged workbook: see
https://help.tableau.com/current/pro/desktop/enus/save_savework_packagedworkbooks.htm);
2. a written report with sections as defined below.
The submission deadline is 13:00 on Wednesday 22nd May through Blackboard: create a
single zip file containing all the files in your submission. This coursework is worth 80% of the
marks for the unit.
2. Task Details
The task you are asked to carry out for the coursework is to design, construct, and evaluate
an exploratory analysis of a complex dataset using both information visualisation and data
projection. This dataset should be based on census data for England and Wales. You should
design the visualisation to address a socio-economic issue that is important to you.
You must submit at least two data projections using different algorithms. I expect that
you will do this work in Python (following the methods you have practised in the labs) and for
each projection, create a matrix with two columns representing the two variables the data is
projected onto. If you save this matrix in a file (e.g. CSV format) it can then be imported
easily into Tableau and used in your visualisations. I want to review the Python code used to
generate the projections, so please include it in your submission. The purpose of data
projection is to show the data structure: clusters, outliers, and relationships between different
labels.
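As an illustration of the workflow described above, the following is a minimal sketch that produces two two-column projections with different algorithms and writes one of them out as CSV for import into Tableau. It is not the method from the labs: it assumes NumPy is available, uses PCA and classical MDS on city-block distances as the two example algorithms, and runs on synthetic data. The function names, the `AREA000`-style codes, and the column names are all hypothetical. Library implementations such as scikit-learn's `PCA`, `MDS`, or `TSNE` could be substituted.

```python
import csv
import io

import numpy as np


def standardise(X):
    """Centre each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centred but unscaled
    return (X - X.mean(axis=0)) / std


def pca_2d(X):
    """Project the data onto its first two principal components (via SVD)."""
    Z = standardise(X)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:2].T


def classical_mds_2d(X):
    """Classical (Torgerson) MDS applied to city-block distances."""
    Z = standardise(X)
    D = np.abs(Z[:, None, :] - Z[None, :, :]).sum(axis=2)  # pairwise L1 distances
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * J @ (D ** 2) @ J  # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:2]  # two largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))


def write_projection(stream, area_codes, coords, column_names=("x", "y")):
    """Write one row per area: its code plus the two projected coordinates."""
    writer = csv.writer(stream)
    writer.writerow(["area_code", *column_names])
    for code, (x, y) in zip(area_codes, coords):
        writer.writerow([code, f"{x:.6f}", f"{y:.6f}"])


# Synthetic stand-in for a table of census variables (30 areas, 5 variables).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))
codes_demo = [f"AREA{i:03d}" for i in range(30)]  # made-up area codes

pca_coords = pca_2d(X_demo)
mds_coords = classical_mds_2d(X_demo)

# Written to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
write_projection(buffer, codes_demo, pca_coords, ("pca_1", "pca_2"))
print(buffer.getvalue().splitlines()[0])
```

To produce a file for Tableau, the same function can be called with a file opened as `open("pca_projection.csv", "w", newline="")` (the file name is an example only). Keeping an area code in every row is what lets the projection be joined back to the census tables and identified in tooltips.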
You may use data taken from the 2011 census in England and Wales which is indexed by
the Excel file 2011CensusIndexofTablesandTopics_v11_4_2.xlsx. The tab labelled ‘All
Tables’ provides a list of tables and links to the underlying data. (I have found that the Excel
file links are valid, the NESS links don’t work as the server can’t be found, and the links to
NOMIS take you to a website where additional data can be downloaded.) You may find
Tableau’s Data Interpreter useful, and you may also need to edit some files to create usable
datasets.
There are more than 1600 tables in total: clearly, this is far too many to create an interesting
report. You should focus on a limited number of tables (probably around three or four) that
allow you to explore a particular aspect of socio-economic life in England and Wales: for
example, health and links to nationality or occupation.
A new census was carried out in 2021 (during the pandemic). Some of the results have been
released by the Office for National Statistics, but so far these have only been in certain
topics. A link to the topics that have been released can be found here:
https://census.gov.uk/census-2021-results/phase-one-topic-summaries
You should find that you can click through on a topic to a map display
(https://www.ons.gov.uk/census/maps) and from here select a topic such as ‘Housing’.
Selecting a variable changes the map and also provides a link to download the data for that
variable. It may be simpler to visit the bulk downloads page:
https://www.nomisweb.co.uk/sources/census_2021_bulk
You need to use both the 2011 data and the 2021 data for at least one of your
visualisations.
Note that some geographic definitions do not match between the two census dates. This site
will help you manage this:
https://www.ons.gov.uk/releases/censusmapsupdatechangeovertime
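One way to see how far the two censuses line up before combining them is to join the tables on area code and list the codes that appear in only one year. The sketch below uses only the Python standard library. The area codes and counts are made up for illustration, and the function name is hypothetical.

```python
def split_by_match(data_2011, data_2021):
    """Return (matched, only_2011, only_2021), all keyed or listed by area code."""
    codes_2011, codes_2021 = set(data_2011), set(data_2021)
    matched = {
        code: (data_2011[code], data_2021[code])
        for code in sorted(codes_2011 & codes_2021)
    }
    return matched, sorted(codes_2011 - codes_2021), sorted(codes_2021 - codes_2011)


# Made-up area codes and counts, standing in for one variable from each census.
census_2011 = {"X001": 1200, "X002": 950, "X003": 400}
census_2021 = {"X001": 1310, "X002": 990, "X004": 450}

matched, only_2011, only_2021 = split_by_match(census_2011, census_2021)

# Change over time can only be computed for areas present in both years.
changes = {code: new - old for code, (old, new) in matched.items()}
print(changes)
print("Only in 2011:", only_2011)
print("Only in 2021:", only_2021)
```

The unmatched lists are the areas whose boundaries or codes may have changed. They need to be handled explicitly (for example by aggregating to a coarser geography) rather than silently dropped.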
Your report should contain the following sections:
• Abstract. A brief description of the key points in the report.
• Introduction. The background of the problem.
• Data Preparation and Abstraction. Describe the data manipulation necessary to create
a dataset for analysis and the principal data types and semantics that you have
analysed.
• Task Definition. A description of the tasks using Munzner’s task taxonomy for which you
have created the visualisations.
• Visualisation Justification. Define the visualization techniques you use and justify your
choices. You should refer to the principles of info vis, relevant aspects of human
perception and cognition, and the scientific literature where appropriate. You should also
explain why you have chosen the data projection methods that you have used. This
justification and explanation is a very important assessment criterion, so do not skimp on
this and make sure that it is grounded in the theoretical concepts we have covered
during the course.
• Evaluation. Using appropriate levels and types of validation (as in Chapter 4 of
Munzner), assess the quality of your visualization by measuring and observing the other
students in your discussion group as they carry out an analytic task using your
visualisation. (The list of discussion groups is also available on Blackboard.)
• Conclusion. I expect you to address two aspects:
  • What you have learned about the socio-economic problem that was the basis of the
  visualization.
  • What you have learned about information visualisation from doing the coursework.
I am expecting the report to be about six to ten pages in length. This is an expectation, not a
strict limit, so there will be no penalty for exceeding it. But if you find yourself writing much
more than this, you are almost certainly providing too much detail. In particular, note that I
will see the visualisation you generate, so there should be little or no need for screenshots.
I use the term 'dashboard' in the Tableau sense of a set of visualisations on a single screen.
It is permissible to submit more than one Tableau dashboard or workbook if that supports
the task better. Do not feel you have to squeeze everything onto a single dashboard. You
may remember the system for visualising American census data that had every possible
graph interacting in lots of ways. It was just too crowded and complex to be useful.
Geocoding issues
It can be hard to plot the census data in Tableau because it does not contain outcode
information. This blog contains some geocoding packages that support geographic
information at many different levels of granularity, together with a video on how to use
them. It should be helpful for you.
If you have problems using the geocoding packages, this link to Tableau help should be
useful:
https://kb.tableau.com/articles/issue/error-the-custom-geocoding-folder-has-errors-whencreating-map
I have also provided a short guidance note written by Joshua Ramini on the Blackboard site.
3. Assessment
The assessment criteria are:
• Problem understanding: how well you have explained the goals of the tasks, taking
account of end-user requirements. (10 marks)
• Data preparation and task analysis: care taken over extracting and manipulating the
data; insights gained through the task analysis. (15 marks)
• Data visualisation: appropriateness of visualization and modelling approaches;
systematic use of statistical and visualisation methods; justification of visualization
approach used. (50 marks)
• Conclusions: what the user should learn from your analysis and what you have learned
about large-scale data visualisation. (15 marks)
• Presentation: fluency and coherence of the written text; quality of images and graphics
used. (10 marks)
Below are some general points that will help you when working on this coursework:
• Ensure that questions you set out to ask are answered by the visualisation and in the
report.
• Having the option of switching between absolute values and proportions is often a useful
feature. This is particularly helpful when comparing areas with different populations.
• When using dimensionality reduction, it is important to communicate to the user which
variables were used in the original data space; otherwise it is hard to interpret the
plots.
• Tooltips should identify the corresponding point (e.g. a location), particularly for projected
data.
• The introduction should contain some discussion of the type of user the visualization is
intended for.
• The report should note data anomalies (e.g. missing values), in particular quantifying
the number of missing values.
• The abstract should describe the main findings of the work.
• Data cleaning matters.
• The use of section and page numbers helps the reader to navigate the report.
• References to secondary literature are valuable tools to provide context.
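Two of the points above, quantifying missing values and offering proportions alongside absolute values, can be sketched in a few lines of standard-library Python. The field names, area codes, and figures below are made up for illustration, and the helper functions are hypothetical. In practice the same calculations might be done in Tableau or with a data-frame library.

```python
def summarise_missing(rows, fields):
    """Count missing (None or empty-string) entries for each field."""
    return {f: sum(1 for r in rows if r.get(f) in (None, "")) for f in fields}


def add_proportion(rows, count_field, total_field, out_field):
    """Return copies of the rows with count/total added (None if not computable)."""
    result = []
    for r in rows:
        count, total = r.get(count_field), r.get(total_field)
        computable = count not in (None, "") and total not in (None, "", 0)
        result.append({**r, out_field: (count / total) if computable else None})
    return result


# Made-up rows: one per area, with a total and a count for one category.
rows = [
    {"area": "X001", "residents": 1000, "bad_health": 50},
    {"area": "X002", "residents": 200, "bad_health": 30},
    {"area": "X003", "residents": 500, "bad_health": None},
]

missing = summarise_missing(rows, ["residents", "bad_health"])
with_share = add_proportion(rows, "bad_health", "residents", "bad_health_share")

print(missing)
for r in with_share:
    print(r["area"], r["bad_health"], r["bad_health_share"])
```

In this toy data X001 has the larger absolute count but X002 has the larger share, which is the kind of difference that switching between the two views is meant to expose. The missing value in X003 is counted and carried through as `None` rather than being treated as zero.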