COMP5310 Project Stage 2A
Summarise and Analyse the Data
Due: 11:59pm on 6th of April 2023 (Week 7)
Value: 10% of the unit
This stage is usually done with the same group members as you worked with for Stage 1. However, if someone is currently in a group that is not in their timetabled lab, they will need to move to a group in their timetabled lab. If this applies to you, please email Nazanin.borhan@sydney.edu.au urgently to arrange the move.
DISPUTE RESOLUTION
If, during the course of the assignment work, there is a dispute among group members that you cannot resolve, you need to inform the unit coordinator, Nazanin.borhan@sydney.edu.au. Make sure that your email includes your group number and tutorial session, and is explicit about the difficulty. Also make sure this email is copied to your tutor and all the members of the group (including anyone you are complaining about). We need to know about problems in time to help fix them and deal with non-performance (do not wait until a few days before the work is due to complain that someone is not delivering on their tasks). If necessary, the unit coordinator may place a member in a group by themselves (they will need to achieve all the outcomes on their own). This option is only available up until Monday March 27th, which is the last day with time to resolve the issue before the due date. For any group issues that arise after this time, you will need to try to resolve the problem on your own, and you will continue to be treated as a single group.
If a group member does not deliver the material required for the report, or their material is not of the agreed standard, the report should still show what that person did. Their section may be short if they did not contribute enough; in such a case, include a note that describes the circumstances. That way, we can consider how best to apply the marking scheme. Note that it is not expected or sensible for other members to do the work that someone failed to deliver.
TASKS
There are TWO individual tasks and ONE group task. The tasks should be addressed in a report that identifies which group member answered which sub-task.
INDIVIDUAL TASKS:
1. [4 marks] Each group member should answer ONE of these two sub-tasks using a different statistical technique. At least one person from the group must answer each sub-task, but more than one person can answer the same sub-task using a different statistical technique:
a. Identify a statistical technique that might be appropriate for summarisation and analysis of your dataset. For that technique:
o Name and describe the technique.
o Outline the assumptions that are required for the technique to be valid.
o Describe to what extent the assumptions are true for your dataset.
o Justify your choice of technique in the context of the business question.
b. Identify a statistical technique that is clearly not appropriate for summarisation and analysis of your dataset. For that technique:
o Name and describe the technique.
o Outline the assumptions that are required for the technique to be valid.
o Describe what assumptions are violated in your dataset.
o Justify why this technique is not appropriate for your dataset.
o Consider whether the data can be transformed in a way that makes the assumptions true, and justify whether this is appropriate in the context of your business question.
NOTE: When justifying your conclusions, consider for example whether the technique requires too many assumptions that are only partially true, or might make your conclusions too unreliable to apply in your business context. Also consider the cost of making a Type I error and the cost of making a Type II error in your business context.
2. [2 marks] Each individual should create one chart that visualises some aspect of the dataset that informs your understanding of the data and research question. Describe what conclusions you draw from the chart, and what questions it raises that you could answer in Stage 2B.
GROUP TASK:
1. [4 marks] Answer the following questions as a group:
a. Describe any exploratory analysis you have undertaken to refine your understanding of the data and research question, the strengths and limitations of that exploratory analysis compared to at least one alternative, and your justification for the analysis you undertook.
b. Propose an approach (a particular classifier model, hypothesis test, etc.) that you might take to solving your research question in Stage 2B, discuss its strengths and limitations compared to at least one other approach, and justify your choice of approach.
c. Outline, at a high level, how you will validate the approach, discuss the strengths and limitations of the validation techniques you chose compared to at least one alternative method, and justify your choice of validation techniques.
WHAT TO SUBMIT
There are TWO deliverables in this stage of the project, and both should be submitted by ONE PERSON on behalf of the whole group.
1. A written report on your work, as a PDF document. The report has a maximum length of 1500 words for groups of 2 and 2000 words for groups of 3. The report should have a front page that gives the group name and lists the members involved (by SID and unikey, not by name). The body of the report should then include a section for each group member answering the questions from the sub-task they selected; each section should state the SID/unikey of the group member who did the work reported in it. Finally, include a section where the group provides its answers to the group questions.
2. The code and dataset that you used to produce the analysis and charts in your report. This should be submitted as a single zip or tar.gz file that contains a subfolder for each group member.
MARKING
The submitted code and data may be considered as evidence to check or clarify statements made in the report.
Note: you will not be penalized in marks if you explore a reasonable question about the domain by looking at appropriate relationships between some aspects, and then conclude that no clear relationship is revealed.
Individual Task 1:
[Flawed]: States the name of the technique and answers, with valid justifications, one bullet point in their sub-task.
[Pass]: States the name of the technique and answers, with valid justifications, two bullet points in their sub-task.
[Distinction]: States the name of the technique and answers, with valid justifications, three bullet points in their sub-task.
[Full marks]: States the name of the technique and answers, with valid justifications, all four of the bullet points in their sub-task.
Individual Task 2:
[Flawed]: A chart of some data attribute.
[Pass]: A chart of some data attribute, with correctly documented encoding between data attributes and visual attributes in the chart.
[Distinction]: A chart of some data attribute, with correctly documented encoding and other decisions (such as style of chart, scale, etc.), and sensible justification of the choice of encoding in view of the effectiveness of different visual attributes.
[Full marks]: A chart of some data attribute, with correctly documented encoding and other decisions (such as style of chart, scale, etc.), and sensible justification of the choice of encoding in view of the effectiveness of different visual attributes, as well as sensible conclusions drawn from the chart and a statement of the questions it raises for Project Stage 2B.
Group Task:
[Flawed]: An answer to ALL the group questions.
[Pass]: A well-reasoned answer to ALL the group questions, including a discussion of strengths and limitations.
[Distinction]: A well-reasoned answer to ALL the group questions, including a discussion of strengths and limitations in comparison to an alternative for each question respectively.
[Full marks]: A well-reasoned answer to ALL the group questions, including a discussion of strengths and limitations in comparison to an alternative, and a justification of your choice for each question respectively.
Penalties
10% of the overall mark will be deducted if your report is unnecessarily long-winded and does not address the marking criteria within the word limits.
Late Work
As announced in the unit outline, late work (without approved special consideration or other arrangements) suffers a penalty of 5% of the maximum marks for each calendar day after the due date. No late work will be accepted more than 10 calendar days after the due date.