
2020/11/7 Quiz: Practice Exam Quiz
Practice Exam Quiz
Started: Nov 7 at 17:25
Quiz Instructions
Academic Integrity Declaration
By commencing and/or submitting this assessment I agree that I have read and understood the
University’s policy on academic integrity. (https://academicintegrity.unimelb.edu.au/)
I also agree that:
1. Unless paragraph 2 applies, the work I submit will be original and solely my own work (cheating);
2. I will not seek or receive any assistance from any other person (collusion) except where the work
is for a designated collaborative task, in which case the individual contributions will be indicated;
and,
3. I will not use any sources without proper acknowledgment or referencing (plagiarism).
4. Where the work I submit is a computer program or code, I will ensure that:
   1. any code I have copied is clearly noted by identifying the source of that code at the start of the program or in a header file, or that comments inline identify the start and end of the copied code; and
   2. any modifications to code sourced from elsewhere will be commented upon to show the nature of the modification.
This exam begins at 12.00 PM Australian Eastern Standard Time (AEST) on Tuesday 23/06/2020 in
Canvas (lms.unimelb.edu.au). The exam must be completed by 2.15 PM AEST on Tuesday
23/06/2020. This exam has 15 minutes of reading time, and 120 minutes of writing time.
Answer all questions
Question 1 1 pt
Select all the correct statements in the context of data linkage:
Assuming each record is allocated to exactly one block and that all blocks are equally sized, a blocking method that produces more blocks will have a higher reduction ratio.
For any blocking function, blocking reduces the original complexity of O(n^2) for pairwise comparison to a linear complexity.
The Pair Completeness score is likely to decrease if the sizes of all blocks are large.
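A minimal Python sketch of the two blocking measures named in Question 1, assuming the usual definitions: reduction ratio = 1 - (candidate pairs after blocking) / (n(n - 1)/2 pairs without blocking), and pair completeness = (true matches that end up in the same block) / (all true matches). The record ids, block assignments and true matches are hypothetical toy values, not taken from the exam.

```python
from itertools import combinations

def candidate_pairs(blocks):
    """Return every within-block record pair; `blocks` maps a block key to record ids."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def reduction_ratio(n_records, candidates):
    """1 - (pairs compared after blocking) / (pairs compared without blocking)."""
    all_pairs = n_records * (n_records - 1) // 2
    return 1 - len(candidates) / all_pairs

def pair_completeness(candidates, true_matches):
    """Fraction of true matching pairs that still fall in the same block."""
    return len(candidates & true_matches) / len(true_matches)

# Hypothetical data: six records (ids 0-5) and two true matching pairs.
true_matches = {(0, 1), (1, 2)}
coarse = candidate_pairs({"x": [0, 1, 2], "y": [3, 4, 5]})      # 2 blocks of 3
fine = candidate_pairs({"p": [0, 1], "q": [2, 3], "r": [4, 5]})  # 3 blocks of 2

for name, cands in [("coarse", coarse), ("fine", fine)]:
    print(name, reduction_ratio(6, cands), pair_completeness(cands, true_matches))
```

Here the finer blocking compares fewer pairs, so its reduction ratio is higher (0.8 vs 0.6), but it separates the true match (1, 2), so its pair completeness falls from 1.0 to 0.5.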
https://canvas.lms.unimelb.edu.au/courses/12027/quizzes/77429/take 2/12
Question 2 4 pts
Consider the following XML file:



https://handbook.unimelb.edu.au/subjects/comp20008

Elements of Data Processing

1

(a) Modify the XML so that it is well formed.
(b) Explain why the data format is said to be semi-structured.
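The original XML markup for Question 2 is not reproduced here, so the element names below are hypothetical. The sketch shows one way to test well-formedness: Python's standard `xml.etree.ElementTree.fromstring` raises `ParseError` for mismatched or unclosed tags, or for more than one root element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical element names; the exam's original tags are not available.
good = """<subject>
  <url>https://handbook.unimelb.edu.au/subjects/comp20008</url>
  <title>Elements of Data Processing</title>
  <number>1</number>
</subject>"""
mismatched = "<subject><title>Elements of Data Processing</name></subject>"
two_roots = "<title>Elements of Data Processing</title><number>1</number>"

for label, text in [("good", good), ("mismatched", mismatched), ("two_roots", two_roots)]:
    print(label, is_well_formed(text))
```

The mismatched and two-root examples are two common ways an XML fragment fails to be well-formed; neither check says anything about validity against a schema.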

Question 3 4 pts
Consider the following temperature data from various weather stations in Victoria:
16, 12, 15, 18, 13, 43, 10
The values are comma separated.
(a) Will the value 43 be classified as an outlier on the Tukey plot? Demonstrate
how you arrive at your conclusion.
(b) Suggest an imputation method for the data and justify your choice.
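A minimal sketch of the Tukey fence calculation for the seven temperatures, using Python's `statistics.quantiles`. Quartile conventions differ between textbooks and libraries, so the exact fences are an assumption tied to the method chosen here.

```python
import statistics

temps = [16, 12, 15, 18, 13, 43, 10]

# Quartiles with the default "exclusive" method; other conventions
# (e.g. method="inclusive") give slightly different fences.
q1, median, q3 = statistics.quantiles(temps, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [t for t in temps if t < lower_fence or t > upper_fence]

print(q1, q3, iqr)               # 12.0 18.0 6.0
print(lower_fence, upper_fence)  # 3.0 27.0
print(outliers)                  # [43]

# The mean is pulled up by 43 while the median is not, which is one argument
# for median-based imputation when a value is missing or rejected as an outlier.
print(statistics.mean(temps), statistics.median(temps))
```

With `method="inclusive"` the quartiles become 12.5 and 17.0 and the upper fence 23.75, so 43 lies outside the fences under either convention.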

Question 4 2 pts
Consider the following two plots:
Plot (1) is a VAT plot.
Plot (2) is a scatter plot of the first 2 principal components of the data.
[Images of plot (1) and plot (2) are not reproduced.]
The data scientist states that the two plots are created from the same dataset. Do
you believe the statement? Justify your answer.
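The two plots themselves are not available, but the ordering step behind a VAT image can be sketched. Assuming Euclidean dissimilarity, VAT starts from one end of the largest pairwise dissimilarity and repeatedly appends the unvisited point closest to any visited point (a Prim-style ordering). The reordered dissimilarity matrix then shows each cluster as a block of small values along the diagonal, which can be compared with the groups visible in a PCA scatter plot. The points are hypothetical and the image rendering is left out.

```python
import math

def vat_order(points):
    """VAT ordering: start at an endpoint of the largest dissimilarity, then
    repeatedly append the unvisited point nearest to any visited point."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    start, _ = max(((a, b) for a in range(n) for b in range(n)),
                   key=lambda ab: d[ab[0]][ab[1]])
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        _, nxt = min(((a, b) for a in order for b in remaining),
                     key=lambda ab: d[ab[0]][ab[1]])
        order.append(nxt)
        remaining.remove(nxt)
    # Dissimilarity matrix with rows and columns permuted into VAT order.
    reordered = [[d[a][b] for b in order] for a in order]
    return order, reordered

# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
order, reordered = vat_order(points)
print(order)
for row in reordered:
    print(" ".join(f"{v:5.1f}" for v in row))
```

In the printed matrix the first three and last three rows form two low-valued diagonal blocks, the pattern a VAT image shows as two dark squares.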
Question 5 3 pts
Consider a dataset with 10000 rows and 500 features. Give three reasons why we
might want to apply PCA while analysing the dataset.

Question 6 8 pts
a) Explain, with examples, what supervised and unsupervised learning are and what
the key differences are. (4 points)
b) Assume you need to build a model from medical data that predicts whether or not a
patient suffers from a particular illness. How would you decide whether to use
supervised or unsupervised learning? (4 points)

Question 7 4 pts
Assume you use k-NN clustering on a data set. Describe a method for choosing the
best value for k.
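k-NN is usually a classifier rather than a clustering method, so this sketch assumes that reading: choose k by cross-validation (here leave-one-out) and keep the k with the best held-out accuracy. For k-means, an elbow plot of within-cluster error is the analogous heuristic. The data points and candidate values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Leave-one-out: predict each point from all the others."""
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

def choose_k(data, candidates):
    scores = {k: loocv_accuracy(data, k) for k in candidates}
    best = max(scores, key=scores.get)  # ties go to the earliest (smallest) candidate
    return best, scores

# Hypothetical labelled points: two groups plus one mislabelled point at (0.5, 0.5).
data = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
        ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B"),
        ((0.5, 0.5), "B")]
best, scores = choose_k(data, [1, 3, 5])
print(scores)
print("best k:", best)
```

With these points, k = 1 is misled by the mislabelled point (4/9 correct), while k = 3 and k = 5 both score 8/9; the sketch then prefers the smaller k.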
Question 8 4 pts
You work in a bank and are in charge of classifying customer data into two groups:
loans that are likely to be repaid and loans that are likely not to be repaid. You
have come up with two feature sets that give you the same accuracy using a
decision tree algorithm. However, one set gives relatively more false positives,
whilst the other gives relatively more false negatives. Explain how you would
choose the 'best' set.
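When two feature sets have equal accuracy, one way to break the tie is to attach a cost to each type of error and pick the set with the lower total cost. The sketch assumes the positive class is "loan will not be repaid"; the confusion-matrix counts and the 1:10 cost ratio are hypothetical, and in practice the bank would supply the costs.

```python
def accuracy(cm):
    return (cm["tp"] + cm["tn"]) / sum(cm.values())

def misclassification_cost(cm, cost_fp, cost_fn):
    return cm["fp"] * cost_fp + cm["fn"] * cost_fn

# Hypothetical confusion matrices on the same 200 test loans.
# Positive class assumed to be "loan will NOT be repaid".
feature_sets = {
    "set_A": {"tp": 40, "fn": 10, "fp": 30, "tn": 120},  # relatively more false positives
    "set_B": {"tp": 20, "fn": 30, "fp": 10, "tn": 140},  # relatively more false negatives
}
COST_FP = 1   # hypothetical: wrongly refusing a customer who would repay
COST_FN = 10  # hypothetical: approving a loan that is not repaid

for name, cm in feature_sets.items():
    print(name, accuracy(cm), misclassification_cost(cm, COST_FP, COST_FN))

best = min(feature_sets,
           key=lambda name: misclassification_cost(feature_sets[name], COST_FP, COST_FN))
print("choose", best)
```

Both sets score 0.8 accuracy, but under these assumed costs set_A costs 130 against 310 for set_B. Reversing the cost ratio would reverse the choice.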

Question 9 6 pts
In the 1980s, many regression-type models forecast that the world would run out of
oil by 2010. Clearly we still have oil. Explain what went wrong and how you would
build a better forecasting model.

Question 10 4 pts
Revise the following regular expression meta operators:
( ) [ ] { } . * + ? ^ $ | \
For each of the following, give a couple of examples of strings which the regular
expression would match. Describe (colloquially, in a manner that a non-technical
person would understand) the set of strings that the pattern is designed to match.
(a) /[a-zA-Z]+/
(b) /p[aeiou]{0,2}t/
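Python's `re` module uses the same metacharacters, so the two patterns can be tried directly (the `/.../` delimiters are dropped in Python). Colloquially, (a) means "a run of one or more English letters" and (b) means "the letter p, then at most two vowels, then t". The test strings are illustrative choices.

```python
import re

letters = re.compile(r"[a-zA-Z]+")          # (a) one or more ASCII letters
p_vowels_t = re.compile(r"p[aeiou]{0,2}t")  # (b) p, zero to two vowels, then t

# fullmatch: does the WHOLE string fit the pattern?
for s in ["Hello", "data", "R2D2", ""]:
    print(repr(s), bool(letters.fullmatch(s)))

for s in ["pt", "pat", "pout", "paint", "poiut"]:
    print(repr(s), bool(p_vowels_t.fullmatch(s)))

# Without ^ and $ anchors, a pattern can also match PART of a longer string.
print(letters.findall("R2D2 rocks"))          # ['R', 'D', 'rocks']
print(p_vowels_t.search("spotless").group())  # pot
```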
Question 11 3 pts
Given the following set of training instances (A, B, C, D):
Instance  Feature 1  Feature 2  Feature 3  Class
A         sunny      hot        high       N
B         sunny      mild       medium     N
C         overcast   mild       high       Y
D         overcast   mild       medium     Y
Show how to use “information gain” to perform “filter-based feature selection”. Select
the best 2 features.
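A sketch of the information-gain calculation for the four training instances in the table, using entropy in bits: each feature's gain is the class entropy minus the weighted class entropy within each of its values, and filter-based selection keeps the two highest-scoring features. The column indices are the only added assumption.

```python
import math
from collections import Counter

# Training instances from the table: (instance, Feature 1, Feature 2, Feature 3, Class)
rows = [
    ("A", "sunny", "hot", "high", "N"),
    ("B", "sunny", "mild", "medium", "N"),
    ("C", "overcast", "mild", "high", "Y"),
    ("D", "overcast", "mild", "medium", "Y"),
]
features = {"Feature 1": 1, "Feature 2": 2, "Feature 3": 3}  # name -> column index

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, col):
    """H(Class) minus the weighted entropy of Class within each value of the feature."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: information_gain(rows, col) for name, col in features.items()}
best_two = sorted(gains, key=gains.get, reverse=True)[:2]
print(gains)     # Feature 1: 1.0, Feature 2: approx. 0.311, Feature 3: 0.0
print(best_two)  # ['Feature 1', 'Feature 2']
```

By hand: H(Class) = 1 bit; Feature 1 separates the classes perfectly (gain 1); Feature 3 leaves each value with one N and one Y (gain 0); Feature 2 gives 1 - (3/4) * H(1/3, 2/3), approximately 0.311.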
Question 12 3 pts
Given a user query to be applied on a dataset, differential privacy involves adding
noise to the true result for the query, and then returning the noisy result to the user.
Explain two factors which influence how much noise should be added to the query
result. For each factor, you should explain how it is related to the level of noise that
gets added.
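One common way to calibrate the noise is the Laplace mechanism, where the noise scale is b = sensitivity / epsilon. A query whose answer can change more when one person's record is added or removed (higher sensitivity) gets more noise, and a smaller privacy budget epsilon (stronger privacy) also gets more noise. This is a minimal sketch assuming the Laplace mechanism; the count and the epsilon values are hypothetical.

```python
import random

def laplace_scale(sensitivity, epsilon):
    """Laplace mechanism scale b = sensitivity / epsilon."""
    return sensitivity / epsilon

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_answer(true_answer, sensitivity, epsilon, rng):
    return true_answer + laplace_noise(laplace_scale(sensitivity, epsilon), rng)

rng = random.Random(42)
true_count = 120  # hypothetical result of a counting query (sensitivity 1)
for eps in (1.0, 0.1):
    errors = [noisy_answer(true_count, 1, eps, rng) - true_count for _ in range(20000)]
    mean_abs = sum(abs(e) for e in errors) / len(errors)
    print(f"epsilon={eps}: scale={laplace_scale(1, eps):.1f}, mean |noise| = {mean_abs:.2f}")
```

The mean absolute noise tracks the scale b, so dropping epsilon from 1.0 to 0.1 makes the returned counts roughly ten times noisier.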
Question 13 1 pt
In the context of recommender systems, which of the following statements about collaborative filtering methods is correct?
We can use Pearson correlation as the similarity measure.
User-based methods are personalised, but item-based methods are not.
Item-based and user-based methods are mathematically similar, and both have similar performance.
Item-based methods have the cold-start problem, but user-based methods do not have the problem.
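A minimal user-user similarity sketch using Pearson correlation, assuming the means are taken over co-rated items only (conventions vary between textbooks). The users and ratings are hypothetical.

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    xs = [ratings_u[i] for i in common]
    ys = [ratings_v[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

# Hypothetical ratings (1-5 stars).
alice = {"item1": 5, "item2": 3, "item3": 4}
bob = {"item1": 4, "item2": 2, "item3": 3, "item4": 1}
carol = {"item1": 1, "item2": 5, "item3": 2}

print(pearson(alice, bob))    # 1.0: same pattern of preferences, shifted by one star
print(pearson(alice, carol))  # negative: opposite tastes
```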
Question 14 1 pt
A dataset has 2 numeric attributes, x and y. From which plot(s) can one observe the distribution of the y attribute?
A scatter plot of the x and y attributes
A parallel coordinate plot
A boxplot
A bar plot
Question 15 1 pt
In the context of privacy, which of the following statements about k-anonymity and l-diversity
is correct?
A dataset satisfies l-diversity if there are at least l combinations of values of the quasi-identifiers
for every sensitive attribute value.
The value of l will be no greater than k.
A dataset satisfies k-anonymity if every record in the data is indistinguishable from at
least k − 1 other records with respect to each individual attribute.
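A minimal check of both properties, assuming the standard definitions: records with the same combination of quasi-identifier values form an equivalence class; a table is k-anonymous if every class has at least k records, and (in the simple "distinct" form of l-diversity) l-diverse if every class contains at least l distinct sensitive values. The table is hypothetical and already generalised.

```python
from collections import defaultdict

def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same combination of quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r)
    return list(groups.values())

def is_k_anonymous(records, quasi_identifiers, k):
    return all(len(g) >= k for g in equivalence_classes(records, quasi_identifiers))

def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    return all(len({r[sensitive] for r in g}) >= l
               for g in equivalence_classes(records, quasi_identifiers))

# Hypothetical, already-generalised table.
records = [
    {"age": "20-29", "postcode": "30**", "disease": "flu"},
    {"age": "20-29", "postcode": "30**", "disease": "asthma"},
    {"age": "20-29", "postcode": "30**", "disease": "flu"},
    {"age": "30-39", "postcode": "31**", "disease": "diabetes"},
    {"age": "30-39", "postcode": "31**", "disease": "diabetes"},
    {"age": "30-39", "postcode": "31**", "disease": "diabetes"},
]
qi = ["age", "postcode"]
print(is_k_anonymous(records, qi, 3))                    # True
print(is_distinct_l_diverse(records, qi, "disease", 2))  # False
```

The example table is 3-anonymous, yet everyone in the second class has the same disease, so it is not 2-diverse; this homogeneity is the weakness l-diversity is meant to address.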
Question 16 1 pt
Which of the following are true? Select all correct answers.
.xls is structured data.
.csv is structured data.
An image file is semi-structured data.
PDF is unstructured data.