
Assignment 3: Frequent Itemsets, Clustering, Advertising

Formative, Weight (15%), Learning objectives (1, 2, 3),

Abstraction (4), Design (4), Communication (4), Data (5), Programming (5)

Due date: 11:59pm, 1 June, 2019

1 Overview

Read the following carefully as it differs from the last assignment.

For students who are taking the course COMP SCI 3306 (i.e., undergraduate students), this assignment can be done in groups of two students. If you have problems finding a group partner, use the forum to search for one.

For other students who are taking the course COMP SCI 7306, this assignment

should be done individually.

References to sections, examples, etc. refer to the book “Leskovec, Rajaraman and Ullman: Mining Massive Datasets (Second Edition)”.

2 Assignment

Exercise 1 Frequent Itemsets (15+15+10+10 points)

For this exercise, you have to read Section 6.4 up to 6.4.3.

1. Implement the simple, randomized algorithm given in 6.4.1.

2. Implement the algorithm of Savasere, Omiecinski, and Navathe (SON algorithm) in 6.4.3.

3. Compare the two algorithms on the datasets T10I4D100K, T40I10D100K, chess, connect, mushroom, pumsb, pumsb_star provided at

http://fimi.ua.ac.be/data/

and report the outcomes.


COMP SCI 3306, COMP SCI 7306 Mining Big Data Semester 1, 2020

4. Experiment with different sample sizes in the simple randomized algorithm, such as 1%, 2%, 5%, and 10%, and compare your results (including the result produced by the SON algorithm).

Your approach should be as efficient as possible in terms of runtime and memory requirements.

Report on any challenges you observed during the implementation and while running the experiments.
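As a rough orientation for items 1 and 2, the two algorithms might be sketched as follows. This is a minimal in-memory sketch, not a reference solution: the helper `frequent_itemsets`, the parameter names `support`, `p`, `slack`, `seed` and `num_chunks`, and the choice of a plain Apriori pass as the in-memory miner are all assumptions made for illustration. The sampled version lowers the threshold to `slack * p * support`, and SON scales the threshold by each chunk's share of the baskets before verifying candidates in a second full pass.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_count):
    """Plain in-memory Apriori (illustrative helper).

    Returns {itemset: count} for every itemset whose count is >= min_count.
    """
    baskets = [frozenset(b) for b in baskets]
    counts = Counter(frozenset([item]) for b in baskets for item in b)
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Candidate k-sets: unions of frequent (k-1)-sets ...
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # ... all of whose (k-1)-subsets are themselves frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev
                             for sub in combinations(c, k - 1))}
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        current = {s: c for s, c in counts.items() if c >= min_count}
        result.update(current)
        k += 1
    return result


def simple_randomized(baskets, support, p, slack=0.9, seed=0):
    """Mine a random sample (each basket kept with probability p).

    The threshold is scaled to slack * p * support; the result may contain
    false positives and false negatives relative to the full data.
    """
    rng = random.Random(seed)
    sample = [b for b in baskets if rng.random() < p]
    return set(frequent_itemsets(sample, slack * p * support))


def son(baskets, support, num_chunks=4):
    """Two-pass SON: chunk-local candidates, then an exact full count."""
    baskets = [frozenset(b) for b in baskets]
    if not baskets:
        return set()
    size = -(-len(baskets) // num_chunks)  # ceiling division
    candidates = set()
    for start in range(0, len(baskets), size):
        chunk = baskets[start:start + size]
        # Pass 1: threshold scaled by the chunk's fraction of all baskets.
        local = support * len(chunk) / len(baskets)
        candidates |= set(frequent_itemsets(chunk, local))
    # Pass 2: count every candidate over the whole dataset.
    counts = Counter(c for b in baskets for c in candidates if c <= b)
    return {c for c in candidates if counts[c] >= support}
```

For the comparison in items 3 and 4, one option is to treat the SON output as the exact answer and report, for each sample size, how many itemsets the randomized run misses or adds.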

Exercise 2 Clustering (10+20 points)

1. Perform a hierarchical clustering on the one-dimensional set of points

1, 4, 9, 16, 25, 36, 49, 64, 81.

assuming the clusters are represented by their centroid (average), and at

each step the clusters with the closest centroids are merged. (Exercise

7.2.1)

2. Implement the K-means algorithm and carry out experiments on the provided

Iris dataset.

a) You are asked to plot the K-means results by plotting the first 2 dimensions

of the input data as well as the converged centroids.

b) Discuss how you pick the value of K in K-means.

For the Iris data, use only the first 4 dimensions for this exercise. In other words, discard the label information.
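The K-means loop asked for in part 2 can be sketched in a few lines. This is a minimal standard-library sketch under stated assumptions, not the expected solution: the function name `kmeans`, the parameters `max_iter`, `seed` and `init`, and the choice of random data points as initial centroids are all illustrative. Loading the Iris file and plotting are left out.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0, init=None):
    """Lloyd-style K-means (illustrative sketch).

    points: list of numeric tuples; init: optional list of k starting centroids.
    Returns (centroids, assignment), where assignment[i] is the cluster index
    of points[i].
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in start]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

For part a), the first two coordinates of each point and of each returned centroid can be passed to whatever plotting library you use. For part b), one common heuristic is to run the function for several values of K and compare the total within-cluster squared distance.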

Exercise 3 Advertising (Exercise 8.4.1) (10+10 points)

Consider Example 8.7. Suppose that there are three advertisers A, B, and

C. There are three queries x, y, and z. Each advertiser has a budget of 2.

Advertiser A only bids on x, B bids on x and y, and C bids on x, y, and z. Note that on the query sequence xxyyzz, the optimal offline algorithm would yield a revenue of 6, since all queries can be assigned.

1. Show that the greedy algorithm will assign at least 4 of the 6 queries

xxyyzz.

2. Find another sequence of queries such that the greedy algorithm can assign as few as half the queries that the optimal offline algorithm would assign to that sequence.
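A small brute-force simulation can serve as a sanity check for a written argument. The sketch below assumes every bid is worth 1, so a budget of 2 pays for two queries and revenue equals the number of assigned queries; the names `BIDS`, `BUDGET` and `greedy_range` are illustrative. It explores every way the greedy algorithm could break ties and returns the fewest and the most queries it could assign. It does not replace the proof asked for in part 1.

```python
from functools import lru_cache

# Setup from the exercise: who bids on which query, and the common budget.
BIDS = {"A": {"x"}, "B": {"x", "y"}, "C": {"x", "y", "z"}}
BUDGET = 2


def greedy_range(queries, bids=BIDS, budget=BUDGET):
    """Return (fewest, most) queries greedy can assign over all tie-breaks.

    Greedy here means: a query is given to some advertiser that bids on it
    and still has budget; it is dropped only if no such advertiser exists.
    """
    names = sorted(bids)

    @lru_cache(maxsize=None)
    def explore(i, remaining):
        if i == len(queries):
            return (0, 0)
        eligible = [j for j, name in enumerate(names)
                    if queries[i] in bids[name] and remaining[j] > 0]
        if not eligible:
            # Nobody can take this query, so it goes unassigned.
            return explore(i + 1, remaining)
        outcomes = []
        for j in eligible:
            nxt = remaining[:j] + (remaining[j] - 1,) + remaining[j + 1:]
            low, high = explore(i + 1, nxt)
            outcomes.append((low + 1, high + 1))
        return (min(low for low, _ in outcomes),
                max(high for _, high in outcomes))

    return explore(0, (budget,) * len(names))


print(greedy_range("xxyyzz"))  # (4, 6)
```

The same function can be called on other query strings when searching for a sequence for part 2.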

3 Procedure for handing in the assignment

Work should be handed in using Canvas. The submission should include:


• a PDF file of your solutions for theoretical assignments. The solutions should contain a detailed description of how to obtain the result. For Exercise 2.2, you should comment your code properly to show your understanding.

• all source files and all project files.

• a README.txt file containing instructions to run the code, the names,

student numbers, and email addresses of the group members.
