CSCE 474/874: Introduction to Data Mining

Spring 2021

Homework 3 March 02, 2021

Assignment

Implement the k-means algorithm to perform clustering and compare your results with

the results from Weka.

• Assume that all the attributes are continuous variables.

• Your program must allow the number of clusters (k) to be specified as input.

• Your program must allow epsilon (the threshold on the change in the sum of the squared distances from the

cluster centers) to be specified as input.

• Your program must allow the number of iterations to be specified as input.
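
One way to expose the three inputs listed above is through command-line options. The following is only a sketch: the option names (--k, --epsilon, --max-iterations), the positional data_file argument, and the default values are all assumptions made for illustration, not requirements of the assignment.

```python
import argparse


def build_parser():
    """Build a parser for the three required inputs (option names are assumed)."""
    parser = argparse.ArgumentParser(
        description="k-means clustering (illustrative CLI sketch)"
    )
    # Hypothetical positional argument: where the dataset lives.
    parser.add_argument("data_file", help="path to the input dataset")
    parser.add_argument("--k", type=int, required=True,
                        help="number of clusters")
    # The defaults below are placeholders, not values given in the assignment.
    parser.add_argument("--epsilon", type=float, default=1e-4,
                        help="stop when the change in SSD falls below this")
    parser.add_argument("--max-iterations", type=int, default=100,
                        help="upper bound on the number of iterations")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.data_file, args.k, args.epsilon, args.max_iterations)
```

Whatever interface is chosen, the README should show the exact command needed to run it.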

Your program should stop when either the specified number of iterations is reached or the change in

the total sum of the squares of the distances (SSD) falls below epsilon.
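
The two stopping conditions can be illustrated with a minimal sketch of a standard Lloyd-style k-means loop. It is not a reference solution: the function names, the random choice of initial centers, the seed parameter, and the handling of empty clusters (the old center is kept) are all assumptions, and it uses only the Python standard library.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, epsilon, max_iterations, seed=0):
    """Return (centers, labels, ssd) for a list of numeric points."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct rows of the dataset, chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    previous_ssd = float("inf")
    ssd = previous_ssd
    for _ in range(max_iterations):  # stop 1: iteration limit reached
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # SSD of the current assignment with respect to the updated centers.
        ssd = sum(squared_distance(p, centers[label])
                  for p, label in zip(points, labels))
        if abs(previous_ssd - ssd) < epsilon:  # stop 2: SSD change below epsilon
            break
        previous_ssd = ssd
    return centers, labels, ssd


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels, ssd = kmeans(data, k=2, epsilon=1e-9, max_iterations=100)
    print(centers, labels, ssd)
```

Because the first comparison is made against an infinite previous SSD, the loop always runs at least two iterations before the epsilon test can succeed.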

Plot the runtime of the algorithm as a function of number of clusters, number of

dimensions and size of the dataset (number of transactions).
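
A small timing harness is one way to collect the numbers for these plots. In the sketch below every name is hypothetical: make_dataset generates uniform random data purely for timing purposes, and assign_once is a stand-in for the team's own k-means call so that the example runs on its own.

```python
import random
import time


def make_dataset(n_points, n_dims, seed=0):
    """Synthetic data for timing only: uniform random values in [0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]


def assign_once(points, k):
    """Hypothetical stand-in for a real k-means call: one nearest-center pass."""
    centers = points[:k]
    return [
        min(range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        for p in points
    ]


def time_call(fn, *args, repeats=3):
    """Best wall-clock time in seconds over `repeats` runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def sweep(cluster_fn, values, build_args):
    """Time cluster_fn once per value; build_args maps a value to its arguments."""
    return [(v, time_call(cluster_fn, *build_args(v))) for v in values]


if __name__ == "__main__":
    # Vary one factor at a time while holding the other two fixed.
    by_k = sweep(assign_once, [2, 4, 8], lambda k: (make_dataset(500, 4), k))
    by_dims = sweep(assign_once, [2, 8, 32], lambda d: (make_dataset(500, d), 4))
    by_size = sweep(assign_once, [250, 500, 1000], lambda n: (make_dataset(n, 4), 4))
    for name, rows in [("k", by_k), ("dims", by_dims), ("size", by_size)]:
        print(name, rows)
```

Each sweep returns (value, seconds) pairs that can be fed to whatever plotting tool the team prefers. Taking the best of several repeats is a design choice that reduces noise from other processes on a shared server.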

Plot the goodness of clustering as a function of the number of clusters and determine the

optimal number of clusters.
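
The assignment does not name a goodness measure or a selection rule. As one possible reading, the sketch below uses SSD as the measure and an "elbow" heuristic (the k at which the drop in SSD slows the most) to suggest a cluster count. Both choices, and the function names, are assumptions; the heuristic also assumes evenly spaced values of k, such as 1, 2, 3, and so on.

```python
def ssd(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


def elbow_k(ssd_by_k):
    """Return the k with the sharpest bend in the SSD curve.

    ssd_by_k maps each tested k to its final SSD. The bend at k is the drop
    in SSD going into k minus the drop going out of k (a second difference).
    """
    ks = sorted(ssd_by_k)
    if len(ks) < 3:
        raise ValueError("need SSD values for at least three values of k")
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (ssd_by_k[prev_k] - ssd_by_k[k]) - (ssd_by_k[k] - ssd_by_k[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

The heuristic only suggests a candidate; the plot itself should still be inspected, since a smooth SSD curve may have no clear elbow.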

Compare the performance of your algorithm with that of Weka and summarize your

results.

For this assignment you will work in teams. Use the dataset from the domain you will be

working on for the project. If the data is not suitable, you may use one of the Weka

datasets.

All code must be written by the members of your team. You may NOT use any code

from ANY OTHER source, including other students and the Internet.

Due Date

The assignment is due on March 16 and is worth 100 points.

Handin

Hand in, on Canvas, a report along with the listing of your program and the output generated from the run

of the test file. Make sure that you have uploaded a signed copy of the

Contributions form. Prepare and submit two files as follows:

• Your report, named “Lastname1_Lastname2.pdf”, in PDF format. The signed

contributions form should be used as the cover page of your report.

• A zip file named “Lastname1_Lastname2.zip” that includes everything else (your

program, the output generated from the run of the test file, etc.). You must include

a README file that describes the usage of your program. Make sure your

implementation can successfully execute on the CSE server.

Grading Guidelines

Implement the k-means algorithm to perform clustering in a dataset. (50 points)

• Your implementation will be tested on the cse.unl.edu server using the command you

provide in the README file. (30 points)

• In the report, you should write a paragraph about your program design. (10 points)

Plot the runtime of the algorithm as a function of number of clusters, number of

dimensions and size of the dataset (number of transactions). (20 points)

• In the report, you should write a paragraph to summarize the observation and

elaborate on it.

Plot the goodness of clustering as a function of the number of clusters and determine the

optimal number of clusters. (20 points)

• In the report, you should write a paragraph to summarize the observation and

elaborate on it.

Compare the performance of your algorithm with that of Weka and summarize your

results. (10 points)

• Summarize the differences (if there are any) and elaborate on them (why/how).
