
3. In this problem, you are required to use the spark.ml API. As in Problem 2, consider three objects:

(1) The first object, denoted by OA, is a ball centered at (0, 0, 0) of radius 1. As a set of points, we write OA = {(x, y, z) | x^2 + y^2 + z^2 ≤ 1}.

(2) The second object, denoted by OB, is a cylinder defined by OB = {(x, y, z) | x^2 + y^2 ≤ 4, 2 ≤ z ≤ 4}.

(3) The third object, denoted by OC, is an ellipsoid OC = {(x, y, z) | (x − 2)^2

Note that OA overlaps with OC a little bit.

Create a dataset in the following way:

(1) Each record in the dataset corresponds to a point contained in the union of OA, OB and OC. It has a “features” part, made of the xyz coordinates of that point, and a “label” part, which tells which of OA, OB or OC the point is contained in. Note that since OA ∩ OC is nonempty, a point that happens to lie in OA ∩ OC can still only be labeled as OA or OC, not both.

(2) The dataset you create should contain at least 500000 records. You should generate the records randomly in the following way:

i. Each time, choose one of OA, OB or OC at random. Suppose we choose OX (X is A, B or C).

ii. Randomly create a point P contained in OX (think of how to do it). The features of the newly created record are the coordinates of P, and the corresponding label is “OX”.

iii. After creating all the records, load and transform the dataset into a Spark DataFrame.
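
One possible way to carry out steps i and ii is sketched below in plain Python. It is an illustration, not a prescribed solution: the function names are hypothetical, rejection sampling is just one simple choice for the ball, and because the definition of OC is cut off in the text above, the ellipsoid's center and semi-axes are left as parameters that must be filled in from the full problem statement.

```python
import math
import random


def sample_unit_ball(rng):
    """Uniform point in the unit ball, by rejection sampling from the cube [-1, 1]^3."""
    while True:
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if x * x + y * y + z * z <= 1.0:
            return (x, y, z)


def sample_cylinder(rng, radius=2.0, z_min=2.0, z_max=4.0):
    """Uniform point in OB: x^2 + y^2 <= radius^2, z_min <= z <= z_max."""
    # The square root makes the density uniform over the disk's area.
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    z = z_min + (z_max - z_min) * rng.random()
    return (r * math.cos(theta), r * math.sin(theta), z)


def sample_ellipsoid(rng, center, semi_axes):
    """Uniform point in an axis-aligned ellipsoid: a scaled, shifted unit-ball sample.

    center and semi_axes are assumptions supplied by the caller; the source text
    does not give the full definition of OC.
    """
    u = sample_unit_ball(rng)
    return tuple(c + a * ui for c, a, ui in zip(center, semi_axes, u))


def make_records(n, rng, oc_center, oc_semi_axes):
    """Return n records (label, (x, y, z)), picking OA, OB or OC at random each time."""
    samplers = {
        "OA": lambda: sample_unit_ball(rng),
        "OB": lambda: sample_cylinder(rng),
        "OC": lambda: sample_ellipsoid(rng, oc_center, oc_semi_axes),
    }
    names = sorted(samplers)
    records = []
    for _ in range(n):
        name = rng.choice(names)  # step i: choose the object
        records.append((name, samplers[name]()))  # step ii: sample a point inside it
    return records
```

Because the label records which object the point was drawn from, a point that happens to fall in OA ∩ OC still receives exactly one label. A call such as make_records(500000, random.Random(0), oc_center, oc_semi_axes) would produce the required number of records once the OC parameters are known.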

You are required to do the following work.

(1) Perform classification using both logistic regression and a decision tree classifier. Try several different training/test split ratios on your dataset and, for each trained model, evaluate it and report its accuracy on the test set.

(2) Use K-means clustering to perform cluster analysis on your data. Here only the “features” part of your data matters. Set the number of clusters K to 2, 3 and 4 in turn and compare the results. Show the locations of the centroids in each case.

(3) Provide a visualization of the results of your classifications and cluster analysis.

In your report, you should provide both your code and a demonstration of the results. Take screenshots whenever necessary.
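
The classification and clustering tasks could be approached with spark.ml roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes the records are available as a Python list of (name, (x, y, z)) pairs, and the helper names, the label mapping, the seeds and the hyperparameters (maxIter, maxDepth) are all hypothetical choices made for illustration.

```python
# Hypothetical mapping from object name to the numeric label spark.ml expects.
LABEL_INDEX = {"OA": 0.0, "OB": 1.0, "OC": 2.0}


def to_row(name, point):
    """Flatten one (name, (x, y, z)) record into a (label, x, y, z) tuple of floats."""
    x, y, z = point
    return (LABEL_INDEX[name], float(x), float(y), float(z))


def build_dataframe(spark, records):
    """Build a DataFrame with a 'features' vector column and a 'label' column."""
    # pyspark is imported inside the functions so that to_row works without Spark.
    from pyspark.ml.feature import VectorAssembler

    rows = [to_row(name, point) for name, point in records]
    df = spark.createDataFrame(rows, ["label", "x", "y", "z"])
    assembler = VectorAssembler(inputCols=["x", "y", "z"], outputCol="features")
    return assembler.transform(df).select("features", "label")


def evaluate_classifiers(df, train_fraction, seed=42):
    """Train both classifiers on one split and return their test accuracies."""
    from pyspark.ml.classification import DecisionTreeClassifier, LogisticRegression
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    train, test = df.randomSplit([train_fraction, 1.0 - train_fraction], seed=seed)
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy"
    )
    classifiers = {
        "logistic_regression": LogisticRegression(maxIter=50),
        "decision_tree": DecisionTreeClassifier(maxDepth=5),
    }
    accuracies = {}
    for name, classifier in classifiers.items():
        model = classifier.fit(train)
        accuracies[name] = evaluator.evaluate(model.transform(test))
    return accuracies


def kmeans_centroids(df, k, seed=1):
    """Fit K-means on the 'features' column only and return the k centroids as lists."""
    from pyspark.ml.clustering import KMeans

    model = KMeans(k=k, seed=seed, featuresCol="features").fit(df)
    return [center.tolist() for center in model.clusterCenters()]
```

With a SparkSession in hand (for example from SparkSession.builder.getOrCreate()), one could call build_dataframe once, then loop evaluate_classifiers over split fractions such as 0.6, 0.7 and 0.8, and loop kmeans_centroids over K = 2, 3 and 4 to collect the centroids for comparison.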
