首页 > > 详细

辅导Tree Algorithm、讲解data留学生、Python,Java,c/c++程序语言辅导辅导Web开发|解析Haskell程序

Assignment # 2 – Total: 100 Points
Classification Tree Algorithm

Classification trees are one of the most widely used data mining algorithms; they are simple yet effective, and are the basis for many complicated data mining algorithms. Classification tree learning is a form of supervised learning. A set of training examples with their correct classifications is used to generate a decision tree that, hopefully, classifies each example in the validation set correctly. To get started, consider the problem of learning whether or not to go jogging on a particular day. To keep things simple enough to work through this problem by hand, we use a very small number of examples from which we want to learn the concept.

Assume you are using the following attributes to describe the examples:
Attribute Possible Values

WEATHER Warm, Cold, Raining
JOGGED_YESTERDAY Yes, No

(Since each attribute's value starts with a different letter, for shorthand we'll just use that initial letter, e.g., 'W' for Warm.)
Our output decision is binary-valued, so we'll use '+' and '-' as our class labels, indicating a "jog" recommendation or not, respectively. Here is our TRAINING set, which contains data:
WEATHER JOGGED_YESTERDAY CLASSIFICATION
2.1. Constructing the Initial Decision Tree
Apply the decision tree steps described in the class using information gain as the decision criteria for every split in the tree (to choose the best attribute). Show all your works, including the final decision tree. To draw your decision trees and show your calculation, please see the example at Page 30 of the slide, “Week5_ClassificationTree”. Again, the process for selecting the best split can be simplified into one simple rule:
Select the best split (by attribute) using information gain

You may use Excel to calculate the information gain from partitioning the training example on a given attribute. Use the function =log(base, number), where base is 2 for entropy calculation.

2.2. Estimating Future Accuracy
Here is our Validation set:

WEATHER JOGGED_YESTERDAY CLASSIFICATION
Use the decision tree produced in part (a) to classify each example in the Validation set. Show all your works, and report the accuracy (i.e., percent correct classification) on these validation examples. Briefly discuss your results.

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!