
Date: 2019-10-12 11:42

Assignment # 2 – Total: 100 Points  

Classification Tree Algorithm


Classification trees are one of the most widely used data mining algorithms; they are simple yet effective, and are the basis for many complicated data mining algorithms. Classification tree learning is a form of supervised learning. A set of training examples with their correct classifications is used to generate a decision tree that, hopefully, classifies each example in the validation set correctly. To get started, consider the problem of learning whether or not to go jogging on a particular day. To keep things simple enough to work through this problem by hand, we use a very small number of examples from which we want to learn the concept.


Assume you are using the following attributes to describe the examples:

Attribute           Possible Values

WEATHER             Warm, Cold, Raining

JOGGED_YESTERDAY    Yes, No


(Since each attribute's value starts with a different letter, for shorthand we'll just use that initial letter, e.g., 'W' for Warm.)

Our output decision is binary-valued, so we'll use '+' and '-' as our class labels, indicating a "jog" recommendation or not, respectively. Here is our TRAINING set:

WEATHER | JOGGED_YESTERDAY | CLASSIFICATION

2.1. Constructing the Initial Decision Tree

Apply the decision-tree construction steps described in class, using information gain as the criterion for every split in the tree (i.e., to choose the best attribute). Show all your work, including the final decision tree. To draw your decision trees and show your calculations, see the example on page 30 of the slides, "Week5_ClassificationTree". Again, the process for selecting the best split can be reduced to one simple rule:

Select the best split (by attribute) using information gain
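For reference, the two quantities this rule relies on are the standard entropy and information-gain formulas (S is the example set, A an attribute, S_v the subset of S where A takes value v):

```latex
\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i

\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
```

At each node you compute Gain(S, A) for every remaining attribute and split on the attribute with the largest gain.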


You may use Excel to calculate the information gain from partitioning the training examples on a given attribute. Excel's logarithm function is =LOG(number, base); use base 2 for the entropy calculation.
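If you prefer to check your hand calculations programmatically rather than in Excel, the entropy and information-gain computations can be sketched in a few lines of Python. The example rows below are illustrative placeholders (the assignment's actual training table did not survive extraction here), using the single-letter shorthand from above and a "CLASS" key for the label:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr, label_key="CLASS"):
    """Entropy of the whole set minus the weighted entropy of the
    subsets obtained by partitioning on `attr`.
    `examples` is a list of dicts mapping attribute names to values."""
    n = len(examples)
    total = entropy([e[label_key] for e in examples])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[label_key] for e in examples if e[attr] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

# Hypothetical rows, NOT the assignment's data -- substitute your own table.
examples = [
    {"WEATHER": "W", "JOGGED_YESTERDAY": "N", "CLASS": "+"},
    {"WEATHER": "W", "JOGGED_YESTERDAY": "Y", "CLASS": "-"},
    {"WEATHER": "R", "JOGGED_YESTERDAY": "N", "CLASS": "-"},
    {"WEATHER": "C", "JOGGED_YESTERDAY": "Y", "CLASS": "-"},
]
print(information_gain(examples, "WEATHER"))
print(information_gain(examples, "JOGGED_YESTERDAY"))
```

Whichever attribute prints the larger gain would be chosen as the root split for these placeholder rows; repeat the computation on each resulting subset to build the rest of the tree.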


2.2. Estimating Future Accuracy

Here is our Validation set:


WEATHER | JOGGED_YESTERDAY | CLASSIFICATION

Use the decision tree produced in part 2.1 to classify each example in the Validation set. Show all your work, and report the accuracy (i.e., the percentage of correctly classified examples) on these validation examples. Briefly discuss your results.
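The classification and accuracy steps can likewise be sketched in Python. Both the tree shape and the validation rows below are hypothetical placeholders (the assignment's actual tables are not reproduced here); the point is only the mechanics of walking the tree and scoring:

```python
def classify(tree, example):
    """Walk a decision tree stored as nested dicts:
    an internal node is {"attr": name, "branches": {value: subtree}},
    and a leaf is simply the class label "+" or "-"."""
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attr"]]]
    return tree

def accuracy(tree, examples, label_key="CLASS"):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(classify(tree, e) == e[label_key] for e in examples)
    return correct / len(examples)

# Hypothetical tree and validation rows -- substitute the tree from part 2.1
# and the assignment's actual validation table.
tree = {"attr": "WEATHER", "branches": {"W": "+", "C": "-", "R": "-"}}
validation = [
    {"WEATHER": "W", "CLASS": "+"},
    {"WEATHER": "R", "CLASS": "+"},
]
print(accuracy(tree, validation))
```

A mismatch between training accuracy and validation accuracy like this is exactly what the "briefly discuss your results" prompt is asking you to interpret.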






