Statistical Learning Methods, Exercises in Practice: Chapter 5

Theory outline

1. Decision tree model and learning: the model, decision trees as if-then rules, decision trees as conditional probability distributions, and decision tree learning
2. Feature selection for decision trees: the feature-selection problem, solved via information gain or the information gain ratio
3. Decision tree generation:
ID3 algorithm: tree generation only (no pruning)
C4.5 algorithm: ID3 with the information gain ratio in place of information gain
4. Decision tree pruning: minimize a loss function with a complexity-penalty term
5. The CART algorithm (the final synthesis):
Generation splits into: regression trees — least squares and partitioning of the input space
classification trees — ID3-style generation using the Gini index
Pruning: select from a sequence of subtrees
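The feature-selection criteria in the outline above can be checked numerically. A minimal sketch (function names `entropy` and `gini` are mine), using the class distribution of Table 5.1 (9 positive, 6 negative out of 15):

```python
import numpy as np

def entropy(p):
    """Empirical entropy H = -sum_k p_k * log2(p_k), with 0*log0 taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def gini(p):
    """Gini index: 1 - sum_k p_k^2."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - (p ** 2).sum())

# Class distribution of Table 5.1: 9 "yes" vs 6 "no".
probs = [9 / 15, 6 / 15]
print(round(entropy(probs), 3))  # 0.971, matching H(D) in the book's Example 5.2
print(round(gini(probs), 3))     # 0.48
```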

Exercises

1. Using the training data in Table 5.1, generate a decision tree via the information gain ratio (the C4.5 algorithm).
Answer:
Imports and the dataset

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import graphviz

features = ["年龄", "有工作", "有自己的房子", "信贷情况"]
x = pd.DataFrame([
    ["青年", "否", "否", "一般"],
    ["青年", "否", "否", "好"],
    ["青年", "是", "否", "好"],
    ["青年", "是", "是", "一般"],
    ["青年", "否", "否", "一般"],
    ["中年", "否", "否", "一般"],
    ["中年", "否", "否", "好"],
    ["中年", "是", "是", "好"],
    ["中年", "否", "是", "非常好"],
    ["中年", "否", "是", "非常好"],
    ["老年", "否", "是", "非常好"],
    ["老年", "否", "是", "好"],
    ["老年", "是", "否", "好"],
    ["老年", "是", "否", "非常好"],
    ["老年", "否", "否", "一般"]
], columns=features)
y = pd.DataFrame(["否", "否", "是", "是", "否",
                  "否", "否", "是", "是", "是",
                  "是", "是", "是", "是", "否"])
class_names = [str(k) for k in np.unique(y)]

Preprocessing (sklearn trees require numeric input, so the categorical features are one-hot encoded)

x = pd.get_dummies(x)
features = list(x.columns)

Training, and rendering the tree with graphviz

# sklearn does not implement C4.5's gain ratio; "entropy" (information gain) is
# the closest built-in criterion (the default is "gini", i.e. CART)
model_tree = DecisionTreeClassifier(criterion="entropy")
model_tree.fit(x, y.values.ravel())
dot_data = tree.export_graphviz(model_tree, out_file=None,
                                feature_names=x.columns,
                                class_names=class_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph

tree_text = tree.export_text(model_tree, feature_names=features)
print(tree_text)

(Figure: the generated decision tree, rendered by graphviz)
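Since sklearn cannot split on the gain ratio, the exercise as posed can still be answered by computing the gain ratios directly. A sketch (helper names `entropy_of` and `gain_ratio` are mine) that recomputes them for all four features of Table 5.1:

```python
import numpy as np
import pandas as pd

features = ["年龄", "有工作", "有自己的房子", "信贷情况"]
df = pd.DataFrame([
    ["青年", "否", "否", "一般", "否"], ["青年", "否", "否", "好", "否"],
    ["青年", "是", "否", "好", "是"], ["青年", "是", "是", "一般", "是"],
    ["青年", "否", "否", "一般", "否"], ["中年", "否", "否", "一般", "否"],
    ["中年", "否", "否", "好", "否"], ["中年", "是", "是", "好", "是"],
    ["中年", "否", "是", "非常好", "是"], ["中年", "否", "是", "非常好", "是"],
    ["老年", "否", "是", "非常好", "是"], ["老年", "否", "是", "好", "是"],
    ["老年", "是", "否", "好", "是"], ["老年", "是", "否", "非常好", "是"],
    ["老年", "否", "否", "一般", "否"],
], columns=features + ["类别"])

def entropy_of(s):
    """Empirical entropy of a categorical pandas Series."""
    p = s.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(df, a, target="类别"):
    """g_R(D, A) = (H(D) - H(D|A)) / H_A(D)."""
    h_d = entropy_of(df[target])
    h_cond = sum(len(g) / len(df) * entropy_of(g[target])
                 for _, g in df.groupby(a))
    return (h_d - h_cond) / entropy_of(df[a])

ratios = {a: gain_ratio(df, a) for a in features}
best = max(ratios, key=ratios.get)
print(best)  # 有自己的房子 — the root split C4.5 would choose (gain ratio ≈ 0.433)
```

The feature 有自己的房子 (owns a house) wins both on information gain (0.420, as in the book's Example 5.2) and on gain ratio, so here C4.5 agrees with ID3 at the root.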

2. Given the training data in Table 5.2, use the squared-error loss criterion to generate a binary regression tree.
Answer:
I'll only give the concise (sklearn-based) implementation.

from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]).T
y = np.array([4.50, 4.75, 4.91, 5.34,
              5.80, 7.05, 7.90, 8.23, 8.70, 9.00])

rtree = DecisionTreeRegressor(max_depth=3)
rtree.fit(x, y)
x_dot = np.arange(0, 11, 0.01).reshape(-1, 1)
y_line = rtree.predict(x_dot)

plt.figure()
plt.scatter(x, y)
plt.plot(x_dot, y_line)
plt.title('DecisionTreeRegressor');
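The sklearn tree above is a black box, but the book's least-squares splitting rule (choose the split point s minimizing the summed squared error around each side's mean) is easy to check by hand. A minimal sketch for the first split, with `best_split` as my own helper name:

```python
import numpy as np

x = np.arange(1, 11)
y = np.array([4.50, 4.75, 4.91, 5.34, 5.80,
              7.05, 7.90, 8.23, 8.70, 9.00])

def best_split(x, y):
    """Try each candidate split s; cost = SSE around the left mean
    plus SSE around the right mean. Return (best s, its cost)."""
    best = None
    for s in x[:-1]:
        left, right = y[x <= s], y[x > s]
        cost = (((left - left.mean()) ** 2).sum()
                + ((right - right.mean()) ** 2).sum())
        if best is None or cost < best[1]:
            best = (s, cost)
    return best

s, cost = best_split(x, y)
print(int(s))  # 5 — first split at x <= 5 (left mean 5.06, right mean ≈ 8.18)
```

Recursing the same search on each side reproduces the full binary regression tree; `DecisionTreeRegressor` with the default `criterion="squared_error"` does exactly this greedy search.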

(Figure: scatter of the training points with the fitted regression-tree step function)
