Statistical Learning Methods: Chapter 3 Exercises in Practice

Theory outline

1. The k-nearest-neighbor algorithm: the overall framework
2. The k-nearest-neighbor model: the model, distance metrics, the choice of k, and the classification decision rule
3. kd-trees:
Speeding up search: constructing a kd-tree by partitioning the space with hyperplanes and placing instances at the nodes, splitting at the median of the instances along the x(l) axis
Searching a kd-tree: testing whether the hypersphere around the query point intersects a node's hyperrectangle
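The construction step above can be sketched in a few lines. This is a hypothetical simplification (nodes are plain dicts, not the book's node structure), but it follows the same rule: cycle through the axes and split at the median instance. Run on the six points from Example 3.2, it reproduces the book's tree with root (7, 2).

```python
def build_kdtree(points, depth=0):
    """Recursively split on the median of axis l = depth mod k (a sketch)."""
    if not points:
        return None
    k = len(points[0])              # dimensionality of the instances
    axis = depth % k                # cycle through axes x(1), x(2), ...
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2          # median index along the current axis
    return {
        'point': points[mid],       # instance stored at this node
        'axis': axis,
        'left': build_kdtree(points[:mid], depth + 1),
        'right': build_kdtree(points[mid + 1:], depth + 1),
    }

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree['point'])  # root splits on x(1); the median instance is (7, 2)
```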

Exercises

1. Following Figure 3.1, plot instance points in a two-dimensional space and draw the partitions of the space produced by the k-nearest-neighbor method for k=1 and k=2. Compare the two, and get a feel for the relationship between the choice of k, model complexity, and prediction accuracy.
Answer:
Training the models

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
%matplotlib inline
from sklearn.neighbors import KNeighborsClassifier

data = np.array([[5, 12, 1],
                 [6, 21, 0],
                 [14, 5, 0],
                 [16, 10, 0],
                 [13, 19, 0],
                 [13, 32, 1],
                 [17, 27, 1],
                 [18, 24, 1],
                 [20, 20, 0],
                 [23, 14, 1],
                 [23, 25, 1],
                 [23, 31, 1],
                 [26, 8, 0],
                 [30, 17, 1],
                 [30, 26, 1],
                 [34, 8, 0],
                 [34, 19, 1],
                 [37, 28, 1]])


X_train = data[:, 0:2]
y_train = data[:, 2]


models = (KNeighborsClassifier(n_neighbors=1, n_jobs=-1),
          KNeighborsClassifier(n_neighbors=2, n_jobs=-1))

models = tuple(clf.fit(X_train, y_train) for clf in models)  # fit both classifiers eagerly

Plotting

# Titles, axis ranges, and the mesh grid
titles = ('K Neighbors with k=1',
          'K Neighbors with k=2')
X0, X1 = X_train[:, 0], X_train[:, 1]
x_min, x_max = X0.min() - 1, X0.max() + 1
y_min, y_max = X1.min() - 1, X1.max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1)) # build the mesh grid
cmap = ListedColormap(('red', 'blue')) # color map for the two classes

# Draw the decision regions for k=1 and k=2
fig = plt.figure(figsize=(15, 5))
plt.subplots_adjust(wspace=0.4, hspace=0.4) # spacing between subplots
for clf, title, ax in zip(models, titles, fig.subplots(1, 2).flatten()):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]) # flatten the grid into feature vectors
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, cmap=cmap, alpha=0.5) # fill the decision regions
    ax.scatter(X0, X1, c=y_train, s=50, edgecolors='k', cmap=cmap, alpha=0.5)
    acc = clf.score(X_train, y_train) # training accuracy
    ax.set_title(title + ' (Accuracy: %d%%)' % (acc * 100))
plt.show()
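With k=1 each training point is its own nearest neighbor, so the model memorizes the training set and scores 100% on it, at the cost of a more jagged, complex decision boundary; larger k smooths the boundary but may misclassify some training points. As a cross-check on what `KNeighborsClassifier` does, here is a brute-force majority-vote sketch (the helper name and the tiny demo set are illustrative, not from the original):

```python
import numpy as np

def knn_predict(X, y, query, k=1):
    """Brute-force k-NN: majority vote among the k nearest training points."""
    dists = np.linalg.norm(X - query, axis=1)   # Euclidean distance to every point
    nearest = np.argsort(dists)[:k]             # indices of the k closest points
    return np.bincount(y[nearest]).argmax()     # majority label among them

# Three points taken from the exercise's data, just for illustration
X_demo = np.array([[5, 12], [6, 21], [14, 5]])
y_demo = np.array([1, 0, 0])
print(knn_predict(X_demo, y_demo, np.array([6, 13]), k=1))  # nearest is (5, 12) -> 1
```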

2. Use the kd-tree constructed in Example 3.2 to find the nearest neighbor of the point x = (3, 4.5)^T.
Answer:

import numpy as np
from sklearn.neighbors import KDTree

train_data = np.array([[2, 3],
                       [5, 4],
                       [9, 6],
                       [4, 7],
                       [8, 1],
                       [7, 2]])
tree = KDTree(train_data, leaf_size=2)
dist, ind = tree.query(np.array([[3, 4.5]]), k=1)
node_index = ind[0][0]  # index of the nearest training point

x1, x2 = train_data[node_index]
print("The nearest neighbor of x = (3, 4.5) is ({0}, {1})".format(x1, x2))

The nearest neighbor of x = (3, 4.5) is (2, 3)
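The kd-tree answer can be sanity-checked by brute force: computing the Euclidean distance from (3, 4.5) to each of the six training points confirms that (2, 3) is the closest, at distance sqrt(3.25) ≈ 1.8028.

```python
import numpy as np

train_data = np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]])
x = np.array([3, 4.5])
dists = np.linalg.norm(train_data - x, axis=1)  # distance to every training point
print(train_data[dists.argmin()])               # the closest point: [2 3]
print(round(float(dists.min()), 4))             # its distance: 1.8028
```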
