Python Machine Learning, Chapter 2 (2): Adaptive Linear Neurons
The adaptive linear neuron (Adaline) can be viewed as an improvement on the perceptron algorithm. It illustrates the key concept of defining and minimizing a continuous cost function, which lays the groundwork for understanding other machine learning algorithms such as logistic regression, support vector machines, and regression models.
The Adaline rule updates the weights based on a linear activation function, whereas the perceptron uses a unit step function. Although the linear activation function is used to learn the weights, a threshold function is still applied to make the final class prediction. In other words, compared with the perceptron, Adaline uses a better-behaved (continuous and differentiable) function for the weight update. The two models are compared in the figure below.
First, we define a cost function: half the sum of squared differences between the true class labels and the continuous outputs of the linear activation function. Because the activation is linear, this cost function is continuous, differentiable, and convex, so gradient descent can find the weights that minimize it.
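To make this concrete, here are the standard Adaline equations that the code below implements. The cost function is

$$J(\mathbf{w}) = \frac{1}{2}\sum_i \left(y^{(i)} - \phi\left(z^{(i)}\right)\right)^2, \qquad \phi\left(z^{(i)}\right) = z^{(i)} = \mathbf{w}^T\mathbf{x}^{(i)},$$

and one gradient-descent step updates all weights at once from the whole training set:

$$\Delta\mathbf{w} = -\eta\,\nabla J(\mathbf{w}), \qquad \Delta w_j = \eta \sum_i \left(y^{(i)} - \phi\left(z^{(i)}\right)\right) x_j^{(i)}.$$

Unlike the perceptron, which updates after each sample using the thresholded prediction, Adaline computes this update from the continuous linear output over all samples once per epoch.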
Implementing Adaline in Python:
import numpy as np


class AdalineGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.
    random_state : int
        Random number generator seed for random weight
        initialization.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    cost_ : list
        Sum-of-squares cost function value in each epoch.
    """
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta  # learning rate
        self.n_iter = n_iter  # number of passes (epochs) over the training set
        self.random_state = random_state

    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object
        """
        rgen = np.random.RandomState(self.random_state)
        # Initialize weights to small random numbers; w_[0] is the bias.
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.cost_ = []  # cost value recorded for each epoch
        for i in range(self.n_iter):
            net_input = self.net_input(X)  # net input for all samples at once
            # Please note that the "activation" method has no effect
            # in the code since it is simply an identity function. We
            # could write `output = self.net_input(X)` directly instead.
            # The purpose of the activation is more conceptual, i.e.,
            # in the case of logistic regression (as we will see later),
            # we could change it to a sigmoid function to implement a
            # logistic regression classifier.
            output = self.activation(net_input)
            errors = (y - output)
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation"""
        return X

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(self.net_input(X)) >= 0.0, 1, -1)
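As a quick sanity check, here is a minimal usage sketch. The toy data is hypothetical (two Gaussian blobs with labels -1 and 1); in the book's running example, X and y would instead hold the Iris features and class labels:

import numpy as np

# Hypothetical toy dataset: two Gaussian blobs labeled -1 and 1.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)),
               rng.normal(1.0, 0.5, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

ada = AdalineGD(eta=0.001, n_iter=50).fit(X, y)
print(ada.predict(X[:5]))  # predicted class labels (-1 or 1)
print(ada.cost_[-1])       # final sum-of-squares cost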
In practice, finding a good learning rate η for optimal convergence often requires some experimentation. Below, two different learning rates, 0.01 and 0.0001, are used, and the cost function is plotted against the number of epochs:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
ada1 = AdalineGD(n_iter=10, eta=0.01).fit(X, y)
ax[0].plot(range(1, len(ada1.cost_) + 1), np.log10(ada1.cost_), marker='o')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('log(Sum-squared-error)')
ax[0].set_title('Adaline - Learning rate 0.01')
ada2 = AdalineGD(n_iter=10, eta=0.0001).fit(X, y)
ax[1].plot(range(1, len(ada2.cost_) + 1), ada2.cost_, marker='o')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Sum-squared-error')
ax[1].set_title('Adaline - Learning rate 0.0001')
# plt.savefig('images/02_11.png', dpi=300)
plt.show()
In the left plot, the learning rate of 0.01 is too large: instead of minimizing the cost function, each update overshoots the global minimum, so the error grows with every epoch (which is why the left panel plots the base-10 logarithm of the cost). In the right plot, the learning rate of 0.0001 is too small, so many epochs are needed to converge to the global cost minimum. How different learning rates change the path of gradient descent toward the minimum is illustrated intuitively in the figure below:
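A brief note on why the large learning rate diverges (a standard gradient-descent fact, stated here for clarity rather than taken from the original post). Writing the batch update in matrix form, with a constant column of ones appended to X to absorb the bias,

$$\mathbf{w}_{t+1} = \mathbf{w}_t + \eta\, X^T\left(\mathbf{y} - X\mathbf{w}_t\right),$$

the iteration converges only if $\eta < 2/\lambda_{\max}(X^T X)$, where $\lambda_{\max}$ is the largest eigenvalue of $X^T X$. Above that threshold, each step overshoots the minimum by more than it corrects, so the error grows every epoch; far below it, the steps are tiny and convergence is slow. This is also one reason feature standardization, which improves the conditioning of $X^T X$, typically allows a larger η.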