Andrew Ng Machine Learning Assignments (Python) 1.1 Programming Exercise 1: Linear Regression

This starts a brand-new series: studying Andrew Ng's machine learning course and completing its assignments in Python, since I hope to work in machine learning / deep learning in the future. Ng recommends using Octave/MATLAB for the assignments, but the choice of language does not affect how the algorithms are written, so I will attempt all of the assignments in Python and record the process bit by bit.
PART 1: Linear regression with one variable
In this part of this exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities.
You would like to use this data to help you select which city to expand to next. The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss. The ex1.m script has already been set up to load this data for you.
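For reference, ex1data1.txt is a plain comma-separated text file with one training example per line, population first and profit second; the sample lines below illustrate the format (values illustrative):

6.1101,17.592
5.5277,9.1302
8.5186,13.662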
2.1 Plotting the Data
Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). (Many other problems that you will encounter in real life are multi-dimensional and can’t be plotted on a 2-d plot.)
This assignment requires plotting and numerical computation, so two libraries are used: NumPy and Matplotlib.

import numpy as np
import matplotlib.pyplot as plt

The first step is simply to plot the given data; since these are discrete data points, a scatter plot will do. The code is as follows:

# Load the comma-separated data: column 0 is population, column 1 is profit
data1 = np.loadtxt('ex1data1.txt', delimiter=',')
X = data1[:, 0]
y = data1[:, 1]
# print(X, y)

# Scatter plot of the raw data
plt.scatter(x=X, y=y, c='r', marker='x')
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()

[Figure 1: scatter plot of the training data]
2.2 Gradient Descent
In this part, you will fit the linear regression parameters θ to our dataset using gradient descent.
2.2.1 Update Equations
The objective of linear regression is to minimize the cost function:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
where the hypothesis hθ(x) is given by the linear model
$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$
Recall that the parameters of your model are the θj values. These are the values you will adjust to minimize cost J(θ). One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update
$$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{(simultaneously update } \theta_j \text{ for all } j\text{)}$$
With each step of gradient descent, your parameters θj come closer to the optimal values that will achieve the lowest cost J(θ)
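For reference, stacking all training examples into the design matrix X (whose first column is all ones) lets the same update be written as a single matrix expression, which is what vectorized NumPy code exploits; this form updates every component of θ simultaneously and is equivalent to the per-component update above:

$$\theta := \theta - \frac{\alpha}{m}\, X^T\left(X\theta - y\right)$$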

2.2.2 Implementation
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters
iterations = 1500;
alpha = 0.01;
Before computing the cost function we first do some initialization. The snippet above is the initialization Ng gives in MATLAB. As he explains in the course, a column of all ones is added to the X matrix, which makes the matrix arithmetic convenient. Below is my Python initialization code:

m = len(data1)  # number of training examples
temp = np.ones([m, 1])  # column of ones for the intercept term
X = X.reshape((m, 1))
X = np.hstack([temp, X])  # X now has shape [m, 2]
# print(X)
y = y.reshape((m, 1))

theta = np.zeros([2, 1])  # initialize fitting parameters
iterations = 1500
alpha = 0.01  # learning rate
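As a quick sanity check of the shapes (a minimal check; the expected row count of 97 assumes the standard ex1data1.txt, which has 97 training examples):

print(X.shape, y.shape, theta.shape)  # expect (97, 2) (97, 1) (2, 1)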

2.2.3 Computing the cost J(θ)
As you perform gradient descent to minimize the cost function J(θ), it is helpful to monitor the convergence by computing the cost. In this section, you will implement a function to calculate J(θ) so you can check the convergence of your gradient descent implementation. Your next task is to complete the code in the file computeCost.m, which is a function that computes J(θ). As you are doing this, remember that the variables X and y are not scalar values, but matrices whose rows represent the examples from the training set. Once you have completed the function, the next step in ex1.m will run computeCost once using θ initialized to zeros, and you will see the cost printed to the screen.
You should expect to see a cost of 32.07.

The cost J(θ) is computed according to the following two formulas:
$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

# Hypothesis function h: here X has shape [m, 2] and theta has shape [2, 1], so X dot theta has shape [m, 1]
def h(theta, X):
    return np.dot(X, theta)

def J(theta, X, y):
    # 0.5 * mean of squared errors == (1 / 2m) * sum of squared errors
    return 0.5 * np.mean(np.square(h(theta, X) - y))
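With θ still initialized to zeros, calling the cost function should reproduce the value quoted in the exercise text:

print(J(theta, X, y))  # expect approximately 32.07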

2.2.4 Gradient descent
Next, you will implement gradient descent in the file gradientDescent.m. The loop structure has been written for you, and you only need to supply the updates to θ within each iteration.
As you program, make sure you understand what you are trying to optimize and what is being updated. Keep in mind that the cost J(θ) is parameterized by the vector θ, not X and y. That is, we minimize the value of J(θ) by changing the values of the vector θ, not by changing X or y. Refer to the equations in this handout and to the video lectures if you are uncertain.
A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. The starter code for gradientDescent.m calls computeCost on every iteration and prints the cost. Assuming you have implemented gradient descent and computeCost correctly, your value of J(θ) should never increase, and should
converge to a steady value by the end of the algorithm.
After you are finished, ex1.m will use your final parameters to plot the linear fit. The result should look something like Figure 2:
Your final values for θ will also be used to make predictions on profits in areas of 35,000 and 70,000 people. Note the way that the following lines in ex1.m use matrix multiplication, rather than explicit summation or looping, to calculate the predictions. This is an example of code vectorization in Octave/MATLAB.
predict1 = [1, 3.5] * theta;
predict2 = [1, 7] * theta;
The work above completes the cost function; next comes the gradient descent computation. The course already covered the mathematical derivation of gradient descent in detail, so I will not repeat it here and will go straight to the code:

# Gradient descent: repeatedly update theta and record the cost after every step
def GradientDescent(theta, X, y, iterations, alpha):
    cost = []
    cost.append(J(theta, X, y))
    for i in range(iterations):
        # Compute both gradient components from the current theta before updating either
        grad0 = np.mean(h(theta, X) - y)
        grad1 = np.mean((h(theta, X) - y) * X[:, 1].reshape([-1, 1]))
        theta[0] = theta[0] - alpha * grad0
        theta[1] = theta[1] - alpha * grad1
        cost.append(J(theta, X, y))
    return theta, cost
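For reference, here is an equivalent vectorized sketch that follows the matrix form of the update given earlier, computing both components of θ in one expression (my own variant, not the implementation used in the rest of this post):

def GradientDescentVec(theta, X, y, iterations, alpha):
    cost = [J(theta, X, y)]
    m = len(y)
    for i in range(iterations):
        # theta := theta - (alpha / m) * X^T (X theta - y)
        theta = theta - (alpha / m) * np.dot(X.T, h(theta, X) - y)
        cost.append(J(theta, X, y))
    return theta, cost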

theta_result, cost_result = GradientDescent(theta, X, y, iterations, alpha)
print(theta_result)
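As the exercise text suggests, a good way to verify the implementation is to check that the recorded cost never increases from one iteration to the next:

assert all(cost_result[i + 1] <= cost_result[i] for i in range(len(cost_result) - 1))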

# Predict profits for populations of 35,000 and 70,000 (inputs are in units of 10,000s)
predict1 = np.dot(np.array([1, 3.5]), theta_result)
predict2 = np.dot(np.array([1, 7]), theta_result)
print(predict1, predict2)

The theta_result obtained here is [[-3.63029144] [ 1.16636235]], and predict1 and predict2 are [0.45197679] and [4.53424501], i.e., the predicted profits (in units of $10,000) for populations of 35,000 and 70,000.
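As an extra check (my own addition, not part of the exercise), the result can be compared against NumPy's built-in least-squares solver, which computes the optimum in closed form. The two should be close but not identical, since 1500 iterations of gradient descent at this learning rate will not have fully converged:

theta_exact = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_exact)  # should be near theta_result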

Next we plot the fitted univariate linear regression line; the code and result are as follows:

# Plot the fitted line: two endpoints are enough to draw a straight line
x_predict = [X[:, 1].min(), X[:, 1].max()]
y_predict = [theta_result[0] + theta_result[1] * x for x in x_predict]
plt.plot(x_predict, y_predict, c='b', label='predict')
plt.scatter(data1[:, 0], data1[:, 1], c='r', marker='x', label='data')
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.legend()
plt.show()

Since this is univariate linear regression, the fit is a single straight line, so we only need the two points at the minimum and maximum population values to draw the line over the data.
[Figure 2: training data with the fitted linear regression line]
Finally, here is a plot of the cost J(θ) over the iterations:

plt.plot(cost_result)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()

[Figure: cost J(θ) versus iteration number]
With that, the first assignment of Andrew Ng's machine learning course, univariate linear regression, is complete!