Challenge: Machine Learning Basics

1: How Challenges Work

At Dataquest, we're huge believers in learning through doing and we hope this shows in the learning experience of the missions. While missions focus on introducing concepts, challenges allow you to perform deliberate practice by completing structured problems. You can read more about deliberate practice here and here. Challenges will feel similar to missions but with little instructional material and a larger focus on exercises.

For these challenges, we strongly encourage programming on your own computer so you practice using these tools outside the Dataquest environment. You can also use the Dataquest interface to write and quickly run code to see if you’re on the right track. By default, clicking the check code button runs your code and performs answer checking. You can toggle this behavior so that your code is run and the results are returned, without performing any answer checking. Executing your code without performing answer checking is much quicker and allows you to iterate on your work. When you’re done and ready to check your answer, toggle the behavior so that answer checking is enabled.

If you have questions or run into issues, head over to the Dataquest forums or our Slack community.

2: Data Cleaning

In this challenge, you'll build on the exploration from the last mission, where we tried to answer the question:

  • How do the properties of a car impact its fuel efficiency?

We focused the last mission on capturing how the weight of a car affects its fuel efficiency by fitting a linear regression model. In this challenge, you'll explore how the horsepower of a car affects its fuel efficiency and practice using scikit-learn to fit the linear regression model.

Unlike the weight column, the horsepower column has some missing values. These values are represented using the ? character. Let's filter out these rows so we can fit the model. We've already read auto-mpg.data into a DataFrame named cars.

Instructions

  • Remove all rows where the value for horsepower is ? and convert the horsepower column to a float.
  • Assign the new DataFrame to filtered_cars.

import pandas as pd
columns = ["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model year", "origin", "car name"]
# The file is whitespace-delimited and has no header row.
cars = pd.read_table("auto-mpg.data", sep=r"\s+", names=columns)
# Keep only rows with a known horsepower; .copy() avoids a SettingWithCopyWarning on the next line.
filtered_cars = cars[cars["horsepower"] != "?"].copy()
filtered_cars["horsepower"] = filtered_cars["horsepower"].astype("float")
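As an alternative sketch of the same cleaning step, pd.to_numeric can turn the ? placeholders into NaN, which can then be dropped. The small sample DataFrame below is made up for illustration and is not taken from auto-mpg.data:

```python
import pandas as pd

# Hypothetical sample standing in for the real data set.
sample = pd.DataFrame({
    "mpg": [18.0, 25.0, 21.0, 30.0],
    "horsepower": ["130.0", "?", "95.0", "?"],
})

# Non-numeric strings such as "?" become NaN instead of raising an error.
sample["horsepower"] = pd.to_numeric(sample["horsepower"], errors="coerce")

# Drop the rows whose horsepower could not be parsed.
cleaned = sample.dropna(subset=["horsepower"])

print(cleaned)
print(cleaned["horsepower"].dtype)
```

This variant also catches any other non-numeric placeholder, which may or may not be what you want; the explicit comparison against ? used above is stricter.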

3: Data Exploration

Now that the horsepower values are cleaned, generate a scatter plot that visualizes the relationship between the horsepower values and the mpg values. Let's compare this to the scatter plot that visualizes weight against mpg.

Instructions

  • Use the DataFrame plot method to generate 2 scatter plots, in vertical order:
    • On the top plot, generate a scatter plot with the horsepower column on the x-axis and the mpg column on the y-axis.
    • On the bottom plot, generate a scatter plot with the weight column on the x-axis and the mpg column on the y-axis.

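import matplotlib.pyplot as plt
%matplotlib inline
# Two stacked axes: horsepower vs. mpg on top, weight vs. mpg on the bottom.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(6, 8))
filtered_cars.plot("horsepower", "mpg", kind="scatter", ax=ax_top)
filtered_cars.plot("weight", "mpg", kind="scatter", ax=ax_bottom)
plt.show()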

4: Fitting A Model

While it's hard to directly compare the plots since the scales for the x-axes are very different, there does seem to be some relationship between a car's horsepower and its fuel efficiency. Let's fit a linear regression model using the horsepower values to get a quantitative understanding of the relationship.

Instructions

  • Create a new instance of the LinearRegression model and assign it to lr.
  • Use the fit method to fit a linear regression model using the horsepower column as the input.
  • Use the model to make predictions on the same data the model was trained on (the horsepower column from filtered_cars) and assign the resulting predictions to predictions.
  • Display the first 5 values in predictions and the first 5 values in the mpg column from filtered_cars.

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# Double brackets select a one-column DataFrame, since fit expects 2-D input for the features.
lr.fit(filtered_cars[["horsepower"]], filtered_cars["mpg"])
# Predict on the same rows the model was trained on.
predictions = lr.predict(filtered_cars[["horsepower"]])
print(predictions[0:5])
print(filtered_cars["mpg"][0:5].values)

Output

[ 19.41604569 13.89148002 16.25915102 16.25915102 17.83759835]

[ 18. 15. 18. 16. 17.]
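To read the relationship off a fitted model, you can inspect the coef_ and intercept_ attributes of LinearRegression, which give the slope and intercept of the fitted line. The sketch below uses made-up points that lie exactly on the line mpg = 40 - 0.15 * horsepower; the numbers are an assumption chosen for illustration, not values from the car data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on mpg = 40 - 0.15 * horsepower.
toy = pd.DataFrame({
    "horsepower": [50.0, 100.0, 150.0, 200.0],
    "mpg": [32.5, 25.0, 17.5, 10.0],
})

model = LinearRegression()
model.fit(toy[["horsepower"]], toy["mpg"])

# coef_ holds one slope per input column; intercept_ is the constant term.
slope = model.coef_[0]
intercept = model.intercept_

print(slope, intercept)
```

A negative slope means predicted fuel efficiency falls as horsepower rises. Running the same two attribute lookups on lr would show the slope and intercept for the real data.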

5: Plotting The Predictions

In the last mission, we plotted the predicted values and the actual values on the same plot to visually understand the model's effectiveness. Let's repeat that here for the horsepower model's predictions.

Instructions

  • Generate 2 scatter plots on the same chart (Matplotlib axes instance):
    • One containing the horsepower values on the x-axis against the predicted fuel efficiency values on the y-axis. Use blue for the color of the dots.
    • One containing the horsepower values on the x-axis against the actual fuel efficiency values on the y-axis. Use red for the color of the dots.

import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(filtered_cars["horsepower"],predictions,c="blue")
plt.scatter(filtered_cars["horsepower"],filtered_cars["mpg"],c="red")
plt.show()

6: Error Metrics

To evaluate how well the model fits the data, you can compute the MSE and RMSE values for the model. Then, you can compare the MSE and RMSE values with those from the model you fit in the last mission. Recall that the model you fit in the previous mission captured the relationship between the weight of a car (weight column) and its fuel efficiency (mpg column).

Instructions

  • Calculate the MSE of the predicted values and assign to mse.
  • Calculate the RMSE of the predicted values and assign to rmse.

 

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(filtered_cars["mpg"], predictions)
print(mse)
rmse = mse ** 0.5
print(rmse)
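To make the two metrics concrete, here is a minimal sketch that computes them by hand: MSE is the average of the squared differences between actual and predicted values, and RMSE is its square root. It uses only the five predicted and actual values displayed in the output of step 4, so the result will not match the MSE for the full data set:

```python
import math

# First five predictions and actual mpg values shown in step 4.
predicted = [19.41604569, 13.89148002, 16.25915102, 16.25915102, 17.83759835]
actual = [18.0, 15.0, 18.0, 16.0, 17.0]

# MSE: mean of the squared errors.
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
mse_by_hand = sum(squared_errors) / len(squared_errors)

# RMSE: square root of the MSE, expressed in the same units as mpg.
rmse_by_hand = math.sqrt(mse_by_hand)

print(mse_by_hand)
print(rmse_by_hand)
```

Because RMSE is in miles per gallon rather than squared miles per gallon, it is usually the easier of the two to interpret.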

7: Next Steps

The MSE for the model from the last mission was 18.78 while the RMSE was 4.33. Here's a table comparing the approximate measures for both models:

 

        Weight    Horsepower
MSE     18.78     23.94
RMSE    4.33      4.89

 

If we could use only one input to our model, the weight values would be the better choice for predicting fuel efficiency, because that model has the lower MSE and RMSE values. There's still a lot more to do before we have a reliable, working model to predict fuel efficiency, however. In later missions, we'll learn how to use multiple features to build a more reliable predictive model.

 

Reposted from: https://my.oschina.net/Bettyty/blog/751301