Kaggle Competition PetFinder Diary, Day 2: Random Forest, Hyperparameter Tuning

Switched to a random forest and tuned its hyperparameters one at a time (as in the sweep below), picking the value at the peak of each curve. The result improved by one or two percentage points; the internal test score on the training data is now above 4.0.

Since xgboost runs slowly on my machine, I dropped it.

Added per-state population and per-capita GDP figures found online, keyed by the dataset's State ID:

GDPAVG  =  {41336: 99, 41325: 99, 41367: 80, 41401: 662, 41415: 7327, 41324: 582, 41332: 404, 41335: 225, 41330: 119, 41380: 934, 41327: 309, 41345: 73, 41342: 93, 41326: 187, 41361: 255}
Population = {41336: 346, 41325: 204, 41367: 168, 41401: 168, 41415: 9, 41324: 79, 41332: 103, 41335: 157, 41330: 244, 41380: 25, 41327: 161, 41345: 327, 41342: 528, 41326: 256, 41361: 115}
df['GDPAVG'] = df['State'].map(GDPAVG)
df['Population'] = df['State'].map(Population)
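One thing worth checking after a dictionary `map` like this: any `State` code missing from the lookup silently becomes NaN. A minimal sketch on a toy DataFrame (the codes and values are a subset of the tables above; `99999` is a made-up unmapped code for illustration):

```python
import pandas as pd

# Subset of the per-state lookup tables above
GDPAVG = {41336: 99, 41325: 99, 41415: 7327}
Population = {41336: 346, 41325: 204, 41415: 9}

df = pd.DataFrame({'State': [41336, 41415, 99999]})  # 99999: not in the lookups
df['GDPAVG'] = df['State'].map(GDPAVG)
df['Population'] = df['State'].map(Population)

# States absent from the dictionary map to NaN; flag them before training,
# since most sklearn estimators reject NaN inputs
missing = df[df['GDPAVG'].isna()]
print(len(missing))  # → 1
```

A `fillna` with a sensible default (or dropping those rows) would then keep the model input clean.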


import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

# Sweep min_samples_leaf while holding the other tuned parameters fixed,
# then plot the test-set score at each value to locate the peak.
scores = []
leaf_range = range(2, 20)
for i in leaf_range:
    rfc = RandomForestClassifier(n_estimators=230,
                                 max_depth=11,
                                 max_features=4,
                                 min_samples_split=10,
                                 random_state=10,
                                 min_samples_leaf=i)
    rfc.fit(x_train, y_train)
    scores.append(rfc.score(x_test, y_test))

plt.plot(leaf_range, scores, color="red", label="min_samples_leaf")
plt.legend()
plt.show()
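The same sweep can be delegated to scikit-learn's GridSearchCV, which cross-validates each candidate instead of scoring a single held-out split. A sketch on synthetic stand-in data (in the diary this would be x_train / y_train; the parameter values mirror the loop above, with fewer trees to keep the demo fast):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data; replace with the real x_train / y_train
X, y = make_classification(n_samples=500, n_features=10, random_state=10)

param_grid = {'min_samples_leaf': list(range(2, 20))}
rfc = RandomForestClassifier(n_estimators=50,   # 230 in the real run
                             max_depth=11,
                             max_features=4,
                             min_samples_split=10,
                             random_state=10)

# 3-fold cross-validation over every candidate min_samples_leaf
search = GridSearchCV(rfc, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Cross-validated scores are less noisy than a single train/test split, so the peak found this way is less likely to be an artifact of one particular split.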