sklearn RandomizedSearchCV with a Pipeline containing a KerasClassifier
I am doing hyperparameter tuning on a Keras model with sklearn. I am trying to optimize KerasClassifiers that sit inside a Pipeline ... The code is as follows:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold,RandomizedSearchCV
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.pipeline import Pipeline
my_seed=7
dataframe = pd.read_csv("z:/sonar.all-data.txt", header=None)
dataset = dataframe.values
# split into input and output variables
X = dataset[:,:60].astype(float)
Y = dataset[:,60]
encoder = LabelEncoder()
Y_encoded=encoder.fit_transform(Y)
myScaler = StandardScaler()
X_scaled = myScaler.fit_transform(X)
def create_keras_model(hidden=60):
    model = Sequential()
    model.add(Dense(units=hidden, input_dim=60, kernel_initializer="normal", activation="relu"))
    model.add(Dense(1, kernel_initializer="normal", activation="sigmoid"))
    # compile model
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
def create_pipeline(hidden=60):
    steps = []
    steps.append(('scaler', StandardScaler()))
    steps.append(('dl', KerasClassifier(build_fn=create_keras_model, hidden=hidden, verbose=0)))
    pipeline = Pipeline(steps)
    return pipeline
my_neurons = [15, 30, 60]
my_epochs= [50, 100, 150]
my_batch_size = [5,10]
my_param_grid = dict(hidden=my_neurons, epochs=my_epochs, batch_size=my_batch_size)
model2Tune = KerasClassifier(build_fn=create_keras_model, verbose=0)
model2Tune2 = create_pipeline()
griglia = RandomizedSearchCV(estimator=model2Tune, param_distributions = my_param_grid, n_iter=8)
griglia.fit(X_scaled, Y_encoded) #this works
griglia2 = RandomizedSearchCV(estimator=create_pipeline, param_distributions = my_param_grid, n_iter=8)
griglia2.fit(X, Y_encoded) #this does not
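As you can see, RandomizedSearchCV works with griglia, but it does not work with griglia2, which returns
"TypeError: estimator should be an estimator implementing 'fit' method, was passed".
Is it possible to modify the code so that it runs with a Pipeline object?
Thanks in advance.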
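The estimator parameter expects an estimator object, not a reference to a function. Right now you are passing the function that builds the pipeline rather than the pipeline itself. Add () so the function is called: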
griglia2 = RandomizedSearchCV(estimator=create_pipeline(), param_distributions = my_param_grid, n_iter=8)
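To see why the parentheses matter, here is a minimal sketch. It assumes LogisticRegression as a stand-in for KerasClassifier (so it runs without Keras) and uses a plain hasattr check, which is roughly what the error message is complaining about.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def create_pipeline():
    # LogisticRegression is a stand-in for KerasClassifier (assumption),
    # used only so this sketch runs without Keras installed.
    return Pipeline([("scaler", StandardScaler()), ("dl", LogisticRegression())])


# The function itself is not an estimator: it has no fit method.
function_has_fit = hasattr(create_pipeline, "fit")

# Calling the function returns a Pipeline instance, which does have fit.
instance_has_fit = hasattr(create_pipeline(), "fit")

print(function_has_fit, instance_has_fit)
```

Passing create_pipeline hands the search a function; passing create_pipeline() hands it the Pipeline that the function returns.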
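Now for your second comment, about the invalid parameter error. You need to prefix each parameter with the name of the step you defined when creating the pipeline, so that it can be passed through to the right object.
See the description of Pipeline usage in the scikit-learn documentation.
Use this: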
my_param_grid = dict(dl__hidden=my_neurons, dl__epochs=my_epochs,
dl__batch_size=my_batch_size)
Note the dl__ prefix (two underscores). This is very useful when you want to tune the parameters of several objects inside a pipeline.
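The prefixed names can be checked with get_params(), as the error message suggests. A small sketch, again assuming LogisticRegression as a stand-in for the KerasClassifier step (so the parameter is C rather than hidden or batch_size):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# "dl" holds a stand-in estimator (assumption); the step names match the question.
pipeline = Pipeline([("scaler", StandardScaler()), ("dl", LogisticRegression())])

# Parameters of a step are exposed as <step name>__<parameter name>.
param_names = set(pipeline.get_params().keys())
prefixed_known = "dl__C" in param_names
unprefixed_known = "C" in param_names

# set_params forwards a prefixed name to the matching step.
pipeline.set_params(dl__C=0.5)
step_value = pipeline.named_steps["dl"].C

# An unprefixed name is rejected, which is the "Invalid parameter" error.
try:
    pipeline.set_params(C=0.5)
    unprefixed_rejected = False
except ValueError:
    unprefixed_rejected = True

print(prefixed_known, unprefixed_known, step_value, unprefixed_rejected)
```

The same lookup on the real pipeline should list dl__hidden, dl__epochs and dl__batch_size, provided the wrapper exposes them through get_params().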
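For example, suppose that along with the parameters above you also want to tune or specify a parameter of the StandardScaler.
Then your parameter grid becomes: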
my_param_grid = dict(dl__hidden=my_neurons, dl__epochs=my_epochs,
                     dl__batch_size=my_batch_size,
                     scaler__with_mean=[True, False])  # values must be lists or distributions
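Putting the pieces together, here is a sketch of a full search over a pipeline with prefixed parameters for both steps. It is not the original Keras setup: it assumes synthetic data from make_classification and LogisticRegression in place of KerasClassifier, so dl__C stands in for dl__hidden and the other Keras parameters.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data instead of the sonar dataset (assumption).
X, y = make_classification(n_samples=120, n_features=10, random_state=7)

# Same step names as in the question; "dl" holds a stand-in estimator.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("dl", LogisticRegression(max_iter=500)),
])

# Every key is <step name>__<parameter name>; every value is a list.
param_grid = {
    "dl__C": [0.1, 1.0, 10.0],
    "scaler__with_mean": [True, False],
}

# Pass the pipeline instance itself, not a function that builds it.
search = RandomizedSearchCV(
    pipeline, param_distributions=param_grid, n_iter=4, cv=3, random_state=7
)
search.fit(X, y)  # raw X: scaling happens inside each CV fold

print(search.best_params_)
```

Because the scaler is inside the pipeline, it is refit on the training part of each fold, so the unscaled X can be passed to fit directly.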
Hope this clears things up.
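Thanks, it works.
The 'estimator' parameter expects an object, not a pointer. Try changing it to 'griglia2 = RandomizedSearchCV(estimator=create_pipeline(), param_distributions=my_param_grid, n_iter=8)'
@VivekKumar, thanks for the initial insight. I still get a (new) error message, which is now "ValueError: Invalid parameter batch_size for estimator Pipeline. Check the list of available parameters with 'estimator.get_params().keys()'."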