Logistic regression on the MNIST dataset
Question:
Following this, you can find a very good tutorial on how to apply an SVM classifier to the MNIST dataset. I wondered whether I could use logistic regression instead of the SVM classifier, so I searched for logistic regression in OpenCV and found that the two classifiers have almost identical syntax. So I figured I could simply comment out the following part:
cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
svm->setType(cv::ml::SVM::C_SVC);
svm->setKernel(cv::ml::SVM::POLY);//LINEAR, RBF, SIGMOID, POLY
svm->setTermCriteria(cv::TermCriteria(cv::TermCriteria::MAX_ITER, 100, 1e-6));
svm->setGamma(3);
svm->setDegree(3);
svm->train(trainingMat, cv::ml::ROW_SAMPLE, labelsMat);
and replace it with:
cv::Ptr<cv::ml::LogisticRegression> lr1 = cv::ml::LogisticRegression::create();
lr1->setLearningRate(0.001);
lr1->setIterations(10);
lr1->setRegularization(cv::ml::LogisticRegression::REG_L2);
lr1->setTrainMethod(cv::ml::LogisticRegression::BATCH);
lr1->setMiniBatchSize(1);
lr1->train(trainingMat, cv::ml::ROW_SAMPLE, labelsMat);
But first, I got this error: OpenCV Error: Bad argument (data and labels must be floating-point matrices)
So I changed
cv::Mat labelsMat(labels.size(), 1, CV_32S, labelsArray);
to:
cv::Mat labelsMat(labels.size(), 1, CV_32F, labelsArray);
Now I get this error: OpenCV Error: Bad argument (data should have atleast 2 classes)
I have 10 classes (0, 1, ..., 9), but I don't know why I'm getting this error. My code is almost identical to the code in the tutorial above.
Answer:
In Python, you can do something like this:
import matplotlib.pyplot as plt
# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.linear_model import LogisticRegression
# The digits dataset
digits = datasets.load_digits()
# The data that we are interested in is made of 8x8 images of digits, let's
# have a look at the first 3 images, stored in the `images` attribute of the
# dataset. If we were working from image files, we could load them using
# pylab.imread. Note that each image must have the same size. For these
# images, we know which digit they represent: it is given in the 'target' of
# the dataset.
images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)
# To apply a classifier on this data, we need to flatten the image, to
# turn the data in a (samples, feature) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
Choose whichever you prefer below:
# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma=0.001)
# create a Logistic Regression Classifier
classifier = LogisticRegression(C=1.0)
# We learn the digits on the first half of the digits
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
# Now predict the value of the digit on the second half:
expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])
print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)
plt.show()
You can see the whole code here.
One possible cause: you are *reinterpreting* the integer values in `labelsArray` as floats instead of converting them. Try it this way and let me know: `cv::Mat labelsMat(labels.size(), 1, CV_32S, labelsArray); labelsMat.convertTo(labelsMat, CV_32F);` (and the same for the data) – Miki
@Miki That works perfectly. Thank you – MoNo
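The distinction in Miki's comment can be illustrated with NumPy (a sketch, independent of OpenCV): constructing a `cv::Mat` with `CV_32F` over an `int32` buffer reinterprets the raw bytes as floats, while `convertTo` performs a numeric cast. Reinterpreted small integers all become denormal floats near zero, which is plausibly why OpenCV then complains that the data has fewer than two classes.

```python
import numpy as np

# Integer labels 0..9, as they would sit in labelsArray (32-bit ints).
labels = np.arange(10, dtype=np.int32)

# Reinterpreting the raw bytes as float32 (what wrapping an int32 buffer
# with CV_32F does) yields denormal values that are all essentially zero.
reinterpreted = labels.view(np.float32)

# An explicit conversion (what convertTo does) preserves the values.
converted = labels.astype(np.float32)

print(reinterpreted)  # tiny denormal values, all ~0
print(converted)      # [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
```

With every reinterpreted label collapsing to roughly 0.0, the training routine effectively sees a single class, matching the error message above.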