Binary Classification in Java with Logistic Regression and the Sigmoid Function
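First, a quick introduction to Logistic regression:
Logistic regression, also called logistic regression analysis, is a generalized linear model. It measures the relationship between a dependent variable and one or more independent variables (features) by estimating probabilities with its logistic function. It is most often used for binary classification, where the model outputs a single scalar probability.
Next, a quick introduction to the Sigmoid function:
The Sigmoid function is an S-shaped function that is common in biology, where it is also known as the S-shaped growth curve. In information science, because both the function and its inverse are monotonically increasing, it is often used as the threshold (activation) function of neural networks, mapping a variable into the interval (0, 1).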
The Sigmoid function is defined by the following formula:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
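As a quick numeric check of the properties described above, the following sketch evaluates the function at a few points. The class name SigmoidCheck and the chosen inputs are only illustrative and are not part of the original code.

```java
// Illustrative sketch; SigmoidCheck is a hypothetical name, not from the original post.
class SigmoidCheck {
    // Standard logistic function: 1 / (1 + e^(-x)).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double[] inputs = {-6.0, -2.0, 0.0, 2.0, 6.0};
        for (double x : inputs) {
            // The outputs increase with x and stay between 0 and 1.
            System.out.println("sigmoid(" + x + ") = " + sigmoid(x));
        }
        // Symmetry: sigmoid(x) + sigmoid(-x) equals 1 up to rounding.
        System.out.println(sigmoid(2.0) + sigmoid(-2.0));
    }
}
```

The value at 0 is exactly 0.5, which is why 0.5 is the natural threshold when the output is turned into a class label.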
We found the following Python code in reference material (it was originally written for Python 2; I made some changes to adapt it to Python 3):
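from numpy import exp, ones, random, shape

def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

# Iteratively estimate the optimal regression parameters
def stocGradAscentl(dataMatrix, classLabels, numIter=500):
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            # Step size decays over time but never drops below 0.01
            alpha = 4 / (1.0 + j + i) + 0.01
            randIndex = int(random.uniform(0, len(dataIndex)))
            # Map the random position to an unused sample so each sample is visited once per pass
            sampleIndex = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[sampleIndex] * weights))
            error = classLabels[sampleIndex] - h
            tmp = []
            for k in range(len(dataMatrix[sampleIndex])):
                tmp.append(alpha * error * dataMatrix[sampleIndex][k])
            weights = weights + tmp
            del(dataIndex[randIndex])
    return weights

# Binary classification
def classifyVeotor(inX, weights):
    prob = sigmoid(sum(inX * weights))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0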
This Python code estimates the optimal Logistic regression parameters with the stochastic gradient ascent algorithm.
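For reference, the update used inside the loop follows from the log-likelihood of the logistic model. The notation below (w for the weight vector, x_i for one sample row, y_i for its 0/1 label, h_i for the predicted probability, and \alpha for the step size) is introduced here for illustration and does not appear in the original code.

```latex
\begin{aligned}
h_i &= \sigma(w^\top x_i) = \frac{1}{1 + e^{-w^\top x_i}} \\
\ell_i(w) &= y_i \log h_i + (1 - y_i) \log(1 - h_i) \\
\nabla_w \ell_i(w) &= y_i (1 - h_i)\, x_i - (1 - y_i)\, h_i\, x_i = (y_i - h_i)\, x_i \\
w &\leftarrow w + \alpha\, (y_i - h_i)\, x_i
\end{aligned}
```

The last line matches the code: error is the label minus h, and each weight grows by alpha * error times the corresponding feature. It is an ascent step because it increases the log-likelihood of the sampled row.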
Now let's implement it in Java.
First, the Sigmoid function:
public static double sigmoid(double src) {
    return 1.0 / (1 + Math.exp(-src));
}
Next comes the iterative estimation of the optimal parameters:
// Dot product of row inx of the matrix with the weight vector
public static double sumMatrixRowMutiWeight(DenseMatrix64F src, int inx, double[] weights) {
    double rs = 0;
    for (int i = 0; i < src.numCols; i++) {
        rs += src.get(inx, i) * weights[i];
    }
    return rs;
}

// Returns new weights after one update: weights + alpha * error * row inx
public static double[] adjustWeights(DenseMatrix64F src, int inx, double[] weights, double alpha, double error) {
    double[] rs = new double[weights.length];
    for (int i = 0; i < weights.length; i++) {
        rs[i] = alpha * error * src.get(inx, i) + weights[i];
    }
    return rs;
}
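// Requires java.util.ArrayList, java.util.List, java.util.Random and org.ejml.data.DenseMatrix64F
public static double[] stocGradAscentl(DenseMatrix64F datas, double[] classLabels, int numIter) {
    Random r = new Random();
    // Start with all weights set to 1
    double[] rs = new double[datas.numCols];
    for (int i = 0; i < rs.length; i++)
        rs[i] = 1;
    for (int i = 0; i < numIter; i++) {
        // Rows not yet used in this pass (mirrors dataIndex in the Python code)
        List<Integer> dataIndex = new ArrayList<Integer>();
        for (int k = 0; k < datas.numRows; k++)
            dataIndex.add(k);
        for (int j = 0; j < datas.numRows; j++) {
            // Step size decays over time but never drops below 0.01
            double alpha = 4 / (1.0 + j + i) + 0.01;
            // Draw one unused row at random and remove it from the pool
            int randIndex = dataIndex.remove(r.nextInt(dataIndex.size()));
            double h = sigmoid(sumMatrixRowMutiWeight(datas, randIndex, rs));
            double error = classLabels[randIndex] - h;
            rs = adjustWeights(datas, randIndex, rs, alpha, error);
        }
    }
    return rs;
}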
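To make a single update concrete, the sketch below applies one step by hand. It uses plain arrays rather than an EJML matrix; the class name SingleStepDemo, the method step, and the step size 0.1 are assumptions chosen for easy checking, not values taken from the training loop.

```java
// Illustrative sketch; SingleStepDemo and step are hypothetical names.
class SingleStepDemo {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // One stochastic gradient ascent update: w <- w + alpha * (label - h) * x.
    static double[] step(double[] weights, double[] x, double label, double alpha) {
        double z = 0.0;
        for (int i = 0; i < x.length; i++) {
            z += x[i] * weights[i];
        }
        double error = label - sigmoid(z);
        double[] updated = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            updated[i] = weights[i] + alpha * error * x[i];
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 1.0, 1.0}; // same starting point as the training loop
        double[] x = {1.0, 0.8, 0.9};       // one of the rows labelled 0 in the test data
        double[] updated = step(weights, x, 0.0, 0.1);
        // h = sigmoid(2.7) is about 0.937, so error is about -0.937 and every weight shrinks.
        for (double w : updated) {
            System.out.println(w);
        }
    }
}
```

Because the row is labelled 0 while the initial weights predict a probability close to 1, the error is negative and the update pushes all three weights down, roughly in proportion to each feature value.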
Finally, the classifier:
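public static double classifyVeotor(double[] intX, double[] weights) {
    // Weighted sum of the features, squashed into a probability
    double prob = 0;
    for (int i = 0; i < intX.length; i++) {
        prob += intX[i] * weights[i];
    }
    prob = sigmoid(prob);
    // Threshold at 0.5 to obtain the class label
    if (prob > 0.5)
        return 1.0;
    else
        return 0.0;
}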
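One way to judge how well a weight vector separates a labelled set is to count how often the thresholded prediction matches the label. The sketch below assumes plain double[][] rows instead of an EJML matrix; the class name AccuracyDemo, the methods probability and accuracy, and the weights in main are hypothetical and hand-picked, not the output of training.

```java
// Illustrative sketch; all names and the weights below are hypothetical.
class AccuracyDemo {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Probability that x belongs to class 1 under the given weights.
    static double probability(double[] x, double[] weights) {
        double z = 0.0;
        for (int i = 0; i < x.length; i++) {
            z += x[i] * weights[i];
        }
        return sigmoid(z);
    }

    // Fraction of rows whose thresholded prediction (prob > 0.5) matches the label.
    static double accuracy(double[][] rows, double[] labels, double[] weights) {
        int correct = 0;
        for (int i = 0; i < rows.length; i++) {
            double predicted = probability(rows[i], weights) > 0.5 ? 1.0 : 0.0;
            if (predicted == labels[i]) {
                correct++;
            }
        }
        return (double) correct / rows.length;
    }

    public static void main(String[] args) {
        double[][] rows = {
            {1.0, 1.5, 1.6}, {1.0, 1.25, 1.16},
            {1.0, 0.9, 1.11}, {1.0, 0.4, 0.6}
        };
        double[] labels = {1, 1, 0, 0};
        double[] weights = {-10.0, 5.0, 4.0}; // hand-picked for illustration
        System.out.println(accuracy(rows, labels, weights));
    }
}
```

Returning the probability separately from the label also makes it easier to see how confident the model is about a sample, which the 0/1 output alone hides.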
At this point we can run a test:
// Each row: a constant 1.0 (intercept term) followed by two features
double data[][] = {
    {1.0, 1.5, 1.6},
    {1.0, 1.4, 1.45},
    {1.0, 1.3, 1.34},
    {1.0, 1.23, 1.5},
    {1.0, 1.33, 1.36},
    {1.0, 1.4, 1.61},
    {1.0, 1.25, 1.56},
    {1.0, 1.45, 1.46},
    {1.0, 1.25, 1.16},
    {1.0, 1.35, 1.46},
    {1.0, 0.8, 0.9},
    {1.0, 0.9, 1.11},
    {1.0, 0.72, 0.88},
    {1.0, 0.75, 0.9},
    {1.0, 0.78, 0.97},
    {1.0, 0.86, 0.95},
    {1.0, 0.45, 0.7},
    {1.0, 0.8, 0.94},
    {1.0, 0.4, 0.6},
    {1.0, 0.5, 0.7}
};
DenseMatrix64F matdatas = new DenseMatrix64F(data);
// The first ten rows belong to class 1, the last ten to class 0
double[] labelMat = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
double[] weights = stocGradAscentl(matdatas, labelMat, 150);
for (int i = 0; i < weights.length; i++)
    System.out.println("weights " + i + ":" + weights[i]);
System.out.println(classifyVeotor(new double[]{1.0, 1.6, 1.7}, weights));
System.out.println(classifyVeotor(new double[]{1.0, 0.6, 0.7}, weights));
The test results are correct, as shown below: