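Discriminant Analysis with R
- Concept of discriminant analysis
  - Uses
  - Types
- Fisher discriminant method
  - Origin
  - Fisher linear discriminant function
- Distance discriminant method
  - Two populations
  - Multiple populations
- Bayes discriminant method
  - Bayes discriminant criterion
  - Bayes discrimination for normal populations
- Objectives:
  - Understand the purpose of discriminant analysis and the statistical ideas behind it
  - Know the three types of discriminant analysis
  - Master the decision rules and discriminant functions of the different methods
  - Apply them in R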
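1 Introduction to discriminant analysis
Linear discriminant function
1.1 Concept
Discriminant analysis: a statistical method for determining which class a sample belongs to.
1.2 Method
Given a set of samples whose classes are already known, the method selects a discriminant criterion and uses it to assign each new sample to one of those classes.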
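The three steps described above (labelled samples, a criterion, a decision for a new sample) can be sketched with the simplest possible criterion, assignment to the nearest group mean. This is only an illustration: the built-in iris data stands in for the labelled sample, and the helper name classify, the chosen variables and the new measurements are all hypothetical.

```r
# Illustration only: the built-in iris data stands in for the labelled sample.
train <- iris[, c("Sepal.Length", "Petal.Length", "Species")]

# Step 1: summarise each known class by its mean vector.
centers <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                     data = train, FUN = mean)

# Step 2: the criterion (hypothetical helper). Assign a sample to the class
# whose mean is closest in squared Euclidean distance.
classify <- function(new, centers) {
  m  <- as.matrix(centers[, -1])                  # one row of means per class
  d2 <- rowSums(sweep(m, 2, as.numeric(new))^2)   # squared distance to each mean
  as.character(centers$Species[which.min(d2)])
}

# Step 3: classify a new, unlabelled sample (made-up measurements).
classify(c(Sepal.Length = 5.0, Petal.Length = 1.5), centers)
```

The methods discussed in these notes replace the Euclidean rule with better-founded criteria, but the overall workflow stays the same.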
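1.3 Types
- Deterministic discrimination: Fisher-type
  - Linear
  - Distance-based
  - Nonlinear
- Probabilistic discrimination: Bayes-type
  - Probability-based
  - Loss-based
1.4 Case study
1.4.1 Graphical analysis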
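# Read the data previously copied to the system clipboard
d6.1 = read.table('clipboard', header = TRUE)
# Compare each variable across the two groups: box plot, then Welch t-test
boxplot(x1 ~ G, d6.1)
t.test(x1 ~ G, d6.1)
boxplot(x2 ~ G, d6.1)
t.test(x2 ~ G, d6.1)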
Welch Two Sample t-test
data: x1 by G
t = 0.59897, df = 11.671, p-value = 0.5606
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.443696 6.043696
sample estimates:
mean in group 1 mean in group 2
0.92 -0.38
Welch Two Sample t-test
data: x2 by G
t = -3.2506, df = 17.655, p-value = 0.004527
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.118792 -2.381208
sample estimates:
mean in group 1 mean in group 2
2.10 8.85
1.4.2 Logistic model analysis
summary(glm(G-1~x1+x2,family=binomial,d6.1))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.81637 -0.63629 0.04472 0.54520 2.13957
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.0761 1.1082 -1.873 0.0610 .
x1 -0.1957 0.1457 -1.344 0.1791
x2 0.3813 0.1681 2.269 0.0233 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 27.726 on 19 degrees of freedom
Residual deviance: 17.036 on 17 degrees of freedom
AIC: 23.036
Number of Fisher Scoring iterations: 5
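The summary reports coefficients but does not by itself classify anything. As a minimal sketch of how fitted probabilities from a logistic model could be turned into class labels and an agreement rate, the example below uses two species from the built-in iris data as a stand-in, because d6.1 is only available from the clipboard. The 0.5 cut-off and the choice of predictors are assumptions made for illustration.

```r
# Stand-in data: two iris species play the role of groups 1 and 2.
two <- droplevels(subset(iris, Species != "setosa"))

# Logistic model; for a factor response glm treats the first level as 0.
fit <- glm(Species ~ Sepal.Length + Sepal.Width, family = binomial, data = two)

# Fitted probability of the second level (virginica).
p <- predict(fit, type = "response")

# Assumed rule: classify as the second level when the probability exceeds 0.5.
pred <- factor(ifelse(p > 0.5, levels(two$Species)[2], levels(two$Species)[1]),
               levels = levels(two$Species))

# Cross-tabulate actual vs. predicted classes and compute the agreement rate.
tab <- table(actual = two$Species, predicted = pred)
tab
sum(diag(prop.table(tab)))
```

Fixing the factor levels of pred keeps the table square even if one class is never predicted, so diag() always picks out the correctly classified counts.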
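1.4.3 Fisher discriminant analysis
Usage of the linear discriminant analysis function lda:
lda(formula, data, ...)
formula is a model formula of the form y~x1+x2+..., and data is the data set.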
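For reference, the criterion usually associated with Fisher's method can be written as below. The symbols are notation introduced here and do not appear in the original notes: a is the projection direction, B and W are the between-group and within-group scatter matrices, S_p is the pooled covariance matrix, and \bar{x}_1, \bar{x}_2 are the group mean vectors.

```latex
\max_{a \neq 0} \; \frac{a^{\top} B a}{a^{\top} W a},
\qquad
\text{two groups: } a \propto S_p^{-1}\,(\bar{x}_1 - \bar{x}_2),
\quad y = a^{\top} x .
```

The direction a is only defined up to a scale factor, so different software may report coefficients that differ by a constant multiple.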
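# Visual inspection
attach(d6.1)  # attach the data so that x1, x2 and G can be used directly
plot(x1, x2)
text(x1, x2, G, adj = -0.5)  # label each point with its class G
library(MASS)  # provides lda and qda
ld = lda(G ~ x1 + x2)
ld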
Call:
lda(G ~ x1 + x2)
Prior probabilities of groups:
1 2
0.5 0.5
Group means:
x1 x2
1 0.92 2.10
2 -0.38 8.85
Coefficients of linear discriminants:
LD1
x1 -0.1035305
x2 0.2247957
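The coefficients printed above define the discriminant score LD1 as a linear combination of centred x1 and x2. The sketch below checks, on stand-in data from the built-in iris set, that the scores returned by predict() can be reproduced by hand from the scaling component; the centring by the prior-weighted group means reflects how predict() in MASS is understood to work and should be treated as an assumption to verify on your own installation.

```r
library(MASS)

# Stand-in data: two iris species play the role of groups 1 and 2.
two  <- droplevels(subset(iris, Species != "setosa"))
vars <- c("Sepal.Length", "Sepal.Width")
fit  <- lda(Species ~ Sepal.Length + Sepal.Width, data = two)

# Centre the predictors at the prior-weighted average of the group means,
# then project onto the discriminant direction stored in fit$scaling.
center        <- colSums(fit$prior * fit$means)
X             <- as.matrix(two[, vars])
scores_manual <- scale(X, center = center, scale = FALSE) %*% fit$scaling

# Scores as computed by predict().
scores_lda <- predict(fit, two)$x

# Compare the two sets of scores, ignoring row and column names.
all.equal(unname(scores_manual), unname(scores_lda))
```

With two groups there is a single discriminant function, so both objects are one-column matrices.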
1.4.4 Prediction
lp = predict(ld)
G1 = lp$class
data.frame(G,G1)
G G1
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 2
7 1 1
8 1 1
9 1 1
10 1 1
11 2 2
12 2 2
13 2 2
14 2 2
15 2 1
16 2 2
17 2 2
18 2 2
19 2 2
20 2 2
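# Cross-tabulate the actual classes G against the predicted classes G1
tab1 = table(G, G1)
tab1
#    G1
# G   1 2
#   1 9 1
#   2 1 9
# Compute the agreement rate (proportion classified correctly)
sum(diag(prop.table(tab1)))
# 0.9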
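The rate of 0.9 is a resubstitution rate: the same 20 observations are used both to fit the rule and to evaluate it, which tends to be optimistic. One possible check is leave-one-out cross-validation, which lda supports through its CV argument. The sketch below uses stand-in data from the built-in iris set rather than d6.1, so its numbers are not comparable with the table above.

```r
library(MASS)

# Stand-in data: two iris species play the role of groups 1 and 2.
two <- droplevels(subset(iris, Species != "setosa"))

# Resubstitution: fit and evaluate on the same observations.
fit   <- lda(Species ~ Sepal.Length + Sepal.Width, data = two)
resub <- mean(predict(fit, two)$class == two$Species)

# Leave-one-out: with CV = TRUE, lda returns a list whose class component
# holds the prediction for each observation when it is left out of the fit.
cv  <- lda(Species ~ Sepal.Length + Sepal.Width, data = two, CV = TRUE)
loo <- mean(cv$class == two$Species)

c(resubstitution = resub, leave_one_out = loo)
```

Note that with CV = TRUE the result is not a fitted model, so it cannot be passed to predict().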
1.5 Distance discrimination for two populations
Mahalanobis distance:
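For reference, the standard definition of the squared Mahalanobis distance from a sample x to population G_i, together with the usual two-population decision rule, is given below. The notation \mu_i (mean vector) and \Sigma_i (covariance matrix) is an assumption made here; in practice both are replaced by sample estimates.

```latex
d^2(x, G_i) = (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i), \quad i = 1, 2,
\qquad
x \in
\begin{cases}
G_1, & d^2(x, G_1) \le d^2(x, G_2), \\
G_2, & d^2(x, G_1) > d^2(x, G_2).
\end{cases}
```

When the two populations share a common covariance matrix the boundary between them is linear; with unequal covariance matrices it is quadratic.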
Usage of the quadratic discriminant function qda:
qda(formula, data, ...)
formula is a model formula of the form groups~x1+x2+..., and data is a data frame.
# Nonlinear (quadratic) discriminant model
qd = qda(G ~ x1 + x2)
qd
qp = predict(qd)
G2 = qp$class
data.frame(G, G1, G2)
tab2 = table(G, G2)
tab2
#    G2
# G   1 2
#   1 9 1
#   2 2 8
sum(diag(prop.table(tab2)))
# [1] 0.85
# Classify a new sample with x1 = 8.1 and x2 = 2.0
predict(qd, data.frame(x1 = 8.1, x2 = 2.0))
#$`class`
#[1] 1
#Levels: 1 2
#
#$posterior
# 1 2
#1 0.9939952 0.006004808
# Output of printing qd:
Call:
qda(G ~ x1 + x2)
Prior probabilities of groups:
1 2
0.5 0.5
Group means:
x1 x2
1 0.92 2.10
2 -0.38 8.85
# Output of data.frame(G, G1, G2):
G G1 G2
1 1 1 2
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
6 1 2 1
7 1 1 1
8 1 1 1
9 1 1 1
10 1 1 1
11 2 2 2
12 2 2 2
13 2 2 2
14 2 2 2
15 2 1 1
16 2 2 1
17 2 2 2
18 2 2 2
19 2 2 2
20 2 2 2
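The distance rule that gives this section its name can also be written out directly with the base R function mahalanobis(), which returns squared distances. The sketch below is an illustration on stand-in data from the built-in iris set, not a reimplementation of qda: it assigns each sample to the group with the smaller squared distance using group-specific covariance matrices, whereas the quadratic discriminant score additionally includes a log-determinant term and the prior probabilities, so the two classifications can differ.

```r
# Stand-in data: two iris species play the role of groups 1 and 2.
two    <- droplevels(subset(iris, Species != "setosa"))
vars   <- c("Sepal.Length", "Sepal.Width")
groups <- split(two[, vars], two$Species)

# Squared Mahalanobis distance of every sample to each group,
# using that group's own mean vector and covariance matrix.
d2 <- sapply(groups, function(g) mahalanobis(two[, vars], colMeans(g), cov(g)))

# Assign each sample to the group with the smallest squared distance.
pred <- factor(colnames(d2)[apply(d2, 1, which.min)],
               levels = levels(two$Species))

# Cross-tabulate actual vs. assigned groups and compute the agreement rate.
tab <- table(actual = two$Species, predicted = pred)
tab
sum(diag(prop.table(tab)))
```

Replacing cov(g) with a pooled covariance matrix for both groups would turn this into the linear version of the distance rule.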