Regression and Correlation

1. Simple linear regression

Linear regression describes the relationship between two variables. The function lm (linear model) fits a linear regression.

> library(ISwR)
> attach(thuesen)
> lm(short.velocity~blood.glucose)

Call:
lm(formula = short.velocity ~ blood.glucose)

Coefficients:
  (Intercept)  blood.glucose  
      1.09781        0.02196  
> summary(lm(short.velocity~blood.glucose))

Call:
lm(formula = short.velocity ~ blood.glucose)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.40141 -0.14760 -0.02202  0.03001  0.43490 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    1.09781    0.11748   9.345 6.26e-09 ***
blood.glucose  0.02196    0.01045   2.101   0.0479 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2167 on 21 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1737,	Adjusted R-squared:  0.1343 
F-statistic: 4.414 on 1 and 21 DF,  p-value: 0.0479
> plot(blood.glucose,short.velocity)
> abline(lm(short.velocity~blood.glucose))
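
The coefficients that lm reports are the ordinary least-squares estimates, which for a single predictor have a simple closed form. The sketch below checks this on the built-in cars data, an assumption made so the example runs without the ISwR package; the variable names are illustrative only.

```r
# Built-in example data (an assumption; the text above uses thuesen from ISwR)
x <- cars$speed
y <- cars$dist

# Closed-form least-squares estimates for simple linear regression
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
hand      <- c(intercept, slope)

# The same estimates from lm; coef() extracts them as a named vector
fit <- lm(dist ~ speed, data = cars)
coef(fit)
all.equal(unname(coef(fit)), hand)
```

Passing a data argument to lm, as here, is an alternative to attach and keeps the variables out of the search path.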


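2. Residuals and fitted values

The extractor functions fitted and resid return the fitted values and the residuals (observed value minus fitted value), respectively. Observation 16 is absent from both outputs below because it was dropped for a missing value.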

> lm.velo <- lm(short.velocity~blood.glucose)
> fitted(lm.velo)
       1        2        3        4        5 
1.433841 1.335010 1.275711 1.526084 1.255945 
       6        7        8        9       10 
1.214216 1.302066 1.341599 1.262534 1.365758 
      11       12       13       14       15 
1.244964 1.212020 1.515103 1.429449 1.244964 
      17       18       19       20       21 
1.190057 1.324029 1.372346 1.451411 1.389916 
      22       23       24 
1.205431 1.291085 1.306459 
> resid(lm.velo)
           1            2            3 
 0.326158532  0.004989882 -0.005711308 
           4            5            6 
-0.056084062  0.014054962  0.275783754 
           7            8            9 
 0.007933665 -0.251598875 -0.082533795 
          10           11           12 
-0.145757649  0.005036223 -0.022019994 
          13           14           15 
 0.434897199 -0.149448964  0.275036223 
          17           18           19 
-0.070057471  0.045971143 -0.182346406 
          20           21           22 
-0.401411486 -0.069916424 -0.175431237 
          23           24 
-0.171085074  0.393541161
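
When observations are dropped for missing values, the vectors returned by fitted and resid are shorter than the original data, which makes them awkward to line up with it. One possible remedy, sketched here on the built-in airquality data (an assumption; it is not the data set used above), is to fit with na.action = na.exclude so that the extractors pad the dropped rows with NA.

```r
# airquality ships with R; its Ozone column contains missing values
fit.omit <- lm(Ozone ~ Temp, data = airquality)  # R's default na.action is na.omit
fit.excl <- lm(Ozone ~ Temp, data = airquality, na.action = na.exclude)

length(fitted(fit.omit))  # complete cases only
length(fitted(fit.excl))  # one entry per row, NA where Ozone is missing

# Residual = observed - fitted, checked on the complete cases
ok <- !is.na(airquality$Ozone)
all.equal(unname(resid(fit.omit)),
          airquality$Ozone[ok] - unname(fitted(fit.omit)))
```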


> qqnorm(resid(lm.velo))

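A Q-Q plot of the residuals can also be used as a check: if the points lie close to a straight line, the residuals are approximately normally distributed.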

3. Prediction and confidence bands

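Regression lines are often shown together with uncertainty bands. The narrow band, called the confidence band, reflects the uncertainty about the line itself. The wide band, called the prediction band, also includes the uncertainty about future observations.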

> predict(lm.velo)
       1        2        3        4        5 
1.433841 1.335010 1.275711 1.526084 1.255945 
       6        7        8        9       10 
1.214216 1.302066 1.341599 1.262534 1.365758 
      11       12       13       14       15 
1.244964 1.212020 1.515103 1.429449 1.244964 
      17       18       19       20       21 
1.190057 1.324029 1.372346 1.451411 1.389916 
      22       23       24 
1.205431 1.291085 1.306459 
> predict(lm.velo, interval = "confidence")
        fit      lwr      upr
1  1.433841 1.291371 1.576312
2  1.335010 1.240589 1.429431
3  1.275711 1.169536 1.381887
4  1.526084 1.306561 1.745607
5  1.255945 1.139367 1.372523
6  1.214216 1.069315 1.359118
7  1.302066 1.205244 1.398889
8  1.341599 1.246317 1.436881
9  1.262534 1.149694 1.375374
10 1.365758 1.263750 1.467765
11 1.244964 1.121641 1.368287
12 1.212020 1.065457 1.358583
13 1.515103 1.305352 1.724854
14 1.429449 1.290217 1.568681
15 1.244964 1.121641 1.368287
17 1.190057 1.026217 1.353898
18 1.324029 1.230050 1.418008
19 1.372346 1.267629 1.477064
20 1.451411 1.295446 1.607377
21 1.389916 1.276444 1.503389
22 1.205431 1.053805 1.357057
23 1.291085 1.191084 1.391086
24 1.306459 1.210592 1.40232

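Adding the interval argument to predict returns the lower and upper limits of the band (lwr, upr) alongside the predicted values (fit): "confidence" gives the confidence band and "prediction" gives the prediction band.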
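
To compare the two bands directly, predict can be evaluated on a grid of new predictor values through its newdata argument. The following is a minimal sketch on the built-in cars data (an assumption, chosen so that the example runs without ISwR); the grid values and variable names are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

# Grid of new predictor values; the column name must match the model's predictor
new <- data.frame(speed = seq(5, 25, by = 5))

pc <- predict(fit, newdata = new, interval = "confidence")  # band for the line
pp <- predict(fit, newdata = new, interval = "prediction")  # band for new observations

# Width of each band at every grid point
width.c <- pc[, "upr"] - pc[, "lwr"]
width.p <- pp[, "upr"] - pp[, "lwr"]
cbind(new, width.c, width.p)
```

Both calls return the same fit column; only the limits differ, and the prediction band is the wider of the two at every point. The lwr and upr columns can be overlaid on a scatter plot with matlines.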

4. Correlation

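The correlation coefficient is a symmetric, scale-invariant measure of the association between two random variables, ranging from -1 to 1. It is negative when large values of one variable tend to go with small values of the other, and positive when the two variables tend to increase or decrease together.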
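
These properties are easy to check numerically. The sketch below uses the built-in cars data (an assumption; any two numeric vectors would do), and the rescaling constants are arbitrary.

```r
x <- cars$speed
y <- cars$dist

r          <- cor(x, y)
r.swapped  <- cor(y, x)                   # symmetry: the argument order does not matter
r.rescaled <- cor(3.6 * x + 10, y / 100)  # positive linear rescaling leaves r unchanged
r.flipped  <- cor(x, -y)                  # reversing one variable flips the sign

c(r, r.swapped, r.rescaled, r.flipped)
```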

4.1 Pearson correlation coefficient

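The function cor computes the correlation coefficient between two or more vectors. Because the data contain a missing value, use = "complete.obs" is needed here; without it cor returns NA.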

> cor(blood.glucose,short.velocity,use = "complete.obs")
[1] 0.4167546
> cor.test(blood.glucose,short.velocity)

	Pearson's product-moment correlation

data:  blood.glucose and short.velocity
t = 2.101, df = 21, p-value = 0.0479
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.005496682 0.707429479
sample estimates:
      cor 
0.4167546 
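
The t statistic and p-value reported here (t = 2.101, p = 0.0479) are the same as those for the slope in the regression summary of section 1, and the squared correlation, 0.4167546^2 ≈ 0.1737, matches the Multiple R-squared. The sketch below reproduces this equivalence on the built-in cars data (an assumption, used so that the example runs without ISwR).

```r
fit <- lm(dist ~ speed, data = cars)
ct  <- cor.test(cars$speed, cars$dist)

# Squared Pearson correlation versus R-squared of the simple regression
r2.from.cor <- unname(ct$estimate)^2
r2.from.lm  <- summary(fit)$r.squared

# t statistic of the correlation test versus t statistic of the slope
t.from.cor <- unname(ct$statistic)
t.from.lm  <- summary(fit)$coefficients["speed", "t value"]

c(r2.from.cor, r2.from.lm)
c(t.from.cor, t.from.lm)
```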

4.2 Spearman correlation coefficient (nonparametric)

> cor.test(blood.glucose,short.velocity,method = "spearman")

	Spearman's rank correlation rho

data:  blood.glucose and short.velocity
S = 1380.4, p-value = 0.1392
alternative hypothesis: true rho is not equal to 0
sample estimates:
     rho 
0.318002 

Warning message:
In cor.test.default(blood.glucose, short.velocity, method = "spearman") :
  Cannot compute exact p-value with ties
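
Spearman's rho is the Pearson correlation of the ranks of the two variables, and tied values are what trigger the warning about the exact p-value. A minimal sketch on the built-in cars data (an assumption; it is not the data set used above):

```r
x <- cars$speed
y <- cars$dist

rho.direct <- cor(x, y, method = "spearman")
rho.ranks  <- cor(rank(x), rank(y))  # rank() gives tied values their average rank

c(rho.direct, rho.ranks)

# Repeated values in either variable are ties
any(duplicated(x))
```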

4.3 Kendall rank correlation coefficient (based on counting the numbers of concordant and discordant pairs)

> cor.test(blood.glucose,short.velocity,method = "kendall")

	Kendall's rank correlation tau

data:  blood.glucose and short.velocity
z = 1.5604, p-value = 0.1187
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.2350616 

Warning message:
In cor.test.default(blood.glucose, short.velocity, method = "kendall") :
  Cannot compute exact p-value with ties
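
The counting of concordant and discordant pairs can be written out by hand. The sketch below uses small, tie-free, made-up vectors (hypothetical data, not taken from thuesen). Without ties, the simple formula (concordant - discordant) / (number of pairs) agrees with what cor returns; with ties, cor applies a tie correction that this sketch does not implement.

```r
# Small tie-free illustrative vectors (hypothetical data)
x <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7)
y <- c(2.0, 1.5, 3.9, 3.5, 6.1, 5.8)

idx <- combn(length(x), 2)  # all index pairs i < j, one pair per column

# +1 for a concordant pair, -1 for a discordant pair
s <- sign(x[idx[1, ]] - x[idx[2, ]]) *
     sign(y[idx[1, ]] - y[idx[2, ]])

concordant <- sum(s > 0)
discordant <- sum(s < 0)
tau.hand   <- (concordant - discordant) / ncol(idx)

tau.cor <- cor(x, y, method = "kendall")
c(tau.hand, tau.cor)
```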