Why does loading data as a table give different ANOVA results from loading it as a stack?

Question:

Why does loading data as a table give different ANOVA results from loading the same data as a stack?

(1) I load the following table and run an ANOVA.

31 42 14 80 
42 26 25 106 
84 21 19 83 
26 60 36 69 
14 35 44 48 
16 80 28 76 
29 49 80 39 
32 38 76 84 
45 65 15 91 
30 71 82 39 

> raw <- read.table("demotablenolabels.txt", sep="\t", header=FALSE) 
> rawstack = stack(raw) 
> rawstack$sample = rep(rownames(raw),4) 
> repeated = aov(values ~ ind + sample, data=rawstack) 
> summary(repeated) 
            Df Sum Sq Mean Sq F value Pr(>F)
ind          3   7553  2517.7   4.036 0.0171 *
sample       9   1565   173.9   0.279 0.9751
Residuals   27  16843   623.8

(2) I save the rawstack data below to a file, load it again, and get a different ANOVA result.

values ind sample 
31 V1 1 
42 V1 2 
84 V1 3 
26 V1 4 
14 V1 5 
16 V1 6 
29 V1 7 
32 V1 8 
45 V1 9 
30 V1 10 
42 V2 1 
26 V2 2 
21 V2 3 
60 V2 4 
35 V2 5 
80 V2 6 
49 V2 7 
38 V2 8 
65 V2 9 
71 V2 10 
14 V3 1 
25 V3 2 
19 V3 3 
36 V3 4 
44 V3 5 
28 V3 6 
80 V3 7 
76 V3 8 
15 V3 9 
82 V3 10 
80 V4 1 
106 V4 2 
83 V4 3 
69 V4 4 
48 V4 5 
76 V4 6 
39 V4 7 
84 V4 8 
91 V4 9 
39 V4 10 

> stackwithlabels <- read.table("demostackwithlabels.txt", sep="\t", header=TRUE) 
> repeatedstack = aov(values ~ ind + sample, data=stackwithlabels) 
> summary(repeatedstack) 
            Df Sum Sq Mean Sq F value  Pr(>F)
ind          3   7553  2517.7   4.918 0.00591 **
sample       1    492   492.1   0.961 0.33356
Residuals   35  17916   511.9

(3) I convert stackwithlabels back into a table, repeat the procedure, and get the original ANOVA result again (see (1)).

> stackwithlabels[c(3)] <- list(NULL) 
> rawwithoutlabels = unstack(stackwithlabels) 
> restackwithoutlabels = stack(rawwithoutlabels) 
> restackwithoutlabels$sample = rep(rownames(raw),4) 
> rerepeatedstack = aov(values ~ ind + sample, data=restackwithoutlabels) 
> summary(rerepeatedstack) 
            Df Sum Sq Mean Sq F value Pr(>F)
ind          3   7553  2517.7   4.036 0.0171 *
sample       9   1565   173.9   0.279 0.9751
Residuals   27  16843   623.8

Could you format your question so that code is distinguishable from ordinary text? – Jakob

Answer

In your original data frame rawstack, sample is stored as a character vector.

str(rawstack$sample) 
# chr [1:40] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "1" "2" "3" "4" "5" "6" "7" "8" ...

Therefore, aov interprets sample as a factor with 10 levels, which uses 10 - 1 = 9 degrees of freedom.

summary(aov(values ~ ind + sample, data = rawstack)) 
#             Df Sum Sq Mean Sq F value Pr(>F)
# ind          3   7553  2517.7   4.036 0.0171 *
# sample       9   1565   173.9   0.279 0.9751
# Residuals   27  16843   623.8
# --- 
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
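
To see where the 9 comes from, the following minimal sketch rebuilds the question's table inline and counts the levels of sample. It is an illustration only; the names fit_chr, n_levels and df_chr are introduced here and do not appear in the original post.

```r
# Rebuild the question's 10 x 4 table inline so the example is self-contained.
raw <- data.frame(
  V1 = c(31, 42, 84, 26, 14, 16, 29, 32, 45, 30),
  V2 = c(42, 26, 21, 60, 35, 80, 49, 38, 65, 71),
  V3 = c(14, 25, 19, 36, 44, 28, 80, 76, 15, 82),
  V4 = c(80, 106, 83, 69, 48, 76, 39, 84, 91, 39)
)
rawstack <- stack(raw)
rawstack$sample <- rep(rownames(raw), 4)  # character: "1", "2", ..., "10"

fit_chr <- aov(values ~ ind + sample, data = rawstack)

# A character predictor is converted to a factor when the model frame is
# built; a factor with k levels uses k - 1 degrees of freedom.
n_levels <- nlevels(factor(rawstack$sample))
df_chr <- summary(fit_chr)[[1]][["Df"]]  # Df for ind, sample, Residuals

print(class(rawstack$sample))
print(n_levels)
print(df_chr)
```

The sample entry of df_chr should equal n_levels - 1.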

If you write the data to a text file and read it back into R, sample is stored as a numeric (integer) vector.

stackwithlabels <- read.table(text = capture.output(write.table(rawstack))) 
str(stackwithlabels$sample) 
# int [1:40] 1 2 3 4 5 6 7 8 9 10 ...

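Hence the aov result is different: a numeric predictor enters the model as a single linear covariate, so sample now has 1 degree of freedom.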

summary(aov(values ~ ind + sample, data = stackwithlabels)) 
#             Df Sum Sq Mean Sq F value  Pr(>F)
# ind          3   7553  2517.7   4.918 0.00591 **
# sample       1    492   492.1   0.961 0.33356
# Residuals   35  17916   511.9
# --- 
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
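
The difference can also be seen in the number of fitted coefficients. The sketch below assumes sample holds the integers 1 to 10, as it does after the file is read back; fit_num and fit_fac are hypothetical names used only for this illustration.

```r
# Same data as in the question, with sample stored as an integer column,
# which is what read.table() produces from the saved file.
raw <- data.frame(
  V1 = c(31, 42, 84, 26, 14, 16, 29, 32, 45, 30),
  V2 = c(42, 26, 21, 60, 35, 80, 49, 38, 65, 71),
  V3 = c(14, 25, 19, 36, 44, 28, 80, 76, 15, 82),
  V4 = c(80, 106, 83, 69, 48, 76, 39, 84, 91, 39)
)
stacked <- stack(raw)
stacked$sample <- rep(1:10, 4)  # integer, not character

# Numeric sample: one slope is estimated for the whole column.
fit_num <- aov(values ~ ind + sample, data = stacked)

# Factor sample: one coefficient per non-reference level.
fit_fac <- aov(values ~ ind + factor(sample), data = stacked)

n_coef_num <- length(coef(fit_num))  # intercept + 3 for ind + 1 slope
n_coef_fac <- length(coef(fit_fac))  # intercept + 3 for ind + 9 levels

print(names(coef(fit_num)))
print(c(numeric = n_coef_num, factor = n_coef_fac))
```

The eight extra coefficients in the factor model are exactly the degrees of freedom that move from Residuals (35 versus 27) to sample (1 versus 9).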

How do I convert the numeric vector to a character vector? – user2302840

I found it: stackwithlabels$sample <- as.character(stackwithlabels$sample) – user2302840
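
As a rough check of the conversion from the comment above, the sketch below applies as.character() to an integer sample column and compares the result with an explicit factor() conversion. The factor() variant is an alternative suggested here, not something proposed in the thread, and the names stacked, fixed_chr, fixed_fac, df_chr and df_fac are hypothetical.

```r
# Data from the question, with sample as integers (as read from the file).
raw <- data.frame(
  V1 = c(31, 42, 84, 26, 14, 16, 29, 32, 45, 30),
  V2 = c(42, 26, 21, 60, 35, 80, 49, 38, 65, 71),
  V3 = c(14, 25, 19, 36, 44, 28, 80, 76, 15, 82),
  V4 = c(80, 106, 83, 69, 48, 76, 39, 84, 91, 39)
)
stacked <- stack(raw)
stacked$sample <- rep(1:10, 4)

# Fix from the comment: store sample as character so aov treats it as a factor.
fixed_chr <- stacked
fixed_chr$sample <- as.character(fixed_chr$sample)

# Alternative (an assumption of this sketch): make the factor explicit.
fixed_fac <- stacked
fixed_fac$sample <- factor(fixed_fac$sample)

df_chr <- summary(aov(values ~ ind + sample, data = fixed_chr))[[1]][["Df"]]
df_fac <- summary(aov(values ~ ind + sample, data = fixed_fac))[[1]][["Df"]]

print(df_chr)
print(df_fac)
```

Both conversions should give the same degrees of freedom as the original rawstack analysis. An explicit factor() makes the intent visible in the data frame rather than relying on aov's handling of character columns.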
