Manipulating a dataset to account for repeated measures
Question:
Consider:
df <- data.frame(
CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers"
,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","PelletCoffeeCo","PelletCoffeeCo"),
Email= c("[email protected]", "[email protected]","[email protected]","[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]"),
Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","1","2"),
var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6,2,3))
I need to figure out how to get:
df2 <- data.frame(CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers"
,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","Liquders","Liquders","Liquders",
"Liquders","Liquders","Liquders","Liquders", "PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo",
"PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo",
"PelletCoffeeCo","PelletCoffeeCo"),
Email= c("[email protected]", "[email protected]","[email protected]","[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]"),
Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","4","5","6","7","8","9","10",
"1","2","3","4","5","6","7","8","9","10"),
var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6, NA,NA,NA,NA,NA,NA,NA, 2,3,NA,NA,NA,NA,NA,NA,NA,NA))
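Explanation: I am surveying people once a day over the course of 10 days. In a perfect world, I would receive 10 responses from each participant, recorded as day 1 through day 10. However, because of non-response, some participants gave 3 responses, others 6, others 10, and so on. I am setting the data up to run a growth model, so I need the "Day" column to always run from Day 1 to Day 10, whether or not there is data for those responses. I tried to illustrate this above by adding NA to the rows for participants who do not have all 10 days of data.
How can I do this?
Thanks in advance!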
Answer
Try this:
library(tidyr)
# Day is stored as character in df; convert it to integer so that
# min() and max() compare numerically ("9" sorts after "10" as text)
df$Day <- as.integer(df$Day)
df %>%
  complete(nesting(CompanyID, Email), Day = seq(min(Day), max(Day), 1L)) %>%
  data.frame()
Output:
        CompanyID             Email Day var1
1        Drinkers [email protected]   1    4
2        Drinkers [email protected]   2    5
3        Drinkers [email protected]   3    5
4        Drinkers [email protected]   4    5
5        Drinkers [email protected]   5    2
6        Drinkers [email protected]   6    3
7        Drinkers [email protected]   7    2
8        Drinkers [email protected]   8    7
9        Drinkers [email protected]   9    6
10       Drinkers [email protected]  10    5
11       Liquders [email protected]   1    7
12       Liquders [email protected]   2    6
13       Liquders [email protected]   3    6
14       Liquders [email protected]   4   NA
15       Liquders [email protected]   5   NA
16       Liquders [email protected]   6   NA
17       Liquders [email protected]   7   NA
18       Liquders [email protected]   8   NA
19       Liquders [email protected]   9   NA
20       Liquders [email protected]  10   NA
21 PelletCoffeeCo [email protected]   1    2
22 PelletCoffeeCo [email protected]   2    3
23 PelletCoffeeCo [email protected]   3   NA
24 PelletCoffeeCo [email protected]   4   NA
25 PelletCoffeeCo [email protected]   5   NA
26 PelletCoffeeCo [email protected]   6   NA
27 PelletCoffeeCo [email protected]   7   NA
28 PelletCoffeeCo [email protected]   8   NA
29 PelletCoffeeCo [email protected]   9   NA
30 PelletCoffeeCo [email protected]  10   NA
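Edit:
The code above fills in each group with the complete set of Day values, defined by the minimum and maximum of the existing values in the Day column (i.e. 1 and 10). The groups that get filled in can be redefined as needed; here I chose to define them as Company + Email, using the "nesting(CompanyID, Email)" call. The data.frame() line is only there to convert the output to a data.frame rather than a tibble. If data.frame output is not required, feel free to replace or remove that line.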
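To see how complete() and nesting() work together, here is a minimal sketch on made-up toy data. The toy data frame, its e-mail addresses, and the extra column x1 are assumptions for illustration only and are not part of the question's data.

```r
library(tidyr)

# Hypothetical toy data: two respondents, three expected days, one extra variable x1
toy <- data.frame(
  CompanyID = c("A", "A", "A", "B"),
  Email     = c("a@example.com", "a@example.com", "a@example.com", "b@example.com"),
  Day       = c(1L, 2L, 3L, 1L),
  var1      = c(4, 5, 5, 7),
  x1        = c(10, 11, 12, 20)
)

# nesting() keeps only the CompanyID/Email pairs that occur in the data;
# complete() then crosses those pairs with every requested Day value.
filled <- complete(toy, nesting(CompanyID, Email), Day = 1:3)

# Without nesting(), CompanyID and Email are crossed independently,
# which also creates pairs (A with b@example.com) that never occurred.
crossed <- complete(toy, CompanyID, Email, Day = 1:3)

nrow(filled)   # 6 rows: 2 pairs x 3 days
nrow(crossed)  # 12 rows: 2 companies x 2 e-mails x 3 days
```

Columns that are not named inside complete(), such as var1 and x1 here, are carried along and set to NA in the newly created rows, so additional variables like x1:x10 should need no special handling.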
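Answer
First, create a data frame of the unique company IDs. Next, create a data frame of the desired days.
Cross join these together.
Then join the result to your original dataset to fill in the table.
# One row per company
comp <- data.frame(CompanyID = unique(df$CompanyID))
# The ten expected days, kept as character to match df$Day
Day <- data.frame(Day = c("1","2","3","4","5","6","7","8","9","10"))
# No shared columns, so merge() returns every CompanyID/Day combination
compDay <- merge(comp, Day, all = TRUE)
# Full join back onto the data; rows added here have NA for Email and var1
dfday <- merge(df, compDay, by = c("CompanyID", "Day"), all = TRUE)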
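If the Email column should also be populated in the added rows, as in the desired df2, one possible variant is to build the grid from the unique CompanyID/Email pairs instead of from CompanyID alone. This is only a sketch in base R on a small made-up data frame; df_small and its e-mail addresses are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for the question's df (e-mail addresses are made up)
df_small <- data.frame(
  CompanyID = c("Drinkers", "Drinkers", "Liquders"),
  Email     = c("d@example.com", "d@example.com", "l@example.com"),
  Day       = c("1", "2", "1"),
  var1      = c(4, 5, 7)
)

# One row per respondent that actually appears in the data
ids  <- unique(df_small[, c("CompanyID", "Email")])
# Expected days, kept as character to match df_small$Day
days <- data.frame(Day = as.character(1:3))

# merge() with no shared columns returns the Cartesian product (cross join)
grid <- merge(ids, days)

# Left join from the full grid so every respondent/day pair is kept;
# var1 is NA wherever no response was recorded
filled_small <- merge(grid, df_small,
                      by = c("CompanyID", "Email", "Day"), all.x = TRUE)
```

Joining on all three key columns keeps Email filled in for every row, while var1 stays NA for the days without a response.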
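Awesome! Thank you so much. It works like a charm. – D500
Awesome! Thank you so much. It works like a charm. I have some other variables, x1:x10, and I would like it to work the same way for them. Could you explain the functions? I can see that it works, but I don't understand how complete and nesting work together, and why does the data.frame call need to be added at the end? – D500
@D500 - No problem. Please see the explanation added above. – www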