Calculate differences within an R data frame and add them to a column
Question:
I want to compare rows that share the same order number within a table and add a column stating which fields differ. For example, I want this
  order  color type    shape            alert
1     1   blue    a   circle             type
2     1   blue    b   circle
3     2  green    a   circle            color
4     2   blue    a   circle color type shape
5     2 yellow    b triangle             type
6     2 yellow    c triangle
7     3 orange    c triangle
to look like this
  order  color type    shape            alert
1     1   blue    a   circle             type
2     1   blue    b   circle
3     2  green    a   circle color type shape
4     2   blue    a   circle
5     2 yellow    b triangle
6     2 yellow    c triangle
7     3 orange    c triangle
My code only compares two adjacent rows. How can I efficiently compare all rows that have the same order number? Can I avoid a loop? Here is my code:
order = c(0001, 0001, 0002, 0002, 0002, 0002, 0003)  # numeric, so the leading zeros are dropped
color = c("blue", "blue", "green", "blue", "yellow", "yellow", "orange")
type = c("a", "b", "a", "a", "b", "c", "c")
shape = c("circle", "circle", "circle", "circle", "triangle", "triangle", "triangle")
df = data.frame(order, color, type, shape)
df$alert <- ""
# Compare each row with the next one only.
# seq_len(nrow(df) - 1) stops before the last row; 1:nrow(df)-1 would parse as (1:nrow(df)) - 1.
for (i in seq_len(nrow(df) - 1)) {
  if (identical(df$order[i + 1], df$order[i])) {
    if (!identical(df$color[i + 1], df$color[i])) {
      df$alert[i] <- trimws(paste(df$alert[i], "color"))
    }
    if (!identical(df$type[i + 1], df$type[i])) {
      df$alert[i] <- trimws(paste(df$alert[i], "type"))
    }
    if (!identical(df$shape[i + 1], df$shape[i])) {
      df$alert[i] <- trimws(paste(df$alert[i], "shape"))
    }
  }
}
Answer
Here is a dplyr-based solution:
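library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr
# dat1 is assumed to be the question's data frame without the alert column
dat1 %>% gather(measure, val, -order, factor_key = TRUE) %>%  # factor_key keeps the column order
  group_by(order, measure) %>%
  summarise(alerts = length(unique(val))) %>%               # distinct values per order and column
  filter(alerts > 1) %>%                                    # keep columns that vary within an order
  summarise(alerts = paste0(measure, collapse = " ")) %>%   # one alert string per order
  left_join(dat1, .)                                        # attach the alert to every row of the order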
  order  color type    shape           alerts
1     1   blue    a   circle             type
2     1   blue    b   circle             type
3     2  green    a   circle color type shape
4     2   blue    a   circle color type shape
5     2 yellow    b triangle color type shape
6     2 yellow    c triangle color type shape
7     3 orange    c triangle             <NA>
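The output above repeats the alert on every row of an order, whereas the desired table in the question shows it only on the first row of each order. Below is a minimal base R sketch of that variant, without a loop over rows. It assumes the data from the question; the names `compare_cols`, `varies`, `alert_all`, and `first_in_order` are illustrative and do not come from the original post.

```r
# Rebuild the question's data (values copied from the question).
df <- data.frame(
  order = c(1, 1, 2, 2, 2, 2, 3),
  color = c("blue", "blue", "green", "blue", "yellow", "yellow", "orange"),
  type  = c("a", "b", "a", "a", "b", "c", "c"),
  shape = c("circle", "circle", "circle", "circle", "triangle", "triangle", "triangle"),
  stringsAsFactors = FALSE
)

# Columns to check for differences within each order (illustrative name).
compare_cols <- c("color", "type", "shape")

# For each column, mark the rows whose order group holds more than one
# distinct value. The values are converted to integer codes first so that
# ave() returns numeric counts rather than coercing them to character.
varies <- sapply(compare_cols, function(col) {
  codes <- as.integer(factor(df[[col]]))
  ave(codes, df$order, FUN = function(v) length(unique(v))) > 1
})

# Collapse the names of the varying columns into one string per row.
alert_all <- apply(varies, 1, function(r) paste(compare_cols[r], collapse = " "))

# Keep the alert only on the first row of each order, blank elsewhere.
first_in_order <- !duplicated(df$order)
df$alert <- ifelse(first_in_order, alert_all, "")

print(df)
```

Dropping the `ifelse()` step and assigning `alert_all` directly would put the alert on every row of the order, with an empty string instead of `<NA>` for orders that have no differences.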