Calculate differences in an R data frame and add them to a column

Problem description:

I want to compare rows within a table by order number and add a column that describes the differences. For example, I want this

  order  color type    shape            alert
1     1   blue    a   circle             type
2     1   blue    b   circle
3     2  green    a   circle            color
4     2   blue    a   circle color type shape
5     2 yellow    b triangle             type
6     2 yellow    c triangle
7     3 orange    c triangle

to look like this

  order  color type    shape            alert
1     1   blue    a   circle             type
2     1   blue    b   circle
3     2  green    a   circle color type shape
4     2   blue    a   circle
5     2 yellow    b triangle
6     2 yellow    c triangle
7     3 orange    c triangle

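My code only compares each row with the one right after it. How can I efficiently compare all rows that share the same order number? Can I avoid the loop? Here is my code: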

order = c(0001, 0001, 0002, 0002, 0002, 0002, 0003) 
color = c("blue", "blue", "green", "blue", "yellow", "yellow", "orange") 
type = c("a", "b", "a", "a", "b", "c", "c") 
shape = c("circle", "circle", "circle", "circle", "triangle", "triangle", "triangle") 
df = data.frame(order, color, type, shape) 

df$alert <- "" 

for(i in seq_len(nrow(df) - 1)){  # 1:nrow(df)-1 parses as (1:nrow(df))-1 and starts at 0
    if(identical(df$order[i+1],df$order[i])){ 
    if(!identical(df$color[i+1],df$color[i])){ 
     df$alert[i] <- paste(df$alert[i],"color") 
    } 
    if(!identical(df$type[i+1],df$type[i])){ 
     df$alert[i] <- paste(df$alert[i],"type") 
    } 
    if(!identical(df$shape[i+1],df$shape[i])){ 
     df$alert[i] <- paste(df$alert[i],"shape") 
    } 
    } 
} 

Here is a dplyr-based solution:

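library(dplyr) 
library(tidyr)  # gather() comes from tidyr, not dplyr
# dat1 is the question's data frame (order, color, type, shape) without the alert column
dat1 %>% gather(measure, val, -order, factor_key = TRUE) %>%  # long format; factor keeps column order
     group_by(order, measure) %>% 
     summarise(alerts = length(unique(val))) %>%   # number of distinct values per order and column
     filter(alerts>1) %>%                          # keep only columns that vary within an order
     summarise(alerts = paste0(measure, collapse = " ")) %>%   # one label string per order
     left_join(dat1, ., by = "order")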

  order  color type    shape           alerts
1     1   blue    a   circle             type
2     1   blue    b   circle             type
3     2  green    a   circle color type shape
4     2   blue    a   circle color type shape
5     2 yellow    b triangle color type shape
6     2 yellow    c triangle color type shape
7     3 orange    c triangle             <NA>
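
The output above repeats the alert on every row of an order, whereas the question asks for it only on the first row of each order, with an empty string elsewhere. A minimal base R sketch of that variant follows. It is not part of the original answer; the names `cols`, `alert_for_order`, and `first_in_order` are made up for illustration, and the data frame is rebuilt from the question's vectors.

```r
# Example data from the question (alert column not yet added).
df <- data.frame(
  order = c(1, 1, 2, 2, 2, 2, 3),
  color = c("blue", "blue", "green", "blue", "yellow", "yellow", "orange"),
  type  = c("a", "b", "a", "a", "b", "c", "c"),
  shape = c("circle", "circle", "circle", "circle",
            "triangle", "triangle", "triangle"),
  stringsAsFactors = FALSE
)

# Columns to check, in the order they should appear in the alert text.
cols <- c("color", "type", "shape")

# For each order, build one string naming the columns that hold
# more than one distinct value within that order ("" if none do).
alert_for_order <- vapply(
  split(df[cols], df$order),
  function(g) {
    changed <- vapply(g, function(v) length(unique(v)) > 1, logical(1))
    paste(names(changed)[changed], collapse = " ")
  },
  character(1)
)

# Write the alert only on the first row of each order; leave the rest empty.
first_in_order <- !duplicated(df$order)
df$alert <- ""
df$alert[first_in_order] <-
  unname(alert_for_order[as.character(df$order[first_in_order])])

print(df)
```

With this data, row 1 gets "type", row 3 gets "color type shape", and every other row keeps an empty alert. The explicit loop over rows is replaced by `split()` plus `vapply()`, which still iterate internally but work once per order rather than once per adjacent row pair.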