How do I apply a condition to the elements of a specific group and find the arrangement from another group in the same table?

Problem description:

I have the following data.frame:

Category Product Status 
1  A  qwe  In 
2  A  rty  In 
3  A  ewq Out 
4  B  dfs  In 
5  B  sgf  In 
6  C  mnb Out 
7  C  ves Out 
8  C  klm Out 
9  C  nbc Out 

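My goal is to create a Flag column for each group at the Category level, with the values OnlyIn, OnlyOut, or BothInOut, according to the values in the Status column.

As part of this, I computed the counts of In and Out for each group using the code below: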

library(dplyr)

# Count the "In" and "Out" rows within each Category/Status group
Data <- Data %>% 
    group_by(Category, Status) %>% 
    dplyr::mutate(CountIn = sum(Status == "In"), 
       CountOut = sum(Status == "Out")) 

And I got the following result:

Category Product Status CountIn CountOut 
1  A  qwe  In  2  0 
2  A  rty  In  2  0 
3  A  ewq Out  0  1 
4  B  dfs  In  2  0 
5  B  sgf  In  2  0 
6  C  mnb Out  0  4 
7  C  ves Out  0  4 
8  C  klm Out  0  4 
9  C  nbc Out  0  4 

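Now I am not sure how to use this information to create the new Flag column, that is, how to work out the total In and Out counts for each Category and add the appropriate value.

For example, if a Category has both In and Out as Status values, the Flag should be "BothInOut".

Sample output: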

Category Product Status CountIn CountOut  Flag 
1  A  qwe  In  2  0 BothInOut 
2  A  rty  In  2  0 BothInOut 
3  A  ewq Out  0  1 BothInOut 
4  B  dfs  In  2  0 OnlyIn 
5  B  sgf  In  2  0 OnlyIn 
6  C  mnb Out  0  4 OnlyOut 
7  C  ves Out  0  4 OnlyOut 
8  C  klm Out  0  4 OnlyOut 
9  C  nbc Out  0  4 OnlyOut 

Input to reproduce the data:

structure(list(Category = c("A", "A", "A", "B", "B", "C", "C", 
"C", "C"), Product = c("qwe", "rty", "ewq", "dfs", "sgf", "mnb", 
"ves", "klm", "nbc"), Status = c("In", "In", "Out", "In", "In", 
"Out", "Out", "Out", "Out"), CountIn = c(2, 2, 0, 2, 2, 0, 0, 
0, 0), CountOut = c(0, 0, 1, 0, 0, 4, 4, 4, 4), Flag = c("BothInOut", 
"BothInOut", "BothInOut", "OnlyIn", "OnlyIn", "OnlyOut", "OnlyOut", 
"OnlyOut", "OnlyOut")), .Names = c("Category", "Product", "Status", 
"CountIn", "CountOut", "Flag"), row.names = c(NA, 9L), class = "data.frame") 

'df %>% group_by(Category) %>% mutate(Flag1 = toString(unique(Status)))' – Sotos

That got it done. – sunitprasad1

Referring to @Sotos's comment:

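# Collapse the distinct Status values of each Category into one string
Data <- Data %>% group_by(Category) %>% mutate(Flag1 = toString(unique(Status))) 

# Map that string to the requested label
Data$Flag <- ifelse(Data$Flag1 == "In","OnlyIn", 
        ifelse(Data$Flag1 == "Out","OnlyOut","BothInOut")) 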

This gets the job done.

Category Product Status Flag1  Flag 
1  A  qwe  In In, Out BothInOut 
2  A  rty  In In, Out BothInOut 
3  A  ewq Out In, Out BothInOut 
4  B  dfs  In  In OnlyIn 
5  B  sgf  In  In OnlyIn 
6  C  mnb Out  Out OnlyOut 
7  C  ves Out  Out OnlyOut 
8  C  klm Out  Out OnlyOut 
9  C  nbc Out  Out OnlyOut 

I would say @Sotos's comment does the job well. Another way to get the exact labels you want would be:

df <- df %>% 
    group_by(Category) %>% 
    mutate(Flag2 = ifelse("In" %in% unique(Status) & "Out" %in% unique(Status), "BothInOut", ifelse("In" %in% unique(Status), "OnlyIn", "OnlyOut"))) 

> df 
Source: local data frame [9 x 7] 
Groups: Category [3] 

# A tibble: 9 x 7 
    Category Product Status CountIn CountOut  Flag  Flag2 
    <chr> <chr> <chr> <dbl> <dbl>  <chr>  <chr> 
1  A  qwe  In  2  0 BothInOut BothInOut 
2  A  rty  In  2  0 BothInOut BothInOut 
3  A  ewq Out  0  1 BothInOut BothInOut 
4  B  dfs  In  2  0 OnlyIn OnlyIn 
5  B  sgf  In  2  0 OnlyIn OnlyIn 
6  C  mnb Out  0  4 OnlyOut OnlyOut 
7  C  ves Out  0  4 OnlyOut OnlyOut 
8  C  klm Out  0  4 OnlyOut OnlyOut 
9  C  nbc Out  0  4 OnlyOut OnlyOut 
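
The same per-Category test can also be spelled out with any() and dplyr::case_when(), which some readers find easier to follow than nested ifelse() calls. This is only a sketch, not part of the original answers: the helper columns HasIn and HasOut and the output column Flag3 are names made up for illustration, and the data frame is rebuilt from the question's sample.

```r
library(dplyr)

# Sample data copied from the question (Category, Product, Status only)
df <- data.frame(
  Category = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Product  = c("qwe", "rty", "ewq", "dfs", "sgf", "mnb", "ves", "klm", "nbc"),
  Status   = c("In", "In", "Out", "In", "In", "Out", "Out", "Out", "Out"),
  stringsAsFactors = FALSE
)

flagged <- df %>%
  group_by(Category) %>%
  mutate(
    HasIn  = any(Status == "In"),    # hypothetical helper: group has an "In" row
    HasOut = any(Status == "Out"),   # hypothetical helper: group has an "Out" row
    Flag3  = case_when(              # hypothetical output column name
      HasIn & HasOut ~ "BothInOut",
      HasIn          ~ "OnlyIn",
      TRUE           ~ "OnlyOut"
    )
  ) %>%
  ungroup() %>%
  select(-HasIn, -HasOut)            # drop the helpers again

print(flagged)
```

Like the ifelse() version, this assumes Status only ever contains "In" or "Out"; any other value in a group without "In" would fall through to "OnlyOut".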

I would suggest making @Sotos's comment more robust by adding sort, so that the label does not depend on the order of the rows:

df %>% group_by(Category) %>% 
    mutate(Flag1 = toString(sort(unique(Status)))) 
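
To see why the row order matters, here is a small sketch on made-up data (the toy data frame and the column names Unsorted and Sorted are assumptions for illustration only). Category "X" lists "Out" before "In", so the unsorted label differs from the one for "Y" even though both groups contain the same two statuses.

```r
library(dplyr)

# Hypothetical data: "X" has Out first, "Y" has In first
toy <- data.frame(
  Category = c("X", "X", "Y", "Y"),
  Status   = c("Out", "In", "In", "Out"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(Category) %>%
  mutate(
    Unsorted = toString(unique(Status)),       # follows the row order within the group
    Sorted   = toString(sort(unique(Status)))  # same label regardless of row order
  ) %>%
  ungroup()

print(res)
```

Without sort, any later comparison against a fixed string such as "In, Out" would miss category "X".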

If you want the data labelled the way you proposed, you can extend it to:

df %>% group_by(Category) %>% 
    mutate(Flag1 = paste0(sort(unique(Status)), collapse = "") %>% 
       paste0(ifelse(. == "InOut", "Both", "Only"), .)) 

which yields:

Category Product Status CountIn CountOut  Flag  Flag1 
    <chr> <chr> <chr> <dbl> <dbl>  <chr>  <chr> 
1  A  qwe  In  2  0 BothInOut BothInOut 
2  A  rty  In  2  0 BothInOut BothInOut 
3  A  ewq Out  0  1 BothInOut BothInOut 
4  B  dfs  In  2  0 OnlyIn OnlyIn 
5  B  sgf  In  2  0 OnlyIn OnlyIn 
6  C  mnb Out  0  4 OnlyOut OnlyOut 
7  C  ves Out  0  4 OnlyOut OnlyOut 
8  C  klm Out  0  4 OnlyOut OnlyOut 
9  C  nbc Out  0  4 OnlyOut OnlyOut