How to apply a condition to the elements of a specific group and find the combinations from another group in the same table?
Question:
I have the following data.frame:
Category Product Status
1 A qwe In
2 A rty In
3 A ewq Out
4 B dfs In
5 B sgf In
6 C mnb Out
7 C ves Out
8 C klm Out
9 C nbc Out
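My goal is to create a Flag column for each Category group, with the levels OnlyIn, OnlyOut and BothInOut, corresponding to the values in the Status column.
As part of this, I computed the counts of In and Out for each group using the code below: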
library(dplyr)

# Count the In and Out rows within each Category/Status group
Data <- Data %>%
  group_by(Category, Status) %>%
  dplyr::mutate(CountIn = length(Status[Status == "In"]),
                CountOut = length(Status[Status == "Out"]))
And I got the following result:
Category Product Status CountIn CountOut
1 A qwe In 2 0
2 A rty In 2 0
3 A ewq Out 0 1
4 B dfs In 2 0
5 B sgf In 2 0
6 C mnb Out 0 4
7 C ves Out 0 4
8 C klm Out 0 4
9 C nbc Out 0 4
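Now I am not sure how to use this information to create the new Flag column, that is, how to compute the total In and Out counts for each Category and assign the appropriate value.
For example, if a Category has both In and Out as Status values, the flag should be "BothInOut".
Sample output: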
Category Product Status CountIn CountOut Flag
1 A qwe In 2 0 BothInOut
2 A rty In 2 0 BothInOut
3 A ewq Out 0 1 BothInOut
4 B dfs In 2 0 OnlyIn
5 B sgf In 2 0 OnlyIn
6 C mnb Out 0 4 OnlyOut
7 C ves Out 0 4 OnlyOut
8 C klm Out 0 4 OnlyOut
9 C nbc Out 0 4 OnlyOut
Input data for reproduction:
structure(list(Category = c("A", "A", "A", "B", "B", "C", "C",
"C", "C"), Product = c("qwe", "rty", "ewq", "dfs", "sgf", "mnb",
"ves", "klm", "nbc"), Status = c("In", "In", "Out", "In", "In",
"Out", "Out", "Out", "Out"), CountIn = c(2, 2, 0, 2, 2, 0, 0,
0, 0), CountOut = c(0, 0, 1, 0, 0, 4, 4, 4, 4), Flag = c("BothInOut",
"BothInOut", "BothInOut", "OnlyIn", "OnlyIn", "OnlyOut", "OnlyOut",
"OnlyOut", "OnlyOut")), .Names = c("Category", "Product", "Status",
"CountIn", "CountOut", "Flag"), row.names = c(NA, 9L), class = "data.frame")
Answer
Based on @Sotos's comment:
Data <- Data %>% group_by(Category) %>% mutate(Flag1 = toString(unique(Status)))
Data$Flag <- ifelse(Data$Flag1 == "In","OnlyIn",
ifelse(Data$Flag1 == "Out","OnlyOut","BothInOut"))
This gets the job done:
Category Product Status Flag1 Flag
1 A qwe In In, Out BothInOut
2 A rty In In, Out BothInOut
3 A ewq Out In, Out BothInOut
4 B dfs In In OnlyIn
5 B sgf In In OnlyIn
6 C mnb Out Out OnlyOut
7 C ves Out Out OnlyOut
8 C klm Out Out OnlyOut
9 C nbc Out Out OnlyOut
Answer
I would say @Sotos's comment does the job well. Another way to get the exact labels you want would be:
df <- df %>%
group_by(Category) %>%
  mutate(Flag2 = ifelse("In" %in% unique(Status) & "Out" %in% unique(Status), "BothInOut",
                 ifelse("In" %in% unique(Status), "OnlyIn", "OnlyOut")))
> df
Source: local data frame [9 x 7]
Groups: Category [3]
# A tibble: 9 x 7
Category Product Status CountIn CountOut Flag Flag2
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
1 A qwe In 2 0 BothInOut BothInOut
2 A rty In 2 0 BothInOut BothInOut
3 A ewq Out 0 1 BothInOut BothInOut
4 B dfs In 2 0 OnlyIn OnlyIn
5 B sgf In 2 0 OnlyIn OnlyIn
6 C mnb Out 0 4 OnlyOut OnlyOut
7 C ves Out 0 4 OnlyOut OnlyOut
8 C klm Out 0 4 OnlyOut OnlyOut
9 C nbc Out 0 4 OnlyOut OnlyOut
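The question's original plan (count In and Out, then derive the flag) also works once the grouping is by Category alone rather than by Category and Status. The following is only a sketch of that idea: the column names TotalIn and TotalOut and the object name Flagged are made up for illustration, and it assumes a dplyr version that provides case_when().

```r
library(dplyr)

# Same rows as the question's data, without the precomputed columns.
Data <- data.frame(
  Category = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Product  = c("qwe", "rty", "ewq", "dfs", "sgf", "mnb", "ves", "klm", "nbc"),
  Status   = c("In", "In", "Out", "In", "In", "Out", "Out", "Out", "Out"),
  stringsAsFactors = FALSE
)

Flagged <- Data %>%
  group_by(Category) %>%                 # group by Category only, not Status
  mutate(
    TotalIn  = sum(Status == "In"),      # hypothetical column names
    TotalOut = sum(Status == "Out"),
    Flag = case_when(
      TotalIn > 0 & TotalOut > 0 ~ "BothInOut",
      TotalIn > 0                ~ "OnlyIn",
      TRUE                       ~ "OnlyOut"  # fallback: no In rows in the group
    )
  ) %>%
  ungroup()
```

Note that the fallback branch labels any group without an In row as OnlyOut, which matches this data but would need another branch if Status could take further values.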
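Answer
I would suggest making @Sotos's comment more robust by adding sort, so that the order of the labels does not depend on the order of the data: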
df %>% group_by(Category) %>%
  mutate(Flag1 = toString(sort(unique(Status))))
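To see why sort matters here, consider a small base R sketch. The two vectors below are hypothetical groups that contain the same statuses in a different row order; they are not taken from the question's data.

```r
# Two hypothetical groups holding the same statuses in a different row order.
first  <- c("In", "Out", "In")
second <- c("Out", "In", "In")

# unique() keeps values in order of first appearance, so the labels differ.
label_first  <- toString(unique(first))    # "In, Out"
label_second <- toString(unique(second))   # "Out, In"

# Sorting before collapsing makes the label independent of row order.
sorted_first  <- toString(sort(unique(first)))
sorted_second <- toString(sort(unique(second)))
identical(sorted_first, sorted_second)     # TRUE
```

Without sort, any later comparison against a fixed string such as "In, Out" would miss the groups whose first row happens to be Out.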
If you want the data labeled the way you suggested, you can extend it to:
df %>% group_by(Category) %>%
mutate(Flag1 = paste0(sort(unique(Status)), collapse = "") %>%
paste0(ifelse(. == "InOut", "Both", "Only"), .))
which yields:
Category Product Status CountIn CountOut Flag Flag1
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
1 A qwe In 2 0 BothInOut BothInOut
2 A rty In 2 0 BothInOut BothInOut
3 A ewq Out 0 1 BothInOut BothInOut
4 B dfs In 2 0 OnlyIn OnlyIn
5 B sgf In 2 0 OnlyIn OnlyIn
6 C mnb Out 0 4 OnlyOut OnlyOut
7 C ves Out 0 4 OnlyOut OnlyOut
8 C klm Out 0 4 OnlyOut OnlyOut
9 C nbc Out 0 4 OnlyOut OnlyOut
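The piped paste0 step can be hard to read because the dot stands for the collapsed status string in two places. As a rough illustration, the same logic can be unrolled into a plain function; make_flag is a hypothetical helper name, not something from the answers above.

```r
# Hypothetical helper: build the flag for one group's Status values.
make_flag <- function(status) {
  # Collapse the sorted distinct statuses: "In", "Out" or "InOut".
  combined <- paste0(sort(unique(status)), collapse = "")
  # Prefix with "Both" when both statuses occur, otherwise with "Only".
  paste0(ifelse(combined == "InOut", "Both", "Only"), combined)
}

make_flag(c("Out", "In", "In"))   # "BothInOut"
make_flag(c("In", "In"))          # "OnlyIn"
make_flag(c("Out", "Out"))        # "OnlyOut"
```

Inside a grouped mutate this could be called as mutate(Flag1 = make_flag(Status)), since it returns a single string per group that dplyr recycles across the group's rows.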
'df %>% group_by(Category) %>% mutate(Flag1 = toString(unique(Status)))' – Sotos
That does it. – sunitprasad1