Removing duplicated rows from a data frame based on a specific condition using R

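Problem description:

I am working on a project where I need to sort data based on how people voted. I could not find a function that removes duplicated rows based on certain conditions being met.

I am looking for a function that removes duplicated rows based on one column having duplicated values and another column meeting a specific condition.

For example, in the table below I want to remove voters who voted in three different elections. Paul needs to be removed from this data frame.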

df <- data.frame(Name=c("Paul","Paul","Mary","Bill","Jane","Paul","Mary","John", 
"Bill","John"),ElectionDay=c("November 2010","November 2014", 
"November 2010","November 2010","November 2014","November 2006", 
"November 2014","November 2010","November 2014","November 2014")) 

df 
# Name ElectionDay 
# 1 Paul November 2010 
# 2 Paul November 2014 
# 3 Mary November 2010 
# 4 Bill November 2010 
# 5 Jane November 2014 
# 6 Paul November 2006 
# 7 Mary November 2014 
# 8 John November 2010 
# 9 Bill November 2014 
# 10 John November 2014 

Here is an example of the result I am looking for:

Name ElectionDay 
1 Mary November 2010 
2 Bill November 2010 
3 Jane November 2014 
4 Mary November 2014 
5 John November 2010 
6 Bill November 2014 
7 John November 2014 

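We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'Name', get the number of unique 'ElectionDay' values (uniqueN(ElectionDay)). If that number is less than 3, we return the Subset of Data.table (.SD) for that group.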

library(data.table)#v1.9.6+ 
setDT(df)[, if(uniqueN(ElectionDay) < 3) .SD, by = Name] 

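A similar base R option uses ave. We get the length of the unique elements of 'ElectionDay' grouped by 'Name' and check whether it is less than 3 to obtain a logical index. The index can then be used to subset the rows of the dataset.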

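# ave() returns a character vector here, so convert the counts to numeric before comparing
df[with(df, as.numeric(ave(as.character(ElectionDay), Name, 
       FUN=function(x) length(unique(x))))) < 3,] 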
# Name ElectionDay 
#3 Mary November 2010 
#4 Bill November 2010 
#5 Jane November 2014 
#7 Mary November 2014 
#8 John November 2010 
#9 Bill November 2014 
#10 John November 2014 
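To see which voters the logical index drops, it can help to look at the per-name counts first. The following is a minimal base R sketch of the same idea using tapply instead of ave; `election_counts`, `keep`, and `result` are illustrative names that do not appear in the answer above, and the data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(
  Name = c("Paul", "Paul", "Mary", "Bill", "Jane",
           "Paul", "Mary", "John", "Bill", "John"),
  ElectionDay = c("November 2010", "November 2014", "November 2010",
                  "November 2010", "November 2014", "November 2006",
                  "November 2014", "November 2010", "November 2014",
                  "November 2014")
)

# Number of distinct elections per voter, indexed by name
election_counts <- tapply(as.character(df$ElectionDay), df$Name,
                          function(x) length(unique(x)))

# Look up each row's count by name and keep voters with fewer than 3
keep <- as.vector(election_counts[as.character(df$Name)]) < 3
result <- df[keep, ]
```

Printing `election_counts` before subsetting makes it easy to check the threshold against the data.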

The names that occur in more than 2 rows are computed with

names(which(table(df$Name) > 2)) 
#[1] "Paul" 

So what you need is

df[!(df$Name %in% names(which(table(df$Name) > 2))), ] 
# Name ElectionDay 
#3 Mary November 2010 
#4 Bill November 2010 
#5 Jane November 2014 
#7 Mary November 2014 
#8 John November 2010 
#9 Bill November 2014 
#10 John November 2014 
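Note that table(df$Name) counts rows per name, not distinct elections. The two agree on the example data, but they would differ if a voter had the same election listed twice. Here is a small sketch of that difference on made-up data; `votes`, `by_rows`, `distinct_pairs`, `frequent`, and `by_elections` are hypothetical names used only for illustration.

```r
# Hypothetical data: Ann appears three times, but in only two distinct elections
votes <- data.frame(
  Name = c("Ann", "Ann", "Ann", "Bob"),
  ElectionDay = c("November 2010", "November 2010", "November 2014",
                  "November 2010")
)

# Row-based count: Ann has 3 rows, so she is dropped
by_rows <- votes[!(votes$Name %in% names(which(table(votes$Name) > 2))), ]

# Distinct-election count: de-duplicate (Name, ElectionDay) pairs before counting
distinct_pairs <- unique(votes[, c("Name", "ElectionDay")])
frequent <- names(which(table(distinct_pairs$Name) > 2))
by_elections <- votes[!(votes$Name %in% frequent), ]
```

If the data can contain repeated (Name, ElectionDay) pairs, the second variant is probably closer to what the question asks for.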

Or 'df[df$Name %in% names(which(table(df$Name) …' (Saksham)

Alternatively, you can use dplyr: count the number of elections each person voted in, then remove the rows where that count is 3:

library(dplyr) 
df %>% 
    group_by(Name) %>% 
    mutate(NumberElections = length(unique(ElectionDay))) %>% 
    ungroup() %>% 
    filter(NumberElections != 3) 
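A more compact variant puts the count directly inside filter() using dplyr's n_distinct(). This is a sketch that assumes the goal is to keep voters with fewer than 3 distinct elections; unlike `!= 3`, the `< 3` condition would also drop anyone with four or more. The data frame is rebuilt so the block is self-contained, and `result` is an illustrative name.

```r
library(dplyr)

# Rebuild the example data so this block is self-contained
df <- data.frame(
  Name = c("Paul", "Paul", "Mary", "Bill", "Jane",
           "Paul", "Mary", "John", "Bill", "John"),
  ElectionDay = c("November 2010", "November 2014", "November 2010",
                  "November 2010", "November 2014", "November 2006",
                  "November 2014", "November 2010", "November 2014",
                  "November 2014")
)

# Keep only the groups (voters) with fewer than 3 distinct elections
result <- df %>%
  group_by(Name) %>%
  filter(n_distinct(ElectionDay) < 3) %>%
  ungroup()
```

This avoids the helper column, so the result has the same two columns as the input.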

You could use 'df %>% group_by(Name) %>% filter(n_distinct(ElectionDay) …' (akrun)