Drop factor levels that contain a single observation

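Problem description:

I would like to know whether there is a simple function (something like drop.levels) that removes the levels of a factor that contain only one observation. I provide a reproducible example below. So far I have only managed to store the names of the factors that contain a level with a single observation, but writing all the code to drop each specific level would be a pain. Is there a shortcut for this?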

db0 <- data.frame(let = c(sample(letters[1:5], 99, replace = TRUE), "z"), 
        let2 = sample(letters[6:11], 100, replace = TRUE), 
        stringsAsFactors = TRUE) # needed in R >= 4.0 to get factor columns 

#Checking which factor has levels with only one obs 
facLevels <- lapply(db0, table) 
facNames <- list() 
for(i in 1:length(facLevels)){ 
    facNames[i]<-ifelse(min(facLevels[[i]])==1, names(facLevels[i]), NA) 
} 
facNames <- as.character(facNames[!is.na(facNames)]) 

Basically, what I want to do is drop the level z of let. Thanks.

+2

What exactly do you mean by "drop the level z"? Do you want to remove that row from your data? Or do you want to set the value to NA instead of z? – MrFlick

+0

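Yes, setting that row to NA would be a solution, because I could then remove it easily. Keep in mind that I have factors with many levels and I do not know which levels contain a single observation, so I would rather use this approach than do it by hand. –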

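The for loop here sets every factor level that has only one observation to NA, and then removes that level from the column entirely by dropping the unused level.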

db0 <- data.frame(let = c(sample(letters[1:5], 99, replace = TRUE), "z"), 
    let2 = sample(letters[6:11], 100, replace = TRUE), 
    stringsAsFactors = TRUE) # needed in R >= 4.0 to get factor columns 

# Check which factors have levels with only one observation 
facLevels <- lapply(db0, table) 
# for each column, list the levels that occur exactly once 
to_change <- lapply(facLevels, function(x) names(x)[x == 1]) 

for (i in seq_len(ncol(db0))) { 
    if (length(to_change[[i]]) > 0) { 
        # set as NA 
        db0[which(db0[, i] %in% to_change[[i]]), i] <- NA 
        # removes the factor level; remove the line below if this is not 
        # what you wanted to do (as.factor() would keep the unused level) 
        db0[, i] <- droplevels(db0[, i]) 
    } 
} 

> tail(db0) 
    let let2 
95  b i 
96  a g 
97  c k 
98  d j 
99  d f 
100 <NA> j 

> levels(db0[,i]) 
[1] "f" "g" "h" "i" "j" "k" 
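
If the end goal is to remove the affected rows rather than keep them as NA, the same idea can be wrapped in a small helper. This is only a sketch: `drop_singletons` is a hypothetical name, not an existing function, and the data below is a deterministic stand-in for the `sample()` calls so that the result is reproducible.

```r
# Hypothetical helper: set levels seen exactly once to NA, then drop them.
drop_singletons <- function(df) {
  for (col in names(df)) {
    if (!is.factor(df[[col]])) next          # only touch factor columns
    counts <- table(df[[col]])
    rare <- names(counts)[counts == 1]       # levels with a single observation
    df[[col]][df[[col]] %in% rare] <- NA     # set those observations to NA
    df[[col]] <- droplevels(df[[col]])       # remove the now-unused levels
  }
  df
}

# Deterministic stand-in data (assumption): "z" occurs once in let.
db0 <- data.frame(let = factor(c(rep(letters[1:5], length.out = 99), "z")),
                  let2 = factor(rep(letters[6:11], length.out = 100)))

db_na <- drop_singletons(db0)            # "z" becomes NA and is no longer a level
db_clean <- droplevels(na.omit(db_na))   # optionally remove the NA rows as well
```

`na.omit()` removes every row that has an NA in any column, so it only fits if the data has no other missing values you want to keep.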
+0

Thanks, this is what I was looking for –

And if you don't like writing loops:

# create a sample dataset 
db0 <- data.frame(let1 = c(sample(letters[1:5], 99, replace = TRUE), "z"), 
        let2 = sample(letters[6:11], 100, replace = TRUE), 
        stringsAsFactors = TRUE) 

# calculate how many times each level is present 
facLevel <- lapply(db0, table) 

# keep only the levels that are not present exactly once 
# (lapply always returns a list; sapply could simplify to a matrix) 
test <- lapply(facLevel, function(x) x[x != 1]) 

# drop rows in the original dataset where a single-observation level is present (checked for both columns) 
db1 <- db0[rowSums(mapply(function(x, y) x %in% names(y), db0, test)) == ncol(db0), ]
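
Note that subsetting rows does not remove the unused level from the factor, so `levels(db1$let1)` would still list "z". A possible variant, sketched here with deterministic stand-in data (an assumption, to keep the result reproducible), looks up the count of each value directly and calls `droplevels()` on the result:

```r
# Deterministic stand-in data (assumption): "z" occurs once in let1.
db0 <- data.frame(let1 = factor(c(rep(letters[1:5], length.out = 99), "z")),
                  let2 = factor(rep(letters[6:11], length.out = 100)))

# For each column, TRUE where the row's value occurs more than once.
keep <- vapply(db0,
               function(x) as.vector(table(x)[as.character(x)]) > 1,
               logical(nrow(db0)))

# Keep rows that pass in every column, then drop the unused levels.
db1 <- droplevels(db0[rowSums(keep) == ncol(db0), , drop = FALSE])
```

This sketch assumes the columns contain no NA values; an NA would produce an NA in `keep` and would need separate handling.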