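Dropping factor levels that contain a single observation
Question:
I would like to know whether there is a simple function (similar to drop.levels) that removes from a factor the levels that contain only one observation. A reproducible example is below. So far I have only managed to store the names of the factors that have a level with a single observation, but writing all the code to drop the specific levels would be a pain. Is there a shortcut?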
db0 <- data.frame(let = c(sample(letters[1:5], 99, replace = TRUE), "z"),
                  let2 = sample(letters[6:11], 100, replace = TRUE),
                  stringsAsFactors = TRUE)  # R >= 4.0.0 no longer creates factors by default
# Checking which factor has levels with only one obs
facLevels <- lapply(db0, table)
facNames <- list()
for (i in seq_along(facLevels)) {
  facNames[i] <- ifelse(min(facLevels[[i]]) == 1, names(facLevels[i]), NA)
}
facNames <- as.character(facNames[!is.na(facNames)])
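Basically, what I want to do is drop level z of let. Thanks.
Answer
The for loop here sets any value whose factor level has a single observation to NA, and then removes that level from the column entirely by rebuilding the factor.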
db0 <- data.frame(let = c(sample(letters[1:5], 99, replace = TRUE), "z"),
                  let2 = sample(letters[6:11], 100, replace = TRUE),
                  stringsAsFactors = TRUE)  # R >= 4.0.0 no longer creates factors by default
# Count the observations in each level of each column
facLevels <- lapply(db0, table)
# For each column, list the levels that have exactly one observation
to_change <- lapply(facLevels, function(x) names(x)[x == 1])
for (i in seq_along(db0)) {
  if (length(to_change[[i]]) > 0) {
    # set as NA
    db0[db0[, i] %in% to_change[[i]], i] <- NA
    # Removes the now-unused factor level; remove the line below if this is
    # not what you wanted to do. factor() is used rather than as.factor()
    # because as.factor() returns an existing factor unchanged.
    db0[, i] <- factor(db0[, i])
  }
}
> tail(db0)
let let2
95 b i
96 a g
97 c k
98 d j
99 d f
100 <NA> j
> levels(db0[,i])
[1] "f" "g" "h" "i" "j" "k"
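A note on the last step of the loop, as a small sketch: when a column is already a factor, as.factor() returns it unchanged, so a level emptied by the NA assignment is still listed by levels(). Rebuilding the column with factor(), or calling droplevels(), removes it. The vector f below is made up for illustration and is not part of the original post.

```r
# Illustrative factor (an assumption for this sketch): "z" occurs exactly once
f <- factor(c("a", "a", "b", "b", "z"))

# Set the single-observation value to NA
f[f == "z"] <- NA

levels(as.factor(f))   # "z" is still listed: as.factor() leaves a factor as-is
levels(factor(f))      # "z" is gone: factor() rebuilds the levels from the data
levels(droplevels(f))  # same result using droplevels()
```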
Thanks, this is what I was looking for.
Answer
And if you would rather not write a loop:
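# create a sample dataset
db0 <- data.frame(let1 = c(sample(letters[1:5], 99, replace = TRUE), "z"),
                  let2 = sample(letters[6:11], 100, replace = TRUE),
                  stringsAsFactors = TRUE)  # R >= 4.0.0 no longer creates factors by default
# calculate how many times each level is present
facLevel <- lapply(db0, table)
# keep only the levels that are present more than once
# (lapply rather than sapply, so the result is always a list)
test <- lapply(facLevel, function(x) x[x != 1])
# drop rows in the original dataset where a single-observation level is present (do this for both columns)
db1 <- db0[rowSums(mapply(function(x, y) x %in% names(y), db0, test)) == ncol(db0), ]
# optionally, remove the now-unused levels from the remaining factor columns
db1 <- droplevels(db1)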
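The same idea can also be wrapped in a small helper. This is only a sketch: the name drop_singleton_levels is hypothetical, and it assumes the columns of interest are already factors (other columns are returned untouched). It sets values whose level occurs exactly once to NA and then removes the unused levels with droplevels().

```r
# Hypothetical helper (the name is an assumption, not from the original answers)
drop_singleton_levels <- function(df) {
  df[] <- lapply(df, function(col) {
    if (!is.factor(col)) return(col)       # leave non-factor columns alone
    counts <- table(col)                   # observations per level
    single <- names(counts)[counts == 1]   # levels seen exactly once
    col[col %in% single] <- NA             # blank out those observations
    droplevels(col)                        # remove levels that are now unused
  })
  df
}

set.seed(1)  # only to make the sample reproducible
db0 <- data.frame(let  = c(sample(letters[1:5], 99, replace = TRUE), "z"),
                  let2 = sample(letters[6:11], 100, replace = TRUE),
                  stringsAsFactors = TRUE)

db1 <- drop_singleton_levels(db0)
"z" %in% levels(db1$let)   # FALSE
db2 <- na.omit(db1)        # optionally remove the rows that became NA
```

Returning NA instead of deleting rows keeps the choice open: na.omit() can remove the affected rows afterwards, or they can be kept if the other columns are still useful.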
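What exactly do you mean by "drop level z"? Do you want to remove that row from your data? Or do you want to set that value to NA instead of z? – MrFlick
Yes, setting that row to NA would be a solution, since I could then easily remove it. Keep in mind that I have many factors with many levels, and I do not know which levels contain a single observation, which is why I prefer this approach over doing it manually.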