Replace NA with a value that depends on a condition
Question:
> str(store)
'data.frame': 1115 obs. of 10 variables:
$ Store : int 1 2 3 4 5 6 7 8 9 10 ...
$ StoreType : Factor w/ 4 levels "a","b","c","d": 3 1 1 3 1 1 1 1 1 1 ...
$ Assortment : Factor w/ 3 levels "a","b","c": 1 1 1 3 1 1 3 1 3 1 ...
$ CompetitionDistance : int 1270 570 14130 620 29910 310 24000 7520 2030 3160 ...
$ CompetitionOpenSinceMonth: int 9 11 12 9 4 12 4 10 8 9 ...
$ CompetitionOpenSinceYear : int 2008 2007 2006 2009 2015 2013 2013 2014 2000 2009 ...
$ Promo2 : int 0 1 1 0 0 0 0 0 0 0 ...
$ Promo2SinceWeek : int NA 13 14 NA NA NA NA NA NA NA ...
$ Promo2SinceYear : int NA 2010 2011 NA NA NA NA NA NA NA ...
$ PromoInterval : Factor w/ 4 levels "","Feb,May,Aug,Nov",..: 1 3 3 1 1 1 1 1 1 1 ...
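I am trying to replace the NA values depending on the value of Promo2. Where Promo2 is 0, the NA should become 0; where Promo2 is 1, the NA should be replaced by the column mean.
I don't understand why my code does not modify the store data.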
for (i in 1:nrow(store)){
  if(is.na(store[i,])== TRUE & store$Promo2[i] ==0){
    store[i,] <- ifelse(is.na(store[i,]),0,store[i,])
  }
  else if (is.na(store[i,])== TRUE & store$Promo2[i] ==1){
    for(j in 1:ncol(store)){
      store[is.na(store[i,j]), j] <- mean(store[,j], na.rm = TRUE)
    }
  }
}
Answer
For the Promo2SinceWeek column:
store$Promo2SinceWeek[store$Promo2==0 & is.na(store$Promo2SinceWeek)] <- 0
store$Promo2SinceWeek[store$Promo2==1 & is.na(store$Promo2SinceWeek)] <- mean(store$Promo2SinceWeek, na.rm=TRUE)
Use the same approach for the other columns. Vectorization is a very useful feature of R.
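To apply the same idea to several columns at once, something like the following sketch may help. The small data frame and the `na_cols` vector are invented for illustration and are not the asker's real data. The sketch also assumes that the mean should come from the observed values only, so it computes each mean before any zeros are written into the column.

```r
# Hypothetical miniature of the store data (values invented for illustration).
store <- data.frame(
  Promo2          = c(0L, 1L, 1L, 0L, 1L),
  Promo2SinceWeek = c(NA, 13, 14, NA, NA),
  Promo2SinceYear = c(NA, 2010, 2011, NA, NA)
)

# Columns to fill; chosen by hand here (an assumption, adjust as needed).
na_cols <- c("Promo2SinceWeek", "Promo2SinceYear")

for (col in na_cols) {
  # Assumption: take the mean of the observed values only,
  # so compute it before zeros are written into the column.
  col_mean <- mean(store[[col]], na.rm = TRUE)
  missing  <- is.na(store[[col]])
  store[[col]][missing & store$Promo2 == 0] <- 0
  store[[col]][missing & store$Promo2 == 1] <- col_mean
}

store
```

The loop here runs over columns, not rows, so each assignment is still a vectorized operation on a whole column.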
Answer
To fix the for loop:
for(i in 1:nrow(store)) {
  col <- which(is.na(store[i,]))
  store[i,][col] <- if(store$Promo2[i] == 1) colMeans(store[col], na.rm=TRUE) else 0
}
Or, if you don't want any if statements:
for (i in 1:nrow(store)) {
  store[i,][is.na(store[i,]) & store$Promo2[i] ==0] <- 0
  store[i,][is.na(store[i,]) & store$Promo2[i] ==1] <-
    colMeans(store[,is.na(store[i,]) & store$Promo2[i] ==1], na.rm = TRUE)
}
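Your loop does not work because an if statement accepts a single logical value from its test. Your loop sends it if(is.na(store[i,])== TRUE & store$Promo2[i] ==0). But that condition has many values, TRUE FALSE FALSE FALSE TRUE... It is a series of TRUEs and FALSEs when it should be exactly one value, either a single TRUE or a single FALSE. When you give it several values, if only uses the first one (with a warning); since R 4.2.0 this is an error instead.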
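The point about the condition length can be checked directly. The one-row data frame below is invented for illustration. Using `any()` is one possible way to reduce the test to a single value, not necessarily what the answer's author had in mind.

```r
# Hypothetical one-row example (values invented for illustration).
row <- data.frame(Promo2 = 0, gear = NA, carb = 1)

# is.na() on a data frame returns one logical per cell,
# so the combined condition has one value per column.
cond <- is.na(row) & row$Promo2 == 0
length(cond)

# any() collapses the per-column result to a single TRUE/FALSE,
# which is what if() expects.
if (any(is.na(row)) && row$Promo2 == 0) {
  row[is.na(row)] <- 0
}
row
```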
Reproducible example:
store
# Promo2 gear carb
#Mazda RX4 1 NA NA
#Mazda RX4 Wag 1 4 4
#Datsun 710 1 4 1
#Hornet 4 Drive 0 3 1
#Hornet Sportabout 0 3 NA
#Valiant 0 3 1
for(i in 1:nrow(store)) {
  col <- which(is.na(store[i,]))
  store[i,][col] <- if(store$Promo2[i] == 1) colMeans(store[col], na.rm=TRUE) else 0
}
store
# Promo2 gear carb
#Mazda RX4 1 3.4 1.75
#Mazda RX4 Wag 1 4.0 4.00
#Datsun 710 1 4.0 1.00
#Hornet 4 Drive 0 3.0 1.00
#Hornet Sportabout 0 3.0 0.00
#Valiant 0 3.0 1.00
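The 3.4 and 1.75 in the first row can be traced as follows. This sketch rebuilds the example data, and the `filled` copy is a hypothetical variation added for comparison. The first row is handled first, so its means come from the values observed at that moment, while the NA in row 5 is still missing. Had that NA already been replaced by 0, the carb mean would come out differently, so the result of the loop depends on row order.

```r
# Rebuild the example data from the built-in mtcars data set.
store <- head(mtcars)[-(1:8)]
names(store)[1] <- "Promo2"
store[1, 2] <- NA
store[5, 3] <- NA
store[1, 3] <- NA

# Means of the observed values only, as seen when row 1 is processed.
observed_means <- colMeans(store[c("gear", "carb")], na.rm = TRUE)
observed_means

# Hypothetical variation: zero-fill row 5 before taking the mean.
filled <- store
filled$carb[5] <- 0
carb_mean_after_fill <- mean(filled$carb, na.rm = TRUE)
carb_mean_after_fill
```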
Data
store <- head(mtcars)
store <- store[-(1:8)]
names(store)[1] <- "Promo2"
store[1,2] <- NA
store[5,3] <- NA
store[1,3] <- NA
store
You need to learn some basic R.