Replace NA in a column with the value from an adjacent column

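Question:

This question is related to a post with a similar title (replace NA in an R vector with adjacent values). I would like to scan a column in a data frame and replace NAs with the value in the adjacent cell. In the aforementioned post, the solution did not replace the NA with a value from the adjacent vector (e.g. the adjacent element in the data matrix); it was a conditional replace with a fixed value. Below is a reproducible example of my problem: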

UNIT <- c(NA,NA, 200, 200, 200, 200, 200, 300, 300, 300,300) 
STATUS <-c('ACTIVE','INACTIVE','ACTIVE','ACTIVE','INACTIVE','ACTIVE','INACTIVE','ACTIVE','ACTIVE', 
        'ACTIVE','INACTIVE') 
TERMINATED <- c('1999-07-06' , '2008-12-05' , '2000-08-18' , '2000-08-18' ,'2000-08-18' ,'2008-08-18', 
         '2008-08-18','2006-09-19','2006-09-19' ,'2006-09-19' ,'1999-03-15') 
START <- c('2007-04-23','2008-12-06','2004-06-01','2007-02-01','2008-04-19','2010-11-29','2010-12-30', 
        '2007-10-29','2008-02-05','2008-06-30','2009-02-07') 
STOP <- c('2008-12-05','4712-12-31','2007-01-31','2008-04-18','2010-11-28','2010-12-29','4712-12-31', 
        '2008-02-04','2008-06-29','2009-02-06','4712-12-31') 

TEST <- data.frame(UNIT, STATUS, TERMINATED, START, STOP)
TEST

   UNIT   STATUS TERMINATED      START       STOP
1    NA   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2    NA INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

#using the syntax for a conditional replace and hoping it works :/   
TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS; TEST 

   UNIT   STATUS TERMINATED      START       STOP
1     1   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2     2 INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

The result should be:

       UNIT   STATUS TERMINATED      START       STOP
1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

Maybe try 'TEST$UNIT[is.na(TEST$UNIT)] – Seth 2013-03-26 05:23:17

You can't mix types within a single column of a data frame. – 2013-03-26 05:24:12

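It didn't work because STATUS was a factor. When you mix a factor with numeric values, numeric is the least restrictive type, so the factor's integer level codes (here 1 and 2) are what get inserted. By forcing STATUS to character you get the result you're after, and the column is now a character vector: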

TEST$UNIT[is.na(TEST$UNIT)] <- as.character(TEST$STATUS[is.na(TEST$UNIT)]) 

##        UNIT   STATUS TERMINATED      START       STOP
## 1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
## 2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
## 3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
## 4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
## 5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
## 6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
## 7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
## 8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
## 9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
## 10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
## 11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
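
To illustrate the coercion described in this answer, here is a minimal sketch on a reduced, hypothetical data set (the names unit, status, as_codes and as_labels are made up for illustration). Since R 4.0.0, data.frame() no longer converts strings to factors by default, so the sketch builds the factor explicitly to reproduce the behaviour from the question.

```r
# Reduced, hypothetical stand-ins for TEST$UNIT and TEST$STATUS.
unit   <- c(NA, NA, 200, 300)
status <- factor(c("ACTIVE", "INACTIVE", "ACTIVE", "INACTIVE"))

na_rows <- is.na(unit)

# Assigning a factor into a numeric vector inserts the integer level codes,
# which is where the 1 and 2 in the question came from.
as_codes <- unit
as_codes[na_rows] <- status[na_rows]
print(as_codes)

# Converting to character first keeps the labels; the whole vector then
# becomes character, because a vector can hold only one type.
as_labels <- unit
as_labels[na_rows] <- as.character(status[na_rows])
print(as_labels)
```

Note that after the second assignment the former numbers are strings such as "200", so any later arithmetic on the column would need an explicit conversion back.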

You beat me by 6 seconds. +1 (I'm deleting mine). – A5C1D2H2I1M1N2O1R2T1 2013-03-26 05:25:56

Good thing it's code and not pistols :) – 2013-03-26 05:26:29

Thanks, guys! That did the trick. – 2013-03-26 05:53:24

You need to do

TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS[is.na(TEST$UNIT)] 

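so that each value is replaced with its adjacent value. Otherwise there is a mismatch between the number of values to be replaced and the number of replacement values, and the values are filled in by row order, starting from the first row of STATUS. The original code happened to work in this case because the two values to be replaced are in the first two rows.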
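
The row-order effect is easier to see when the NAs are not at the top. The following is a small sketch with hypothetical data (unit, status, wrong and right are illustrative names, and the labels "A" to "D" are made up):

```r
# Hypothetical data where the NAs sit in rows 2 and 4.
unit   <- c(200, NA, 300, NA)
status <- c("A", "B", "C", "D")

na_rows <- is.na(unit)

# Unindexed right-hand side: R takes replacement values from the start of
# status and warns about the length mismatch (suppressed here for brevity).
wrong <- unit
suppressWarnings(wrong[na_rows] <- status)
print(wrong)

# Indexing both sides with the same logical vector pairs each NA with the
# value from its own row.
right <- unit
right[na_rows] <- status[na_rows]
print(right)
```

In wrong, rows 2 and 4 receive "A" and "B" (the first two elements of status), whereas right receives "B" and "D", the values adjacent to each NA.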

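I think this is fine as an answer. Sure, the solution is the same as the ones others have given, but you've added an explanation of what is going on. In my opinion it shouldn't be a comment. – 2016-08-31 16:01:14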