How do I create a single dummy variable with conditions on multiple columns?

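Problem description:

I am trying to efficiently create a binary dummy variable (1/0) in my data set based on whether one or more of 7 variables (col9-15) equals a specific value (35), but I don't want to test all columns. How can I create a single dummy variable with conditions on multiple columns?

as.numeric is usually ideal, but I can only get it to work with one column at a time: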

data$indicator <- as.numeric(data$col1 == 35) 

Any ideas on how I can modify the code above so that my indicator variable is 1 if any of data$col9 through data$col15 is "35"?

Thanks!

You can use rowSums (a vectorized solution) like this:

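set.seed(123) 
# 20 x 15 matrix of random values drawn from 35 and 1:100 
dat <- matrix(sample(c(35,1:100),size=15*20,replace=TRUE),ncol=15,byrow=TRUE) 
# append a column that is 1 if any of columns 9-15 equals 35, else 0 
cbind(dat,rowSums(dat[,9:15] == 35) > 0) 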
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
 [1,]   29   79   41   89   94    4   53   90   55    46    96    45    68    57    10     0
 [2,]   90   24    4   33   96   89   69   64  100    66    71    54    60    29    14     0
 [3,]   97   91   69   80    2   48   76   21   32    23    14    41    41    37    15     0
 [4,]   14   23   47   26   86    4   44   80   12    56    20    12    76    90    37     0
 [5,]   67    9   38   27   82   45   81   82   80    44    76    63    71    35    48     1
 [6,]   22   38   61   35   11   24   67   42   79    10    43    99    90    89    17     0
 [7,]   13   65   34   66   32   18   79    9   47    51    60    33    49    96    48     0
 [8,]   89   92   61   41   14   94   30    6   95    72    14    55    96    59    40     0
 [9,]   65   32   31   22   37   99   15    9   14    69    62    90    67    74    52     0
[10,]   66   83   79   98   44   31   41    1   18    85    23    24     7    24    73     0
[11,]   85   50   39   24   11   39   57   21   44    22    50    35    65    37    35     1
[12,]   53   74   22   41   26   63   18   87   75    67    62    37    53    88    58     0
[13,]   84   31   71   26   60   48   26   57   92    91    27    32    99    62    94     0
[14,]   47   41   66   15   57   24   97   60   52    40    88    36    29    17    17     0
[15,]   48   25   21   68    4   70   35   41   82    92    28    97    73    69     5     0
[16,]   39   48   56   70   92   62   43   54    5    26    40    19    84    15    81     0
[17,]   55   66   17   63   31   73   40   97   97    73    25    22    59    27    53     0
[18,]   79   16   40   47   87   93   89   68   95    52    58    33    35     2    50     1
[19,]   87   35    7   16   77   74   98   47    7    65    76    13    40    22     5     0
[20,]   39    6   22    5   67   30   10    7   88    76    82    99    10    10    80     0
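
To see why this works, a small hand-made matrix may help. The matrix `m` and its values are made up for illustration and are not taken from the question. Comparing a matrix with 35 gives a logical matrix, rowSums counts the TRUE values in each row, and `> 0` turns that count into "at least one match".

```r
# Hypothetical 3 x 4 matrix; only columns 3 and 4 are checked
m <- matrix(c( 1, 35,  2,  3,
              35,  4, 35,  5,
               6,  7,  8, 35),
            nrow = 3, byrow = TRUE)

hits <- m[, 3:4] == 35               # logical matrix, TRUE where a cell equals 35
counts <- rowSums(hits)              # number of matches per row
indicator <- as.integer(counts > 0)  # 1 if the row has at least one match

counts     # 0 1 1
indicator  # 0 1 1
```

Row 1 contains 35 only in column 2, which is outside the tested range, so its indicator stays 0.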

EDIT

I replaced cbind with transform. Since the new column would be boolean, I coerce it to get 0/1.

transform(dat,x=as.numeric((rowSums(dat[,9:15] == 35) > 0))) 

The result is a data.frame (transform coerces the matrix).

EDIT2 (as suggested by @flodel)

data$indicator <- as.integer(rowSums(data[paste0("col", 9:15)] == 35) > 0) 

where data is the OP's data frame.
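
One thing to watch with the rowSums approach is missing values: any row with an NA in the tested columns yields NA rather than 0 or 1. A minimal sketch, assuming a small made-up data frame `df` with hypothetical columns col9 to col11 (the question's real data is not shown), and assuming that an NA cell should simply count as "not 35":

```r
# Made-up data; column names follow the question's col<N> pattern
df <- data.frame(col9  = c(35, 1, NA, 2),
                 col10 = c( 3, 4,  5, NA),
                 col11 = c( 6, 7, 35,  8))

cols <- paste0("col", 9:11)

# Without na.rm, rows 3 and 4 (which contain NA) come out as NA
plain <- as.integer(rowSums(df[cols] == 35) > 0)
plain         # 1 0 NA NA

# With na.rm = TRUE, NA cells are ignored when counting matches
df$indicator <- as.integer(rowSums(df[cols] == 35, na.rm = TRUE) > 0)
df$indicator  # 1 0 1 0
```

Whether ignoring NA is appropriate depends on what a missing value means in the data, so treat na.rm = TRUE as one possible choice rather than a default.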


Wow. Thanks for all the great input. – km5041 2013-02-22 18:23:15

apply to the rescue :)

# this sample data frame is pre-loaded 
mtcars 

# test whether any of the values in the 
# 2nd - 5th columns of mtcars equal four.. 

# ..and save the result into a new vector 
# (max() of a logical vector is 1 if any element is TRUE, else 0) 
indicator.col <- 
    apply( 
        mtcars[ , 2:5 ] , 
        1 , 
        FUN = function(x) max(x == 4) 
    ) 

# bind the new vector onto the original mtcars 
mtcars2 <- cbind(mtcars , indicator.col) 

# look at your result 
mtcars2 
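
The same row-wise test can also be written with any(), which some readers may find more direct than max() on a logical vector. This is only an alternative sketch, not the answer author's code; the names `indicator.any`, `indicator.max`, and `indicator.rs` are made up here.

```r
# Row-wise test on columns 2-5 of the built-in mtcars data set
indicator.any <- apply(mtcars[, 2:5], 1, function(x) as.integer(any(x == 4)))

# The max() version from the answer, for comparison by value
indicator.max <- apply(mtcars[, 2:5], 1, function(x) max(x == 4))
all(indicator.any == indicator.max)

# Vectorized equivalent without apply()
indicator.rs <- as.integer(rowSums(mtcars[, 2:5] == 4) > 0)
all(indicator.any == indicator.rs)
```

mtcars has no missing values, so all three variants agree here. With NA present they can differ, since any(), max(), and rowSums() each treat NA in their own way.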

Thanks! This works like a charm and is light years faster than my loop. Much appreciated. – km5041 2013-02-22 02:54:33

You can also try this (sample data borrowed from agstudy's answer):

> set.seed(123) 
> dat <- matrix(sample(c(35,1:100),size=15*20,replace=TRUE),ncol=15,byrow=TRUE) 


#Create indicator initialized with 0. 
> indicator <- rep(0, nrow(dat)) 
#Find the rows of dat where columns 9-15 contain 35. arr.ind=TRUE returns row 
#numbers directly (a modulo on the linear index would give 0 for the last row). 
> rows <- which(dat[,9:15]==35, arr.ind=TRUE)[,"row"] 
> indicator[rows] <- 1 
#bind the indicator to original data 
> cbind(dat, indicator) 
                                                                  indicator
 [1,]  29  79  41  89  94   4  53  90  55  46  96  45  68  57  10         0
 [2,]  90  24   4  33  96  89  69  64 100  66  71  54  60  29  14         0
 [3,]  97  91  69  80   2  48  76  21  32  23  14  41  41  37  15         0
 [4,]  14  23  47  26  86   4  44  80  12  56  20  12  76  90  37         0
 [5,]  67   9  38  27  82  45  81  82  80  44  76  63  71  35  48         1
 [6,]  22  38  61  35  11  24  67  42  79  10  43  99  90  89  17         0
 [7,]  13  65  34  66  32  18  79   9  47  51  60  33  49  96  48         0
 [8,]  89  92  61  41  14  94  30   6  95  72  14  55  96  59  40         0
 [9,]  65  32  31  22  37  99  15   9  14  69  62  90  67  74  52         0
[10,]  66  83  79  98  44  31  41   1  18  85  23  24   7  24  73         0
[11,]  85  50  39  24  11  39  57  21  44  22  50  35  65  37  35         1
[12,]  53  74  22  41  26  63  18  87  75  67  62  37  53  88  58         0
[13,]  84  31  71  26  60  48  26  57  92  91  27  32  99  62  94         0
[14,]  47  41  66  15  57  24  97  60  52  40  88  36  29  17  17         0
[15,]  48  25  21  68   4  70  35  41  82  92  28  97  73  69   5         0
[16,]  39  48  56  70  92  62  43  54   5  26  40  19  84  15  81         0
[17,]  55  66  17  63  31  73  40  97  97  73  25  22  59  27  53         0
[18,]  79  16  40  47  87  93  89  68  95  52  58  33  35   2  50         1
[19,]  87  35   7  16  77  74  98  47   7  65  76  13  40  22   5         0
[20,]  39   6  22   5  67  30  10   7  88  76  82  99  10  10  80         0
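
When turning the linear indices returned by which() into row numbers, the last row needs care: a plain modulo by nrow gives 0 for it, and assigning at index 0 silently does nothing. A small sketch on a made-up matrix `m` (not the data above) that uses which(..., arr.ind = TRUE) to get row numbers directly:

```r
# Hypothetical 3 x 2 matrix with a 35 in the last row only
m <- matrix(c( 1, 2,
               3, 4,
              35, 5),
            nrow = 3, byrow = TRUE)

lin <- which(m == 35)   # linear (column-major) index of the match: 3
lin %% nrow(m)          # 0, so a modulo-based lookup would miss the last row

# arr.ind = TRUE returns a matrix with "row" and "col" columns
rows <- which(m == 35, arr.ind = TRUE)[, "row"]

indicator <- rep(0, nrow(m))
indicator[unique(rows)] <- 1
indicator               # 0 0 1
```

If a modulo is preferred, `(lin - 1) %% nrow(m) + 1` gives the same row numbers without the zero case.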