R: Splitting a factor-level column

Problem description:

Column A in a survey dataset looks like this:

Factor w/ 163305 levels "['032']","['A10', 'A11', 'B31']",..: 1 76209 134581 134581 75649 134581 84340 134871 74475 87044 ... 

Is there a way to split ['A10', 'A11', 'B31'] into three columns, one for each value?


Hi, welcome to SO. Please consider reading [ask] and how to produce a [reproducible example](http://*.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It makes it easier for others to help you. – Heroka

Try:

# Data (I assume that each value is separated by 1 comma and some other punctuation) 
x <- c("['032']","['A10', 'A11', 'B31']") 

# Find the maximum number of values in one string: count the comma matches in each string and add 1.
# Note that gregexpr returns a single -1 when there is no match, so a string without commas still counts as 1.
mx <- max(sapply(gregexpr("\\,",x),length)) + 1 

# Create a matrix containing each value in a separate column; str_split_fixed can take an argument that will determine the number of columns (mx in our case) 
library(stringr) 
str_split_fixed(gsub("[^[:alnum:],]","",x),",",mx) 
#      [,1]  [,2]  [,3]
# [1,] "032" ""    ""
# [2,] "A10" "A11" "B31"

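If every string contains only one value, you will get a matrix with two columns, the second of which contains only empty strings. Otherwise, it should work fine.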
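A variation that avoids the extra empty column and works directly on a factor column might look like the sketch below. It uses base R only, so it is a different technique from the `str_split_fixed` call above, not a drop-in replacement. The data frame `df`, the new column names `A_1`, `A_2`, ... and the choice to pad short rows with `NA` instead of empty strings are assumptions made for illustration.

```r
# Hypothetical data frame standing in for the survey data; column A is a factor
df <- data.frame(A = factor(c("['032']", "['A10', 'A11', 'B31']", "['A10', 'A11', 'B31']")))

# Convert the factor to character and drop everything except letters, digits and commas
clean <- gsub("[^[:alnum:],]", "", as.character(df$A))

# Split on commas; each list element holds the values of one row
parts <- strsplit(clean, ",", fixed = TRUE)

# The widest row sets the number of columns (1 if no string contains a comma)
mx <- max(lengths(parts))

# Indexing past the end of a vector yields NA, which pads the shorter rows
m <- do.call(rbind, lapply(parts, function(p) p[seq_len(mx)]))
colnames(m) <- paste0("A_", seq_len(mx))  # hypothetical column names

# Attach the new columns to the original data
result <- cbind(df, as.data.frame(m, stringsAsFactors = FALSE))
```

Because the column count comes from the split result itself rather than from counting comma matches, a dataset in which every string holds a single value gives a one-column matrix. With a very large number of levels, applying the same steps to `levels(df$A)` and then indexing by `as.integer(df$A)` would likely be faster, though that is not shown here.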