R: Splitting factor levels into columns

Question:

A column in a survey dataset looks like this:

Factor w/ 163305 levels "['032']","['A10', 'A11', 'B31']",..: 1 76209 134581 134581 75649 134581 84340 134871 74475 87044 ...

Is there a way to split a value such as ['A10', 'A11', 'B31'] into three columns, one for each code?

Hi, welcome to SO. Please consider reading [ask] and how to produce a [reproducible example](http://*.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It makes it easier for others to help you. – Heroka

Answer

Try:

# Data (I assume that each value is separated by 1 comma and some other punctuation) 
x <- c("['032']","['A10', 'A11', 'B31']") 

# Find maximum number of values in 1 string (counts the commas in each string and returns the maximum number + 1, as that is the most values there are) 
mx <- max(sapply(gregexpr("\\,",x),length)) + 1 

# Create a matrix containing each value in a separate column; str_split_fixed can take an argument that will determine the number of columns (mx in our case) 
library(stringr) 
str_split_fixed(gsub("[^[:alnum:],]","",x),",",mx) 
#      [,1]  [,2]  [,3]
# [1,] "032" ""    ""
# [2,] "A10" "A11" "B31"

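If every string contains only a single value, you will get a matrix with two columns, the second of which holds only empty strings. This happens because gregexpr returns a single -1 when there is no match, so a string without commas is still counted as having one. Otherwise, it should work fine.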
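One way to avoid the extra empty column is to split first and let the widest row decide the number of columns. The following is only a sketch in base R, not the method from the answer above: the helper name split_codes is hypothetical, and it assumes every value looks like the samples in the question (codes made of letters and digits, separated by commas).

```r
# Hypothetical helper (not part of the original answer): split bracketed,
# comma-separated codes into a character matrix, one column per code.
split_codes <- function(x) {
  x <- as.character(x)                      # a factor becomes its labels
  cleaned <- gsub("[^[:alnum:],]", "", x)   # keep letters, digits and commas
  parts <- strsplit(cleaned, ",", fixed = TRUE)
  width <- max(lengths(parts), 1L)          # widest row sets the column count
  # Pad shorter rows with "" so that every row has `width` entries
  padded <- lapply(parts, function(p) c(p, rep("", width - length(p))))
  do.call(rbind, padded)
}

f <- factor(c("['032']", "['A10', 'A11', 'B31']"))
m <- split_codes(f)                         # 2 x 3 character matrix

# With many repeated values, splitting the levels once and indexing by the
# integer codes avoids repeating the work for every row.
by_level <- split_codes(levels(f))[as.integer(f), , drop = FALSE]

single <- split_codes("['032']")            # 1 x 1 matrix, no padding column
```

If you would rather keep the stringr version, replacing the gregexpr line with max(str_count(x, ",")) + 1 should give the same column count, since str_count returns 0 for a string without commas.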
