R: Split a factor column into several columns
Question:
Column A in my survey dataset looks like this:
Factor w/ 163305 levels "['032']","['A10', 'A11', 'B31']",..: 1 76209 134581 134581 75649 134581 84340 134871 74475 87044 ...
Is there a way to split a value such as ['A10', 'A11', 'B31']
into three separate columns, one for each code?
Answer
Try this:
library(stringr)
# Data (I assume that the values are separated by one comma plus some other punctuation)
x <- c("['032']","['A10', 'A11', 'B31']")
# Maximum number of values in one string: count the comma matches in each string and add 1
# (note: gregexpr() returns -1, which has length 1, for a string with no comma)
mx <- max(sapply(gregexpr(",",x),length)) + 1
# Keep only letters, digits and commas, then split into exactly mx columns;
# str_split_fixed() pads missing pieces with empty strings
str_split_fixed(gsub("[^[:alnum:],]","",x),",",mx)
# [,1] [,2] [,3]
# [1,] "032" "" ""
# [2,] "A10" "A11" "B31"
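If every string contains only one value, you will get a matrix with two columns, the second of which contains only empty strings. This happens because gregexpr() reports "no match" as -1, which still has length 1. Otherwise it should work fine.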
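A base R variant that needs no extra package might look like the sketch below. It assumes the data live in a data frame called `df` with the factor column `A` (both hypothetical, built here from the two example values), and that the new columns may be named `code1`, `code2`, … (also a hypothetical choice). Converting the factor with `as.character()` first avoids working on the factor codes, and counting pieces with `lengths()` sidesteps the extra empty column described above. Shorter rows are padded with `NA` rather than `""`.

```r
# Hypothetical data frame with a factor column A, mimicking the question
df <- data.frame(A = factor(c("['032']", "['A10', 'A11', 'B31']")))

# Convert the factor to character, then drop brackets, quotes and spaces
clean <- gsub("[^[:alnum:],]", "", as.character(df$A))

# Split on commas; each list element holds the codes of one row
parts <- strsplit(clean, ",", fixed = TRUE)

# Number of columns needed = number of codes in the longest row
n <- max(lengths(parts))

# Pad every row with NA up to n values and stack the rows into a matrix
codes <- do.call(rbind, lapply(parts, function(p) c(p, rep(NA_character_, n - length(p)))))
colnames(codes) <- paste0("code", seq_len(n))

# Attach the new character columns to the original data
df <- cbind(df, as.data.frame(codes, stringsAsFactors = FALSE))
```

With 163305 levels it may be faster to apply the same steps to `levels(df$A)` once and then index the result with `as.integer(df$A)`, though that is an untested suggestion.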
Hi, and welcome to SO. Please consider reading [ask] and how to produce a [reproducible example](http://*.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It makes it easier for others to help you. – Heroka