Reading data into R with different delimiters
Question:
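I am trying to read a file into R that uses different delimiters. The first line (the header) is space-delimited. From the second line to the last, there is one space between the first and second columns and one between the second and third; after that comes a single block of 0s, 1s, and 2s, and each digit in that block should become its own column. Any hints?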
ID Chip AX-77047182 AX-80910836 AX-80737273 AX-77048714 AX-77048779 AX-77050447
3811582 1 2002202222200202022020200200220200222200022220002200000201202000222022
3712982 1 2002202222200202022020200200220200222200022220002200000200202000222022
3712990 1 2002202211200202021011100101210200111101022121112100111110211110122122
3713019 1 2002202211200202021011100101210200111101022121112100111110211110122122
3713025 1 2002202211200202021011100101210200111101022121112100111110211110122122
3713126 1 2002202222200202022020200200220200222200022220002200000200202000222022
Answer
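This is certainly not the most elegant solution, but you could try the following. If I have understood your sample data correctly, you have not provided all of the column names (AX-77047182, ...) needed for the columns of zeros/ones/twos. If I have misunderstood, the approach below will not give the desired result, but it may still help you find a solution; you may only need to adjust the delimiter in the second split command. I hope this helps.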
#read file as character vector
chipstable <- readLines(".../chips.txt")
#extract first line to be used as column names
tablehead <- unlist(strsplit(chipstable[1], " "))
#split by first delimiter, i.e., space
chipstable <- strsplit(chipstable[2:length(chipstable)], " ")
#split by second delimiter, i.e., between each character (here number)
#and merge the two split results in one line
chipstable <- lapply(chipstable, function(x) {
c(x[1:2], unlist(strsplit(x[3], "")))
})
#combine all lines to a data frame
chipstable <- do.call(rbind, chipstable)
#assign column names
colnames(chipstable) <- tablehead
#turn values to numeric (if needed)
chipstable <- apply(chipstable, 2, as.numeric)
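With the sample shown in the question, the header lists fewer marker names than there are genotype digits, so assigning the header directly as column names would fail. The following is a minimal sketch of the same split-twice idea that runs on two lines copied from the question and pads the header when needed. The `marker_<k>` placeholder names and the variable names are assumptions made for illustration, not part of the original answer.

```r
# Sample lines copied from the question (header plus the first two data rows).
lines <- c(
  "ID Chip AX-77047182 AX-80910836 AX-80737273 AX-77048714 AX-77048779 AX-77050447",
  "3811582 1 2002202222200202022020200200220200222200022220002200000201202000222022",
  "3712982 1 2002202222200202022020200200220200222200022220002200000200202000222022"
)

# First split: header and data lines on single spaces.
header <- strsplit(lines[1], " ", fixed = TRUE)[[1]]
fields <- strsplit(lines[-1], " ", fixed = TRUE)

# Second split: ID, Chip, then one element per genotype digit.
rows <- lapply(fields, function(x) c(x[1:2], strsplit(x[3], "")[[1]]))
mat <- do.call(rbind, rows)

# Pad the header with placeholder names (hypothetical "marker_<k>") when the
# file lists fewer marker names than there are genotype digits.
n_missing <- ncol(mat) - length(header)
if (n_missing > 0) {
  header <- c(header, paste0("marker_", length(header) - 2 + seq_len(n_missing)))
}
colnames(mat) <- header[seq_len(ncol(mat))]

# Convert the character matrix to a data frame with numeric columns.
chips <- as.data.frame(mat, stringsAsFactors = FALSE)
chips[] <- lapply(chips, as.numeric)
```

If the real file does contain one marker name per digit, the padding step does nothing and the header is used unchanged.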
Answer
You could try... read(pattern = " || 1 ", recursive = TRUE)
and then bind the results afterwards.
For example:
data <- "ID Chip AX-77047182 AX-80910836 AX-80737273 AX-77048714 AX-77048779 AX-77050447
3811582 1 2002202222200202022020200200220200222200022220002200000201202000222022
3712982 1 2002202222200202022020200200220200222200022220002200000200202000222022
3712990 1 2002202211200202021011100101210200111101022121112100111110211110122122
3713019 1 2002202211200202021011100101210200111101022121112100111110211110122122
3713025 1 2002202211200202021011100101210200111101022121112100111110211110122122
3713126 1 2002202222200202022020200200220200222200022220002200000200202000222022"
#split the text into lines
teste <- strsplit(data, split = "\n")[[1]]
for (i in seq_along(teste)) {
  if (i == 1) {
    #header line: split on single spaces
    dataOut <- strsplit(teste[i], split = " ")
  } else {
    #data lines: split on " 1 " (this assumes the Chip column is always 1)
    dataOut <- strsplit(teste[i], split = " 1 ")
  }
  print(dataOut)
}
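The loop above only prints the pieces and depends on the Chip value being 1. As an alternative sketch, not the answerer's method, `read.table` can read the three space-separated fields as text, after which the digit string is split and the rows are bound together. The column name `geno` and the variable names are assumptions chosen for illustration; the input is two data lines copied from the question.

```r
txt <- "ID Chip AX-77047182 AX-80910836 AX-80737273 AX-77048714 AX-77048779 AX-77050447
3811582 1 2002202222200202022020200200220200222200022220002200000201202000222022
3712990 1 2002202211200202021011100101210200111101022121112100111110211110122122"

# Skip the header and read the three fields as character, so the long digit
# string is not converted to a number (which would lose precision).
raw <- read.table(text = txt, skip = 1, header = FALSE,
                  colClasses = "character",
                  col.names = c("ID", "Chip", "geno"))

# Split each genotype string into single digits and bind the rows together.
geno <- do.call(rbind, lapply(strsplit(raw$geno, ""), as.integer))

# Combine the ID and Chip columns with one column per digit.
result <- data.frame(ID = raw$ID, Chip = as.integer(raw$Chip), geno)
```

Reading with `colClasses = "character"` is the key design choice here: it keeps every digit of the third field intact until it is split.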
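Could you provide an example of the expected output? – emilliman5
Sure, for the first three rows: first row: ID Chip AX-77047182 AX-80910836. Second row: 3811582 1 2 0. The delimiter should be a space. – Nico
Please edit your post instead of replying in a comment. – emilliman5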