Collapse multi-line strings into a single row based on a condition.

Question:

Say I have data like this:

df <- data.frame(
    text = c("Treatment1: This text is","on two lines","","Treatment2:This text","has","three lines","","Treatment3: This has one") 
       ) 
df 
         text 
1 Treatment1: This text is 
2    on two lines 
3       
4  Treatment2:This text 
5      has 
6    three lines 
7       
8 Treatment3: This has one 

How would I parse this text so that each "Treatment" gets its own row, with all of the text that follows it placed on that same row?

For example, this is the desired output:

text 
1 Treatment1: This text is on two lines 
2 Treatment2: This text has three lines     
3 Treatment3: This has one 

Can anyone recommend a way to do this?

Maybe something like the following.
First, the data in dput format, which is the best format for sharing a dataset in a post.

df <- 
structure(list(text = c("Treatment1: This text is", "on two lines", 
"", "Treatment2:This text", "has", "three lines", "", "Treatment3: This has one" 
)), .Names = "text", class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8")) 

Now the base R code.

fact <- cumsum(grepl("treatment", df$text, ignore.case = TRUE)) 
result <- do.call(rbind, lapply(split(df, fact), function(x) 
        trimws(paste(x$text, collapse = " ")))) 
result <- as.data.frame(result) 
names(result) <- "text" 
result 
#         text 
#1 Treatment1: This text is on two lines 
#2 Treatment2:This text has three lines 
#3    Treatment3: This has one 
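
To see why the grouping works, it can help to look at the intermediate values. The sketch below rebuilds the same data and prints what grepl() and cumsum() return; the name is_start is only illustrative and does not appear in the answer.

```r
# Same data as in the answer above
df <- data.frame(
  text = c("Treatment1: This text is", "on two lines", "",
           "Treatment2:This text", "has", "three lines", "",
           "Treatment3: This has one"),
  stringsAsFactors = FALSE
)

# TRUE on every row that contains "treatment" (case-insensitive),
# i.e. every row that starts a new block
is_start <- grepl("treatment", df$text, ignore.case = TRUE)
is_start
# [1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE

# The running sum turns those flags into one group id per row
fact <- cumsum(is_start)
fact
# [1] 1 1 1 2 2 2 2 3
```

Note that the pattern matches "treatment" anywhere in a line. If continuation lines could contain that word too, anchoring the pattern as "^treatment" would probably be safer.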

Edit.
As Rich Scriven pointed out in his comment, tapply can greatly simplify the code above. (I didn't see it; I sometimes overcomplicate things.)

result2 <- data.frame(
    text = tapply(df$text, fact, function(x) trimws(paste(x, collapse = " "))) 
) 

all.equal(result, result2) 
#[1] "Component “text”: 'current' is not a factor" 
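
The all.equal() message above is about the column type, not the text itself: one column is a factor and the other is not, presumably because of how strings were converted by default in the R version used. A minimal sketch, assuming only the strings matter, is to compare both columns as plain character vectors. The helper name collapse_lines is hypothetical.

```r
df <- data.frame(
  text = c("Treatment1: This text is", "on two lines", "",
           "Treatment2:This text", "has", "three lines", "",
           "Treatment3: This has one"),
  stringsAsFactors = FALSE
)
fact <- cumsum(grepl("treatment", df$text, ignore.case = TRUE))

# Hypothetical helper: join the lines of one group and trim the ends
collapse_lines <- function(x) trimws(paste(x, collapse = " "))

# split/lapply/rbind version
result <- as.data.frame(do.call(rbind, lapply(split(df$text, fact), collapse_lines)))
names(result) <- "text"

# tapply version
result2 <- data.frame(text = tapply(df$text, fact, collapse_lines))

# as.character() drops factor levels, names and dims, so only the strings are compared
same_text <- identical(as.character(result$text), as.character(result2$text))
same_text
```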

Take a look at 'tapply()'. It can replace 'do.call(rbind, lapply(split(...), ...))'.


@RichScriven Thank you, I have edited the answer with your suggestion.

x <- gsub("\\s+Treatment", "*BREAK*Treatment", 
      as.character(paste(df[[1]], collapse = " "))) 
data.frame(text = unlist(strsplit(x, "\\*BREAK\\*")))
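
This last snippet comes without an explanation, so here is a step-by-step sketch of what it appears to do: paste everything into one string, mark each block boundary with a sentinel, then split on the sentinel. The names joined, marked and pieces are illustrative only. The final lines show a possible variant, not part of the original answer, that splits directly with a Perl lookahead instead of a sentinel.

```r
df <- data.frame(
  text = c("Treatment1: This text is", "on two lines", "",
           "Treatment2:This text", "has", "three lines", "",
           "Treatment3: This has one"),
  stringsAsFactors = FALSE
)

# Step 1: one long string; the empty rows leave double spaces behind
joined <- paste(df[[1]], collapse = " ")

# Step 2: replace the whitespace in front of each later "Treatment" with a sentinel
marked <- gsub("\\s+Treatment", "*BREAK*Treatment", joined)

# Step 3: split on the sentinel (the asterisks must be escaped in the regex)
pieces <- unlist(strsplit(marked, "\\*BREAK\\*"))
data.frame(text = pieces)

# Assumed alternative: split on whitespace that is followed by "Treatment"
pieces2 <- unlist(strsplit(joined, "\\s+(?=Treatment)", perl = TRUE))
```

Unlike the grepl() approach with ignore.case = TRUE, both variants here match "Treatment" case-sensitively.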