Extracting numbers from strings of varying length

Question:

testVector <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")

Is there a way to parse this vector so that I store only the numeric values:

10, 6, 4, 15

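If the strings were only "15 cars" and "6 cars", I would know how to parse them, but I am also having trouble with the strings that have text in front of the number. Any help is greatly appreciated.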

Answer

We can use str_extract with the pattern \\d+, which matches one or more digits. The same pattern can also be written as [0-9]+.

library(stringr) 
as.numeric(str_extract(testVector, "\\d+")) 
#[1] 10 6 4 15
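To make the pattern explanation concrete, here is a minimal base R sketch that checks the two spellings against each other. The names with_d and with_class are only illustrative. The backslash is doubled because it has to be escaped inside an R string literal; the regex engine itself receives \d+.

```r
# Same vector as in the question
testVector <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")

# First run of digits in each string, written two equivalent ways
with_d     <- regmatches(testVector, regexpr("\\d+", testVector))
with_class <- regmatches(testVector, regexpr("[0-9]+", testVector))

# Both spellings pick out the same substrings
identical(with_d, with_class)

# The string "\\d+" holds three characters: backslash, d, plus
nchar("\\d+")
```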

If a string contains more than one number, we use str_extract_all, which always returns a list as output.
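As a rough illustration of the list output, the sketch below uses a made-up vector (mixed is hypothetical, not from the question) in which one string holds two numbers and one holds none. It assumes stringr is installed.

```r
library(stringr)

# Hypothetical input: two numbers, one number, no number
mixed <- c("I have 10 cars and 2 bikes", "6 cars", "no cars")

# One list element per input string, each a character vector of matches
nums <- str_extract_all(mixed, "\\d+")

# Convert every element to numeric; strings without digits give numeric(0)
nums_numeric <- lapply(nums, as.numeric)

# Number of matches found in each string
lengths(nums_numeric)
```

Because the result is a list, it stays aligned with the input even when a string has no match.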

This can also be done with base R (no external packages used):

as.numeric(regmatches(testVector, regexpr("\\d+", testVector))) 
#[1] 10 6 4 15
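One thing to watch with the regmatches/regexpr approach: elements without a match are dropped, so the result can be shorter than the input. The sketch below shows one possible way to keep the positions aligned by filling in NA; the vector x and the name first_num are made up for illustration.

```r
# Hypothetical input where the second string contains no digits
x <- c("I have 10 cars", "no cars here", "You have 4 cars")

# regexpr() returns -1 for elements without a match
m <- regexpr("\\d+", x)

# regmatches() silently drops the non-matching element
matched <- regmatches(x, m)
length(matched)

# Pre-fill with NA and assign only where a match was found
first_num <- rep(NA_real_, length(x))
first_num[m != -1] <- as.numeric(matched)
first_num
```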

Or use gsub from base R:

as.numeric(gsub("\\D+", "", testVector)) 
#[1] 10 6 4 15
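Note that deleting every non-digit glues all digits in a string together, which only works when each string holds a single number. A small sketch with a hypothetical vector y, plus one possible alternative that keeps just the first number via sub and a capture group:

```r
# Hypothetical input: the first string contains two numbers
y <- c("I have 10 cars and 2 bikes", "6 cars")

# Removing all non-digits concatenates "10" and "2" into "102"
glued <- as.numeric(gsub("\\D+", "", y))

# Keep only the first run of digits: skip leading non-digits,
# capture the digits, discard the rest of the string
first_only <- as.numeric(sub("\\D*(\\d+).*", "\\1", y))
```

If a string contains no digits at all, the sub call leaves it unchanged, and as.numeric then returns NA with a warning.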

BTW, some functions are just thin wrappers around gsub. Here is the source of extract_numeric:

function (x) 
{ 
    as.numeric(gsub("[^0-9.-]+", "", as.character(x))) 
}
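The character class in that function, [^0-9.-], keeps decimal points and minus signs, whereas \\D+ strips them. The sketch below compares the two patterns on a hypothetical vector z.

```r
# Hypothetical input with a negative and a decimal value
z <- c("Temp is -3.5 degrees", "about 2.75 kg")

# Keeps digits, "." and "-", so sign and decimals survive
keep_sign <- as.numeric(gsub("[^0-9.-]+", "", z))

# Removes everything that is not a digit, losing sign and decimal point
digits_only <- as.numeric(gsub("\\D+", "", z))
```

The trade-off is that stray periods or hyphens elsewhere in the text (a trailing full stop, a hyphenated word) are kept as well and can turn the result into NA.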

So, if we need a function, we can create one ourselves (without using any external packages):

ext_num <- function(x) {
  as.numeric(gsub("\\D+", "", x))
}
ext_num(testVector) 
#[1] 10 6 4 15

Thanks! Could you explain what "\\d+" means? – Sheila

@Sheila Updated the post. – akrun

Regular expressions cheat sheet: https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/ – Nate

Answer

For this particular, common task, tidyr has a nice helper function called extract_numeric:

library(tidyr) 

extract_numeric(testVector) 
## [1] 10 6 4 15
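As far as I know, more recent tidyr releases mark extract_numeric as deprecated and point to readr::parse_number instead. A minimal sketch of that alternative, assuming readr is installed:

```r
library(readr)

# Same vector as in the question
testVector <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")

# parse_number() picks out the first number in each string and
# ignores the non-numeric text around it
parsed <- parse_number(testVector)
parsed
```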

Answer

This may also come in handy.

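# Remove letters. The original class "[:A-z:]" is not a POSIX class:
# it also matches ":" and the punctuation between "Z" and "a".
testVector <- gsub("[[:alpha:]]", "", testVector)
# Remove the remaining spaces
testVector <- gsub(" ", "", testVector)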

> testVector 
[1] "10" "6" "4" "15"