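Remove quotes from factors in a data frame
Question:
I have a data frame that I discretized using RWeka. RWeka's discretization creates bins whose labels contain single quotes. They don't cause any problems, but they look ugly when plotting, for example a variable with the category 'All'.
Here is the discretized data frame: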
df <- structure(list(outlook = structure(c(1L, 1L, 2L, 3L, 3L, 3L,
2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L), .Label = c("sunny", "overcast",
"rainy"), class = "factor"), temperature = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
humidity = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
windy = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE), play = structure(c(2L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("yes",
"no"), class = "factor")), .Names = c("outlook", "temperature",
"humidity", "windy", "play"), row.names = c(NA, -14L), class = "data.frame")
How can I remove the single quotes from the data and rebuild the factors?
Answer:
This should do it:
df$temperature <- gsub("\\'", "", df$temperature)
df$humidity <- gsub("\\'", "", df$humidity)
> df
    outlook temperature humidity windy play
1     sunny         All      All FALSE   no
2     sunny         All      All  TRUE   no
3  overcast         All      All FALSE  yes
4     rainy         All      All FALSE  yes
5     rainy         All      All FALSE  yes
6     rainy         All      All  TRUE   no
7  overcast         All      All  TRUE  yes
8     sunny         All      All FALSE   no
9     sunny         All      All FALSE  yes
10    rainy         All      All FALSE  yes
11    sunny         All      All  TRUE  yes
12 overcast         All      All  TRUE  yes
13 overcast         All      All FALSE  yes
14    rainy         All      All  TRUE   no
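Note that `gsub` returns a character vector, so a column cleaned this way is no longer a factor. A minimal sketch of one possible alternative, assuming a small hypothetical data frame `toy` (not the data from the question): edit the level labels through `levels()`, which leaves the column a factor. With `fixed = TRUE` the pattern is treated as a literal string, so the quote needs no escaping.

```r
# Hypothetical toy data with a quoted factor level, similar to RWeka's output.
toy <- data.frame(
  temperature = factor(c("'All'", "'All'", "'All'")),
  windy = c(FALSE, TRUE, FALSE)
)

# Rewrite only the level labels; the underlying integer codes are untouched
# and the column remains a factor.
levels(toy$temperature) <- gsub("'", "", levels(toy$temperature), fixed = TRUE)

str(toy$temperature)
```

Because only the labels change, this also avoids a separate `factor()` call afterwards.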
If you need to do the same across several columns, this might be more efficient:
df[, 2:3] <- apply(df[, 2:3], 2, function(x) {
  gsub("\\'", "", x)
})
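Is escaping the quote character necessary here? (I do this defensively, too...) – joran
Probably not, but it is certainly a defensive reflex of mine. – Maiasaura
Thanks @Maiasaura. The `apply` solution is what I needed, since I have over 100 columns. Now I just need to find out which columns have quotes in their values. After looking at the values, though, it looks like `gsub` creates character vectors, not factors, but that is easy to fix. – karlos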
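For the last comment's two open points (finding which columns contain quotes, and keeping them as factors), something along these lines might work. It is only a sketch on a hypothetical data frame `toy`; the name `has_quotes` is introduced here for illustration and is not from the thread.

```r
# Hypothetical toy data: two factor columns carry quoted levels, two do not.
toy <- data.frame(
  outlook     = factor(c("sunny", "overcast", "rainy")),
  temperature = factor(c("'All'", "'All'", "'All'")),
  humidity    = factor(c("'All'", "'All'", "'All'")),
  windy       = c(FALSE, TRUE, FALSE)
)

# TRUE for every factor column that has at least one level containing a
# single quote; non-factor columns are always FALSE.
has_quotes <- vapply(
  toy,
  function(col) is.factor(col) && any(grepl("'", levels(col), fixed = TRUE)),
  logical(1)
)
names(toy)[has_quotes]

# Strip the quotes from the level labels of just those columns,
# so they stay factors and the other columns are left alone.
toy[has_quotes] <- lapply(toy[has_quotes], function(col) {
  levels(col) <- gsub("'", "", levels(col), fixed = TRUE)
  col
})
```

Using `lapply` on the selected columns rather than `apply` avoids the conversion to a character matrix, which is what turns the factors into character vectors.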