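Expanding a data frame with ranges in the original rows
Question:
I have a data frame with several rows and two columns that define a range, as follows: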
structure(list(symbol = c("u", "n", "v", "i", "a"), start = c(9L,
6L, 10L, 8L, 7L), end = c(14L, 15L, 12L, 13L, 11L)), .Names = c("symbol",
"start", "end"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5"))
I want one row for every value within the (start, end) range of each symbol. So the final data frame would look like:
structure(list(symbol = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("a", "l", "n", "v", "y"
), class = "factor"), value = c(7L, 8L, 9L, 10L, 11L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 8L, 9L, 10L, 11L, 12L, 10L,
11L, 12L, 13L, 14L, 15L, 9L, 10L, 11L, 12L, 13L)), class = "data.frame", row.names = c(NA,
-30L), .Names = c("symbol", "value"))
I thought I could simply build a list of values for each row and then use unnest from the tidyr package, as follows:
df$value <- apply(df, 1, function(x) as.list(x[2]:x[3]))
dput(df)
structure(list(symbol = structure(c(4L, 3L, 5L, 2L, 1L), .Label = c("a",
"i", "n", "u", "v"), class = "factor"), start = c(9L, 6L, 10L,
8L, 7L), end = c(14L, 15L, 12L, 13L, 11L), value = structure(list(
`1` = list(9L, 10L, 11L, 12L, 13L, 14L), `2` = list(6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L), `3` = list(10L,
11L, 12L), `4` = list(8L, 9L, 10L, 11L, 12L, 13L), `5` = list(
7L, 8L, 9L, 10L, 11L)), .Names = c("1", "2", "3", "4",
"5"))), .Names = c("symbol", "start", "end", "value"), row.names = c("1",
"2", "3", "4", "5"), class = "data.frame")
df
  symbol start end                              value
1      u     9  14              9, 10, 11, 12, 13, 14
2      n     6  15 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
3      v    10  12                         10, 11, 12
4      i     8  13               8, 9, 10, 11, 12, 13
5      a     7  11                    7, 8, 9, 10, 11
And then do:
library(tidyr)
unnest(df, value)
However, I think I am hitting this unresolved feature/bug: https://github.com/tidyverse/tidyr/issues/278
Error: Each column must either be a list of vectors or a list of data frames [value]
Is there a better way to do this, ideally one that avoids the apply family?
Answer
With dplyr, we can use rowwise with do to build the values for each row:
library(dplyr)
df1 %>%
rowwise() %>%
do(data.frame(symbol= .$symbol, value = .$start:.$end)) %>%
arrange(symbol)
# A tibble: 30 x 2
# symbol value
# <chr> <int>
# 1 a 7
# 2 a 8
# 3 a 9
# 4 a 10
# 5 a 11
# 6 i 8
# 7 i 9
# 8 i 10
# 9 i 11
#10 i 12
# ... with 20 more rows
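Answer
You can use data.table: replicate the rows of df the required number of times (based on start and end for each symbol), then fill in value by counting within each symbol: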
library(data.table)
setDT(df)
df[rep(1:.N, (end - start + 1))][, value := (start - 1) + (1:.N), by = symbol][]
# symbol start end value
# 1: u 9 14 9
# 2: u 9 14 10
# 3: u 9 14 11
# 4: u 9 14 12
# 5: u 9 14 13
# ... etc
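The same replicate-then-count idea can also be sketched in base R, without data.table or the apply family. This is only a minimal sketch, assuming the example data from the question; the names `n` and `expanded` are introduced here for illustration and do not come from the original answers.

```r
# Example data from the question
df <- data.frame(
  symbol = c("u", "n", "v", "i", "a"),
  start  = c(9L, 6L, 10L, 8L, 7L),
  end    = c(14L, 15L, 12L, 13L, 11L),
  stringsAsFactors = FALSE
)

# Number of values in each row's range (hypothetical helper name)
n <- df$end - df$start + 1L

# Repeat each symbol n[i] times; sequence(n) counts 1..n[i] for each row,
# so adding the repeated (start - 1) yields start..end for every row
expanded <- data.frame(
  symbol = rep(df$symbol, times = n),
  value  = rep(df$start - 1L, times = n) + sequence(n),
  stringsAsFactors = FALSE
)
```

This sketch assumes every row has end >= start; a reversed range would make rep() fail and would need separate handling.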
Answer
Perhaps you could use map2 to add a list column that we can then unnest into the desired result.
library(tidyverse)
df %>%
mutate(value = map2(start, end, ~ seq(from = .x, to = .y))) %>%
select(symbol, value) %>%
unnest(value)
#> symbol value
#> 1 u 9
#> 2 u 10
#> 3 u 11
#> 4 u 12
#> 5 u 13
#> 6 u 14
#> 7 n 6
#> 8 n 7
#> 9 n 8
#> 10 n 9
#> ...etc
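To make the intermediate list column visible, here is a rough base R sketch of the same pairing step, using Map in place of map2. It assumes the example data from the question; `ranges` and `result` are illustrative names, not part of the original answer. The point is that each list element is a plain integer vector, which is the shape the unnest error message in the question asks for, rather than the nested lists produced by as.list.

```r
# Example data from the question
df <- data.frame(
  symbol = c("u", "n", "v", "i", "a"),
  start  = c(9L, 6L, 10L, 8L, 7L),
  end    = c(14L, 15L, 12L, 13L, 11L),
  stringsAsFactors = FALSE
)

# Map() pairs start[i] with end[i] and returns a plain list,
# one integer vector per row (hypothetical name `ranges`)
ranges <- Map(seq, df$start, df$end)

# Flatten the list, repeating each symbol once per value in its range
result <- data.frame(
  symbol = rep(df$symbol, times = lengths(ranges)),
  value  = unlist(ranges),
  stringsAsFactors = FALSE
)
```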
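Darn simple, huh! I just keep forgetting how much power 'do' has. I tried playing with this problem a bit but just could not come up with the right steps. Perfect. Thanks! – Gopala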