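R: convert long format to wide format, filling in missing dates
Problem description:
I am reshaping my company's hour-registration data to fit a particular format. I have modified the input so it looks like this: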
employee project month day hours
1 A 16-001 9 9 5
2 B 16-001 9 29 1
3 A 16-001 9 3 5
4 B 16-001 9 28 2
5 A 16-002 9 8 6
6 B 16-002 9 9 4
7 A 16-002 10 25 6
8 B 16-002 10 21 8
9 A overig 10 6 6
10 B overig 10 17 7
11 A overig 10 9 1
12 B overig 10 10 7
# reproducible data:
df <- data.frame(employee = rep(c("A","B"),6),project=rep(c("16-001","16-002","overig"), each=4), month=rep(c(9,10),each=6),day=sample(1:30,12,replace=T), hours=sample(1:8,12,replace=T))
#Now, I need to move this to a cross table:
res <- ftable(xtabs(hours~month+employee+project+day, aggregate(hours~month+employee+project+day, data=df, FUN=sum)))
#And put this cross table in a data.frame (for export to csv)
library(reshape2)
df_res <- dcast(as.data.frame(res), as.formula(paste(paste(names(attr(res, "row.vars")), collapse="+"), "~", paste(names(attr(res, "col.vars"))))))
df_res
month employee project 3 6 8 9 10 17 21 25 28 29
1 9 A 16-001 5 0 0 5 0 0 0 0 0 0
2 9 A 16-002 0 0 6 0 0 0 0 0 0 0
3 9 A overig 0 0 0 0 0 0 0 0 0 0
4 9 B 16-001 0 0 0 0 0 0 0 0 2 1
5 9 B 16-002 0 0 0 4 0 0 0 0 0 0
6 9 B overig 0 0 0 0 0 0 0 0 0 0
7 10 A 16-001 0 0 0 0 0 0 0 0 0 0
8 10 A 16-002 0 0 0 0 0 0 0 6 0 0
9 10 A overig 0 6 0 1 0 0 0 0 0 0
10 10 B 16-001 0 0 0 0 0 0 0 0 0 0
11 10 B 16-002 0 0 0 0 0 0 8 0 0 0
12 10 B overig 0 0 0 0 7 7 0 0 0 0
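I'm not sure this is the best way, but the format looks good now. However, I need every day of the month as a column, not just the days that occur in my data.frame (so 31 columns), preferably with NA for days that don't exist (e.g. 31 September) and 0 for the rest. Any suggestions on how to get this?
Answer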
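Staying with the question's own reshape2 tools, one possible approach (a sketch, not tested against the real hour data) is to give `day` the factor levels 1:31 and let `dcast()` keep unused levels with `drop = FALSE`; `fun.aggregate = sum` also merges duplicate rows. The data has no year, so `yr <- 2016` is an assumption, and `month_length()` is a hypothetical helper written for this example, not a package function.

```r
library(reshape2)

set.seed(1)  # only so the random example data is repeatable
df <- data.frame(employee = rep(c("A", "B"), 6),
                 project  = rep(c("16-001", "16-002", "overig"), each = 4),
                 month    = rep(c(9, 10), each = 6),
                 day      = sample(1:30, 12, replace = TRUE),
                 hours    = sample(1:8, 12, replace = TRUE))

yr <- 2016  # assumption: the data has no year column

# hypothetical helper: number of days in each (year, month) pair,
# computed as the gap between the 1st of this month and the 1st of the next
month_length <- function(year, month) {
  first <- as.Date(sprintf("%d-%02d-01", year, month))
  nxt   <- as.Date(sprintf("%d-%02d-01", year + (month == 12), month %% 12 + 1))
  as.integer(nxt - first)
}

# every day 1..31 becomes a factor level, even if no hours were booked on it
df$day <- factor(df$day, levels = 1:31)

# drop = FALSE keeps unused levels and combinations; sum() merges duplicates
wide <- dcast(df, month + employee + project ~ day, value.var = "hours",
              fun.aggregate = sum, fill = 0, drop = FALSE)

# days that do not exist in that month (e.g. 31 September) become NA
n_days <- month_length(yr, wide$month)
for (d in 1:31) {
  wide[[as.character(d)]][d > n_days] <- NA
}
wide
```

The NA step is done after casting because `dcast()` has a single `fill` value, so it cannot distinguish "no hours booked" from "date does not exist".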
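I think this is an acceptable solution, and it handles leap years too (bonus). It takes advantage of tidyr::spread()'s nice factor-filling behaviour with drop = FALSE, together with lubridate::days_in_month(). It's a bit long, but here we go: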
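library(dplyr)      # mutate(), select(), bind_rows() (missing from the original)
library(tidyr)      # spread()
library(lubridate)  # ymd(), days_in_month()
library(purrr)      # map()
df$year <- 2016
# number of days in each row's month; uses the year, so it is leap-year aware
df$num_in_month <- ymd(paste(df$year, df$month, df$day)) %>%
  days_in_month()
df %>% split(.$month) %>%
  # within one month, make every day of that month a factor level
  map(~mutate(., day = factor(day, levels = 1:unique(num_in_month)))) %>%
  # drop = FALSE keeps days without hours as columns, filled with 0
  map(~spread(., key = day, value = hours, fill = 0, drop = FALSE)) %>%
  # months with fewer days get NA in the missing day columns (e.g. 31 Sep)
  bind_rows() %>%
  select(-num_in_month)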
employee project month year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 A 16-001 9 2016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 8 0 0 NA
2 A 16-002 9 2016 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA
3 B 16-001 9 2016 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 NA
4 B 16-002 9 2016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA
5 A 16-002 10 2016 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 A overig 10 2016 0 4 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 B 16-002 10 2016 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 B overig 10 2016 0 0 0 0 6 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
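The leap-year claim rests on days_in_month() reading the whole date, year included. A quick check, assuming lubridate is installed (the dates are arbitrary examples):

```r
library(lubridate)

# days_in_month() works from the full date, so the year matters for February
feb_2016 <- days_in_month(ymd("2016-02-01"))  # leap year: 29
feb_2015 <- days_in_month(ymd("2015-02-01"))  # common year: 28
# the two months present in the example data (September and October)
sep_oct  <- days_in_month(ymd(c("2016-09-09", "2016-10-25")))  # 30 and 31
print(c(feb_2016, feb_2015, sep_oct))
```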
Cheers
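I didn't know about that (+1), but it doesn't fully answer the question. First, `spread` throws the error "Duplicate identifiers for rows", and duplicates can actually occur in the data. Second, all the dates are filled with NA, both dates that exist (like sep-1) and dates that don't (sep-31). – RHA
Ah, I misunderstood your criteria for filling in the new columns. – Nate
Do you want this to take leap years into account? – Nate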
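Both points in RHA's comment can be handled before reshaping. This is a swapped-in variant, not Nate's posted code: it sums duplicate rows first, uses tidyr::complete() to add days 1 to 31 filled with 0, and then sets hours to NA only for dates that do not exist in that month. The year 2016 (`yr`) and the deliberately duplicated row are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

set.seed(1)  # only so the random example data is repeatable
df <- data.frame(employee = rep(c("A", "B"), 6),
                 project  = rep(c("16-001", "16-002", "overig"), each = 4),
                 month    = rep(c(9, 10), each = 6),
                 day      = sample(1:30, 12, replace = TRUE),
                 hours    = sample(1:8, 12, replace = TRUE))
df <- rbind(df, df[1, ])  # duplicate a row on purpose, as in RHA's data

yr <- 2016  # assumption: the data carries no year

wide <- df %>%
  # 1. collapse duplicate employee/project/month/day rows, so spread()
  #    no longer fails with "Duplicate identifiers for rows"
  group_by(employee, project, month, day) %>%
  summarise(hours = sum(hours)) %>%
  ungroup() %>%
  # 2. every day 1..31 for each observed employee/project/month, filled with 0
  complete(nesting(employee, project, month), day = 1:31,
           fill = list(hours = 0)) %>%
  # 3. NA only for dates that do not exist in that month (e.g. 31 September)
  mutate(hours = replace(hours,
                         day > days_in_month(make_date(yr, month, 1)), NA)) %>%
  # 4. one column per day
  spread(key = day, value = hours)
wide
```

Only employee/project/month combinations that occur in the data get a row here; wrap `employee` and `project` in `crossing()` instead of `nesting()` if every combination should appear.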