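Convert date ranges to Date type in R
Question:
This vector of date ranges is stored in my data frame as class "character". The format depends on whether the date range crosses into a different month: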
dput(pollingdata$dates)
c("Nov. 1-7", "Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 4-Nov. 6",
"Oct. 30-Nov. 6", "Oct. 25-31", "Oct. 7-27", "Oct. 21-Nov. 3",
"Oct. 20-24", "Jul. 19", "Oct. 29-Nov. 4", "Oct. 28-Nov. 3",
"Oct. 27-Nov. 2", "Oct. 20-28", "Sep. 30-Oct. 20", "Oct. 15-19",
"Oct. 26-Nov. 1", "Oct. 25-31", "Oct. 24-30", "Oct. 18-26",
"Oct. 10-14", "Oct. 4-9", "Sep. 23-Oct. 6", "Sep. 16-29", "Sep. 2-22",
"Oct. 21-Nov. 2", "Oct. 17-25", "Sep. 30-Oct. 13", "Sep. 27-Oct. 3",
"Sep. 21-26", "Sep. 14-20", "Aug. 26-Sep. 15", "Sep. 7-13",
"Aug. 19-Sep. 8", "Aug. 31-Sep. 6", "Aug. 12-Sep. 1", "Aug. 9-Sep. 1",
"Aug. 24-30", "Aug. 5-25", "Aug. 17-23", "Jul. 29-Aug. 18",
"Aug. 10-16", "Jan. 12")
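I would like to convert this vector into two separate columns in my data frame: (1) the start date and (2) the end date of each range. Both columns should be stored as class 'Date', which would make the data much easier to use in my project. Does anyone know a simple way to do this? I have been struggling with it.
Thanks in advance,
Answer
We can split the vector by `-` into a `list`, use `paste` to prepend the month substring to those elements whose last piece contains only a day number, append NA to the elements that have fewer than 2 pieces (with `length<-`), and convert the result to a `data.frame` (with `do.call(rbind.data.frame, ...)`):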
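v1 <- pollingdata$dates
lst <- lapply(strsplit(v1, "-"), function(x) {
  # TRUE when the last piece starts with a digit, i.e. it has no month of its own
  i1 <- grepl("^[0-9]+", x[length(x)])
  if (i1) {
    # prepend the month substring (e.g. "Nov.") taken from the start piece
    x[length(x)] <- paste(substr(x[1], 1, 4), x[length(x)])
  }
  x
})
# pad single-date elements with NA so every element has length 2, then row-bind
d1 <- do.call(rbind.data.frame, lapply(lst, `length<-`, max(lengths(lst))))
colnames(d1) <- c("Start_Date", "End_Date")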
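To make the individual steps easier to follow, here is a minimal sketch that runs them on three representative inputs (a same-month range, a cross-month range, and a single day). The names `v_demo`, `parts`, `filled`, `padded` and `demo` are made up for this illustration, and `sub()` is swapped in for the fixed-width `substr()` so that the month prefix does not have to be exactly four characters long.

```r
# Three representative inputs: same-month range, cross-month range, single day
v_demo <- c("Nov. 1-7", "Oct. 24-Nov. 6", "Jul. 19")

# Step 1: split each string on the dash
parts <- strsplit(v_demo, "-", fixed = TRUE)

# Step 2: if the last piece is only a day number, borrow the month
# prefix (everything before the first space) from the first piece
filled <- lapply(parts, function(x) {
  n <- length(x)
  if (grepl("^[0-9]+$", x[n])) {
    x[n] <- paste(sub(" .*$", "", x[1]), x[n])
  }
  x
})

# Step 3: pad single-day entries with NA so every element has length 2
padded <- lapply(filled, `length<-`, 2)

# Step 4: bind the pieces row-wise into a two-column data frame
demo <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(demo) <- c("Start_Date", "End_Date")
demo
```

The single-day entry ends up with `NA` in `End_Date`, which matches the behaviour of the code above; both columns are still character at this point.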
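As per the OP's post, the columns need to be converted to `Date` class, but a `Date` follows the `format` `%Y-%m-%d`, which includes a year. The vector contains no year, and I am not sure whether it is acceptable to paste in the current year and then convert to `Date` class. If that is permitted, then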
d1[] <- lapply(d1, function(x) as.Date(paste(x, 2017), "%b. %d %Y"))
head(d1)
# Start_Date End_Date
#1 2017-11-01 2017-11-07
#2 2017-11-01 2017-11-07
#3 2017-10-24 2017-11-06
#4 2017-10-04 2017-11-06
#5 2017-10-30 2017-11-06
#6 2017-10-25 2017-10-31
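As a small sketch of the year issue: the conversion can be wrapped in a helper that takes the year as an explicit argument. `to_date` is a hypothetical name, and which year is correct is an assumption the caller has to make. Two caveats may be worth knowing: `%b` matches month abbreviations in the current locale (the sketch switches `LC_TIME` to `"C"` to get English names, which is itself an assumption about the setup), and when the year is left out entirely, `strptime()` fills in the current year rather than failing.

```r
# %b is locale dependent; "C" gives English month abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical helper: append an explicit year before parsing
to_date <- function(x, year) {
  as.Date(paste(x, year), format = "%b. %d %Y")
}

# NA inputs stay NA after conversion
to_date(c("Nov. 1", "Oct. 24", NA), 2017)

# Without a year in the string, the current year is filled in
no_year <- as.Date("Nov. 1", format = "%b. %d")
format(no_year, "%Y") == format(Sys.Date(), "%Y")
```

Passing the year explicitly keeps the result reproducible, whereas relying on the current-year default would give different dates depending on when the script is run.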
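Answer
You can use the `str_split_fixed` function from the stringr library to split the field and then process the pieces. Load the stringr library and proceed as follows: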
library(stringr)
dat <- data.frame(date=c("Nov. 1-7", "Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 4-Nov. 6",
"Oct. 30-Nov. 6", "Oct. 25-31", "Oct. 7-27", "Oct. 21-Nov. 3",
"Oct. 20-24", "Jul. 19", "Oct. 29-Nov. 4", "Oct. 28-Nov. 3",
"Oct. 27-Nov. 2", "Oct. 20-28", "Sep. 30-Oct. 20", "Oct. 15-19",
"Oct. 26-Nov. 1", "Oct. 25-31", "Oct. 24-30", "Oct. 18-26",
"Oct. 10-14", "Oct. 4-9", "Sep. 23-Oct. 6", "Sep. 16-29", "Sep. 2-22",
"Oct. 21-Nov. 2", "Oct. 17-25", "Sep. 30-Oct. 13", "Sep. 27-Oct. 3",
"Sep. 21-26", "Sep. 14-20", "Aug. 26-Sep. 15", "Sep. 7-13",
"Aug. 19-Sep. 8", "Aug. 31-Sep. 6", "Aug. 12-Sep. 1", "Aug. 9-Sep. 1",
"Aug. 24-30", "Aug. 5-25", "Aug. 17-23", "Jul. 29-Aug. 18",
"Aug. 10-16", "Jan. 12"))
Processing:
# split on spaces and dashes into at most 4 pieces
dt <- data.frame(str_split_fixed(dat$date, "[-]|\\s", 4))
names(dt) <- c("stdt1", "stdt2", "endt1", "endt2")
# remove the dots (.) from the month abbreviations
dt1 <- data.frame(sapply(dt, function(x) gsub("[.]", "", x)))
dt1$stdt <- as.Date(paste0(dt1$stdt2, dt1$stdt1, "2016"), format = "%d%b%Y")
# same-month ranges: endt1 holds the end day, so borrow the start month
dt1$endt <- ifelse(dt1$endt2 == "", paste0(dt1$endt1, dt1$stdt1, "2016"),
                   paste0(dt1$endt2, dt1$endt1, "2016"))
# single dates give a 7-character string such as "Jul2016"; prepend the start day
dt1$endt <- as.Date(ifelse(nchar(dt1$endt) == 7, paste0(dt1$stdt2, dt1$endt), dt1$endt), "%d%b%Y")
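Assumptions:
1) No year is provided, so I have assumed the year 2016.
2) Rows 10 and 43 contain no end "day", so I have assumed the end date is the same as the start date.
Result: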
> dt1
   stdt1 stdt2 endt1 endt2       stdt       endt
1    Nov     1     7       2016-11-01 2016-11-07
2    Nov     1     7       2016-11-01 2016-11-07
3    Oct    24   Nov     6 2016-10-24 2016-11-06
4    Oct     4   Nov     6 2016-10-04 2016-11-06
5    Oct    30   Nov     6 2016-10-30 2016-11-06
6    Oct    25    31       2016-10-25 2016-10-31
7    Oct     7    27       2016-10-07 2016-10-27
8    Oct    21   Nov     3 2016-10-21 2016-11-03
9    Oct    20    24       2016-10-20 2016-10-24
10   Jul    19             2016-07-19 2016-07-19
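For comparison, the same two assumptions (a caller-supplied year, and an end date equal to the start date for single-day entries) can be sketched in base R without stringr. `parse_range` and its `year` argument are hypothetical names introduced for this illustration, not part of the answer above, and the sketch sets `LC_TIME` to `"C"` so that `%b` matches English month abbreviations.

```r
# %b is locale dependent; "C" gives English month abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical helper; `year` is an assumption supplied by the caller
parse_range <- function(x, year) {
  pieces <- strsplit(x, "-", fixed = TRUE)

  # First piece is always the start; the last piece is the end.
  # For a single-day entry both are the same string (assumption 2).
  start <- vapply(pieces, `[`, character(1), 1)
  end <- vapply(pieces, function(p) p[length(p)], character(1))

  # A day-only end piece borrows the month prefix from the start
  day_only <- grepl("^[0-9]+$", end)
  end[day_only] <- paste(sub(" .*$", "", start[day_only]), end[day_only])

  fmt <- "%b. %d %Y"
  data.frame(
    start = as.Date(paste(start, year), format = fmt),
    end   = as.Date(paste(end, year), format = fmt)
  )
}

res <- parse_range(c("Nov. 1-7", "Oct. 24-Nov. 6", "Jul. 19"), 2016)
res
```

Both columns come back as class `Date`, so they can be attached to the original data frame directly, for example with `cbind()`.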
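This works great, let me dig into it. The columns are not in Date format yet, but I can probably get them there – Canovice
@Canvice A Date needs year information, which does not appear in your dataset. If you are fine with pasting in an arbitrary year, then it can be converted to 'Date' (shown in the update) – akrun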