Average of columns over every nth row of daily data
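Question:
I'm quite new to R. I have daily observations of temperature and PP (precipitation) for 12 years (6574 rows, 6 columns, some NAs). For each month of every year I want to calculate the average over thirds of the month: for example, from 1 to 10 January 2001, then 11 to 20, and finally 21 to 31, and so on.
But I also have a problem because February sometimes has 28 days and sometimes 29 (leap years).
This is how I open my file, which is a CSV, with read.table: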
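Because the third period is simply "day 21 to the last day of the month", assigning days to periods by the day number alone handles 28-, 29-, 30- and 31-day months without any special case for leap years; only the size of the third period changes. A minimal base R sketch, using a hypothetical helper `dekad()` that is not part of the question:

```r
# Hypothetical helper: map a day of the month to its period (dekad):
# 1 = days 1-10, 2 = days 11-20, 3 = day 21 to the last day of the month.
dekad <- function(day) ifelse(day <= 10, 1L, ifelse(day <= 20, 2L, 3L))

# February in a leap year (2000) and in a common year (2001)
feb_leap   <- seq(as.Date("2000-02-01"), as.Date("2000-02-29"), by = "day")
feb_common <- seq(as.Date("2001-02-01"), as.Date("2001-02-28"), by = "day")

# Day of the month as an integer, then count how many days fall in each dekad
day_of_month <- function(d) as.integer(format(d, "%d"))
table(dekad(day_of_month(feb_leap)))    # third dekad has 9 days
table(dekad(day_of_month(feb_common)))  # third dekad has 8 days
```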
# READ CSV
setwd ("C:\\Users\\GVASQUEZ\\Documents\\ESTUDIO_PAMPAS\\R_sheet")
huancavelica<-read.table("huancavelica.csv",header = TRUE, sep = ",",
dec = ".", fileEncoding = "latin1", nrows = 6574)
This is what the data look like after reading the CSV:
Año Mes Dia PT101 TM102 TM103
1 1998 1 1 6.0 15.6 3.4
2 1998 1 2 8.0 14.4 3.2
3 1998 1 3 8.6 13.8 4.4
4 1998 1 4 5.6 14.6 4.6
5 1998 1 5 0.4 17.4 3.6
6 1998 1 6 3.4 17.4 4.4
7 1998 1 7 9.2 14.6 3.2
8 1998 1 8 2.2 16.8 2.8
9 1998 1 9 8.6 18.4 4.4
10 1998 1 10 6.2 15.0 3.6
. . . . . . .
Answer
With the data set up like this, a fairly tried-and-true approach should work:
# add 0 in front of single digit month variable to account for 1 and 10 sorting
huancavelica$MesChar <- ifelse(nchar(huancavelica$Mes)==1,
paste0("0",huancavelica$Mes), as.character(huancavelica$Mes))
# get time of month ID
# 1 = days 1-10, 2 = days 11-20, 3 = day 21 to end of month
huancavelica$timeMonth <- ifelse(huancavelica$Dia < 11, 1,
                                 ifelse(huancavelica$Dia > 20, 3, 2))
# get final ID
huancavelica$ID <- paste(huancavelica$Año, huancavelica$MesChar, huancavelica$timeMonth, sep=".")
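# average stat: ave() treats extra arguments as grouping variables,
# so na.rm = TRUE has to go inside FUN instead of being passed to ave()
huancavelica$myStat <- ave(huancavelica$PT101, huancavelica$ID,
                           FUN = function(x) mean(x, na.rm = TRUE))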
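`ave()` returns the period mean repeated on every daily row. If one row per year, month and period is wanted instead, the same kind of `ID` can be passed to `aggregate()`. A sketch on made-up data, where the data frame `dat` and the column `Ano` are hypothetical stand-ins for `huancavelica` and `Año`; `sprintf("%02d", ...)` is used here as a shorter way to do the zero-padding:

```r
# Made-up data with the same layout as huancavelica (Ano stands in for Año)
set.seed(1)
dates <- seq(as.Date("1998-01-01"), as.Date("1998-02-28"), by = "day")
dat <- data.frame(
  Ano   = as.integer(format(dates, "%Y")),
  Mes   = as.integer(format(dates, "%m")),
  Dia   = as.integer(format(dates, "%d")),
  PT101 = round(runif(length(dates), 0, 10), 1)
)
dat$PT101[c(3, 45)] <- NA  # a couple of missing values, like the real file

# Same ID idea as in the answer; sprintf("%02d") zero-pads the month
dat$timeMonth <- ifelse(dat$Dia < 11, 1, ifelse(dat$Dia > 20, 3, 2))
dat$ID <- paste(dat$Ano, sprintf("%02d", dat$Mes), dat$timeMonth, sep = ".")

# One row per year.month.period; the formula method drops NA rows
# (na.action = na.omit by default) before mean() is applied
dekad_means <- aggregate(PT101 ~ ID, data = dat, FUN = mean)
dekad_means
```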
Answer
We can try:
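library(data.table)
# df1 is your data (e.g. huancavelica); Ano is the year column (Año in the question)
# Grp = 1 for days 1-10, 2 for 11-20, 3 for 21-30 and 4 for day 31, which is folded back into 3
setDT(df1)[, Grp := (Dia - 1) %/% 10 + 1, by = .(Ano, Mes)
          ][Grp > 3, Grp := 3
          ][, lapply(.SD, mean, na.rm = TRUE), by = .(Ano, Mes, Grp)]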
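The key step in the data.table answer is the integer-division rule for the group number. A base R sketch of that rule on its own (no data.table needed), where `pmin()` does the same job as the `Grp > 3` update:

```r
# (Dia - 1) %/% 10 + 1 gives 1 for days 1-10, 2 for 11-20, 3 for 21-30
# and 4 for day 31; pmin() caps the result at 3 so day 31 joins the third period
Dia <- 1:31
Grp <- pmin((Dia - 1) %/% 10 + 1, 3)
split(Dia, Grp)
```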
Answer
It adds a bit of complexity, but you can cut each month into thirds and get the average for each third. For example:
library(dplyr)
library(lubridate)
# Fake data
set.seed(10)
df = data.frame(date=seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by="1 day"),
value=rnorm(365))
# Cut months into thirds
df = df %>%
mutate(mon_yr = paste0(month(date, label=TRUE, abbr=TRUE) , " ", year(date))) %>%
group_by(mon_yr) %>%
mutate(cutMonth = cut(day(date),
breaks=c(0, round(1/3*n()), round(2/3*n()), n()),
labels=c("1st third","2nd third","3rd third")),
cutMonth = paste0(mon_yr, ", ", cutMonth)) %>%
ungroup %>%
mutate(cutMonth = factor(cutMonth, levels=unique(cutMonth)))
           date       value            cutMonth
1    2015-01-01  0.01874617 Jan 2015, 1st third
2    2015-01-02 -0.18425254 Jan 2015, 1st third
3    2015-01-03 -1.37133055 Jan 2015, 1st third
...
363  2015-12-29 -1.3996571  Dec 2015, 3rd third
364  2015-12-30 -1.2877952  Dec 2015, 3rd third
365  2015-12-31 -0.9684155  Dec 2015, 3rd third
# Summarise to get average value for each 1/3 of a month
df.summary = df %>%
group_by(cutMonth) %>%
summarise(average.value = mean(value))
              cutMonth average.value
1  Jan 2015, 1st third   -0.49065685
2  Jan 2015, 2nd third    0.28178222
3  Jan 2015, 3rd third   -1.03870698
4  Feb 2015, 1st third   -0.45700203
5  Feb 2015, 2nd third   -0.07577199
6  Feb 2015, 3rd third    0.33860882
7  Mar 2015, 1st third    0.12067388
...
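Splitting with `round(1/3*n())` does not exactly reproduce the 1 to 10 / 11 to 20 / 21 to end split asked for: in a 31-day month the second third runs to day 21, and in a 28-day February the first third ends on day 9 (and `n()` counts rows, so it equals the month length only when no days are missing). The arithmetic can be checked in base R below; `thirds_breaks()` and `dekad_breaks` are hypothetical names, and using `breaks = c(0, 10, 20, 31)` in the `cut()` call is a suggested variant, not part of the original answer:

```r
# Breaks used by the answer above for a month with n days
thirds_breaks <- function(n) c(0, round(1/3 * n), round(2/3 * n), n)

thirds_breaks(31)  # 0 10 21 31: days 1-10, 11-21, 22-31
thirds_breaks(28)  # 0  9 19 28: days 1-9, 10-19, 20-28

# Fixed breaks give the question's 1-10 / 11-20 / 21-end split for any month length
dekad_breaks <- c(0, 10, 20, 31)
table(cut(1:31, breaks = dekad_breaks, labels = c("1-10", "11-20", "21-end")))
table(cut(1:28, breaks = dekad_breaks, labels = c("1-10", "11-20", "21-end")))
```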
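Welcome to Stack Overflow. People will appreciate it if you post your code as text rather than as a picture of your code. That makes it easier to check. – lmo
Thanks for the advice, I'll do that @lmo – Guisseppe
I think a simple way would be to create a new column with 1 for days 1 to 10, 2 for days 11 to 20 and 3 for days > 20. Call the column 'x', then try something like 'aggregate(TM102 ~ Mes + x, data = huancavelica, mean)'. There may be better ways, but this is a simple one. See also '?aggregate' or questions like [this one](http://*.com/questions/21982987/mean-per-group-in-a-data-frame). – Laterow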
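The formula `TM102 ~ Mes + x` in the comment pools each month across all years; adding the year to the right-hand side keeps the years apart, as the question asks. A sketch on made-up data (`demo` and `Ano` are hypothetical stand-ins for `huancavelica` and `Año`), using `findInterval()` instead of nested `ifelse()` to build `x`. Note that the formula method of `aggregate()` uses `na.action = na.omit` by default, so a day with an NA in any of the listed columns is dropped for all of them:

```r
# Made-up daily data for two years (Ano stands in for Año)
set.seed(42)
dates <- seq(as.Date("1998-01-01"), as.Date("1999-12-31"), by = "day")
demo <- data.frame(
  Ano   = as.integer(format(dates, "%Y")),
  Mes   = as.integer(format(dates, "%m")),
  Dia   = as.integer(format(dates, "%d")),
  TM102 = round(rnorm(length(dates), mean = 15, sd = 2), 1),
  TM103 = round(rnorm(length(dates), mean = 4, sd = 1), 1)
)

# x = 1 for days 1-10, 2 for 11-20, 3 for 21 onwards;
# findInterval() counts how many of the break points 11 and 21 are <= Dia
demo$x <- findInterval(demo$Dia, c(11, 21)) + 1

# Including Ano keeps each year separate; cbind() averages several columns at once
res <- aggregate(cbind(TM102, TM103) ~ Ano + Mes + x, data = demo, FUN = mean)
res <- res[order(res$Ano, res$Mes, res$x), ]
head(res)
```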