Fill all rows with the mean of their corresponding group (ddply?)
Question:
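This is probably a silly question about a simple task with ddply, but oddly I can't find a solution. Say I have a data frame of respondents nested within countries, together with the number of jobs each respondent has held over his or her career: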
mydata <- structure(list(country = structure(c(11L, 6L, 7L, 12L, 12L, 3L,
7L, 10L, 6L, 4L, 5L, 12L, 3L, 1L, 4L, 13L, 2L, 4L, 7L, 3L), contrasts = structure(c(1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, -1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1), .Dim = c(13L,
12L), .Dimnames = list(c("Austria", "Germany", "Sweden", "Netherlands",
"Spain", "Italy", "France", "Denmark", "Greece", "Switzerland",
"Belgium", "Czechia", "Poland"), c("AT", "DE", "SE", "NL", "ES",
"IT", "FR", "DK", "GR", "CH", "BE", "CZ"))), .Label = c("Austria",
"Germany", "Sweden", "Netherlands", "Spain", "Italy", "France",
"Denmark", "Greece", "Switzerland", "Belgium", "Czechia", "Poland"
), class = "factor"), njobs = c(2, 2, 3, 2, 1, 2, 4, 2, 1, 3,
2, 3, 3, 2, 8, 3, 1, 2, 9, 3)), .Names = c("country", "njobs"
), class = "data.frame", row.names = c(NA, -20L))
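I want to add a third variable containing the average number of jobs per career in that respondent's country. This is easy to do in two lines: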
library(plyr)
ctry.means <- ddply(mydata, .(country), summarize, avejobs = mean(njobs))
result <- merge(mydata, ctry.means, by = "country")  # note: merge() sorts the rows by country
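For readers without plyr, here is a minimal base R sketch of the same two-step approach on a small hypothetical data frame `df` (toy values, not the question's data). It uses `aggregate()` in place of `ddply(..., summarize)`. One side effect worth knowing: `merge()` sorts its output by the key, so the rows no longer line up with the original data frame.

```r
# Hypothetical toy data in the same shape as mydata
df <- data.frame(country = factor(c("Spain", "Italy", "Spain", "Austria", "Italy")),
                 njobs   = c(2, 3, 4, 1, 5))

# Step 1: one row per country holding that country's mean number of jobs
ctry.means <- aggregate(njobs ~ country, data = df, FUN = mean)
names(ctry.means)[2] <- "avejobs"

# Step 2: join the group means back onto the individual rows
# (merge() sorts by country, so row order differs from df)
result <- merge(df, ctry.means, by = "country")
print(result)
```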
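However, this is such a simple and frequently used operation that I feel there must be some ddply trick for doing it in one step. More generally, the question is how to combine group-level and case-level variables in a single summarize or mutate statement.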
Answer:
If you're happy with a simple base R solution,
mydata$new = ave(mydata$njobs, mydata$country)
will do it.
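A minimal sketch of how `ave()` behaves, on a hypothetical toy data frame `df`: it returns a vector as long as its first argument, with each element replaced by its group's summary (`FUN = mean` by default), so the original row order is preserved. The extra columns `n_resp` and `dev` are illustrative names, showing another `FUN` and a case-level quantity computed from the group mean.

```r
# Hypothetical toy data in the same shape as mydata
df <- data.frame(country = factor(c("Spain", "Italy", "Spain", "Austria", "Italy")),
                 njobs   = c(2, 3, 4, 1, 5))

# Group mean repeated on every row of the group; FUN defaults to mean
df$avejobs <- ave(df$njobs, df$country)

# Any summary function can be supplied, e.g. the number of respondents per country
df$n_resp <- ave(df$njobs, df$country, FUN = length)

# Group-level and case-level values combine directly: deviation from the country mean
df$dev <- df$njobs - df$avejobs
print(df)
```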
Just use 'transform' or 'mutate' instead of 'summarize'. – Ramnath
I knew I was being stupid :) Thanks @Ramnath –
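Ramnath's suggestion as a one-step `ddply()` call, sketched on hypothetical toy data and assuming the plyr package is installed. With `summarize`, each country's subset collapses to a single row; with `transform` or `mutate`, every row is kept and the new column is added to it. plyr's `mutate()` also lets later arguments refer to columns created earlier in the same call, which covers the more general case of mixing group-level and case-level variables (`dev` is a hypothetical column name).

```r
library(plyr)

# Hypothetical toy data in the same shape as mydata
df <- data.frame(country = factor(c("Spain", "Italy", "Spain", "Austria", "Italy")),
                 njobs   = c(2, 3, 4, 1, 5))

# transform keeps all rows of each country's subset and adds the group mean
result <- ddply(df, .(country), transform, avejobs = mean(njobs))

# plyr's mutate can use avejobs immediately after creating it
result2 <- ddply(df, .(country), mutate,
                 avejobs = mean(njobs),
                 dev     = njobs - avejobs)
print(result2)
```

Like the `merge()` version, the output is ordered by country rather than in the original row order.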
Or with dplyr: 'mydata %.% group_by(country) %.% mutate(avejobs = mean(njobs))' – hadley
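The chaining operator in hadley's comment comes from early dplyr versions and has since been replaced by `%>%`. A minimal sketch of the same idea with current dplyr, assuming the package is installed and using hypothetical toy data. Unlike the `ddply()` and `merge()` versions, `group_by()` followed by `mutate()` keeps the original row order.

```r
library(dplyr)

# Hypothetical toy data in the same shape as mydata
df <- data.frame(country = factor(c("Spain", "Italy", "Spain", "Austria", "Italy")),
                 njobs   = c(2, 3, 4, 1, 5))

result <- df %>%
  group_by(country) %>%            # compute per country...
  mutate(avejobs = mean(njobs)) %>% # ...but keep every row
  ungroup()                         # drop the grouping for later steps
print(result)
```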