Question:

Creating weights by subgroup in R

Currently, I have the following table/CSV in R:

Name Value Sector Date 
Company1 100 Financials 3/31/2015 
Company2 100 Energy 3/31/2015 
Company3 100 Healthcare 3/31/2015 
Company4 100 Financials 3/31/2015 
Company5 100 Energy 3/31/2015 
Company6 100 Healthcare 3/31/2015 
Company1 100 Financials 6/30/2015 
Company2 200 Energy 6/30/2015 
Company3 200 Healthcare 6/30/2015 
Company4 200 Financials 6/30/2015 
Company5 200 Energy 6/30/2015 
Company6 200 Healthcare 6/30/2015 

For each quarter-end date, I would like to compute the weight of each sector, based on Value.

I have been using:

library(plyr)  # provides ddply
cdata <- ddply(Test.Exposure, c("Date", "Sector"), summarise, 
       Sumx1 = sum(Value)) 

This gives me:

  Date  Sector Sumx1 
1  3/31/2015  Energy 200 
2  3/31/2015 Financials 200 
3  3/31/2015 Healthcare 200 
4  6/30/2015  Energy 400 
5  6/30/2015 Financials 300 
6  6/30/2015 Healthcare 400 

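1) Is there a way to get a percentage instead of a sum?

2) Is it possible to have only one row per quarter-end date, with one column per sector, like this: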

  Financials Energy Healthcare 
3/31/2015 33,33% 33,33% 33,33% 
6/30/2015 ... ... ... 

You can use xtabs together with rowSums, as follows:

a <- xtabs(Sumx1~Date+Sector, d) 

#   Sector 
#Date  Energy Financials Healthcare 
# 3/31/2015 200  200  200 
# 6/30/2015 400  300  400 

round(a/rowSums(a)*100, 2) 

#   Sector 
#Date  Energy Financials Healthcare 
# 3/31/2015 33.33  33.33  33.33 
# 6/30/2015 36.36  27.27  36.36 
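
As a side note, base R's prop.table computes the same row shares without dividing by rowSums by hand. A minimal sketch, assuming the aggregated table d has the columns Date, Sector and Sumx1 shown in this answer (the data frame is rebuilt here only to keep the snippet self-contained):

```r
# Aggregated sums per date and sector, as in the question's output
d <- data.frame(
  Date   = rep(c("3/31/2015", "6/30/2015"), each = 3),
  Sector = rep(c("Energy", "Financials", "Healthcare"), times = 2),
  Sumx1  = c(200, 200, 200, 400, 300, 400)
)

# Cross-tabulate: one row per date, one column per sector
a <- xtabs(Sumx1 ~ Date + Sector, data = d)

# margin = 1 divides every cell by its row total
w <- round(prop.table(a, margin = 1) * 100, 2)
print(w)
```

Both forms give the same numbers; prop.table just makes the "share of the row" intent explicit.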

Data

d <- read.table(text="   Date  Sector Sumx1 
1  3/31/2015  Energy 200 
2  3/31/2015 Financials 200 
3  3/31/2015 Healthcare 200 
4  6/30/2015  Energy 400 
5  6/30/2015 Financials 300 
6  6/30/2015 Healthcare 400", header=TRUE) 

Thanks @m0h3n, this works very well! My final code is: cdata – lapioche75

We can do this with dplyr/tidyr:

library(dplyr) 
library(tidyr) 
Test.Exposure %>% 
    group_by(Date, Sector) %>% 
    summarise(Sumx1 = sum(Value)) %>% 
    group_by(Date) %>% 
    mutate(Sumx1 = round(100*Sumx1/sum(Sumx1),2)) %>% 
    spread(Sector, Sumx1) 
#  Date Energy Financials Healthcare 
#  <chr> <dbl>  <dbl>  <dbl>  
#1 3/31/2015 33.33  33.33  33.33 
#2 6/30/2015 36.36  27.27  36.36 
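
In recent tidyr versions spread still works but is marked as superseded by pivot_wider. A sketch of the same reshaping step with pivot_wider, starting from the already aggregated sums to keep it short; the data frame d and the column name Weight are placeholders for illustration, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# Aggregated sums per date and sector, as in the question's output
d <- data.frame(
  Date   = rep(c("3/31/2015", "6/30/2015"), each = 3),
  Sector = rep(c("Energy", "Financials", "Healthcare"), times = 2),
  Sumx1  = c(200, 200, 200, 400, 300, 400)
)

weights <- d %>%
  group_by(Date) %>%
  # share of each sector within its date, in percent
  mutate(Weight = round(100 * Sumx1 / sum(Sumx1), 2)) %>%
  ungroup() %>%
  select(Date, Sector, Weight) %>%
  # one row per date, one column per sector
  pivot_wider(names_from = Sector, values_from = Weight)

print(weights)
```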

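In my original data, the numbers in the Value column are of the form "100,125,125" instead of 100 125 125. So R reads them as characters and won't let me do summarise(Sumx1 = sum(Value)). I tried various forms of as.numeric(as.character(Value)), but that doesn't work. Any ideas? Thanks! – lapioche75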

@lapioche75 If the 'ddply' shown in your post works on that dataset, the 'dplyr' option should work as well. Or is this a different problem? – akrun

@lapioche75 Perhaps you need to split it. 'library(tidyr); (Test.Exposure, Value, convert = TRUE) %>% group_by(Date, Sector) %>% summarise(Sum1 = sum(Value))' and then do the rest. – akrun
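
The function name in the last comment did not survive; tidyr's separate_rows takes arguments of that shape, but that is only a guess. The splitting idea itself can be sketched in base R. This assumes each Value cell holds several comma-separated numbers stored as text, and the sample rows below are hypothetical:

```r
# Hypothetical rows: Value holds comma-separated numbers stored as text
Test.Exposure <- data.frame(
  Date   = c("3/31/2015", "3/31/2015"),
  Sector = c("Energy", "Financials"),
  Value  = c("100,125,125", "200,50"),
  stringsAsFactors = FALSE
)

# Split each cell on commas; the result is a list with one vector per row
pieces <- strsplit(Test.Exposure$Value, ",", fixed = TRUE)

# One row per individual number, repeating Date and Sector as needed
long <- data.frame(
  Date   = rep(Test.Exposure$Date, lengths(pieces)),
  Sector = rep(Test.Exposure$Sector, lengths(pieces)),
  Value  = as.numeric(unlist(pieces))
)

# Value is numeric now, so the sum per date and sector works
sums <- aggregate(Value ~ Date + Sector, data = long, FUN = sum)
print(sums)
```

If the commas are thousands separators rather than delimiters between values, as.numeric(gsub(",", "", Value)) would be the usual conversion instead.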