Subset a data frame and apply a function to calculate the frequency of each factor level
Question:
I have a data frame:
df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),plot=c("1", "1", "1","2","2","2", "3","3","3","3"), interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B"))
I want to subset the data by plot. For each plot subset, I want to calculate the frequency of each unique interact type. The output should look like this:
df <- data.frame(region = c("1", "1", "1", "1", "2", "2", "2"),
                 plot = c("1", "1", "2", "2", "3", "3", "3"),
                 interact = c("A_B", "C_D", "E_F", "C_D", "D_E", "C_B", "A_B"),
                 freq = c(1, 2, 1, 2, 2, 1, 1))
Then, for each plot subset of the resulting data frame, I want to apply the following calculation:
sum<-sum(df$freq) # Calculate the sum of `freq` for each plot subset (the total number of interactions)
prop<-unique(df$freq)/sum # Divide each level of `freq` by the sum (the proportion of each interaction type among all interactions)
prop2<-prop^2 # Square each proportion
D<-sum(prop2) # Sum the squared proportions for each plot subset
simp<-1/D # Use this to calculate Simpson's diversity
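The function I want to use is similar to the one explained on this page: http://rfunctions.blogspot.com.ng/2012/02/diversity-indices-simpsons-diversity.html. However, the version referenced there operates on a wide dataset, whereas my dataset will be long.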
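In formula terms, the steps listed above amount to the inverse Simpson index. As a sketch, writing n_i for the frequency of interaction type i within one plot and N for their sum (these symbols are introduced here only for illustration):

```latex
\[
p_i = \frac{n_i}{N}, \qquad D = \sum_i p_i^2, \qquad \mathit{simp} = \frac{1}{D}
\]
% Worked example for plot 1, where n = (1, 2) and N = 3:
\[
D = \left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 = \tfrac{5}{9},
\qquad \frac{1}{D} = \tfrac{9}{5} = 1.8
\]
```

Note that every interaction type contributes its own term to the sum, even when two types happen to share the same frequency.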
In the end, I will have a value for each plot in a data frame:
result<-
Plot div
1 1.8
2 1.8
3 2.6
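Answer
I used dplyr; however, the result for plot 3 is different and I don't know why. Could you show your result for each calculation, or check mine and let me know where the error is?
Also, if you are interested in calculating diversity indices, you may want to get familiar with the vegan package, in particular its diversity functions.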
df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),
plot=c("1", "1", "1","2","2","2", "3","3","3","3"),
interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B"))
library(dplyr)
df1 <- df %>% group_by(region, plot, interact) %>% summarise(freq = n())
df2 <- df1 %>% group_by(plot) %>% mutate(sum=sum(freq), prop=freq/sum, prop2 = prop^2)
df2
# A tibble: 7 x 7
# Groups: plot [3]
region plot interact freq sum prop prop2
<fctr> <fctr> <fctr> <int> <int> <dbl> <dbl>
1 1 1 A_B 1 3 0.3333333 0.1111111
2 1 1 C_D 2 3 0.6666667 0.4444444
3 1 2 C_D 2 3 0.6666667 0.4444444
4 1 2 E_F 1 3 0.3333333 0.1111111
5 1 3 D_E 2 4 0.5000000 0.2500000
6 2 3 A_B 1 4 0.2500000 0.0625000
7 2 3 C_B 1 4 0.2500000 0.0625000
df2 %>% group_by(plot) %>% summarise(D=sum(prop2), simp=1/D)
# A tibble: 3 x 3
plot D simp
<fctr> <dbl> <dbl>
1 1 0.5555556 1.800000
2 2 0.5555556 1.800000
3 3 0.3750000 2.666667
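As a cross-check that does not depend on dplyr, the same numbers can probably be reproduced in base R. This is only a minimal sketch; the object names counts, props, and simp are chosen here for illustration, and it assumes the per-plot proportions are all that is needed:

```r
df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

# Contingency table: one row per plot, one column per interaction type
counts <- table(df$plot, df$interact)

# Row-wise proportions, so each row (plot) sums to 1
props <- prop.table(counts, margin = 1)

# Inverse Simpson index per plot: 1 / sum(p_i^2)
# (interaction types absent from a plot have p_i = 0 and add nothing)
simp <- 1 / rowSums(props^2)
round(simp, 3)
```

The values should agree with the simp column of the dplyr summary above.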
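Here is an approach that uses a function from the vegan package.
First, you need to use spread to create a "matrix" with your interactions as separate columns: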
library(vegan)
library(tidyr)
library(dplyr)
df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n())
df6 <-spread(data=df5, key = interact, value = freq, fill=0)
df6
# A tibble: 3 x 6
# Groups: plot [3]
plot A_B C_B C_D D_E E_F
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 2 0 0
2 2 0 0 2 0 1
3 3 1 1 0 2 0
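Then you calculate the diversity, passing df6 without its first column (plot) as the data matrix. Finally, you can add the calculated diversity to df6 as a column.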
simp <-diversity(x=df6[,-1], index = "invsimpson")
df6$simp <- simp
df6
# A tibble: 3 x 7
# Groups: plot [3]
plot A_B C_B C_D D_E E_F simp
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 2 0 0 1.800000
2 2 0 0 2 0 1 1.800000
3 3 1 1 0 2 0 2.666667
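As a side note, spread() has been superseded by pivot_wider() in newer tidyr releases. A minimal sketch of the same reshaping step, assuming tidyr 1.1.0 or later (for a scalar values_fill) and a dplyr version whose count() accepts the name argument; the object name wide is only for illustration:

```r
library(dplyr)
library(tidyr)
library(vegan)

df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

# Count each interaction type per plot, then widen to one column per type,
# filling combinations that never occur with 0
wide <- df %>%
  count(plot, interact, name = "freq") %>%
  pivot_wider(names_from = interact, values_from = freq, values_fill = 0)

# Drop the plot column so only the counts are passed to diversity()
wide$simp <- diversity(wide[, -1], index = "invsimpson")
wide
```

Because count() returns an ungrouped result, there is no grouping to carry along when the first column is dropped.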
Or even shorter, with do() and tidy() from the broom package:
df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n())
library(broom)
df5 %>% spread(key = interact, value = freq, fill=0) %>%
do(tidy(diversity(x=.[,-1], index = "invsimpson")))
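Since diversity() also accepts a plain numeric vector, it may be possible to skip the wide format entirely and summarise the long data per plot. A sketch under that assumption, using a reasonably recent dplyr; res is just an illustrative name:

```r
library(dplyr)
library(vegan)

df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

res <- df %>%
  count(plot, interact, name = "freq") %>%   # frequency of each type per plot
  group_by(plot) %>%
  # diversity() receives the vector of frequencies for one plot at a time
  summarise(simp = diversity(freq, index = "invsimpson"))
res
```

This keeps the data long throughout, which may be convenient when the number of interaction types is large.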
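Thanks! This solution works very well. I have edited my value for plot 3. I made an error in that calculation, and your answer is correct. – Danielle
Also, I would like to use diversity() from vegan, but my understanding is that it needs data in matrix format. Do you have an efficient way to apply diversity() to the subsets? Thanks again. – Danielle
Sure, that is possible too :) I edited my post to add an approach that uses the `diversity()` function. Take a look. – MikolajM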