Subset a data frame and apply a function to calculate a value for each factor level

Question:

df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),plot=c("1", "1", "1","2","2","2", "3","3","3","3"), interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B"))

I want to subset the data by plot. For each plot subset, I want to calculate the frequency of each unique interact type. The output should look like this:

df<- data.frame(region= c("1", "1", "1","1", "2","2", "2"),
       plot=c("1", "1", "2","2", "3","3","3"),
       interact=c("A_B", "C_D", "E_F","C_D", "D_E", "C_B","A_B"),
       freq= c(1,2,1,2,2,1,1))

Then I want to apply the following calculation to each plot subset of the resulting df:

sum<-sum(df$freq) # Calculate the sum of `freq` for each plot subset (the total number of interactions)
prop<-unique(df$freq)/sum # Divide each level of `freq` by the sum (the proportion of each interaction type among all interactions)
prop2<-prop^2 # Square these proportions
D<-sum(prop2) # Sum the squared proportions for each plot subset
simp<-1/D # Use this to calculate Simpson's diversity

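The function I want to use is similar to the one explained on this page: http://rfunctions.blogspot.com.ng/2012/02/diversity-indices-simpsons-diversity.html. However, the version referenced there operates on a wide data set, whereas my data set is long.

In the end, I would have a df with one value per plot: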

result<- 
     Plot div 
      1  1.8 
      2  1.8 
      3  2.6

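Answer

I used dplyr; however, my result for plot 3 is different from yours and I don't know why. Could you provide your results for each step of the calculation, or check mine, and let me know where the error is?

Also, if you are interested in calculating diversity indices, you may want to get familiar with the vegan package, especially its diversity() function, which is used further down.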

df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"), 
       plot=c("1", "1", "1","2","2","2", "3","3","3","3"), 
       interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B")) 

library(dplyr) 

df1 <- df %>% group_by(region, plot, interact) %>% summarise(freq = n()) 
df2 <- df1 %>% group_by(plot) %>% mutate(sum=sum(freq), prop=freq/sum, prop2 = prop^2) 
df2 

# A tibble: 7 x 7
# Groups:   plot [3]
  region plot   interact  freq   sum      prop     prop2
  <fctr> <fctr> <fctr>   <int> <int>     <dbl>     <dbl>
1 1      1      A_B          1     3 0.3333333 0.1111111
2 1      1      C_D          2     3 0.6666667 0.4444444
3 1      2      C_D          2     3 0.6666667 0.4444444
4 1      2      E_F          1     3 0.3333333 0.1111111
5 1      3      D_E          2     4 0.5000000 0.2500000
6 2      3      A_B          1     4 0.2500000 0.0625000
7 2      3      C_B          1     4 0.2500000 0.0625000


df2 %>% group_by(plot) %>% summarise(D=sum(prop2), simp=1/D) 

# A tibble: 3 x 3
  plot           D     simp
  <fctr>     <dbl>    <dbl>
1 1      0.5555556 1.800000
2 2      0.5555556 1.800000
3 3      0.3750000 2.666667
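
For comparison, the same per-plot calculation can be sketched in base R without dplyr. This is only a minimal sketch: the helper name inv_simpson is made up for illustration, and the data frame is the one from the question with the region column left out.

```r
# Data from the question (region omitted, it is not needed here)
df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D",
               "D_E", "D_E", "C_B", "A_B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: inverse Simpson index of a vector of labels
inv_simpson <- function(x) {
  freq <- table(x)          # frequency of each interaction type
  prop <- freq / sum(freq)  # proportion of each type
  1 / sum(prop^2)           # inverse of the summed squared proportions
}

# Apply the helper to the interact values of each plot
div <- tapply(df$interact, df$plot, inv_simpson)
result <- data.frame(Plot = names(div), div = as.numeric(div))
result
```

Note that the frequency of every interaction type is used, not unique(freq). In plot 3 two interaction types both occur once, so dropping the repeated value would make the proportions sum to 0.75 instead of 1.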

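Here is an approach that uses a function from the vegan package.

First, you need to use spread() to create a "matrix" with your interactions as separate columns: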

library(vegan) 
library(tidyr) 
library(dplyr) 

df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n()) 
df6 <-spread(data=df5, key = interact, value = freq, fill=0) 
df6 

# A tibble: 3 x 6
# Groups:   plot [3]
  plot     A_B   C_B   C_D   D_E   E_F
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1          1     0     2     0     0
2 2          0     0     2     0     1
3 3          1     1     0     2     0

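Then you calculate the diversity, passing df6 without its first column (plot) as the data matrix. Finally, you can add the calculated diversity to df6 as a column.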

simp <-diversity(x=df6[,-1], index = "invsimpson") 
df6$simp <- simp 
df6 

# A tibble: 3 x 7
# Groups:   plot [3]
  plot     A_B   C_B   C_D   D_E   E_F     simp
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 1          1     0     2     0     0 1.800000
2 2          0     0     2     0     1 1.800000
3 3          1     1     0     2     0 2.666667
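
As a possible alternative for the reshaping step, base R's table() can cross-tabulate plot by interact directly, which gives the wide count layout in one call. The sketch below assumes the same data as in the question; the object name counts is made up for illustration, and the call to vegan's diversity() is guarded so the snippet still runs if vegan is not installed.

```r
# Data from the question (region omitted)
df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D",
               "D_E", "D_E", "C_B", "A_B"),
  stringsAsFactors = FALSE
)

# One row per plot, one column per interaction type, zeros where absent
counts <- as.data.frame.matrix(table(df$plot, df$interact))
counts

# Inverse Simpson index per row (plot), only if vegan is available
if (requireNamespace("vegan", quietly = TRUE)) {
  print(vegan::diversity(counts, index = "invsimpson"))
}
```

Here the plot labels end up as row names rather than as a column, so no column has to be dropped before the counts are passed on.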

Or even shorter, with do() and tidy() from the broom package:

df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n()) 

library(broom) 

df5 %>% spread(key = interact, value = freq, fill=0) %>% 
    do(tidy(diversity(x=.[,-1], index = "invsimpson")))
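
If the wide layout is not needed at all, the whole calculation can also be written as a single dplyr pipeline on the long data. This is a sketch under the assumption that a dplyr version providing the name argument of count() is installed; the result name simp_by_plot is made up for illustration.

```r
library(dplyr)

# Data from the question (region omitted)
df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D",
               "D_E", "D_E", "C_B", "A_B"),
  stringsAsFactors = FALSE
)

simp_by_plot <- df %>%
  count(plot, interact, name = "freq") %>%          # frequency per plot and type
  group_by(plot) %>%
  summarise(simp = 1 / sum((freq / sum(freq))^2))   # inverse Simpson per plot

simp_by_plot
```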

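Thank you! This solution works very well. I have edited my value for plot 3. I made an error in that calculation, and your answer is correct. – Danielle

Also, I would like to use diversity() from vegan, but my understanding is that it requires a matrix format. Do you have an efficient way to integrate diversity() on the subsets? Thanks again. – Danielle

Sure, that is possible too :) I edited my post to add a method that uses the 'diversity()' function. Take a look. – MikolajM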