Grouped scatterplot with ggplot2 in R


Problem description:

I'm using ggplot2 to create grouped boxplots with a scatterplot overlay. I'd like each scatterplot data point to be grouped with its corresponding grouped boxplot.

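However, I'd also like the scatterplot points to use different symbols. I seem to be able to either group my scatterplot points with my grouped boxplots, or give my scatterplot points different symbols... but not both at the same time. Below is some sample code to illustrate what's happening: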

library(scales) 
library(ggplot2) 

# Generates Data frame to plot 
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24)) 
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20)) 
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5)) 
value <- c(rnorm(24, mean = 0.5, sd = 0.5), rnorm(24, mean = 10, sd = 8), rnorm(24, mean = 1000, sd = 900),
           rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)  # same as abs(value): keeps every value positive for the log scale
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata, value)

# Creates the Plot of All Data 
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape 
# and I'd like them to each have different shapes.       
ln_clr <- "black" 
bk_clr <- "white" 
point_shapes <- c(0,15,1,16,2,17) 
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4") 

lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) + 
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
       size = 0.7, coef = 4) + 
    geom_boxplot(coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
        alpha = 1, colour = ln_clr) + 
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
       pch=15) 


lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000), 
        labels = trans_format("log10", math_format(10^.x))) 

ggsave("Scatter Grouped-Wrong Symbols.png") 

#************************************************************************************************************************************* 
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes 
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) + 
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
       size = 0.7, coef = 4) + 
    geom_boxplot(coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
        alpha = 1, colour = ln_clr) + 
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
       aes(shape=Clone)) 


lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000), 
        labels = trans_format("log10", math_format(10^.x))) 

ggsave("Scatter Ungrouped-Right Symbols.png") 

If anyone has any suggestions, I'd really appreciate it.

Thanks, Nathan

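To get the boxplots to appear, the shape aesthetic needs to be inside geom_point rather than in the main call to ggplot. The reason is that when the shape aesthetic is in the main ggplot call, it applies to all geoms, including geom_boxplot. Applying the shape=Clone aesthetic causes geom_boxplot to create a separate boxplot for each level of Clone. Since there is only one row of data for each combination of variable, Gene (the fill grouping) and Clone, no real boxplots are produced.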

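Having the shape aesthetic affect geom_boxplot seems counterintuitive to me, but perhaps there's a reason for it that I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem, because the aesthetic is then applied only to geom_point.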
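A quick way to see this grouping effect is to build both versions of the plot and count the groups ggplot2 creates in the boxplot layer with layer_data(). This is a minimal sketch: it rebuilds data with the same layout as Tdata (5 genes x 4 days x 6 clones), and the names p_global, p_local and the n_groups_* variables are only illustrative.

```r
library(ggplot2)

# Example data with the same layout as Tdata (5 genes x 4 days x 6 clones)
set.seed(10)
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(24 * 5,
                       mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)

# shape = Clone in the main ggplot() call is inherited by geom_boxplot()
p_global <- ggplot(Tdata, aes(x = variable, y = value, fill = Gene, shape = Clone)) +
  geom_boxplot()

# shape = Clone only inside geom_point(), so the boxplot layer never sees it
p_local <- ggplot(Tdata, aes(x = variable, y = value, fill = Gene)) +
  geom_boxplot() +
  geom_point(aes(shape = Clone))

# layer_data() returns the built data for one layer, including each row's group id
n_groups_global <- length(unique(layer_data(p_global, 1)$group))
n_groups_local  <- length(unique(layer_data(p_local, 1)$group))

n_groups_global  # 4 days x 5 genes x 6 clones = 120 single-value "boxes"
n_groups_local   # 4 days x 5 genes = 20 boxes, each built from 6 values
```

Each of the 120 "boxes" in the first version is computed from a single value, so it collapses to a line instead of a visible box.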

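Then, to get the points to line up with the correct boxplot, we need to set group=Gene in geom_point. Otherwise the shape=Clone mapping makes ggplot2 group the points by every Gene/Clone combination, and position_jitterdodge dodges them by those finer groups instead of by Gene. I also added theme_classic to make the plot easier to read (though it's still very busy):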

ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) + 
    stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) + 
    geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) + 
    geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7, 
      aes(shape=Clone, group=Gene)) + 
    scale_fill_manual(values=blue_cols) + labs(y="Fold Change") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand=c(0, 0), breaks=10^(-2:5), 
       labels=trans_format("log10", math_format(10^.x))) + 
    theme_classic() 
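The effect of group=Gene can be checked the same way, by counting the groups in the point layer with and without it. This is a sketch under the same assumptions as before (rebuilt data with Tdata's layout; p_nogroup and p_group are illustrative names):

```r
library(ggplot2)

# Example data with the same layout as Tdata (5 genes x 4 days x 6 clones)
set.seed(10)
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(24 * 5,
                       mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)

base <- ggplot(Tdata, aes(x = variable, y = value, fill = Gene)) +
  geom_boxplot(position = position_dodge(0.85))

# Without group = Gene: the points are grouped (and dodged) by day x Gene x Clone
p_nogroup <- base +
  geom_point(aes(shape = Clone), position = position_jitterdodge(dodge.width = 0.85))

# With group = Gene: the points are dodged by Gene only, matching the boxes
p_group <- base +
  geom_point(aes(shape = Clone, group = Gene),
             position = position_jitterdodge(dodge.width = 0.85))

# Layer 2 is the point layer in both plots
n_point_groups_nogroup <- length(unique(layer_data(p_nogroup, 2)$group))
n_point_groups_group   <- length(unique(layer_data(p_group, 2)$group))

n_point_groups_nogroup  # 120: 30 dodge slots at each day instead of 5
n_point_groups_group    # 5: one dodge slot per Gene
```

An explicit group mapping replaces the automatic grouping, so the shape mapping no longer adds Clone to the groups being dodged.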


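I think the plot would be easier to understand if you facet by Gene and put variable on the x-axis. Putting time on the x-axis seems more intuitive, and faceting frees up the color aesthetic for the points. It's still hard (at least for me) to tell the point markers apart for six different clones, but this is clearer than my previous version.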

library(dplyr) 

ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)), 
     aes(x=gsub("Day","",variable), y=value)) + 
    stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) + 
    geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) + 
    geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) + 
    theme_classic() + 
    facet_grid(. ~ Gene) + 
    labs(y = "Fold Change", x="Day") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand=c(0, 0), breaks=10^(-2:5), 
       labels=trans_format("log10", math_format(10^.x))) 
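To confirm the faceted layout, the sketch below builds a stripped-down version of this plot and counts boxes and points per panel with layer_data(). Because fill=Clone is mapped only in geom_point, each Gene panel should contain exactly one box per day. The data are rebuilt with Tdata's layout, and p_facet, box_ld and point_ld are illustrative names.

```r
library(ggplot2)

# Example data with the same layout as Tdata (5 genes x 4 days x 6 clones)
set.seed(10)
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(24 * 5,
                       mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)

# Gene moves to the facets; fill = Clone lives only in the point layer
p_facet <- ggplot(Tdata, aes(x = gsub("Day", "", variable), y = value)) +
  geom_boxplot(width = 0.5) +
  geom_point(aes(fill = Clone), position = position_jitter(0.2), shape = 21) +
  facet_grid(. ~ Gene)

box_ld   <- layer_data(p_facet, 1)  # one row per box
point_ld <- layer_data(p_facet, 2)  # one row per point

table(box_ld$PANEL)    # boxes per Gene panel (one per day)
table(point_ld$PANEL)  # points per Gene panel (4 days x 6 clones)
```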


If you really want to keep the points, it might be better to separate the boxplots and the points with a partial manual dodge:

set.seed(10) 
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)), 
         Gene = gsub("Gene","Gene ", Gene)), 
     aes(x=Day - 2, y=value, group=Day)) + 
    stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) + 
    geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) + 
    geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21, 
      position=position_jitter(width=1, height=0)) + 
    theme_classic() + 
    facet_grid(. ~ Gene) + 
    labs(y="Fold Change", x="Day") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand=c(0, 0), breaks=10^(-2:5), 
       labels=trans_format("log10", math_format(10^.x))) 
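The manual dodge works by simple x arithmetic: the box for day d is centred at d - 2 with width 4, so it spans [d - 4, d], while the points are centred at d + 2 and jittered by up to 1 in each direction, so they fall in [d + 1, d + 3]. The sketch below checks that gap from the built layer data; the data are rebuilt with Tdata's layout and p_manual, box_ld, point_ld and gap are illustrative names.

```r
library(ggplot2)

# Example data with the same layout as Tdata (5 genes x 4 days x 6 clones)
set.seed(10)
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(24 * 5,
                       mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)
Tdata$Day <- as.numeric(substr(Tdata$variable, 4, 5))  # "Day10" -> 10

p_manual <- ggplot(Tdata, aes(x = Day - 2, y = value, group = Day)) +
  geom_boxplot(outlier.shape = NA, width = 4) +
  geom_point(aes(x = Day + 2), position = position_jitter(width = 1, height = 0)) +
  facet_grid(. ~ Gene)

box_ld   <- layer_data(p_manual, 1)
point_ld <- layer_data(p_manual, 2)

# group = Day gives the same group ids (one per day) in both layers
box_right  <- tapply(box_ld$xmax, box_ld$group, max)  # right edge of each day's boxes
point_left <- tapply(point_ld$x, point_ld$group, min) # leftmost jittered point per day
gap <- point_left - box_right                          # expected to be about 1 or more
gap
```

Widening the jitter past 1 (or the boxes past 4) would make the points start to overlap the boxes, so the offsets and widths have to be chosen together.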


One more thing: for future reference, you can simplify your data-creation code:

Gene = rep(paste0("Gene",LETTERS[1:5]), each=24) 
Clone = rep(paste0("D",1:6), 20) 
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5) 
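# abs() keeps every value positive, as sqrt(value*value) did in the question, so none are dropped on the log scale
value = abs(rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24),
       sd=rep(c(0.5,8,900,9000,3000), each=24)))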

Tdata = data.frame(Gene, Clone, variable, value) 
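The vectorized rnorm() call should draw the same random numbers as the five separate calls, because rnorm() recycles vector-valued mean and sd element by element, in order. This sketch runs both constructions from the same seed and compares them; the _q/_s suffixes and the seed are just illustrative.

```r
# Question's original construction
set.seed(42)
Gene_q     <- c(rep("GeneA",24), rep("GeneB",24), rep("GeneC",24), rep("GeneD",24), rep("GeneE",24))
Clone_q    <- c(rep(c("D1","D2","D3","D4","D5","D6"), 20))
variable_q <- c(rep(c(rep("Day10",6), rep("Day20",6), rep("Day30",6), rep("Day40",6)), 5))
value_q    <- c(rnorm(24, mean = 0.5, sd = 0.5), rnorm(24, mean = 10, sd = 8),
                rnorm(24, mean = 1000, sd = 900), rnorm(24, mean = 25000, sd = 9000),
                rnorm(24, mean = 8000, sd = 3000))

# Simplified construction, restarted from the same seed
set.seed(42)
Gene_s     <- rep(paste0("Gene", LETTERS[1:5]), each = 24)
Clone_s    <- rep(paste0("D", 1:6), 20)
variable_s <- rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5)
value_s    <- rnorm(24 * 5, mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                    sd = rep(c(0.5, 8, 900, 9000, 3000), each = 24))

# Compare each column of the two versions
checks <- c(Gene     = identical(Gene_q, Gene_s),
            Clone    = identical(Clone_q, Clone_s),
            variable = identical(variable_q, variable_s),
            value    = isTRUE(all.equal(value_q, value_s)))
print(checks)
```

All four checks should come out TRUE, so the shorter version can replace the original without changing the data.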

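This is perhaps the best, most thorough, and most clearly written answer I've ever seen. Thank you so much for your help; everything you wrote was very useful. If I could somehow give you more credit than just an upvote, I would. Thanks. – Nathan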


Thanks, Nathan! I appreciate your kind words. – eipi10