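Single-gene pan-cancer expression boxplots are a staple figure in pan-cancer analysis papers, and paired boxplots appear in many of them as well. This post shows how to draw both a pan-cancer expression boxplot and a paired boxplot for a single gene.
Here is the final result first: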
image.png
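1. Single-gene pan-cancer expression boxplot
First, download the pan-cancer expression matrix from Xena. Download page: https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443 The TPM version is usually sufficient.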
image
Then download the clinical information from the same page.
image
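Use the tidyverse R package to merge and transpose the data into the data frame shown below, where each row is a sample and each column is a gene or a clinical variable. Surprisingly, the TPM values here include negative numbers, which would be expected if the matrix is log2-transformed (Xena lists the unit on each dataset's page). Alternatively, follow the post "2022新版TCGA數(shù)據(jù)下載與整理" (downloading and organizing the 2022 version of TCGA data), download the files manually, and merge them by hand. That post downloads the count matrix, but you can take the tpm column instead.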
image
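The merge and transpose step can be sketched roughly as follows. This is a minimal illustration on toy data, not the exact code used for this post: the objects `expr` and `clin`, their column names, and the values are all hypothetical stand-ins for the two Xena downloads. The Type column is derived from characters 14-15 of the TCGA barcode (sample type code: 01-09 tumor, 10-19 normal).

```r
# Toy stand-ins for the two Xena downloads (hypothetical names and values)
expr <- data.frame(
  sample = c("CBX3", "TP53"),            # first column holds gene identifiers
  `TCGA-A1-A0SB-01` = c(5.1, 3.2),
  `TCGA-A1-A0SB-11` = c(4.0, 3.5),
  check.names = FALSE
)
clin <- data.frame(
  sample = c("TCGA-A1-A0SB-01", "TCGA-A1-A0SB-11"),
  Cancer = c("BRCA", "BRCA")
)

# Transpose so that samples become rows and genes become columns
mat <- t(as.matrix(expr[, -1]))
colnames(mat) <- expr$sample
exprt <- data.frame(ID = rownames(mat), mat, row.names = NULL)

# Attach the clinical information by sample ID
pandata <- merge(exprt, clin, by.x = "ID", by.y = "sample")

# Derive Tumor/Normal from the sample type code in the barcode
code <- as.integer(substr(pandata$ID, 14, 15))
pandata$Type <- ifelse(code < 10, "Tumor", "Normal")
```

With the real matrix the same idea applies, but reading only the gene of interest before transposing keeps memory use manageable.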
Plot the CBX3 gene as an example.
library(ggpubr)
# Type is the Normal/Tumor group; Cancer holds the names of the 33 cancer types.
p <- ggboxplot(pandata, x = "Cancer", y = "CBX3",
               color = "Type", palette = "jco") +
  rotate_x_text(angle = 90)  # rotate the cancer names on the x axis by 90 degrees
p + stat_compare_means(aes(group = Type), label = "p.signif", label.y = 11)
# label = "p.signif" shows significance stars; label = "p.format" shows the formatted p-value.
# label.y sets the y coordinate of the significance labels.
The resulting plot:
image
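To check the numbers behind the stars, the per-cancer comparison can be reproduced by hand. The sketch below uses toy data (the values are made up) and assumes a Wilcoxon rank-sum test for each cancer; cancers without both groups get NA, since several TCGA cohorts contain no normal samples. The star cutpoints are written out explicitly here and are meant to mirror the usual 0.0001 / 0.001 / 0.01 / 0.05 convention.

```r
# Toy data: two cancers with both groups, one with tumor samples only
toy <- data.frame(
  Cancer = rep(c("BRCA", "LUAD", "UVM"), times = c(20, 20, 10)),
  Type = c(rep(c("Tumor", "Normal"), each = 10),
           rep(c("Tumor", "Normal"), each = 10),
           rep("Tumor", 10)),
  CBX3 = c(seq(6, 8, length.out = 10), seq(4, 5.8, length.out = 10),
           seq(6.1, 7.9, length.out = 10), seq(4.1, 5.9, length.out = 10),
           seq(5, 7, length.out = 10))
)

# One Wilcoxon test per cancer; NA when a group is missing
pvals <- sapply(split(toy, toy$Cancer), function(d) {
  if (length(unique(d$Type)) < 2) return(NA_real_)
  wilcox.test(CBX3 ~ Type, data = d)$p.value
})

# Convert p-values to significance stars
stars <- cut(pvals,
             breaks = c(0, 1e-4, 1e-3, 1e-2, 0.05, Inf),
             labels = c("****", "***", "**", "*", "ns"),
             include.lowest = TRUE)

result <- data.frame(Cancer = names(pvals), p = unname(pvals), signif = stars)
print(result)
```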
You can also overlay the individual data points:
library(ggpubr)
p <- ggboxplot(pandata, x = "Cancer", y = "CBX3",
               color = "Type", palette = "jco",
               add = "jitter") +  # draw each sample as a jittered point
  rotate_x_text(angle = 90)
p + stat_compare_means(aes(group = Type), label = "p.signif", label.y = 11)
image
2. Single-gene paired boxplot
Start with BRCA as an example.
library(tidyverse)
BRCA <- pandata[pandata$Cancer == "BRCA", ]     # subset the BRCA rows of pandata
BRCA$ID <- stringr::str_sub(BRCA$ID, 1, 12)     # keep the first 12 characters of the sample name (patient ID)
Normal <- filter(BRCA, Type == "Normal")
Tumor <- filter(BRCA, Type == "Tumor")
Tumor <- Tumor[!duplicated(Tumor$ID), ]         # drop duplicated samples in the tumor group
index <- intersect(Normal$ID, Tumor$ID)         # patients present in both the normal and tumor groups
T1 <- filter(Tumor, ID %in% index)
N1 <- filter(Normal, ID %in% index)
data <- rbind(T1, N1)
library(ggpubr)
p <- ggpaired(data, x = "Type", y = "CBX3",
              color = "black",
              fill = c("#E11E24", "#FBB96F"),
              line.color = "gray", line.size = 0.4,
              ylab = "expression of CBX3",
              palette = "npg")
p + stat_compare_means(paired = TRUE, label = "p.signif", label.x.npc = 0.4,
                       comparisons = list(c("Tumor", "Normal")))
image.png
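One thing worth checking before a paired plot or paired test: the i-th tumor row and the i-th normal row should belong to the same patient, and each patient should appear exactly once per group. The sketch below shows one way to enforce that on toy data (IDs and values are made up); it deduplicates within each group and sorts by ID, which is an extra safeguard and not part of the code above.

```r
# Toy data: patient 0002 has two tumor samples, 0003 and 0004 are unpaired
toy <- data.frame(
  ID = c("TCGA-A1-0001", "TCGA-A1-0002", "TCGA-A1-0002", "TCGA-A1-0003",
         "TCGA-A1-0002", "TCGA-A1-0001", "TCGA-A1-0004"),
  Type = c("Tumor", "Tumor", "Tumor", "Tumor", "Normal", "Normal", "Normal"),
  CBX3 = c(7.1, 6.8, 6.9, 7.4, 5.2, 5.0, 5.5)
)

# Keep the first sample per patient within each group
toy <- toy[!duplicated(toy[, c("ID", "Type")]), ]

# Keep only patients that have both a tumor and a normal sample
both <- intersect(toy$ID[toy$Type == "Tumor"], toy$ID[toy$Type == "Normal"])
paired <- toy[toy$ID %in% both, ]

# Sort so that matching rows line up between the two groups
paired <- paired[order(paired$Type, paired$ID), ]
tumor <- paired[paired$Type == "Tumor", ]
normal <- paired[paired$Type == "Normal", ]

# Sanity check: same patients, same order
stopifnot(identical(tumor$ID, normal$ID))
diffs <- tumor$CBX3 - normal$CBX3   # per-patient tumor minus normal
print(diffs)
```

The sorted `paired` data frame can then be passed to ggpaired in place of the rbind result.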
To display the p-value instead of significance stars:
library(tidyverse)
BRCA <- pandata[pandata$Cancer == "BRCA", ]     # subset the BRCA rows of pandata
BRCA$ID <- stringr::str_sub(BRCA$ID, 1, 12)     # keep the first 12 characters of the sample name (patient ID)
Normal <- filter(BRCA, Type == "Normal")
Tumor <- filter(BRCA, Type == "Tumor")
Tumor <- Tumor[!duplicated(Tumor$ID), ]         # drop duplicated samples in the tumor group
index <- intersect(Normal$ID, Tumor$ID)         # patients present in both the normal and tumor groups
T1 <- filter(Tumor, ID %in% index)
N1 <- filter(Normal, ID %in% index)
data <- rbind(T1, N1)
library(ggpubr)
p <- ggpaired(data, x = "Type", y = "CBX3",
              color = "black",
              fill = c("#E11E24", "#FBB96F"),
              line.color = "gray", line.size = 0.4,
              ylab = "expression of CBX3",
              palette = "npg")
# label = "p.format" prints the formatted p-value
p + stat_compare_means(paired = TRUE, label = "p.format", label.x.npc = 0.4,
                       comparisons = list(c("Tumor", "Normal")))
image.png
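As for showing single-gene paired boxplots across all cancers, I have not worked out a better approach yet. For now, faceting is the only solution I have.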
library(tidyverse)  # for filter()
library(ggpubr)
data <- pandata
data$ID <- stringr::str_sub(data$ID, 1, 12)   # keep the first 12 characters (patient ID)
Tumor <- subset(data, Type == "Tumor")
Tumor <- Tumor[!duplicated(Tumor$ID), ]       # drop duplicated samples in the tumor group
Normal <- subset(data, Type == "Normal")
index <- intersect(Normal$ID, Tumor$ID)       # patients with both a tumor and a normal sample
T1 <- filter(Tumor, ID %in% index)
N1 <- filter(Normal, ID %in% index)
paireddata <- rbind(T1, N1)
p <- ggpaired(paireddata, x = "Type", y = "CBX3",
              color = "Type", palette = "jco",
              line.color = "gray", line.size = 0.4,
              facet.by = "Cancer", short.panel.labs = FALSE)  # one panel per cancer
p + stat_compare_means(label = "p.signif", paired = TRUE, label.x.npc = 0.4, label.y = 9)
image.png
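With faceting, cancers that have only one or two matched pairs produce nearly empty panels and unstable tests. One possible refinement, sketched here on toy data, is to count the pairs per cancer and keep only cancers above a threshold before plotting. The threshold of 3 and the object names are arbitrary choices for illustration.

```r
# Toy paired data: BRCA has 3 matched pairs, CHOL has only 1
paired_toy <- data.frame(
  ID = rep(c("P1", "P2", "P3", "P4"), times = 2),
  Cancer = rep(c("BRCA", "BRCA", "BRCA", "CHOL"), times = 2),
  Type = rep(c("Tumor", "Normal"), each = 4),
  CBX3 = c(7.0, 6.5, 7.2, 6.1, 5.0, 5.3, 5.1, 5.9)
)

# Number of pairs per cancer (one tumor row per pair)
n_pairs <- table(paired_toy$Cancer[paired_toy$Type == "Tumor"])

# Keep cancers with at least min_pairs matched pairs (arbitrary cutoff)
min_pairs <- 3
keep <- names(n_pairs)[n_pairs >= min_pairs]
filtered <- paired_toy[paired_toy$Cancer %in% keep, ]
print(n_pairs)
```

The filtered data frame can then be passed to ggpaired with facet.by = "Cancer" as above.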