Author: 白介素2
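Related reading:
R: drawing boxplots with ggplot2
R survival analysis 04: diagnostics for the Cox proportional hazards model
R survival analysis 03: the Cox proportional hazards model
R survival analysis 02: ggforest
R survival analysis 01
ggpubr: built for academic plotting (part 2)
ggstatsplot: built for academic plotting (part 1)
Survival curves
R GEO data mining 01: downloading data and extracting the expression matrix
R GEO data mining 02: handling multiple probes that map to one gene
R GEO data mining 03: finding differentially expressed genes with limma
R GEO data mining 04: functional enrichment analysis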
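If you don't have the time or energy to learn to code, you may want to look at the zero-code data-mining course.
Announcement
One more thing: because Jianshu has limitations when it comes to spreading content, and at readers' request, 白介素2's personal WeChat account is now open. It will continue to cover clinical and research stories, R, data mining, paper reading and similar topics. Don't expect too much, though: for now the WeChat account is meant as my own reading notes, and it's a bonus if it helps others. If you're interested, scan the QR code to follow.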
Load the data
Sys.setlocale('LC_ALL','C')
load(file = "F:/Bioinfor_project/Breast/AS_research/AS/result/hubgene.Rdata")
head(data)
require(cowplot)
require(tidyverse)
require(ggplot2)
require(ggsci)
require(ggpubr)
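mydata<-data %>%
  ## Gather the gene-expression columns into long format; adjust the range CCL14:TUBB3 to match your own gene columns
  ## (tidyr::pivot_longer() is the newer replacement for gather())
  gather(key="gene",value="Expression",CCL14:TUBB3) %>%
  ## Put ID, gene and Expression first, followed by all remaining columns
  dplyr::select(ID,gene,Expression,everything())
head(mydata) ## long format: one row per sample-gene pair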
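If the reshaping step is unfamiliar, the idea can be sketched in base R without tidyr. This is a minimal sketch on a made-up toy table: the gene columns GENE_A and GENE_B and all their values are hypothetical, not the author's data, and base R's `stack()` stands in for `gather()`.

```r
# Hypothetical toy table in wide format: one row per sample, one column per gene
wide <- data.frame(
  ID = c("s1", "s2", "s3"),
  group = c("Normal", "TNBC", "TNBC"),
  GENE_A = c(1.2, 3.4, 2.8),  # made-up values
  GENE_B = c(0.5, 0.7, 1.9)   # made-up values
)
gene_cols <- c("GENE_A", "GENE_B")

# stack() puts the gene columns one after another into `values` and `ind`
stacked <- stack(wide[gene_cols])

# Repeat the ID/group columns once per gene so they line up with the stacked values
long <- data.frame(
  wide[rep(seq_len(nrow(wide)), times = length(gene_cols)), c("ID", "group")],
  gene = as.character(stacked$ind),
  Expression = stacked$values,
  row.names = NULL
)
print(long)
```

The result has one row per sample-gene pair, which is the shape `ggboxplot(x = "gene", y = "Expression")` expects.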
Create a boxplot with a p-value
- Reference material
- Shows fine-grained control of plot details
p <- ggboxplot(mydata, x = "group", y = "Expression",
color = "group", palette = "jama",
add = "jitter")
# Add p-value
p + stat_compare_means()
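With two groups and no `method`, `stat_compare_means()` uses the Wilcoxon rank-sum test. To look at that test on its own, here is a base-R sketch on R's built-in `ToothGrowth` data (tooth length `len` by supplement `supp`) rather than the author's expression data. `exact = FALSE` is my own addition: `ToothGrowth` contains tied values, which rule out an exact p-value anyway.

```r
# Two-group comparison: tooth length (len) by supplement type (supp: OJ vs VC)
res <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)

# p-value of the Wilcoxon rank-sum test (normal approximation)
res$p.value
```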
Change the statistical method
# Change method
p + stat_compare_means(method = "t.test")
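Base R's `t.test()` runs Welch's test (no equal-variance assumption) unless told otherwise. The sketch below, again on `ToothGrowth`, puts the default and the pooled-variance version side by side. It only illustrates the two variants. It does not show which arguments ggpubr forwards.

```r
# Welch's t-test is the default in base R
welch <- t.test(len ~ supp, data = ToothGrowth)

# Student's t-test with a pooled variance
student <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

c(welch = welch$p.value, student = student$p.value)
```

Welch's test adjusts the degrees of freedom downwards when the group variances differ. With groups this balanced the two p-values end up close.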
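Significance labels
- label = "p.signif": show significance stars instead of the p-value
- label = "p.format": show the formatted p-value, etc.
- label.x: horizontal position of the label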
p + stat_compare_means( label = "p.signif")
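As far as I know, ggpubr's default star cut-points are 0.0001, 0.001, 0.01 and 0.05, with the symbols `****`, `***`, `**`, `*` and `ns`. The hypothetical helper `p_to_stars()` below reproduces that mapping with base R's `cut()`, which can be handy when reporting results in a table. Treat it as a sketch: its handling of p-values that fall exactly on a cut-point may differ from ggpubr's.

```r
# Hypothetical helper: map p-values to significance stars
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 1e-4, 1e-3, 0.01, 0.05, Inf),
                   labels = c("****", "***", "**", "*", "ns"),
                   include.lowest = TRUE))  # intervals are (a, b], with 0 included
}

stars <- p_to_stars(c(0.00005, 0.0005, 0.005, 0.03, 0.2))
stars
```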
Comparing multiple groups
- Gives a global p-value
# Default method = "kruskal.test" for multiple groups
ggboxplot(mydata, x = "gene", y = "Expression",
color = "gene",add="jitter", palette = "jama")+
stat_compare_means()
# Change method to anova
ggboxplot(mydata, x = "gene", y = "Expression",
color = "gene", add="jitter", palette = "jama")+
stat_compare_means(method = "anova")
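For more than two groups, the default test becomes Kruskal-Wallis, and `method = "anova"` switches to a one-way ANOVA. Below is a base-R sketch of both global tests on `ToothGrowth`, with `dose` (0.5, 1 and 2 mg/day) as the grouping factor. `oneway.test(var.equal = TRUE)` is used as the classic one-way ANOVA. That choice is mine and is not necessarily how ggpubr computes it internally.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)  # three-level grouping factor

kw <- kruskal.test(len ~ dose, data = tg)                          # rank-based global test
anova_res <- oneway.test(len ~ dose, data = tg, var.equal = TRUE)  # classic one-way ANOVA

c(kruskal = kw$p.value, anova = anova_res$p.value)
```

Both tests answer only "does at least one group differ?". Finding out which groups differ takes the pairwise comparisons in the next section.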
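Specified comparisons
- Pairwise comparisons: compare_means() compares every pair of levels, using wilcox.test by default (the method can be changed)
- my_comparisons: specify only the comparisons you want
- Specify a reference group and compare every other group against it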
require(ggpubr)
compare_means(Expression ~ gene, data = mydata)  # all pairwise comparisons between genes
## Specify the comparisons you want
my_comparisons <- list( c("CCL14", "HBA1"), c("HBA1", "CCL16"), c("CCL16", "TUBB3") )
ggboxplot(mydata, x = "gene", y = "Expression",
          color = "group", add = "jitter", palette = "jama") +
  stat_compare_means(comparisons = my_comparisons)  # add pairwise comparison p-values
# + stat_compare_means()  # append this to also add the global p-value
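Running several pairwise tests raises the chance of a false positive. That is why `compare_means()` reports adjusted p-values; its default `p.adjust.method` is `"holm"`. Here is a minimal base-R sketch of the same idea for a hand-picked list of comparisons, analogous to `my_comparisons`. The dose pairs on `ToothGrowth` are only an example.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
my_pairs <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))

# One Wilcoxon test per requested pair
raw_p <- sapply(my_pairs, function(pr) {
  sub <- tg[tg$dose %in% pr, ]
  sub$dose <- droplevels(sub$dose)  # keep exactly the two levels being compared
  wilcox.test(len ~ dose, data = sub, exact = FALSE)$p.value
})
names(raw_p) <- sapply(my_pairs, paste, collapse = " vs ")

# Holm adjustment across the three comparisons
adj_p <- p.adjust(raw_p, method = "holm")
cbind(raw = raw_p, holm = adj_p)
```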
Specify a reference group
Use CCL14 as the reference group and compare each of the other genes against it
ref.group
compare_means(Expression ~ gene, data = mydata, ref.group = "CCL14",
method = "t.test")
# Visualize
mydata %>%
filter(group=="TNBC") %>% # keep only the TNBC samples
ggboxplot( x = "gene", y = "Expression",
color = "gene",add = "jitter", palette = "nejm")+
stat_compare_means(method = "anova")+ # Add global p-value
stat_compare_means(label = "p.signif", method = "t.test",
ref.group = "CCL14")
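A reference-group comparison is simply one test per non-reference level. The base-R sketch below shows this on `ToothGrowth`, with dose 0.5 playing the role that CCL14 plays above. The choice of reference level here is only illustrative.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
ref <- "0.5"                                # reference level
others <- setdiff(levels(tg$dose), ref)     # every other level

# t-test of each non-reference level against the reference
ref_p <- sapply(others, function(lv) {
  t.test(tg$len[tg$dose == lv], tg$len[tg$dose == ref])$p.value
})
ref_p
```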
Faceting by gene
Comparing groups within each level of another variable
## Compare each gene's expression between TNBC and Normal
compare_means( Expression ~ group, data = mydata,
group.by = "gene")
# Box plot facetted by "gene"
p <- ggboxplot(mydata, x = "group", y = "Expression",
color = "group", palette = "jco",
add = "jitter",
facet.by = "gene", short.panel.labs = FALSE)
# Use only p.format as label. Remove method name.
p + stat_compare_means(label = "p.format")
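`group.by` runs a separate two-group test inside each level of another variable. A base-R sketch of that split-then-test pattern on `ToothGrowth` compares `supp` within each `dose`, the way the plot above compares TNBC and Normal within each gene:

```r
# Split the data by dose, then test len ~ supp inside each piece
by_dose <- split(ToothGrowth, ToothGrowth$dose)
p_by_dose <- sapply(by_dose, function(d) {
  wilcox.test(len ~ supp, data = d, exact = FALSE)$p.value
})
p_by_dose  # one p-value per dose level
```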
Replace the p-values with asterisks
- hide.ns = TRUE hides the "ns" labels
p + stat_compare_means(label = "p.signif", label.x = 1.5)
Draw all genes in a single plot
p <- ggboxplot(mydata, x = "gene", y = "Expression",
color = "group", palette = "nejm",
add = "jitter")
p + stat_compare_means(aes(group = group))
Change how the p-value is displayed
# Show only p-value
p + stat_compare_means(aes(group = group), label = "p.format")
Show the p-value as asterisks
# Use significance symbol as label
p + stat_compare_means(aes(group = group), label = "p.signif")
Paired-sample comparison
Requires x and y to have the same number of samples, which are compared one-to-one as pairs
head(ToothGrowth)
compare_means(len ~ supp, data = ToothGrowth,
group.by = "dose", paired = TRUE)
# Box plot facetted by "dose"
p <- ggpaired(ToothGrowth, x = "supp", y = "len",
color = "supp", palette = "jama",
line.color = "gray", line.size = 0.4,
facet.by = "dose", short.panel.labs = FALSE)
# Use only p.format as label. Remove method name.
p + stat_compare_means(label = "p.format", paired = TRUE)
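A paired test works on the within-pair differences, so the two groups must have the same length and the same order. The base-R sketch below assumes, as the demo above does, that the i-th OJ observation at a given dose pairs with the i-th VC observation. Strictly speaking, each guinea pig in `ToothGrowth` received only one treatment, so this pairing is for demonstration only. The two groups are passed as separate vectors to make the pairing order explicit.

```r
d  <- ToothGrowth[ToothGrowth$dose == 1, ]
oj <- d$len[d$supp == "OJ"]
vc <- d$len[d$supp == "VC"]
stopifnot(length(oj) == length(vc))  # pairing needs equal group sizes

# Wilcoxon signed-rank test on the paired differences oj - vc
res <- wilcox.test(oj, vc, paired = TRUE, exact = FALSE)
res$p.value
```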
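Wrap it in a function named group_box
- Purpose: draw boxplots for the selected genes
- Argument 1: group, the grouping variable; it can be any variable you're interested in
- Argument 2: data (default mydata), the cleaned-up data, with genes in long format (the gathered version)
head(mydata)
group_box<-function(group="group",data=mydata){
  p <- ggboxplot(data, x = "gene", y = "Expression",
                 color = group,
                 palette = "nejm",
                 add = "jitter")
  ## .data[[group]] makes aes() use the column named by the string `group`;
  ## a bare `group` would pick up the data column called "group" instead
  p + stat_compare_means(aes(group = .data[[group]]))
}
##
group_box(group="PAM50",data = mydata)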
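`group_box()` takes the grouping column as a string, which lets the caller choose it at run time. The same pattern works outside ggplot2. The hypothetical helper `compare_by()` below builds the test formula from strings with `reformulate()`. It is a base-R sketch on `ToothGrowth`, not part of the author's code.

```r
# Hypothetical helper: test `value` across the groups named by the string `group`
compare_by <- function(data, value, group, method = kruskal.test, ...) {
  f <- reformulate(group, response = value)  # e.g. len ~ supp
  method(f, data = data, ...)$p.value
}

p_supp <- compare_by(ToothGrowth, "len", "supp", method = wilcox.test, exact = FALSE)
p_dose <- compare_by(ToothGrowth, "len", "dose")  # Kruskal-Wallis across the dose levels
c(supp = p_supp, dose = p_dose)
```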
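Wrap it in a function named gene_box
- Purpose: draw a boxplot for any gene of interest, split by a chosen grouping variable
- Note: this function needs the wide gene data, because each gene is used as a variable (column) in its own right
head(data)
usedata<-data
## Define the function
gene_box<-function(gene="CCL14",group="group",data=usedata){
  p <- ggboxplot(data, x = group, y = gene,
                 ylab = sprintf("Expression of %s",gene),
                 xlab = group,
                 color = group,
                 palette = "nejm",
                 add = "jitter")
  ## .data[[group]] uses the column named by `group` (e.g. "PAM50"), not the column literally called "group"
  p + stat_compare_means(aes(group = .data[[group]]))
}
gene_box(gene="CCL14")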
Trying it out
gene_box(gene="CCL16",group="PAM50")
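Batch plotting
- Purpose: plot any genes with any grouping variable, all in one go
- A wrapper function plus lapply() makes batch plotting easy
- With lapply(), extra arguments for the function (here group) are passed in the lapply() call itself, not set inside the original function
- do.call() takes the function as its first argument; c() combines the list of plots with the function's other arguments (here ncol), which again are not set inside the original function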
require(gridExtra)
head(data)
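## Names of the genes to plot in batch (here columns 3 to 6 of data)
name<-colnames(data)[3:6]
## Batch plotting: extra arguments for gene_box() go directly into lapply()
p<-lapply(name,gene_box,group = "T_stage")
## Arrange all plots in one figure, two per row
do.call(grid.arrange,c(p,ncol=2))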
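The two tricks in the batch step, passing extra arguments through `lapply()` and building an argument list for `do.call()` with `c()`, can be tried without any plotting. Below is a base-R sketch on the built-in `iris` data. The helper `group_medians()` is hypothetical.

```r
# Hypothetical function with one "main" argument (col) plus extra arguments (group, data)
group_medians <- function(col, group, data) {
  sapply(split(data[[col]], data[[group]]), median)  # named vector, one median per group
}

# lapply() supplies each element as `col`; the extra arguments are given once, here
res <- lapply(c("Sepal.Length", "Petal.Length"), group_medians,
              group = "Species", data = iris)

# do.call() calls rbind() with the list elements as its arguments
tab <- do.call(rbind, res)

# c() joins a list with named values into one argument list, as in c(p, ncol = 2)
joined <- do.call(paste, c(list("a", "b"), sep = "-"))
tab
joined
```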
That's it for this issue. I'm your old friend 白介素2; see you next time.