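Earlier posts covered three ways to obtain TCGA data: TCGA2STAT, TCGAbiolinks, and RTCGA. This post introduces one more package, RTCGAToolbox. It is the one I recommend most, because in my own use it downloaded data fastest and was the most stable and reliable.
Installing RTCGAToolbox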
## biocLite() has been retired; BiocManager is the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RTCGAToolbox")
Documentation:
http://bioconductor.org/packages/release/bioc/manuals/RTCGAToolbox/man/RTCGAToolbox.pdf
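Worked example
# Install the package (skip this step if it is already installed)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RTCGAToolbox")
# Load the package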
library(RTCGAToolbox)
# List the cancer cohorts that are available for download
> getFirehoseDatasets()
[1] "ACC" "BLCA" "BRCA" "CESC" "CHOL" "COADREAD" "COAD" "DLBC" "ESCA"
[10] "FPPP" "GBMLGG" "GBM" "HNSC" "KICH" "KIPAN" "KIRC" "KIRP" "LAML"
[19] "LGG" "LIHC" "LUAD" "LUSC" "MESO" "OV" "PAAD" "PCPG" "PRAD"
[28] "READ" "SARC" "SKCM" "STAD" "STES" "TGCT" "THCA" "THYM" "UCEC"
[37] "UCS" "UVM"
# List the run (update) dates available in the database
> getFirehoseRunningDates()
[1] "20151101" "20150821" "20150601" "20150402" "20150204" "20141206" "20141017" "20140902" "20140715"
[10] "20140614" "20140518" "20140416" "20140316" "20140215" "20140115" "20131210" "20131114" "20131010"
[19] "20130923" "20130809" "20130715" "20130623" "20130606" "20130523" "20130508" "20130421" "20130406"
[28] "20130326" "20130309" "20130222" "20130203" "20130116" "20121221" "20121206" "20121114" "20121102"
[37] "20121024" "20121020" "20121018" "20121004" "20120913" "20120825" "20120804" "20120725" "20120707"
[46] "20120623" "20120606" "20120525" "20120515" "20120425" "20120412" "20120321" "20120306" "20120217"
[55] "20120124" "20120110" "20111230" "20111206" "20111128" "20111115" "20111026"
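The two listings above can be used to sanity-check a request before starting a long download. The following is only a minimal sketch: `latest_run_date` and `check_request` are hypothetical helper names, not part of RTCGAToolbox, and the short vectors are stand-ins copied from the listings so that the code runs offline. In a live session they would presumably come from getFirehoseDatasets() and getFirehoseRunningDates().

```r
# Hypothetical helpers (not part of RTCGAToolbox).

# Return the most recent run date from a vector of "YYYYMMDD" strings.
latest_run_date <- function(run_dates) {
  parsed <- as.Date(run_dates, format = "%Y%m%d")
  run_dates[which.max(as.numeric(parsed))]
}

# Stop early if the cohort code or the run date is not in the listings.
check_request <- function(dataset, run_date, datasets, run_dates) {
  if (!dataset %in% datasets) {
    stop("Unknown cohort code: ", dataset)
  }
  if (!run_date %in% run_dates) {
    stop("Unknown run date: ", run_date)
  }
  invisible(TRUE)
}

# Stand-in values copied from the listings above (assumption: offline use).
datasets  <- c("ACC", "BLCA", "BRCA", "READ")
run_dates <- c("20151101", "20150821", "20150601", "20150402")

check_request("BRCA", "20150402", datasets, run_dates)  # passes silently
newest <- latest_run_date(run_dates)
print(newest)
```

Checking the cohort code first would, for example, catch a request whose code does not match the cancer type you meant to download.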
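# Download the data you need, using breast cancer (BRCA) as the example.
# Once downloaded, the files are saved to your working directory.
# Download speed varies by location; in my case it took a long time to finish.
# Argument names follow the release used here; newer releases spell Clinic as clinical,
# so check ?getFirehoseData for the names and defaults in your installed version.
brcaData <- getFirehoseData(dataset = "BRCA", runDate = "20150402", forceDownload = TRUE,
                            Clinic = TRUE, Mutation = TRUE)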
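Because the download can take a long time, it may be worth saving the returned object so that later sessions do not repeat it. The following is a minimal sketch using only base R (saveRDS and readRDS). `load_or_compute` is a hypothetical helper name, and `slow_step` is an offline stand-in for the real download so that the example runs anywhere.

```r
# Hypothetical helper (not part of RTCGAToolbox): run `compute` once and
# reuse the saved result on later calls.
load_or_compute <- function(cache_file, compute) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # reuse the result saved earlier
  }
  result <- compute()
  saveRDS(result, cache_file)
  result
}

# Offline stand-in for the slow download (assumption: illustration only).
calls <- 0
slow_step <- function() {
  calls <<- calls + 1
  list(cohort = "BRCA", runDate = "20150402")
}

cache_file <- tempfile(fileext = ".rds")
first  <- load_or_compute(cache_file, slow_step)  # computes and saves
second <- load_or_compute(cache_file, slow_step)  # reads from disk
```

In a real session, `compute` would be a function wrapping the getFirehoseData(...) call shown above, and `cache_file` a path of your choosing inside the project directory rather than a temporary file.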
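Follow the code above and you will have the TCGA data in hand. Then get on with your experiments and publish first-rate research. Good luck, and congratulations!
Personal take
I strongly recommend this method for downloading TCGA data. It makes your downloads more dependable, and by dependable I mean stable and fast.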