deepTools User Guide

deepTools is a suite of Python-based tools for efficiently processing and analyzing high-throughput sequencing data such as ChIP-seq, RNA-seq, or MNase-seq.


#1. The deepTools suite

deepTools workflow

##1.1 Summary of the deepTools tools

| tool | type | input files | main output file(s) | application |
|---|---|---|---|---|
| multiBamSummary | data integration | 2 or more BAM | interval-based table of values | perform cross-sample analyses of read counts -> plotCorrelation, plotPCA |
| multiBigwigSummary | data integration | 2 or more bigWig | interval-based table of values | perform cross-sample analyses of genome-wide scores -> plotCorrelation, plotPCA |
| plotCorrelation | visualization | multiBamSummary/multiBigwigSummary output | clustered heatmap | visualize the Pearson/Spearman correlation |
| plotPCA | visualization | multiBamSummary/multiBigwigSummary output | 2 PCA plots | visualize the principal component analysis |
| plotFingerprint | QC | 2 BAM | 1 diagnostic plot | assess enrichment strength of a ChIP sample |
| computeGCBias | QC | 1 BAM | 2 diagnostic plots | calculate the expected and observed GC distribution of reads |
| correctGCBias | QC | 1 BAM, output from computeGCBias | 1 GC-corrected BAM | obtain a BAM file with reads distributed according to the genome's GC content |
| bamCoverage | normalization | BAM | bedGraph or bigWig | obtain the normalized read coverage of a single BAM file |
| bamCompare | normalization | 2 BAM | bedGraph or bigWig | normalize 2 files to each other (e.g. log2 ratio, difference) |
| computeMatrix | data integration | 1 or more bigWig, 1 or more BED | zipped file for plotHeatmap or plotProfile | compute the values needed for heatmaps and summary plots |
| estimateReadFiltering | information | 1 or more BAM | table of values | estimate the number of reads filtered from a BAM file or files |
| alignmentSieve | QC | 1 BAM | 1 filtered BAM or BEDPE file | filter a BAM file based on one or more criteria |
| plotHeatmap | visualization | computeMatrix output | heatmap of read coverages | visualize the read coverages for genomic regions |
| plotProfile | visualization | computeMatrix output | summary plot ("meta-profile") | visualize the average read coverages over a group of genomic regions |
| plotCoverage | visualization | 1 or more BAM | 2 diagnostic plots | visualize the average read coverages over sampled genomic positions |
| bamPEFragmentSize | information | 1 BAM | text with paired-end fragment length | obtain the average fragment length from paired ends |
| plotEnrichment | visualization | 1 or more BAM and 1 or more BED/GTF | 1 diagnostic plot | plot the fraction of alignments overlapping the given features |
| computeMatrixOperations | miscellaneous | 1 or more computeMatrix output files | zipped file for plotHeatmap or plotProfile | modify the output of computeMatrix (e.g. subset, filter, or merge matrices) |

##1.2 Tools for processing BAM and bigWig files

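  • multiBamSummary
    ◦ Computes read coverage over genomic regions from two or more BAM files. In BED-file mode the regions are given in a BED file; bins mode is meant for genome-wide analysis. The resulting .npz file can be passed to plotCorrelation for correlation analysis or to plotPCA for principal component analysis.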

# use all BAM files in the folder; limit the binning of the genome to chromosome 19
$ deepTools2.0/bin/multiBamSummary bins \
 --bamfiles testFiles/*bam \
 --minMappingQuality 30 \
 --region 19 \
 --labels H3K27me3 H3K4me1 H3K4me3 H3K9me3 input \
 -out readCounts.npz --outRawCounts readCounts.tab

 $ head readCounts.tab
 #'chr'     'start' 'end'   'H3K27me3'      'H3K4me1'       'H3K4me3'       'H3K9me3'       'input'
 19 10000   20000   0.0     0.0     0.0     0.0     0.0
 19 20000   30000   0.0     0.0     0.0     0.0     0.0
 19 30000   40000   0.0     0.0     0.0     0.0     0.0
 19 40000   50000   0.0     0.0     0.0     0.0     0.0
 19 50000   60000   0.0     0.0     0.0     0.0     0.0
 19 60000   70000   1.0     1.0     0.0     0.0     1.0
 19 70000   80000   0.0     1.0     7.0     0.0     1.0
 19 80000   90000   15.0    0.0     0.0     6.0     4.0
 19 90000   100000  73.0    7.0     4.0     16.0    5.0
  • multiBigwigSummary
    ◦ Like multiBamSummary, but the input files are in bigWig format.

  • correctGCBias
    ◦ Corrects GC bias.

  • bamCoverage

    [Figure: norm_IGVsnapshot_indFiles]

    ◦ bamCoverage converts read alignments into read coverage over genomic regions. The window (bin) size used for the coverage calculation is configurable. bamCoverage has several built-in normalization methods: scaling factor, Reads Per Kilobase per Million mapped reads (RPKM), counts per million (CPM), bins per million mapped reads (BPM), and 1x depth (reads per genome coverage, RPGC).

Example: bamCoverage for ChIP-seq analysis

# --outFileFormat accepts bigwig or bedgraph; it should match the extension given to -o
bamCoverage --bam a.bam -o a.SeqDepthNorm.bw \
    --binSize 10 \
    --normalizeUsing RPGC \
    --effectiveGenomeSize 2150570000 \
    --ignoreForNormalization chrX \
    --extendReads \
    --outFileFormat bigwig
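
To make the normalization options concrete, here is a minimal Python sketch of the per-bin arithmetic behind RPKM, CPM, BPM, and RPGC as the deepTools documentation describes them. The function names and the toy numbers are assumptions made for illustration; this is not deepTools code, and it ignores read filtering, read extension, and `--ignoreForNormalization`.

```python
# Illustrative per-bin normalization arithmetic (not deepTools source code).

def rpkm(reads_in_bin, mapped_reads, bin_length_bp):
    """Reads per kilobase of bin per million mapped reads."""
    return reads_in_bin / ((mapped_reads / 1e6) * (bin_length_bp / 1e3))

def cpm(reads_in_bin, mapped_reads):
    """Reads in the bin per million mapped reads."""
    return reads_in_bin / (mapped_reads / 1e6)

def bpm(reads_in_bin, all_bin_counts):
    """Reads in the bin per million reads summed over all bins."""
    return reads_in_bin / (sum(all_bin_counts) / 1e6)

def rpgc(reads_in_bin, mapped_reads, fragment_length, effective_genome_size):
    """Divide by the average sequencing depth so the genome-wide mean is 1x."""
    depth = mapped_reads * fragment_length / effective_genome_size
    return reads_in_bin / depth

# Toy numbers (assumptions): 20 reads in a 10 bp bin, 10 million mapped reads,
# 200 bp fragments, and the effective genome size used in the example above.
print("RPKM:", rpkm(20, 10_000_000, 10))
print("CPM: ", cpm(20, 10_000_000))
print("RPGC:", rpgc(20, 10_000_000, 200, 2_150_570_000))
```

RPGC is the only one of these that needs the effective genome size, which is why `--effectiveGenomeSize` accompanies `--normalizeUsing RPGC` in the command.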
  • bamCompare
    ◦ Compares two BAM files by computing, for each window (bin), the ratio of read abundance between them.
Usage: bamCompare -b1 treatment.bam -b2 control.bam -o log2ratio.bw
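
The per-bin comparison can be sketched in a few lines of Python. This is only an illustration of a log2 ratio with a pseudocount; the function name, the pseudocount of 1, and the toy counts are assumptions, and the sequencing-depth scaling that a real comparison needs is left out.

```python
import math

def log2_ratio(treatment_counts, control_counts, pseudocount=1.0):
    """Per-bin log2(treatment / control); the pseudocount avoids log(0) and division by zero."""
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for t, c in zip(treatment_counts, control_counts)
    ]

# Toy per-bin read counts (made-up numbers).
treatment = [0, 3, 15, 73]
control = [0, 1, 3, 4]
ratios = log2_ratio(treatment, control)
print(ratios)
```

Positive values mark bins where the treatment has more reads than the control, negative values the opposite.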
  • bigwigCompare
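  • computeMatrix
    ◦ Computes scores over genomic regions; the output file can be plotted with plotHeatmap and plotProfile. The regions can be genes or any other regions, as long as they are defined in a BED file.

computeMatrix has two modes:

[Figure: the two computeMatrix modes]

  • reference-point (relative to a point): computes the signal in a window around one reference point per region
  • scale-regions (over a set of regions): scales all regions to the same length, then computes the signal over them

To see the help for each mode: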
$ computeMatrix scale-regions --help
$ computeMatrix scale-regions -S <bigwig file(s)> -R <bed file> -b 1000
$ computeMatrix reference-point --help
$ computeMatrix reference-point -S <bigwig file(s)> -R <bed file> -a 3000 -b 3000
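
The difference between the two modes is easiest to see on a toy signal. The following Python sketch is a simplified illustration, not the computeMatrix implementation: the function names are hypothetical, the signal is a plain per-base list, and regions are assumed to be at least as long as the number of bins.

```python
def mean(values):
    return sum(values) / len(values)

def scale_region(signal, start, end, n_bins):
    """scale-regions idea: split [start, end) into n_bins equal parts and average each part."""
    length = end - start
    edges = [start + round(i * length / n_bins) for i in range(n_bins + 1)]
    return [mean(signal[a:b]) for a, b in zip(edges, edges[1:])]

def reference_point(signal, point, before, after, bin_size):
    """reference-point idea: fixed-size bins from point - before to point + after."""
    return [
        mean(signal[s:s + bin_size])
        for s in range(point - before, point + after, bin_size)
    ]

signal = list(range(100))  # toy per-base signal: the value equals the position

# Regions of different length end up with the same number of columns.
short_region = scale_region(signal, 10, 30, 5)   # 20 bp -> 5 bins
long_region = scale_region(signal, 40, 90, 5)    # 50 bp -> 5 bins

# A fixed window of 10 bp on each side of position 50, in 5 bp bins.
around = reference_point(signal, 50, 10, 10, 5)

print(short_region, long_region, around)
```

In scale-regions mode every region contributes the same number of columns regardless of its length, whereas in reference-point mode the genomic distance covered by each column is fixed.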

Example 1: a single input file (reference-point mode)

# reference-point mode; --referencePoint accepts TSS, TES, or center
# -b and -a define the region of interest around the reference point
# the matrix written by -o is the input for plotHeatmap and plotProfile
$ computeMatrix reference-point \
       --referencePoint TSS \
       -b 3000 -a 10000 \
       -R testFiles/genes.bed \
       -S testFiles/log2ratio_H3K4Me3_chr19.bw  \
       --skipZeros \
       -o matrix1_H3K4me3_l2r_TSS.gz \
       --outFileSortedRegions regions1_H3K4me3_l2r_genes.bed

◦ Note: the reference point here is the start of each region defined in the BED file.

Example 2: multiple input files (scale-regions mode)

# separate multiple -R files with spaces; -S can also take a wildcard pattern
$ deepTools2.0/bin/computeMatrix scale-regions \
  -R genes_chr19_firstHalf.bed genes_chr19_secondHalf.bed \
  -S testFiles/log2ratio_*.bw  \
  -b 3000 -a 3000 \
  --regionBodyLength 5000 \
  --skipZeros -o matrix2_multipleBW_l2r_twoGroups_scaled.gz \
  --outFileNameMatrix matrix2_multipleBW_l2r_twoGroups_scaled.tab \
  --outFileSortedRegions regions2_multipleBW_l2r_twoGroups_genes.bed
Note that the reported regions will have the same coordinates as the ones in 

##1.3 Quality control tools

  • plotCorrelation
    ◦ Computes the correlation between samples from the output of multiBamSummary or multiBigwigSummary. The result can be displayed as a scatterplot or as a heatmap.

Example 1: Scatterplot

$ deepTools2.0/bin/plotCorrelation \
-in scores_per_transcript.npz \
--corMethod pearson --skipZeros \
--plotTitle "Pearson Correlation of Average Scores Per Transcript" \
--whatToPlot scatterplot \
-o scatterplot_PearsonCorr_bigwigScores.png   \
--outFileCorMatrix PearsonCorr_bigwigScores.tab
[Figure: scatterplot_PearsonCorr_bigwigScores]

Example 2: Heatmap

$ deepTools2.0/bin/plotCorrelation \
    -in readCounts.npz \
    --corMethod spearman --skipZeros \
    --plotTitle "Spearman Correlation of Read Counts" \
    --whatToPlot heatmap --colorMap RdYlBu --plotNumbers \
    -o heatmap_SpearmanCorr_readCounts.png   \
    --outFileCorMatrix SpearmanCorr_readCounts.tab
[Figure: heatmap_SpearmanCorr_readCounts]
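
To show what the two values of `--corMethod` measure, here is a small pure-Python sketch of Pearson and Spearman correlation between the per-bin counts of two samples. The helper names and the numbers are made up for illustration, and this is not how plotCorrelation is implemented internally.

```python
import math

def pearson(x, y):
    """Pearson correlation: linear association between the raw values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy per-bin counts for two samples (made-up numbers with one outlier bin).
sample_a = [1.0, 2.0, 3.0, 4.0, 100.0]
sample_b = [2.0, 3.0, 5.0, 9.0, 10.0]
print("Pearson: ", pearson(sample_a, sample_b))
print("Spearman:", spearman(sample_a, sample_b))
```

Because Spearman works on ranks, the single outlier bin lowers the Pearson value but leaves the Spearman value at 1 for this monotonic toy data.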
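  • plotPCA
    ◦ Runs a principal component analysis on the output of multiBamSummary or multiBigwigSummary, and produces a plot of the first two principal components together with a plot of how much the top five components contribute.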

Example

$ deepTools2.0/bin/plotPCA -in readCounts.npz \
-o PCA_readCounts.png \
-T "PCA of read counts"
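[Figure: PCA_readCounts]
  • plotFingerprint
    ◦ Shows how reads accumulate across the genome for each sample. Reads are counted in windows (bins) of fixed length, the bins are sorted by count, and the cumulative sum is plotted. An input sample (which can capture about 90% of DNA fragments) is in theory uniformly distributed over the genome, so its curve approaches a straight line as sequencing depth increases. For a ChIP sample, the faster the reads accumulate in the highest-ranked bins, the more specific the enrichment in those regions.
    [Figure: QC_fingerprint]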

Example

$ deepTools2.0/bin/plotFingerprint \
 -b testFiles/*bam \
--labels H3K27me3 H3K4me1 H3K4me3 H3K9me3 input \
--minMappingQuality 30 --skipZeros \
--region 19 --numberOfSamples 50000 \
-T "Fingerprints of different samples"  \
--plotFile fingerprints.png \
--outRawCounts fingerprints.tab
[Figure: fingerprints]
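
The count, sort, and accumulate steps behind a fingerprint curve can be sketched as follows. This is a minimal illustration with made-up bin counts and a hypothetical function name, not the plotFingerprint implementation (which also samples bins and filters reads).

```python
def fingerprint(bin_counts):
    """Sort bins by read count and return the cumulative fraction of reads at each rank."""
    ordered = sorted(bin_counts)
    total = sum(ordered)
    curve, running = [], 0
    for count in ordered:
        running += count
        curve.append(running / total)
    return curve

# Toy data (assumptions): uniform coverage vs. reads concentrated in one bin.
input_like = [10, 10, 10, 10, 10]
chip_like = [0, 1, 1, 2, 96]

print("input:", fingerprint(input_like))
print("ChIP: ", fingerprint(chip_like))
```

The uniform sample gives a straight diagonal, while the enriched sample stays near zero and jumps only at the highest-ranked bin, which is the shape a strong ChIP signal produces.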
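  • bamPEFragmentSize
    ◦ Computes the fragment size of paired-end reads in a BAM file.
  • computeGCBias
    ◦ Computes the GC bias.
  • plotCoverage
    ◦ Assesses the sequencing depth of a sample. It samples 1 million bp at random, counts the overlapping reads, and reports the fraction of bases covered and how many times each is covered.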
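
As a rough illustration of the plotCoverage idea, the sketch below samples positions from a toy per-base depth list and summarizes how often each depth occurs. The function names, the toy genome, and the sample size are assumptions; a real run derives the depths from BAM files.

```python
import random
from collections import Counter

def sample_depths(per_base_depth, n_positions, seed=0):
    """Pick positions at random and return the read depth at each of them."""
    rng = random.Random(seed)
    return [per_base_depth[rng.randrange(len(per_base_depth))] for _ in range(n_positions)]

def depth_histogram(depths):
    """Fraction of sampled bases covered exactly d times, for each observed depth d."""
    counts = Counter(depths)
    return {d: counts[d] / len(depths) for d in sorted(counts)}

def fraction_at_least(depths, min_depth):
    """Fraction of sampled bases covered at least min_depth times."""
    return sum(d >= min_depth for d in depths) / len(depths)

# Toy genome (assumption): 25% uncovered, 50% covered once, 25% covered twice.
genome = [0] * 250 + [1] * 500 + [2] * 250
depths = sample_depths(genome, 200)

print(depth_histogram(depths))
print("covered at least once:", fraction_at_least(depths, 1))
```

The histogram and the "at least n reads" fraction correspond to the two kinds of summary the tool is described as producing.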

##1.4 Heatmaps and summary plots

  • plotHeatmap

Example 1: drawing a heatmap from computeMatrix output

# run computeMatrix to collect the data needed for plotting
$ computeMatrix scale-regions -S H3K27Me3-input.bigWig \
                                 H3K4Me1-Input.bigWig  \
                                 H3K4Me3-Input.bigWig \
                              -R genes19.bed genesX.bed \
                              --beforeRegionStartLength 3000 \
                              --regionBodyLength 5000 \
                              --afterRegionStartLength 3000 \
                              --skipZeros -o matrix.mat.gz
$ plotHeatmap -m matrix.mat.gz \
      -out ExampleHeatmap1.png
[Figure: plotHeatmap example 1]

Example 2: plotHeatmap can also cluster the regions

$ plotHeatmap -m matrix_two_groups.gz \
     -out ExampleHeatmap2.png \
     --colorMap RdBu \
     --whatToShow 'heatmap and colorbar' \
     --zMin -3 --zMax 3 \
     --kmeans 4  # number of k-means clusters

Other parameters

Custom colors: --colorList 'black, yellow' 'white,blue' '#ffffff,orange,#000000'
Remove the box around heatmaps: --boxAroundHeatmaps no
  • plotProfile
    ◦ Draws summary profiles from computeMatrix output.

Example 1: one plot per sample

# run computeMatrix to collect the data needed for plotting
$ computeMatrix scale-regions -S H3K27Me3-input.bigWig \
                                 H3K4Me1-Input.bigWig  \
                                 H3K4Me3-Input.bigWig \
                              -R genes19.bed genesX.bed \
                              --beforeRegionStartLength 3000 \
                              --regionBodyLength 5000 \
                              --afterRegionStartLength 3000 \
                              --skipZeros -o matrix.mat.gz

$ plotProfile -m matrix.mat.gz \
              -out ExampleProfile1.png \
              --numPlotsPerRow 2 \
              --plotTitle "Test data profile"
[Figure: Example Profile1]

Example 2: one plot per group of genes

# --plotType=fill adds color between the x axis and the lines
# --perGroup makes one image per BED file instead of per bigWig file
$ plotProfile -m matrix.mat.gz \
     -out ExampleProfile2.png \
     --plotType=fill \
     --perGroup \
     --colors red yellow blue \
     --plotTitle "Test data profile"
[Figure: Example Profile2]

Example 3: clustering

$ plotProfile -m matrix.mat.gz \
      --perGroup \
      --kmeans 2 \
      -out ExampleProfile3.png
[Figure: Example Profile3]

Example 4: heatmap

$ plotProfile -m matrix.mat.gz \
      --perGroup \
      --kmeans 2 \
      --plotType heatmap \
      -out ExampleProfile4.png

[Figure: Example Profile4]

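  • plotEnrichment
    ◦ Reports how strongly the ChIP-seq alignments are enriched in peaks from a BED file or in features from a GTF file, that is, the fraction of alignments that overlap them.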

Example

$ plotEnrichment -b Input.bam H3K4Me1.bam H3K4Me3.bam \
--BED up.bed down.bed \
--regionLabels "up regulated" "down regulated" \
-o enrichment.png
[Figure: plotEnrichment example]
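
The quantity being plotted can be sketched as the percentage of alignments that overlap any feature in a set. The intervals, the function names, and the half-open coordinate convention below are assumptions for illustration; a real run reads alignments from BAM files and features from BED or GTF files.

```python
def overlaps(read, feature):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return read[0] == feature[0] and read[1] < feature[2] and feature[1] < read[2]

def percent_overlapping(reads, features):
    """Percentage of reads that overlap at least one feature."""
    hits = sum(any(overlaps(r, f) for f in features) for r in reads)
    return 100.0 * hits / len(reads)

# Toy alignments and one feature set (made-up coordinates).
reads = [("19", 100, 150), ("19", 400, 450), ("19", 990, 1040), ("X", 100, 150)]
up_regulated = [("19", 90, 200), ("19", 1000, 1100)]

pct = percent_overlapping(reads, up_regulated)
print(f"{pct:.1f}% of alignments overlap the 'up regulated' features")
```

Running the same calculation for every BAM file and every feature set gives one value per sample and region label, matching the layout of the command above.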

Reference:

The tools of deeptools
