Analyzing methylation data with the ChAMP package

References: https://bioconductor.org/packages/release/bioc/vignettes/ChAMP/inst/doc/ChAMP.html
http://blog.csdn.net/joshua_hit/article/details/54982018
http://www.lxweimin.com/p/6411e8acfab3

ChAMP is a package for analyzing Illumina methylation array data (EPIC and 450k).

It includes:

  • Import of data in different formats (e.g. from .idat files or a beta-valued matrix)
  • Quality Control plots
  • Type-2 probe correction methods: SWAN, Peak Based Correction (PBC) and BMIQ (the default choice)
  • The popular Functional Normalization function offered by the minfi package is also available
  • Methods for assessing batch effects: the singular value decomposition (SVD) method, plus the ComBat method for correcting multiple batch effects
  • Correction for cell-type heterogeneity with RefbaseEWAS
  • Inference of copy number variation (CNV)
  • Detection of Differentially Methylated Regions (DMR) (Lasso method, Bumphunter and DMRcate)
  • Detection of Differentially Methylated Blocks
  • Gene Set Enrichment Analysis (GSEA)
  • Inference of gene modules in user-specified gene networks that exhibit differential methylation between phenotypes (integrating the FEM package)
  • Other packages for analyzing methylation data include IMA, minfi, methylumi, RnBeads and wateRmelon
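1. Installing the ChAMP package:
source("https://bioconductor.org/biocLite.R")
biocLite("ChAMP")
# Or install the dependencies directly
source("http://bioconductor.org/biocLite.R")
biocLite(c("minfi","ChAMPdata","Illumina450ProbeVariants.db","sva","IlluminaHumanMethylation450kmanifest","limma"))

# If a package fails to install, try installing it on its own
biocLite("YourErrorPackage")
# Note: biocLite() has been retired in newer Bioconductor releases (3.8 and later); there, use
# install.packages("BiocManager"); BiocManager::install("ChAMP")
# Finally, load the package
library("ChAMP")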
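If loading fails with an error like this:
Error: package or namespace load failed for 'ChAMP' in inDL(x, as.logical(local), as.logical(now), ...):
 unable to load shared object 'D:/work/R-3.4.3/library/mvtnorm/libs/x64/mvtnorm.dll':
  `maximal number of DLLs reached...

The fix is to set the environment variable R_MAX_NUM_DLLS. On any operating system, environment variables for R can be set in a .Renviron file.

R reads this file at startup from the current working directory or from your home directory. It needs only one line:

R_MAX_NUM_DLLS=500

500 is the maximum number of DLLs allowed. After saving the file, restart R and run the following command:

normalizePath("d:/Documents/.Renviron", mustWork = FALSE)

The first argument is the actual path of the .Renviron file; the call only resolves and prints that path so you can confirm where the file is. After that, load ChAMP again.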
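To confirm that the limit is actually in effect after restarting R, something like the following can help. It is a minimal sketch that uses only base R functions and assumes nothing about ChAMP itself; the variable names are illustrative.

```r
# Value of the DLL limit as seen by this R session ("" if it was never set)
dll_limit <- Sys.getenv("R_MAX_NUM_DLLS")

# Number of DLLs currently loaded in this session
n_loaded <- length(getLoadedDLLs())

cat("R_MAX_NUM_DLLS:", if (nzchar(dll_limit)) dll_limit else "(not set)", "\n")
cat("Loaded DLLs:", n_loaded, "\n")
```

If the first line still prints "(not set)", R did not pick up the .Renviron file, which usually means the file is not in a directory that R reads at startup.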
2. Running the pipeline on the test data

The test data consist of two datasets: a 450k dataset (.idat files) and an 850k dataset (simulated EPIC data).

# The 450k dataset contains 8 lung cancer samples: 4 tumor tissues (T) and 4 controls (C)
testDir=system.file("extdata",package="ChAMPdata")
myLoad <- champ.load(testDir,arraytype="450K")

# The 850k dataset contains 16 samples, all derived from a single sample by modifying DMPs and DMRs.
data(EPICSimData)
3. ChAMP Pipeline
[Figure: overview of the ChAMP pipeline]

Glowing green lines mark the main analysis steps; grey lines mark optional steps. Black dots represent prepared methylation data.
Blue marks preparation steps such as Loading, Normalization, Quality Control checks etc.
Red marks steps that produce analysis results: Differentially Methylated Positions (DMPs), Differentially Methylated Regions (DMRs), Differentially Methylated Blocks, EpiMod (a method for detecting differentially methylated gene modules derived from the FEM package), Pathway Enrichment Results etc.
Yellow marks the interactive plotting interfaces (the GUI functions).
  • 450k pipeline
  • Full Pipeline
# Run everything in one step; this may fail with an error
champ.process(directory = testDir)
  • Step by step
myLoad <- champ.load(testDir)
# Or you may split the above code into champ.import(testDir) + champ.filter()
CpG.GUI()
champ.QC() # Alternatively: QC.GUI()
myNorm <- champ.norm()
champ.SVD()
# If a batch effect is detected, run champ.runCombat() here.
myDMP <- champ.DMP()
DMP.GUI()
myDMR <- champ.DMR()
DMR.GUI()
myBlock <- champ.Block()
Block.GUI()
myGSEA <- champ.GSEA()
myEpiMod <- champ.EpiMod()
myCNA <- champ.CNA()

# If the dataset consists of blood samples, run champ.refbase() here.
myRefbase <- champ.refbase()
  • EPIC pipeline
# myLoad <- champ.load(directory = testDir,arraytype="EPIC")
# We simulated EPIC data from beta values instead of .idat files,
# but users may use the above code to read .idat files directly.
# Here we start from myLoad.

data(EPICSimData)
CpG.GUI(arraytype="EPIC")
champ.QC() # Alternatively QC.GUI(arraytype="EPIC")
myNorm <- champ.norm(arraytype="EPIC")
champ.SVD()
# If a batch effect is detected, run champ.runCombat() here. This data is not suitable.
myDMP <- champ.DMP(arraytype="EPIC")
DMP.GUI()
myDMR <- champ.DMR(arraytype="EPIC")
DMR.GUI(arraytype="EPIC")
myBlock <- champ.Block(arraytype="EPIC")
Block.GUI(arraytype="EPIC") # For this simulation data, no Differential Methylation Block is detected.
myGSEA <- champ.GSEA(arraytype="EPIC")
myEpiMod <- champ.EpiMod(arraytype="EPIC")

# champ.CNA(arraytype="EPIC")
# champ.CNA() needs intensity data, which is not included in our simulation data.

A computer with 8 GB of memory can run at most about 200 samples. To run on multiple cores on a server, use the following commands:

library("doParallel")
detectCores()
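
Choosing how many cores to hand to a function can be sketched as follows. This uses the parallel package that ships with R (doParallel loads it as a dependency); leaving one core free is an arbitrary choice made for this example, not a ChAMP recommendation.

```r
library(parallel)  # detectCores() is defined here

n_cores <- detectCores()
# detectCores() can return NA on some platforms, so fall back to 1
if (is.na(n_cores)) n_cores <- 1L
# Leave one core free for the system (illustrative choice)
cores_to_use <- max(1L, n_cores - 1L)
cores_to_use
```

The resulting number can then be passed to functions that accept a cores argument, such as champ.norm().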

Description of ChAMP Pipelines

6.1 Loading Data
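.idat files are the raw array files. They come with a pd file (Sample_Sheet.csv) that holds phenotype, sample IDs and similar information.

# Using the 450k data as an example:
# First look at the pd data; its format is not always the same across experiments
myLoad$pd
# In the Sample_Group column, C stands for control and T for tumor. Other datasets may use a column such as “Diagnose” or “CancerType” instead.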
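
Checking a sample sheet before analysis can be sketched like this. The data frame below is a hypothetical stand-in for myLoad$pd with only two columns; a real pd file has more columns, and the column names in your own data may differ.

```r
# Hypothetical sample sheet with the same layout as the 450k test data
pd <- data.frame(
  Sample_Name  = c("C1", "C2", "C3", "C4", "T1", "T2", "T3", "T4"),
  Sample_Group = c("C", "C", "C", "C", "T", "T", "T", "T"),
  stringsAsFactors = FALSE
)

# Which column holds the phenotype? List the column names first.
colnames(pd)

# Count samples per group to confirm the design is what you expect
group_counts <- table(pd$Sample_Group)
group_counts
```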
6.2 Filtering Data
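  • ChAMP provides the champ.filter() function, which accepts (beta, M, Meth, UnMeth, intensity) data and applies quality-control filtering. In newer versions of ChAMP, champ.load() already includes this functionality.
  • champ.filter() has an autoimpute parameter that controls whether NA values produced by filtering are imputed or kept.
  • If several matrices are passed in for filtering, their row names and column names must match; otherwise champ.filter() treats them as coming from different sources and stops.
  • Low-quality samples (those with many probes that have no signal) are filtered out. The pd file needs a Sample_Name column whose entries match the sample (column) names of the data.
  • Imputation requires the detection P matrix and a beta or M matrix, and ProbeCutoff must not be 0. This parameter sets the NA ratio per probe that decides whether a probe is imputed.
  • To filter on bead counts, use champ.import(), which returns the bead information.
    Usage:
myImport <- champ.import(testDir)
myLoad <- champ.filter()
# Together these do the same as champ.load()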

# Steps performed by champ.load():
Section 1: Read PD Files Start: Reading CSV File
Section 2: Read IDAT files Start:Extract Mean value for Green and Red Channel Success
    Your Red Green Channel contains 622399 probes.
Section 3: Use Annotation Start:Reading 450K Annotation,there are 613 control probes in Annotation,Generating Meth and UnMeth Matrix,485512 Meth probes
  Generating beta Matrix
  Generating M Matrix
  Generating intensity Matrix
  Calculating Detect P value
  Counting Beads
# After importing, you can filter with champ.filter()
You may want to process champ.filter() next,This function is provided for user need to do filtering on some beta (or M) matrix, which contained most filtering system in champ.load except beadcount.

# Filtering steps
Section 1:  Check Input Start:You have inputed beta,intensity for Analysis.
Checking Finished :filterDetP,filterBeads,filterMultiHit,filterSNPs,filterNoCG,filterXY would be done on beta,intensity.
  You also provided :detP,beadcount .

Section 2: Filtering Start
The fraction of failed positions per sample
   Failed CpG Fraction.
C1         0.0013429122
C2         0.0022162171
C3         0.0003563249
C4         0.0002842360
T1         0.0003831007
T2         0.0011946152
T3         0.0014953286
T4         0.0015447610
Filtering probes with a detection p-value above 0.01.
    Removing 2728 probes.

Filtering BeadCount Start
    Filtering probes with a beadcount <3 in at least 5% of samples.
    Removing 9291 probes

  Filtering NoCG Start
    Only Keep CpGs, removing 2959 probes from the analysis.

  Filtering SNPs Start
    Using general 450K SNP list for filtering.
    Filtering probes with SNPs as identified in Zhou's Nucleic Acids Research Paper 2016.
    Removing 49231 probes from the analysis.

  Filtering MultiHit Start
    Filtering probes that align to multiple locations as identified in Nordlund et al
    Removing 7003 probes from the analysis.

  Filtering XY Start
    Filtering probes located on X,Y chromosome, removing 9917 probes from the analysis.

  Updating PD file

  Fixing Outliers Start
    Replacing all value smaller/equal to 0 with smallest positive value.
    Replacing all value greater/equal to 1 with largest value below 1..

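The filtering steps are:

  • Detection p-value (default threshold 0.01). This value is stored in the .idat files; champ.import() reads it and builds a data frame. Probes with a detection p-value above 0.01 are considered failed. Filtering proceeds in two stages: samples whose fraction of failed probes exceeds the threshold (0.1) are removed first, then probes are filtered on the remaining samples. The SampleCutoff and ProbeCutoff parameters control these two thresholds.
  • ChAMP filters out probes with <3 beads in at least 5% of samples. The filterBeads parameter controls this filter and beadCutoff sets the 5% fraction.
  • Non-CpG probes are filtered out by default.
  • By default ChAMP filters all SNP-related probes. Use the population parameter to select a population; if none is selected, the general recommended probe list provided by Zhou is used.
  • ChAMP filters all multi-hit probes.
  • Probes on chromosomes X and Y are filtered out by default; the filterXY parameter controls this.
    If you do not have the raw .IDAT data, use champ.filter() to do the filtering.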

Note:

champ.load() cannot perform filtering on a beta matrix alone. Users who have no .IDAT data but do have a beta matrix and Sample_Sheet.csv can filter with the champ.filter() function and then continue with the downstream analysis functions.
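
The two-stage detection p-value filter described in the list above can be sketched in base R on simulated data. This is not ChAMP's implementation: the matrix is made up, the thresholds are the ones quoted above, and dropping a probe as soon as it fails in any remaining sample is an assumption of this sketch.

```r
set.seed(1)
n_probes <- 1000
n_samples <- 8
# Simulated detection p-values: by default every probe passes
detP <- matrix(runif(n_probes * n_samples, 0, 0.005), nrow = n_probes,
               dimnames = list(paste0("cg", seq_len(n_probes)),
                               c(paste0("C", 1:4), paste0("T", 1:4))))
detP[1:20, "C1"] <- 0.5   # 20 failed probes in one sample
detP[, "T4"] <- 0.5       # one sample that failed completely

SampleCutoff <- 0.1       # maximum tolerated fraction of failed probes per sample
detPcut <- 0.01           # a probe fails when its detection p-value is above this

# Stage 1: fraction of failed probes per sample; drop bad samples
failed_fraction <- colMeans(detP > detPcut)
keep_samples <- failed_fraction <= SampleCutoff
detP <- detP[, keep_samples]

# Stage 2: on the remaining samples, drop probes that failed in any sample
keep_probes <- rowSums(detP > detPcut) == 0
detP_filtered <- detP[keep_probes, ]
dim(detP_filtered)
```

Filtering samples first matters: had the failed sample T4 stayed in, every probe would have counted as failed in at least one sample and nothing would survive stage 2.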

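The CpG.GUI() function shows how the CpGs are distributed: CpGs on chromosomes, CpG islands, TSS regions.

# This function can be used at any point in the analysis
CpG.GUI(CpG=rownames(myLoad$beta),arraytype="450K")
# Note that it only needs the CpG names (here the row names of the beta matrix), so it can be applied at any time, for example to look at DMPs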
6.3 Further quality control and exploratory analysis

Use the champ.QC() and QC.GUI() functions to check data quality.

champ.QC()

champ.QC() produces three plots:

mdsPlot (Multidimensional Scaling Plot): shows the similarity between samples based on the 1000 most variable probes, with samples colored by group.

densityPlot: shows the beta-value distribution of each sample. A sample that deviates strongly from the others is likely of poor quality (e.g. incomplete bisulfite conversion).

dendrogram: a clustering tree of all samples. In champ.QC(), Feature.sel="None" computes the distances between samples directly from the probe values, which is memory-intensive; a “SVD” method is also available.

QC.GUI() can also draw these plots, but it uses more memory.
It provides 5 plots: mdsPlot, type-I&II densityPlot, sample beta distribution plot, dendrogram plot and top 1000 most variable CpG’s heatmap.

QC.GUI(beta=myLoad$beta,arraytype="450K")
  • The type-I&II densityPlot helps to check the normalization status of the two probe types.
  • The top variable CpGs’ heatmap shows the 1000 most variable CpGs and their methylation status.
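
The idea behind the mdsPlot can be sketched with base R: select the most variable probes, compute distances between samples, and project them into two dimensions. The beta matrix here is simulated, and Euclidean distance with classical MDS is a choice made for this sketch; champ.QC() may compute its plot differently.

```r
set.seed(42)
n_probes <- 5000
groups <- rep(c("C", "T"), each = 4)
# Simulated beta values; the first 200 probes differ between the groups
beta <- matrix(rbeta(n_probes * 8, 2, 5), nrow = n_probes,
               dimnames = list(NULL, paste0(groups, 1:4)))
beta[1:200, groups == "T"] <- pmin(beta[1:200, groups == "T"] + 0.3, 0.99)

# Pick the 1000 probes with the largest variance across samples
probe_var <- apply(beta, 1, var)
top <- order(probe_var, decreasing = TRUE)[1:1000]

# Classical MDS on Euclidean distances between samples
d <- dist(t(beta[top, ]))
mds <- cmdscale(d, k = 2)

plot(mds, col = ifelse(groups == "T", "red", "blue"), pch = 19,
     xlab = "MDS 1", ylab = "MDS 2")
text(mds, labels = colnames(beta), pos = 3)
```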
6.4 Normalization

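Type-I and type-II probes use different chemistries and different designs, so their values follow different distributions. Part of the difference between the two probe types may be biological, caused by the uneven placement of the probes (for example, differences in CpG location), but the main issue is that type-II probes exhibit a reduced dynamic range. Correcting for type-II probe bias is therefore necessary.
champ.norm() does this. Four normalization methods are available for type-II probes: BMIQ, SWAN, PBC and Functional Normalization.
BMIQ works somewhat better for 850k arrays, but it performs poorly on low-quality samples or on control samples with strongly skewed methylation. The “cores” parameter sets the number of CPU cores to use, and PDFplot=TRUE saves the plots in resultsDir.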

myNorm <- champ.norm(beta=myLoad$beta,arraytype="450K",cores=5)
# After normalization, plot again to see the difference
QC.GUI(myNorm,arraytype="450K")
6.5 SVD Plot

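The singular value decomposition method (SVD) is used to assess the main components of variation in a dataset. These significant components may be associated with the biology of interest, or they may be technical, such as batch effects or population effects. The more detailed the sample information the better (e.g. dates of hybridization, season in which samples were collected, epidemiological information, etc.), because all of these factors can be included in the SVD analysis.
If the raw data were imported from .idat files, set RGEffect=TRUE in champ.SVD() and the 18 built-in control probes on the array (including bisulfite conversion efficiency) are included in the analysis as technical factors.
champ.SVD() analyzes all covariates and phenotype data in the pd file. You can use cbind() to merge your own covariates with myLoad$pd. Categorical and numeric variables are handled differently, though: categorical variables must be of type “factor” or “character”, and numeric variables must be of numeric type.
During the analysis champ.SVD() prints the covariates to the screen. The result is a heatmap saved as SVDsummary.pdf, in which the darkest cells mark the most significant p-values. If technical factors turn out to have an effect, including variation related to the beadchip, position and/or plate, the data need to be corrected again with a method such as ComBat.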

champ.SVD(beta=myNorm,pd=myLoad$pd)
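[Figure: SVD heatmap for the bundled test data]

The figure above was drawn from the bundled test data, which is too simple to show much. The figure below was drawn from the 656 samples of GSE40279; age is a numeric variable and all the others are categorical.

[Figure: SVD heatmap for the GSE40279 samples]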
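
What an SVD-based check does can be sketched in base R: decompose the centered data, then test each component against each covariate. The data and covariates below are simulated, and the choice of tests (Kruskal-Wallis for a categorical covariate, linear regression for a numeric one) is an assumption of this sketch; champ.SVD() itself may differ in its details.

```r
set.seed(7)
n_probes <- 2000
pd <- data.frame(Sample_Group = rep(c("C", "T"), each = 4),
                 Age = c(50, 61, 47, 58, 66, 52, 70, 45))
beta <- matrix(rbeta(n_probes * 8, 2, 5), nrow = n_probes)
# Add a group effect to the first 300 probes
is_tumor <- pd$Sample_Group == "T"
beta[1:300, is_tumor] <- pmin(beta[1:300, is_tumor] + 0.3, 0.99)

# Center each probe, then decompose
centered <- beta - rowMeans(beta)
sv <- svd(centered)
var_explained <- sv$d^2 / sum(sv$d^2)

# Sample loadings on the first component
pc1 <- sv$v[, 1]
# Categorical covariate: Kruskal-Wallis test; numeric covariate: linear regression
p_group <- kruskal.test(pc1, factor(pd$Sample_Group))$p.value
p_age   <- summary(lm(pc1 ~ pd$Age))$coefficients[2, 4]
c(group = p_group, age = p_age)
```

Repeating the two tests for the first few components and arranging the p-values in a grid gives the kind of component-by-covariate heatmap described above.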
6.6 Batch Effect Correction

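ComBat is a method from the sva package that has been integrated into ChAMP. The batchname=c("Slide") parameter specifies the factor to correct for. champ.runCombat() automatically treats Sample_Group as the covariate of interest, and a newer parameter, variablename, lets you specify your own covariate instead. If the batchname given to champ.runCombat() is valid, the function performs the batch correction automatically.
If ComBat is applied directly to beta values, the output may fall outside the 0-1 range, so the values are logit-transformed before the correction. If you are correcting M-values, set logitTrans=FALSE. Sometimes the batch effect is confounded with the biological variation of interest; in that case, correcting the batch effect also removes that variation.

# Correct the batch effect; this step is very time-consuming.
myCombat <- champ.runCombat(beta=myNorm,pd=myLoad$pd,batchname=c("Slide"))
# Inspect the result after correction (pass the corrected matrix, otherwise the default myNorm is used)
champ.SVD(beta=myCombat,pd=myLoad$pd)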
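
Why a transform is needed can be shown with a short sketch: a shift applied on the M-value scale can never push the back-transformed beta values outside (0, 1). The base-2 logit used here is the conventional M-value definition; the exact transform used inside champ.runCombat() is not shown in this post, so treat the function names and the shift of 2 as illustrative.

```r
# Convert beta values (0-1) to M-values and back
beta_to_m <- function(beta) log2(beta / (1 - beta))
m_to_beta <- function(m) 2^m / (2^m + 1)

beta <- c(0.05, 0.5, 0.95)
m <- beta_to_m(beta)

# Pretend a correction shifted the data on the M-value scale
m_corrected <- m + 2

# Back-transformed values stay inside (0, 1)
beta_corrected <- m_to_beta(m_corrected)
beta_corrected

# Applying the same kind of shift directly to beta values can leave the valid range
beta + 0.2
```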
6.7 Differential Methylation Probes(DMP & DMR & DMB)

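The goal is to find which of the millions of CpGs change in disease, and how those changes alter genes and ultimately lead to illness.

DMP stands for Differential Methylation Probe (a differentially methylated CpG site), DMR for Differential Methylation Region (a differentially methylated region of CpGs), and Block for Differential Methylation Block (a differentially methylated region on a much larger scale).

champ.DMP() uses the linear models of the limma package to compute p-values for differential methylation. The latest champ.DMP() supports numeric variables such as age and categorical variables with more than two phenotypes, such as “tumor”, “metastasis”, “control”. Numeric variables (such as age) are analyzed with a linear regression model to find covariate-related CpGs, say age-related CpGs. Categorical variables are compared pairwise by group, for example “tumor-metastasis”, “tumor-control” and “metastasis-control”. The output is a data frame of differentially methylated probes with P-value, t-statistic and difference in mean methylation (labeled logFC, by analogy with the log fold-change in RNA-seq). It also contains the annotation of each probe, the mean beta value of each group and the delta beta between the two groups (the same quantity as logFC, kept because older versions of the package need it). Advanced users can run limma themselves and use the resulting probes and p-values for DMR analysis.

# DMP analysis
myDMP <- champ.DMP(beta = myNorm,pheno=myLoad$pd$Sample_Group)
# Inspect the result
head(myDMP[[1]])

champ.DMP() returns a list. Newer versions of ChAMP include a GUI for inspecting the myDMP result. You pass the unmodified result of champ.DMP() (myDMP), the original beta matrix and the covariate; DMP.GUI() detects automatically whether the covariate is numeric or categorical. For a categorical covariate such as case/control, DMP.GUI() automatically plots the significantly different sites.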

DMP.GUI(DMP=myDMP[[1]],beta=myNorm,pheno=myLoad$pd$Sample_Group)
# myDMP is a list now, each data frame is stored as myDMP[[1]], myDMP[[2]], myDMP[[3]]...
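
The shape of a DMP analysis can be sketched in base R on simulated data. Note the swap: this uses an ordinary two-sample t-test per probe rather than limma's moderated statistics, so it illustrates the workflow (test, adjust, compute delta beta) and not champ.DMP() itself. The column names are illustrative.

```r
set.seed(3)
n_probes <- 500
pheno <- rep(c("C", "T"), each = 4)
beta <- matrix(rbeta(n_probes * 8, 5, 5), nrow = n_probes,
               dimnames = list(paste0("cg", seq_len(n_probes)), NULL))
# Make the first 25 probes hypermethylated in tumor
beta[1:25, pheno == "T"] <- pmin(beta[1:25, pheno == "T"] + 0.4, 0.99)

# Ordinary two-sample t-test per probe (limma would use moderated statistics)
p_values <- apply(beta, 1, function(x) {
  t.test(x[pheno == "T"], x[pheno == "C"])$p.value
})

result <- data.frame(
  C_AVG     = rowMeans(beta[, pheno == "C"]),
  T_AVG     = rowMeans(beta[, pheno == "T"]),
  P.Value   = p_values,
  adj.P.Val = p.adjust(p_values, method = "BH")
)
result$deltaBeta <- result$T_AVG - result$C_AVG

# Probes passing a 5% FDR threshold, most significant first
dmp <- result[result$adj.P.Val < 0.05, ]
head(dmp[order(dmp$P.Value), ])
```

With only four samples per group an unmoderated test has little power, which is one reason moderated statistics such as limma's are preferred for array data.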

6.7.2 Hydroxymethylation Analysis

Some users want to analyze hydroxymethylation; example code follows.

myDMP <- champ.DMP(beta=myNorm, pheno=myLoad$pd$Sample_Group, compare.group=c("oxBS", "BS"))
# In the above code, you can set the compare.group parameter to "oxBS" and "BS" to do DMP detection between hydroxymethylation and normal methylation.

hmc <- myDMP[[1]][myDMP[[1]]$deltaBeta>0,]
# Then you can use the above code to extract hydroxymethylation CpGs.

6.8 Differential Methylation Regions

DMRs are runs of neighboring CpGs that all show a clear difference. champ.DMR() computes and returns a data frame of the detected DMRs, with their length, clusters and number of CpGs annotated.
The function offers three algorithms: Bumphunter, ProbeLasso and DMRcate. Bumphunter is reportedly the more reliable, with accuracy above 90%, while ProbeLasso reaches about 75%; DMRcate was integrated later and has not been evaluated. Bumphunter first groups all probes into small clusters, then uses random permutation to assess the candidate DMRs.

myDMR <- champ.DMR(beta=myNorm,pheno=myLoad$pd$Sample_Group,method="Bumphunter")
# The result is stored under the name of the method that was used
head(myDMR$BumphunterDMR)
DMR.GUI(DMR=myDMR)
# It might be a little bit slow to open DMR.GUI() because function need to extract annotation for CpGs from DMR. Might take 30 seconds.
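
The first step of a Bumphunter-style analysis, grouping probes into small clusters by genomic distance, can be sketched as follows. The positions and the maxGap value are made up for illustration, and this base R one-liner is a simplified stand-in, not the bumphunter package's own code.

```r
# Hypothetical probe positions on one chromosome (base pairs, sorted)
pos <- c(1000, 1100, 1250, 5000, 5200, 5300, 5350, 20000, 20100)
maxGap <- 300   # illustrative: start a new cluster when the gap exceeds this

# A new cluster starts wherever the distance to the previous probe is > maxGap
cluster_id <- cumsum(c(TRUE, diff(pos) > maxGap))
cluster_id

# Cluster sizes; candidate regions usually need a minimum number of probes
table(cluster_id)
```

The permutation step then asks how often a region of similar size and effect would appear if the sample labels were shuffled.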

6.9 Differential Methylation Blocks

In the Block-finder function, champ.Block() first computes small clusters (regions) across the whole genome, then for each cluster calculates its mean value and position, collapsing each region into a single unit. When finding DMBs, only single units from open sea are used for clustering. The Bumphunter algorithm is then used to find “DMRs” over these regions (single units after collapse). Previous work by the ChAMP authors and by other scientists has shown that Differentially Methylated Blocks may show universal features across various cancers.

myBlock <- champ.Block(beta=myNorm,pheno=myLoad$pd$Sample_Group,arraytype="450K")
head(myBlock$Block)
Block.GUI(Block=myBlock,beta=myNorm,pheno=myLoad$pd$Sample_Group,runDMP=TRUE,compare.group=NULL,arraytype="450K")
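
The collapsing step, where each small cluster becomes one unit with a mean position and a mean value, can be sketched in base R. All numbers below are hypothetical, and the sketch leaves out the restriction to open-sea units and the Bumphunter step that follows.

```r
# Hypothetical probes: position, cluster membership and group difference in beta
pos        <- c(1000, 1100, 1250, 5000, 5200, 5300, 5350, 20000, 20100)
cluster_id <- c(1, 1, 1, 2, 2, 2, 2, 3, 3)
delta_beta <- c(0.10, 0.20, 0.30, -0.05, 0.00, 0.05, 0.00, 0.40, 0.20)

# Collapse every cluster into one unit: mean position and mean value
units <- data.frame(
  cluster  = sort(unique(cluster_id)),
  mean_pos = as.numeric(tapply(pos, cluster_id, mean)),
  mean_val = as.numeric(tapply(delta_beta, cluster_id, mean))
)
units
```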

6.10 Gene Set Enrichment Analysis

This step looks for disease-associated sub-networks within pathway networks.
After the previous steps you may already have some significant DMPs or DMRs, and you may want to know whether the genes involved in them are enriched for specific biological terms or pathways. For this analysis you can use champ.GSEA(), which automatically extracts gene information, converts CpG information into gene information, and then conducts GSEA on each list.

There are two ways to do GSEA. In previous versions, ChAMP used pathway information downloaded from MSigDB and then applied Fisher's Exact Test to calculate the enrichment status of each pathway. After the enrichment analysis, champ.GSEA() automatically returns the pathways with a P-value smaller than the adjPval cutoff.

However, as pointed out by Geeleher [citation], different genes contain different numbers of CpGs. A gene with 50 CpGs of which only one is significantly methylated and a gene with 2 CpGs of which both are significantly methylated should not be treated equally. The solution is to use the number of CpGs contained in each gene to correct the significant genes, as implemented in the gometh function from the missMethyl package. gometh uses the number of CpGs per gene, in place of gene length, as the bias data to correct this issue. The idea is to fit a curve for the number of CpGs across the genes involved in the GSEA, then use the probability weighting function to correct the GO p-values.

In champ.GSEA(), set the method parameter to “goseq” to use the goseq method, or to “fisher” to do a normal Gene Set Enrichment Analysis.

myGSEA <- champ.GSEA(beta=myNorm,DMP=myDMP[[1]], DMR=myDMR, arraytype="450K",adjPval=0.05, method="fisher")
# myDMP and myDMR could (not must) be used directly.
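
The Fisher's Exact Test behind the “fisher” method can be sketched for a single pathway with base R. All counts are hypothetical, and the sketch ignores the CpG-number bias discussed above.

```r
# Hypothetical counts for one pathway
n_background  <- 20000  # all genes covered by the array
n_pathway     <- 100    # genes annotated to the pathway
n_significant <- 500    # genes carrying a significant DMP or DMR
n_overlap     <- 12     # significant genes that are in the pathway

# 2x2 contingency table: in pathway vs. not, significant vs. not
tab <- matrix(c(n_overlap,
                n_significant - n_overlap,
                n_pathway - n_overlap,
                n_background - n_pathway - n_significant + n_overlap),
              nrow = 2,
              dimnames = list(pathway = c("in", "out"),
                              significant = c("yes", "no")))
tab

# One-sided test for over-representation
res <- fisher.test(tab, alternative = "greater")
res$p.value
```

By chance about 2.5 of the 500 significant genes would fall in this pathway, so an overlap of 12 gives a small p-value. In a real analysis the test is repeated for every pathway and the p-values are adjusted, for example with p.adjust().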

6.11 Differential Methylated Interaction Hotspots

champ.EpiMod() uses the FEM package to infer differentially methylated gene modules within a user-specified functional gene network. This network could be, for example, a protein-protein interaction (PPI) network. The champ.EpiMod() function can thus be viewed as a functional supervised algorithm, which uses a network of relations between genes (usually a PPI network) to identify subnetworks where a significant number of genes are associated with a phenotype of interest (POI). The EpiMod algorithm can be run in two different modes: at the probe level, in which case the most differentially methylated probe is assigned to each gene, or at the gene level, in which case a DNAm value is assigned to each gene using an optimized procedure described in detail in Jiao Y, Widschwendter M, Teschendorff AE, Bioinformatics 2014. Originally, the FEM package was developed to infer differentially methylated gene modules which are also deregulated at the gene expression level; here only the EpiMod version is provided, which infers differentially methylated modules only. More advanced users may refer to the FEM package for more information.

myEpiMod <- champ.EpiMod(beta=myNorm,pheno=myLoad$pd$Sample_Group)

6.13 Cell Type Heterogeneity

Because DNA methylation is highly cell-type specific, many DMP/DMR changes are driven by differences in cell composition. Several methods can correct for this; RefbaseEWAS uses a reference database of the cell types of the tissue to determine cell proportions. ChAMP includes reference databases for whole blood, one for 27K and the other for 450K. champ.refbase() returns a beta matrix corrected for cell-type heterogeneity and the cell-type proportions in each sample. Remember that champ.refbase() only works on blood sample datasets.

myRefBase <- champ.refbase(beta=myNorm,arraytype="450K")
# Our test data set is not whole blood. So it should not be run here.
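
The idea of reference-based deconvolution can be sketched with plain least squares. This is a simplified stand-in and not the RefbaseEWAS algorithm: the reference profiles are random numbers, the cell type names are placeholders, and the bulk sample is a noise-free mixture, so the estimate recovers the true proportions.

```r
set.seed(11)
n_cpgs <- 200
# Hypothetical reference: mean beta of each cell type at informative CpGs
reference <- matrix(runif(n_cpgs * 3), nrow = n_cpgs,
                    dimnames = list(NULL, c("CellA", "CellB", "CellC")))

# A bulk sample that is a known mixture of the three cell types
true_prop <- c(CellA = 0.6, CellB = 0.3, CellC = 0.1)
bulk <- as.numeric(reference %*% true_prop)

# Estimate proportions by least squares without an intercept
fit <- lm(bulk ~ reference - 1)
est <- coef(fit)
# Clip negatives and rescale so the estimates sum to 1
est <- pmax(est, 0)
est <- est / sum(est)
round(est, 3)
```

With real, noisy data the estimates are only approximate, and constrained methods are generally preferred over clipping after the fit.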
