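Reposted from
Author: 阿糖胞苷
Link: http://www.lxweimin.com/p/41372e039194
Source: Jianshu (簡書)
Copyright on Jianshu belongs to the author. For any form of reproduction, please contact the author for authorization and cite the source.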
Passing a Seurat object directly to Monocle 2 only works for objects created with Seurat v2.0. Monocle 3 cannot convert a Seurat object into a cds,
so the CellDataSet is built by hand here: de novo construction of a Monocle v2 CellDataSet.
Although Monocle can be used with raw read counts, these are not directly proportional to expression values unless you normalize them by length, so some Monocle functions could produce nonsense results. If you don't have UMI counts, we recommend you load up FPKM or TPM values instead of raw read counts.
1. Generate the Required Format Files
a. expression matrix
(preference, from least to most suitable: raw bulk read counts < TPM, RPKM/FPKM < UMI counts)
b. featuredata (fd): gene annotation table
c. phenodata (pd): cell annotation table
library(monocle)
library(Seurat)
data<-readRDS("../myo_0509.rds")
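a. construct expr-matrix (gene-by-cell expression matrix, genes in rows and cells in columns)
The @counts slot of the Seurat assay holds the raw single-cell data (UMI counts), so @counts is what gets converted into the expression matrix.
1. data@assays$RNA@data
holds normalized, relative expression values (the counterpart of TPM or FPKM/RPKM, not absolute counts)
2. data@assays$RNA@counts
holds absolute transcript counts (raw UMI counts)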
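To make the difference between the two slots concrete, the sketch below reproduces by hand the default log-normalization that Seurat's NormalizeData() applies (each cell's counts divided by the cell total, multiplied by a scale factor of 10,000, then log1p). The toy matrix and all variable names are invented for illustration, and only base R is assumed.

```r
# Toy UMI count matrix: 3 genes (rows) x 2 cells (columns); values are invented.
counts <- matrix(c(0, 2, 8,
                   5, 0, 5),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))

# Per-cell library size (total UMIs per cell).
lib_size <- colSums(counts)

# Log-normalization by hand: scale each cell to a total of 10,000, then log1p.
# sweep() divides every column by its own library size.
norm_data <- log1p(sweep(counts, 2, lib_size, "/") * 1e4)

# The counts are integer-valued (absolute); the normalized values are not.
counts_are_integer <- all(counts == round(counts))
norm_is_integer    <- all(norm_data == round(norm_data))
```

Since negbinomial.size() models count data, the integer-valued matrix is the one to hand to Monocle in this workflow.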
data_matrix<-as(data@assays$RNA@counts, 'sparseMatrix')
# Store the UMI counts as a sparse matrix to save memory.
# Most count matrices already come in a sparse format (e.g. MTX); do NOT convert them into a dense matrix.
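The memory argument can be checked on simulated data. The following is a minimal sketch, assuming only the Matrix package; the matrix dimensions, the 5% non-zero fraction and the variable names are arbitrary choices made for illustration, not properties of any real dataset.

```r
library(Matrix)  # ships with standard R installations

set.seed(1)
n_genes <- 2000
n_cells <- 500

# Simulate a mostly-zero count matrix (about 5% non-zero entries).
n_nonzero <- round(0.05 * n_genes * n_cells)
idx <- sample(n_genes * n_cells, n_nonzero)   # unique positions of non-zero entries
sparse_counts <- sparseMatrix(
  i = (idx - 1) %% n_genes + 1,               # row index of each entry
  j = (idx - 1) %/% n_genes + 1,              # column index of each entry
  x = rpois(n_nonzero, lambda = 3) + 1,       # strictly positive counts
  dims = c(n_genes, n_cells)
)

# The same data as an ordinary dense matrix.
dense_counts <- as.matrix(sparse_counts)

# Compare the memory footprints in bytes.
sparse_bytes <- as.numeric(object.size(sparse_counts))
dense_bytes  <- as.numeric(object.size(dense_counts))
sparse_bytes < dense_bytes
```

The dense copy stores every zero explicitly, so the gap grows with the number of cells; this is the reason to avoid a round trip through as.matrix().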
b. construct featuredata (gene annotation table)
featuredata needs two columns, gene_id and gene_short_name,
and its row names must match the row names of the count matrix.
feature_ann<-data.frame(gene_id=rownames(data_matrix),gene_short_name=rownames(data_matrix))
rownames(feature_ann)<-rownames(data_matrix)
data_fd<-new("AnnotatedDataFrame", data = feature_ann)
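c. construct phenodata (cell annotation table)
The @meta.data slot of a Seurat object usually holds phenotype-related information such as cluster, sample origin and group, so the metadata is converted into the phenodata.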
sample_ann<-data@meta.data
rownames(sample_ann)<-colnames(data_matrix)
data_pd<-new("AnnotatedDataFrame", data =sample_ann)
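Before the three pieces are combined, it can help to confirm that they line up. The sketch below uses invented toy inputs and a hypothetical helper, check_cds_inputs(), which is not part of Monocle or Seurat; it only assumes base R.

```r
# Toy inputs (all names and values are invented for illustration).
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("cell1", "cell2")))
fd <- data.frame(gene_id = rownames(expr),
                 gene_short_name = rownames(expr),
                 row.names = rownames(expr))
pd <- data.frame(cluster = c("0", "1"),
                 row.names = c("cell1", "cell2"))

# Hypothetical helper: stop early if the three inputs are not aligned.
check_cds_inputs <- function(expr, fd, pd) {
  stopifnot(
    identical(rownames(fd), rownames(expr)),  # one feature row per gene, same order
    identical(rownames(pd), colnames(expr)),  # one pheno row per cell, same order
    "gene_short_name" %in% colnames(fd)       # column Monocle expects in featureData
  )
  invisible(TRUE)
}

check_cds_inputs(expr, fd, pd)
```

Running a check like this on data@meta.data before its row names are overwritten would catch a cell-order mismatch that the assignment rownames(sample_ann)<-colnames(data_matrix) would otherwise hide.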
2. Create the CDS object
Create the cds object. Use the right distribution: specify the expressionFamily that matches the type of expression data.
data.cds<-newCellDataSet(data_matrix,phenoData =data_pd,featureData =data_fd,expressionFamily=negbinomial.size())
# UMI or read counts: use negbinomial.size().
# Alternative for TPM/FPKM values: first convert the relative expression values to transcript (mRNA) counts with relative2abs(), then use negbinomial.size() as well.
# This often leads to much more accurate results than using tobit().
Inspect the phenodata and featuredata
head(pData(data.cds))
head(fData(data.cds))