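I have recently been learning microbial metagenomic binning. I installed metaWRAP by following the official documentation and hit a pile of pitfalls, so here is a record of the errors and how I resolved them:
1. metaWRAP installation
The author recommends installing with Conda/Mamba and does not recommend bioconda or docker, so I found a docker image that already contains conda and took the first step of a long road:
(1) Installing the software with conda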
conda create -y -n metawrap-env python=2.7
source activate metawrap-env
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels ursky
conda install -y -c ursky metawrap-mg
conda install -y blas=2.5=mkl
The finished environment is about 5 GB. I pushed the image to Docker Hub:
docker push raser216/metawrap:v1.0.0
I thought the job was done, but a string of errors followed...
(2) Installing the libtbb2 library
Only when the run reached quant_bins did I find that a dependency was missing, which made salmon fail while quantifying gene abundance:
salmon: error while loading shared libraries: libtbb.so.2
Solution:
# install the libtbb2 library
apt-get install libtbb2
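Missing shared libraries like this one can usually be spotted before a pipeline run by asking the dynamic linker what a binary cannot resolve. The following is only a sketch: missing_libs is a helper name made up for this note, and it assumes a glibc-style ldd that prints "not found" next to unresolved libraries.

```shell
#!/usr/bin/env bash
# missing_libs is a hypothetical helper, not part of metaWRAP or salmon.
# It prints the names of the shared libraries that ldd cannot resolve.
missing_libs() {
    local binary
    binary=$(command -v "$1") || return 0   # nothing to check if the binary is absent
    # ldd marks unresolved libraries as "<name> => not found"
    { ldd "$binary" 2>/dev/null || true; } | awk '/not found/ { print $1 }'
}

# Example: ls normally resolves all of its libraries, so nothing is reported.
missing_ls=$(missing_libs ls)
echo "unresolved libraries for ls: ${missing_ls:-none}"
```

Inside the image, running missing_libs salmon before the fix would presumably have listed libtbb.so.2.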
(3) Installing libGL.so.1
After the bin_refinement step the figures directory contained no images, and the Python plotting script reported:
ImportError: Failed to import any qt binding
# matplotlib is installed for python2.7 but cannot be imported
import matplotlib
import matplotlib.pyplot as plt
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Solution: install the libGL.so.1 dependency.
apt-get -y update
apt-get install -y libgl1-mesa-glx
# after installation, python2 imports the module without error
python2.7 -c "import matplotlib.pyplot as plt"
(4) prokka installation failed
prokka could not be used because its installation had failed:
Possible cause: the perl version installed by metawrap does not meet prokka's requirements? (Does metawrap not support perl 5.26?) The message itself says that perl cannot find the BioPerl module Bio::Root::Version on @INC.
prokka -h
Can't locate Bio/Root/Version.pm in @INC (you may need to install the Bio::Root::Version module) (@INC contains: /opt/conda/envs/metawrap-env/bin/../perl5 /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2//x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2/ /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2 /opt/conda/envs/metawrap-env/lib/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/5.26.2 .) at /opt/conda/envs/metawrap-env/bin/prokka line 32.
BEGIN failed--compilation aborted at /opt/conda/envs/metawrap-env/bin/prokka line 32.
Solution: reinstall prokka 1.13 with conda (the command I recorded builds it in a separate environment named prokka-test).
conda create -n prokka-test prokka=1.13 minced=0.3.0 parallel=20180522 blast=2.12.0
source activate prokka-test
2. conda errors
(1) Unable to enter the conda environment
Running source activate metawrap-env inside a shell script did not enter the conda environment and failed with:
/opt/conda/envs/metawrap-env/etc/conda/activate.d/activate-binutils_linux-64.sh: line 65: ADDR2LINE: unbound variable
Solution: enter the conda environment through a dockerfile and add the installation path to the PATH environment variable:
cat metawrap_v1.dockerfile
# contents of the dockerfile
FROM raser216/metawrap:v1.0.0
RUN echo "source activate metawrap-env" > ~/.bashrc
ENV PATH /opt/conda/envs/metawrap-env/bin:$PATH
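The "unbound variable" wording is what bash prints when a script running under set -u (nounset) reads a variable that was never set, so the calling script presumably had strict mode enabled. A commonly used alternative to the dockerfile route is to switch nounset off just around the activation. The sketch below does not call conda at all: fake_activate and run_pipeline_step are stand-ins invented here to mimic an activation hook that reads ADDR2LINE.

```shell
#!/usr/bin/env bash
unset ADDR2LINE

# Hypothetical stand-in for an activate.d hook that inspects a variable
# which may never have been set.
fake_activate() {
    if [ -n "$ADDR2LINE" ]; then
        echo "backing up ADDR2LINE"
    fi
    echo "activated"
}

# Under strict mode the hook aborts with "ADDR2LINE: unbound variable".
strict_status=0
( set -u; fake_activate ) >/dev/null 2>&1 || strict_status=$?

# Workaround: relax nounset only while activating, then restore it.
run_pipeline_step() {
    set -u          # strict mode, as the calling script presumably had
    set +u          # relax it just for the activation
    fake_activate
    set -u          # strict again for the rest of the step
}
relaxed_output=$(run_pipeline_step)   # command substitution keeps set -u out of this shell

echo "strict mode exit status: $strict_status"
echo "relaxed mode output:     $relaxed_output"
```

In a real script the same pattern would wrap the source activate metawrap-env line.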
3. Database paths and versions
The databases used by the alignment tools that metaWRAP calls (kraken, BLAST, etc.) can be kept outside the image, but their paths must be written into the config file:
# location of the config file
which config-metawrap
/opt/conda/envs/metawrap-env/bin/config-metawrap
# use sed -i to replace the placeholders with the real database paths
kraken_database=/database/kraken_database/kraken_newdb2/axel_dowload
nt_database=/database/newdownload3
tax_database=/database/metawrap_database/ncbi_taxonomy
sed -i "s#~/KRAKEN_DB#$kraken_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
sed -i "s#~/NCBI_NT_DB#$nt_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
sed -i "s#~/NCBI_TAX_DB#$tax_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
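To check the substitutions without touching the real file, the same sed commands can be rehearsed on a throwaway copy. The sketch below assumes the config holds one placeholder per line in the form shown; the variable names on the left of each "=" are illustrative guesses, and only the ~/..._DB placeholders come from the commands above.

```shell
#!/usr/bin/env bash
# Work on a temporary file so the real config-metawrap is left alone.
cfg=$(mktemp)
# Assumed layout of the placeholder lines (left-hand names are illustrative).
cat > "$cfg" <<'EOF'
KRAKEN_DB=~/KRAKEN_DB
BLASTDB=~/NCBI_NT_DB
TAXDUMP=~/NCBI_TAX_DB
EOF

kraken_database=/database/kraken_database/kraken_newdb2/axel_dowload
nt_database=/database/newdownload3
tax_database=/database/metawrap_database/ncbi_taxonomy

# '#' is used as the sed delimiter so the slashes in the paths need no escaping.
sed -i "s#~/KRAKEN_DB#$kraken_database#g" "$cfg"
sed -i "s#~/NCBI_NT_DB#$nt_database#g" "$cfg"
sed -i "s#~/NCBI_TAX_DB#$tax_database#g" "$cfg"

# Count lines that still contain a "~/" placeholder; grep -c exits non-zero on 0 matches.
remaining=$(grep -c '~/' "$cfg" || true)
cat "$cfg"
echo "placeholders left: $remaining"
```

Because the expressions are in double quotes, the shell expands the $..._database variables but leaves the ~ untouched, which is what lets sed match the literal placeholder.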
This file must be writable, otherwise the bin_refinement step fails:
# error from the bin_refinement step
You do not seem to have permission to edit the checkm config file located at /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/DATA_CONFIG
Solution: change the permissions of the config file; the error no longer appears.
chmod 777 /opt/conda/envs/metawrap-env/bin/config-metawrap
4. kraken errors
kraken assigns taxonomy directly to sequencing reads (fastq). There are currently two major versions: the first generation (kraken) is extremely memory-hungry (>100 GB), while the second generation (kraken2) is much improved (about 35 GB is enough).
(1) Error caused by trailing comments
The kraken.sh script lives in /opt/conda/envs/metawrap-env/bin/metawrap-modules/. On lines 123-125 the comments are written directly after the line-continuation backslash, which makes kraken.sh fail (I did not record the error message):
123 awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' |\ #combine paired end reads onto one line
124 shuf | head -n $depth | sed 's/\t\t/\n/g' | \ #shuffle reads, select top N reads, and then restore tabulation
125 awk -F"\t" '{print $1 > "'"${out}/tmp_1.fastq"'"; print $2 > "'"${out}/tmp_2.fastq"'"}' #separate reads into F and R files
Solution: move all the comments onto lines of their own, so that each backslash is the last character on its line.
# combine paired end reads onto one line, then
# shuffle reads, select top N reads, and then restore tabulation, then
# separate reads into F and R files
awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | \
shuf | head -n $depth | sed 's/\t\t/\n/g' | \
awk -F"\t" '{print $1 > "'"${out}/tmp_1.fastq"'"; print $2 > "'"${out}/tmp_2.fastq"'"}'
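Why a trailing comment breaks the pipeline: a backslash only continues the line when it is immediately followed by the newline. In "\ #comment" the backslash escapes the space instead, so bash treats " #comment" as the name of the next command and the line ends there. The toy pipeline below (not the kraken.sh code itself) is a small sketch of both forms.

```shell
#!/usr/bin/env bash
# Broken form: the text after "| \ " is parsed as a command called " #sort",
# which does not exist, and the second line runs as a separate pipeline.
broken_script='printf "b\na\n" | \ #sort the lines
sort | head -n 1'
broken_output=$(bash -c "$broken_script" 2>/dev/null </dev/null)

# Fixed form: comments sit on their own lines and every backslash is the
# last character before the newline.
# sort the two lines, then keep the first one
fixed_output=$(printf 'b\na\n' | \
    sort | \
    head -n 1)

echo "broken output: '${broken_output}'"
echo "fixed output:  '${fixed_output}'"
```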
(2) Script permission error
Note that kraken.sh must be executable, otherwise it fails when called:
/opt/conda/envs/metawrap-env/bin/metawrap: line 69: /opt/conda/envs/metawrap-env/bin/metawrap-modules/kraken.sh: Permission denied
Solution: change the script's permissions to 775; the error no longer appears.
chmod 775 kraken.sh
ls -l kraken.sh
-rwxrwxr-x 1 root root 8.9K Sep 22 20:12 kraken.sh
(3) Error in the Python annotation script
In the Python script kraken2_translate.py, the dictionary names_map is given a key it does not contain and raises KeyError.
Something went wrong with running kraken-translate... Exiting.
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 120, in <module>
    main()
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 114, in main
    translate_kraken2_annotations(annotation_file=kraken_file, kraken2_db=database_location, output=output_file)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 98, in translate_kraken2_annotations
    taxonomy = get_full_name(taxid, names_map, ranks_map)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 30, in get_full_name
    name = names_map[taxid]
KeyError: '1054037'
Solution: look the value up with dict.get() instead, and add a check for None.
vi /opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py
# modify get_full_name so that names_map does not raise when the key is missing:
    for taxid in taxid_lineage:
        #name = names_map[taxid]
        name = names_map.get(taxid)
        if name is None:
            name = "unknown"
        names_lineage.append(name)
(4) taxonomy database not found
The downloaded NCBI taxonomy database has to sit inside the downloaded kraken database directory, otherwise:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 120, in <module>
    main()
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 114, in main
    translate_kraken2_annotations(annotation_file=kraken_file, kraken2_db=database_location, output=output_file)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 87, in translate_kraken2_annotations
    names_map, ranks_map = load_kraken_db_metadata(kraken2_db)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 50, in load_kraken_db_metadata
    with open(names_path) as input:
IOError: [Errno 2] No such file or directory: '/database/kraken_database/kraken_newdb2/axel_dowload/taxonomy/names.dmp'
Solution: copy the taxonomy database into the kraken database directory.
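The traceback shows that the script looks for names.dmp in a taxonomy subdirectory of the kraken database. A minimal sketch of the copy plus a sanity check is given below, using temporary directories in place of the real databases. That nodes.dmp is also required is an assumption on my part (the function returns a ranks map as well); only names.dmp is confirmed by the error.

```shell
#!/usr/bin/env bash
# Temporary stand-ins for the real kraken and NCBI taxonomy directories.
kraken_db=$(mktemp -d)
tax_db=$(mktemp -d)
touch "$tax_db/names.dmp" "$tax_db/nodes.dmp"   # empty placeholders for the demo

# Copy the taxonomy dump files to where kraken2_translate.py looks for them.
mkdir -p "$kraken_db/taxonomy"
cp "$tax_db"/*.dmp "$kraken_db/taxonomy/"

# Report any file that is still missing (nodes.dmp is an assumed requirement).
missing=""
for f in names.dmp nodes.dmp; do
    [ -f "$kraken_db/taxonomy/$f" ] || missing="$missing $f"
done
echo "missing taxonomy files:${missing:- none}"
```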
(5) kraken software and database versions do not match
I had used kraken2 before, and the server already held the (huge) database it needs. I did not want to download a database for kraken (first generation) as well, so I tried whether the kraken2 database would work with kraken. As expected, it does not:
kraken: database ("/database/kraken_database/kraken_newdb2/axel_dowload") does not contain necessary file database.kdb
So I looked into updating the kraken version inside metaWRAP, and found that the metaWRAP installed by default does not support kraken2; it has to be updated to the latest version, 1.3.2:
Solution: update metaWRAP to version 1.3.2.
conda install -y -c ursky metawrap-mg=1.3.2
# after the update, the config file's permissions and contents have to be set again
chmod 777 /opt/conda/envs/metawrap-env/bin/config-metawrap
5. CheckM errors
(1) Line-continuation error in a .py file
CheckM assesses the completeness of assembled genomes and is used by bin_refinement. It failed immediately:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/checkm", line 36, in <module>
    from checkm import main
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 25, in <module>
    from checkm.defaultValues import DefaultValues
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 26, in <module>
    class DefaultValues():
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 29, in DefaultValues
    __DBM = DBManager()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 114, in __init__
    if not self.setRoot():
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 140, in setRoot
    path = self.confirmPath(path=path)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 162, in confirmPath
    path = raw_input("Where should CheckM store it's data?\n" \
EOFError: EOF when reading a line
Solution: modify the argument of the raw_input() call in checkmData.py.
Path of the script: /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/
Cause (as I understood it): the raw_input() call on line 162 uses "\" as a line continuation, which python did not handle. Note, though, that "EOFError: EOF when reading a line" is what raw_input() raises when there is no standard input to read from, as in a non-interactive run, so setting the data path in advance (see (2) below) is probably what really matters.
162 path = raw_input("Where should CheckM store it's data?\n" \
163 "Please specify a location or type 'abort' to stop trying: \n")
Solution: delete the line continuation.
162 path = raw_input("Where should CheckM store it's data?\nPlease specify a location or type 'abort' to stop trying: \n")
(2) Database not found
The first time CheckM runs it asks where its database is located, so it is best to run checkm data setRoot right after installation and set the database path up front:
checkm data setRoot
*******************************************************************************
 [CheckM - data] Check for database updates. [setRoot]
*******************************************************************************
Where should CheckM store it's data?
Please specify a location or type 'abort' to stop trying:
/checkm_database
Path [/checkm_database] exists and you have permission to write to this folder.
Otherwise CheckM cannot find its database and prints the following:
It seems that the CheckM data folder has not been set yet or has been removed. Running: 'checkm data setRoot'.
You do not seem to have permission to edit the checkm config file
located at /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/DATA_CONFIG
Please try again with updated privileges.
Unexpected error: <type 'exceptions.TypeError'>
(3) tmpdir path too long
*******************************************************************************
 [CheckM - tree] Placing bins in reference genome tree.
*******************************************************************************
  Identifying marker genes in 8 bins with 32 threads:
Process SyncManager-1:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 550, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 162, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/connection.py", line 132, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/connection.py", line 256, in __init__
    self._socket.bind(address)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: AF_UNIX path too long
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/checkm", line 708, in <module>
    checkmParser.parseOptions(args)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 1251, in parseOptions
    self.tree(options)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 133, in tree
    options.bCalledGenes)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/markerGeneFinder.py", line 67, in find
    binIdToModels = mp.Manager().dict()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/__init__.py", line 99, in Manager
    m.start()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 528, in start
    self._address = reader.recv()
EOFError
Solution: change the --tmpdir that binning.sh and the other scripts pass to checkm, pointing it at a temporary directory whose absolute path is short.
# three scripts in this directory call CheckM, and each needs its default --tmpdir changed
cd /opt/conda/envs/metawrap-env/bin/metawrap-modules
grep checkm *sh | awk -F":" '{print $1}' | sort | uniq
bin_refinement.sh
binning.sh
reassemble_bins.sh
# taking binning.sh as the example
# add a line before the checkm command that creates a short tmp directory for CheckM's temporary files
mkdir -p /tmp/$(basename ${1}).tmp
# change checkm's --tmpdir
checkm lineage_wf -x fa ${1} ${1}.checkm -t $threads --tmpdir /tmp/$(basename ${1}).tmp --pplacer_threads $p_threads
if [[ ! -s ${1}.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
# delete the tmp directory once the run has finished
rm -r /tmp/$(basename ${1}).tmp
# the corresponding checkm lines in the other two scripts need the same change
# changes to bin_refinement.sh
mkdir -p /tmp/$(basename ${bin_set}).tmp
if [ "$quick" == "true" ]; then
        comm "Note: running with --reduced_tree option"
        checkm lineage_wf -x fa $bin_set ${bin_set}.checkm -t $threads --tmpdir /tmp/$(basename ${bin_set}).tmp --pplacer_threads $p_threads --reduced_tree
else
        checkm lineage_wf -x fa $bin_set ${bin_set}.checkm -t $threads --tmpdir /tmp/$(basename ${bin_set}).tmp --pplacer_threads $p_threads
fi
if [[ ! -s ${bin_set}.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
${SOFT}/summarize_checkm.py ${bin_set}.checkm/storage/bin_stats_ext.tsv $bin_set | (read -r; printf "%s\n" "$REPLY"; sort) > ${bin_set}.stats
if [[ $? -ne 0 ]]; then error "Cannot make checkm summary file. Exiting."; fi
rm -r ${bin_set}.checkm; rm -r /tmp/$(basename ${bin_set}).tmp
mkdir -p /tmp/binsO.tmp
if [ "$quick" == "true" ]; then
        checkm lineage_wf -x fa binsO binsO.checkm -t $threads --tmpdir /tmp/binsO.tmp --pplacer_threads $p_threads --reduced_tree
else
        checkm lineage_wf -x fa binsO binsO.checkm -t $threads --tmpdir /tmp/binsO.tmp --pplacer_threads $p_threads
fi
if [[ ! -s binsO.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
rm -r /tmp/binsO.tmp
# changes to reassemble_bins.sh (it calls checkm twice)
mkdir -p /tmp/$(basename ${out}).tmp
checkm lineage_wf -x fa ${out}/reassembled_bins ${out}/reassembled_bins.checkm -t $threads --tmpdir /tmp/$(basename ${out}).tmp --pplacer_threads $p_threads
if [[ ! -s ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
${SOFT}/summarize_checkm.py ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv | (read -r; printf "%s\n" "$REPLY"; sort) > ${out}/reassembled_bins.stats
if [[ $? -ne 0 ]]; then error "Cannot make checkm summary file. Exiting."; fi
rm -r /tmp/$(basename ${out}).tmp
mkdir -p /tmp/$(basename ${out}).tmp
checkm lineage_wf -x fa ${out}/reassembled_bins ${out}/reassembled_bins.checkm -t $threads --tmpdir /tmp/$(basename ${out}).tmp --pplacer_threads $p_threads
if [[ ! -s ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
rm -r /tmp/$(basename ${out}).tmp
This error in turn makes bin_refinement fail (CheckM did not run correctly, so the statistics it should have produced are missing):
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 277, in _run_finalizers
    finalizer()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 207, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/shutil.py", line 266, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/opt/conda/envs/metawrap-env/lib/python2.7/shutil.py", line 264, in rmtree
    os.remove(fullname)
OSError: [Errno 16] Device or resource busy: 'binsO.tmp/pymp-REeR36/.nfs9061e516f4bd263400000b82'
mv: cannot stat 'binning_results.eps': No such file or directory
mv: cannot stat 'binning_results.eps': No such file or directory
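On the AF_UNIX error: on Linux the path of a Unix-domain socket has to fit into a 108-byte field, and Python's multiprocessing creates its listener socket underneath the temporary directory, so a deep working directory can push the full path over the limit. The sketch below estimates whether a candidate --tmpdir leaves enough room. tmpdir_fits is a helper invented for this note, and the suffix is an assumed worst case modelled on the pymp-REeR36 directory seen in the traceback, not an exact value.

```shell
#!/usr/bin/env bash
# Size of sun_path on Linux, in bytes (one byte is needed for the trailing NUL).
SUN_PATH_MAX=108
# Assumed shape of what multiprocessing appends below the temp directory.
SOCKET_SUFFIX="/pymp-XXXXXX/listener-XXXXXX"

# Hypothetical helper: succeeds if a socket under the given directory should fit.
# ${#var} counts characters, so this assumes plain ASCII paths.
tmpdir_fits() {
    local socket_path="$1$SOCKET_SUFFIX"
    [ "${#socket_path}" -lt "$SUN_PATH_MAX" ]
}

short_dir=/tmp/binsO.tmp
# A deliberately long, made-up working directory (120 characters after the slash).
long_dir=/$(printf 'x%.0s' $(seq 1 120))/binsO.tmp

for dir in "$short_dir" "$long_dir"; do
    if tmpdir_fits "$dir"; then
        echo "ok:       $dir"
    else
        echo "too long: $dir"
    fi
done
```

This is why a directory directly under /tmp is a safe choice: the prefix stays short no matter where the bins themselves are stored.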
6. BLAST errors
In the blobology step, the BLAST version does not match the downloaded nt database (I had downloaded version 5, the newest database format), giving:
BLAST Database error: Error: Not a valid version 4 database.
Solution: update BLAST.
# download and unpack the newer BLAST release
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.12.0+-x64-linux.tar.gz
tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz
# replace the BLAST executables in the conda image
# (run the loop from the directory that holds the new BLAST executables)
mkdir /opt/conda/envs/metawrap-env/bin/bak
for i in $(ls); do mv /opt/conda/envs/metawrap-env/bin/$i /opt/conda/envs/metawrap-env/bin/bak; cp $i /opt/conda/envs/metawrap-env/bin; done
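7. prokka errors
(1) blast version not recognized
prokka annotates assembled genomes and is a perl script. It requires blastp and makeblastdb at version 2.8 or later, but the check is flawed and does not accept my blast 2.12.0 (it considers 2.12 to be smaller than 2.8, which is what happens if the versions are compared as decimal numbers...).
I do not know perl and could not improve the check itself, so I simply changed every MINVER to 2.1: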
  'blastp' => {
    GETVER  => "blastp -version",
    REGEXP  => qr/blastp:\s+($BIDEC)/,
    MINVER  => "2.1",
    NEEDED  => 1,
  },
  'makeblastdb' => {
    GETVER  => "makeblastdb -version",
    REGEXP  => qr/makeblastdb:\s+($BIDEC)/,
    MINVER  => "2.1",
    NEEDED  => 0,  # only if --proteins used
  },
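The behaviour is consistent with the versions being compared as decimal numbers: 2.12 is smaller than 2.8 but larger than 2.1, which is why lowering MINVER works. Comparing field by field gives the intended answer. The sketch below is written in shell rather than perl and is not prokka's code: version_ge and decimal_ge are names made up here, and version_ge relies on the version sort (-V) of GNU sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 >= version $2, comparing the
# dot-separated fields numerically (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# The suspected failure mode: treat both versions as plain decimal numbers.
decimal_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
}

for candidate in 2.12 2.7; do
    if decimal_ge "$candidate" 2.8; then d=pass; else d=fail; fi
    if version_ge "$candidate" 2.8; then v=pass; else v=fail; fi
    echo "blast $candidate vs minimum 2.8: decimal=$d, field-wise=$v"
done
```

With a field-wise comparison the original MINVER of 2.8 could presumably be kept, so genuinely old BLAST versions would still be rejected.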