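bioawk is a superset of awk: it runs ordinary awk programs and adds parsers for common biological data formats.
Additional formats bioawk can parse (with -c <format>, fields can be referenced by the names listed below, e.g. $seq):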
$ bioawk -c help
- bed:
  1:chrom 2:start 3:end 4:name 5:score 6:strand 7:thickstart 8:thickend 9:rgb 10:blockcount 11:blocksizes 12:blockstarts
- sam:
  1:qname 2:flag 3:rname 4:pos 5:mapq 6:cigar 7:rnext 8:pnext 9:tlen 10:seq 11:qual
- vcf:
  1:chrom 2:pos 3:id 4:ref 5:alt 6:qual 7:filter 8:info
- gff:
  1:seqname 2:source 3:feature 4:start 5:end 6:score 7:filter 8:strand 9:group 10:attribute
- fastx:
  1:name 2:seq 3:qual 4:comment
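bed is a common format for chromosomal coordinates (genomic intervals).
sam is the format for alignment results.
vcf is the format for variant calls.
gff (feature annotation) is used less often.
fastx covers the sequence formats, both fasta and fastq.
Example
Keep only sequences of length 166: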
bioawk -c fastx 'length($seq)==166{print "@"$name"\n"$seq"\n+\n"$qual}' in.fq > out.fq
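
For comparison, here is a rough sketch of the same filter in plain awk, to show what -c fastx saves you from writing. It is not how bioawk works internally. The function name filter_fastq_by_length is hypothetical, and the sketch assumes strict 4-line FASTQ records (no wrapped sequence lines), which bioawk itself does not require.

```shell
# Hypothetical helper: keep FASTQ records whose sequence length equals $1.
# Assumption: every record is exactly 4 lines (header, sequence, "+", quality).
# Reads FASTQ on stdin, writes the kept records to stdout.
filter_fastq_by_length() {
    awk -v len="$1" '
        NR % 4 == 1 { header = $0 }   # @name line (kept verbatim, comment included)
        NR % 4 == 2 { seq = $0 }      # sequence line
        NR % 4 == 3 { plus = $0 }     # separator line
        NR % 4 == 0 {                 # quality line closes the record
            if (length(seq) == len)
                print header "\n" seq "\n" plus "\n" $0
        }
    '
}

# Usage: filter_fastq_by_length 166 < in.fq > out.fq
```

Unlike the bioawk command, this version copies the header and separator lines through unchanged, so any comment after the read name is preserved.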