Module:Bio::SearchIO

From BioPerl

This is a factory module for plugging in different parsers for pairwise alignment search reports. It produces Bio::Search::Result::ResultI compliant objects, which in turn contain Bio::Search::Hit::HitI compliant objects, and these contain Bio::Search::HSP::HSPI compliant objects.

Supported formats

This module can parse the results of many different pairwise alignment search programs.

Application | Bio::SearchIO module | sub-formats | comments
BLAST | Bio::SearchIO::blast | WU BLAST, NCBI BLAST, bl2seq, rpsblast, psiblast, phiblast, TimeLogic BLAST, UCSC BLAST-like | This supports many different BLAST flavors
Tabular BLAST | Bio::SearchIO::blasttable | -m9 and -m8 (NCBI) or -mformat 2 and -mformat 3 (WU-BLAST) tabular format | This is the tab-delimited column format from NCBI BLAST or WU BLAST
Megablast BLAST | Bio::SearchIO::megablast | Format 0 | Format 2 should be parseable as standard BLAST output.
XML BLAST | Bio::SearchIO::blastxml | NCBI XML format | NCBI's XML DTD
FASTA | Bio::SearchIO::fasta | -m 9 -d 0 OR the default -m 1 options | Default or compact formats
HMMER | Bio::SearchIO::hmmer | hmmsearch and hmmpfam output parsed | Domains are Hits and alignments fall into the HSP category. Note that query-subject relationship flips between hmmsearch and hmmpfam (i.e. the domain is subject in hmmpfam and query in hmmsearch)
Exonerate | Bio::SearchIO::exonerate | CIGAR and VULGAR formats | Only one of CIGAR or VULGAR should be parsed (provide the -vulgar => 1 or -cigar => 1 when initializing the object)
Genewise, Genomewise | Bio::SearchIO::wise | -genesf output | This parses the -genesf or -genes output from Genewise and Genomewise
Sim4 | Bio::SearchIO::sim4 | A={0,1,3,4} output | Cannot parse LAV or 'exon file' formats (A=2 or A=5)
PSL alignment format, BLAT, UCSC tools | Bio::SearchIO::psl | psl format with or without -noHead option |
AXT alignment format, BLAT, UCSC tools | Bio::SearchIO::axt | Directly produced from blat or see the lavToAxt | If you run BLASTZ you can convert the LAV alignment format to AXT alignment format with lavToAxt
WABA | Bio::SearchIO::waba | This processes the waba output (not the human readable portion) | The format is a lot like AXT but has four lines per alignment block
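
Selecting a parser amounts to passing the last component of the module name as the -format argument when constructing the Bio::SearchIO object. The following is a minimal sketch, assuming BioPerl is installed; the file names are hypothetical placeholders, and the -vulgar flag is the one noted in the Exonerate entry above.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical input files; substitute the paths to your own reports.

# Tabular BLAST output (NCBI -m8 / -m9) is handled by Bio::SearchIO::blasttable.
my $blast_tab = Bio::SearchIO->new(
    -format => 'blasttable',
    -file   => 'hits.m8',
);

# Exonerate output: ask for exactly one of VULGAR or CIGAR.
my $exonerate = Bio::SearchIO->new(
    -format => 'exonerate',
    -file   => 'alignments.exonerate',
    -vulgar => 1,
);
```

Both objects are then read in the same way, whatever the underlying report format.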

The Result, Hit, and HSP objects represent the three components of a BLAST or FASTA pairwise database search result. They can be thought of as a hierarchy:

  • Result - a container object for a given query sequence. There is one Result for every query sequence in a database search.
    • Hit - a container object for each sequence identified as similar to the query sequence. It contains HSPs.
      • HSP - represents the alignment of the query and hit sequences. BLAST results can have multiple HSPs per hit, while FASTA results have only one. The HSP object holds the start and end of the alignment on both the query and the subject.
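
The hierarchy maps onto three nested loops. The following is a minimal sketch, assuming BioPerl is installed and that a standard BLAST report exists at the hypothetical path report.bls; the fields printed are only a small selection of what each object offers.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# 'report.bls' is a hypothetical file name; point this at your own BLAST report.
my $in = Bio::SearchIO->new(
    -format => 'blast',
    -file   => 'report.bls',
);

# One Result per query sequence in the report.
while ( my $result = $in->next_result ) {
    printf "Query: %s\n", $result->query_name;

    # One Hit per database sequence found to be similar to the query.
    while ( my $hit = $result->next_hit ) {
        printf "  Hit: %s\n", $hit->name;

        # One or more HSPs per Hit, each with query and hit coordinates.
        while ( my $hsp = $hit->next_hsp ) {
            printf "    HSP: query %d-%d, hit %d-%d, e-value %s\n",
                $hsp->start('query'), $hsp->end('query'),
                $hsp->start('hit'),   $hsp->end('hit'),
                $hsp->evalue;
        }
    }
}
```

Because every parser returns the same ResultI, HitI, and HSPI interfaces, the loop body should not need to change when the -format argument does, although some formats may leave certain fields undefined.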