[Bioperl-l] How should sequence fetching fail?

Heikki Lehvaslaiho heikki at nildram.co.uk
Sat Apr 10 09:07:22 EDT 2004

When I started to make the changes, the problem turned out to be slightly
more complicated than I had anticipated.
First, having refreshed my memory of the BioFetch spec: for a nonexistent ID
it should be returning an EXCEPTION. The implementation catches that and
prints it out as a warning that I could not turn off. I am not worried by
this discrepancy because it is consistent with the behaviour of the other
modules.
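The catch-and-warn behaviour described above can be sketched in plain Perl. This is a minimal illustration under stated assumptions, not the Bio::DB::BioFetch code: `backend_fetch` and `fetch_or_warn` are hypothetical names, and the one-entry store is made-up data standing in for a server that throws an EXCEPTION for an unknown ID.

```perl
use strict;
use warnings;

# Hypothetical back end that throws (dies) for an unknown ID, standing in
# for a server that reports an EXCEPTION. The data is made up.
my %STORE = (P12345 => 'MKTAYIAKQR');

sub backend_fetch {
    my ($id) = @_;
    die "EXCEPTION: ID '$id' not found\n" unless exists $STORE{$id};
    return $STORE{$id};
}

# Hypothetical wrapper: catch the exception, report it as a warning,
# and return undef (in scalar context) instead of dying.
sub fetch_or_warn {
    my ($id) = @_;
    my $seq;
    my $ok = eval { $seq = backend_fetch($id); 1 };
    unless ($ok) {
        my $err = $@ || "unknown error\n";
        warn "Could not retrieve $id: $err";
        return;
    }
    return $seq;
}
```

With a wrapper like this the caller never needs its own `eval`: it writes `my $seq = fetch_or_warn($id);` and simply tests whether `$seq` is defined.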

The WARNING from GenBank turned out to be due to the parser
(Bio::SeqIO::genbank, line 239): next_seq() returns undef if the first line
does not start with 'LOCUS'. This check was missing from the EMBL and
SWISS-PROT parsers, which instead threw an error on a malformed ID line.
My understanding is that the check helps quick streaming of entries from the
NCBI server. Since we get EMBL and SWISS-PROT entries from the EBI BioFetch
server, and a failed request returns a first line starting with "ERROR", that
line was passed to the parser, which threw an error. I've now modified the
parsers, and the situation looks like this:

Bio::DB::BioFetch    WARNING
Bio::DB::GenBank     WARNING
Bio::DB::GenPept     WARNING
Bio::DB::SwissProt   WARNING
Bio::DB::RefSeq      WARNING
Bio::DB::EMBL        WARNING
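The parser change can be illustrated with a toy reader. This is a minimal sketch, not Bio::SeqIO code: `next_entry_id` is a hypothetical name, and only the guard is shown. It returns undef when the first line of the stream is not an `ID` line instead of throwing, so an "ERROR" reply from the server simply yields no entry.

```perl
use strict;
use warnings;

# Hypothetical minimal reader for EMBL/SWISS-PROT-style text.
# Returns the entry's ID, or undef if the stream does not start
# with an "ID" line (e.g. a server reply beginning with "ERROR").
sub next_entry_id {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;              # end of input
    return unless $line =~ /^ID\s+([^\s;]+)/; # not an entry: give up quietly
    my $id = $1;
    # Skip the rest of this entry, up to and including the "//" terminator.
    while (my $l = <$fh>) {
        last if $l =~ m{^//};
    }
    return $id;
}
```

The real parsers do far more than this; the point is only that a non-entry first line leads to a quiet undef, matching the GenBank parser's 'LOCUS' check, rather than an exception.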


On Monday 05 Apr 2004 11:55, Heikki Lehvaslaiho wrote:
> Last week Web Barris asked more questions about sequence retrieval.
> I had a look at how the different modules behave when retrieval fails due
> to a nonexistent ID. The responses can be summarised as follows:
> Bio::DB::BioFetch    WARNING
> Bio::DB::GenBank     WARNING
> Bio::DB::GenPept     WARNING
> Bio::DB::SwissProt   EXCEPTION
> Bio::DB::RefSeq      WARNING
> Bio::DB::EMBL        EXCEPTION
> I suggest that we treat this situation as an error that needs to be fixed
> in both the development CVS HEAD and the 1.4 branch. All modules should
> print a warning (rather than die with an error) and return undef when
> retrieval fails. It is then up to the user to test whether the sequence
> variable got assigned. This is the functionality defined in the OBDA (Open
> Bioinformatics Database Access) specs and implemented in Bio::DB::BioFetch.
> User code will always look something like this:
> my $db = Bio::DB::SeqRetrievalClass->new;  # placeholder, e.g. Bio::DB::GenBank
> for my $id (@ids) {
> 	my $seq = $db->get_Seq_by_id($id);
> 	if ($seq) {
> 		# do what you wanted
> 	} else {
> 		# skip and keep log
> 	}
> }
> Unless I hear any strong differing opinions within a day or two, I'll
> commit the necessary changes. The critical question here is: will this
> break any existing code?
> 	-Heikki

______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
