[Bioperl-l] load_seqdatabase.pl does not like fasta format

Andy Hammer facemann at yahoo.com
Mon Jun 14 19:03:40 EDT 2004

Thanks all, 

Marc, your one line of code:
was all that it took to make the script work.  But now
I am a bit more aware of how biosql is storing the

Thank you for the lesson!


--- Marc Logghe <Marc.Logghe at devgen.com> wrote:
> Hi Andy,
> > your fasta sequence was 'unknown'. Since the triple of
> > (accession,version,namespace) is constrained by and used as a unique
> > key, and given that fasta doesn't provide version numbers, your
> > sequences will all be considered identical if the accession is
> > 'unknown' for all of them. I.e., after the first one is inserted, the
> > second one and all others will fail to insert.
> That is because when you load from fasta, the seqID
> goes into the bioperl display_name slot and finally
> into the biosql name field.
> The accession number (bioperl accession_number slot)
> is empty and set to 'unknown' by default. As this slot
> ends up in the accession field in the biosql schema,
> you run into trouble because EVERY accession
> will be 'unknown'.
> I solved this by adding a --pipeline argument (e.g.
> Bio::SeqProcessor::Accession) with a really simple
> SeqProcessor that copies the display_name into the
> accession_number slot:
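> package Bio::SeqProcessor::Accession;
> use strict;
> use vars qw(@ISA);
> use Bio::Seq::BaseSeqProcessor;
>
> # Inherit from the base processor so the module works with --pipeline.
> @ISA = qw(Bio::Seq::BaseSeqProcessor);
>
> # Copy the FASTA ID into the accession slot so each sequence gets
> # a distinct (accession,version,namespace) key.
> sub process_seq {
>     my ($self, $seq) = @_;
>     my $display_id = $seq->display_id;
>     $seq->accession_number($display_id);
>     return ($seq);
> }
>
> 1;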
> HTH,
> Marc
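
For anyone who wants to see the key collision without a database, here is a minimal sketch in plain Perl. It is only an illustration of the behaviour described above, not the biosql or bioperl-db implementation: load_records, the 'mydb' namespace, the sequence names, and the placeholder version 0 are all hypothetical, and a hash stands in for the unique key on (accession,version,namespace).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the biosql unique key on
# (accession, version, namespace). No database or BioPerl is involved.
sub load_records {
    my ($records, $copy_name_to_accession) = @_;
    my (%seen, @loaded, @rejected);
    for my $rec (@$records) {
        # FASTA carries no accession, so it defaults to 'unknown'
        # unless the display name is copied into the accession slot.
        my $accession = $copy_name_to_accession ? $rec->{name} : 'unknown';
        # Version 0 is an assumed placeholder; FASTA provides no version.
        my $key = join "\t", $accession, 0, $rec->{namespace};
        if ($seen{$key}++) {
            push @rejected, $rec->{name};   # duplicate key: insert would fail
        } else {
            push @loaded, $rec->{name};
        }
    }
    return (\@loaded, \@rejected);
}

my @records = map { +{ name => $_, namespace => 'mydb' } } qw(seqA seqB seqC);

my ($ok, $bad) = load_records(\@records, 0);
printf "without fix: %d loaded, %d rejected\n", scalar @$ok, scalar @$bad;

($ok, $bad) = load_records(\@records, 1);
printf "with fix:    %d loaded, %d rejected\n", scalar @$ok, scalar @$bad;
```

With every accession left as 'unknown', only the first record gets through. Once the name is copied into the accession, each record has its own key and all of them load.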
