[Bioperl-l] Error while running load_seqdatabase.pl

Hilmar Lapp hlapp at gmx.net
Mon Jan 8 23:11:38 EST 2007


George,

this is almost certainly caused by using FASTA format and Bioperl's
treatment of it. I am guilty of not having written a FAQ for
Bioperl-db yet; this would certainly be in it.

Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl
uses Bioperl to parse sequence files) does not extract an accession
number from the description line of a fasta sequence; instead, the
accession_number property of the sequence objects it creates is left
at its default value, "unknown". Since there is a unique key
constraint on (accession, version, namespace), every sequence loaded
from the file ends up with the same key, and the second one raises an
exception because it violates the constraint.
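To see this behavior directly, the following minimal Perl sketch (assuming BioPerl is installed) parses two made-up FASTA records, whose headers are modeled on the one in the error message, and prints the display ID and accession number Bioperl reports for each. Both accessions should come back as "unknown", which is why the second database insert collides on the unique key.

```perl
#!/usr/bin/perl
# Sketch: what does Bio::SeqIO's fasta parser put into accession_number?
use strict;
use warnings;
use Bio::SeqIO;

# Two made-up records; headers modeled on the one in the error message
my $fasta = <<'END';
>FGENESHT0000001||AC155633|570|4400|1
ATGCATGCATGC
>FGENESHT0000002||AC155633|5000|6100|1
GGCCTTAAGGCC
END

# Read from an in-memory string instead of a file on disk
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @seen;
while (my $seq = $in->next_seq) {
    # display_id is the first word of the header line; accession_number
    # is never set by the parser, so it reports its default value
    push @seen, [ $seq->display_id, $seq->accession_number ];
    printf "%s\t%s\n", $seq->display_id, $seq->accession_number;
}
```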

The simplest way to deal with this is to write a SeqProcessor that
sets the accession_number of each sequence to a value that is unique
within the namespace, and then to supply that module to
load_seqdatabase.pl using the --pipeline command line switch.
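A minimal sketch of such a SeqProcessor, assuming BioPerl's Bio::Seq::BaseSeqProcessor base class, where a subclass overrides process_seq, which receives one sequence and returns a list of sequences. The module name FastaAccessionProcessor is hypothetical, and copying the display ID into the accession is only one possible choice: it works only if the first word of every FASTA header is unique across everything loaded into that namespace.

```perl
# Hypothetical module name; save as FastaAccessionProcessor.pm
package FastaAccessionProcessor;
use strict;
use warnings;
use base qw(Bio::Seq::BaseSeqProcessor);

# Called for each sequence pulled from the source stream; must return
# a list of zero or more sequence objects.
sub process_seq {
    my ($self, $seq) = @_;
    # Assumption: the FASTA identifier (first word of the header) is
    # unique, so using it as the accession gives each record a
    # distinct (accession, version, namespace) key in BioSQL.
    $seq->accession_number($seq->display_id);
    return ($seq);
}

1;
```

Saved in a directory on Perl's module path (for example one listed in PERL5LIB), the module would then be named on the command line, e.g. by adding --pipeline FastaAccessionProcessor to the load_seqdatabase.pl invocation. If only part of the header should serve as the accession (such as the field before the first "|"), process_seq is the place to extract it.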

There are several examples of how to do this in the email archives.
See for example this thread on the Biosql list:

http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html

which contains two links to examples; Marc Logghe gives another one
in the thread itself.

Hth,

	-hilmar

On Jan 8, 2007, at 3:17 PM, George Heller wrote:

> Hi all.
>
>   I am new to Bioperl and am trying to run the load_seqdatabase.pl
> script to load sequence data from a file into a Postgres database. I
> am invoking the script through the following command:
>
>   perl load_seqdatabase.pl -host localhost -dbname biodb06 -format fasta -dbuser postgres -driver Pg <name of file>
>
>   I am getting the following error:
>
>   -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","unknown","","0","") FKs (1,<NULL>)
> ERROR:  duplicate key violates unique constraint "bioentry_accession_key"
>   ---------------------------------------------------
> Could not store unknown:
> ------------- EXCEPTION  -------------
> MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR:  current transaction is aborted, commands ignored until end of transaction block
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:271
> STACK (eval) load_seqdatabase.pl:620
> STACK toplevel load_seqdatabase.pl:602
>   --------------------------------------
>    at load_seqdatabase.pl line 633
>
>   Can anyone tell me how I can correct this error and get my script  
> running? Thanks!!!
>
>   George.
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================