[Bioperl-l] Re: GO dbxrefs in swissprot
henschel at mpi-cbg.de
Tue Jul 6 09:14:42 EDT 2004
Hilmar Lapp wrote:
> Pretty weird what you describe if it works for one entry but not
> another. Also, the DR lines don't look suspiciously different.
> If there's no direct reason that prevents you from doing so you should
> definitely upgrade to the 1.4.x series, possibly even to the latest
> version of the stable branch from CVS. There have been quite a few fixes
> in the meantime, some of which affect how sequences get loaded into
> biosql because they affect the annotation bundle.
I installed bioperl from CVS and reloaded the swissprot db into
BioSQL. The entries I have checked so far are apparently correct. For the
particular example, the problem was evidently a bug in the sequence
annotation parsing of version 1.2.1. Sorry for bothering you with a
version problem; I simply trusted the biosql installation instructions,
which claimed that a patched 1.2.1 would do.
What still puzzles me is the size of the database: starting from a 543MB
flatfile, the first run (with the faulty parser) gave me a 600MB database
and 9100 GO annotations. After the rerun with load_seqdatabase (...)
--lookup --remove I get a 1.1GB database but only 5100 GO annotations in
the dbxref table. Is this due to the normalization?
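To make the normalization question concrete, here is a toy sketch in Python with an in-memory SQLite database. It is not the real BioSQL instance: the two tables are simplified stand-ins modelled on BioSQL's dbxref and bioentry_dbxref tables, the add_xref helper is hypothetical, and the entries and the EMBL accession are made up for illustration. It only shows that a normalized dbxref table stores each (dbname, accession) pair once, so its row count can be much lower than the number of entry-to-GO annotations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-ins for BioSQL's dbxref and bioentry_dbxref tables
# (assumption: the real schema has more columns and constraints).
conn.executescript("""
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    dbname    TEXT NOT NULL,
    accession TEXT NOT NULL,
    UNIQUE (dbname, accession)
);
CREATE TABLE bioentry_dbxref (
    bioentry_id INTEGER NOT NULL,
    dbxref_id   INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    PRIMARY KEY (bioentry_id, dbxref_id)
);
""")


def add_xref(conn, bioentry_id, dbname, accession):
    """Hypothetical helper: store the cross-reference once, then link it to the entry."""
    conn.execute(
        "INSERT OR IGNORE INTO dbxref (dbname, accession) VALUES (?, ?)",
        (dbname, accession),
    )
    (dbxref_id,) = conn.execute(
        "SELECT dbxref_id FROM dbxref WHERE dbname = ? AND accession = ?",
        (dbname, accession),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO bioentry_dbxref (bioentry_id, dbxref_id) VALUES (?, ?)",
        (bioentry_id, dbxref_id),
    )


# Made-up annotations: three entries share one GO term, one entry has a second.
annotations = [
    (1, "GO", "GO:0005524"),
    (2, "GO", "GO:0005524"),
    (3, "GO", "GO:0005524"),
    (1, "GO", "GO:0016301"),
    (2, "EMBL", "X12345"),  # illustrative non-GO cross-reference
]
for bioentry_id, dbname, accession in annotations:
    add_xref(conn, bioentry_id, dbname, accession)

# Distinct GO cross-references stored in the normalized table.
unique_go = conn.execute(
    "SELECT COUNT(*) FROM dbxref WHERE dbname = 'GO'"
).fetchone()[0]

# GO annotations counted per entry, via the link table.
go_links = conn.execute(
    "SELECT COUNT(*) FROM bioentry_dbxref "
    "JOIN dbxref USING (dbxref_id) WHERE dbname = 'GO'"
).fetchone()[0]

print(unique_go, go_links)
```

If the real numbers behave like this, counting rows in the link table rather than in dbxref itself would be the comparison that tells whether annotations were actually lost.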
Is there a full list of the databases that can be parsed (GenBank, EMBL,
ENSEMBL?, PDB?, etc.) and the respective places to download them from?