[Bioperl-l] bp_seqfeature_load of latest Flybase GFF annotation fails due to data inconsistency.

Cook, Malcolm MEC at stowers-institute.org
Wed Jan 10 15:31:26 EST 2007


Aloha,

For those tracking this (or otherwise lurking), Flybase has released new
versions of the dmel_r5_1 GFF files that remove the data problem.

--Malcolm  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Cook, Malcolm
> Sent: Tuesday, January 09, 2007 1:39 PM
> To: bioperl list; Blanchette, Marco
> Subject: [Bioperl-l] bp_seqfeature_load of latest Flybase GFF 
> annotation fails due to data inconsistency.
> 
> 
> Drat!
> 
> bash> bp_seqfeature_load.PLS --fast \
>   --dsn 'dbi:mysql:database=dmel_r5_1;host=mysql-dev' \
>   --create --noverbose \
>   <( flygenegff ./flybase.net/genomes/Drosophila_melanogaster/dmel_r5.1/gff/*.gff )
> 
> 
> (note: `flygenegff`, used above, sorts and filters the GFF input so that
> the GFF features are loaded in the order needed: gene before mRNA before
> exon)
> 
> This worked fine with the last release of Flybase.  But now I get:
> 
> ------------- EXCEPTION  -------------
> MSG: FBtr0110936 doesn't have a primary id
> STACK
> Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
> STACK toplevel
> /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76
> 
> And indeed, sleuthing the data proves that FBtr0110936 is an example of
> a Flybase transcript identifier that is annotated as being one of the
> multiple parents of exons but that does not itself have an entry in
> Flybase!
> 
> Proof: 
> 
> `grep FBtr0110936 dmel_r5.1/gff/*.gff` returns only exon features (no
> gene, CDS, UTR, or mRNA)
> 
> ... whereas grepping for any of the other three transcripts mentioned
> as parents of those exons yields the expected additional features of
> type mRNA, protein, CDS, etc.
> 
> By the way, this data-bug manifests itself when searching the Flybase
> website (FB2006_01, released December 8, 2006) for transcript
> FBtr0110936 as:
> 
> "ERROR: report for FBtr0110936 not found"
> 
> I wonder if anyone can tell me what causes this data problem, and tell
> me whether it is ubiquitous (i.e., are there other transcripts mentioned
> as exon parents that do not have their own feature)?
> 
> I am trying to load this latest Flybase GFF into Lincoln Stein's
> Bio::DB::SeqFeature database (using bp_seqfeature_load) but the load
> fails due to this data problem. Any recommendations or workarounds for
> this issue are quite welcome.
> 
> 
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>  
> 

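To answer the "is it ubiquitous?" question for a given release, one could scan the GFF files for every Parent= value that never appears as an ID= anywhere in the input. The following is only a minimal Python sketch under some assumptions: the files are tab-delimited GFF3 with ID and Parent in column 9, multiple parents are comma-separated, and attribute values need no URL-decoding. The function names (parse_attributes, find_orphan_parents, gff_lines) are made up for illustration and are not part of BioPerl or any Flybase tool.

```python
import sys
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a {key: [values]} dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        # Multi-valued attributes such as Parent are comma-separated.
        attrs[key.strip()] = [v for v in value.strip().split(",") if v]
    return attrs


def find_orphan_parents(lines):
    """Map each Parent that is never defined as an ID to the types of its children."""
    defined = set()
    referenced = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # directives, comments, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line (e.g. embedded FASTA)
        attrs = parse_attributes(cols[8])
        defined.update(attrs.get("ID", []))
        for parent in attrs.get("Parent", []):
            referenced[parent].append(cols[2])
    # Compare only after reading everything, so feature order does not matter.
    return {p: kinds for p, kinds in referenced.items() if p not in defined}


def gff_lines(paths):
    """Yield feature lines from several GFF files, stopping at any ##FASTA section."""
    for path in paths:
        with open(path) as handle:
            for line in handle:
                if line.startswith("##FASTA"):
                    break
                yield line


if __name__ == "__main__":
    # IDs are pooled across all files named on the command line.
    orphans = find_orphan_parents(gff_lines(sys.argv[1:]))
    for parent, kinds in sorted(orphans.items()):
        print(f"{parent}\treferenced by {len(kinds)} feature(s): {','.join(sorted(set(kinds)))}")
```

Saved under a hypothetical name such as find_orphans.py, this could be run as `python find_orphans.py dmel_r5.1/gff/*.gff`. With the original r5.1 files one would expect FBtr0110936 to show up, referenced only by exon features, and any other transcripts with the same problem would be listed alongside it.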

