[Bioperl-l] bp_seqfeature_load of latest Flybase GFF annotation fails due to data inconsistency.
MEC at stowers-institute.org
Tue Jan 9 14:38:48 EST 2007
bash> bp_seqfeature_load.PLS --fast --dsn
'dbi:mysql:database=dmel_r5_1;host=mysql-dev' --create --noverbose <(
(note: `flygenegff` used above sorts and filters the GFF input so that
the GFF features are loaded in order needed: gene before mRNA before
This worked fine with the previous release of FlyBase, but now I get:
------------- EXCEPTION -------------
MSG: FBtr0110936 doesn't have a primary id
And indeed, inspecting the data shows that FBtr0110936 is an example of
a FlyBase transcript identifier that is annotated as one of the
multiple parents of exons but that does not itself have an entry in
`grep FBtr0110936 dmel_r5.1/gff/*.gff` returns only exon features (no
gene, CDS, UTR, or mRNA)
... whereas grepping for any of the other three transcripts listed
as parents of those exons yields the expected additional features of type
mRNA, protein, CDS, etc.
By the way, this data bug also shows up when searching the FlyBase
website (FB2006_01, released December 8, 2006) for transcript
"ERROR: report for FBtr0110936 not found"
Can anyone tell me what causes this data problem, and whether it is
widespread (i.e., are there other transcripts listed as exon parents
that do not have their own feature)?
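One way to check how widespread the problem is would be to scan the GFF for every Parent value that never appears as an ID. The following is only a rough sketch in Python, not part of BioPerl: the function names and the sample records are made up for illustration (only FBtr0110936 comes from the real data), and it assumes standard GFF3 column 9 syntax (`tag=value` pairs separated by `;`, multiple parents separated by `,`).

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 attribute column into a dict of tag -> list of values."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" not in field:
            continue
        tag, _, value = field.partition("=")
        # Literal commas inside a value are percent-encoded in GFF3,
        # so split on "," first and decode afterwards.
        attrs[tag.strip()] = [unquote(v) for v in value.split(",")]
    return attrs


def find_orphan_parents(lines):
    """Return {parent_id: number of child features} for parents with no ID line."""
    defined = set()
    referenced = {}
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        defined.update(attrs.get("ID", []))
        for parent in attrs.get("Parent", []):
            referenced[parent] = referenced.get(parent, 0) + 1
    return {p: n for p, n in referenced.items() if p not in defined}


# Hypothetical records: FBgn_demo and FBtr_demo are invented identifiers.
SAMPLE = "\n".join([
    "##gff-version 3",
    "2L\tdemo\tgene\t100\t900\t.\t+\t.\tID=FBgn_demo",
    "2L\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=FBtr_demo;Parent=FBgn_demo",
    "2L\tdemo\texon\t100\t300\t.\t+\t.\tParent=FBtr_demo,FBtr0110936",
    "2L\tdemo\texon\t500\t900\t.\t+\t.\tParent=FBtr_demo,FBtr0110936",
])

orphans = find_orphan_parents(SAMPLE.splitlines())
print(orphans)  # {'FBtr0110936': 2}
```

To run it against the real files, pass an open file handle (or several, chained) instead of `SAMPLE.splitlines()`. The order of features does not matter here, because the comparison happens only after all lines have been read.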
I am trying to load this latest FlyBase GFF into Lincoln Stein's
Bio::DB::SeqFeature database (using bp_seqfeature_load), but the load
fails due to this data problem. Any recommendations or workarounds for
this issue are welcome.
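One possible workaround, sketched below in Python, is to pre-filter the GFF so that Parent values pointing at nonexistent features are removed before the file reaches the loader. This is an untested assumption on my part, not a documented bp_seqfeature_load feature: I have not confirmed that the loader then succeeds, and silently dropping parent links changes the annotation, so it should be treated as a stopgap. The function names and sample records are hypothetical.

```python
def collect_ids(lines):
    """Collect every ID= value found in column 9 of the feature lines."""
    ids = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        for field in cols[8].split(";"):
            tag, _, value = field.partition("=")
            if tag.strip() == "ID":
                ids.update(value.split(","))
    return ids


def drop_orphan_parents(lines):
    """Return the lines with unresolvable Parent values removed."""
    lines = list(lines)  # two passes are needed: IDs first, then rewriting
    known = collect_ids(lines)
    out = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line.rstrip("\n"))  # pass non-feature lines through
            continue
        fields = []
        for field in cols[8].split(";"):
            tag, _, value = field.partition("=")
            if tag.strip() == "Parent":
                kept = [p for p in value.split(",") if p in known]
                if not kept:
                    continue  # no valid parent left: feature becomes top-level
                field = "Parent=" + ",".join(kept)
            fields.append(field)
        cols[8] = ";".join(fields) or "."  # "." marks an empty column
        out.append("\t".join(cols))
    return out


# Hypothetical records: only FBtr0110936 comes from the real data.
DEMO = [
    "##gff-version 3",
    "2L\tdemo\tgene\t100\t900\t.\t+\t.\tID=FBgn_demo",
    "2L\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=FBtr_demo;Parent=FBgn_demo",
    "2L\tdemo\texon\t100\t300\t.\t+\t.\tParent=FBtr_demo,FBtr0110936",
    "2L\tdemo\texon\t500\t900\t.\t+\t.\tParent=FBtr0110936",
]

for cleaned_line in drop_orphan_parents(DEMO):
    print(cleaned_line)
```

An exon whose only parent is missing ends up with no Parent attribute at all, which makes it a top-level feature. Dropping such exons entirely would be the other obvious choice; which is preferable depends on what the database is used for.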
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri