[Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation

Lincoln Stein lstein at cshl.edu
Tue Dec 19 16:58:24 EST 2006

Hi Malcom,

Your second guess was right. The use case of augmenting an existing gene
with additional splice forms isn't provided for. You can get the
functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

my @genes = $db->get_features_by_name('FBgn0017545');
@genes == 1 or die "Didn't get exactly one gene";
my $parent = $genes[0];

my $parent = $genes[0];
my $chr    = $parent->seq_id;
my $start  = $parent->start;
my $end    = $parent->end;
my $strand = $parent->strand;

my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                       -source      => 'added',
                       -seq_id   => '4r',
                       -strand   => $strand,
                       -start    => $start+10,
                       -end      => $end,

for my $pos ([$start+10,$start+100],[$start+200,$end]) {
  my ($e_start,$e_end) = @$pos;
  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                      -store       => $db,
                      -seq_id      => '4r',
                      -strand     => $strand,
                      -start       => $e_start,
                      -end         => $e_end);

I found a bug in updating the seqfeature database when I wrote this script,
so you'll have to get the latest biperl live. I think you can use this to
write a splice form updating script.

In order to support the idea of adding new splice forms to an existing gene
using the GFF3 format, I will have to either modify the loader, or write a
separate script (probably better to do the latter). It shouldn't be hard if
you'd like to give it a try.


On 12/19/06, Cook, Malcolm <MEC at stowers-institute.org> wrote:
> Lincoln and fellow Bio::DB::SeqFeature travelers,
> I find that using bp_seqfeature_load.PLS to load subfeatures of genes
> already loaded using bp_seqfeature_load.PLS fails with
> ------------- EXCEPTION  -------------
> MSG: FBgn0017545 doesn't have a primary id
> Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
> STACK toplevel
> /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo
> ad.PLS:76
> Where FBgn0017545 is the ID of a gene previously loaded.
> I am unsure how to remedy my situation and welcome any advise on correct
> or improved approach to my problem.
> Here's some detail if it helps.  I am developing a pipeline to design a
> microarray probes capable of distinguishing among splice variants in
> drosophila (using latest Flybase dmel_r5.1 annotation).  So I
> 1) load a filtered selection of Flybase annotation using
> bp_seqfeature_load.  (for testing purposes, I am using a single gene's
> worth of annotation, FBgn0017545.gff, attached).  This is done as
> follows:
>         > bp_seqfeature_load.PLS  --create FBgn0017545.gff
> 2) analyze all the genes in the database, and create GFF3 output each
> feature of which has a 'Parent' that is a previously loaded gene (i.e.
> FBgn0017545).  (These features represent the unique introns, splice
> sites, and exonic design targets. Output of this analysis,
> FBgn0017545_matd.gff, is also attached)
> 3) load these analysis results into the same database, as follows:
>         > bp_seqfeature_load.PLS          FBgn0017545_matd.gff
> It is at this point that I get the above error.
> However, I don't get any error and the data loads fine if I load the two
> files together, as follows:
>         > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff
> FBgn0017545_matd.gff)
> So, I suspect that either I am misunderstanding when/how to use
> bp_seqfeature_load.PLS or else this use case has not yet arisen and must
> be provided for somehow.
> I am running against bioperl-live
> Thanks for your thoughts and assistance,
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri

Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
SANDRA MICHELSEN, AT michelse at cshl.edu

More information about the Bioperl-l mailing list