[Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values

Lincoln Stein lstein at cshl.edu
Fri Feb 23 16:20:26 EST 2007


Excellent!

Lincoln

On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
>  Lincoln,
>
> OK.  I'll do that...
>
> ...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ ....
>
> ...ok - parse_attributes _looks_ right to me
>
> ...so, let's try it
>
> #load a feature into a new database:
>
> bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' \
>   -create -user test -pass test \
>   <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")
>
> #It loaded ok.  Now, let's print it out in GFF3:
>
> perl -MBio::DB::SeqFeature::Store -e 'foreach
> (Bio::DB::SeqFeature::Store->new(-dsn =>
> "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type
> => "PH:A")) {print $_->gff3_string . "\n"}'
> J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat
>
> #output looks good to me
>
> Note, I tried loading attributes foo=bar;foo=blat and it came back
> foo=bar,blat.  So, you can load either way.
>
> I'll commit later today.
>
> --Malcolm
>
>
>  ------------------------------
> *From:* lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] *On
> Behalf Of *Lincoln Stein
> *Sent:* Friday, February 23, 2007 11:16 AM
> *To:* Cook, Malcolm
> *Cc:* bioperl list; lstein at cshl.org
> *Subject:* Re: Bio::DB::SeqFeature to GFF mishandles attributes with
> multiple values
>
> Hi Malcolm,
>
> You're quite right, and I appreciate your work in tracking down and fixing
> it. Before you commit the patch, can you confirm that the loader is working
> correctly so that comma-separated values are read back into the data
> structure as multiple attributes?
>
> Lincoln
>
> On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
> >
> > Lincoln, and other Bio::DB::SeqFeature wanderers:
> >
> > I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
> > does not respect the following:
> >
> > "Multiple attributes of the same type are indicated by separating the
> > values with the comma "," character"  (cf.
> > http://www.sequenceontology.org/gff3.shtml)
> >
> > This one-liner demonstrates the problem:
> >
> > perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
> > "J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A",
> > -name => "mec", -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
> > J       A       PH      1       2       .       .       .       foo=bar;foo=blat;Name=mec
> >
> > Do you agree this is a problem?
> >
> > The fix is in the post-sig patch to
> > /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
> > stylistic privilege of promoting any ID, Parent, or Name attribute to
> > the front of column 9, so output is now:
> >
> > J       A       PH      1       2       .       .       .       Name=mec;foo=bar,blat
> >
> > Do you agree this is better?
> >
> > I am poised to commit it, as well as the functionally identical patch to
> > the equivalent function in Bio/Graphics/FeatureBase.pm
> >
> > All clear?
> >
> > -- Malcolm Cook
> >
> >
> >
> > *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
> > --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
> > ***************
> > *** 481,494 ****
> >       next if $t eq 'load_id';
> >       next if $t eq 'parent_id';
> >       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> > !
> > !     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
> >     }
> >     my $id   = $self->primary_id;
> >     my $name = $self->display_name;
> > !   push @result,"ID=".$self->escape($id)                     if defined $id;
> > !   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
> > !   push @result,"Name=".$self->escape($name)                 if defined $name;
> >     return join ';',@result;
> >   }
> >
> > --- 481,498 ----
> >       next if $t eq 'load_id';
> >       next if $t eq 'parent_id';
> >       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> > !
> > !     #push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
> > !     # NO! Multiple attributes of the same type are indicated by
> > !     # separating the values with the comma "," character - per
> > !     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
> > !     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
> >     }
> >     my $id   = $self->primary_id;
> >     my $name = $self->display_name;
> > !   unshift @result,"ID=".$self->escape($id)                     if defined $id;
> > !   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
> > !   unshift @result,"Name=".$self->escape($name)                 if defined $name;
> >     return join ';',@result;
> >   }
> >
> >
> >
> >
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
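The formatting rule behind the quoted patch (one tag=value1,value2 pair per tag, with Name, Parent and ID moved to the front of column 9) can be sketched on its own. This is only an illustration under stated assumptions: format_column9 and escape_gff3 are hypothetical names, not the BioPerl methods, and the set of escaped characters is my reading of the GFF3 spec rather than what $self->escape actually does.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the BioPerl escape method): percent-encode
# characters that are reserved inside GFF3 column 9.
sub escape_gff3 {
    my ($text) = @_;
    $text =~ s/([\t\n\r%&=;,])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: turn { tag => [values...] } into a column 9 string.
sub format_column9 {
    my ($attr) = @_;

    # Name, Parent and ID go first, in the order the patched code yields.
    my @front    = grep { exists $attr->{$_} } qw(Name Parent ID);
    my %is_front = map  { $_ => 1 } @front;
    my @others   = grep { !$is_front{$_} } sort keys %$attr;

    my @result;
    for my $tag (@front, @others) {
        my @values = @{ $attr->{$tag} };
        next unless @values;
        # One pair per tag; multiple values are joined with commas.
        push @result, join '=', escape_gff3($tag),
                                join(',', map { escape_gff3($_) } @values);
    }
    return join ';', @result;
}

print format_column9({ Name => ['mec'], foo => [qw(bar blat)] }), "\n";
```

Escaping each value before joining matters: a literal comma inside a value becomes %2C, so it cannot be confused with the separator between values.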
>
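The round trip checked above (both foo=bar,blat and foo=bar;foo=blat load as two values of foo) can be sketched the same way. Again this is a rough illustration, not the loader's real parse_attributes: parse_column9 and unescape_gff3 are hypothetical names, and whitespace trimming and error handling are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: undo GFF3 percent-encoding.
sub unescape_gff3 {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $text;
}

# Hypothetical helper: parse a column 9 string into { tag => [values...] }.
sub parse_column9 {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $tag && defined $value;
        # Split on commas before unescaping, so an escaped %2C inside a
        # value is not mistaken for a separator. Repeated tags accumulate.
        push @{ $attr{ unescape_gff3($tag) } },
             map { unescape_gff3($_) } split /,/, $value;
    }
    return \%attr;
}

for my $input ('foo=bar,blat;Name=mec', 'foo=bar;foo=blat;Name=mec') {
    my $attr = parse_column9($input);
    print "$input => foo has values: @{ $attr->{foo} }\n";
}
```

Because values are pushed onto a per-tag list, the comma form and the repeated-tag form end up as the same data structure.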



