[Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq

Scott Cain cain at cshl.edu
Thu Dec 7 17:46:09 EST 2006


I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame, it is the phase.
See the GFF3 spec for a description of what the phase is:


(It doesn't matter if you are using GFF3 or GFF2, as the phase is the
same in both).
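
To illustrate what the phase means, here is a minimal sketch in plain Perl
(no BioPerl needed). As I read the GFF3 spec, the phase is the number of
bases to skip at the start of a CDS feature before the first complete codon.
In a spliced sequence, only the phase of the first coding exon should need
trimming. The helper name codons_from_phase and the example sequence are
made up for illustration; they are not part of BioPerl or of your data.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): split a spliced coding
# sequence into codons, honouring the GFF phase of the first CDS segment.
# Phase = number of bases to skip before the first complete codon.
sub codons_from_phase {
    my ($spliced, $phase) = @_;
    die "phase must be 0, 1 or 2\n"
        unless defined $phase && $phase =~ /^[012]$/;
    my $in_frame = substr($spliced, $phase);
    # Keep complete codons only; a trailing partial codon is dropped.
    return $in_frame =~ /(...)/g;
}

# Made-up example: a partial gene whose first CDS starts mid-codon.
my $spliced = 'GATGGCCAAATAA';
print join(' ', codons_from_phase($spliced, 0)), "\n";   # GAT GGC CAA ATA
print join(' ', codons_from_phase($spliced, 1)), "\n";   # ATG GCC AAA TAA
```

If I remember the BioPerl API correctly, Bio::Tools::GFF stores column 8 in
each feature's frame() value, and translate() accepts a -frame offset of 0,
1 or 2, so passing the first coding exon's value there may have the same
effect as the trimming above. Please check both against the POD before
relying on them.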


On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
>
> I'm reading in some fungal GFFs generated by Jason Stajich. I:
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
> (Code below)
>
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
>
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
>
> I suspect the answer to this is that I'm going about this in completely
> the wrong way, in which case, please tell me how I ought to be doing it.
>
> Thanks,
>
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
>
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
>
>     # Create a new object representing the exons' gene
>     my $coding_loc_obj = Bio::Location::Split->new;
>     foreach my $exon (@sorted_exons) {
>         $coding_loc_obj->add_sub_Location($exon->location);
>     }
>     # Build a spliced feature representing the whole gene
>     my $spliced_feat = Bio::SeqFeature::Generic->new(
>         -start   => $coding_loc_obj->start,
>         -end     => $coding_loc_obj->end,
>         -strand  => $strand_num,
>         -primary => "splicedGene",
>     );
>     $spliced_feat->location($coding_loc_obj);
>     # Attach a contig object containing the sequence
>     $spliced_feat->attach_seq($contig_obj->bioperl_object);
>     # Get the spliced seq once, then translate it to protein
>     my $spliced_seq = $spliced_feat->spliced_seq;
>     my $coding_seq  = $spliced_seq->seq;
>     my $protein     = $spliced_seq->translate->seq;
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory