[Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq

Amir Karger akarger at CGR.Harvard.edu
Thu Dec 7 16:32:51 EST 2006


I need to know how to get the frame information in exon features
(created by Bio::Tools::GFF) into a whole-gene feature that will be
translated into a protein.

I'm reading in some fungal GFFs generated by Jason Stajich. Here is what I do:

- Use Bio::Tools::GFF to create a feature for each exon in a gene
- Create a Bio::Location::Split object containing each feature's
location
- Create a Bio::SeqFeature::Generic object whose location is the above
BL::Split
- Attach my contig Bio::Seq to the feature
- Get the protein with $feature->spliced_seq->translate->seq

(Code below)

Unfortunately, I get the wrong result when the GFF features have frame
!= 0. This happens for only a few percent of the exons, but when it
does, I end up translating in the wrong frame.

If I read the docs correctly, Location objects don't have a frame. So
how do I get the correct spliced_seq, which skips one or two bp at the
beginning of certain exons?

I suspect the answer to this is that I'm going about this in completely
the wrong way, in which case, please tell me how I ought to be doing it.

Thanks,
- Amir Karger
Research Computing
Life Sciences Division
Harvard University

P.S. In case you want to see actual code, here it is. I first use
Bio::Tools::GFF to create a sorted list of features, one per exon
(basically stolen from the module POD), and then do the following:

    use Bio::Location::Split;
    use Bio::SeqFeature::Generic;

    # Create a new object representing the exons' gene
    my $coding_loc_obj = Bio::Location::Split->new;
    foreach my $exon (@sorted_exons) {
        $coding_loc_obj->add_sub_Location($exon->location);
    }

    # Build a spliced feature representing the whole gene
    my $spliced_feat = Bio::SeqFeature::Generic->new(
        -start  => $coding_loc_obj->start,
        -end    => $coding_loc_obj->end,
        -strand => $strand_num,
        -primary=> "splicedGene",
    );
    $spliced_feat->location($coding_loc_obj);

    # Attach a contig object containing the sequence
    $spliced_feat->attach_seq($contig_obj->bioperl_object);

    # Get the spliced seq and translate to protein:
    my $spliced_seq = $spliced_feat->spliced_seq;
    my $coding_seq  = $spliced_seq->seq;
    my $protein     = $spliced_seq->translate->seq;
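
One thing that might work, sketched here in plain Perl rather than as a
confirmed BioPerl recipe: as I understand the GFF spec, the frame column
gives the number of bases to skip at the start of a feature to reach the
next codon boundary. If the exons are concatenated in transcript order,
then only the first exon's frame should matter, since the frames of later
exons follow from the lengths of the earlier ones. The helper name
trim_to_first_codon and the toy exon hashes below are made up for
illustration. In the real code the frame would presumably come from
$exon->frame and the string from spliced_seq->seq.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop the leading partial
# codon from an already spliced coding sequence, given the GFF frame of
# the first exon in transcript order. A frame of '.' or undef counts as 0.
sub trim_to_first_codon {
    my ($spliced, $first_frame) = @_;
    $first_frame = 0
        unless defined $first_frame && $first_frame =~ /^[012]$/;
    return substr($spliced, $first_frame);
}

# Toy data (assumption): exons in transcript order, first one has frame 2
my @exons = (
    { seq => 'GGATGAAA', frame => 2 },
    { seq => 'CCCTAA',   frame => 0 },
);

my $spliced = join '', map { $_->{seq} } @exons;
my $cds     = trim_to_first_codon($spliced, $exons[0]{frame});

# Show the codons that would be translated
print join(' ', $cds =~ /(...)/g), "\n";
```

For a gene on the minus strand, the first exon in transcript order would
be the one with the highest coordinates, so its frame is the one to use.
The trimmed string could then be wrapped in a new Bio::Seq and translated.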


