[Bioperl-l] Parsing individual exons from EMBL file

Jason Stajich jason at bioperl.org
Fri Dec 17 19:32:31 EST 2010

You need to operate on the sub-locations.
for my $loc ( $feature->location->each_Location ) {
  print $loc->start, "..", $loc->end, "\n";
}
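
As a slightly fuller illustration of the sub-location idea, here is a minimal sketch that parses the CDS location from the quoted message and lists its parts. It assumes BioPerl is installed; the helper name exon_ranges is made up for this example, and the parts are sorted by start coordinate because the order each_Location returns for a complemented join is not something the sketch relies on.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Factory::FTLocationFactory;

# Hypothetical helper: return [start, end, strand] for every sub-location
# of a feature-table location string, sorted by start coordinate.
sub exon_ranges {
    my ($location_string) = @_;
    my $location =
        Bio::Factory::FTLocationFactory->new->from_string($location_string);

    # each_Location flattens a split location into its simple parts;
    # for a simple (unsplit) location it returns the location itself.
    return map  { [ $_->start, $_->end, $_->strand ] }
           sort { $a->start <=> $b->start } $location->each_Location;
}

# Location string copied from the CDS feature in the quoted message.
for my $exon ( exon_ranges('complement(join(55387..56181,56187..56300))') ) {
    my ( $start, $end, $strand ) = @$exon;
    print join( "\t", $start, $end, $strand // '.' ), "\n";
}
```

With a feature read through Bio::SeqIO, the same loop applies to $feature->location instead of a location built from a string.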

But for converting to GFF3 you will want to look at the Unflattener
(Bio::SeqFeature::Tools::Unflattener), which basically does this for
you, and the bp_unflatten_seq.pl script, which uses it.  As you may
know by now, not all EMBL/GenBank records are consistent in how things
are annotated (how ID, product, and description are used), so mapping
them to properly formatted GFF3 for GBrowse, etc. can sometimes be a
tedious process.
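
A rough sketch of the Unflattener route, for orientation only: it reads an EMBL file named on the command line, unflattens each record, and prints the resulting feature hierarchy with indentation. The helper name feature_tree_lines is invented here, and whether -use_magic produces sensible gene models for a file with non-standard qualifiers such as /ID and /Parent is an assumption that would need checking against the actual data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

# Hypothetical helper: describe a feature and, recursively, its
# sub-features as indented "tag<TAB>start<TAB>end" lines.
sub feature_tree_lines {
    my ( $feature, $depth ) = @_;
    $depth //= 0;
    my @lines = join( "\t",
        ( '  ' x $depth ) . $feature->primary_tag,
        $feature->start, $feature->end );
    push @lines, feature_tree_lines( $_, $depth + 1 )
        for $feature->get_SeqFeatures;
    return @lines;
}

# Only runs when a file name is given, e.g.: perl unflatten_sketch.pl my.embl
if ( my $file = shift @ARGV ) {
    my $in = Bio::SeqIO->new( -file => $file, -format => 'embl' );
    my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;

    while ( my $seq = $in->next_seq ) {
        # Rebuild the nested feature structure in place from the flat
        # feature table; -use_magic asks the Unflattener to guess the
        # groupings itself.
        $unflattener->unflatten_seq( -seq => $seq, -use_magic => 1 );
        print "$_\n"
            for map { feature_tree_lines($_) } $seq->get_SeqFeatures;
    }
}
```

Once the hierarchy is in place, each nested feature can be written out as its own GFF3 line with a Parent attribute pointing at the enclosing feature.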

FYI -- APIDB also provides GFF3 if you would rather...

Gowthaman Ramasamy wrote:
> Hi All,
> I am trying to find a method to parse the individual exon/CDS features from a multi-exon gene feature. When I try the following method, it gives me only the outermost boundaries (55387 and 56300 in the example below).
> For example...my EMBL contains...
> FT   CDS             complement(join(55387..56181,56187..56300))
> FT                   /ID="apidb|cds_LmjF01.0200-1"
> FT                   /description="."
> FT                   /size="903"
> FT                   /Parent="apidb|rna_LmjF01.0200-1"
> FT                   /feature_order="115"
> FT                   /product="hypothetical+protein%2C+conserved"
> FT                   /Name="cds"
> use Bio::SeqIO;
> while (my $seqobj = $file_io->next_seq()) {
>      my @features = $seqobj->all_SeqFeatures();
>      foreach my $feat (@features) {
>          $feat->start;
>          $feat->end;
>      }
> }
> When I use $feat->start, it gives me 55387, and $feat->end gives me 56300.
>   Ideally I would like to get the start and end of the sub-features (exon 1 55387..56181 and exon 2 56187..56300).  When I tried to use "sub_SeqFeature()", it did not return anything.
> Any idea? I am also not sure whether my EMBL file is correctly formatted. Any suggestions...
> Any suggestion for converting EMBL to GFF3 will be appreciated. I have a script which does that, but it just fuses all the joins together to give me only one GFF line. Basically, I could not separate the exons.
> Thanks,
> Gowtham

Jason Stajich
jason at bioperl.org
