[Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq

Gathman, Allen agathman at semo.edu
Tue Aug 23 14:38:42 EDT 2005

Hi, BioPerl gurus: 


Although this question involves a Gbrowse database, I think it's actually a
BioPerl question at heart - and in any case it appears that there's a lot of
overlap between the people who answer questions in both lists, so I'm
guessing this is a good place for this question.  


I've written a script that finds particular pfam hits in a GBROWSE database,
then uses "overlapping_features" to find predicted gene features of  type
"transcript:GLEAN" that overlap those pfams. I've set the aggregator
"transcript" as {CDS/mRNA} already.  I select features using a regular
expression to choose particular names, then I use spliced_seq to return the
spliced CDS of each feature - but I'm only getting back the CDS that
actually overlap the pfam hit, not the full predicted gene.  So my question
is, what do I need to do in order to get ALL the CDS of each predicted gene
feature spliced together, instead of only the ones that actually overlap the
pfam hit I used to select that predicted gene?  


Thanks in advance for any help you can give...


Here's the code:




use strict;

use Bio::DB::GFF;

use Bio::Seq;

use Bio::SeqIO;

use Getopt::Long;


my $outfile; 


           'o|outfile=s' => \$outfile,



my $outfa= Bio::SeqIO -> new (-file => ">$outfile",

                              -format => 'Fasta'



my $db      = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',

                               -dsn        =>

                               -fasta      => '/gbrowse/databases/cc'





     for (my $i =1; $i<=20; $i++){

      my $pfamname="Peptidase_C$i";

        my @pfams = $db->get_feature_by_name( Domain => $pfamname);

        foreach my $pfamhit (@pfams){

             my $desc = $pfamname;

             my $score=$pfamhit->score;

             my $name = $pfamhit->name;

             $desc.= " $score ";

             $desc.= $pfamhit->location->seq_id();

             $desc.= ": ";


# Here's where I'm selecting predicted genes that overlap the Pfam hit


             my @genes = $db -> overlapping_features(

                                   -refseq => $pfamhit->location->seq_id,

                                   -start => $pfamhit->start,

                                   -stop => $pfamhit->stop,

                                   -types =>'transcript:GLEAN' 



# Now I'm choosing the ones with names I want out of the selected genes


                  foreach my $gene (@genes){

                       my $gid=$gene->display_id();

                       if ($gid =~/aug_GLEAN/){


                            $desc.=" - "; 



# Here I'm splicing the gene, tacking on a description, and outputting it.



                            my $splseq = $gene->spliced_seq();





                       }# end if aug_GLEAN   

                  }# end foreach gene

         }# end foreach pfamhit

   } # end for numbers

close OUT;



Allen Gathman



More information about the Bioperl-l mailing list