[Bioperl-l] Extracting gene seq from Bio::DB::GFF

Chris Fields cjfields at uiuc.edu
Fri Aug 11 18:19:09 EDT 2006


Do you mean you get duplicates of sequences back, or that you get more than
one chunk of the same sequence back?  

Is it possible that a single lookup by ID returns more than one feature?
That might explain it (you could check by testing the size of the array
@feats for each ID).
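
A rough sketch of that check, in case it helps. It only assumes that each
feature responds to ->type, as the script below already relies on; the
helper names count_by_type and report_id are made up for illustration and
are not part of BioPerl.

```perl
use strict;
use warnings;

# Tally the features returned for one ID by their type string.
# Works on any objects that respond to ->type; the value is used as a
# hash key, so it is stringified first.
sub count_by_type {
    my @feats = @_;
    my %count;
    $count{ $_->type }++ for @feats;
    return \%count;
}

# Print one diagnostic line per ID to STDERR and return the feature count.
sub report_id {
    my ($id, @feats) = @_;
    my $count   = count_by_type(@feats);
    my $summary = join ', ', map { "$_ x$count->{$_}" } sort keys %$count;
    warn sprintf "%s: %d feature(s) [%s]\n", $id, scalar @feats, $summary;
    return scalar @feats;
}

# Intended use inside the original loop, right after the lookup:
#     my @feats = $db->get_feature_by_name($id);
#     report_id($id, @feats);
```

Any ID that reports more than one feature of a gene-like type would be a
candidate for the extra sequences.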

I'm not sure how split locations are handled within Bio::DB::GFF, but do the
specific features have split locations?
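
One more thing that may be worth ruling out: the test $f->type =~ /gene/
matches anywhere in the type string, so a type such as "pseudogene" would
also pass. Below is a minimal sketch of a stricter filter that also skips
coordinates it has already seen. It assumes the features respond to
->method, ->ref, ->start, ->end and ->strand, which I believe
Bio::DB::GFF features do, and that the wanted features have the method
'gene' exactly. The helper name unique_genes is hypothetical.

```perl
use strict;
use warnings;

# Keep features whose method is exactly 'gene' and whose coordinates have
# not been seen before. Pass the same hash reference for every ID so that
# duplicates are also caught across lookups.
sub unique_genes {
    my ($seen, @feats) = @_;
    my @keep;
    for my $f (@feats) {
        next unless $f->method eq 'gene';    # exact match, unlike =~ /gene/
        my $strand = $f->strand;
        $strand = 0 unless defined $strand;  # treat a missing strand as 0
        my $key = join ':', $f->ref, $f->start, $f->end, $strand;
        next if $seen->{$key}++;             # same location already kept
        push @keep, $f;
    }
    return @keep;
}

# Intended use inside the original loop (with "my %seen;" declared before it):
#     my @feats = $db->get_feature_by_name($id);
#     $out->write_seq( $_->seq ) for unique_genes(\%seen, @feats);
```

This only hides exact duplicates; if the extra entries differ in their
coordinates, the per-ID feature count is still the first thing to look at.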


> Many thanks Scott,
> At the same time I got your email I was coming to the same conclusion as
> you.
> Now I have a stranger problem on my hands... My goal is quite simple: I try
> to get the sequence of the genes back from the Bio::DB::GFF database loaded
> on MySQL. The gene list is from a file with one gene id per line. When I
> run the following script:
> use Bio::DB::GFF;
> use Bio::SeqIO;
> my $out = Bio::SeqIO->new(    -fh => \*STDOUT,
>                             -format => 'fasta');
> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                               -dsn => 'dbi:mysql:database=dmel_43_new');
> while (<>){
>     chomp;
>     my $id = $_;
>     my @feats = $db->get_feature_by_name($id);
>     for my $f (@feats){
>         $out->write_seq( $f->seq ) if $f->type =~/gene/;
>     }
> }
> I get more sequences back than the number of genes in my input file. I
> double-checked that. Some of the duplicated entries are the same, some are
> not!

