[Bioperl-l] Genes from MySQL database using Bio::DB::GFF

Marco Blanchette mblanche at berkeley.edu
Wed Aug 16 22:59:04 EDT 2006


Dear all,

I am desperately trying to get a list of gene coordinates from a MySQL
database version of the fly genome populated using the Bio::DB::GFF module.
I have a list of 277 id in a text file that when parsed through the
following script return 279 entries (2 more entries then the number of genes
in the starting list).

Here is the script:

use Bio::DB::GFF;
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                              -dsn => 'dbi:mysql:database=dmel_43_new');
while (<>){
    chomp;
    my @feat = $db->get_feature_by_name($_);
    for my $f (@feat){
        if ($f->type->method eq 'gene'){
        print     "Name: ", $f->name,
                " Strand: ", $f->strand,
                " Start: ", $f->start,
                " End: ", $f->end,
                "\n";
        }
    }
}

I totally don¹t understand where the 2 extra entries are coming from.
Nothing differentiate them from each other. Moreover, when I double check
the MySQL database, both genes are having only a single Œgene¹ entry in the
fdata table.

Is there a bug in the way I am trying to fetch the individual genes or
something is wrong with the latest Bio::DB::GFF module from the CVS
repository?

Here is a test script and it¹s output that I am using to try to tract down
what the problem is. Hope this could help:

use Bio::DB::GFF;
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                              -dsn => 'dbi:mysql:database=dmel_43_new');
my %dups;
my ($j, $i) =0;
while (<>){
    chomp;
    my $id = $_;
    my @feat = $db->get_feature_by_name($id);
    my $feat_size = $#feat;
    $j++ if $feat_size == 2;
    
    for my $f (@feat){
        $i++;
          
        if (exists $dups{$f->group} && $f->type->method eq 'gene'){
            print     "Calling >>>", $f->group,
                        " ID=", $i,
                        " from \@feat of size $feat_size",
                        "\n";
            print     "Chr: ", $f->refseq,
                    " Strand: ", $f->strand,
                    " Start: ", $f->start,
                    " End: ", $f->end,
                    "\n";
            print "Offending >>>", $dups{$f->group}->[0]->group,
                  " ID=", $dups{$f->group}->[1], "\n";
            print     "Chr: ", $dups{$f->group}->[0]->refseq,
                    " Strand: ", $dups{$f->group}->[0]->strand,
                    " Start: ", $dups{$f->group}->[0]->start,
                      " End: ", $dups{$f->group}->[0]->end;
            print "\n\n";
         } elsif ($f->type->method eq 'gene') {
            $dups{$f->group} = [$f, $i];
         }
    }
}

print "#### there was $j \@feat with only 2 features\n";

Output of the test script:

$ perl test.pl hrp36_targets.txt
Calling >>>FBgn0025803 ID=98 from @feat of size 2
Chr: 3R Strand: 1 Start: 16966463 End: 17038413
Offending >>>FBgn0025803 ID=97
Chr: 3R Strand: 1 Start: 16966463 End: 17038413

Calling >>>FBgn0025681 ID=304 from @feat of size 2
Chr: 2L Strand: 1 Start: 2992964 End: 2998614
Offending >>>FBgn0025681 ID=303
Chr: 2L Strand: 1 Start: 2992964 End: 2998614

#### there was 11 @feat with only 2 features

With the hope someone can find out the problem...

Cheers,

Marco

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 





More information about the Bioperl-l mailing list