[Bioperl-l] About extracting sequence from genewise format result

zhaoy at mail.cbi.pku.edu.cn zhaoy at mail.cbi.pku.edu.cn
Wed Aug 11 04:17:42 EDT 2010


Dear authors:

Hello!

I am trying to parse genewise output in order to extract the nucleotide
sequence, using the "hit_string" method from the "SearchIO" module.
However, the returned string is empty. Worse, the loop does not seem to
work either, because I always get the last result. I'm confused.

My Perl code is shown below:

#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SearchIO;
my $in = Bio::SearchIO->new(-format   => 'wise',
                            -wisetype => 'genewise',
                            -file     => 'test');
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=",      $result->query_name,   "\n",
                  "Length=",     $hsp->length('total'), "\n",
                  "hit_string:", $hsp->hit_string,      "\n";
        }
    }
}

One of the genewise results in the input file is shown below:

genewise $Name: wise2-4-0alpha $ (unreleased release)
This program is freely distributed under a GPL. See source directory
Copyright (c) GRL limited: portions of the code are from separate copyright

Query protein:       Cpa_s110_24
Comp Matrix:         BLOSUM62.bla
Gap open:            12
Gap extension:       2
Start/End            global
Target Sequence      Bdi_chr3:38292015..38292302
Strand:              forward
Start/End (protein)  global
Gene Parameter file: gene.stat
Splice site model:   GT/AG only
Codon Table:         codon.table
Subs error:          1e-06
Indel error:         1e-06
Null model           syn
Algorithm            623

genewise output
Score 37.97 bits over entire alignment
Scores as bits over a synchronous coding model

Warning: The bits scores is not probablistically correct for single seqs
See WWW help for more info

Cpa_s110_24        1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI--
                     MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++
                     MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH
Bdi_chr3:382920    1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc
                     tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta
                     gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc


Cpa_s110_24       47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV
                              P+ +    A + R+T++KLL+P DTL++GQVYRLIT+Q
                     VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ--
Bdi_chr3:382920  148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc
                     tcggcgacacctcgggcccccgtcatattacgactttgatagttcca
                     cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca


Cpa_s110_24       92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR

                     -------------------------------------------------
Bdi_chr3:382920  289




Cpa_s110_24      141 SRVAASTNQAGLKSRTWQPSLKSISEAAS

                     -----------------------------
Bdi_chr3:382920  289


//
Gene 1
Gene 1 288
  Exon 1 288 phase 0
     Supporting 1 54 1 18
     Supporting 58 141 19 46
     Supporting 160 288 47 89
//

......


Part of the output of this code is shown below:
Query=Aly_481360
Length=0
hit_string:

Query=Aly_481360
Length=0
hit_string:

......

What's wrong with my code and how can I get the correct result? I'm
looking forward to your reply.

Thanks very much!

Best regards,
Zackaly


