From chandan.kr.singh at gmail.com Thu Feb 2 02:26:09 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Thu, 2 Feb 2006 12:56:09 +0530
Subject: [Bioperl-l] Sorry, failure in post on the net, so still via email
In-Reply-To: <001001c62793$bef08f70$93656785@zhur>
References: <001001c62793$bef08f70$93656785@zhur>
Message-ID: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>

Hi,
It seems that it's not a proxy problem. I tried today and ran into the same problem. It has been months since my last try, so something might have changed.
Try reading more about this problem; I will look into it myself as well.
Regards
Chandan

On 2/2/06, Huang Jian wrote:
>
> I tried some of the "Quick getting started scripts" in bptutorial:
>
> use Bio::Perl;
> $seq = get_sequence('swiss',"ROA1_HUMAN");
> # uses the default database - nr in this case
> $blast_result = blast_sequence($seq);
> write_blast(">roa1.blast",$blast_result);
>
> It prints "Submitted Blast for [ROA1_HUMAN] ".
> It does not report any error when I run the script. However, it does not
> return any result either. The file "roa1.blast" is created but is always
> empty.
>
> I found that this message comes from the following code in the function "blast_sequence":
> if( $verbose ) {
>     print STDERR "Submitted Blast for [".$seq->id."] ";
> }
> sleep 5;
> ....
> I have tested "( env_proxy => 1 )" ... The problem remains the same.
>
> Help! By the way, could you send me a Gmail invitation? I want
> to have a Gmail account too... :-)
>
> Best Regards!
> Jian Huang
>

From osborne1 at optonline.net Thu Feb 2 17:06:25 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 02 Feb 2006 17:06:25 -0500
Subject: [Bioperl-l] Sorry, failure in post on the net, so still via email
In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
Message-ID:

Chandan,

I'd be interested in what you find.
This is not a new problem; this same code snippet has been mentioned many times. But for many others, like me, the code always works.

Brian O.

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From nagesh.chakka at anu.edu.au Thu Feb 2 20:23:50 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Fri, 03 Feb 2006 12:23:50 +1100
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <003901c6285e$d1b36670$93656785@zhur>
References: <43E28C39.2060308@anu.edu.au> <003901c6285e$d1b36670$93656785@zhur>
Message-ID: <43E2B0A6.7000307@anu.edu.au>

Hi Huang,
Thanks for the message.
The older version of RemoteBlast.pm relies on checking the size of a temporary file to determine whether the BLAST results are ready. That condition is no longer being met, possibly because of changes made by NCBI. I had this problem recently and found that the solution was to use the latest version, which fixes the problem (it no longer uses the file-size logic) but is not yet included in the BioPerl package.
Cheers
Nagesh

Huang Jian wrote:

> Dear Nagesh,
>
> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
> me. Now it works perfectly!!!
>
> Thank you!!
>
> Huang
>
> ----- Original Message ----- From: "Nagesh Chakka"
> To: "Huang Jian" ; "bioperl-l"
> Sent: Friday, February 03, 2006 7:48 AM
> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> via email
>
>> Hi Huang,
>> I see that you are submitting a sequence for a remote BLAST search. Can
>> you check whether the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
>> not, I have attached it to this email; try replacing the old one, which
>> has a bug, with it.
>> Let me know if it works.
>> Nagesh

From cjfields at uiuc.edu Fri Feb 3 10:45:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 09:45:23 -0600
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <43E2B0A6.7000307@anu.edu.au>
Message-ID: <001501c628d8$d91cd430$15327e82@pyrimidine>

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will work for saving text output. However, it will not parse anything using next_result (it will likely hang) and will not save XML format. See these bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and Bio::SearchIO::blast). Note that these haven't been checked in yet, so they are still not in bioperl-live; they may be further modified before being committed to CVS.
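Nagesh's explanation comes down to how a client decides that a remote search has finished: the old module inferred readiness from the size of a temporary file, so a change on NCBI's side could leave a script waiting, or writing an empty result file. As a rough, hypothetical illustration (not the actual code of RemoteBlast.pm v 1.28, which is not shown in this thread), the Perl sketch below polls an explicit status check and gives up after a fixed number of attempts. `poll_until_ready` and the 'READY'/'WAITING'/'FAILED' strings are invented for this example, and the status check is simulated rather than contacting NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper: call $check->() until it reports 'READY' or 'FAILED',
# sleeping $interval seconds between attempts, for at most $max_tries attempts.
# Returns the final status string, or 'TIMEOUT' if the limit is reached.
sub poll_until_ready {
    my ($check, $max_tries, $interval) = @_;
    for my $try (1 .. $max_tries) {
        my $status = $check->();
        return $status if $status eq 'READY' || $status eq 'FAILED';
        sleep $interval if $try < $max_tries;   # still waiting: pause, then retry
    }
    return 'TIMEOUT';
}

# Usage with a simulated job that becomes ready on the third check.
my $calls    = 0;
my $fake_job = sub { $calls++; return $calls >= 3 ? 'READY' : 'WAITING' };
my $result   = poll_until_ready($fake_job, 5, 0);
print "$result after $calls checks\n";
```

The perldoc loop Chris quotes follows a similar pattern with retrieve_blast() and sleep 5, but it has no upper bound on the number of attempts; a cap such as $max_tries is one way to keep an unattended script from waiting forever.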
If you're not worried about XML, you could just try the first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago with a script that had problems; that script saves the output but doesn't actually parse it (i.e. you don't use next_result() to go through the data). Is the BLAST version in your text output 2.2.12 or 2.2.13? Have you tried parsing the output with "-readmethod => SearchIO" or "-readmethod => blast", using your version of RemoteBlast and the method next_result()? Like below (from perldoc):

# $factory is a Bio::Tools::Run::RemoteBlast object with submitted queries,
# and $v is a verbosity flag; both are set up earlier, as in the perldoc.
while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if ( !ref($rc) ) {
            # not finished yet; a negative value means the search failed
            if ( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {                                   # parsing starts here
            my $result = $rc->next_result();       # it should hang here
            # save the output
            my $filename = $result->query_name() . ".out";
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0 );
                print "\thit name is ", $hit->name, "\n";
                while ( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

My script hung if I used next_result() in any way prior to the fixes. I want to see how many others are having the same issues with parsing when using the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From osborne1 at optonline.net Fri Feb 3 13:05:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 03 Feb 2006 13:05:44 -0500
Subject: [Bioperl-l] Documentation in the Bioperl package
Message-ID:

bioperl-l,

The recent work on the Bioperl Wiki moved much of the Bioperl documentation online.
Since we cannot maintain two locations for all of this, we'll be removing a number of files from the package, specifically:

biodatabases.pod
biodesign.pod
bioperl.pod
bioscripts.pod
doc/howto/*
doc/faq/*
FAQ

Rest assured that all of these files have been gone over in detail to make sure that no important information was lost during the migration. All of this will be replaced by a single file, such as "README.docs", that explains where all the documentation is. It's not entirely clear what will happen to bptutorial.pl. Moving its content to different online locations is possible, but in that case we lose its functionality as a script.

Are there any comments, questions, or concerns?

Brian O.

From saldroubi at yahoo.com Fri Feb 3 13:38:26 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri, 3 Feb 2006 10:38:26 -0800 (PST)
Subject: [Bioperl-l] Gibbs sampling algorithm?
Message-ID: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>

Hi everyone,

I am wondering whether anyone has implemented the Gibbs sampling algorithm for finding motifs, in BioPerl or otherwise. I saw Bio::Tools::Run::PiseApplication::gibbs, which I believe calls a Gibbs program that is not free/open source. I would prefer not to write my own Gibbs sampling algorithm if one is already out there. Any comments are appreciated.

Thank you

Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From cjfields at uiuc.edu Fri Feb 3 14:34:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 13:34:27 -0600
Subject: [Bioperl-l] Gibbs sampling algorithm?
In-Reply-To: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>
Message-ID: <001901c628f8$d89917b0$15327e82@pyrimidine>

Do you mean this Gibbs program?

ftp://ncbi.nlm.nih.gov/pub/neuwald/

You can also request a license from the Gibbs Motif Sampler homepage, which is more up to date: http://bayesweb.wadsworth.org/gibbs/gibbs.html.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From gyang at plantbio.uga.edu Fri Feb 3 14:44:50 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri, 03 Feb 2006 14:44:50 -0500
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <001501c628d8$d91cd430$15327e82@pyrimidine>
Message-ID: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>

Hi, everybody,

I saw this post and am wondering whether it explains why my web server is malfunctioning. We set up a web server named MAK for MITE sequence analysis. It worked very well until around November 2005, when it stopped returning any results (the site is up and seems to be doing something after submission). In the CGI script, I used RemoteBlast (that work was done in 2003) to do the searches. I currently do not have access to the server because I moved, and quite a few people have emailed us about the malfunction. Are there any suggestions for fixing the problem? Should I simply ask for RemoteBlast.pm to be replaced with the new version?
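Before swapping files, it may help to confirm which revision of RemoteBlast.pm is actually installed, since Nagesh's advice hinges on v 1.28 (2005/12/09) versus older copies such as v 1.17. A minimal sketch, assuming the installed file still carries a CVS `$Id: RemoteBlast.pm,v <revision> <date> ...` keyword line (CVS-managed files of that period usually did, but this is an assumption); `cvs_revision` and `module_revision` are hypothetical helper names. In a real script the path could come from Perl's `%INC`, e.g. `$INC{'Bio/Tools/Run/RemoteBlast.pm'}` after `require Bio::Tools::Run::RemoteBlast;`.

```perl
use strict;
use warnings;

# Hypothetical helper: scan text for a CVS $Id$ keyword line and return
# (revision, date), e.g. ('1.28', '2005/12/09'); returns an empty list if absent.
sub cvs_revision {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        if ($line =~ /\$Id:\s+\S+,v\s+([\d.]+)\s+(\d{4}\/\d{2}\/\d{2})/) {
            return ($1, $2);
        }
    }
    return;
}

# Hypothetical helper: read a module file from disk and report its CVS revision.
sub module_revision {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                              # slurp mode: read the whole file
    my $text = <$fh>;
    close $fh;
    $text = '' unless defined $text;       # empty file
    return cvs_revision($text);
}

# Usage on an in-memory sample header (the committer name is made up).
my $sample = "# \$Id: RemoteBlast.pm,v 1.28 2005/12/09 12:00:00 someone Exp \$\n"
           . "package Bio::Tools::Run::RemoteBlast;\n";
my ($rev, $date) = cvs_revision($sample);
print "revision $rev ($date)\n";

# In a real script (requires BioPerl to be installed):
#   require Bio::Tools::Run::RemoteBlast;
#   my ($r, $d) = module_revision($INC{'Bio/Tools/Run/RemoteBlast.pm'});
```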
Thanks a lot,
Guojun

Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun

_____

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
Sent: Fri, 03 Feb 2006 10:45:23 -0500
Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28

From gbazykin at Princeton.EDU Fri Feb 3 15:38:04 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Fri, 3 Feb 2006 15:38:04 -0500
Subject: [Bioperl-l] proposed additions to Tree and cladogram
In-Reply-To: <148174979677.20051026172707@princeton.edu>
References: <148174979677.20051026172707@princeton.edu>
Message-ID: <8010525745.20060203153804@princeton.edu>

Hi all,

A while ago, I mailed bioperl-l some proposed additions to phylogeny-related modules (see below). I am now doing a project on HIV phylogeny and rely on these additions heavily. They expand on what was already present in the corresponding modules, and I expected them to be of general use as well (at least the first one). However, I never got an answer, so I assumed that most people considered these additions superfluous.

I am now working on an addition to the Tree::Draw::Cladogram module. For my project, I need to color individual tree edges (including internal ones) on a scale from red to blue, according to the nonsynonymous/synonymous ratios of these branches.
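Coloring each edge from blue to red by its dN/dS value amounts to mapping a number onto an RGB scale. A minimal sketch, assuming a linear blend between pure blue (at or below a chosen minimum ratio) and pure red (at or above a chosen maximum), with channels in 0..255; `ratio_to_rgb` is a hypothetical name, and the -Rcolor/-Gcolor/-Bcolor node tags Yegor proposes could be filled from the returned triple (divide by 255 if the drawing code expects fractional channel values, which is an assumption to check).

```perl
use strict;
use warnings;

# Hypothetical helper: map a per-branch ratio (e.g. dN/dS) onto a
# blue-to-red scale. Ratios <= $min give pure blue, ratios >= $max give
# pure red, and values in between are blended linearly.
# Returns (R, G, B) with each channel in 0..255.
sub ratio_to_rgb {
    my ($ratio, $min, $max) = @_;
    die "max must be greater than min\n" unless $max > $min;
    my $t = ($ratio - $min) / ($max - $min);   # position on the scale
    $t = 0 if $t < 0;                          # clamp below the range
    $t = 1 if $t > 1;                          # clamp above the range
    my $red  = int(255 * $t + 0.5);            # round to the nearest integer
    my $blue = 255 - $red;
    return ($red, 0, $blue);
}

# Usage: colours for three branches on a 0..2 scale.
for my $w (0.1, 1.0, 3.5) {
    my ($r, $g, $bl) = ratio_to_rgb($w, 0, 2);
    printf "dN/dS %.1f -> RGB(%d, %d, %d)\n", $w, $r, $g, $bl;
}
```

Clamping keeps extreme ratios (for example a branch with very few synonymous changes) from producing channel values outside 0..255.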
This should be technically easy (I guess I will add -Rcolor, -Gcolor and -Bcolor tags to nodes and use them in Cladogram to color the preceding edges), but I have three questions:

- will this add-on be of general interest?
- should I try to do it "the right way", updating the PODs etc.?
- in general, are there any guidelines about how specific an issue a method may address and still be included in the BioPerl distribution?

Thanks,
Yegor Bazykin

This is a forwarded message
From: Georgii Bazykin
To: bioperl-l at bioperl.org
Date: Wednesday, October 26, 2005, 4:27:07 PM
Subject: suggestions for additions to Tree

===8<==============Original message text===============
Hi,

here are some tree-related methods I needed and added to my BioPerl. I hope someone else finds some of them useful as well.

Yegor Bazykin

=============================================

To NodeI:

# modified from total_branch_length in the Tree::Tree module
# gets the sum of branch lengths in the subtree - the descendents of the given node

=head2 children_branch_length

 Title   : children_branch_length
 Usage   : my $size = $node->children_branch_length
 Function: Returns the sum of the lengths of all branches of the subtree
           which starts at the given node
 Returns : number
 Args    : none

=cut

sub children_branch_length {
    my ($self) = @_;
    return 0 if ( $self->is_Leaf );
    my $sum = 0;
    for ( $self->get_all_Descendents ) {
        $sum += $_->branch_length || 0;
    }
    return $sum;
}

-----------------------------------

=head2 height_nodes

 Title   : height_nodes
 Usage   : my $len = $node->height_nodes
 Function: Returns the height of the tree starting at this node.
           Height is the maximum number of branches to traverse to get to a tip.
Returns : The longest length to a leaf, in nodes Args : none =cut sub height_nodes{ my ($self) = @_; return 0 if( $self->is_Leaf ); my $max = 0; foreach my $subnode ( $self->each_Descendent ) { my $s = $subnode->height_nodes + 1; if( $s > $max ) { $max = $s; } } return $max; } ---------------------------------- =head2 get_all_Descendent_Leaves Title : get_all_Descendent_Leaves($sortby) Usage : my @nodes = $node->get_all_Descendent_Leaves; Function: Recursively fetch all the nodes and their descendents, only selecting leaves *NOTE* This is different from each_Descendent Returns : Array or Bio::Tree::NodeI objects Args : $sortby [optional] "height", "creation" or coderef to be used to sort the order of children nodes. =cut sub get_all_Descendent_Leaves{ my ($self, $sortby) = @_; $sortby ||= 'height'; my @nodes; foreach my $node ( $self->each_Descendent($sortby) ) { if ($node->is_Leaf) { push @nodes, $node; } else { push @nodes, ($node->get_all_Descendents($sortby)); } } return @nodes; } ===================================================== To Tree: =head2 total_internal_branch_length Title : total_internal_branch_length Usage : my $size = $tree->total_internal_branch_length Function: Returns the sum of the length of all branches, excluding branches leading to leaves Returns : integer Args : none =cut sub total_internal_branch_length { my ($self) = @_; my $sum = 0; if( defined $self->get_root_node ) { for ( $self->get_root_node->get_Descendents() ) { unless ($_->is_Leaf) { # YB: THIS IS ALL I ADDED $sum += $_->branch_length || 0; } } } return $sum; } ================================================= To TreeFunctionsI: =head2 distance_nodes Title : distance_nodes Usage : distance_nodes(-nodes => \@nodes ) Function: returns the distance between two given nodes in numbers of nodes Returns : numerical distance Args : -nodes => arrayref of nodes to test =cut # YB: distance_nodes is very similar to distance method in TreeFunctionsI except that # it estimates distances 
between nodes in numbers of nodes (e.g., 1 between mother and # daughter, 2 between two sisters, etc.) sub distance_nodes { my ($self, at args) = @_; my ($nodes) = $self->_rearrange([qw(NODES)], at args); if( ! defined $nodes ) { $self->warn("Must supply -nodes parameter to distance_nodes() method"); return undef; } my ($node1,$node2) = $self->_check_two_nodes($nodes); # algorithm: # Find lca: Start with first node, find and save every node from it # to root, saving cumulative distance. Then start with second node; # for it and each of its ancestor nodes, check to see if it's in # the first node's ancestor list - if so it is the lca. Return sum # of (cumul. distance from node1 to lca) and (cumul. distance from # node2 to lca) # find and save every ancestor of node1 (including itself) my %node1_ancestors; # keys are internal ids, values are objects my %node1_cumul_dist; # keys are internal ids, values # are cumulative distance from node1 to given node my $place = $node1; # start at node1 my $cumul_dist = 0; while ( $place ){ $node1_ancestors{$place->internal_id} = $place; $node1_cumul_dist{$place->internal_id} = $cumul_dist; $cumul_dist++; # YB #YB if ($place->branch_length) { #YB $cumul_dist += $place->branch_length; # include current branch #YB # length in next iteration #YB } $place = $place->ancestor; } # now climb up node2, for each node checking whether # it's in node1_ancestors $place = $node2; # start at node2 $cumul_dist = 0; while ( $place ){ foreach my $key ( keys %node1_ancestors ){ # ugh if ( $place->internal_id == $key){ # we're at lca return $node1_cumul_dist{$key} + $cumul_dist; } } # include current branch length in next iteration #YB $cumul_dist += $place->branch_length || 0; $cumul_dist++; # YB $place = $place->ancestor; } $self->warn("Could not find distance!"); # should never execute, # if so, there's a problem return undef; } ===8<===========End of original message text=========== From cjfields at uiuc.edu Fri Feb 3 16:07:29 2006 From: cjfields 
at uiuc.edu (Chris Fields) Date: Fri, 3 Feb 2006 15:07:29 -0600 Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 In-Reply-To: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu> Message-ID: <001a01c62905$d7ef0920$15327e82@pyrimidine> I would say give the new code a try, but realize that it hasn't been checked in (like I said below). I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine. If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back the last month or so there has been a bit of discussion here about it. Jason describes a bit on how to set up RemoteBlast for XML: http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > Sent: Friday, February 03, 2006 1:45 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > Hi, Everybody, > I see this post and am wondering if this is the reason for the > malfunctionning of my webserver. We set up a webserver named MAK, for MITE > sequence analysis. It was working very well until around November 2005, > when it stopped returning any result (the site is fine and seems to be > doing sth after submission). 
In the CGI script, I used remoteblast (that > work was done in 2003) to do searches. I currently do not have access to > the server because I moved. Quite several people sent emails to us about > its malfunctioning. Is there any suggestion on fixing the problem? Should > I simplily ask the remoteblast.pm be replaced with the new version? > Thanks a lot, > Guojun > > Department of Plant Biology > University of Georgia > Tel: 706-542-1857 > Fax: 706-542-1805 > http://www.arches.uga.edu/~guojun > _____ > > From: Chris Fields [mailto:cjfields at uiuc.edu] > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl- > l at bioperl.org] > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It > will > work for saving text output. However, it will not parse anything using > next_result (it will likely hang) and will not save XML format. See these > bugs: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > for explanations and possible fixes (changes to RemoteBlast and > Bio::SearchIO::blast). Note that these haven't been checked in yet so are > still not included in bioperl-live; they may be further modified before > committing to CVS. If you're not worried about XML, you could just try the > first fix, which is a change to SearchIO::blast. > > Nagesh, I remember you posting to the list a month ago using a script > which > had problems; the script you used saves the output but doesn't actually > parse it (i.e. you don't use next_result() to go through the data). Is the > version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried > parsing the output using "-readmethod => SearchIO" or "-readmethod => > blast" > using your version of RemoteBlast and method next_result()? 
Like below > (from > perldoc): > > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { # parsing > starts here > my $result = $rc->next_result(); # it should hang > here > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > > My script hanged if I used next_result() in any way prior to the fixes. I > want to see how many others are having the same issues with parsing using > the CVS version of bioperl-live. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > Sent: Thursday, February 02, 2006 7:24 PM > > To: Huang Jian; bioperl-l > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > Hi Huang, > > Thanks for the message. The older version of RemoteBlast.pm works on the > > logic of checking the temporary file size to determine whether the Blast > > results are ready. This condition is not getting satisfied may be due to > > some changes brought about by NCBI. I had this problem recently and > > figured out that the solution was to use the latest version which has > > this problem fixed (does not use file size logic any more) which is not > > yet included in the BioPerl package. 
> > Cheers > > Nagesh > > > > Huang Jian wrote: > > > > > Dear Nagesh, > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send > > > me. Now it works perfectly!!! > > > > > > Thank you!! > > > > > > Huang > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still > > > via email > > > > > > > > >> Hi Huang, > > >> I see that you are submitting a sequence for a remote blast search. > Can > > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If > > >> not I have attached it with this email, try to replace it with the > old > > >> one which has a bug. > > >> Let me know if it works. > > >> Nagesh > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Fri Feb 3 18:11:03 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 3 Feb 2006 15:11:03 -0800 Subject: [Bioperl-l] Documentation in the Bioperl package In-Reply-To: References: Message-ID: Just to be sure, the wiki will be able to handle versions (releases)? (documentation and APIs may change between releases and hence a more recent doc page may not apply to an earlier release) -hilmar On 2/3/06, Brian Osborne wrote: > bioperl-l, > > The recent work on the Bioperl Wiki moved much of the Bioperl documentation > online. 
Since we cannot maintain 2 locations for all of this we'll be > removing a number of files from the package, specifically: > > biodatabases.pod > biodesign.pod > bioperl.pod > bioscripts.pod > doc/howto/* > doc/faq/* > FAQ > > Rest assured that all of these files have been gone over in detail to make > sure that no important information was lost during the migration. All of > this will be replaced by a single file, such as "README.docs", that explains > where all the documentation is. It's not entirely clear what will happen to > bptutorial.pl. Moving its content to different online locations is possible > but in this case we lose its functionality as a script. > > Are there any comments or questions or concerns? > > Brian O. > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From hubert.prielinger at gmx.at Fri Feb 3 17:47:37 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 03 Feb 2006 16:47:37 -0600 Subject: [Bioperl-l] standalone blast composition based statistics parameter Message-ID: <43E3DD89.7080903@gmx.at> Hi, Does anybody know whether it is possible to perform a database search with standalone BLAST with the composition-based statistics parameter turned on, and what the abbreviation for the parameter is? thanks Hubert From osborne1 at optonline.net Fri Feb 3 22:32:18 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 03 Feb 2006 22:32:18 -0500 Subject: [Bioperl-l] Documentation in the Bioperl package In-Reply-To: References: Message-ID: Hilmar, MediaWiki supports such things as rollback based on date, but it is not CVS, where an entire set of pages is tagged by version.
It is also scriptable, so it may be possible to emulate this type of tagging by script, but I'm not entirely sure (see WWW::Mediawiki::Client; Jason pointed this out to me). So the simple answer is probably "no". But let's be honest: synchrony between code and documentation wasn't achieved using the previous approach, CVS, either. What Jason, Torsten, and I appreciated when adding content to this new site was that it was relatively easy; our hope is that this approach will get more people involved. The assumption is that more involvement will lead to better documentation - Jason made this assumption when electing to move the site to MediaWiki and I have to say that I completely agree with this assumption. Jason, any thoughts on this question? An interesting one... Brian O. On 2/3/06 6:11 PM, "Hilmar Lapp" wrote: > Just to be sure, the wiki will be able to handle versions (releases)? > (documentation and APIs may change between releases and hence a more > recent doc page may not apply to an earlier release) > > -hilmar > > On 2/3/06, Brian Osborne wrote: >> bioperl-l, >> >> The recent work on the Bioperl Wiki moved much of the Bioperl documentation >> online. Since we cannot maintain 2 locations for all of this we'll be >> removing a number of files from the package, specifically: >> >> biodatabases.pod >> biodesign.pod >> bioperl.pod >> bioscripts.pod >> doc/howto/* >> doc/faq/* >> FAQ >> >> Rest assured that all of these files have been gone over in detail to make >> sure that no important information was lost during the migration. All of >> this will be replaced by a single file, such as "README.docs", that explains >> where all the documentation is. It's not entirely clear what will happen to >> bptutorial.pl. Moving its content to different online locations is possible >> but in this case we lose its functionality as a script. >> >> Are there any comments or questions or concerns? >> >> Brian O.
> > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- From shameer at ncbs.res.in Sat Feb 4 05:15:33 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Sat, 4 Feb 2006 15:45:33 +0530 (IST) Subject: [Bioperl-l] Calpha to Co-ordinates Program In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com> References: <001001c62793$bef08f70$93656785@zhur> <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com> Message-ID: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176> Dear All, Is anyone aware of a Perl script or BioPerl module that can be used to construct full atomic coordinates of a protein from a given C(alpha) trace and optimize side-chain geometry? I tried the original program, MaxSprout from Holm's group, but it is not giving me proper results (I am getting errors like segmentation fault, backbonchain failed, etc.). Since I need to use it as part of a web server, I would appreciate it if anyone could point me to a Perl script for this. Thanks and cheers in advance, -- Mr. Shameer Khadar (JRF) Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://www.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth."
MM From torsten.seemann at infotech.monash.edu.au Sat Feb 4 22:34:35 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 05 Feb 2006 14:34:35 +1100 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E3DD89.7080903@gmx.at> References: <43E3DD89.7080903@gmx.at> Message-ID: <43E5724B.5070007@infotech.monash.edu.au> Hubert, > Does anybody know whether it is possible to perform a with the > standalone blast a database search where the composition based > statistics parameter is on > and what's the abbreviation for the parameter The StandAloneBlast only runs the "blastall" binary on your system. It accepts all the command line options (like "-d" etc.) that "blastall" does but just passes them as-is; it doesn't do anything special. On a Unix system, type "blastall -" to list all the options that your BLAST binary supports. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ Phone: +61 3 9905 9010 From fernan at iib.unsam.edu.ar Sat Feb 4 23:34:27 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Sun, 5 Feb 2006 01:34:27 -0300 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E3DD89.7080903@gmx.at> References: <43E3DD89.7080903@gmx.at> Message-ID: <20060205043427.GB39264@iib.unsam.edu.ar> +----[ Hubert Prielinger (03.Feb.2006 21:06): | | Hi, | Does anybody know whether it is possible to perform a with the | standalone blast a database search where the composition based | statistics parameter is on | and what's the abbreviation for the parameter | | thanks | Hubert | +----] only for tblastn. As Torsten said, 'blastall' with no arguments would have revealed it: [ ... 
] -C Use composition-based statistics for tblastn: D or d: default (equivalent to F) 0 or F or f: no composition-based statistics 1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001 2: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally For programs other than tblastn, must either be absent or be D, F or 0. [String] default = D Fernan PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206) From hubert.prielinger at gmx.at Sun Feb 5 21:56:07 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sun, 05 Feb 2006 20:56:07 -0600 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <20060205043427.GB39264@iib.unsam.edu.ar> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> Message-ID: <43E6BAC7.5050707@gmx.at> Hi, thank you very much, If I use the tblastn instead of blastp, I get the following error message [blastall] WARNING: : Unable to open nr.00.nin I looked up in the folder, but I don't have that file, and if I download the database and extract the file, it isn't there either... thanks Hubert Fernan Aguero wrote: >+----[ Hubert Prielinger (03.Feb.2006 21:06): >| >| Hi, >| Does anybody know whether it is possible to perform a with the >| standalone blast a database search where the composition based >| statistics parameter is on >| and what's the abbreviation for the parameter >| >| thanks >| Hubert >| >+----] > >only for tblastn. > >As Torsten said, 'blastall' with no arguments would have >revealed it: > >[ ... 
] > -C Use composition-based statistics for tblastn: > D or d: default (equivalent to F) > 0 or F or f: no composition-based statistics > 1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001 > 2: Composition-based score adjustment as in Bioinformatics 21:902-911, > 2005, conditioned on sequence properties > 3: Composition-based score adjustment as in Bioinformatics 21:902-911, > 2005, unconditionally > For programs other than tblastn, must either be absent or be D, F or 0. > [String] > default = D > >Fernan > >PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206) >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From torsten.seemann at infotech.monash.edu.au Sun Feb 5 23:29:11 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 06 Feb 2006 15:29:11 +1100 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E6BAC7.5050707@gmx.at> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> Message-ID: <43E6D097.7080304@infotech.monash.edu.au> Hubert > thank you very much, If I use the tblastn instead of blastp, I get the > following error message > [blastall] WARNING: : Unable to open nr.00.nin > I looked up in the folder, but I don't have that file, and if I download > the database and extract the file, it isn't there either... "tblastn" requires a NUCLEOTIDE database to search. It appears that you have specified a PROTEIN database with "-d nr" ("nr" is protein). You probably want to install the "nt" blast database and use that instead. 
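Two constraints from this thread can be checked before blastall is ever run. Per the `-C` help text quoted above, values other than D, F or 0 are accepted only by tblastn. As Torsten notes, tblastn searches a nucleotide database, so `-d nr` (a protein database) cannot work. The sketch below is a hypothetical pre-flight check, not part of StandAloneBlast, which passes options through to blastall as-is. The `%KNOWN_DB` table of database types is an illustrative assumption that covers only a few NCBI database names.

```perl
use strict;
use warnings;

# Type of database each blastall program searches.
my %DB_TYPE_FOR = (
    blastp  => 'protein',    blastx  => 'protein',
    blastn  => 'nucleotide', tblastn => 'nucleotide',
    tblastx => 'nucleotide',
);

# Assumed types of a few well-known NCBI databases (illustrative, not exhaustive).
my %KNOWN_DB = ( nr => 'protein', swissprot => 'protein', nt => 'nucleotide' );

# Validate blastall-style options (keys without the leading dash) and return
# them as a flat argument list, e.g. ('-C', 1, '-d', 'nt', '-p', 'tblastn').
sub check_blast_options {
    my (%opt) = @_;
    my $p    = $opt{p}          or die "program (-p) is required\n";
    my $want = $DB_TYPE_FOR{$p} or die "unknown program '$p'\n";

    my $db = $opt{d};
    if ( defined $db && exists $KNOWN_DB{$db} && $KNOWN_DB{$db} ne $want ) {
        die "$p needs a $want database, but '$db' is $KNOWN_DB{$db}\n";
    }
    if ( defined $opt{C} ) {
        # tblastn accepts D/d, 0/F/f, 1/T/t, 2 and 3; other programs only D, F or 0.
        my $ok = $p eq 'tblastn' ? qr/^[DdFfTt0-3]$/ : qr/^[DdFf0]$/;
        die "-C $opt{C} is not valid for $p\n" unless $opt{C} =~ $ok;
    }
    return map { ( "-$_", $opt{$_} ) } sort keys %opt;
}
```

The returned list could then be handed to something like `system('blastall', @args, '-i', $query_file, '-o', $out_file)`.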
-- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From hubert.prielinger at gmx.at Sun Feb 5 23:12:27 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sun, 05 Feb 2006 22:12:27 -0600 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E6D097.7080304@infotech.monash.edu.au> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au> Message-ID: <43E6CCAB.2060107@gmx.at> dear torsten, thanks for your quick reply, I have looked up at the ftp server and there are nt.00 to nt.04. Do I have to download all of them, are there differences? thanks Hubert Torsten Seemann wrote: >Hubert > > > >>thank you very much, If I use the tblastn instead of blastp, I get the >>following error message >>[blastall] WARNING: : Unable to open nr.00.nin >>I looked up in the folder, but I don't have that file, and if I download >>the database and extract the file, it isn't there either... >> >> > >"tblastn" requires a NUCLEOTIDE database to search. It appears that you >have specified a PROTEIN database with "-d nr" ("nr" is protein). You >probably want to install the "nt" blast database and use that instead. > > > From torsten.seemann at infotech.monash.edu.au Mon Feb 6 00:22:09 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 06 Feb 2006 16:22:09 +1100 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E6CCAB.2060107@gmx.at> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au> <43E6CCAB.2060107@gmx.at> Message-ID: <43E6DD01.2010600@infotech.monash.edu.au> Hubert > thanks for your quick reply, I have looked up at the ftp server and > there are nt.00 to nt.04. 
Do I have to download all of them, are there > differences? You have to download them all. The "nt" database (actually the index files) is very big, and it is split up into gigabyte (?) parts. Although they are called "nt.00" "nt.01" etc, you still pass "-d nt" to "blastall", because together these parts are one "nt" database. The "blastall" program will automatically use the separate parts; you do not have to join them. You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are using the correct BLAST search for your problem. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ Phone: +61 3 9905 9010 From shameer at ncbs.res.in Mon Feb 6 03:27:50 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 6 Feb 2006 13:57:50 +0530 (IST) Subject: [Bioperl-l] Need a slogan for OBF In-Reply-To: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176> References: <001001c62793$bef08f70$93656785@zhur> <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com> <47205.192.168.1.176.1139048133.squirrel@192.168.1.176> Message-ID: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38> Dear All, As we are moving to the all-new wiki-style website, why don't we think about a unique logo + slogan that can express our spirit and excitement? For example, we can have a logo with O|B|F, its full form and the slogan. If anybody is interested, I would be happy to design logos once we have done with the logo. I have a couple of suggestions; I hope all OBF members can send much more powerful slogans than mine: 'Let's Code for Life' 'Let's Decode Life' 'Let's Recode Life' 'Code your Life' Happy O|B|!!! -- Mr. Shameer Khadar (JRF) Dr. R.
Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://www.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." MM From olsonbr2 at msu.edu Fri Feb 3 15:54:22 2006 From: olsonbr2 at msu.edu (Bradley J. S. C. Olson) Date: Fri, 3 Feb 2006 15:54:22 -0500 Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the method? Message-ID: <005e01c62904$02b2ad30$db4c0a23@dihedral> I have been working with the RemoteBlast.pm module and have found that it is a bit clunky to use loops to keep checking to see if your RID has finished. For example, every time you write a script, you need to add a code block (see example in the documentation) in order to keep checking if @rid is finished. Would it be better to write this in as a method in the RemoteBlast module? It seems like it would be better for RemoteBlast to have a method we could call, say, retrieve_when_done, that would return the BLAST report when the value of retrieve_blast is no longer 0. The only issue may be report parsing, but I wonder if it might be better to separate out submittal/retrieval of BLAST requests from the parsing step and make these more discrete processes? Since NCBI seems not to be supporting text results as a standard, maybe the module should work exclusively with XML, and we could change report handling away from the headaches of text processing and just allow Bio::SeqIO or blastxml to handle the task of turning BLAST reports into different forms (such as HTML, text etc).
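Brad's suggested retrieve_when_done could look roughly like the following: a small object that wraps one RID and blocks until the report is available, leaving parsing to the caller. This is a hedged sketch only. `PendingReport` and `wait_for_report` are hypothetical names, not BioPerl API. The `retrieve` callback is assumed to follow the retrieve_blast() return convention from the perldoc loop: a reference when ready, a negative number on failure, and anything else while the search is still running.

```perl
use strict;
use warnings;

package PendingReport;    # hypothetical class, not part of BioPerl

sub new {
    my ( $class, %args ) = @_;
    die "retrieve must be a code ref\n" unless ref $args{retrieve} eq 'CODE';
    return bless {
        rid      => $args{rid},
        retrieve => $args{retrieve},         # e.g. sub { $factory->retrieve_blast($_[0]) }
        interval => $args{interval} // 5,    # seconds between polls
    }, $class;
}

# Block until the report is ready and return it unparsed.
# Dies if the request fails or is still running after $max_polls attempts.
sub wait_for_report {
    my ( $self, $max_polls ) = @_;
    $max_polls //= 120;
    for my $attempt ( 1 .. $max_polls ) {
        my $rc = $self->{retrieve}->( $self->{rid} );
        return $rc if ref $rc;
        die "request $self->{rid} failed ($rc)\n" if defined $rc && $rc < 0;
        sleep $self->{interval} if $attempt < $max_polls;
    }
    die "request $self->{rid} still pending after $max_polls polls\n";
}

package main;
```

Keeping the object ignorant of parsing matches the separation Brad describes. The caller would still call `next_result()` on the returned report, as in the perldoc loop.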
This would definitely simplify coding with the RemoteBlast.pm module, as you could then treat the report retrieval process as an object and just wait for the object to return its value, instead of coding in a bunch of test loops to see if it is done. This may also help keep bugs out of the module, make the module longer-lasting, and not require module users to rewrite their code every time NCBI makes changes. Any thoughts or ideas? Is anyone working on this? Thanks Brad Olson From cjfields at uiuc.edu Mon Feb 6 12:27:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 6 Feb 2006 11:27:56 -0600 Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter themethod? In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral> Message-ID: <002c01c62b42$ab7671a0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson > Sent: Friday, February 03, 2006 2:54 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter > themethod? > > I have been working with the RemoteBlast.pm module and have found that it > is > a bit clunky to use loops to keep checking to see if you RID has finished. > > > > For example, every time you write a script, you need to add a code block > (see example in the documentation) in order to keep checking if @rid is > finished. > > Would it be better to maybe write this in as a method in the RemoteBlast > module? It seems like it would be better for remoteblast to have a method > we could call say retrieve_when_done that would return the blast report > when > the value of retrieve_blast is no longer 0. Sounds reasonable, though I'm not sure how easy it would be to implement.
Why not drop by Bugzilla (http://bugzilla.bioperl.org/) and submit this as an enhancement? > The only issue may be report parsing, but I wonder if it might be better > to > separate out submittal/retrieval of BLAST requests from the parsing step > and > make these more discrete processes? Since NCBI seems to be not supporting > text results as a standard, maybe the module should work exclusively with > XML and we could change report handling away from the headaches of text > processing and just allow Bio::SeqIO or blastxml handle the task of making > a > blast reports into different forms (such as HTML, text etc). They are separated. RemoteBlast executes BLAST remotely (via HTTP). Results are parsed via various Bio::SearchIO modules depending on what you set '-readmethod' to. This is from perldoc: >From Bio::Tools::Run::RemoteBlast ________________________________________________________ DESCRIPTION Class for remote execution of the NCBI Blast via HTTP. For a description of the many CGI parameters see: http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html Various additional options and input formats are available. ________________________________________________________ >From Bio::SearchIO____________ ____________________________________________ DESCRIPTION This is a driver for instantiating a parser for report files from sequence database searches. This object serves as a wrapper for the format parsers in Bio::SearchIO::* - you should not need to ever use those format parsers directly. (For people used to the SeqIO system it, we are deliberately using the same pattern). Once you get a SearchIO object, calling next_result() gives you back a Bio::Search::Result::ResultI compliant object, which is an object that represents one Blast/Fasta/HMMER whatever report. 
A list of module names and formats is below: blast BLAST (WUBLAST, NCBIBLAST,bl2seq) fasta FASTA -m9 and -m0 blasttable BLAST -m9 or -m8 output (NCBI not WUBLAST tabular) megablast MEGABLAST psl UCSC PSL format waba WABA output axt AXT format sim4 Sim4 hmmer HMMER hmmpfam and hmmsearch exonerate Exonerate CIGAR and VULGAR format blastxml NCBI BLAST XML wise Genewise -genesf format See the SearchIO HOWTO linked from http://bioperl.org/HOWTOs/ ________________________________________________________ This is also in the wiki online now: http://www.bioperl.org/wiki/Module:Bio::SearchIO http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast I think the current line of thought is to make XML the default, but I also know you would irritate a LOT of people out there by cutting off text output parsing completely. Roger Hall or Jason pointed out that doing so will break many scripts out there. Furthermore, the problems with text output parsing are usually minimal. For instance, the last one was a small change which broke a regex, causing an infinite loop; the actual bug was in Bio::SearchIO::blast and not in RemoteBlast. A simple addition to the regex fixed it. The only change to RemoteBlast was to implement the option of saving XML formatted BLAST output. I do like the idea of using XML output to build custom (bioperl-specific) BLAST reports, but that also requires more work, likely a lot more work. Again, maybe add that as an enhancement in Bugzilla or, better yet, submit some sample code maybe as an example. > This would definitely simplifying coding using the RemoteBlast.pm module > as > then you could treat the report retrieval process as an object and just > wait > for the object to return its value, instead of coding in a bunch of test > loops to see if it is done. This may also help keep bugs out of the > module > and make the module longer lasting and not require module users to rewrite > their code every time NCBI makes changes. 
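RemoteBlast can return either text or XML, and the SearchIO format list above names a different parser for each (blast, blastxml, blasttable). A script that saves raw output may therefore want to choose the parser by looking at the file itself. The following is a minimal heuristic sketch that assumes the first non-blank line identifies the format. `guess_searchio_format` is a hypothetical helper. Its result would then be passed as `-format` to `Bio::SearchIO->new`.

```perl
use strict;
use warnings;

# Guess a Bio::SearchIO format name from the first non-blank line of a
# BLAST report. Heuristic only; returns undef when nothing matches.
sub guess_searchio_format {
    my ($text) = @_;
    my ($first) = grep { /\S/ } split /\n/, $text;
    return unless defined $first;
    return 'blastxml'   if $first =~ /^\s*<\?xml/;                     # XML output
    return 'blasttable' if $first =~ /^#\s*T?BLAST/;                   # -m 9 comment header
    return 'blasttable' if ( () = split /\t/, $first, -1 ) == 12;      # -m 8 row, 12 columns
    return 'blast'      if $first =~ /^T?BLAST[NPX]\s+\d/;             # plain-text report header
    return;
}
```

Sniffing only the first line keeps the check cheap. A file that is truncated or wrapped in HTML will not match any pattern and comes back undef, which is probably safer than guessing.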
I think the most stable way of submitting jobs is by using the netblast client (blastcl3) and parsing the results from that. No CGI, no HTML, just saving to a temp file and parsing through SearchIO. RemoteBlast was designed, I believe, with the idea of letting researchers with some basic knowledge of perl use an interface familiar to them (i.e. the BLAST interface at NCBI) and retrieve results on a regular basis. The results are parsed via SearchIO::blast/blastxml/blasttable. The problem is, though convenient, RemoteBlast is also reliant on the powers that be at NCBI not changing anything dramatically. It is possible that NCBI could modify the HTML code from the BLAST retrieval process, thus breaking RemoteBlast. Text output could change again, even more dramatically, thus severely breaking Bio::SearchIO::blast. Thus, we adapt to those changes by modifying the broken modules. It's evolution at its finest. It's also a fact of life that code breaks and needs to be fixed every once in a while to stay current. Okay, I'm waxing philosophical now so I know I've definitely had too much coffee. Must get back to work... > > > > Any thoughts or ideas? > > > > Is anyone working on this? > > > > Thanks > > > > Brad Olson > > > > > > > -- > No virus found in this outgoing message. > Checked by AVG Free Edition. > Version: 7.1.375 / Virus Database: 267.15.0/249 - Release Date: 2/2/2006 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From roger at iosea.com Mon Feb 6 13:14:11 2006 From: roger at iosea.com (Roger Hall) Date: Mon, 6 Feb 2006 12:14:11 -0600 Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the method? 
In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral> Message-ID: <000f01c62b49$25732d30$4301a8c0@LIBERAL> Brad, I decided to fix this module about ten days ago, and then was out all of last week with Strep plus a virus or two - it's one of the advantages of having young kids. I see that there have been quite a few messages about this module in just the last week. I am sitting down now to read through them. I'll get back to you (and the list) ASAP. If you have any other questions or suggestions about RemoteBlast, feel free to bug me with 'em. Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson Sent: Friday, February 03, 2006 2:54 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the method? I have been working with the RemoteBlast.pm module and have found that it is a bit clunky to use loops to keep checking to see if you RID has finished. For example, every time you write a script, you need to add a code block (see example in the documentation) in order to keep checking if @rid is finished. Would it be better to maybe write this in as a method in the RemoteBlast module? It seems like it would be better for remoteblast to have a method we could call say retrieve_when_done that would return the blast report when the value of retrieve_blast is no longer 0. The only issue may be report parsing, but I wonder if it might be better to separate out submittal/retrieval of BLAST requests from the parsing step and make these more discrete processes? 
Since NCBI seems to be not supporting text results as a standard, maybe the module should work exclusively with XML and we could change report handling away from the headaches of text processing and just allow Bio::SeqIO or blastxml handle the task of making a blast reports into different forms (such as HTML, text etc). This would definitely simplifying coding using the RemoteBlast.pm module as then you could treat the report retrieval process as an object and just wait for the object to return its value, instead of coding in a bunch of test loops to see if it is done. This may also help keep bugs out of the module and make the module longer lasting and not require module users to rewrite their code every time NCBI makes changes. Any thoughts or ideas? Is anyone working on this? Thanks Brad Olson From barry.m.dancis at gsk.com Mon Feb 6 12:17:13 2006 From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com) Date: Mon, 6 Feb 2006 12:17:13 -0500 Subject: [Bioperl-l] Handling miRNA's In-Reply-To: <003701c625c4$5527d790$2f01a8c0@GOLHARMOBILE1> Message-ID: Hi -- Are there any classes for manipulating miRNA's with functions such as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
Thanks, Barry From hubert.prielinger at gmx.at Mon Feb 6 18:16:01 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon, 06 Feb 2006 17:16:01 -0600 Subject: [Bioperl-l] no results with standalone tblastn In-Reply-To: <43E6DD01.2010600@infotech.monash.edu.au> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au> <43E6CCAB.2060107@gmx.at> <43E6DD01.2010600@infotech.monash.edu.au> Message-ID: <43E7D8B1.5030307@gmx.at> dear torsten, I have downloaded all the databases, as you recommended me. And it is working, but I don't get any results, if I try it online it works fine. my result file looks like that: TBLASTN 2.2.13 [Nov-27-2005] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. 
Query= (8 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 3,749,503 sequences; 16,556,997,203 total letters Searching..................................................done Sequences producing significant alignments: Score E (bits) Value the program code for it looks like this:

#!/usr/local/bin/perl -w
BEGIN {
    $ENV{BLASTDIR}= "/home/Hubert/blast/blast-2.2.13/bin";
    $ENV{BLASTDATADIR}= "/home/Hubert/blast/blast-2.2.13/data";
}
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
use Bio::SeqIO;
use strict;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;
print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;

# parameters
my $expect_value = 20000;
#my $filter_query_sequence = 'T';
my $one_line_description = 1000;
my $alignments = 1000;
#my $matrix = 'BLOSUM80';
my $gapcost = 10;
my $gapextend = 1;
my $wordsize = 2;
#my $compbasedStat = '1';
#my $count = 1;
# my $strands = 1;
my @params = ('program' => 'tblastn','database' => 'nt');
#my $progress_interval = 100;
my $seqio_obj = Bio::SeqIO->new( -file => "aloneblosum62.txt", -format => "raw", );

# create factory object and set parameters
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
print "submitted parameters successfully \n";
$factory->e($expect_value);
#$factory->F($filter_query_sequence);
$factory->v($one_line_description);
$factory->b($alignments);
$factory->M($matrix_STD);
$factory->G($gapcost);
$factory->E($gapextend);
$factory->W($wordsize);
#$factory->C($compbasedStat);
#$factory->S($strands);
print "changed parameters successfully \n";
print "\n";

# get query
while ( my $query = $seqio_obj->next_seq) {
    print "entered while loop \n";
    my $blast_report = $factory->blastall($query);
    # print "$blast_report\n";
    $factory->outfile("nucleo80$count_STD.txt");
    $count_STD++;
    print $query->seq;
    print "\n";
}

thanks Hubert Torsten Seemann wrote: >Hubert > > > >>thanks for your quick reply, I
have looked up at the ftp server and >>there are nt.00 to nt.04. Do I have to download all of them, are there >>differences? >> >> > >You have to download them all. The "nt" database (actually the index >files) is very big, and it is split up into gigabyte (?) parts. Although >they are called "nt.00" "nt.01" etc, you still pass "-d nt" to >"blastall", because together these parts are one "nt" database. The >"blastall" program will automatically use the separate parts; you do not >have to join them. > >You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are >using the correct BLAST search for your problem. > > > From torsten.seemann at infotech.monash.edu.au Mon Feb 6 21:17:40 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 07 Feb 2006 13:17:40 +1100 Subject: [Bioperl-l] no results with standalone tblastn In-Reply-To: <43E7D8B1.5030307@gmx.at> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au> <43E6CCAB.2060107@gmx.at> <43E6DD01.2010600@infotech.monash.edu.au> <43E7D8B1.5030307@gmx.at> Message-ID: <43E80344.5090207@infotech.monash.edu.au> > I have downloaded all the databases, as you recommended me. And it is > working, but I don't get any results, if I try it online it works fine. > my result file looks like that: > > TBLASTN 2.2.13 [Nov-27-2005] > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. 
> Query= > (8 letters) > Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, > GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) > 3,749,503 sequences; 16,556,997,203 total letters > Searching..................................................done > Sequences producing significant alignments: Score > E (bits) Value Is your query only 8 amino acids long? This report looks like it did have alignments that were not displayed, otherwise it would print "**** No hits ****". This mailing list is not here to solve your BLAST problems unless it is a problem with the Perl module running BLAST. You first need to try and get your problem working on the command line *without* Perl. eg. /home/Hubert/blast/blast-2.2.13/bin/blastall -p tblastn -d nt -i YOUR_FASTA_FILE_WITH_SEQUENCE_IN_IT -o OUTPUT_FILE.txt -e 0.001 ... where "..." is the rest of the options you are setting in your Perl script. If it doesn't work that way, it will never work in Perl. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From rahall2 at ualr.edu Mon Feb 6 21:46:44 2006 From: rahall2 at ualr.edu (Roger Hall) Date: Mon, 6 Feb 2006 20:46:44 -0600 Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please reply Message-ID: <002001c62b90$bb9dbe00$4301a8c0@LIBERAL> To everyone who uses RemoteBlast.pm: Would anyone object to RemoteBlast being rewritten in a way that requires NCBI's blastcl3 executable? Binary downloads of blastcl3 (column "netblast") are available for numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml Does anyone require or desire a "pure perl" implementation? If so, please explain the advantage you see with such an implementation. Thanks! 
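If RemoteBlast were rewritten around NCBI's blastcl3, as Roger proposes, a wrapper would presumably need to confirm that the executable is installed before submitting anything. The following is a minimal sketch that uses only core Perl (File::Spec). `find_executable` is a hypothetical helper name. Trying a `.exe` suffix on Windows is an assumption about how the binary is named there.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file called $name in @dirs
# (default: the directories on PATH), or undef if there is none.
sub find_executable {
    my ( $name, @dirs ) = @_;
    @dirs = File::Spec->path unless @dirs;
    # Assumption: on Windows the binary may carry an .exe suffix.
    my @names = $^O eq 'MSWin32' ? ( $name, "$name.exe" ) : ($name);
    for my $dir (@dirs) {
        for my $candidate_name (@names) {
            my $path = File::Spec->catfile( $dir, $candidate_name );
            return $path if -f $path && -x _;
        }
    }
    return;
}
```

A caller might write `my $blastcl3 = find_executable('blastcl3') or die "blastcl3 not found; see http://ncbi.nih.gov/BLAST/download.shtml\n";`. Failing early with a pointer to the download page is friendlier than an obscure error from `system`.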
Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074

From osborne1 at optonline.net Tue Feb 7 12:05:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 07 Feb 2006 12:05:56 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To:
Message-ID:

Barry,

If the sequence information is in one of the formats that Bioperl
understands (Genbank, Swissprot flat, and so on) then the answer is yes.
This assumes that the details on sequence that you mentioned are found in
some sequence feature section in the file. But it looks to me like there's
no specialized parser for miRNA sequence per se, I'll be corrected if I'm
wrong.

Brian O.

On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" wrote:

> Hi --
>
> Are there any classes for manipulating miRNA's with functions such
> as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
>
> Thanks,
>
> Barry

From barry.m.dancis at gsk.com Tue Feb 7 15:26:27 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Tue, 7 Feb 2006 15:26:27 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To:
Message-ID:

It's the parser in particular that I need

"Brian Osborne"
Sent by: bioperl-l-bounces at lists.open-bio.org
07-Feb-2006 12:05

To
barry.m.dancis at gsk.com, "bioperl-l" , bioperl-l-bounces at lists.open-bio.org
cc

Subject
Re: [Bioperl-l] Handling miRNA's

Barry,

If the sequence information is in one of the formats that Bioperl
understands (Genbank, Swissprot flat, and so on) then the answer is yes.
This assumes that the details on sequence that you mentioned are found in
some sequence feature section in the file. But it looks to me like there's
no specialized parser for miRNA sequence per se, I'll be corrected if I'm
wrong.

Brian O.
On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" wrote:

> Hi --
>
> Are there any classes for manipulating miRNA's with functions such
> as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
>
> Thanks,
>
> Barry

From deep.raman at gmail.com Tue Feb 7 15:16:48 2006
From: deep.raman at gmail.com (Raman Deep Singh)
Date: Wed, 8 Feb 2006 01:46:48 +0530
Subject: [Bioperl-l] Needed help
Message-ID:

Hi all

I have a huge task of retrieving a number of sequences from the Swiss-Prot
database based on some fixed criteria. For that I want to index the
Swiss-Prot database on my local disk. I have downloaded the whole
Swiss-Prot database to my local disk (the January 2006 release). I am
currently using BioPerl on a Linux machine.
I am using the code listed below

=======================
use Bio::Index::Swissprot;

my $Index_File_Name = shift;
my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name,
                                     '-write_flag' => 'WRITE');
$inx->make_index(@ARGV);
-----------------------------------------
# Print out several sequences present in the index
# in gcg format
use Bio::Index::Swissprot;
use Bio::SeqIO;

my $out = Bio::SeqIO->new( '-format' => 'gcg', '-fh' => \*STDOUT );
my $Index_File_Name = shift;
my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name);

foreach my $id (@ARGV) {
    my $seq = $inx->fetch($id); # Returns Bio::Seq object
    $out->write_seq($seq);
}

# alternatively (needs an $id and an $acc to look up):
# my $seq1 = $inx->get_Seq_by_id($id);
# my $seq2 = $inx->get_Seq_by_acc($acc);
-------------------------------

I am running the script as

perl getseqfromid.pl sample.dat

from the shell and I am getting this error repeatedly:

------------- EXCEPTION -------------
MSG: Can't open 'DB_File' dbm file 'swiss100.dat' : No such file or directory
STACK Bio::Index::Abstract::open_dbm /usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:389
STACK Bio::Index::Abstract::new /usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
STACK Bio::Index::AbstractSeq::new /usr/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
STACK toplevel i.pl:6
--------------------------

At some place online, I also found a document saying that some variables
need to be exported. I did that too, but still got the same errors.

Kindly help

Ramandeep Singh

From cjfields at uiuc.edu Tue Feb 7 17:40:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 16:40:15 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To:
Message-ID: <007701c62c37$7914af60$15327e82@pyrimidine>

Are you talking about sequences or text output from a specific program?
If you are talking about sequences in a particular format, then listen to
Brian.
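Regarding the Bio::Index::Swissprot exception in the "Needed help" message above: the read-only constructor throws "Can't open 'DB_File' dbm file" when the index file it is given does not exist, so the index has to be built first (constructor called with `-write_flag => 'WRITE'`, then `make_index` on the flat files) and reopened later under exactly the same name. The following is a minimal sketch of that two-step flow. The script name, the `build`/`fetch` modes and the helper subs are hypothetical, and the early existence check assumes the default DB_File backend, which keeps the index in a single file of that name.

```perl
#!/usr/bin/perl
# Hypothetical two-step helper: build a Swiss-Prot index, then fetch from it.
#   perl swiss_index.pl build swiss.idx uniprot_sprot.dat
#   perl swiss_index.pl fetch swiss.idx ROA1_HUMAN
use strict;
use warnings;
use File::Spec;

# Step 1: create the index. Only a constructor called with
# -write_flag => 'WRITE' creates the index file on disk.
sub build_swissprot_index {
    my ($index_file, @flat_files) = @_;
    die "No Swiss-Prot flat files given\n" unless @flat_files;
    for my $f (@flat_files) {
        die "Flat file '$f' not found\n" unless -e $f;
    }
    require Bio::Index::Swissprot;    # loaded at run time
    my $inx = Bio::Index::Swissprot->new(
        -filename   => $index_file,
        -write_flag => 'WRITE',
    );
    # Absolute paths as a precaution, in case the index is later
    # read from a different working directory.
    $inx->make_index(map { File::Spec->rel2abs($_) } @flat_files);
    return $inx;
}

# Step 2: reopen an existing index read-only.
sub open_swissprot_index {
    my ($index_file) = @_;
    # Assumes the default DB_File backend (one file named $index_file);
    # fails early with a clearer message than the dbm exception.
    die "Index '$index_file' not found; build it first\n"
        unless -e $index_file;
    require Bio::Index::Swissprot;
    return Bio::Index::Swissprot->new(-filename => $index_file);
}

sub main {
    my ($mode, $index_file, @rest) = @_;
    die "Usage: $0 build INDEX FLATFILE... | fetch INDEX ID...\n"
        unless defined $index_file && $mode =~ /^(?:build|fetch)$/;
    if ($mode eq 'build') {
        build_swissprot_index($index_file, @rest);
        return;
    }
    require Bio::SeqIO;
    my $inx = open_swissprot_index($index_file);
    my $out = Bio::SeqIO->new(-format => 'gcg', -fh => \*STDOUT);
    for my $id (@rest) {
        my $seq = $inx->fetch($id);    # expected to be undef for an unknown ID
        if ($seq) { $out->write_seq($seq) }
        else      { warn "No entry for '$id'\n" }
    }
}

# Run only when executed as a script with arguments.
main(@ARGV) if !caller() && @ARGV;
```

Keeping build and fetch as separate modes of one script makes it harder to point the fetch step at an index name that was never built, which is what the exception above reports.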
If you are talking about output, then we need to know which program you're
using, as a parser may exist or could be built. There are a few modules in
Bio::Tools that handle RNA (like QRNA, tRNAscan-SE), so check those out
first. I'm currently finishing up a Bio::Tools module for RNAMotif and have
plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
>
> It's the parser in particular that I need
>
> "Brian Osborne"
> Sent by: bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 12:05
>
> To
> barry.m.dancis at gsk.com, "bioperl-l" ,
> bioperl-l-bounces at lists.open-bio.org
> cc
>
> Subject
> Re: [Bioperl-l] Handling miRNA's
>
> Barry,
>
> If the sequence information is in one of the formats that
> Bioperl understands (Genbank, Swissprot flat, and so on) then
> the answer is yes.
> This assumes that the details on sequence that you mentioned
> are found in some sequence feature section in the file. But
> it looks to me like there's no specialized parser for miRNA
> sequence per se, I'll be corrected if I'm wrong.
>
> Brian O.
>
> On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com"
> wrote:
>
> > Hi --
> >
> > Are there any classes for manipulating miRNA's with functions such
> > as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
> >
> > Thanks,
> >
> > Barry

From cjfields at uiuc.edu Tue Feb 7 18:06:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 17:06:21 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To:
Message-ID: <000001c62c3b$1c6017b0$15327e82@pyrimidine>

Sorry if this gets posted twice.

Are you talking about sequences or text output from a specific program? If
you are talking about sequences in a particular format, then Brian's right.
If you are talking about output, then we need to know which program you're
using, as a parser may exist, or probably could be built from an existing
one. There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first. I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
>
> It's the parser in particular that I need
>
> "Brian Osborne"
> Sent by: bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 12:05
>
> To
> barry.m.dancis at gsk.com, "bioperl-l" ,
> bioperl-l-bounces at lists.open-bio.org
> cc
>
> Subject
> Re: [Bioperl-l] Handling miRNA's
>
> Barry,
>
> If the sequence information is in one of the formats that
> Bioperl understands (Genbank, Swissprot flat, and so on) then
> the answer is yes.
> This assumes that the details on sequence that you mentioned
> are found in some sequence feature section in the file. But
> it looks to me like there's no specialized parser for miRNA
> sequence per se, I'll be corrected if I'm wrong.
>
> Brian O.
>
> On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com"
> wrote:
>
> > Hi --
> >
> > Are there any classes for manipulating miRNA's with functions such
> > as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
> >
> > Thanks,
> >
> > Barry

From paul.boutros at utoronto.ca Tue Feb 7 20:38:42 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Tue, 7 Feb 2006 20:38:42 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>

Hi Roger,

I would definitely prefer a fully Perl-based implementation. For starters,
I have not been successful in compiling the Toolkit that contains netblast
for some platforms (e.g. AIX 5.2 w/gcc 4.0).

I haven't been following the discussion: is there some compelling reason
to prefer a netblast-based system that's come up recently? I'm guessing
that adding a new non-perl dependency would only be done if there was
considerable justification for this type of change, but I'm not clear from
your message what that justification is.

Paul

------------------------------

Message: 12
Date: Mon, 6 Feb 2006 20:46:44 -0600
From: "Roger Hall"
Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
	please reply
To:
Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
Content-Type: text/plain; charset="us-ascii"

To everyone who uses RemoteBlast.pm:

Would anyone object to RemoteBlast being rewritten in a way that requires
NCBI's blastcl3 executable?

Binary downloads of blastcl3 (column "netblast") are available for
numerous platforms at:
http://ncbi.nih.gov/BLAST/download.shtml

Does anyone require or desire a "pure perl" implementation?
If so, please explain the advantage you see with such an implementation.

Thanks!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074

From cjfields at uiuc.edu Tue Feb 7 23:52:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 22:52:36 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
Message-ID: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>

I want to submit a module for parsing RNAMotif output
(Bio::Tools::RNAMotif). It is capable, at the moment, of scanning output
and returning Bio::SeqFeature::Generic objects with added tags for
descriptors/sequences/file info. I'm in the process of writing up tests
and going through biodesign to make sure everything's kosher, but the
module itself is essentially ready-to-go. What should I do next?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From rahall2 at ualr.edu Wed Feb 8 00:16:44 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Tue, 7 Feb 2006 23:16:44 -0600
Subject: [Bioperl-l] RemoteBlast [was: (no subject)]
In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
Message-ID: <004401c62c6e$da906a40$4301a8c0@LIBERAL>

Paul,

I think that most core Bioperl folks have long since moved away from
RemoteBlast and are using the functionality in StandAloneBlast to run their
own local servers. More importantly, they are, in general, researchers who
are coming to Bioinformatics from the life sciences side, and are
particularly tired of dealing with the technical issues that RemoteBlast
consistently generates due to changes in the text-formatted BLAST reports.

They aren't code-for-code-sake geeks like me. ;}

When RemoteBlast was written, XML was barely on the technology radar, and
XML-formatted BLAST reports weren't even available.
It seems that everyone recognizes that the XML reports now generated by
NCBI's blast server are the wave of the future, but I think there is still
some concern that not every flavor of BLAST produces XML yet. Even so, the
XML parser is considered to be very strong, and only helps hasten the end
of text-formatted support, since parsing text-formatted reports is the
primary source of pain.

In discussing the shift from old to new, I think the idea of relying on
NCBI's application (and NCBI's issue system and NCBI's developers) entered
the realm of possibility, so as the guy who just showed up to adopt
RemoteBlast, I am trying to air all options and beg for all requirements.

Personally, I am okay with the idea of maintaining text-formatted report
parsing, but like I said, I'm pound foolish about code sometimes.
Additional foolishness arises from the fact that the first money I earned
in Bioinformatics was on a contract gig where I relied on RemoteBlast (and
the related text parsers).

For my money, I just needed anyone, anywhere, to say they desired a pure
perl implementation to meet my personal threshold. So far, you're the
second. ;}

I do, however, see the advantage in shifting to XML-formatted reporting and
parsing *only* as soon as every BLAST flavor supports it, if not before.
(Anyone - is this still an issue? Please educate me.)

At the moment, I'm leaning towards adding an option to RemoteBlast. The
default (no option) would use a "pure perl" implementation, and the
enhancement (with explicit option) would merely wrap the NCBI executable.
However, there are other issues (queuing, batches) that I don't fully
understand in context, so I haven't zeroed in on a complete recommendation
yet. Additionally, the end of text-formatted reports, while drawing near,
is not yet agreed, although it is pretty clear that the only way text
support will be continued is if I insist on it and then deliver the support
myself.
:}

In any case, I am very interested in a pure perl implementation for exactly
the two reasons stated thus far: it's one less thing for a newbie to worry
about, and it will run on every platform that runs perl.

Thanks much for the input!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Paul Boutros
Sent: Tuesday, February 07, 2006 7:39 PM
To: BioPerl Mailing List
Cc: Roger Hall
Subject: [Bioperl-l] (no subject)

Hi Roger,

I would definitely prefer a fully Perl-based implementation. For starters,
I have not been successful in compiling the Toolkit that contains netblast
for some platforms (e.g. AIX 5.2 w/gcc 4.0).

I haven't been following the discussion: is there some compelling reason
to prefer a netblast-based system that's come up recently? I'm guessing
that adding a new non-perl dependency would only be done if there was
considerable justification for this type of change, but I'm not clear from
your message what that justification is.

Paul

------------------------------

Message: 12
Date: Mon, 6 Feb 2006 20:46:44 -0600
From: "Roger Hall"
Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
	please reply
To:
Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
Content-Type: text/plain; charset="us-ascii"

To everyone who uses RemoteBlast.pm:

Would anyone object to RemoteBlast being rewritten in a way that requires
NCBI's blastcl3 executable?

Binary downloads of blastcl3 (column "netblast") are available for
numerous platforms at:
http://ncbi.nih.gov/BLAST/download.shtml

Does anyone require or desire a "pure perl" implementation? If so, please
explain the advantage you see with such an implementation.

Thanks!
Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074

From heikki at sanbi.ac.za Wed Feb 8 01:53:58 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 8 Feb 2006 08:53:58 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID: <200602080853.58889.heikki@sanbi.ac.za>

Chris,

Post your files to bugzilla (ticket type enhancement, add files to ticket
after creation) and someone with commit ability will add them to CVS once
the code is in satisfactory condition.

Thanks,

	-Heikki

On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> I want to submit a module for parsing RNAMotif output
> (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning
> output and returning Bio::SeqFeature::Generic objects with added tags
> for descriptors/sequences/file info. I'm in the process of writing
> up tests and going through biodesign to make sure everything's
> kosher, but the module itself is essentially ready-to-go. What
> should I do next?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr.
> Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

--
Heikki Lehvaslaiho  heikki at_sanbi _ac _za
Associate Professor  skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096  FAX: +27 21 959 2512

From hlapp at gmx.net Wed Feb 8 00:48:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 7 Feb 2006 21:48:40 -0800
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID:

I presume you don't have a cvs write account yet - if you do just add and
commit the module and test. Otherwise could you post the POD to the list
please; either somebody with an account will hopefully volunteer or Jason
or I or Heikki or Aaron will assume mentorship and commit the code with
feedback to you. Unless you completely refuse to heed any and all advice ;)
that person will then soon try to absolve him/herself of having to do this
again for you and support you for receiving a cvs write account of your
own.

-hilmar

On 2/7/06, Chris Fields wrote:
> I want to submit a module for parsing RNAMotif output
> (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning
> output and returning Bio::SeqFeature::Generic objects with added tags
> for descriptors/sequences/file info. I'm in the process of writing
> up tests and going through biodesign to make sure everything's
> kosher, but the module itself is essentially ready-to-go. What
> should I do next?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From cjfields at uiuc.edu Wed Feb 8 07:57:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 06:57:46 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To:
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID:

I'll probably go with Heikki's advice and post the module (with POD,
tests, and test file) to bugzilla as an enhancement. That way it can be
looked through before committing. I will likely have a few more modules
for ERPIN and maybe Infernal in the next few months (if I can get it up
and running).

Also, completely off-topic, I'll post what I have written up for
installing bioperl-db on WinXP here soon. I think it should probably be
included in the wiki in some way, maybe as a link from the bioperl-db
wiki page.

Thanks Hilmar, Heikki!

Chris

On Feb 7, 2006, at 11:48 PM, Hilmar Lapp wrote:

> I presume you don't have a cvs write account yet - if you do just add
> and commit the module and test. Otherwise could you post the POD to
> the list please; either somebody with an account will hopefully
> volunteer or Jason or I or Heikki or Aaron will assume mentorship and
> commit the code with feedback to you. Unless you completely refuse to
> heed any and all advice ;) that person will then soon try to absolve
> him/herself of having to do this again for you and support you for
> receiving a cvs write account of your own.
>
> -hilmar
>
> On 2/7/06, Chris Fields wrote:
>> I want to submit a module for parsing RNAMotif output
>> (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning
>> output and returning Bio::SeqFeature::Generic objects with added tags
>> for descriptors/sequences/file info. I'm in the process of writing
>> up tests and going through biodesign to make sure everything's
>> kosher, but the module itself is essentially ready-to-go. What
>> should I do next?
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed Feb 8 10:32:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 09:32:25 -0600
Subject: [Bioperl-l] RemoteBlast [was: (no subject)]
In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
Message-ID: <000401c62cc4$de0cc9b0$15327e82@pyrimidine>

Roger,

It might be better to build a wrapper for the blastcl3 and make it a
separate Bio::Tools::Run module, maybe branch it off from RemoteBlast or,
better yet, StandAloneBlast. All the put/get parameters in the BEGIN{}
block for RemoteBlast look like they are configured for NCBI's HTTP
submission via CGI; I don't think you can use these for blastcl3.
Ergo, you'll have to create a whole new set of hashes or parameter arrays
inside RemoteBlast just for blastcl3 since everything is passed via
command-line flags, like so (from
http://www.ncbi.nlm.nih.gov/blast/docs/netblast.html):

    blastcl3 -p blastp -d nr -i MY_QUERY -o MY_QUERY.out

However, StandAloneBlast looks like it has all the parameters mapped out in
the BEGIN{} block. And it looks like the command line options support just
about everything you get via the web version. It probably wouldn't take
much modification from StandAloneBlast to get it to run blastcl3.

As for queueing, I don't think it's supported, though you can send in a
FASTA file with multiple sequences for multiple BLAST queries (I tried this
and it works). You could also create a queue using a sequence factory,
sending them to the netblast client one at a time, though I'd suggest
putting a delay in between cycles in that case so as not to make the guys
at NCBI cranky.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Roger Hall
> Sent: Tuesday, February 07, 2006 11:17 PM
> To: Paul.Boutros at utoronto.ca; 'BioPerl Mailing List'
> Subject: Re: [Bioperl-l] RemoteBlast [was: (no subject)]
>
> Paul,
>
> I think that most core Bioperl folks have long since moved
> away from RemoteBlast and are using the functionality in
> StandAloneBlast to run their own local servers. More
> importantly, they are, in general, researchers who are coming
> to Bioinformatics from the life sciences side, and are
> particularly tired of dealing with the technical issues that
> RemoteBlast consistently generates due to changes in the
> text-formatted BLAST reports.
>
> They aren't code-for-code-sake geeks like me.
> ;}
>
> When RemoteBlast was written, XML was barely on the
> technology radar, and XML-formatted BLAST reports weren't
> even available. It seems that everyone recognizes that the
> XML reports now generated by NCBI's blast server are the wave
> of the future, but I think there is still some concern that
> not every flavor of BLAST produces XML yet. Even so, the XML
> parser is considered to be very strong, and only helps hasten
> the end of text-formatted support, since parsing
> text-formatted reports is the primary source of pain.
>
> In discussing the shift from old to new, I think the idea of
> relying on NCBI's application (and NCBI's issue system and
> NCBI's developers) entered the realm of possibility, so as
> the guy who just showed up to adopt RemoteBlast, I am trying
> to air all options and beg for all requirements.
>
> Personally, I am okay with the idea of maintaining
> text-formatted report parsing, but like I said, I'm pound
> foolish about code sometimes. Additional foolishness arises
> from the fact that the first money I earned in Bioinformatics
> was on a contract gig where I relied on RemoteBlast (and the
> related text parsers).
>
> For my money, I just needed anyone, anywhere, to say they
> desired a pure perl implementation to meet my personal
> threshold. So far, you're the second. ;}
>
> I do, however, see the advantage in shifting to XML-formatted
> reporting and parsing *only* as soon as every BLAST flavor
> supports it, if not before.
> (Anyone - is this still an issue? Please educate me.)
>
> At the moment, I'm leaning towards adding an option to
> RemoteBlast. The default (no option) would use a "pure perl"
> implementation, and the enhancement (with explicit option)
> would merely wrap the NCBI executable.
> However, there are other issues (queuing, batches) that I
> don't fully understand in context, so I haven't zeroed in on
> a complete recommendation yet.
> Additionally, the end of
> text-formatted reports, while drawing near, is not yet
> agreed, although it is pretty clear that the only way text
> support will be continued is if I insist on it and then
> deliver the support myself.
> :}
>
> In any case, I am very interested in a pure perl
> implementation for exactly the two reasons stated thus far:
> it's one less thing for a newbie to worry about, and it will
> run on every platform that runs perl.
>
> Thanks much for the input!
>
> Roger Hall
> Technical Director
> MidSouth Bioinformatics Center
> University of Arkansas at Little Rock
> (501) 569-8074
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Paul Boutros
> Sent: Tuesday, February 07, 2006 7:39 PM
> To: BioPerl Mailing List
> Cc: Roger Hall
> Subject: [Bioperl-l] (no subject)
>
> Hi Roger,
>
> I would definitely prefer a fully Perl-based implementation.
> For starters, I have not been successful in compiling the
> Toolkit that contains netblast for some platforms (e.g.
> AIX 5.2 w/gcc 4.0).
>
> I haven't been following the discussion: is there some
> compelling reason to prefer a netblast-based system that's
> come up recently? I'm guessing that adding a new non-perl
> dependency would only be done if there was considerable
> justification for this type of change, but I'm not clear from
> your message what that justification is.
>
> Paul
>
> ------------------------------
>
> Message: 12
> Date: Mon, 6 Feb 2006 20:46:44 -0600
> From: "Roger Hall"
> Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
> please reply
> To:
> Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
> Content-Type: text/plain; charset="us-ascii"
>
> To everyone who uses RemoteBlast.pm:
>
> Would anyone object to RemoteBlast being rewritten in a way
> that requires NCBI's blastcl3 executable?
>
> Binary downloads of blastcl3 (column "netblast") are
> available for numerous platforms at:
> http://ncbi.nih.gov/BLAST/download.shtml
>
> Does anyone require or desire a "pure perl" implementation?
> If so, please explain the advantage you see with such an
> implementation.
>
> Thanks!
>
> Roger Hall
> Technical Director
> MidSouth Bioinformatics Center
> University of Arkansas at Little Rock
> (501) 569-8074

From hubert.prielinger at gmx.at Wed Feb 8 15:51:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 14:51:41 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
Message-ID: <43EA59DD.1030608@gmx.at>

Hi,
If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO, I
get the following error message:

MSG: no data for midline Query 1 WWWKWRW 7
STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

is that a bug......

If I want to parse Blast Output (version 2.2.13), I don't get anything.....
I'm using bioperl 1.4

before, I have installed bioperl 1.4, it worked fine parsing Blast Output
(version 2.2.12), but I don't remember which bioperl version I had
installed

thanks in advance

Hubert

From cjfields at uiuc.edu Wed Feb 8 17:15:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 16:15:23 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
In-Reply-To: <43EA59DD.1030608@gmx.at>
Message-ID: <001101c62cfd$28605df0$15327e82@pyrimidine>

My guess is you're running into text parsing problems in
Bio::SearchIO::blast. Upgrade to the latest developer version (1.5.1) or
bioperl-live (CVS), then see the bug below.

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

I think the first problem you ran into is solved in bioperl 1.5.1, the
last problem (more recent, not related to the first) has been fixed but
hasn't been committed to bioperl-live yet. The fixed SearchIO::blast is
available in the link above, but realize it hasn't been committed yet and
may change.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Hubert Prielinger
> Sent: Wednesday, February 08, 2006 2:52 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> parsing Blast output
>
> Hi,
> If I want to parse a Blast Output (Version 2.2.12) with
> Bio::SearchIO, I get the following error message:
>
> MSG: no data for midline Query 1 WWWKWRW 7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
> is that a bug......
>
> If I want to parse Blast Output (version 2.2.13), I don't get
> anything.....
> I'm using bioperl 1.4 > > before, I have installed bioperl 1.4, it worked fine parsing > Blast Output (version 2.2.12), but I don't remember which > bioperl version I had installed > > thanks in advance > > Hubert > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Wed Feb 8 16:41:04 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Wed, 08 Feb 2006 15:41:04 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <001101c62cfd$28605df0$15327e82@pyrimidine> References: <001101c62cfd$28605df0$15327e82@pyrimidine> Message-ID: <43EA6570.9070909@gmx.at> hi chris, thanks, I have upgraded to version 1.5.1 but it isn't still working, do you have any ohter idea, the problem I have is that I have to parse a lot of textfiles.... or shall I look for another option to parse those files... regards Hubert Chris Fields wrote: >My guess is you're running into text parsing problems in >Bio::SearchIO::blast. Upgrade to the latest developer version (1.5.1) or >bioperl-live (CVS), then see the bug below. > >http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >I think the first problem you ran into is solved in bioperl 1.5.1, the last >problem (more recent, not related to the first) has been fixed but hasn't >been committed to bioperl-live yet. The fixed SearchIO::blast is available >in the link above, but realize it hasn't been committed yet and may change. > >Christopher Fields >Postdoctoral Researcher - Switzer Lab >Dept. 
of Biochemistry >University of Illinois Urbana-Champaign > > > >>-----Original Message----- >>From: bioperl-l-bounces at lists.open-bio.org >>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>Hubert Prielinger >>Sent: Wednesday, February 08, 2006 2:52 PM >>To: bioperl-l at bioperl.org >>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>Hi, >>If I want to parse a Blast Output (Version 2.2.12) with >>Bio::SearchIO, I get the following error message: >> >>MSG: no data for midline Query 1 WWWKWRW 7 >>STACK Bio::SearchIO::blast::next_result >>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>STACK toplevel >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >>is that a bug...... >> >>If I want to parse Blast Output (version 2.2.13), I don't get >>anything..... >>I'm using bioperl 1.4 >> >>before, I have installed bioperl 1.4, it worked fine parsing >>Blast Output (version 2.2.12), but I don't remember which >>bioperl version I had installed >> >>thanks in advance >> >>Hubert >> >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > From cjfields at uiuc.edu Wed Feb 8 18:00:21 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 8 Feb 2006 17:00:21 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <43EA6570.9070909@gmx.at> Message-ID: <001201c62d03$703178c0$15327e82@pyrimidine> Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not just the modules you want; mixing bioperl versions might work, but you might run into interoperability problems). Then replace the Bio::SearchIO::blast with the one in Bugzilla. The 'other option' you mentioned might be trying XML instead of text, which is more stable in the long run. 
You will still need to run a full upgrade to bioperl 1.5.1 for that; make sure you read this: http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast If you're using SearchIO directly instead of Remoteblast, you should be able to set the '-format' flag to 'blastxml'. It also wouldn't hurt to know what OS you're using or see some code. Roger is out there somewhere (I think) and may also have some input. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] > Sent: Wednesday, February 08, 2006 3:41 PM > To: Chris Fields; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast output > > hi chris, > thanks, I have upgraded to version 1.5.1 but it isn't still > working, do you have any ohter idea, the problem I have is > that I have to parse a lot of textfiles.... > or shall I look for another option to parse those files... > > regards > Hubert > > > > Chris Fields wrote: > > >My guess is you're running into text parsing problems in > >Bio::SearchIO::blast. Upgrade to the latest developer > version (1.5.1) > >or bioperl-live (CVS), then see the bug below. > > > >http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > >I think the first problem you ran into is solved in bioperl > 1.5.1, the > >last problem (more recent, not related to the first) has > been fixed but > >hasn't been committed to bioperl-live yet. The fixed > SearchIO::blast > >is available in the link above, but realize it hasn't been > committed yet and may change. > > > >Christopher Fields > >Postdoctoral Researcher - Switzer Lab > >Dept.
of Biochemistry > >University of Illinois Urbana-Champaign > > > > > > > >>-----Original Message----- > >>From: bioperl-l-bounces at lists.open-bio.org > >>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>Prielinger > >>Sent: Wednesday, February 08, 2006 2:52 PM > >>To: bioperl-l at bioperl.org > >>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast > >>output > >> > >>Hi, > >>If I want to parse a Blast Output (Version 2.2.12) with > Bio::SearchIO, > >>I get the following error message: > >> > >>MSG: no data for midline Query 1 WWWKWRW 7 > >>STACK Bio::SearchIO::blast::next_result > >>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>STACK toplevel > >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >> > >>is that a bug...... > >> > >>If I want to parse Blast Output (version 2.2.13), I don't get > >>anything..... > >>I'm using bioperl 1.4 > >> > >>before, I have installed bioperl 1.4, it worked fine parsing Blast > >>Output (version 2.2.12), but I don't remember which bioperl > version I > >>had installed > >> > >>thanks in advance > >> > >>Hubert > >> > >> > >> > >>_______________________________________________ > >>Bioperl-l mailing list > >>Bioperl-l at lists.open-bio.org > >>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > > > > > > From hubert.prielinger at gmx.at Wed Feb 8 17:22:44 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Wed, 08 Feb 2006 16:22:44 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <001201c62d03$703178c0$15327e82@pyrimidine> References: <001201c62d03$703178c0$15327e82@pyrimidine> Message-ID: <43EA6F34.4090007@gmx.at> hi, I have installed from the following page: http://news.open-bio.org/archives/2005_10.html, the Core, Run and Ext. I'm using only the SearchIO without remoteblast module, because I have already all my Blast output files. My operating system is fedora core 9. 
Code:

#!/usr/bin/perl -w

use Bio::SearchIO;

print "start program\n";
my $directory = "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR)) {
  print "read file\n";

  my $search = new Bio::SearchIO (-format => 'blast',
                                  -file   => $file);

  my $cutoff_len = 10;

  #iterate over each query sequence
  while (my $result = $search->next_result) {
    print "entered 1st while loop\n";

    #iterate over each hit on the query sequence
    while (my $hit = $result->next_hit) {

      #iterate over each HSP in the hit
      while (my $hsp = $hit->next_hsp) {

        if ($hsp->length('sbjct') <= $cutoff_len) {
          #print $hsp->hit_string, "\n";
          for ($hsp->hit_string) {

            if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
                tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {

              # Print some tab-delimited data about this HSP
              open (bigShot, ">>BlastOutputTrial.txt") || die ("Could not open file. $!");
              #print $result->query_name, "\t";
              # print $hit->significance, "\t";
              print bigShot $hit->name, "-->";
              print bigShot $hit->description, "\n";
              #print bigShot "Query: ", $hsp->start('query'), " ", $hsp->query_string, " ", $hsp->end('query'), "\n";
              print bigShot "Seq: ", $hsp->start('hit'), " ", $hsp->hit_string, " ", $hsp->end('hit'), "\n";
              # print $hsp->rank, "\t";
              # print $hsp->percent_identity, "\t";
              # print $hsp->evalue, "\t";
              # print $hsp->hsp_length, "\n";
              close (bigShot);
            };
          }
        }
      }
    }
  }
}

closedir(DIR);

Chris Fields wrote:
>Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
>just the modules you want; mixing bioperl versions might work, but you might
>run into interoperability problems). Then replace the Bio::SearchIO::blast
>with the one in Bugzilla. The 'other option' you mentioned might be trying
>XML instead of text, which is more stable in the long run.
You will still >need to run a full upgrade to bioperl 1.5.1 for that; make sure you read >this: > >http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast > >If you're using SearchIO directly instead of Remoteblast, you should be able >to set the '-readmethod' flag to 'blastxml'. > >It also wouldn't hurt to know what OS you're using or see some code. Roger >is out there somewhere (I think) and may also have some input. > >Christopher Fields >Postdoctoral Researcher - Switzer Lab >Dept. of Biochemistry >University of Illinois Urbana-Champaign > > > >>-----Original Message----- >>From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] >>Sent: Wednesday, February 08, 2006 3:41 PM >>To: Chris Fields; bioperl-l at bioperl.org >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>hi chris, >>thanks, I have upgraded to version 1.5.1 but it isn't still >>working, do you have any ohter idea, the problem I have is >>that I have to parse a lot of textfiles.... >>or shall I look for another option to parse those files... >> >>regards >>Hubert >> >> >> >>Chris Fields wrote: >> >> >> >>>My guess is you're running into text parsing problems in >>>Bio::SearchIO::blast. Upgrade to the latest developer >>> >>> >>version (1.5.1) >> >> >>>or bioperl-live (CVS), then see the bug below. >>> >>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>> >>>I think the first problem you ran into is solved in bioperl >>> >>> >>1.5.1, the >> >> >>>last problem (more recent, not related to the first) has >>> >>> >>been fixed but >> >> >>>hasn't been committed to bioperl-live yet. The fixed >>> >>> >>SearchIO::blast >> >> >>>is available in the link above, but realize it hasn't been >>> >>> >>committed yet and may change. >> >> >>>Christopher Fields >>>Postdoctoral Researcher - Switzer Lab >>>Dept. 
of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>> >>> >>>>-----Original Message----- >>>>From: bioperl-l-bounces at lists.open-bio.org >>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>Prielinger >>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>To: bioperl-l at bioperl.org >>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>> >>>> >>parsing Blast >> >> >>>>output >>>> >>>>Hi, >>>>If I want to parse a Blast Output (Version 2.2.12) with >>>> >>>> >>Bio::SearchIO, >> >> >>>>I get the following error message: >>>> >>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>STACK Bio::SearchIO::blast::next_result >>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>STACK toplevel >>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>> >>>>is that a bug...... >>>> >>>>If I want to parse Blast Output (version 2.2.13), I don't get >>>>anything..... >>>>I'm using bioperl 1.4 >>>> >>>>before, I have installed bioperl 1.4, it worked fine parsing Blast >>>>Output (version 2.2.12), but I don't remember which bioperl >>>> >>>> >>version I >> >> >>>>had installed >>>> >>>>thanks in advance >>>> >>>>Hubert >>>> >>>> >>>> >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l at lists.open-bio.org >>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>> >>> >>> >>> > > > > From rahall2 at ualr.edu Wed Feb 8 18:34:45 2006 From: rahall2 at ualr.edu (Roger Hall) Date: Wed, 8 Feb 2006 17:34:45 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <43EA6F34.4090007@gmx.at> Message-ID: <000401c62d08$3ede6b70$4301a8c0@LIBERAL> Hubert, Give me a bit to look over your code and think this through. I am still re-familiarizing myself with the relevant modules, so I can't give an answer off the top of my head. 
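Two details in Hubert's script above may be worth checking; what follows is a minimal core-Perl sketch, not tested against his reports. First, readdir() returns bare file names (including '.' and '..'), so passing them straight to -file makes the open happen relative to the current working directory rather than $directory; the same applies to file tests such as -e and -f on bare names. Second, in the tr/// condition && binds more tightly than ||, so Perl reads it as "K >= 2, or (R >= 2 and W >= 2), or (K == 1, R == 1 and W >= 2)"; if a different grouping was intended, explicit parentheses are needed. The .bls extension and the helper names report_files and passes_filter are hypothetical, and the BioPerl parsing step itself is left out so the example runs with core modules only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# List the BLAST reports in $dir as full paths.
# readdir() returns bare names (including '.' and '..'), so each name is
# joined to $dir before the -f test and before it is used as -file.
sub report_files {
  my ($dir, $ext) = @_;
  opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
  my @names = grep { $_ ne '.' && $_ ne '..' && /\Q$ext\E$/ } readdir($dh);
  closedir($dh);
  my @paths = grep { -f $_ } map { File::Spec->catfile($dir, $_) } @names;
  return sort @paths;
}

# The hit_string filter from the script, with the grouping Perl applies
# written out explicitly: && binds more tightly than ||.
sub passes_filter {
  my ($s) = @_;
  my $k = ($s =~ tr/K//);   # tr/// with an empty replacement only counts
  my $r = ($s =~ tr/R//);
  my $w = ($s =~ tr/W//);
  my $keep = ($k >= 2)
          || ($r >= 2 && $w >= 2)
          || ($k == 1 && $r == 1 && $w >= 2);
  return $keep ? 1 : 0;
}

# Self-contained demo: a temporary directory with two reports and a stray file.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.bls b.bls notes.txt)) {
  my $path = File::Spec->catfile($dir, $name);
  open(my $fh, '>', $path) or die "Cannot create $path: $!";
  close($fh);
}

my @files = report_files($dir, '.bls');
print scalar(@files), " report files\n";               # 2 report files
print passes_filter('WWWKWRW') ? "keep\n" : "skip\n";  # keep
```

Each path returned by report_files() could then be handed to Bio::SearchIO as -file in place of the bare name, which keeps the parsing independent of the directory the script is started from.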
Also, please send me one or more of your blast reports (zipped) if you don't mind (and maybe avoid including the list in your reply). Let's take this "offline" relative to the list - we'll include the list again if there is a Bioperl issue and solution. (In case you are concerned at all, I promise not to share or study the actual BLAST results.) I'm not particularly familiar with the Fedora distributions, but I'm sure I can either chase down the perl problem or at least eliminate everything else but Fedora as the culprit. ;} (Chris - I'm not quite paying attention on an hourly basis yet, but I do intend to help support these issues for the foreseeable future. Thanks as always for the assist.) Thanks! Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock (501) 569-8074 -----Original Message----- From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] Sent: Wednesday, February 08, 2006 4:23 PM To: Chris Fields; bioperl-l at bioperl.org; rahall2 at ualr.edu Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output hi, I have installed from the following page: http://news.open-bio.org/archives/2005_10.html, the Core, Run and Ext. I'm using only the SearchIO without remoteblast module, because I have already all my Blast output files. My operating system is fedora core 9. 
Code: #!/usr/bin/perl -w use Bio::SearchIO; print "start program\n"; my $directory = "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4"; opendir(DIR, $directory) || die("Cannot open directory"); print "opened directory\n"; foreach my $file (readdir(DIR)) { print "read file\n"; my $search = new Bio::SearchIO (-format => 'blast', -file => $file); my $cutoff_len = 10; #iterate over each query sequence while (my $result = $search->next_result) { print "entered 1st while loop\n"; #iterate over each hit on the query sequence while (my $hit = $result->next_hit) { #iterate over each HSP in the hit while (my $hsp = $hit->next_hsp) { if ($hsp->length('sbjct') <= $cutoff_len) { #print $hsp->hit_string, "\n"; for ($hsp->hit_string) { if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) { # Print some tab-delimited data about this HSP open (bigShot, ">>BlastOutputTrial.txt") || die ("Could not open file. $!"); #print $result->query_name, "\t"; # print $hit->significance, "\t"; print bigShot $hit->name, "-->"; print bigShot $hit->description, "\n"; #print bigShot "Query: ", $hsp->start('query'), " ", $hsp->query_string, " ", $hsp->end('query'), "\n"; print bigShot "Seq: ", $hsp->start('hit'), " ", $hsp->hit_string, " ", $hsp->end('hit'), "\n"; # print $hsp->rank, "\t"; # print $hsp->percent_identity, "\t"; # print $hsp->evalue, "\t"; # print $hsp->hsp_length, "\n"; close (bigShot); }; } } } } } } closedir(DIR); Chris Fields wrote: >Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not >just the modules you want; mixing bioperl versions might work, but you might >run into interoperability problems). Then replace the Bio::SearchIO::blast >with the one in Bugzilla. The 'other option' you mentioned might be trying >XML instead of text, which is more stable in the long run. 
You will still >need to run a full upgrade to bioperl 1.5.1 for that; make sure you read >this: > >http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast > >If you're using SearchIO directly instead of Remoteblast, you should be able >to set the '-readmethod' flag to 'blastxml'. > >It also wouldn't hurt to know what OS you're using or see some code. Roger >is out there somewhere (I think) and may also have some input. > >Christopher Fields >Postdoctoral Researcher - Switzer Lab >Dept. of Biochemistry >University of Illinois Urbana-Champaign > > > >>-----Original Message----- >>From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] >>Sent: Wednesday, February 08, 2006 3:41 PM >>To: Chris Fields; bioperl-l at bioperl.org >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>hi chris, >>thanks, I have upgraded to version 1.5.1 but it isn't still >>working, do you have any ohter idea, the problem I have is >>that I have to parse a lot of textfiles.... >>or shall I look for another option to parse those files... >> >>regards >>Hubert >> >> >> >>Chris Fields wrote: >> >> >> >>>My guess is you're running into text parsing problems in >>>Bio::SearchIO::blast. Upgrade to the latest developer >>> >>> >>version (1.5.1) >> >> >>>or bioperl-live (CVS), then see the bug below. >>> >>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>> >>>I think the first problem you ran into is solved in bioperl >>> >>> >>1.5.1, the >> >> >>>last problem (more recent, not related to the first) has >>> >>> >>been fixed but >> >> >>>hasn't been committed to bioperl-live yet. The fixed >>> >>> >>SearchIO::blast >> >> >>>is available in the link above, but realize it hasn't been >>> >>> >>committed yet and may change. >> >> >>>Christopher Fields >>>Postdoctoral Researcher - Switzer Lab >>>Dept. 
of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>> >>> >>>>-----Original Message----- >>>>From: bioperl-l-bounces at lists.open-bio.org >>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>Prielinger >>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>To: bioperl-l at bioperl.org >>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>> >>>> >>parsing Blast >> >> >>>>output >>>> >>>>Hi, >>>>If I want to parse a Blast Output (Version 2.2.12) with >>>> >>>> >>Bio::SearchIO, >> >> >>>>I get the following error message: >>>> >>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>STACK Bio::SearchIO::blast::next_result >>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>STACK toplevel >>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>> >>>>is that a bug...... >>>> >>>>If I want to parse Blast Output (version 2.2.13), I don't get >>>>anything..... >>>>I'm using bioperl 1.4 >>>> >>>>before, I have installed bioperl 1.4, it worked fine parsing Blast >>>>Output (version 2.2.12), but I don't remember which bioperl >>>> >>>> >>version I >> >> >>>>had installed >>>> >>>>thanks in advance >>>> >>>>Hubert >>>> >>>> >>>> >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l at lists.open-bio.org >>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>> >>> >>> >>> > > > > From injunjoel at hotmail.com Wed Feb 8 19:54:26 2006 From: injunjoel at hotmail.com (Joel Steele) Date: Wed, 08 Feb 2006 16:54:26 -0800 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blastoutput In-Reply-To: <43EA6F34.4090007@gmx.at> Message-ID: Greetings, Im not well versed in Bio::SearchIO but there are a few comments about your code that may or may not be relevant... first thing: =-=-=-=-=code snippet=-=-=-=-= #!/usr/bin/perl -w use strict; #save yourself the headaches and force yourself to write clean code. 
=-=-=-=-=code snippet=-=-=-=-= next thing: when you are reading the files from the directory you are not doing any sort of filtering as to what is returned. If you are on a Unix flavored system you may be getting the '.' and '..' entries from your readdir(DIR) call. I would suggest placing a grep in there somewhere to get only blast files. something like: =-=-=-=-=code snippet=-=-=-=-= #assuming the file extension for blast files is .bls #the -e and -f are filetests; you could probably get away with just #-f. Here is a link for reference on the filetests available in Perl. # # http://www.perlmonks.org/?node_id=370 my @files_to_parse = grep{/\w+\.bls/ && -e && -f} readdir(DIR); closedir(DIR); #then proceed with your foreach but over @files_to_parse foreach my $file(@files_to_parse){ #do cool stuff here... } =-=-=-=-=code snippet=-=-=-=-= Hope that helps. -Joel Steele "The surest way to corrupt a youth is to instruct him to hold in higher regard those who think alike than those who think differently." -Nietzsche "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." 
-Galileo

>From: Hubert Prielinger
>To: Chris Fields , bioperl-l at bioperl.org,
>rahall2 at ualr.edu
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>Blastoutput
>Date: Wed, 08 Feb 2006 16:22:44 -0600
> >hi, >I have installed from the following page: >http://news.open-bio.org/archives/2005_10.html, the Core, Run and Ext. >I'm using only the SearchIO without remoteblast module, because I have >already all my Blast output files. >My operating system is fedora core 9. > >Code: > >#!/usr/bin/perl -w > >use Bio::SearchIO; > >print "start program\n"; >my $directory = >"/home/Hubert/installed/eclipse/workspace/Database_Search/result_4"; >opendir(DIR, $directory) || die("Cannot open directory"); >print "opened directory\n"; > >foreach my $file (readdir(DIR)) { >print "read file\n"; > >my $search = new Bio::SearchIO (-format => 'blast', > -file => $file); > >my $cutoff_len = 10; > > > >#iterate over each query sequence >while (my $result = $search->next_result) { >print "entered 1st while loop\n"; > > #iterate over each hit on the query sequence > while (my $hit = $result->next_hit) { > > #iterate over each HSP in the hit > while (my $hsp = $hit->next_hsp) { > > if ($hsp->length('sbjct') <= $cutoff_len) { > #print $hsp->hit_string, "\n"; > for ($hsp->hit_string) { > > > if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || >tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) { > > # Print some tab-delimited data about this HSP > > open (bigShot, ">>BlastOutputTrial.txt") || >die ("Could not open file.
$!"); > #print $result->query_name, "\t"; > ># print $hit->significance, "\t"; > print bigShot $hit->name, "-->"; > print bigShot $hit->description, "\n"; > #print bigShot "Query: ", >$hsp->start('query'), " ", $hsp->query_string, " ", >$hsp->end('query'), "\n"; > print bigShot "Seq: ", $hsp->start('hit'), >" ", $hsp->hit_string, " ", $hsp->end('hit'), "\n"; > ># print $hsp->rank, "\t"; ># print $hsp->percent_identity, "\t"; ># print $hsp->evalue, "\t"; ># print $hsp->hsp_length, "\n"; > > close (bigShot); > > }; > > > } > } > } > } >} > >} > >closedir(DIR); > > >Chris Fields wrote: > > >Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live >(not > >just the modules you want; mixing bioperl versions might work, but you >might > >run into interoperability problems). Then replace the >Bio::SearchIO::blast > >with the one in Bugzilla. The 'other option' you mentioned might be >trying > >XML instead of text, which is more stable in the long run. You will >still > >need to run a full upgrade to bioperl 1.5.1 for that; make sure you read > >this: > > > >http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast > > > >If you're using SearchIO directly instead of Remoteblast, you should be >able > >to set the '-readmethod' flag to 'blastxml'. > > > >It also wouldn't hurt to know what OS you're using or see some code. >Roger > >is out there somewhere (I think) and may also have some input. > > > >Christopher Fields > >Postdoctoral Researcher - Switzer Lab > >Dept. 
of Biochemistry > >University of Illinois Urbana-Champaign > > > > > > > >>-----Original Message----- > >>From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] > >>Sent: Wednesday, February 08, 2006 3:41 PM > >>To: Chris Fields; bioperl-l at bioperl.org > >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>parsing Blast output > >> > >>hi chris, > >>thanks, I have upgraded to version 1.5.1 but it isn't still > >>working, do you have any ohter idea, the problem I have is > >>that I have to parse a lot of textfiles.... > >>or shall I look for another option to parse those files... > >> > >>regards > >>Hubert > >> > >> > >> > >>Chris Fields wrote: > >> > >> > >> > >>>My guess is you're running into text parsing problems in > >>>Bio::SearchIO::blast. Upgrade to the latest developer > >>> > >>> > >>version (1.5.1) > >> > >> > >>>or bioperl-live (CVS), then see the bug below. > >>> > >>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>> > >>>I think the first problem you ran into is solved in bioperl > >>> > >>> > >>1.5.1, the > >> > >> > >>>last problem (more recent, not related to the first) has > >>> > >>> > >>been fixed but > >> > >> > >>>hasn't been committed to bioperl-live yet. The fixed > >>> > >>> > >>SearchIO::blast > >> > >> > >>>is available in the link above, but realize it hasn't been > >>> > >>> > >>committed yet and may change. > >> > >> > >>>Christopher Fields > >>>Postdoctoral Researcher - Switzer Lab > >>>Dept. 
of Biochemistry > >>>University of Illinois Urbana-Champaign > >>> > >>> > >>> > >>> > >>> > >>>>-----Original Message----- > >>>>From: bioperl-l-bounces at lists.open-bio.org > >>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>>>Prielinger > >>>>Sent: Wednesday, February 08, 2006 2:52 PM > >>>>To: bioperl-l at bioperl.org > >>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>>> > >>>> > >>parsing Blast > >> > >> > >>>>output > >>>> > >>>>Hi, > >>>>If I want to parse a Blast Output (Version 2.2.12) with > >>>> > >>>> > >>Bio::SearchIO, > >> > >> > >>>>I get the following error message: > >>>> > >>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>STACK Bio::SearchIO::blast::next_result > >>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>STACK toplevel > >>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >>>> > >>>>is that a bug...... > >>>> > >>>>If I want to parse Blast Output (version 2.2.13), I don't get > >>>>anything..... > >>>>I'm using bioperl 1.4 > >>>> > >>>>before, I have installed bioperl 1.4, it worked fine parsing Blast > >>>>Output (version 2.2.12), but I don't remember which bioperl > >>>> > >>>> > >>version I > >> > >> > >>>>had installed > >>>> > >>>>thanks in advance > >>>> > >>>>Hubert > >>>> > >>>> > >>>> > >>>>_______________________________________________ > >>>>Bioperl-l mailing list > >>>>Bioperl-l at lists.open-bio.org > >>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>>> > >>>> > >>> > >>> > >>> > >>> > > > > > > > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From saldroubi at yahoo.com Wed Feb 8 20:12:16 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Wed, 8 Feb 2006 17:12:16 -0800 (PST) Subject: [Bioperl-l] Documentation link? 
Message-ID: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com> All, Forgive me but I don't see the documentation link on the new website. I only see a link to the HOWTO's. I think I am looking for the Pdoc link. Thank you. Sincerely, Sam Al-Droubi, M.S. saldroubi at yahoo.com From saldroubi at yahoo.com Wed Feb 8 20:24:23 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Wed, 8 Feb 2006 17:24:23 -0800 (PST) Subject: [Bioperl-l] Count or weight matrix in bioperl? Message-ID: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com> All, Say I have an array of nucleotide sequences of of length N. I want to calculate the count matrix (weight matrix). That is for each position 1..N, I want to know how many As, Cs ,Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings? Please excuse my lack of knowledge as I am a new comer to bioinformatics. Thank you. Sincerely, Sam Al-Droubi, M.S. saldroubi at yahoo.com From osborne1 at optonline.net Wed Feb 8 20:44:56 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 08 Feb 2006 20:44:56 -0500 Subject: [Bioperl-l] Documentation link? In-Reply-To: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com> Message-ID: Sam, http://bioperl.open-bio.org/wiki/Main_Page Look for the API Docs under "main links". Brian O. On 2/8/06 8:12 PM, "Sam Al-Droubi" wrote: > All, > > Forgive me but I don't see the documentation link on the new website. I > only see a link to the HOWTO's. I think I am looking for the Pdoc link. > > Thank you. > > > > Sincerely, > Sam Al-Droubi, M.S. 
> saldroubi at yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Wed Feb 8 21:54:39 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 09 Feb 2006 13:54:39 +1100 Subject: [Bioperl-l] Count or weight matrix in bioperl? In-Reply-To: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com> References: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com> Message-ID: <43EAAEEF.3000304@infotech.monash.edu.au> > Say I have an array of nucleotide sequences of of length N. I want to calculate the count matrix (weight matrix). That is for each position 1..N, I want to know how many As, Cs ,Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings? > Please excuse my lack of knowledge as I am a new comer to bioinformatics. Use the Bio::Tools::SeqStats module. The PDoc documentation even has an example similar to what you want to do: http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html --Torsten Seemann From cjfields at uiuc.edu Thu Feb 9 00:07:15 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 8 Feb 2006 23:07:15 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blastoutput In-Reply-To: References: Message-ID: On Feb 8, 2006, at 6:54 PM, Joel Steele wrote: > Greetings, > Im not well versed in Bio::SearchIO but there are a few comments > about your > code that may or may not be relevant... > > first thing: > > =-=-=-=-=code snippet=-=-=-=-= > > #!/usr/bin/perl -w > use strict; #save yourself the headaches and force yourself to > write clean > code. > > =-=-=-=-=code snippet=-=-=-=-= > Tread very carefully here. Just about every book on perl suggests 'use strict' and adding warnings for code development (ex. 
the Camel, the Llama, and others); in fact, these are the very books most beginners start from. Some would consider NOT using -w or 'use strict' a bad habit; everybody has an opinion (I would repeat an oft-heard Texas saying, but I'll refrain). Just remember: try to be a little more constructive in your critique and insert a little less about your personal coding style. If you hit the wrong person, you might get flamed. Here's a link that may help a bit here: http://bioperl.org/Core/Latest/biodesign.html#respect_people_s_code__in_particular_if_it_works_ > next thing: > when you are reading the files from the directory you are not doing > any sort > of filtering as to what is returned. If you are on a Unix flavored > system > you may be getting the '.' and '..' entries from your readdir(DIR) > call. I > would suggest placing a grep in there somewhere to get only blast > files. > something like: > I agree here. You could probably also use something like File::Find here to make things a bit easier with the file names as well; works wonderfully, esp. when traversing a directory tree. > =-=-=-=-=code snippet=-=-=-=-= > > #assuming the file extension for blast files is .bls > #the -e and -f are filetests; you could probably get away with just > #-f. Here is a link for reference on the filetests available in Perl. > # > # http://www.perlmonks.org/?node_id=370 > > my @files_to_parse = grep{/\w+\.bls/ && -e && -f} readdir(DIR); > closedir(DIR); > > #then proceed with your foreach but over @files_to_parse > > foreach my $file(@files_to_parse){ > #do cool stuff here... > } > Again, agreed. But, does it really solve the main problem, which is an issue with SearchIO::blast? It seemed to try parsing a blast file... > =-=-=-=-=code snippet=-=-=-=-= > > Hope that helps. > -Joel Steele > > > "The surest way to corrupt a youth is to instruct him to hold in > higher > regard those who think alike than those who think differently."
- > Nietzsche > > "I do not feel obliged to believe that the same God who endowed us > with > sense, reason and intellect has intended us to forego their use." - > Galileo > > > > >> From: Hubert Prielinger >> To: Chris Fields , bioperl-l at bioperl.org, >> rahall2 at ualr.edu >> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing >> Blastoutput >> Date: Wed, 08 Feb 2006 16:22:44 -0600 >> >> hi, >> I have installed from the following page: >> http://news.open-bio.org/archives/2005_10.html, the Core, Run and >> Ext. >> I'm using only the SearchIO without remoteblast module, because I >> have >> already all my Blast output files. >> My operating system is fedora core 9. >> >> Code: >> >> #!/usr/bin/perl -w >> >> use Bio::SearchIO; >> >> print "start program\n"; >> my $directory = >> "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4"; >> opendir(DIR, $directory) || die("Cannot open directory"); >> print "opened directory\n"; >> >> foreach my $file (readdir(DIR)) { >> print "read file\n"; >> >> my $search = new Bio::SearchIO (-format => 'blast', >> -file => $file); >> >> my $cutoff_len = 10; >> >> >> >> #iterate over each query sequence >> while (my $result = $search->next_result) { >> print "entered 1st while loop\n"; >> >> #iterate over each hit on the query sequence >> while (my $hit = $result->next_hit) { >> >> #iterate over each HSP in the hit >> while (my $hsp = $hit->next_hsp) { >> >> if ($hsp->length('sbjct') <= $cutoff_len) { >> #print $hsp->hit_string, "\n"; >> for ($hsp->hit_string) { >> >> >> if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || >> tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) { >> >> # Print some tab-delimited data about this >> HSP >> >> open (bigShot, >> ">>BlastOutputTrial.txt") || >> die ("Could not open file.
$!"); >> #print $result->query_name, "\t"; >> >> # print $hit->significance, "\t"; >> print bigShot $hit->name, "-->"; >> print bigShot $hit->description, "\n"; >> #print bigShot "Query: ", >> $hsp->start('query'), " ", $hsp->query_string, " ", >> $hsp->end('query'), "\n"; >> print bigShot "Seq: ", $hsp->start >> ('hit'), >> " ", $hsp->hit_string, " ", $hsp->end('hit'), "\n"; >> >> # print $hsp->rank, "\t"; >> # print $hsp->percent_identity, "\t"; >> # print $hsp->evalue, "\t"; >> # print $hsp->hsp_length, "\n"; >> >> close (bigShot); >> >> }; >> >> >> } >> } >> } >> } >> } >> >> } >> >> closedir(DIR); >> >> >> Chris Fields wrote: >> >>> Make sure you ran a full installation of bioperl-1.5.1 or bioperl- >>> live >> (not >>> just the modules you want; mixing bioperl versions might work, >>> but you >> might >>> run into interoperability problems). Then replace the >> Bio::SearchIO::blast >>> with the one in Bugzilla. The 'other option' you mentioned might be >> trying >>> XML instead of text, which is more stable in the long run. You will >> still >>> need to run a full upgrade to bioperl 1.5.1 for that; make sure >>> you read >>> this: >>> >>> http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast >>> >>> If you're using SearchIO directly instead of Remoteblast, you >>> should be >> able >>> to set the '-readmethod' flag to 'blastxml'. >>> >>> It also wouldn't hurt to know what OS you're using or see some code. >> Roger >>> is out there somewhere (I think) and may also have some input. >>> >>> Christopher Fields >>> Postdoctoral Researcher - Switzer Lab >>> Dept. 
of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>>> -----Original Message----- >>>> From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] >>>> Sent: Wednesday, February 08, 2006 3:41 PM >>>> To: Chris Fields; bioperl-l at bioperl.org >>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>> parsing Blast output >>>> >>>> hi chris, >>>> thanks, I have upgraded to version 1.5.1 but it isn't still >>>> working, do you have any ohter idea, the problem I have is >>>> that I have to parse a lot of textfiles.... >>>> or shall I look for another option to parse those files... >>>> >>>> regards >>>> Hubert >>>> >>>> >>>> >>>> Chris Fields wrote: >>>> >>>> >>>> >>>>> My guess is you're running into text parsing problems in >>>>> Bio::SearchIO::blast. Upgrade to the latest developer >>>>> >>>>> >>>> version (1.5.1) >>>> >>>> >>>>> or bioperl-live (CVS), then see the bug below. >>>>> >>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>> >>>>> I think the first problem you ran into is solved in bioperl >>>>> >>>>> >>>> 1.5.1, the >>>> >>>> >>>>> last problem (more recent, not related to the first) has >>>>> >>>>> >>>> been fixed but >>>> >>>> >>>>> hasn't been committed to bioperl-live yet. The fixed >>>>> >>>>> >>>> SearchIO::blast >>>> >>>> >>>>> is available in the link above, but realize it hasn't been >>>>> >>>>> >>>> committed yet and may change. >>>> >>>> >>>>> Christopher Fields >>>>> Postdoctoral Researcher - Switzer Lab >>>>> Dept. 
of Biochemistry >>>>> University of Illinois Urbana-Champaign >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>> Prielinger >>>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>>> To: bioperl-l at bioperl.org >>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>> >>>>>> >>>> parsing Blast >>>> >>>> >>>>>> output >>>>>> >>>>>> Hi, >>>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>>> >>>>>> >>>> Bio::SearchIO, >>>> >>>> >>>>>> I get the following error message: >>>>>> >>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>> STACK Bio::SearchIO::blast::next_result >>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>> STACK toplevel >>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>> Blast.pl:21 >>>>>> >>>>>> is that a bug...... >>>>>> >>>>>> If I want to parse Blast Output (version 2.2.13), I don't get >>>>>> anything..... 
>>>>>> I'm using bioperl 1.4 >>>>>> >>>>>> before, I have installed bioperl 1.4, it worked fine parsing >>>>>> Blast >>>>>> Output (version 2.2.12), but I don't remember which bioperl >>>>>> >>>>>> >>>> version I >>>> >>>> >>>>>> had installed >>>>>> >>>>>> thanks in advance >>>>>> >>>>>> Hubert >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From golharam at umdnj.edu Wed Feb 8 23:46:43 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed, 08 Feb 2006 23:46:43 -0500 Subject: [Bioperl-l] Tool to mutate DNA sequence Message-ID: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Does anyone know of tool to mutate a DNA sequence by a specified amount? For instance, say I have a DNA sequence 1000 bases long, and I want to simulate mutations to make it 75% (or 80%, etc) similar to the original. Ryan From torsten.seemann at infotech.monash.edu.au Thu Feb 9 06:15:28 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 09 Feb 2006 22:15:28 +1100 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Message-ID: <43EB2450.6000606@infotech.monash.edu.au> Ryan, > Does anyone know of tool to mutate a DNA sequence by a specified amount? 
> For instance, say I have a DNA sequence 1000 bases long, and I want to > simulate mutations to make it 75% (or 80%, etc) similar to the original. The EMBOSS suite comes with a tool called "msbar" which can controllably mutate sequences: http://emboss.sourceforge.net/apps/msbar.html -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From cjfields at uiuc.edu Thu Feb 9 11:16:28 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Feb 2006 10:16:28 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu> Message-ID: <001b01c62d94$2e8bee50$15327e82@pyrimidine> > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at duke.edu] > Sent: Thursday, February 09, 2006 9:13 AM > To: Hubert Prielinger > Cc: Chris Fields; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast output > > On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > > hi chris, > > thanks, I have upgraded to version 1.5.1 but it isn't still > working, > > do you have any ohter idea, the problem I have is that I > have to parse > > a lot of textfiles.... > > or shall I look for another option to parse those files... > > > > regards > > Hubert > > > The code from Bioperl 1.5.1 works fine for me for blast > 2.2.13 reports but unless you post your blast report we can't > really determine the problem. > > If you are still getting the same error like this I am not > convinced you have upgraded to 1.5.1 which includes a fix in > the fact that NCBI changed the HSP result format to remove > the ':' from the Query/Sbjct prefixes. We fixed this as soon > as it was apparent sometime in September. 
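The "no data for midline Query 1 WWWKWRW 7" error quoted in this thread fits Jason's explanation: newer NCBI text reports print "Query 1 ..." where older ones printed "Query: 1 ...". The following is a minimal sketch of a line matcher that makes the colon optional. It makes no assumptions about how Bio::SearchIO::blast is actually written, and parse_alignment_line is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse one alignment line of a
# plain-text BLAST report. The colon after Query/Sbjct is optional, so both
# the older "Query: 1 ACGT 4" and the newer "Query 1 ACGT 4" layouts match.
# Returns (label, start, residues, end), or an empty list if the line is not
# an alignment line.
sub parse_alignment_line {
    my ($line) = @_;
    if ($line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/) {
        return ($1, $2, $3, $4);
    }
    return;
}

for my $line ("Query: 1 WWWKWRW 7", "Query 1 WWWKWRW 7", "Sbjct 12 WWWKWRW 18") {
    my ($label, $start, $residues, $end) = parse_alignment_line($line);
    print "$label $start-$end $residues\n";
}
```

Gapped alignment strings such as "AC-GT" still match because the residue field is any run of non-space characters.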
> > >>> MSG: no data for midline Query 1 WWWKWRW 7 > >>> STACK Bio::SearchIO::blast::next_result > >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>> STACK toplevel > >>> > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > > If you are just getting no results but also no warnings wrt > parsing, are you sure your logic is correct? > > If you remove your filters do you see all the HSPS? > > > while (my $result = $search->next_result) { > print $result->query_name, "\n"; > #iterate over each hit on the query sequence > while (my $hit = $result->next_hit) { > print $hit->name, "\n"; > #iterate over each HSP in the hit > while (my $hsp = $hit->next_hsp) { > print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- > >hit_string, "\n"; > } > } > } I tested some of the BLAST results that Hubert sent Roger and me with a similar script to the above. I removed the file parsing logic and it seemed to work just fine. It may very well be a logic issue or that he hasn't installed the latest fix. It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even though the returned output was from nr, the top of the blast output showed that it was v2.2.12: BLASTP 2.2.12 [Aug-07-2005] I double-checked my local version and it's definitely v.2.2.13: ------------------------------------- C:\Perl\Scripts>blastcl3 - blastcl3 2.2.13 arguments:... ------------------------------------- If you use RemoteBlast using the same settings, the version in the header looks like this: BLASTP 2.2.13 [Nov-27-2005] I'm wondering if all the blast executables (blast and netblast) from NCBI have text output like v.2.2.12, while the wwwblast outputs a new format (2.2.13). I'll ask blast-help at NCBI about this. 
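You can check which header a report file actually carries before passing it to a parser. For example, a blastcl3 2.2.13 run can still print "BLASTP 2.2.12 [Aug-07-2005]". The rough sketch below reads that header line. The helper name blast_header_version and its regex are assumptions for illustration only. The regex covers just the usual "BLASTN/BLASTP/BLASTX/TBLASTN/TBLASTX version [date]" header.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a plain-text BLAST report for its header line,
# e.g. "BLASTP 2.2.12 [Aug-07-2005]", and return (program, version, date).
# The date is undef if the header has no bracketed date.
# Returns an empty list if no header line is found.
sub blast_header_version {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        if ($line =~ /^(T?BLAST[NPX]?)\s+(\d+(?:\.\d+)+)\s*(?:\[([^\]]+)\])?/) {
            return ($1, $2, $3);
        }
    }
    return;
}

# Usage with an in-memory report; in practice, open the report file instead.
my $report = "BLASTP 2.2.13 [Nov-27-2005]\n\nReference: ...\n";
open my $fh, '<', \$report or die "Cannot open in-memory report: $!";
my ($program, $version, $date) = blast_header_version($fh);
print "$program $version ($date)\n";
close $fh;
```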
> > To clarify some stuff - > Chris I don't necessarily think the XML is best way forward > for BLAST reports generated locally, it isn't as detailed as > the Text format and it is what most people expect to be able > to scroll through and parse -- it is also harder for the > format to change dramatically if you have a static binary on > your machine =). I think for remoteblast the XML format > should be the way forward but I expect Bioperl to maintain > support of any plain text BLAST report format that people use > on a regular basis. > Does XML lack some specific info that text output has? Didn't know that. I believe that XML should be default in RemoteBlast since it will not break, but I agree with you about text output. I also agree that it will need somebody to maintain it constantly, much like RemoteBlast. > -jason > > > > > > Chris Fields wrote: > > > >> My guess is you're running into text parsing problems in > >> Bio::SearchIO::blast. Upgrade to the latest developer version > >> (1.5.1) or > >> bioperl-live (CVS), then see the bug below. > >> > >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >> > >> I think the first problem you ran into is solved in bioperl 1.5.1, > >> the last problem (more recent, not related to the first) has been > >> fixed but hasn't been committed to bioperl-live yet. The fixed > >> SearchIO::blast is available in the link above, but > realize it hasn't > >> been committed yet and may change. > >> > >> Christopher Fields > >> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org > >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>> Prielinger > >>> Sent: Wednesday, February 08, 2006 2:52 PM > >>> To: bioperl-l at bioperl.org > >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast > >>> output > >>> > >>> Hi, > >>> If I want to parse a Blast Output (Version 2.2.12) with > >>> Bio::SearchIO, I get the following error message: > >>> > >>> MSG: no data for midline Query 1 WWWKWRW 7 > >>> STACK Bio::SearchIO::blast::next_result > >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>> STACK toplevel > >>> > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >>> > >>> is that a bug...... > >>> > >>> If I want to parse Blast Output (version 2.2.13), I don't get > >>> anything..... > >>> I'm using bioperl 1.4 > >>> > >>> before, I have installed bioperl 1.4, it worked fine > parsing Blast > >>> Output (version 2.2.12), but I don't remember which > bioperl version > >>> I had installed > >>> > >>> thanks in advance > >>> > >>> Hubert > >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Feb 9 12:53:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Feb 2006 11:53:24 -0600 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <200602080853.58889.heikki@sanbi.ac.za> Message-ID: <000001c62da1$ba346ba0$15327e82@pyrimidine> Heikki, I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and two test data files to bugzilla. The first data file is needed for normal tests, the second is for testing parsing with modified data in the score tag (using sprintf() in the RNAMotif descriptor). I ran 'perl t\RNAMotif.t' and they all passed. Thanks! Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Heikki Lehvaslaiho > Sent: Wednesday, February 08, 2006 12:54 AM > To: bioperl-l at lists.open-bio.org > Cc: Chris Fields > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > Chris, > > Post your files to bugzilla (ticket type enhancement, add > files to ticket after creation) and someone with commit > ability will add them to CVS once the code is in satisfactory > condition. > > Thanks, > > -Heikki > > On Wednesday 08 February 2006 06:52, Chris Fields wrote: > > I want to submit a module for parsing RNAMotif output > > (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning > > output and returning Bio::SeqFeature::Generic objects with > added tags > > for descriptors/sequences/file info. I'm in the process of > writing up > > tests and going through biodesign to make sure everything's kosher, > > but the module itself is essentially ready-to-go. What should I do > > next? > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. 
Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu Feb 9 10:13:09 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 9 Feb 2006 10:13:09 -0500 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <43EA6570.9070909@gmx.at> References: <001101c62cfd$28605df0$15327e82@pyrimidine> <43EA6570.9070909@gmx.at> Message-ID: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > hi chris, > thanks, I have upgraded to version 1.5.1 but it isn't still > working, do > you have any ohter idea, the problem I have is that I have to parse a > lot of textfiles.... > or shall I look for another option to parse those files... > > regards > Hubert The code from Bioperl 1.5.1 works fine for me for blast 2.2.13 reports but unless you post your blast report we can't really determine the problem. If you are still getting the same error like this I am not convinced you have upgraded to 1.5.1 which includes a fix in the fact that NCBI changed the HSP result format to remove the ':' from the Query/Sbjct prefixes. 
We fixed this as soon as it was apparent sometime in September. >>> MSG: no data for midline Query 1 WWWKWRW 7 >>> STACK Bio::SearchIO::blast::next_result >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>> STACK toplevel >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 If you are just getting no results but also no warnings wrt parsing, are you sure your logic is correct? If you remove your filters, do you see all the HSPs?
while (my $result = $search->next_result) {
    print $result->query_name, "\n";
    #iterate over each hit on the query sequence
    while (my $hit = $result->next_hit) {
        print $hit->name, "\n";
        #iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
            print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
        }
    }
}
To clarify some stuff - Chris, I don't necessarily think XML is the best way forward for BLAST reports generated locally; it isn't as detailed as the Text format, and text is what most people expect to be able to scroll through and parse -- it is also harder for the format to change dramatically if you have a static binary on your machine =). I think for remoteblast the XML format should be the way forward, but I expect Bioperl to maintain support of any plain text BLAST report format that people use on a regular basis. -jason > > > Chris Fields wrote: > >> My guess is you're running into text parsing problems in >> Bio::SearchIO::blast. Upgrade to the latest developer version >> (1.5.1) or >> bioperl-live (CVS), then see the bug below. >> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >> >> I think the first problem you ran into is solved in bioperl 1.5.1, >> the last >> problem (more recent, not related to the first) has been fixed but >> hasn't >> been committed to bioperl-live yet. The fixed SearchIO::blast is >> available >> in the link above, but realize it hasn't been committed yet and >> may change.
>> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Hubert Prielinger >>> Sent: Wednesday, February 08, 2006 2:52 PM >>> To: bioperl-l at bioperl.org >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast output >>> >>> Hi, >>> If I want to parse a Blast Output (Version 2.2.12) with >>> Bio::SearchIO, I get the following error message: >>> >>> MSG: no data for midline Query 1 WWWKWRW 7 >>> STACK Bio::SearchIO::blast::next_result >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>> STACK toplevel >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>> is that a bug...... >>> >>> If I want to parse Blast Output (version 2.2.13), I don't get >>> anything..... >>> I'm using bioperl 1.4 >>> >>> before, I have installed bioperl 1.4, it worked fine parsing >>> Blast Output (version 2.2.12), but I don't remember which >>> bioperl version I had installed >>> >>> thanks in advance >>> >>> Hubert >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From barry.m.dancis at gsk.com Wed Feb 8 16:44:55 2006 From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com) Date: Wed, 8 Feb 2006 16:44:55 -0500 Subject: [Bioperl-l] Handling miRNA's In-Reply-To: <007701c62c37$7914af60$15327e82@pyrimidine> Message-ID: Hi Chris-- The problem I am solving is given a mature miRna name, how do I use it to search for its pre/pri 
miRna and vice versa. For example, how to go from mir-102a* to hsa-mir-102a-1*. Yes, I can write a parser for it, but I'm hoping that someone else has already done it and has some bells and whistles to go with it. Below is a hierarchy chart of a data structure to hold the naming information. The parsing is not trivial and given data in that structure there could be all kinds of neat functions that return various aspects of the names. Barry "Chris Fields" Sent by: bioperl-l-bounces at lists.open-bio.org 07-Feb-2006 17:40 To barry.m.dancis at gsk.com, "'bioperl-l'" cc Subject Re: [Bioperl-l] Handling miRNA's Are you talking about sequences or text output from a specific program? If you are talking about sequences in a particular format, then listen to Brian. If you are talking about output, then we need to know which program you're using, as a parser may exist or could be built. There are a few modules in Bio::Tools that handle RNA (like QRNA, tRNAscan-SE), so check those out first. I'm currently finishing up a Bio::Tools module for RNAMotif and have plans for making an ERPIN parser. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > barry.m.dancis at gsk.com > Sent: Tuesday, February 07, 2006 2:26 PM > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org > Subject: Re: [Bioperl-l] Handling miRNA's > > It's the parser in particular that I need > > > > > "Brian Osborne" Sent by: > bioperl-l-bounces at lists.open-bio.org > 07-Feb-2006 12:05 > > To > barry.m.dancis at gsk.com, "bioperl-l" , > bioperl-l-bounces at lists.open-bio.org > cc > > Subject > Re: [Bioperl-l] Handling miRNA's > > > > > > > Barry, > > If the sequence information is in one of the formats that > Bioperl understands (Genbank, Swissprot flat, and so on) then > the answer is yes. 
> This assumes that the details on sequence that you mentioned > are found in some sequence feature section in the file. But > it looks to me like there's no specialized parser for miRNA > sequence per se, I'll be corrected if I'm wrong. > > Brian O. > > > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" > wrote: > > > Hi -- > > > > Are there any classes for manipulating miRNA's with > functions > such > > as parsing the name, storing and interlinking pri/pre/mat sequences, > etc? > > > > Thanks, > > > > Barry > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 8775 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060208/7f5bee48/attachment-0001.gif From pmr at ebi.ac.uk Thu Feb 9 03:25:24 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 9 Feb 2006 08:25:24 -0000 (GMT) Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Message-ID: <2714.86.132.216.50.1139473524.squirrel@webmail.ebi.ac.uk> Ryan Golhar writes: > Does anyone know of tool to mutate a DNA sequence by a specified amount? 
> For instance, say I have a DNA sequence 1000 bases long, and I want to > simulate mutations to make it 75% (or 80%, etc) similar to the original. EMBOSS has the msbar program ("mutate sequence beyond all recognition") which allows you to select the number and type of changes. With some tuning of options to match the sequence length, you should be able to get results that match whatever your definition of 75% similar might be (amazing how much more similarity you can get by adding gaps in an alignment :-) If you can specify a clear and generally useful way to define what you need, we could of course add a "percent change" option to the msbar program for a future release. Hope that helps, Peter From sofia at neuro.utah.edu Thu Feb 9 13:00:05 2006 From: sofia at neuro.utah.edu (Sofia Robb) Date: Thu, 09 Feb 2006 11:00:05 -0700 Subject: [Bioperl-l] Bio::Assembly::IO::phrap and Bio::Assembly::IO::ace with large files Message-ID: <43EB8325.6050501@neuro.utah.edu> I am having trouble parsing large (2030 contigs) phrap.out and ace.1 files. I have no problem with small files (1 contig). Here are the errors I get when I try the code at the end of my email. My script fails on this line: my $assembly = $in->next_assembly; I think it may have something to do with BTREE in Collection.pm, but I have been unable to correct my errors. ------- file with 2030 contigs Bio::Assembly::IO::ace Can't call method "get_dup" on an undefined value at /Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 359, line 17699. line 17699 of my ace file is the last line of the record for Contig253 ------ file with 2030 contigs Bio::Assembly::IO::phrap Can't call method "put" on an undefined value at /Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 225, line 39839.
Line 39839 of my phrap.out file is the first line of the record for Contig253
------
use Bio::Assembly::IO;

my $filename = $ARGV[0];
my $in = Bio::Assembly::IO->new(-file   => "$filename",
                                -format => "phrap"  # or -format => "ace" for ace.1 files
                               );
my $assembly = $in->next_assembly;
my @contigs  = $assembly->all_contigs();
foreach my $contig ($assembly->all_contigs) {
    my $id = $contig->id();
    print "contig id = $id ";
    my $seqObj = $contig->get_consensus_sequence();
    my $seq    = $seqObj->seq();
    print "is $seq\n";
}
my $id = $assembly->id();
print "$id\n";
-----
Thanks for any input, Sofia Sofia Robb Molecular Biology Ph.D Program Sanchez Laboratory Department of Neurobiology and Anatomy University of Utah http://planaria.neuro.utah.edu From hubert.prielinger at gmx.at Thu Feb 9 12:32:39 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 09 Feb 2006 11:32:39 -0600 Subject: [Bioperl-l] zip file In-Reply-To: References: <43EA75FF.7010504@gmx.at> Message-ID: <43EB7CB7.7040602@gmx.at> Hi Chris, It doesn't work with the simple input line either. I have tried my script on the command line with the file scanning part and it is working, but it takes more than 10 minutes (!) to read one file and it doesn't create the output file, so there is no output. Before, I ran the script in the Eclipse IDE. I'm trying to upgrade to bioperl 1.5.1 once more; hopefully that's the problem. I have installed the core, run and ext packages from bioperl.org... The output as you got it is just fine, but I still need the script with the file scanning part, because I have a lot of files. To Roger: I have tried it with different files, but always with the same result: it reads the files, but takes a very long time and produces no output file. Hubert Chris Fields wrote: > Hubert, > > I tried this script out and it managed to parse your reports. I > removed the file scanning and replaced it with a simple arg line > input (i.e. script.pl blast_file).
I attached one of the output files.
>
> Chris
>
> #!perl
>
> $file = shift @ARGV;
>
> use Bio::SearchIO;
> my $cutoff_len = 10;
> my $searchio = Bio::SearchIO->new( -format => 'blast',
>                                    -file   => $file );
> while ( my $result = $searchio->next_result() ) {
>   while( my $hit = $result->next_hit ) {
>     while(my $hsp = $hit->next_hsp) {
>       if ($hsp->length('sbjct') <= $cutoff_len) {
>         for ($hsp->hit_string) {
>           if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>               tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>             #Print some tab-delimited data about this HSP
>             open (bigShot, ">>BlastOutputTrial.txt") ||
>               die ("Could not open file. $!");
>             #print $result->query_name, "\t";
>             #print $hit->significance, "\t";
>             print bigShot $hit->name, "-->";
>             print bigShot $hit->description, "\n";
>             print bigShot "Query: ",
>               $hsp->start('query'), " ", $hsp->query_string, " ",
>               $hsp->end('query'), "\n";
>             print bigShot "Seq: ", $hsp->start('hit'),
>               " ", $hsp->hit_string, " ",
>               $hsp->end('hit'), "\n";
>             # print $hsp->rank, "\t";
>             # print $hsp->percent_identity, "\t";
>             # print $hsp->evalue, "\t";
>             # print $hsp->hsp_length, "\n";
>
>             close (bigShot);
>           };
>         }
>       }
>     }
>   }
> }
>
>------------------------------------------------------------------------
From heikki at sanbi.ac.za Thu Feb 9 09:54:30 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 9 Feb 2006 16:54:30 +0200 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Message-ID: <200602091654.30890.heikki@sanbi.ac.za> Ryan, I should have made this very clear in my first reply: You have to plan very carefully what rules you use when you mutate your sequence, because they will directly affect the resulting sequences. Of course, all that depends on what you will be using the sequences for.
If you are going to draw evolutionary conclusions from those sequences, you must mutate them in a way that simulates evolutionary principles. My earlier pseudocode, for example, should allow mutations at every location, including ones that have already been changed: mutations do occur multiple times in the same places as sequences get saturated by mutations. Also, you should decide the relative occurrence of transversions versus transitions. Then there are indels; do you want to take those into account? Also, check the EMBOSS program 'msbar'. You did not ask this, but... I remember that during the early days of Celera, one of the tools that enabled them to estimate the feasibility of whole-genome shotgun sequence assembly was a very complete program to 'synthesize' in silico the whole complexity of the human genome. I have no idea if that program is generally available now. Yours, -Heikki On Thursday 09 February 2006 06:46, Ryan Golhar wrote: > Does anyone know of tool to mutate a DNA sequence by a specified amount? > For instance, say I have a DNA sequence 1000 bases long, and I want to > simulate mutations to make it 75% (or 80%, etc) similar to the original.
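Heikki's points about repeated hits at the same site and about the transition/transversion balance can be illustrated with a minimal plain-Perl sketch. It assumes a simple model of my own choosing, not anything from EMBOSS or BioPerl: each of `$n` substitutions picks a site at random (so a site can be hit more than once, and may even revert), and `$ts_prob` is the probability that a substitution is a transition (A<->G, C<->T) rather than one of the two transversions. `substitute()`, `identity()` and `$ts_prob` are hypothetical names, and indels are not modelled.

```perl
use strict;
use warnings;

# Transition partner of each base (purine<->purine, pyrimidine<->pyrimidine).
my %transition = ( A => 'G', G => 'A', C => 'T', T => 'C' );

# The two possible transversions for each base.
my %transversions = (
    A => [qw(C T)], G => [qw(C T)],
    C => [qw(A G)], T => [qw(A G)],
);

# Apply $n random substitutions to $seq. Sites are drawn with replacement,
# so the same position can be hit repeatedly. Bases other than A/C/G/T
# (e.g. N) are left unchanged.
sub substitute {
    my ( $seq, $n, $ts_prob ) = @_;
    my $len = length $seq;
    for ( 1 .. $n ) {
        my $pos = int rand $len;
        my $old = uc substr( $seq, $pos, 1 );
        next unless exists $transition{$old};
        my $new = rand() < $ts_prob
            ? $transition{$old}                     # transition
            : $transversions{$old}[ int rand 2 ];   # one of two transversions
        substr( $seq, $pos, 1 ) = $new;
    }
    return $seq;
}

# Fraction of positions that are identical in two equal-length strings.
sub identity {
    my ( $x, $y ) = @_;
    my $same = grep { substr( $x, $_, 1 ) eq substr( $y, $_, 1 ) } 0 .. length($x) - 1;
    return $same / length $x;
}

my @nt   = qw(A C G T);
my $orig = join '', map { $nt[ int rand 4 ] } 1 .. 1000;
my $mut  = substitute( $orig, 250, 2 / 3 );
printf "identity after 250 substitutions: %.3f\n", identity( $orig, $mut );
```

Because sites can be hit more than once, 250 substitutions on 1000 bp leave the copy at least 75% identical and typically somewhat more, which is the saturation effect described above; reaching an exact target identity under this model needs more substitutions than the naive count.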
> > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Thu Feb 9 06:31:20 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 9 Feb 2006 13:31:20 +0200 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Message-ID: <200602091331.21690.heikki@sanbi.ac.za>
Ryan,

Instructions in pseudo code:

  take the sequence string out of the object
  use a hash to store changed locations
  repeat
      pick a location in the string randomly
      if the location is not in the hash (i.e. not changed already),
          change it into something else
          add the changed location into the hash
      if enough locations have been changed (scalar keys hash), exit loop
  put the sequence string back into the seq object

-Heikki
On Thursday 09 February 2006 06:46, Ryan Golhar wrote: > Does anyone know of tool to mutate a DNA sequence by a specified amount? > For instance, say I have a DNA sequence 1000 bases long, and I want to > simulate mutations to make it 75% (or 80%, etc) similar to the original.
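The pseudocode above translates almost line for line into plain Perl. This is a minimal sketch, not a BioPerl method: `mutate_to_identity()` is a hypothetical helper that works on a plain string. With a BioPerl sequence object you would take the string out with `$seq->seq` and put the result back with `$seq->seq($new)`. Because each position is changed at most once, and always to a different base, the identity to the original comes out exact.

```perl
use strict;
use warnings;

# Mutate $seq so that round(len * (1 - $identity)) distinct positions differ
# from the original; each position is changed at most once.
sub mutate_to_identity {
    my ( $seq, $identity ) = @_;
    die "identity must be between 0 and 1\n" if $identity < 0 || $identity > 1;
    my $len       = length $seq;
    my $n_changes = int( $len * ( 1 - $identity ) + 0.5 );   # round to nearest
    my %changed;                          # positions already mutated
    my @bases = qw(A C G T);

    while ( scalar( keys %changed ) < $n_changes ) {
        my $pos = int rand $len;          # pick a location at random
        next if $changed{$pos};           # already changed: pick another
        my $old = uc substr( $seq, $pos, 1 );
        my @alt = grep { $_ ne $old } @bases;    # any base except the current one
        substr( $seq, $pos, 1 ) = $alt[ int rand @alt ];
        $changed{$pos} = 1;               # remember the changed location
    }
    return $seq;
}

my @nt   = qw(A C G T);
my $orig = join '', map { $nt[ int rand 4 ] } 1 .. 1000;
my $mut  = mutate_to_identity( $orig, 0.75 );
my $same = grep { substr( $orig, $_, 1 ) eq substr( $mut, $_, 1 ) } 0 .. 999;
print "identical positions: $same of 1000\n";
```

As Heikki's later message in this thread points out, this simple model deliberately ignores repeated hits at the same site, transition/transversion bias and indels, so it suits "N% identical" test data better than evolutionary simulation.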
> > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From jason.stajich at duke.edu Thu Feb 9 14:10:54 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 9 Feb 2006 14:10:54 -0500 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Message-ID: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu> Depending on whether or not you want to use evolutionarily realistic models...
* evolver, which comes with PAML, lets you evolve sequences on a tree
* SeqGen from Andrew Rambaut http://evolve.zoo.ox.ac.uk/software.html?id=seqgen also lets you do this
I believe there are PISE interfaces to both of these at the Pasteur bioweb site - http://bioweb.pasteur.fr/ -jason On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote: > Does anyone know of tool to mutate a DNA sequence by a specified > amount? > For instance, say I have a DNA sequence 1000 bases long, and I want to > simulate mutations to make it 75% (or 80%, etc) similar to the > original.
> > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From heikki at sanbi.ac.za Thu Feb 9 14:41:33 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 9 Feb 2006 21:41:33 +0200 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <000001c62da1$ba346ba0$15327e82@pyrimidine> References: <000001c62da1$ba346ba0$15327e82@pyrimidine> Message-ID: <200602092141.34401.heikki@sanbi.ac.za> Chris, I committed your file. All tests pass; code looks like written by a long term bioperl contributor! Impressive. I truncated the larger test file from 270K to 20K (200 lines), to not bloat the distribution unnecessarily. Tests pass which is the main thing. Shout if if you disagree. Great job! -Heikki On Thursday 09 February 2006 19:53, Chris Fields wrote: > Heikki, > > I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and > two test data files to bugzilla. The first data file is needed for normal > tests, the second is for testing parsing with modified data in the score > tag (using sprintf() in the RNAMotif descriptor). I ran 'perl > t\RNAMotif.t' and they all passed. > > Thanks! > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept.
of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > Heikki Lehvaslaiho > > Sent: Wednesday, February 08, 2006 12:54 AM > > To: bioperl-l at lists.open-bio.org > > Cc: Chris Fields > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > > > Chris, > > > > Post your files to bugzilla (ticket type enhancement, add > > files to ticket after creation) and someone with commit > > ability will add them to CVS once the code is in satisfactory > > condition. > > > > Thanks, > > > > -Heikki > > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote: > > > I want to submit a module for parsing RNAMotif output > > > (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning > > > output and returning Bio::SeqFeature::Generic objects with > > > > added tags > > > > > for descriptors/sequences/file info. I'm in the process of > > > > writing up > > > > > tests and going through biodesign to make sure everything's kosher, > > > but the module itself is essentially ready-to-go. What should I do > > > next? > > > > > > Christopher Fields > > > Postdoctoral Researcher > > > Lab of Dr. 
Robert Switzer > > > Dept of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From hubert.prielinger at gmx.at Thu Feb 9 15:13:31 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 09 Feb 2006 14:13:31 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL> References: <004301c62db4$c9bcbab0$d416a790@LIBERAL> Message-ID: <43EBA26B.4010907@gmx.at> dear roger, this error message I got, when I tried to parse Blast output (version 2.2.12) with bioperl 1.5.1, but it doesn't matter, 
because I have a lot of Blast output files with version 2.2.13 and for that I don't get any error message.....it just doesn't work Hubert Roger Hall wrote: >Guys - I'm looking at the error message: > >MSG: no data for midline Query 1 WWWKWRW 7 >STACK Bio::SearchIO::blast::next_result >/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >STACK toplevel >/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >This is my line of thought: >1. "no data for midline $_" is a unique message generated by blast.pm in one >location only at the point of a. reading three lines b. dropping lines with >spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3) >2. There is a regexp match that fails in order to reach that error message >3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression >4. It does anyway >5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast >reports > >I suspect a newline/chomp/metacharacter issue. Not finding the string >anywhere has me thoroughly confused - I asked Hubert for the additional >file, assuming that I didn't have it. > >My next thought is to write a quick script to test perl behavior on "Fedora >Core 9". > >Thoughts? > >Did I misread the issue entirely? 
:} > >Roger > > >-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields >Sent: Thursday, February 09, 2006 10:16 AM >To: 'Jason Stajich'; 'Hubert Prielinger' >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast >output > > > > >>-----Original Message----- >>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>Sent: Thursday, February 09, 2006 9:13 AM >>To: Hubert Prielinger >>Cc: Chris Fields; bioperl-l at bioperl.org >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >> >>>hi chris, >>>thanks, I have upgraded to version 1.5.1 but it isn't still >>> >>> >>working, >> >> >>>do you have any ohter idea, the problem I have is that I >>> >>> >>have to parse >> >> >>>a lot of textfiles.... >>>or shall I look for another option to parse those files... >>> >>>regards >>>Hubert >>> >>> >>The code from Bioperl 1.5.1 works fine for me for blast >>2.2.13 reports but unless you post your blast report we can't >>really determine the problem. >> >>If you are still getting the same error like this I am not >>convinced you have upgraded to 1.5.1 which includes a fix in >>the fact that NCBI changed the HSP result format to remove >>the ':' from the Query/Sbjct prefixes. We fixed this as soon >>as it was apparent sometime in September. >> >> >> >>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>STACK Bio::SearchIO::blast::next_result >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>STACK toplevel >>>>> >>>>> >>>>> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >>If you are just getting no results but also no warnings wrt >>parsing, are you sure your logic is correct? >> >>If you remove your filters do you see all the HSPS? 
>>
>>
>>while (my $result = $search->next_result) {
>>  print $result->query_name, "\n";
>>  #iterate over each hit on the query sequence
>>  while (my $hit = $result->next_hit) {
>>    print $hit->name, "\n";
>>    #iterate over each HSP in the hit
>>    while (my $hsp = $hit->next_hsp) {
>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
>>    }
>>  }
>>}
>>
>>
>
>I tested some of the BLAST results that Hubert sent Roger and me with a >similar script to the above. I removed the file parsing logic and it seemed >to work just fine. It may very well be a logic issue or that he hasn't >installed the latest fix. > >It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even >though the returned output was from nr, the top of the blast output showed >that it was v2.2.12: > >BLASTP 2.2.12 [Aug-07-2005] > >I double-checked my local version and it's definitely v.2.2.13: >------------------------------------- >C:\Perl\Scripts>blastcl3 - > >blastcl3 2.2.13 arguments:... >------------------------------------- > >If you use RemoteBlast using the same settings, the version in the header >looks like this: > >BLASTP 2.2.13 [Nov-27-2005] > >I'm wondering if all the blast executables (blast and netblast) from NCBI >have text output like v.2.2.12, while the wwwblast outputs a new format >(2.2.13). I'll ask blast-help at NCBI about this. > > > >>To clarify some stuff - >>Chris I don't necessarily think the XML is best way forward >>for BLAST reports generated locally, it isn't as detailed as >>the Text format and it is what most people expect to be able >>to scroll through and parse -- it is also harder for the >>format to change dramatically if you have a static binary on >>your machine =). I think for remoteblast the XML format >>should be the way forward but I expect Bioperl to maintain >>support of any plain text BLAST report format that people use >>on a regular basis. >> >> >> > >Does XML lack some specific info that text output has?
Didn't know that. I >believe that XML should be default in RemoteBlast since it will not break, >but I agree with you about text output. I also agree that it will need >somebody to maintain it constantly, much like RemoteBlast. > > > >>-jason >> >> >>>Chris Fields wrote: >>> >>> >>> >>>>My guess is you're running into text parsing problems in >>>>Bio::SearchIO::blast. Upgrade to the latest developer version >>>>(1.5.1) or >>>>bioperl-live (CVS), then see the bug below. >>>> >>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>> >>>>I think the first problem you ran into is solved in bioperl 1.5.1, >>>>the last problem (more recent, not related to the first) has been >>>>fixed but hasn't been committed to bioperl-live yet. The fixed >>>>SearchIO::blast is available in the link above, but >>>> >>>> >>realize it hasn't >> >> >>>>been committed yet and may change. >>>> >>>>Christopher Fields >>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry >>>>University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> >>>> >>>>>-----Original Message----- >>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>Prielinger >>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>To: bioperl-l at bioperl.org >>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>> >>>>> >>parsing Blast >> >> >>>>>output >>>>> >>>>>Hi, >>>>>If I want to parse a Blast Output (Version 2.2.12) with >>>>>Bio::SearchIO, I get the following error message: >>>>> >>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>STACK Bio::SearchIO::blast::next_result >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>STACK toplevel >>>>> >>>>> >>>>> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >> >>>>>is that a bug...... >>>>> >>>>>If I want to parse Blast Output (version 2.2.13), I don't get >>>>>anything..... 
>>>>>I'm using bioperl 1.4 >>>>> >>>>>before, I have installed bioperl 1.4, it worked fine >>>>> >>>>> >>parsing Blast >> >> >>>>>Output (version 2.2.12), but I don't remember which >>>>> >>>>> >>bioperl version >> >> >>>>>I had installed >>>>> >>>>>thanks in advance >>>>> >>>>>Hubert >>>>> >>>>> >>>>> >>>>>_______________________________________________ >>>>>Bioperl-l mailing list >>>>>Bioperl-l at lists.open-bio.org >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l at lists.open-bio.org >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>-- >>Jason Stajich >>Duke University >>http://www.duke.edu/~jes12 >> >> >> > >Christopher Fields >Postdoctoral Researcher - Switzer Lab >Dept. of Biochemistry >University of Illinois Urbana-Champaign > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From rahall2 at ualr.edu Thu Feb 9 15:09:52 2006 From: rahall2 at ualr.edu (Roger Hall) Date: Thu, 09 Feb 2006 14:09:52 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <001b01c62d94$2e8bee50$15327e82@pyrimidine> Message-ID: <004301c62db4$c9bcbab0$d416a790@LIBERAL> Guys - I'm looking at the error message: MSG: no data for midline Query 1 WWWKWRW 7 STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 This is my line of thought: 1. "no data for midline $_" is a unique message generated by blast.pm in one location only at the point of a. reading three lines b. dropping lines with spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3) 2. There is a regexp match that fails in order to reach that error message 3. 
The $_ value "Query 1 WWWKWRW 7" should not fail the expression 4. It does anyway 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast reports I suspect a newline/chomp/metacharacter issue. Not finding the string anywhere has me thoroughly confused - I asked Hubert for the additional file, assuming that I didn't have it. My next thought is to write a quick script to test perl behavior on "Fedora Core 9". Thoughts? Did I misread the issue entirely? :} Roger -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Thursday, February 09, 2006 10:16 AM To: 'Jason Stajich'; 'Hubert Prielinger' Cc: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at duke.edu] > Sent: Thursday, February 09, 2006 9:13 AM > To: Hubert Prielinger > Cc: Chris Fields; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast output > > On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > > hi chris, > > thanks, I have upgraded to version 1.5.1 but it isn't still > working, > > do you have any ohter idea, the problem I have is that I > have to parse > > a lot of textfiles.... > > or shall I look for another option to parse those files... > > > > regards > > Hubert > > > The code from Bioperl 1.5.1 works fine for me for blast > 2.2.13 reports but unless you post your blast report we can't > really determine the problem. > > If you are still getting the same error like this I am not > convinced you have upgraded to 1.5.1 which includes a fix in > the fact that NCBI changed the HSP result format to remove > the ':' from the Query/Sbjct prefixes. We fixed this as soon > as it was apparent sometime in September. 
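To illustrate the format change Jason describes, here is a small plain-Perl sketch of a pattern that accepts HSP alignment lines with or without the colon after `Query`/`Sbjct`. It also strips a trailing carriage return, which is one form of the newline problem Roger suspects. `parse_alignment_line()` is a hypothetical stand-alone helper written for this illustration; it is not the code actually used in `Bio::SearchIO::blast`.

```perl
use strict;
use warnings;

# Parse one HSP alignment line from BLAST text output. The optional ':'
# covers both the older "Query: 1 ..." and the newer "Query 1 ..." styles;
# a trailing "\r\n" or "\n" is removed first.
sub parse_alignment_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return unless $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
    return { label => $1, start => $2, residues => $3, end => $4 };
}

for my $line ( "Query: 1 WWWKWRW 7\n", "Query 1 WWWKWRW 7\r\n", "           WWWKWRW" ) {
    my $hit = parse_alignment_line($line);
    if ($hit) {
        print "$hit->{label} $hit->{start}-$hit->{end} $hit->{residues}\n";
    }
    else {
        print "not an alignment line\n";
    }
}
```

The third input imitates a midline, which has no label and is therefore rejected; a real parser has to read the query, midline and subject lines as a group, as Roger describes.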
> > > >>> MSG: no data for midline Query 1 WWWKWRW 7 > >>> STACK Bio::SearchIO::blast::next_result > >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>> STACK toplevel > >>> > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > > If you are just getting no results but also no warnings wrt > parsing, are you sure your logic is correct? > > If you remove your filters do you see all the HSPS?
>
>
> while (my $result = $search->next_result) {
>   print $result->query_name, "\n";
>   #iterate over each hit on the query sequence
>   while (my $hit = $result->next_hit) {
>     print $hit->name, "\n";
>     #iterate over each HSP in the hit
>     while (my $hsp = $hit->next_hsp) {
>       print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
>     }
>   }
> }

I tested some of the BLAST results that Hubert sent Roger and me with a similar script to the above. I removed the file parsing logic and it seemed to work just fine. It may very well be a logic issue or that he hasn't installed the latest fix. It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even though the returned output was from nr, the top of the blast output showed that it was v2.2.12:

BLASTP 2.2.12 [Aug-07-2005]

I double-checked my local version and it's definitely v.2.2.13:
-------------------------------------
C:\Perl\Scripts>blastcl3 -

blastcl3 2.2.13 arguments:...
-------------------------------------

If you use RemoteBlast using the same settings, the version in the header looks like this:

BLASTP 2.2.13 [Nov-27-2005]

I'm wondering if all the blast executables (blast and netblast) from NCBI have text output like v.2.2.12, while the wwwblast outputs a new format (2.2.13). I'll ask blast-help at NCBI about this.
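Since the version printed in a report header can differ from the version of the client that produced it (as with blastcl3 above), it may be worth checking the header of each report before suspecting the parser. This is a minimal plain-Perl sketch. `blast_header_version()` is a hypothetical helper, and its pattern assumes the header line looks like the ones quoted in this message (`BLASTP 2.2.12 [Aug-07-2005]`) and that the program name is one of BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX.

```perl
use strict;
use warnings;

# Return (program, version, date) from the first header line of the form
# "BLASTP 2.2.12 [Aug-07-2005]" found anywhere in $report, or an empty list.
sub blast_header_version {
    my ($report) = @_;
    return ( $1, $2, $3 )
        if $report =~ /^(T?BLAST[NPX])\s+(\d+(?:\.\d+)+)\s+\[([^\]]+)\]/m;
    return;
}

for my $report ( "BLASTP 2.2.12 [Aug-07-2005]\n", "BLASTP 2.2.13 [Nov-27-2005]\n" ) {
    my ( $prog, $version, $date ) = blast_header_version($report);
    print "$prog $version ($date)\n";
}
```

Logging this triple next to any parse failure makes it easy to see which output format a given file actually has.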
> > To clarify some stuff - > Chris I don't necessarily think the XML is best way forward > for BLAST reports generated locally, it isn't as detailed as > the Text format and it is what most people expect to be able > to scroll through and parse -- it is also harder for the > format to change dramatically if you have a static binary on > your machine =). I think for remoteblast the XML format > should be the way forward but I expect Bioperl to maintain > support of any plain text BLAST report format that people use > on a regular basis. > Does XML lack some specific info that text output has? Didn't know that. I believe that XML should be default in RemoteBlast since it will not break, but I agree with you about text output. I also agree that it will need somebody to maintain it constantly, much like RemoteBlast. > -jason > > > > > > Chris Fields wrote: > > > >> My guess is you're running into text parsing problems in > >> Bio::SearchIO::blast. Upgrade to the latest developer version > >> (1.5.1) or > >> bioperl-live (CVS), then see the bug below. > >> > >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >> > >> I think the first problem you ran into is solved in bioperl 1.5.1, > >> the last problem (more recent, not related to the first) has been > >> fixed but hasn't been committed to bioperl-live yet. The fixed > >> SearchIO::blast is available in the link above, but > realize it hasn't > >> been committed yet and may change. > >> > >> Christopher Fields > >> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org > >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>> Prielinger > >>> Sent: Wednesday, February 08, 2006 2:52 PM > >>> To: bioperl-l at bioperl.org > >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast > >>> output > >>> > >>> Hi, > >>> If I want to parse a Blast Output (Version 2.2.12) with > >>> Bio::SearchIO, I get the following error message: > >>> > >>> MSG: no data for midline Query 1 WWWKWRW 7 > >>> STACK Bio::SearchIO::blast::next_result > >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>> STACK toplevel > >>> > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >>> > >>> is that a bug...... > >>> > >>> If I want to parse Blast Output (version 2.2.13), I don't get > >>> anything..... > >>> I'm using bioperl 1.4 > >>> > >>> before, I have installed bioperl 1.4, it worked fine > parsing Blast > >>> Output (version 2.2.12), but I don't remember which > bioperl version > >>> I had installed > >>> > >>> thanks in advance > >>> > >>> Hubert > >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Lalancettec at AGR.GC.CA Thu Feb 9 15:53:10 2006 From: Lalancettec at AGR.GC.CA (Lalancette, Claudia) Date: Thu, 9 Feb 2006 15:53:10 -0500 Subject: [Bioperl-l] module for finding restriction site in batch of sequences? Message-ID: Greetings, I need to find a way to look for a specific restriction enzyme site in hundreds of sequences. I've been looking at Bio::Restriction, but I'm not sure it will work... Any suggestions? Thanks, Claudia From cjfields at uiuc.edu Thu Feb 9 16:25:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Feb 2006 15:25:01 -0600 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <200602092141.34401.heikki@sanbi.ac.za> Message-ID: <000901c62dbf$49bfae20$15327e82@pyrimidine> Thanks! I think as long as the tests pass, everything is fine with me. I may be submitting another module or two in the next few weeks; it just depends on how much time I can spend on them. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za] > Sent: Thursday, February 09, 2006 1:42 PM > To: bioperl-l at lists.open-bio.org > Cc: Chris Fields > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > Chris, > > I committed your file. All tests pass; code looks like it was > written by a long-term bioperl contributor! Impressive. > > I truncated the larger test file from 270K to 20K (200 > lines), to not bloat the distribution unnecessarily. Tests > pass, which is the main thing. Shout if you disagree. > > Great job!
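For Claudia's question above about scanning hundreds of sequences for one restriction site: BioPerl's Bio::Restriction::Analysis (for example its positions() method) is probably the more complete route, since it knows REBASE enzymes and cut positions. As a dependency-free alternative, here is a minimal plain-Perl sketch. It assumes a single exact recognition sequence (EcoRI's GAATTC is used as the example), A/C/G/T/N input without IUPAC ambiguity codes, and FASTA-formatted input; read_fasta, find_site and revcomp are hypothetical helper names, not BioPerl functions. It reports where the recognition site starts, not where the enzyme cuts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of a DNA string. Only A, C, G, T and N are
# complemented; IUPAC ambiguity codes are reversed but not complemented.
sub revcomp {
    my ($dna) = @_;
    (my $rc = reverse uc $dna) =~ tr/ACGTN/TGCAN/;
    return $rc;
}

# All 1-based start positions (top-strand coordinates) where $site occurs
# on either strand. Overlapping hits are kept; a palindromic site such as
# GAATTC is reported once per position because hits are stored in a hash.
sub find_site {
    my ($seq, $site) = @_;
    return () unless length $site;    # an empty pattern would never advance
    $seq  = uc $seq;
    $site = uc $site;
    my %hits;
    for my $pattern ($site, revcomp($site)) {
        my $pos = index($seq, $pattern);
        while ($pos >= 0) {
            $hits{ $pos + 1 } = 1;
            $pos = index($seq, $pattern, $pos + 1);
        }
    }
    return sort { $a <=> $b } keys %hits;
}

# Minimal FASTA reader: returns a list of [id, sequence] pairs parsed from
# a string. Header lines start with '>'; whitespace inside sequences is dropped.
sub read_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S*)/) {
            push @records, [ $1, '' ];
        }
        elsif (@records) {
            (my $chunk = $line) =~ s/\s+//g;
            $records[-1][1] .= $chunk;
        }
    }
    return @records;
}

# Usage: EcoRI (GAATTC) across a small in-memory FASTA "batch". For real
# files, read the file into a string first (or use Bio::SeqIO instead).
my $fasta = ">seq1\nAAGAATTCTT\nGAATTC\n>seq2\nACGTACGT\n";
for my $rec (read_fasta($fasta)) {
    my ($id, $seq) = @$rec;
    my @pos = find_site($seq, 'GAATTC');
    print join("\t", $id, scalar(@pos), join(',', @pos)), "\n";
}
```

Each output line is tab-separated: sequence id, number of sites found, and their comma-separated start positions. Searching the reverse complement as well matters only for non-palindromic sites; for palindromes like GAATTC both passes hit the same positions.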
> > -Heikki > > > On Thursday 09 February 2006 19:53, Chris Fields wrote: > > Heikki, > > > > I've added the Bio::Tools::RNAMotif module with test suite > (24 tests) > > and two test data files to bugzilla. The first data file is needed > > for normal tests, the second is for testing parsing with > modified data > > in the score tag (using sprintf() in the RNAMotif > descriptor). I ran > > 'perl t\RNAMotif.t' and they all passed. > > > > Thanks! > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki > > > Lehvaslaiho > > > Sent: Wednesday, February 08, 2006 12:54 AM > > > To: bioperl-l at lists.open-bio.org > > > Cc: Chris Fields > > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > > > > > Chris, > > > > > > Post your files to bugzilla (ticket type enhancement, add > files to > > > ticket after creation) and someone with commit ability will add > > > them to CVS once the code is in satisfactory condition. > > > > > > Thanks, > > > > > > -Heikki > > > > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote: > > > > I want to submit a module for parsing RNAMotif output > > > > (Bio::Tools::RNAMotif). It is capable, at the moment, > of scanning > > > > output and returning Bio::SeqFeature::Generic objects with > > > > > > added tags > > > > > > > for descriptors/sequences/file info. I'm in the process of > > > > > > writing up > > > > > > > tests and going through biodesign to make sure everything's > > > > kosher, but the module itself is essentially ready-to-go. What > > > > should I do next? > > > > > > > > Christopher Fields > > > > Postdoctoral Researcher > > > > Lab of Dr. 
Robert Switzer > > > > Dept of Biochemistry > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > ______ _/ > _/_____________________________________________________ > > > _/ _/ > > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > > _/ _/ _/ SANBI, South African National > Bioinformatics Institute > > > _/ _/ _/ University of Western Cape, South Africa > > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > > ___ > > > _/_/_/_/_/________________________________________________________ > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ > _/_/_/_/_/________________________________________________________ From golharam at umdnj.edu Thu Feb 9 16:19:46 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 09 Feb 2006 16:19:46 -0500 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <200602091654.30890.heikki@sanbi.ac.za> Message-ID: <002801c62dbe$8d4d7e20$e6028a0a@GOLHARMOBILE1> Thanks all. The responses I got were definitely more than helpful. FYI - I did initially look at msbar. 
I glanced over the "Number of times to perform mutation operations" option, which is what I was looking for. I'm looking to statistically test some simple scoring matrices. I think msbar will do. Ryan -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho Sent: Thursday, February 09, 2006 9:55 AM To: bioperl-l at lists.open-bio.org; golharam at umdnj.edu Cc: 'The general forum at Bioinformatics.Org'; 'bioperl-l'; emboss at emboss.open-bio.org Subject: Re: [Bioperl-l] Tool to mutate DNA sequence Ryan, I should have made this very clear in my first reply: you have to plan very carefully what rules you use when you mutate your sequence, because they will directly affect the resulting sequences. Of course, all that depends on what you will be using the sequences for. If you are going to draw evolutionary conclusions from those sequences, you must mutate them in a way that simulates evolutionary principles. My earlier pseudocode example, for instance, should allow mutations at every location. Mutations do occur multiple times at the same sites as sequences get saturated by mutations. Also, you should decide the relative frequency of transversions versus transitions. Then there are indels; do you want to take those into account? Also, check the EMBOSS program 'msbar'. You did not ask this, but... I remember that during the early days of Celera, one of the tools that enabled them to estimate the feasibility of whole-genome shotgun sequence assembly was a very complete program to 'synthesize' in silico the whole complexity of the human genome. I have no idea if that program is generally available now. Yours, -Heikki On Thursday 09 February 2006 06:46, Ryan Golhar wrote: > Does anyone know of a tool to mutate a DNA sequence by a specified > amount?
For instance, say I have a DNA sequence 1000 bases long, and I > want to simulate mutations to make it 75% (or 80%, etc) similar to the > original. > > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From injunjoel at hotmail.com Thu Feb 9 16:33:45 2006 From: injunjoel at hotmail.com (Joel Steele) Date: Thu, 09 Feb 2006 13:33:45 -0800 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast output In-Reply-To: <43EBA26B.4010907@gmx.at> Message-ID: Greetings again, It's the colon... observe.
-=Code Snippet=-
#!/usr/bin/perl -w
use strict;

#the string as reported from your error.
my $string1 = 'Query 1 WWWKWRW 7';

#your string with a colon thrown in for testing.
my $string2 = 'Query: 1 WWWKWRW 7';

foreach ($string1, $string2){
    if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
        print "Match Found in $_\n";
        print $1."\n";
        print $2."\n";
        print $3."\n";
        print $4."\n";
        print $5."\n";
    }else{
        print "no Match for $_\n";
    }
}
-=End Code=-

The Output

-=Code Snippet=-
no Match for Query 1 WWWKWRW 7
Match Found in Query: 1 WWWKWRW 7
Query: 1
Query
1
WWWKWRW
7
-=End Code=-

Now I would suggest changing the regexp
From:
/^((Query|Sbjct)\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
To:
/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
in SearchIO::Blast.

General suggestion: Again, I would like to suggest that everyone get used to using the strict pragma.
Though it may not be applicable to this particular problem, it becomes essential if you wish to progress in your use of Perl. It is a core module, so there is nothing to download from CPAN. It helps with development, and once your code can run without warnings and errors you can remove it. This is not a targeted attack, as some may interpret it, but rather a general FYI for those out there new to Perl or programming in general. Better to start learning the rules early, before bad habits creep in. One more thing: there is a wonderfully supportive Perl community available to anyone who wants to join at PerlMonks.org. Check it out; who knows, you may even catch a glimpse of Larry Wall while you're there. -Joel Steele "The surest way to corrupt a youth is to instruct him to hold in higher regard those who think alike than those who think differently." -Nietzsche "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo >From: Hubert Prielinger >To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields >, Jason Stajich >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >parsingBlast output >Date: Thu, 09 Feb 2006 14:13:31 -0600 > >dear roger, >this error message I got, when I tried to parse Blast output (version >2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot >of Blast output files >with version 2.2.13 and for that I don't get any error message.....it >just doesn't work > >Hubert > > > >Roger Hall wrote: > > >Guys - I'm looking at the error message: > > > >MSG: no data for midline Query 1 WWWKWRW 7 > >STACK Bio::SearchIO::blast::next_result > >/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >STACK toplevel > >/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > > > >This is my line of thought: > >1. "no data for midline $_" is a unique message generated by blast.pm in >one > >location only at the point of a. reading three lines b. dropping lines >with > >spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < >3) > >2.
There is a regexp match that fails in order to reach that error >message > >3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression > >4. It does anyway > >5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast > >reports > > > >I suspect a newline/chomp/metacharacter issue. Not finding the string > >anywhere has me thoroughly confused - I asked Hubert for the additional > >file, assuming that I didn't have it. > > > >My next thought is to write a quick script to test perl behavior on >"Fedora > >Core 9". > > > >Thoughts? > > > >Did I misread the issue entirely? :} > > > >Roger > > > > > >-----Original Message----- > >From: bioperl-l-bounces at lists.open-bio.org > >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > >Sent: Thursday, February 09, 2006 10:16 AM > >To: 'Jason Stajich'; 'Hubert Prielinger' > >Cc: bioperl-l at bioperl.org > >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast > >output > > > > > > > > > >>-----Original Message----- > >>From: Jason Stajich [mailto:jason.stajich at duke.edu] > >>Sent: Thursday, February 09, 2006 9:13 AM > >>To: Hubert Prielinger > >>Cc: Chris Fields; bioperl-l at bioperl.org > >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>parsing Blast output > >> > >>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > >> > >> > >>>hi chris, > >>>thanks, I have upgraded to version 1.5.1 but it isn't still > >>> > >>> > >>working, > >> > >> > >>>do you have any ohter idea, the problem I have is that I > >>> > >>> > >>have to parse > >> > >> > >>>a lot of textfiles.... > >>>or shall I look for another option to parse those files... > >>> > >>>regards > >>>Hubert > >>> > >>> > >>The code from Bioperl 1.5.1 works fine for me for blast > >>2.2.13 reports but unless you post your blast report we can't > >>really determine the problem. 
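To make the regexp failure discussed in this thread concrete, here is a small standalone Perl sketch. It is not the actual Bio::SearchIO::blast code (Jason notes that bioperl 1.5.1 already handles the colon-less format); it only shows why a colon-required pattern rejects "Query 1 WWWKWRW 7", while the ':?' variant suggested by Joel Steele accepts both the "Query:" and "Query" prefixes. parse_hsp_line is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Colon-required pattern, as in Joel's test script: it only matches lines
# such as "Query: 1 WWWKWRW 7".
my $COLON_REQUIRED = qr/^((Query|Sbjct):\s+(-?\d+)\s*)(\S+)\s+(-?\d+)/;

# Colon-optional pattern: ':?' accepts both "Query: 1 ..." and the
# colon-less "Query 1 ..." prefix described in this thread.
my $HSP_LINE = qr/^((Query|Sbjct):?\s+(-?\d+)\s*)(\S+)\s+(-?\d+)/;

# Returns (label, start, aligned residues, end), or an empty list when the
# line is not a Query/Sbjct alignment line.
sub parse_hsp_line {
    my ($line) = @_;
    return () unless $line =~ $HSP_LINE;
    return ($2, $3, $4, $5);
}

my $new_style = 'Query 1 WWWKWRW 7';
print "colon-required pattern on '$new_style': ",
    ($new_style =~ $COLON_REQUIRED ? "match" : "no match"), "\n";

for my $line ($new_style, 'Query: 1 WWWKWRW 7', 'Sbjct: 12 AC-GT 15') {
    my @f = parse_hsp_line($line);
    if (@f) {
        print "matched '$line': label=$f[0] start=$f[1] seq=$f[2] end=$f[3]\n";
    }
    else {
        print "no match for '$line'\n";
    }
}
```

Making only the colon optional keeps the rest of the pattern (start coordinate, aligned string, end coordinate) unchanged, so both report styles yield the same four fields.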
> >> > >>If you are still getting the same error like this I am not > >>convinced you have upgraded to 1.5.1, which includes a fix for > >>the fact that NCBI changed the HSP result format to remove > >>the ':' from the Query/Sbjct prefixes. We fixed this as soon > >>as it was apparent sometime in September. > >> > >> > >> > >>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>STACK Bio::SearchIO::blast::next_result > >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>STACK toplevel > >>>>> > >>>>> > >>>>> > >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >> > >>If you are just getting no results but also no warnings wrt > >>parsing, are you sure your logic is correct? > >> > >>If you remove your filters do you see all the HSPs? > >> > >>
> >>while (my $result = $search->next_result) {
> >>  print $result->query_name, "\n";
> >>  #iterate over each hit on the query sequence
> >>  while (my $hit = $result->next_hit) {
> >>    print $hit->name, "\n";
> >>    #iterate over each HSP in the hit
> >>    while (my $hsp = $hit->next_hsp) {
> >>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
> >>    }
> >>  }
> >>}
> >> > >> > > > >I tested some of the BLAST results that Hubert sent Roger and me with a > >similar script to the above. I removed the file parsing logic and it >seemed > >to work just fine. It may very well be a logic issue or that he hasn't > >installed the latest fix. > > > >It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), >even > >though the returned output was from nr, the top of the blast output >showed > >that it was v2.2.12: > > > >BLASTP 2.2.12 [Aug-07-2005] > > > >I double-checked my local version and it's definitely v.2.2.13: > >------------------------------------- > >C:\Perl\Scripts>blastcl3 - > > > >blastcl3 2.2.13 arguments:...
> >------------------------------------- > > > >If you use RemoteBlast using the same settings, the version in the header > >looks like this: > > > >BLASTP 2.2.13 [Nov-27-2005] > > > >I'm wondering if all the blast executables (blast and netblast) from NCBI > >have text output like v.2.2.12, while the wwwblast outputs a new format > >(2.2.13). I'll ask blast-help at NCBI about this. > > > > > > > >>To clarify some stuff - > >>Chris I don't necessarily think the XML is best way forward > >>for BLAST reports generated locally, it isn't as detailed as > >>the Text format and it is what most people expect to be able > >>to scroll through and parse -- it is also harder for the > >>format to change dramatically if you have a static binary on > >>your machine =). I think for remoteblast the XML format > >>should be the way forward but I expect Bioperl to maintain > >>support of any plain text BLAST report format that people use > >>on a regular basis. > >> > >> > >> > > > >Does XML lack some specific info that text output has? Didn't know that. > I > >believe that XML should be default in RemoteBlast since it will not >break, > >but I agree with you about text output. I also agree that it will need > >somebody to maintain it constantly, much like RemoteBlast. > > > > > > > >>-jason > >> > >> > >>>Chris Fields wrote: > >>> > >>> > >>> > >>>>My guess is you're running into text parsing problems in > >>>>Bio::SearchIO::blast. Upgrade to the latest developer version > >>>>(1.5.1) or > >>>>bioperl-live (CVS), then see the bug below. > >>>> > >>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>>> > >>>>I think the first problem you ran into is solved in bioperl 1.5.1, > >>>>the last problem (more recent, not related to the first) has been > >>>>fixed but hasn't been committed to bioperl-live yet. The fixed > >>>>SearchIO::blast is available in the link above, but > >>>> > >>>> > >>realize it hasn't > >> > >> > >>>>been committed yet and may change. 
> >>>> > >>>>Christopher Fields > >>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry > >>>>University of Illinois Urbana-Champaign > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>>-----Original Message----- > >>>>>From: bioperl-l-bounces at lists.open-bio.org > >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>>>>Prielinger > >>>>>Sent: Wednesday, February 08, 2006 2:52 PM > >>>>>To: bioperl-l at bioperl.org > >>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>>>> > >>>>> > >>parsing Blast > >> > >> > >>>>>output > >>>>> > >>>>>Hi, > >>>>>If I want to parse a Blast Output (Version 2.2.12) with > >>>>>Bio::SearchIO, I get the following error message: > >>>>> > >>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>STACK Bio::SearchIO::blast::next_result > >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>STACK toplevel > >>>>> > >>>>> > >>>>> > >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >> > >> > >>>>>is that a bug...... > >>>>> > >>>>>If I want to parse Blast Output (version 2.2.13), I don't get > >>>>>anything..... 
> >>>>>I'm using bioperl 1.4 > >>>>> > >>>>>before, I have installed bioperl 1.4, it worked fine > >>>>> > >>>>> > >>parsing Blast > >> > >> > >>>>>Output (version 2.2.12), but I don't remember which > >>>>> > >>>>> > >>bioperl version > >> > >> > >>>>>I had installed > >>>>> > >>>>>thanks in advance > >>>>> > >>>>>Hubert > >>>>> > >>>>> > >>>>> > >>>>>_______________________________________________ > >>>>>Bioperl-l mailing list > >>>>>Bioperl-l at lists.open-bio.org > >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> > >>>> > >>>_______________________________________________ > >>>Bioperl-l mailing list > >>>Bioperl-l at lists.open-bio.org > >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >>-- > >>Jason Stajich > >>Duke University > >>http://www.duke.edu/~jes12 > >> > >> > >> > > > >Christopher Fields > >Postdoctoral Researcher - Switzer Lab > >Dept. of Biochemistry > >University of Illinois Urbana-Champaign > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu Feb 9 17:13:16 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 9 Feb 2006 17:13:16 -0500 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast output In-Reply-To: References: Message-ID: Uh, that was done in sept see the CVS log... On Feb 9, 2006, at 4:33 PM, Joel Steele wrote: > Greetings again, > Its the colon... > observe. > > -=Code Snippet=- > #!/usr/bin/perl -w > use strict; > > #the string as reported from your error. > my $string1 = 'Query 1 WWWKWRW 7'; > > #your string with a colon thrown in for testing. 
> my $string2 = 'Query: 1 WWWKWRW 7'; > > foreach ($string1, $string2){ > if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){ > print "Match Found in $_\n"; > print $1."\n"; > print $2."\n"; > print $3."\n"; > print $4."\n"; > print $5."\n"; > }else{ > print "no Match for $_\n"; > } > } > > -=End Code=- > > The Output > > -=Code Snippet=- > no Match for Query 1 WWWKWRW 7 > Match Found in Query: 1 WWWKWRW 7 > Query: 1 > Query > 1 > WWWKWRW > 7 > > -=End Code=- > > > Now I would suggest changing the regexp > > From: > /^((Query|Sbjct)\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ > > To: > /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ > > in SearchIO::Blast. > > General suggestion: > Again I would like to suggest that everyone get use to using the > strict > pragma. Though it may not applicable to this particular problem it > becomes > essential if you wish progress in your use of Perl. > It is a core module so there is nothing to download from CPAN. It > helps with > development and once your code can run without warnings and errors > you can > remove it. This is not a targeted attack as some may interpret it, > rather a > general FYI for those out there new to Perl or programming in general. > Better to start learning the rules early before bad habits creep in. > One more thing. There is a wonderfully supportive Perl community > available > to anyone who wants to join at PerlMonks.org check it out, who > knows you may > even catch a glimpse of Larry Wall while youre there. > > -Joel Steele > > "The surest way to corrupt a youth is to instruct him to hold in > higher > regard those who think alike than those who think differently." - > Nietzsche > > "I do not feel obliged to believe that the same God who endowed us > with > sense, reason and intellect has intended us to forego their use." 
- > Galileo > > > > >> From: Hubert Prielinger >> To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields >> , Jason Stajich >> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >> parsingBlast output >> Date: Thu, 09 Feb 2006 14:13:31 -0600 >> >> dear roger, >> this error message I got, when I tried to parse Blast output (version >> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have >> a lot >> of Blast output files >> with version 2.2.13 and for that I don't get any error message.....it >> just doesn't work >> >> Hubert >> >> >> >> Roger Hall wrote: >> >>> Guys - I'm looking at the error message: >>> >>> MSG: no data for midline Query 1 WWWKWRW 7 >>> STACK Bio::SearchIO::blast::next_result >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>> STACK toplevel >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>> This is my line of thought: >>> 1. "no data for midline $_" is a unique message generated by >>> blast.pm in >> one >>> location only at the point of a. reading three lines b. dropping >>> lines >> with >>> spaces only c. identifying the Query, Midline, and Match lines (0 >>> <= $i < >> 3) >>> 2. There is a regexp match that fails in order to reach that error >> message >>> 3. The $_ value "Query 1 WWWKWRW 7" should not fail the >>> expression >>> 4. It does anyway >>> 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in >>> the blast >>> reports >>> >>> I suspect a newline/chomp/metacharacter issue. Not finding the >>> string >>> anywhere has me thoroughly confused - I asked Hubert for the >>> additional >>> file, assuming that I didn't have it. >>> >>> My next thought is to write a quick script to test perl behavior on >> "Fedora >>> Core 9". >>> >>> Thoughts? >>> >>> Did I misread the issue entirely?
:} >>> >>> Roger >>> >>> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris >>> Fields >>> Sent: Thursday, February 09, 2006 10:16 AM >>> To: 'Jason Stajich'; 'Hubert Prielinger' >>> Cc: bioperl-l at bioperl.org >>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast >>> output >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>> Sent: Thursday, February 09, 2006 9:13 AM >>>> To: Hubert Prielinger >>>> Cc: Chris Fields; bioperl-l at bioperl.org >>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>> parsing Blast output >>>> >>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>> >>>> >>>>> hi chris, >>>>> thanks, I have upgraded to version 1.5.1 but it isn't still >>>>> >>>>> >>>> working, >>>> >>>> >>>>> do you have any ohter idea, the problem I have is that I >>>>> >>>>> >>>> have to parse >>>> >>>> >>>>> a lot of textfiles.... >>>>> or shall I look for another option to parse those files... >>>>> >>>>> regards >>>>> Hubert >>>>> >>>>> >>>> The code from Bioperl 1.5.1 works fine for me for blast >>>> 2.2.13 reports but unless you post your blast report we can't >>>> really determine the problem. >>>> >>>> If you are still getting the same error like this I am not >>>> convinced you have upgraded to 1.5.1 which includes a fix in >>>> the fact that NCBI changed the HSP result format to remove >>>> the ':' from the Query/Sbjct prefixes. We fixed this as soon >>>> as it was apparent sometime in September. 
>>>>
>>>>>>> MSG: no data for midline Query 1 WWWKWRW 7
>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>> STACK toplevel
>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>> If you are just getting no results but also no warnings wrt
>>>> parsing, are you sure your logic is correct?
>>>>
>>>> If you remove your filters do you see all the HSPs?
>>>>
>>>> while (my $result = $search->next_result) {
>>>>     print $result->query_name, "\n";
>>>>     #iterate over each hit on the query sequence
>>>>     while (my $hit = $result->next_hit) {
>>>>         print $hit->name, "\n";
>>>>         #iterate over each HSP in the hit
>>>>         while (my $hsp = $hit->next_hsp) {
>>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>                   $hsp->hit_string, "\n";
>>>>         }
>>>>     }
>>>> }
>>>>
>>>
>>> I tested some of the BLAST results that Hubert sent Roger and me with a
>>> similar script to the above. I removed the file parsing logic and it seemed
>>> to work just fine. It may very well be a logic issue or that he hasn't
>>> installed the latest fix.
>>>
>>> It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even
>>> though the returned output was from nr, the top of the blast output showed
>>> that it was v2.2.12:
>>>
>>> BLASTP 2.2.12 [Aug-07-2005]
>>>
>>> I double-checked my local version and it's definitely v.2.2.13:
>>> -------------------------------------
>>> C:\Perl\Scripts>blastcl3 -
>>>
>>> blastcl3 2.2.13 arguments:...
>>> -------------------------------------
>>>
>>> If you use RemoteBlast using the same settings, the version in the header
>>> looks like this:
>>>
>>> BLASTP 2.2.13 [Nov-27-2005]
>>>
>>> I'm wondering if all the blast executables (blast and netblast) from NCBI
>>> have text output like v.2.2.12, while the wwwblast outputs a new format
>>> (2.2.13).
I'll ask blast-help at NCBI about this. >>> >>> >>> >>>> To clarify some stuff - >>>> Chris I don't necessarily think the XML is best way forward >>>> for BLAST reports generated locally, it isn't as detailed as >>>> the Text format and it is what most people expect to be able >>>> to scroll through and parse -- it is also harder for the >>>> format to change dramatically if you have a static binary on >>>> your machine =). I think for remoteblast the XML format >>>> should be the way forward but I expect Bioperl to maintain >>>> support of any plain text BLAST report format that people use >>>> on a regular basis. >>>> >>>> >>>> >>> >>> Does XML lack some specific info that text output has? Didn't >>> know that. >> I >>> believe that XML should be default in RemoteBlast since it will not >> break, >>> but I agree with you about text output. I also agree that it >>> will need >>> somebody to maintain it constantly, much like RemoteBlast. >>> >>> >>> >>>> -jason >>>> >>>> >>>>> Chris Fields wrote: >>>>> >>>>> >>>>> >>>>>> My guess is you're running into text parsing problems in >>>>>> Bio::SearchIO::blast. Upgrade to the latest developer version >>>>>> (1.5.1) or >>>>>> bioperl-live (CVS), then see the bug below. >>>>>> >>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>> >>>>>> I think the first problem you ran into is solved in bioperl >>>>>> 1.5.1, >>>>>> the last problem (more recent, not related to the first) has been >>>>>> fixed but hasn't been committed to bioperl-live yet. The fixed >>>>>> SearchIO::blast is available in the link above, but >>>>>> >>>>>> >>>> realize it hasn't >>>> >>>> >>>>>> been committed yet and may change. >>>>>> >>>>>> Christopher Fields >>>>>> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>>>> University of Illinois Urbana-Champaign >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>>>> Hubert >>>>>>> Prielinger >>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>> To: bioperl-l at bioperl.org >>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>> >>>>>>> >>>> parsing Blast >>>> >>>> >>>>>>> output >>>>>>> >>>>>>> Hi, >>>>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>>>> Bio::SearchIO, I get the following error message: >>>>>>> >>>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>> STACK Bio::SearchIO::blast::next_result >>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>> STACK toplevel >>>>>>> >>>>>>> >>>>>>> >>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>> Blast.pl:21 >>>> >>>> >>>>>>> is that a bug...... >>>>>>> >>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get >>>>>>> anything..... 
>>>>>>> I'm using bioperl 1.4
>>>>>>>
>>>>>>> before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>> I had installed
>>>>>>>
>>>>>>> thanks in advance
>>>>>>>
>>>>>>> Hubert
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From boris.steipe at utoronto.ca  Thu Feb  9 16:54:53 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu, 9 Feb 2006 16:54:53 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
Message-ID:
<1B7E8DA9-86F5-4411-B16C-E6573E5E8C36@utoronto.ca>

Golf, anyone?

#!/usr/bin/perl -nl
for(split//){push@a,$_}
END{
  while($n/@a<0.5) {
    $p=rand(@a);
    if($a[$p]=~/[A-Z]/){$a[$p]=lc((grep!/$a[$p]/,split//,"ACGT")[rand (3)]); $n++; }
  }
  print @a;
}

(144, not counting \s and the #! line)

:-)

B.

>> Does anyone know of tool to mutate a DNA sequence by a specified
>> amount?
>> For instance, say I have a DNA sequence 1000 bases long, and I
>> want to
>> simulate mutations to make it 75% (or 80%, etc) similar to the
>> original.
>>
>>
>> Ryan
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hubert.prielinger at gmx.at  Thu Feb  9 17:20:46 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 16:20:46 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <000e01c62dca$bc66df60$15327e82@pyrimidine>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>
Message-ID: <43EBC03E.4040900@gmx.at>

Hi Chris,
I'm incredibly sorry for causing so much inconvenience. Yes, you are right, I only had to change the blast.pm file; it is working very well now, thank you very much. And you are right, you had mentioned changing the file earlier... ;)

But I have another question: does it work with the WU-Blast output too?

regards
Hubert


Chris Fields wrote:

>Ha! I come back from meeting and there's a billion emails! What have we
>started? ;p . Sorry about this Jason; I know you're busy.
>
>Hubert, if you're out there, I sent you an email with an attachment. You
>said the output looks like what you were expecting. So I think we have two
>problems:
>
>1) I haven't delved into the file scanning, but the fact that it takes so
>long should tell you something's seriously wrong there.
>Strip that part out and start with a simple script, say, like the one Jason
>or that I sent you; the script I used to generate that output works fine
>(on two OS's, WinXP and Mac OS X). Use it on one file at a time. Do
>everything on command line (not through Eclipse). IDE's can be notoriously
>flaky about running scripts, esp. when they run debugging.
>
>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>not work whenever the text blast output has the following header, which
>comes from the new web version of BLAST:
>
>-----------------------------------------------------
>BLASTP 2.2.13 [Nov-27-2005]
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
>(1997), "Gapped BLAST and PSI-BLAST: a new generation of
>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>
>RID: 1139501210-857-165793005128.BLASTQ1
>
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           3,292,813 sequences; 1,128,164,434 total letters
>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis
>H37Rv].
>Length=193
>.......
>-----------------------------------------------------
>
>It will work if the text output has the following header (or is an older
>version of BLAST):
>
>-----------------------------------------------------
>BLASTP 2.2.12 [Aug-07-2005]
>
>
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>"Gapped BLAST and PSI-BLAST: a new generation of protein database search
>programs", Nucleic Acids Res. 25:3389-3402.
>
>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis H37Rv].
>         (193 letters)
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           2,895,325 sequences; 997,103,285 total letters
>-----------------------------------------------------
>You have the former (2.2.13) version. I know b/c I have your BLAST files.
>Therefore, even bioperl-1.5.1 will not work!
>
>If you want the really gory details on why this is a problem, look here:
>
>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>So, any text output with the above header will not work; it will either hang
>or end abruptly (depending on OS, perl version, memory, patience). If you
>look in the above, I have added a preliminary fix for this. I'll reiterate
>for the billionth time, it hasn't been committed yet, so don't kill me if it
>blows your computer up ;>
>
>Here's the direct link:
>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>1.90, but it's lying, I didn't change the version, only the regex; sorry
>Jason). From what you've been posting it doesn't sound like you've tried
>this, and I believe I've suggested this fix before.
>
>Replace the one in your Bio/SearchIO directory (which looks like
>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>message) with this file. Make sure the filename stays the same (blast.pm).
>
>Run everything again, one file at a time. Make sure you use Jason's script
>as well as the one I sent you. Do NOT rely on running through multiple
>files yet. Fix one bug at a time. And heed Joel's words about file checks.
>
>
>Here's a small chunk of output from one of your blast files using the
>modified script I sent you:
>
>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>Query: 1 RWKWKRKK 8
>Seq: 542 RWAWRRKK 549
>
>Look familiar?
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept.
of Biochemistry >University of Illinois Urbana-Champaign > > > >>-----Original Message----- >>From: Roger Hall [mailto:rahall2 at ualr.edu] >>Sent: Thursday, February 09, 2006 3:24 PM >>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>In other words, yes, I'm on the wrong trail. :} >> >>Sorry - I'll look at the output issue this evening (or >>realize that Chris already solved the issue). ;} >> >>Thanks! >> >>Roger >> >>-----Original Message----- >>From: bioperl-l-bounces at lists.open-bio.org >>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>Hubert Prielinger >>Sent: Thursday, February 09, 2006 2:14 PM >>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; >>Jason Stajich >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>dear roger, >>this error message I got, when I tried to parse Blast output (version >>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I >>have a lot of Blast output files with version 2.2.13 and for >>that I don't get any error message.....it just doesn't work >> >>Hubert >> >> >> >>Roger Hall wrote: >> >> >> >>>Guys - I'm looking at the error message: >>> >>>MSG: no data for midline Query 1 WWWKWRW 7 >>>STACK Bio::SearchIO::blast::next_result >>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>STACK toplevel >>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>>This is my line of thought: >>>1. "no data for midline $_" is a unique message generated by >>> >>> >>blast.pm >> >> >>>in >>> >>> >>one >> >> >>>location only at the point of a. reading three lines b. >>> >>> >>dropping lines >> >> >>>with spaces only c. identifying the Query, Midline, and >>> >>> >>Match lines (0 >> >> >>><= $i < >>> >>> >>3) >> >> >>>2. There is a regexp match that fails in order to reach that >>> >>> >>error message >> >> >>>3. 
The $_ value "Query 1 WWWKWRW 7" should not fail the >>> >>> >>expression >> >> >>>4. It does anyway >>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere >>> >>> >>in the blast >> >> >>>reports >>> >>>I suspect a newline/chomp/metacharacter issue. Not finding >>> >>> >>the string >> >> >>>anywhere has me thoroughly confused - I asked Hubert for the >>> >>> >>additional >> >> >>>file, assuming that I didn't have it. >>> >>>My next thought is to write a quick script to test perl behavior on >>>"Fedora Core 9". >>> >>>Thoughts? >>> >>>Did I misread the issue entirely? :} >>> >>>Roger >>> >>> >>>-----Original Message----- >>>From: bioperl-l-bounces at lists.open-bio.org >>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> >>> >>Chris Fields >> >> >>>Sent: Thursday, February 09, 2006 10:16 AM >>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>Cc: bioperl-l at bioperl.org >>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing >>>Blast output >>> >>> >>> >>> >>> >>> >>>>-----Original Message----- >>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>To: Hubert Prielinger >>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing >>>>Blast output >>>> >>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>> >>>> >>>> >>>> >>>>>hi chris, >>>>>thanks, I have upgraded to version 1.5.1 but it isn't still >>>>> >>>>> >>>>> >>>>> >>>>working, >>>> >>>> >>>> >>>> >>>>>do you have any ohter idea, the problem I have is that I >>>>> >>>>> >>>>> >>>>> >>>>have to parse >>>> >>>> >>>> >>>> >>>>>a lot of textfiles.... >>>>>or shall I look for another option to parse those files... >>>>> >>>>>regards >>>>>Hubert >>>>> >>>>> >>>>> >>>>> >>>>The code from Bioperl 1.5.1 works fine for me for blast >>>>2.2.13 reports but unless you post your blast report we >>>> >>>> >>can't really >> >> >>>>determine the problem. 
>>>> >>>>If you are still getting the same error like this I am not >>>> >>>> >>convinced >> >> >>>>you have upgraded to 1.5.1 which includes a fix in the fact >>>> >>>> >>that NCBI >> >> >>>>changed the HSP result format to remove the ':' from the >>>> >>>> >>Query/Sbjct >> >> >>>>prefixes. We fixed this as soon as it was apparent sometime in >>>>September. >>>> >>>> >>>> >>>> >>>> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>STACK toplevel >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>> >>>>If you are just getting no results but also no warnings wrt >>>> >>>> >>parsing, >> >> >>>>are you sure your logic is correct? >>>> >>>>If you remove your filters do you see all the HSPS? >>>> >>>> >>>>while (my $result = $search->next_result) { >>>> print $result->query_name, "\n"; >>>> #iterate over each hit on the query sequence >>>> while (my $hit = $result->next_hit) { >>>> print $hit->name, "\n"; >>>> #iterate over each HSP in the hit >>>> while (my $hsp = $hit->next_hsp) { >>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >>>> >>>> >>>>>hit_string, "\n"; >>>>> >>>>> >>>> } >>>> } >>>>} >>>> >>>> >>>> >>>> >>>I tested some of the BLAST results that Hubert sent Roger >>> >>> >>and me with a >> >> >>>similar script to the above. I removed the file parsing logic and it >>> >>> >>seemed >> >> >>>to work just fine. It may very well be a logic issue or >>> >>> >>that he hasn't >> >> >>>installed the latest fix. >>> >>>It's a funny thing, though. When I tried using blastcl3 (v. 
>>> >>> >>2.2.13), >> >> >>>even though the returned output was from nr, the top of the blast >>>output showed that it was v2.2.12: >>> >>>BLASTP 2.2.12 [Aug-07-2005] >>> >>>I double-checked my local version and it's definitely v.2.2.13: >>>------------------------------------- >>>C:\Perl\Scripts>blastcl3 - >>> >>>blastcl3 2.2.13 arguments:... >>>------------------------------------- >>> >>>If you use RemoteBlast using the same settings, the version in the >>>header looks like this: >>> >>>BLASTP 2.2.13 [Nov-27-2005] >>> >>>I'm wondering if all the blast executables (blast and netblast) from >>>NCBI have text output like v.2.2.12, while the wwwblast >>> >>> >>outputs a new >> >> >>>format (2.2.13). I'll ask blast-help at NCBI about this. >>> >>> >>> >>> >>> >>>>To clarify some stuff - >>>>Chris I don't necessarily think the XML is best way forward >>>> >>>> >>for BLAST >> >> >>>>reports generated locally, it isn't as detailed as the Text >>>> >>>> >>format and >> >> >>>>it is what most people expect to be able to scroll through >>>> >>>> >>and parse >> >> >>>>-- it is also harder for the format to change dramatically >>>> >>>> >>if you have >> >> >>>>a static binary on your machine =). I think for >>>> >>>> >>remoteblast the XML >> >> >>>>format should be the way forward but I expect Bioperl to maintain >>>>support of any plain text BLAST report format that people use on a >>>>regular basis. >>>> >>>> >>>> >>>> >>>> >>>Does XML lack some specific info that text output has? >>> >>> >>Didn't know that. >>I >> >> >>>believe that XML should be default in RemoteBlast since it will not >>>break, but I agree with you about text output. I also agree that it >>>will need somebody to maintain it constantly, much like RemoteBlast. >>> >>> >>> >>> >>> >>>>-jason >>>> >>>> >>>> >>>> >>>>>Chris Fields wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>My guess is you're running into text parsing problems in >>>>>>Bio::SearchIO::blast. 
Upgrade to the latest developer version >>>>>>(1.5.1) or >>>>>>bioperl-live (CVS), then see the bug below. >>>>>> >>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>> >>>>>>I think the first problem you ran into is solved in >>>>>> >>>>>> >>bioperl 1.5.1, >> >> >>>>>>the last problem (more recent, not related to the first) has been >>>>>>fixed but hasn't been committed to bioperl-live yet. The fixed >>>>>>SearchIO::blast is available in the link above, but >>>>>> >>>>>> >>>>>> >>>>>> >>>>realize it hasn't >>>> >>>> >>>> >>>> >>>>>>been committed yet and may change. >>>>>> >>>>>>Christopher Fields >>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry >>>>>>University of Illinois Urbana-Champaign >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>-----Original Message----- >>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >>>>>>> >>>>>>> >>Of Hubert >> >> >>>>>>>Prielinger >>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>To: bioperl-l at bioperl.org >>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>parsing Blast >>>> >>>> >>>> >>>> >>>>>>>output >>>>>>> >>>>>>>Hi, >>>>>>>If I want to parse a Blast Output (Version 2.2.12) with >>>>>>>Bio::SearchIO, I get the following error message: >>>>>>> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>STACK toplevel >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>> >>>> >>>> >>>> >>>>>>>is that a bug...... >>>>>>> >>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get >>>>>>>anything..... 
>>>>>>>I'm using bioperl 1.4
>>>>>>>
>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>I had installed
>>>>>>>
>>>>>>>thanks in advance
>>>>>>>
>>>>>>>Hubert
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>--
>>>>Jason Stajich
>>>>Duke University
>>>>http://www.duke.edu/~jes12
>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

From olenka.m at gmail.com  Thu Feb  9 17:49:48 2006
From: olenka.m at gmail.com (Olena Morozova)
Date: Thu, 9 Feb 2006 17:49:48 -0500
Subject: [Bioperl-l] Bio::TreeIO
Message-ID: <259a224c0602091449u353e4bf1g5a3cfbb46297217a@mail.gmail.com>

Hi all,

Probably a very stupid question, but the get_lca function does not work for
unrooted trees, does it? I am trying to get the LCA for a set of nodes in a
phylip tree, and I am using the script in the HOWTO.
Thanks,
Olena

On 2/8/06, Hubert Prielinger wrote:
> Hi,
> If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,
> I get the following error message:
>
> MSG: no data for midline Query 1 WWWKWRW 7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
> is that a bug......
>
> If I want to parse Blast Output (version 2.2.13), I don't get anything.....
> I'm using bioperl 1.4
>
> before, I have installed bioperl 1.4, it worked fine parsing Blast
> Output (version 2.2.12), but I don't remember which bioperl version I
> had installed
>
> thanks in advance
>
> Hubert
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From victor.ruotti at gmail.com  Thu Feb  9 18:22:11 2006
From: victor.ruotti at gmail.com (Victor)
Date: Thu, 9 Feb 2006 17:22:11 -0600
Subject: [Bioperl-l] Running BLAT with BioPerl
Message-ID: <36d7e5550602091522g114728a2w57f2a1cb7c1383ee@mail.gmail.com>

Hi,

Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to
date in the latest bioperl release?

use Bio::Tools::Run::Alignment::Blat;

my $factory = Bio::Tools::Run::Alignment::Blat->new();
my $seq = "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
my @feats = $factory->run($seq);

Here is what I get when trying to use it:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Blat call (/usr/local/bin/blat/blat -out=blast TGAAATAAAACTCAGTA /tmp/fB09bp5F76) crashed: -1

Notice that it is using "blat" twice in the path.
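You get the doubled `blat/blat` in the error message when a full path to the binary is joined with the binary's name a second time. The sketch below reproduces that with core Perl's File::Spec. It rests on two assumptions, both inferred from the error message rather than from the module source: that `executable` already returns `/usr/local/bin/blat`, and that Bio::Root::IO->catfile behaves like File::Spec->catfile. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Assumed values, inferred from the error message above:
my $executable   = '/usr/local/bin/blat';   # already the full path to the binary
my $program_name = 'blat';                  # just the binary's name

# Joining the two treats the binary as if it were a directory.
my $doubled = File::Spec->catfile($executable, $program_name);

# If the full path is already known, it can be used as-is.
my $intended = $executable;

print "joined:   $doubled\n";   # /usr/local/bin/blat/blat on Unix-like systems
print "intended: $intended\n";
```

Dropping either argument to catfile removes the duplication. Which one the module should drop depends on what `executable` is meant to return.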
The way that I fixed this is by going to the blat.pm module and changing
the following lines:

#my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
my $str= Bio::Root::IO->catfile($self->program_name);

Any ideas? Maybe I'm missing the $ENV variable somewhere? I'd like to avoid
making this change.

Also, does anyone have a known synopsis of this blat module (where to set
the parameters, and whether it allows you to have a config file)? I'll be
happy to add a better synopsis to the module if needed.

Thanks in advance,
Victor

From osborne1 at optonline.net  Thu Feb  9 20:37:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 09 Feb 2006 20:37:39 -0500
Subject: [Bioperl-l] module for finding restriction site in batch of sequences?
In-Reply-To:
Message-ID:

Claudia,

Yes, Bio::Restriction does this; see bptutorial.pl for code examples. Note the
statement "@fragments = $analysis->fragments($enzyme)". If the array
@fragments has more than 1 element, your sequence has a site for the enzyme
in question. Alternatively, it sounds like you could use some kind of
regular expression.

Brian O.

On 2/9/06 3:53 PM, "Lalancette, Claudia" wrote:

> Greetings,
>
> I need to find a way to look for a specific restriction enzyme site in
> hundreds of sequences. Been looking at Bio::Restriction, but not sure
> if will work... Any suggestions?
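Brian's regular-expression alternative could look something like the minimal sketch below. It makes several assumptions. The recognition sequence is exact and unambiguous, with EcoRI's GAATTC as the example. Sequences are plain strings, and the `%seqs` hash stands in for data read from your own files. Only the given strand is searched; that is enough for a palindromic site like GAATTC, but a non-palindromic site would also need its reverse complement. The sketch does not handle IUPAC ambiguity codes and does not report cut positions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Example recognition site (EcoRI). Exact bases only: IUPAC ambiguity codes
# such as N or R would need to be translated into character classes first.
my $site = 'GAATTC';

# Hypothetical input; in practice these would be read from FASTA files.
my %seqs = (
    seq1 => 'ATGGAATTCTTAGAATTCA',
    seq2 => 'ATGCCCGGGTTT',
);

# Returns the 1-based start positions of every occurrence of $site in $seq.
# The zero-width lookahead (?=...) lets overlapping occurrences be found,
# and /i makes the search case-insensitive.
sub site_positions {
    my ($seq, $site) = @_;
    my @pos;
    while ($seq =~ /(?=\Q$site\E)/gi) {
        push @pos, $-[0] + 1;    # $-[0] is the 0-based offset of this match
    }
    return @pos;
}

for my $id (sort keys %seqs) {
    my @pos = site_positions($seqs{$id}, $site);
    print "$id: ", (@pos ? join(',', @pos) : 'no site'), "\n";
}
```

For the two example strings this prints `seq1: 4,13` and `seq2: no site`.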
>
> Thanks,
>
> Claudia
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu  Thu Feb  9 20:52:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 19:52:34 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <43EBC03E.4040900@gmx.at>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine> <43EBC03E.4040900@gmx.at>
Message-ID:

From 'perldoc Bio::SearchIO::blast':

DESCRIPTION
    This object encapsulated the necessary methods for generating events
    suitable for building Bio::Search objects from a BLAST report file.
    Read the Bio::SearchIO for more information about how to use this.

    This driver can parse:

    o   NCBI produced plain text BLAST reports from blastall, this also
        includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports. NCBI
        XML BLAST output is parsed with the blastxml SearchIO driver

    o   WU-BLAST all reports

    o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)

    o   BLAST-like output from Paracel BTK output

So, it should. Let us know if it doesn't.

On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:

> Hi Chris,
> I'm incredibly sorry for causing so much inconvenience, yes you are
> right, I had only to change the blast.pm file, it is working very
> fine, thank you very much, and you are right, you have mentioned it
> ealier either to change the file... ;)
>
> but I have another question: does it work with the WU-Blast output
> too?
> regards
> Hubert
>
>
> Chris Fields wrote:
>
>> Ha! I come back from meeting and there's a billion emails! What have we
>> started? ;p . Sorry about this Jason; I know you're busy.
>>
>> Hubert, if you're out there, I sent you an email with an attachment. You
>> said the output looks like what you were expecting.
So I think we >> have two >> problems: >> >> 1) I haven't delved into the file scanning, but the fact that it >> takes so >> long should tell you something's seriously wrong there. Strip >> that part out >> and start with a simple script, say, like the one Jason or that I >> sent you; >> the script I used to generate that output works fine (on two OS's, >> WinXP and >> Mac OS X). Use it on one file at a time. Do everything on >> command line >> (not through Eclipse). IDE's can be notoriously flaky about running >> scripts, esp. when they run debugging. >> 2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast >> will still >> not work whenever the text blast output has the following header, >> which >> comes from the new web version of BLAST: >> >> ----------------------------------------------------- >> BLASTP 2.2.13 [Nov-27-2005] >> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >> Sch??ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >> protein database search programs", Nucleic Acids Res. 25:3389-3402. >> >> RID: 1139501210-857-165793005128.BLASTQ1 >> >> >> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >> 3,292,813 sequences; 1,128,164,434 total letters >> Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >> tuberculosis H37Rv]. >> Length=193 >> ....... >> ----------------------------------------------------- >> >> It will work if the text output has the following header (or is an >> older >> version of BLAST): >> >> ----------------------------------------------------- >> BLASTP 2.2.12 [Aug-07-2005] >> >> >> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >> protein database search >> programs", Nucleic Acids Res. 25:3389-3402. 
>> >> Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >> tuberculosis H37Rv]. >> (193 letters) >> >> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >> 2,895,325 sequences; 997,103,285 total letters >> ----------------------------------------------------- >> You have the former (2.2.13) version. I know b/c I have your >> BLAST files. >> Therefore, even bioperl-1.5.1 will not work! >> >> If you want the really gory details on why this is a problem, look >> here: >> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >> >> So, any text output with the above header will not work; it will >> either hang >> or end abruptly (depending on OS, perl version, memory, >> patience). If you >> look in the above, I have added a preliminary fix for this. I'll >> reiterate >> for the billionth time, it hasn't been committed yet, so don't >> kill me if >> blows your computer up ;> >> Here's the direct link: >> http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >> This is a modified version of Bio::SearchIO::blast.pm (it says >> it's version >> 1.90, but it's lying, I didn't change the version, only the regex; >> sorry >> Jason). From what you've been posting it doesn't sound like >> you've tried >> this, and I believe I've suggested this fix before. >> >> Replace the one in your Bio/SearchIO directory (which looks like >> '/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your >> prev. >> message) with this file. Make sure the filename stays the same >> (blast.pm). >> >> Run everything again, one file at a time. Make sure you use >> Jason's script >> as well as the one I sent you. Do NOT rely on running through >> multiple >> files yet. Fix one bug at a time. And heed Joel's words about >> file checks. 
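Chris's two example headers write the query length differently. The newer 2.2.13 web output uses `Length=193`, and the 2.2.12-style output uses `(193 letters)`. That difference suggests a quick way to sort a pile of reports by header style before parsing them. This is a heuristic sketch based only on the two headers quoted in this thread. Bio::SearchIO does not provide it, and `header_style` is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads a BLAST text report from $fh and returns
# (style, query_length). 'new' means a "Length=193" line came first,
# 'old' means a "(193 letters)" line came first, 'unknown' means neither.
# Only the first match is used, so per-hit lengths further down are ignored.
sub header_style {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        if ($line =~ /^\s*Length=([\d,]+)/) {
            (my $len = $1) =~ tr/,//d;    # strip thousands separators
            return ('new', $len);
        }
        if ($line =~ /^\s*\(([\d,]+) letters\)/) {
            (my $len = $1) =~ tr/,//d;
            return ('old', $len);
        }
    }
    return ('unknown', undef);
}

# Usage: pass report files on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my ($style, $len) = header_style($fh);
    close $fh;
    printf "%s: %s header style, query length %s\n", $file, $style, $len // 'n/a';
}
```

Running it over a directory of reports shows which files carry the newer header before you hand them to SearchIO one at a time.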
>> >> >> Here's a small chunk of output from one of your blast files using the >> modified script I sent you: >> >> sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >> Query: 1 RWKWKRKK 8 >> Seq: 542 RWAWRRKK 549 >> >> Look familiar? >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >>> -----Original Message----- >>> From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, >>> February 09, 2006 3:24 PM >>> To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>> Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast output >>> >>> In other words, yes, I'm on the wrong trail. :} >>> >>> Sorry - I'll look at the output issue this evening (or realize >>> that Chris already solved the issue). ;} >>> >>> Thanks! >>> >>> Roger >>> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>> Prielinger >>> Sent: Thursday, February 09, 2006 2:14 PM >>> To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason >>> Stajich >>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast output >>> >>> dear roger, >>> this error message I got, when I tried to parse Blast output >>> (version >>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have >>> a lot of Blast output files with version 2.2.13 and for that I >>> don't get any error message.....it just doesn't work >>> >>> Hubert >>> >>> >>> >>> Roger Hall wrote: >>> >>> >>>> Guys - I'm looking at the error message: >>>> >>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>> STACK Bio::SearchIO::blast::next_result >>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>> STACK toplevel >>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>> Blast.pl:21 >>>> >>>> This is my line of thought: >>>> 1.
"no data for midline $_" is a unique message generated by >>> blast.pm >>>> in >>>> >>> one >>> >>>> location only at the point of a. reading three lines b. >>> dropping lines >>>> with spaces only c. identifying the Query, Midline, and >>> Match lines (0 >>>> <= $i < >>>> >>> 3) >>> >>>> 2. There is a regexp match that fails in order to reach that >>> error message >>> >>>> 3. The $_ value "Query 1 WWWKWRW 7" should not fail the >>> expression >>> >>>> 4. It does anyway >>>> 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere >>> in the blast >>> >>>> reports >>>> >>>> I suspect a newline/chomp/metacharacter issue. Not finding >>> the string >>>> anywhere has me thoroughly confused - I asked Hubert for the >>> additional >>>> file, assuming that I didn't have it. >>>> >>>> My next thought is to write a quick script to test perl behavior >>>> on "Fedora Core 9". >>>> >>>> Thoughts? >>>> >>>> Did I misread the issue entirely? :} >>>> >>>> Roger >>>> >>>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org >>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Chris Fields >>> >>>> Sent: Thursday, February 09, 2006 10:16 AM >>>> To: 'Jason Stajich'; 'Hubert Prielinger' >>>> Cc: bioperl-l at bioperl.org >>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>> parsing Blast output >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>> Sent: Thursday, February 09, 2006 9:13 AM >>>>> To: Hubert Prielinger >>>>> Cc: Chris Fields; bioperl-l at bioperl.org >>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>> parsing Blast output >>>>> >>>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>> >>>>> >>>>>> hi chris, >>>>>> thanks, I have upgraded to version 1.5.1 but it isn't still >>>>>> >>>>>> >>>>> working, >>>>> >>>>> >>>>>> do you have any ohter idea, the problem I have is that I >>>>>> >>>>>> >>>>> have to parse >>>>> 
>>>>> >>>>>> a lot of textfiles.... >>>>>> or shall I look for another option to parse those files... >>>>>> >>>>>> regards >>>>>> Hubert >>>>>> >>>>>> >>>>> The code from Bioperl 1.5.1 works fine for me for blast >>>>> 2.2.13 reports but unless you post your blast report we >>> can't really >>>>> determine the problem. >>>>> >>>>> If you are still getting the same error like this I am not >>> convinced >>>>> you have upgraded to 1.5.1 which includes a fix in the fact >>> that NCBI >>>>> changed the HSP result format to remove the ':' from the >>> Query/Sbjct >>>>> prefixes. We fixed this as soon as it was apparent sometime in >>>>> September. >>>>> >>>>> >>>>> >>>>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>> STACK Bio::SearchIO::blast::next_result >>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>> STACK toplevel >>>>>>>> >>>>>>>> >>>>>>>> >>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>> Blast.pl:21 >>>>> >>>>> If you are just getting no results but also no warnings wrt >>> parsing, >>>>> are you sure your logic is correct? >>>>> >>>>> If you remove your filters do you see all the HSPS? >>>>> >>>>> >>>>> while (my $result = $search->next_result) { >>>>> print $result->query_name, "\n"; >>>>> #iterate over each hit on the query sequence >>>>> while (my $hit = $result->next_hit) { >>>>> print $hit->name, "\n"; >>>>> #iterate over each HSP in the hit >>>>> while (my $hsp = $hit->next_hsp) { >>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n"; >>>>> } >>>>> } >>>>> } >>>>> >>>>> >>>> I tested some of the BLAST results that Hubert sent Roger >>> and me with a >>>> similar script to the above. I removed the file parsing logic >>>> and it >>>> >>> seemed >>> >>>> to work just fine. It may very well be a logic issue or >>> that he hasn't >>>> installed the latest fix. >>>> It's a funny thing, though. When I tried using blastcl3 (v.
>>> 2.2.13), >>>> even though the returned output was from nr, the top of the >>>> blast output showed that it was v2.2.12: >>>> >>>> BLASTP 2.2.12 [Aug-07-2005] >>>> >>>> I double-checked my local version and it's definitely v.2.2.13: >>>> ------------------------------------- >>>> C:\Perl\Scripts>blastcl3 - >>>> >>>> blastcl3 2.2.13 arguments:... >>>> ------------------------------------- >>>> >>>> If you use RemoteBlast using the same settings, the version in >>>> the header looks like this: >>>> >>>> BLASTP 2.2.13 [Nov-27-2005] >>>> >>>> I'm wondering if all the blast executables (blast and netblast) >>>> from NCBI have text output like v.2.2.12, while the wwwblast >>> outputs a new >>>> format (2.2.13). I'll ask blast-help at NCBI about this. >>>> >>>> >>>> >>>>> To clarify some stuff - >>>>> Chris I don't necessarily think the XML is best way forward >>> for BLAST >>>>> reports generated locally, it isn't as detailed as the Text >>> format and >>>>> it is what most people expect to be able to scroll through >>> and parse >>>>> -- it is also harder for the format to change dramatically >>> if you have >>>>> a static binary on your machine =). I think for >>> remoteblast the XML >>>>> format should be the way forward but I expect Bioperl to >>>>> maintain support of any plain text BLAST report format that >>>>> people use on a regular basis. >>>>> >>>>> >>>>> >>>> Does XML lack some specific info that text output has? >>> Didn't know that. >>> I >>> >>>> believe that XML should be default in RemoteBlast since it will >>>> not break, but I agree with you about text output. I also agree >>>> that it will need somebody to maintain it constantly, much like >>>> RemoteBlast. >>>> >>>> >>>> >>>>> -jason >>>>> >>>>> >>>>>> Chris Fields wrote: >>>>>> >>>>>> >>>>>> >>>>>>> My guess is you're running into text parsing problems in >>>>>>> Bio::SearchIO::blast. Upgrade to the latest developer version >>>>>>> (1.5.1) or >>>>>>> bioperl-live (CVS), then see the bug below. 
>>>>>>> >>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>> >>>>>>> I think the first problem you ran into is solved in >>> bioperl 1.5.1, >>>>>>> the last problem (more recent, not related to the first) has >>>>>>> been fixed but hasn't been committed to bioperl-live yet. >>>>>>> The fixed SearchIO::blast is available in the link above, but >>>>>>> >>>>>>> >>>>> realize it hasn't >>>>> >>>>> >>>>>>> been committed yet and may change. >>>>>>> >>>>>>> Christopher Fields >>>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry >>>>>>> University of Illinois Urbana-Champaign >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >>> Of Hubert >>>>>>>> Prielinger >>>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>> To: bioperl-l at bioperl.org >>>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>> >>>>>>>> >>>>> parsing Blast >>>>> >>>>> >>>>>>>> output >>>>>>>> >>>>>>>> Hi, >>>>>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>>>>> Bio::SearchIO, I get the following error message: >>>>>>>> >>>>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>> STACK Bio::SearchIO::blast::next_result >>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>> STACK toplevel >>>>>>>> >>>>>>>> >>>>>>>> >>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>> Blast.pl:21 >>>>> >>>>> >>>>>>>> is that a bug...... >>>>>>>> >>>>>>>> If I want to parse Blast Output (version 2.2.13), I don't >>>>>>>> get anything..... 
>>>>>>>> I'm using bioperl 1.4 >>>>>>>> >>>>>>>> before, I have installed bioperl 1.4, it worked fine >>>>>>>> >>>>>>>> >>>>> parsing Blast >>>>> >>>>> >>>>>>>> Output (version 2.2.12), but I don't remember which >>>>>>>> >>>>>>>> >>>>> bioperl version >>>>> >>>>> >>>>>>>> I had installed >>>>>>>> >>>>>>>> thanks in advance >>>>>>>> >>>>>>>> Hubert >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>> -- >>>>> Jason Stajich >>>>> Duke University >>>>> http://www.duke.edu/~jes12 >>>>> >>>>> >>>>> >>>> Christopher Fields >>>> Postdoctoral Researcher - Switzer Lab >>>> Dept. of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From heikki at sanbi.ac.za Thu Feb 9 23:47:42 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 10 Feb 2006 06:47:42 +0200 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <000901c62dbf$49bfae20$15327e82@pyrimidine> References: <000901c62dbf$49bfae20$15327e82@pyrimidine> Message-ID: <200602100647.43173.heikki@sanbi.ac.za> On Thursday 09 February 2006 23:25, Chris Fields wrote: > Thanks! I think, as long as the tests pass everything is fine with me. I > may be submitting another module or two in the next few weeks; just depends > on how much time I can spend on them. Looking forward to them! -Heikki > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za] > > Sent: Thursday, February 09, 2006 1:42 PM > > To: bioperl-l at lists.open-bio.org > > Cc: Chris Fields > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > > > Chris, > > > > I committed your file. All tests pass; code looks like it was > > written by a long term bioperl contributor! Impressive. > > > > I truncated the larger test file from 270K to 20K (200 > > lines), to not bloat the distribution unnecessarily. Tests > > pass which is the main thing. Shout if you disagree. > > > > Great job! > > > > -Heikki > > > > On Thursday 09 February 2006 19:53, Chris Fields wrote: > > > Heikki, > > > > > > I've added the Bio::Tools::RNAMotif module with test suite > > > > (24 tests) > > > > > and two test data files to bugzilla. The first data file is needed > > > for normal tests, the second is for testing parsing with > > > > modified data > > > > > in the score tag (using sprintf() in the RNAMotif > > > > descriptor). I ran > > > > > 'perl t\RNAMotif.t' and they all passed. > > > > > > Thanks!
> > > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > > -----Original Message----- > > > > From: bioperl-l-bounces at lists.open-bio.org > > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki > > > > Lehvaslaiho > > > > Sent: Wednesday, February 08, 2006 12:54 AM > > > > To: bioperl-l at lists.open-bio.org > > > > Cc: Chris Fields > > > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > > > > > > > Chris, > > > > > > > > Post your files to bugzilla (ticket type enhancement, add > > > > files to > > > > > > ticket after creation) and someone with commit ability will add > > > > them to CVS once the code is in satisfactory condition. > > > > > > > > Thanks, > > > > > > > > -Heikki > > > > > > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote: > > > > > I want to submit a module for parsing RNAMotif output > > > > > (Bio::Tools::RNAMotif). It is capable, at the moment, > > > > of scanning > > > > > > > output and returning Bio::SeqFeature::Generic objects with > > > > > > > > added tags > > > > > > > > > for descriptors/sequences/file info. I'm in the process of > > > > > > > > writing up > > > > > > > > > tests and going through biodesign to make sure everything's > > > > > kosher, but the module itself is essentially ready-to-go. What > > > > > should I do next? > > > > > > > > > > Christopher Fields > > > > > Postdoctoral Researcher > > > > > Lab of Dr. 
Robert Switzer > > > > > Dept of Biochemistry > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l at lists.open-bio.org > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > > > ______ _/ > > > > _/_____________________________________________________ > > > > > > _/ _/ > > > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > > > _/ _/ _/ SANBI, South African National > > > > Bioinformatics Institute > > > > > > _/ _/ _/ University of Western Cape, South Africa > > > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > > > ___ > > > > _/_/_/_/_/________________________________________________________ > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ > > _/_/_/_/_/________________________________________________________ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate 
Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Thu Feb 9 23:51:11 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 10 Feb 2006 06:51:11 +0200 Subject: [Bioperl-l] module for finding restriction site in batch of sequences? In-Reply-To: References: Message-ID: <200602100651.12028.heikki@sanbi.ac.za> It should:

  use Bio::Restriction::Analysis;

  # loop over each seq (@seqs holding your Bio::Seq objects)
  foreach my $seq (@seqs) {
      my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
      my @cuts = $ra->fragments('EcoRI'); # or call some other method
  }

or is it something else you are trying to do? Yours, -Heikki On Thursday 09 February 2006 22:53, Lalancette, Claudia wrote: > Greetings, > > > > I need to find a way to look for a specific restriction enzyme site in > hundreds of sequences. Been looking at Bio::Restriction, but not sure > if it will work... Any suggestions?
> > > > Thanks, > > Claudia > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Fri Feb 10 02:06:11 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 10 Feb 2006 09:06:11 +0200 Subject: [Bioperl-l] planning sequence mutating modules Message-ID: <200602100906.11885.heikki@sanbi.ac.za> Ryan Golhar's mail got me thinking that we should have a simple framework for mutating sequences to a desired level. The model can then be extended to necessary complexity when needed by subclassing. 
To start with, I have been planning:

  Bio::SeqEvolution::EvolutionI - interface file
  Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
  Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class (defaults to Bio::PrimarySeq)
  Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
  Bio::SeqEvolution::EvolutionI::each_seqs($count) - returns an array of $count seqs
  Bio::SeqEvolution::EvolutionI::_generate_seq()
  Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring converted to probabilities of change internally

  various methods to define the extent of divergence; only one to start with:
  Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation (= 100% - identity)

  Bio::SeqEvolution::Factory - core class to call, instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
  Bio::SeqEvolution::EvolutionI::type() - evolution model, defaults to Bio::SeqEvolution::DNASimple for nucleotides

  Bio::SeqEvolution::DNASimple - default for nucleotides
  Bio::SeqEvolution::DNASimple::transversion_rate - positive integer, e.g. 5 => 5:1, defaults to 1:1; simple alternative to a scoring matrix

I am soliciting the usual comments and suggestions about naming and minimal functionality.
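To make the pam() and transversion_rate ideas concrete, here is a minimal Perl sketch for discussion only. Nothing in it is an existing BioPerl API: mutate_dna() is a hypothetical helper, and reading the integer ratio as a transitions:transversions weight (5 => 5:1) is an assumption about the proposal. It changes exactly round(length * pam / 100) distinct positions, so for an ACGT-only input the identity to the original is 100% minus pam.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper, not part of BioPerl. Assumption: $ts_weight is the
# transition:transversion weight (5 => 5:1); it defaults to 1:1.
my %TRANSITION    = (A => 'G', G => 'A', C => 'T', T => 'C');
my %TRANSVERSIONS = (A => [qw(C T)], G => [qw(C T)],
                     C => [qw(A G)], T => [qw(A G)]);

sub mutate_dna {
    my ($seq, $pam, $ts_weight) = @_;
    $ts_weight = 1 unless defined $ts_weight;
    my @bases = split //, uc $seq;

    # pam = percentage accepted mutation = 100% - identity
    my $n_mut = int(@bases * $pam / 100 + 0.5);

    # pick $n_mut distinct positions so no site is mutated twice
    my @positions = (shuffle 0 .. $#bases)[0 .. $n_mut - 1];

    for my $i (@positions) {
        my $base = $bases[$i];
        next unless exists $TRANSITION{$base};    # leave N and other codes alone
        if (rand($ts_weight + 1) < $ts_weight) {
            $bases[$i] = $TRANSITION{$base};      # purine<->purine or pyrimidine<->pyrimidine
        } else {
            my $choices = $TRANSVERSIONS{$base};  # purine<->pyrimidine
            $bases[$i] = $choices->[int rand @$choices];
        }
    }
    return join '', @bases;
}
```

For example, mutate_dna('ACGTACGTAC', 20, 5) returns a ten-base sequence that differs from the input at exactly two positions, each change being a transition with probability 5/6 under this reading of the ratio.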
-Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From Pieter.Monsieurs at esat.kuleuven.be Fri Feb 10 03:53:43 2006 From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs) Date: Fri, 10 Feb 2006 09:53:43 +0100 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output In-Reply-To: References: <000e01c62dca$bc66df60$15327e82@pyrimidine> <43EBC03E.4040900@gmx.at> Message-ID: <43EC5497.3050505@esat.kuleuven.be> Hi Chris, The parsing of the Blast output still doesn't work for me with the downloaded bug-fix version of blast.pm. The module keeps going around the while loop at line 487, looking for a database line or the query size:

    while( defined ($_) ) {
        if( /^Database:/ ) {
            $self->_pushback($_);
            last;
        }
        chomp;
        if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
            $size = $1;
            $size =~ s/,//g;
            last;
        } else {
            $q .= " $_";
            $q =~ s/ +/ /g;
            $q =~ s/^ | $//g;
        }
        $_ = $self->_readline;
    }

The code keeps looking for the database information; however, as you mentioned, this information is given before the query line in the new Blast output format. As a result, all hits and HSPs end up in the query description ($hit->query_description), no hits are found, and query_length is 0.
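The problem Pieter describes comes down to which header line ends the scan for the query length. As a standalone illustration, not the blast.pm code itself, here is a minimal Perl sketch that accepts either length format and returns undef if it reaches end of input without finding one. query_length() is a hypothetical name, and it assumes the filehandle is positioned at or just after the "Query=" line, because in 2.2.13 reports every later hit section also carries its own "Length=" line.

```perl
use strict;
use warnings;

# Hypothetical helper. Reads from $fh (assumed to be positioned at or just
# after the "Query=" line) and returns the query length taken from either
#   old header (2.2.12):  "         (193 letters)"
#   new header (2.2.13):  "Length=193"
# Returns undef if neither line appears before end of input.
sub query_length {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        if ($line =~ /\((-?[\d,]+)\s+letters/ || $line =~ /^Length=(-?[\d,]+)/) {
            (my $size = $1) =~ s/,//g;   # "1,234" -> "1234"
            return $size;
        }
    }
    return;
}
```

Stopping at the first match mirrors the "last" in the original loop. The database line ("3,292,813 sequences; 1,128,164,434 total letters") is not matched because it has no opening parenthesis.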
Because you already adapted the module to retrieve the database information at another position in the module, deleting the while loop and adding the following lines after $_ = $self->_readline (line 486) worked fine for me (using blastn and blastp):

    if (/Length=([\d,]+)/) {
        $size = $1;
        $size =~ s/,//g;
    }

Regards, Pieter Chris Fields wrote: > From 'perldoc Bio::SearchIO::blast': > >DESCRIPTION > This object encapsulated the necessary methods for generating >events > suitable for building Bio::Search objects from a BLAST report >file. > Read the Bio::SearchIO for more information about how to use >this. > > This driver can parse: > > o NCBI produced plain text BLAST reports from blastall, >this also > includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq >reports. NCBI > XML BLAST output is parsed with the blastxml SearchIO driver > > o WU-BLAST all reports > > o Jim Kent's BLAST-like output from his programs (BLASTZ, >BLAT) > > o BLAST-like output from Paracel BTK output > >So, it should. Let us know if it doesn't. > >On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote: > > > >>Hi Chris, >>I'm incredibly sorry for causing so much inconvenience, yes you are >>right, I had only to change the blast.pm file, it is working very >>fine, thank you very much, and you are right, you have mentioned it >>earlier either to change the file... ;) >> >>but I have another question: does it work with the WU-Blast output >>too? >>regards >>Hubert >> >> >>Chris Fields wrote: >> >> >> >>>Ha! I come back from meeting and there's a billion emails! What >>>have we >>>started? ;p . Sorry about this Jason; I know you're busy. >>> >>>Hubert, if you're out there, I sent you an email with an >>>attachment. You >>>said the output looks like what you were expecting. So I think we >>>have two >>>problems: >>> >>>1) I haven't delved into the file scanning, but the fact that it >>>takes so >>>long should tell you something's seriously wrong there.
Strip >>>that part out >>>and start with a simple script, say, like the one Jason or that I >>>sent you; >>>the script I used to generate that output works fine (on two OS's, >>>WinXP and >>>Mac OS X). Use it on one file at a time. Do everything on >>>command line >>>(not through Eclipse). IDE's can be notoriously flaky about running >>>scripts, esp. when they run debugging. >>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast >>>will still >>>not work whenever the text blast output has the following header, >>>which >>>comes from the new web version of BLAST: >>> >>>----------------------------------------------------- >>>BLASTP 2.2.13 [Nov-27-2005] >>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>protein database search programs", Nucleic Acids Res. 25:3389-3402. >>> >>>RID: 1139501210-857-165793005128.BLASTQ1 >>> >>> >>>Database: All non-redundant GenBank CDS >>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>> 3,292,813 sequences; 1,128,164,434 total letters >>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>tuberculosis H37Rv]. >>>Length=193 >>>....... >>>----------------------------------------------------- >>> >>>It will work if the text output has the following header (or is an >>>older >>>version of BLAST): >>> >>>----------------------------------------------------- >>>BLASTP 2.2.12 [Aug-07-2005] >>> >>> >>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>protein database search >>>programs", Nucleic Acids Res. 25:3389-3402. >>> >>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>tuberculosis H37Rv].
>>> (193 letters) >>> >>>Database: All non-redundant GenBank CDS >>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>> 2,895,325 sequences; 997,103,285 total letters >>>----------------------------------------------------- >>>You have the former (2.2.13) version. I know b/c I have your >>>BLAST files. >>>Therefore, even bioperl-1.5.1 will not work! >>> >>>If you want the really gory details on why this is a problem, look >>>here: >>> >>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>> >>>So, any text output with the above header will not work; it will >>>either hang >>>or end abruptly (depending on OS, perl version, memory, >>>patience). If you >>>look in the above, I have added a preliminary fix for this. I'll >>>reiterate >>>for the billionth time, it hasn't been committed yet, so don't >>>kill me if it >>>blows your computer up ;> >>>Here's the direct link: >>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >>>This is a modified version of Bio::SearchIO::blast.pm (it says >>>it's version >>>1.90, but it's lying, I didn't change the version, only the regex; >>>sorry >>>Jason). From what you've been posting it doesn't sound like >>>you've tried >>>this, and I believe I've suggested this fix before. >>> >>>Replace the one in your Bio/SearchIO directory (which looks like >>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your >>>prev. >>>message) with this file. Make sure the filename stays the same >>>(blast.pm). >>> >>>Run everything again, one file at a time. Make sure you use >>>Jason's script >>>as well as the one I sent you. Do NOT rely on running through >>>multiple >>>files yet. Fix one bug at a time. And heed Joel's words about >>>file checks. >>> >>> >>>Here's a small chunk of output from one of your blast files using the >>>modified script I sent you: >>> >>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >>>Query: 1 RWKWKRKK 8 >>>Seq: 542 RWAWRRKK 549 >>> >>>Look familiar?
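As Jason notes in this thread, NCBI dropped the ':' from the Query/Sbjct prefixes of HSP lines, so a report may contain either "Query: 1 RWKWKRKK 8" or "Query 1 WWWKWRW 7" (the line from the "no data for midline" error). Here is a minimal Perl sketch of a pattern that tolerates both, assuming the usual "prefix, start, residues, end" layout. parse_hsp_line() is a hypothetical helper, not the regex that blast.pm actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: split one HSP alignment line into its fields.
# The ':' after Query/Sbjct is optional, covering older and newer reports.
# Residues may include gaps ('-') or lowercase masked letters, hence \S+.
sub parse_hsp_line {
    my ($line) = @_;
    return unless $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
    return { type => $1, start => $2, residues => $3, end => $4 };
}
```

The "Seq:" label in the sample output above appears to come from the script's own print statements rather than from BLAST, so the sketch deliberately does not match it.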
>>> >>>Christopher Fields >>>Postdoctoral Researcher - Switzer Lab >>>Dept. of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>>>-----Original Message----- >>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, >>>>February 09, 2006 3:24 PM >>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>parsing Blast output >>>> >>>>In other words, yes, I'm on the wrong trail. :} >>>> >>>>Sorry - I'll look at the output issue this evening (or realize >>>>that Chris already solved the issue). ;} >>>> >>>>Thanks! >>>> >>>>Roger >>>> >>>>-----Original Message----- >>>>From: bioperl-l-bounces at lists.open-bio.org >>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>Prielinger >>>>Sent: Thursday, February 09, 2006 2:14 PM >>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason >>>>Stajich >>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>parsing Blast output >>>> >>>>dear roger, >>>>this error message I got, when I tried to parse Blast output >>>>(version >>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have >>>>a lot of Blast output files with version 2.2.13 and for that I >>>>don't get any error message.....it just doesn't work >>>> >>>>Hubert >>>> >>>> >>>> >>>>Roger Hall wrote: >>>> >>>> >>>> >>>> >>>>>Guys - I'm looking at the error message: >>>>> >>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>STACK Bio::SearchIO::blast::next_result >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>STACK toplevel >>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>Blast.pl:21 >>>>> >>>>>This is my line of thought: >>>>>1. "no data for midline $_" is a unique message generated by >>>>> >>>>> >>>>blast.pm >>>> >>>> >>>>>in >>>>> >>>>> >>>>> >>>>one >>>> >>>> >>>> >>>>>location only at the point of a. reading three lines b. 
>>>>> >>>>> >>>>dropping lines >>>> >>>> >>>>>with spaces only c. identifying the Query, Midline, and >>>>> >>>>> >>>>Match lines (0 >>>> >>>> >>>>><= $i < >>>>> >>>>> >>>>> >>>>3) >>>> >>>> >>>> >>>>>2. There is a regexp match that fails in order to reach that >>>>> >>>>> >>>>error message >>>> >>>> >>>> >>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the >>>>> >>>>> >>>>expression >>>> >>>> >>>> >>>>>4. It does anyway >>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere >>>>> >>>>> >>>>in the blast >>>> >>>> >>>> >>>>>reports >>>>> >>>>>I suspect a newline/chomp/metacharacter issue. Not finding >>>>> >>>>> >>>>the string >>>> >>>> >>>>>anywhere has me thoroughly confused - I asked Hubert for the >>>>> >>>>> >>>>additional >>>> >>>> >>>>>file, assuming that I didn't have it. >>>>> >>>>>My next thought is to write a quick script to test perl behavior >>>>>on "Fedora Core 9". >>>>> >>>>>Thoughts? >>>>> >>>>>Did I misread the issue entirely? :} >>>>> >>>>>Roger >>>>> >>>>> >>>>>-----Original Message----- >>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>> >>>>> >>>>Chris Fields >>>> >>>> >>>> >>>>>Sent: Thursday, February 09, 2006 10:16 AM >>>>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>>>Cc: bioperl-l at bioperl.org >>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>parsing Blast output >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>-----Original Message----- >>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>>>To: Hubert Prielinger >>>>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>parsing Blast output >>>>>> >>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>hi chris, >>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>working, >>>>>> 
>>>>>> >>>>>> >>>>>> >>>>>>>do you have any other idea, the problem I have is that I >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>have to parse >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>a lot of textfiles.... >>>>>>>or shall I look for another option to parse those files... >>>>>>> >>>>>>>regards >>>>>>>Hubert >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>The code from Bioperl 1.5.1 works fine for me for blast >>>>>>2.2.13 reports but unless you post your blast report we >>>>>> >>>>>> >>>>can't really >>>> >>>> >>>>>>determine the problem. >>>>>> >>>>>>If you are still getting the same error like this I am not >>>>>> >>>>>> >>>>convinced >>>> >>>> >>>>>>you have upgraded to 1.5.1 which includes a fix in the fact >>>>>> >>>>>> >>>>that NCBI >>>> >>>> >>>>>>changed the HSP result format to remove the ':' from the >>>>>> >>>>>> >>>>Query/Sbjct >>>> >>>> >>>>>>prefixes. We fixed this as soon as it was apparent sometime in >>>>>>September. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>STACK toplevel >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>Blast.pl:21 >>>>>> >>>>>>If you are just getting no results but also no warnings wrt >>>>>> >>>>>> >>>>parsing, >>>> >>>> >>>>>>are you sure your logic is correct? >>>>>> >>>>>>If you remove your filters do you see all the HSPS?
>>>>>>
>>>>>>while (my $result = $search->next_result) {
>>>>>>  print $result->query_name, "\n";
>>>>>>  #iterate over each hit on the query sequence
>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>    print $hit->name, "\n";
>>>>>>    #iterate over each HSP in the hit
>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
>>>>>>    }
>>>>>>  }
>>>>>>}
>>>>>
>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>similar script to the above. I removed the file parsing logic and it seemed
>>>>>to work just fine. It may very well be a logic issue or that he hasn't
>>>>>installed the latest fix.
>>>>>It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13),
>>>>>even though the returned output was from nr, the top of the
>>>>>blast output showed that it was v2.2.12:
>>>>>
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>-------------------------------------
>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>
>>>>>blastcl3 2.2.13 arguments:...
>>>>>-------------------------------------
>>>>>
>>>>>If you use RemoteBlast using the same settings, the version in
>>>>>the header looks like this:
>>>>>
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>
>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>>>format (2.2.13). I'll ask blast-help at NCBI about this.
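Chris's observation that the version printed at the top of a report need not match the client that produced it suggests reading the version from the report itself. A minimal pure-Perl sketch follows; `parse_blast_header` and `version_cmp` are hypothetical helpers, not BioPerl methods, the pattern assumes the `PROGRAM x.y.z [date]` header line shown in the reports quoted here, and a version number is at best a rough hint about which text layout a file uses:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a plain-text BLAST header
# line such as "BLASTP 2.2.13 [Nov-27-2005]" into program, version and date.
# Returns an empty list if the line does not look like a header.
sub parse_blast_header {
    my ($line) = @_;
    return unless defined $line
        && $line =~ /^(T?BLAST[NPX]?)\s+(\d+(?:\.\d+)*)\s+\[([^\]]+)\]/;
    return ($1, $2, $3);
}

# Compare dotted version strings field by field, numerically, so that
# 2.2.9 sorts before 2.2.13 (a plain string comparison would not).
sub version_cmp {
    my @x = split /\./, $_[0];
    my @y = split /\./, $_[1];
    while (@x || @y) {
        my $p = @x ? shift @x : 0;
        my $q = @y ? shift @y : 0;
        return $p <=> $q if $p != $q;
    }
    return 0;
}

for my $line ('BLASTP 2.2.12 [Aug-07-2005]', 'BLASTP 2.2.13 [Nov-27-2005]') {
    my ($program, $version, $date) = parse_blast_header($line);
    my $newer = version_cmp($version, '2.2.13') >= 0 ? 'yes' : 'no';
    print "$program $version ($date), 2.2.13 or later: $newer\n";
}
```

Checking the header first lets a batch script route each file to whichever parser copy is known to handle it, instead of finding out from a hang or an empty result.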
>>>>>
>>>>>>To clarify some stuff -
>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>a static binary on your machine =). I think for remoteblast the XML
>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>maintain support of any plain text BLAST report format that
>>>>>>people use on a regular basis.
>>>>>
>>>>>Does XML lack some specific info that text output has? Didn't know that. I
>>>>>believe that XML should be default in RemoteBlast since it will
>>>>>not break, but I agree with you about text output. I also agree
>>>>>that it will need somebody to maintain it constantly, much like
>>>>>RemoteBlast.
>>>>>
>>>>>>-jason
>>>>>>
>>>>>>>Chris Fields wrote:
>>>>>>>
>>>>>>>>My guess is you're running into text parsing problems in
>>>>>>>>Bio::SearchIO::blast. Upgrade to the latest developer version
>>>>>>>>(1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>>>>
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>
>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>The fixed SearchIO::blast is available in the link above, but realize it hasn't
>>>>>>>>been committed yet and may change.
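After replacing blast.pm by hand it is easy to end up running a different copy than the one edited, for example an older install earlier in @INC, which would also fit Jason's doubt that 1.5.1 is really the version in use. Perl records the file it loaded for each module in %INC, so a minimal core-Perl check looks like this (the helper name `loaded_from` is made up for this sketch):

```perl
use strict;
use warnings;

# Report which file Perl actually loads for a module by requiring it and
# reading %INC. Returns undef if the module cannot be loaded.
sub loaded_from {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;   # Bio::SearchIO::blast -> Bio/SearchIO/blast
    $file .= '.pm';
    return unless eval { require $file; 1 };
    return $INC{$file};
}

# File::Spec ships with Perl, so this call runs without BioPerl installed;
# for the case in this thread the argument would be 'Bio::SearchIO::blast'.
my $path = loaded_from('File::Spec');
print defined $path ? "File::Spec loaded from $path\n" : "File::Spec not found\n";
```

From the shell, `perl -MBio::SearchIO::blast -e 'print $INC{"Bio/SearchIO/blast.pm"}, "\n"'` gives the same answer for the module in question, provided BioPerl is installed.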
>>>>>>>>
>>>>>>>>Christopher Fields
>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>>-----Original Message-----
>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>Prielinger
>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>output
>>>>>>>>>
>>>>>>>>>Hi,
>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>
>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7
>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>STACK toplevel
>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>
>>>>>>>>>is that a bug......
>>>>>>>>>
>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>>>get anything.....
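Jason attributes this error to NCBI dropping the ':' after `Query` and `Sbjct` in the alignment lines, and Roger's reading of blast.pm elsewhere in the thread is that the message appears when a regular expression fails on those lines. A pattern that treats the colon as optional accepts both layouts. The following is a standalone sketch with a hypothetical helper name, not the regex actually used in Bio::SearchIO::blast, and it deliberately ignores the midline:

```perl
use strict;
use warnings;

# Hypothetical helper, not the regex used inside Bio::SearchIO::blast:
# parse a Query/Sbjct alignment line, treating the colon after the label
# as optional so both "Query: 1 ..." and "Query  1 ..." are accepted.
# Returns (label, start, residues, end), or an empty list otherwise.
sub parse_alignment_line {
    my ($line) = @_;
    return unless $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
    return ($1, $2, $3, $4);
}

for my $line ('Query: 1   RWKWKRKK 8',     # older layout, with colon
              'Query  1   WWWKWRW  7',     # newer layout, no colon
              '           RW W RKK') {     # a midline is not an alignment line
    my @fields = parse_alignment_line($line);
    if (@fields) {
        print join(' | ', @fields), "\n";
    } else {
        print "not an alignment line: '$line'\n";
    }
}
```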
>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>
>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>I had installed
>>>>>>>>>
>>>>>>>>>thanks in advance
>>>>>>>>>
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>>_______________________________________________
>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>--
>>>>>>Jason Stajich
>>>>>>Duke University
>>>>>>http://www.duke.edu/~jes12
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>
>Christopher Fields
>Postdoctoral Researcher
>Lab of Dr. Robert Switzer
>Dept of Biochemistry
>University of Illinois Urbana-Champaign

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

From Pieter.Monsieurs at esat.kuleuven.be Fri Feb 10 04:44:10 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 10:44:10 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <43EC5497.3050505@esat.kuleuven.be>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine> <43EBC03E.4040900@gmx.at> <43EC5497.3050505@esat.kuleuven.be>
Message-ID: <43EC606A.20003@esat.kuleuven.be>

Sorry for disturbing. It now works correctly with the bug fix of Chris.

Thanx,
Pieter

Pieter Monsieurs wrote:

>Hi Chris,
>
>The parsing of the Blast output still doesn't work for me with the bug
>fix download of blast.pm.
>The module keeps turning around in the while loop at line 487 looking
>for a database or query-size:
>
>while( defined ($_) ) {
>    if( /^Database:/ ) {
>        $self->_pushback($_);
>        last;
>    }
>    chomp;
>    if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>        $size = $1;
>        $size =~ s/,//g;
>        last;
>    } else {
>        $q .= " $_";
>        $q =~ s/ +/ /g;
>        $q =~ s/^ | $//g;
>    }
>    $_ = $self->_readline;
>}
>
>The code keeps looking for the database information, however - as you
>mentioned - this information is given before the query line in the new
>Blast output format.
>This way, all hits and hsps are stored in the query_description
>($hit->query_description), no hits are found and query_length is 0.
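Pieter's description points at two things a parser has to cope with: in the newer layout the `Database:` block comes before `Query=`, and the query length appears as `Length=193` instead of `(193 letters)`. The sketch below illustrates both checks in plain Perl. The helper names are hypothetical, this is not the code in blast.pm nor Pieter's exact patch, and it assumes the report has already been split into chomped lines:

```perl
use strict;
use warnings;

# Classify the header by the order of its first "Database:" and "Query="
# lines: the newer web BLAST layout quoted in this thread puts Database:
# first, older reports put Query= first.
sub header_style {
    my @lines = @_;
    my ($db_at, $query_at);
    for my $i (0 .. $#lines) {
        $db_at    = $i if !defined $db_at    && $lines[$i] =~ /^Database:/;
        $query_at = $i if !defined $query_at && $lines[$i] =~ /^Query=/;
    }
    return 'unknown' unless defined $db_at && defined $query_at;
    return $db_at < $query_at ? 'database-first' : 'query-first';
}

# Starting at the "Query=" line, collect the description until a length
# line is found. Accepts "(193 letters)" and "Length=193", strips thousands
# separators, and stops at a "Database:" line so a missing length cannot
# make it run through the rest of the report.
sub query_description_and_length {
    my @lines = @_;
    my ($desc, $size) = ('', undef);
    for my $line (@lines) {
        last if $line =~ /^Database:/;
        if ($line =~ /\((-?[\d,]+)\s+letters.*\)/ || $line =~ /^Length=(-?[\d,]+)/) {
            ($size = $1) =~ s/,//g;
            last;
        }
        $desc .= " $line";
    }
    $desc =~ s/^\s*Query=//;
    $desc =~ s/\s+/ /g;
    $desc =~ s/^ | $//g;
    return ($desc, $size);
}

# Abbreviated sample headers based on the excerpts Chris posted.
my @new_style = (
    'Database: All non-redundant GenBank CDS',
    'Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium',
    'tuberculosis H37Rv].',
    'Length=193',
);
my @old_style = (
    'Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium',
    'tuberculosis H37Rv].',
    '         (193 letters)',
    'Database: All non-redundant GenBank CDS',
);

for my $report (\@new_style, \@old_style) {
    my @lines = @$report;
    my ($start) = grep { $lines[$_] =~ /^Query=/ } 0 .. $#lines;
    my ($desc, $size) = query_description_and_length(@lines[$start .. $#lines]);
    printf "%s: %s (%s letters)\n", header_style(@lines), $desc,
        defined $size ? $size : 'unknown';
}
```

The explicit stop at `Database:` is the design point: whichever layout a file uses, the scan ends after the query header instead of absorbing hits into the description.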
>Because you already adapted the module to retrieve database information
>at another position in the module, deleting the while loop and adding
>the following lines after $_ = $self->_readline (line 486), worked fine
>for me (using blastn and blastp):
>
>if (/Length=([\d,]+)/) {
>    $size = $1;
>    $size =~ s/,//g;
>}
>
>Regards,
>Pieter
>
>Chris Fields wrote:
>
>>From 'perldoc Bio::SearchIO::blast':
>>
>>DESCRIPTION
>>    This object encapsulated the necessary methods for generating events
>>    suitable for building Bio::Search objects from a BLAST report file.
>>    Read the Bio::SearchIO for more information about how to use this.
>>
>>    This driver can parse:
>>
>>    o   NCBI produced plain text BLAST reports from blastall, this also
>>        includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports. NCBI
>>        XML BLAST output is parsed with the blastxml SearchIO driver
>>
>>    o   WU-BLAST all reports
>>
>>    o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>
>>    o   BLAST-like output from Paracel BTK output
>>
>>So, it should. Let us know if it doesn't.
>>
>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>
>>>Hi Chris,
>>>I'm incredibly sorry for causing so much inconvenience, yes you are
>>>right, I had only to change the blast.pm file, it is working very
>>>fine, thank you very much, and you are right, you have mentioned it
>>>earlier either to change the file... ;)
>>>
>>>but I have another question: does it work with the WU-Blast output
>>>too?
>>>regards
>>>Hubert
>>>
>>>Chris Fields wrote:
>>>
>>>>Ha! I come back from meeting and there's a billion emails! What have we
>>>>started? ;p . Sorry about this Jason; I know you're busy.
>>>>
>>>>Hubert, if you're out there, I sent you an email with an attachment. You
>>>>said the output looks like what you were expecting.
>>>>So I think we have two problems:
>>>>
>>>>1) I haven't delved into the file scanning, but the fact that it takes so
>>>>long should tell you something's seriously wrong there. Strip that part out
>>>>and start with a simple script, say, like the one Jason or that I sent you;
>>>>the script I used to generate that output works fine (on two OS's, WinXP and
>>>>Mac OS X). Use it on one file at a time. Do everything on command line
>>>>(not through Eclipse). IDE's can be notoriously flaky about running
>>>>scripts, esp. when they run debugging.
>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>>>>not work whenever the text blast output has the following header, which
>>>>comes from the new web version of BLAST:
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>           3,292,813 sequences; 1,128,164,434 total letters
>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>Length=193
>>>>.......
>>>>-----------------------------------------------------
>>>>
>>>>It will work if the text output has the following header (or is an older
>>>>version of BLAST):
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>
>>>>
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>         (193 letters)
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>           2,895,325 sequences; 997,103,285 total letters
>>>>-----------------------------------------------------
>>>>You have the former (2.2.13) version. I know b/c I have your BLAST files.
>>>>Therefore, even bioperl-1.5.1 will not work!
>>>>
>>>>If you want the really gory details on why this is a problem, look here:
>>>>
>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>>So, any text output with the above header will not work; it will either hang
>>>>or end abruptly (depending on OS, perl version, memory, patience). If you
>>>>look in the above, I have added a preliminary fix for this. I'll reiterate
>>>>for the billionth time, it hasn't been committed yet, so don't kill me if it
>>>>blows your computer up ;>
>>>>Here's the direct link:
>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>>>>1.90, but it's lying, I didn't change the version, only the regex; sorry
>>>>Jason). From what you've been posting it doesn't sound like you've tried
>>>>this, and I believe I've suggested this fix before.
>>>>
>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>>>>message) with this file. Make sure the filename stays the same (blast.pm).
>>>>
>>>>Run everything again, one file at a time. Make sure you use Jason's script
>>>>as well as the one I sent you. Do NOT rely on running through multiple
>>>>files yet. Fix one bug at a time.
>>>>And heed Joel's words about file checks.
>>>>
>>>>Here's a small chunk of output from one of your blast files using the
>>>>modified script I sent you:
>>>>
>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>Query: 1 RWKWKRKK 8
>>>>Seq: 542 RWAWRRKK 549
>>>>
>>>>Look familiar?
>>>>
>>>>Christopher Fields
>>>>Postdoctoral Researcher - Switzer Lab
>>>>Dept. of Biochemistry
>>>>University of Illinois Urbana-Champaign
>>>>
>>>>>-----Original Message-----
>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>>>>Sent: Thursday, February 09, 2006 3:24 PM
>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>parsing Blast output
>>>>>
>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>
>>>>>Sorry - I'll look at the output issue this evening (or realize
>>>>>that Chris already solved the issue). ;}
>>>>>
>>>>>Thanks!
>>>>>
>>>>>Roger
>>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>Prielinger
>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason
>>>>>Stajich
>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>parsing Blast output
>>>>>
>>>>>dear roger,
>>>>>this error message I got, when I tried to parse Blast output (version
>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have
>>>>>a lot of Blast output files with version 2.2.13 and for that I
>>>>>don't get any error message.....it just doesn't work
>>>>>
>>>>>Hubert
>>>>>
>>>>>Roger Hall wrote:
>>>>>
>>>>>>Guys - I'm looking at the error message:
>>>>>>
>>>>>>MSG: no data for midline Query 1 WWWKWRW 7
>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>STACK toplevel
>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>>This is my line of thought:
>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in one
>>>>>>location only at the point of a. reading three lines b. dropping lines
>>>>>>with spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
>>>>>>2. There is a regexp match that fails in order to reach that error message
>>>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression
>>>>>>4. It does anyway
>>>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast
>>>>>>reports
>>>>>>
>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>>file, assuming that I didn't have it.
>>>>>>
>>>>>>My next thought is to write a quick script to test perl behavior
>>>>>>on "Fedora Core 9".
>>>>>>
>>>>>>Thoughts?
>>>>>>
>>>>>>Did I misread the issue entirely? :}
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>parsing Blast output
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>To: Hubert Prielinger
>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>>parsing Blast output
>>>>>>>
>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>
>>>>>>>>hi chris,
>>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still working,
>>>>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>>>>a lot of textfiles....
>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>
>>>>>>>>regards
>>>>>>>>Hubert
>>>>>>>
>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>>determine the problem.
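Roger's hunch about a newline/chomp problem can be tested in isolation. One way such a failure can arise: `chomp` removes only the value of `$/` (by default `"\n"`), so a report saved with DOS (CRLF) line endings keeps a trailing carriage return that defeats a `$`-anchored pattern. The sketch uses only core Perl and a made-up sample line; the thread does not establish that this is what happened with Hubert's files:

```perl
use strict;
use warnings;

# Pattern anchored with $ and no trailing \s*, so a stray "\r" at the end
# of the line makes it fail.
my $alignment_re = qr/^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)$/;

# Strip a trailing LF or CRLF, independent of the current value of $/.
sub strip_line_ending {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

my $raw = "Query 1 WWWKWRW 7\r\n";   # made-up line as read from a DOS-format file

my $chomped = $raw;
chomp $chomped;                        # removes "\n" (the default $/) but keeps "\r"
my $stripped = strip_line_ending($raw);

printf "after chomp:             %s\n", $chomped  =~ $alignment_re ? 'match' : 'no match';
printf "after strip_line_ending: %s\n", $stripped =~ $alignment_re ? 'match' : 'no match';
```

Running a suspect report through `perl -ne 'print if /\r$/'` is a quick way to see whether any of its lines carry carriage returns before blaming the parser.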
>>>>>>>
>>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>>you have upgraded to 1.5.1 which includes a fix in the fact that NCBI
>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>>prefixes. We fixed this as soon as it was apparent sometime in
>>>>>>>September.
>>>>>>>
>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7
>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>STACK toplevel
>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>>are you sure your logic is correct?
>>>>>>>
>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>
>>>>>>>while (my $result = $search->next_result) {
>>>>>>>  print $result->query_name, "\n";
>>>>>>>  #iterate over each hit on the query sequence
>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>    print $hit->name, "\n";
>>>>>>>    #iterate over each HSP in the hit
>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
>>>>>>>    }
>>>>>>>  }
>>>>>>>}
>>>>>>
>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>>similar script to the above.
>>>>>>I removed the file parsing logic and it seemed
>>>>>>to work just fine. It may very well be a logic issue or that he hasn't
>>>>>>installed the latest fix.
>>>>>>It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13),
>>>>>>even though the returned output was from nr, the top of the
>>>>>>blast output showed that it was v2.2.12:
>>>>>>
>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>
>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>-------------------------------------
>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>
>>>>>>blastcl3 2.2.13 arguments:...
>>>>>>-------------------------------------
>>>>>>
>>>>>>If you use RemoteBlast using the same settings, the version in
>>>>>>the header looks like this:
>>>>>>
>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>
>>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>>>>format (2.2.13). I'll ask blast-help at NCBI about this.
>>>>>>
>>>>>>>To clarify some stuff -
>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>>a static binary on your machine =).
>>>>>>>I think for remoteblast the XML
>>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>>maintain support of any plain text BLAST report format that
>>>>>>>people use on a regular basis.
>>>>>>
>>>>>>Does XML lack some specific info that text output has? Didn't know that. I
>>>>>>believe that XML should be default in RemoteBlast since it will
>>>>>>not break, but I agree with you about text output. I also agree
>>>>>>that it will need somebody to maintain it constantly, much like
>>>>>>RemoteBlast.
>>>>>>
>>>>>>>-jason
>>>>>>>
>>>>>>>>Chris Fields wrote:
>>>>>>>>
>>>>>>>>>My guess is you're running into text parsing problems in
>>>>>>>>>Bio::SearchIO::blast. Upgrade to the latest developer version
>>>>>>>>>(1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>>>>>
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>
>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>>The fixed SearchIO::blast is available in the link above, but realize it hasn't
>>>>>>>>>been committed yet and may change.
>>>>>>>>>
>>>>>>>>>Christopher Fields
>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>>Prielinger
>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>>output
>>>>>>>>>>
>>>>>>>>>>Hi,
>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>
>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7
>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>STACK toplevel
>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>>
>>>>>>>>>>is that a bug......
>>>>>>>>>>
>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>>>>get anything.....
>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>
>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>>I had installed
>>>>>>>>>>
>>>>>>>>>>thanks in advance
>>>>>>>>>>
>>>>>>>>>>Hubert
>>>>>>>
>>>>>>>--
>>>>>>>Jason Stajich
>>>>>>>Duke University
>>>>>>>http://www.duke.edu/~jes12
>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign

From andrej.kastrin at guest.arnes.si Fri Feb 10 09:28:19 2006
From: andrej.kastrin at guest.arnes.si (Andrej Kastrin)
Date: Fri, 10 Feb 2006 15:28:19 +0100
Subject: [Bioperl-l] Medline to XML
Message-ID: <43ECA303.8090904@guest.arnes.si>

Dear users,

my problem is not directly related to this list, but I hope you can help
me. Is there any tool to convert a standard Medline record to XML format?
I know there is a built-in function (med2xml) in Pubmed, but I'm looking
for some independent perl script.

Thanks in advance for any suggestions or pointers.

Cheers, Andrej

From cjfields at uiuc.edu Fri Feb 10 12:01:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 11:01:27 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To:
Message-ID: <001801c62e63$a4a71090$15327e82@pyrimidine>

I don't think there's anything like this in Bioperl, and I'm unfamiliar
with the naming scheme you're using. If you're searching for specific
miRNA's, a good resource looks like the miRNA database, which seems to be
updated regularly (http://microrna.sanger.ac.uk/sequences/) and uses the
same system for RNA annotation that you use (which, I'm guessing, is a
standardized annotation scheme of some sort). I believe the database is
downloadable and searchable by name, so you could probably build a
querying scheme using LWP or HTTP::Request (if the web interface allows
for this).
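Barry's question further down (getting from `mir-102a*` to `hsa-mir-102a-1*`) is mostly a name-parsing problem. A minimal pure-Perl sketch follows. The naming convention it assumes (optional species prefix, `mir`/`miR`, number, optional letter, optional `-N` locus number, optional `*`) should be checked against the miRNA database linked above, the helper names are made up, and nothing in this thread suggests BioPerl has an equivalent:

```perl
use strict;
use warnings;

# Hypothetical parser for miRNA names such as "hsa-mir-102a-1*", assuming the
# convention: optional species prefix, "mir" or "miR", a number, an optional
# letter, an optional "-N" locus number and an optional "*". Names like
# "let-7a" or "lin-4" do not follow this pattern and are rejected.
sub parse_mirna_name {
    my ($name) = @_;
    return unless $name =~ /^(?:([a-z]{3,4})-)?(mir|miR)-(\d+)([a-z]?)(?:-(\d+))?(\*?)$/;
    return {
        species => $1,              # undef when no prefix is given
        kind    => $2,              # "mir" or "miR" as written
        number  => $3,
        letter  => $4,              # '' when absent
        locus   => $5,              # undef when no "-N" suffix is given
        star    => $6 eq '*' ? 1 : 0,
    };
}

# Treat two names as compatible when number, letter and star agree; species
# and locus are compared only when both names specify them.
sub names_correspond {
    my ($first, $second) = @_;
    my $x = parse_mirna_name($first)  or return 0;
    my $y = parse_mirna_name($second) or return 0;
    return 0 unless $x->{number} == $y->{number}
                 && $x->{letter} eq $y->{letter}
                 && $x->{star}   == $y->{star};
    for my $field (qw(species locus)) {
        next unless defined $x->{$field} && defined $y->{$field};
        return 0 unless $x->{$field} eq $y->{$field};
    }
    return 1;
}

for my $pair (['mir-102a*', 'hsa-mir-102a-1*'], ['mir-102a*', 'hsa-mir-102b-1*']) {
    printf "%s vs %s: %s\n", @$pair, names_correspond(@$pair) ? 'compatible' : 'different';
}
```

With names parsed into fields like this, looking up a precursor for a mature name becomes a filter over the database's name list instead of a string match.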
I know that Sean Eddy's Rfam database (http://www.sanger.ac.uk/Software/Rfam/) also has information on miRNA's, but it's somewhat limited. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > barry.m.dancis at gsk.com > Sent: Wednesday, February 08, 2006 3:45 PM > To: 'bioperl-l'; bioperl-l-bounces at lists.open-bio.org > Cc: James.R.Brown at gsk.com > Subject: Re: [Bioperl-l] Handling miRNA's > > Hi Chris-- > > The problem I am solving is given a mature miRna > name, how do I use it to search for its pre/pri miRna and > vice versa. For example, how to go from mir-102a* to > hsa-mir-102a-1*. Yes, I can write a parser for it, but I'm > hoping that someone else has already done it and has some > bells and whistles to go with it. Below is a hierarchy chart > of a data structure to hold the naming information. The > parsing is not trivial and given data in that structure there > could be all kinds of neat functions that return various > aspects of the names. > > Barry > > > > > > > > > > > > > "Chris Fields" > Sent by: bioperl-l-bounces at lists.open-bio.org > 07-Feb-2006 17:40 > > To > barry.m.dancis at gsk.com, "'bioperl-l'" cc > > Subject > Re: [Bioperl-l] Handling miRNA's > > > > > > > Are you talking about sequences or text output from a > specific program? If you are talking about sequences in a > particular format, then listen to Brian. If you are talking > about output, then we need to know which program you're > using, as a parser may exist or could be built. > > There are a few modules in Bio::Tools that handle RNA (like > QRNA, tRNAscan-SE), so check those out first. I'm currently > finishing up a Bio::Tools module for RNAMotif and have plans > for making an ERPIN parser. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > barry.m.dancis at gsk.com > > Sent: Tuesday, February 07, 2006 2:26 PM > > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org > > Subject: Re: [Bioperl-l] Handling miRNA's > > > > It's the parser in particular that I need. > > > > "Brian Osborne" > > Sent by: bioperl-l-bounces at lists.open-bio.org > > 07-Feb-2006 12:05 > > > > To: barry.m.dancis at gsk.com, "bioperl-l" , bioperl-l-bounces at lists.open-bio.org > > cc: > > Subject: Re: [Bioperl-l] Handling miRNA's > > > > Barry, > > > > If the sequence information is in one of the formats that Bioperl understands (Genbank, Swissprot flat, and so on), then the answer is yes. This assumes that the details on sequence that you mentioned are found in some sequence feature section in the file. But it looks to me like there's no specialized parser for miRNA sequence per se; I'll be corrected if I'm wrong. > > > > Brian O. > > > > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" wrote: > > > > > Hi -- > > > > > > Are there any classes for manipulating miRNAs, with functions such as parsing the name, storing and interlinking pri/pre/mature sequences, etc?
> > > > > > Thanks, > > > > > > Barry > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From allenday at ucla.edu Fri Feb 10 11:13:39 2006 From: allenday at ucla.edu (Allen Day) Date: Fri, 10 Feb 2006 08:13:39 -0800 (PST) Subject: [Bioperl-l] Medline to XML In-Reply-To: <43ECA303.8090904@guest.arnes.si> References: <43ECA303.8090904@guest.arnes.si> Message-ID: Why not just retrieve the XML directly from the EUtils service? -allen On Fri, 10 Feb 2006, Andrej Kastrin wrote: > Dear users, > > my problem is not directly related to this list, but I hope you can help > me. Is there any tool to convert a standard Medline record to XML format? > I know there is a built-in function (med2xml) in PubMed, but I'm looking > for an independent Perl script. > > Thanks in advance for any suggestions or pointers.
> > Cheers, Andrej > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Fri Feb 10 12:15:17 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 10 Feb 2006 12:15:17 -0500 Subject: [Bioperl-l] Remote BLAST support discussion In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca> References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca> Message-ID: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu> Paul - The reason for suggesting a change has to do with the instability of the CGI interface and of the format of the returned data: the text format returned by the web server is not stable, and reportedly it will cease to be reliably parsable. Yes, we can keep hacking the blast parser code to handle this, but the bioperl release cycle is certainly not tied to the NCBI blast release cycle, so I find it unsatisfying to know that we are going to have broken code when they change the output formats (but not know when). Mostly I think we need to try to support something that will "ALWAYS" work, so that individuals setting up web services that rely on remote blast functionality have something dependable to build on. In theory, netblast/blastcl3 should always work, since NCBI has to update the exe when they change their server setup. In terms of the web-based queues, I think the best change we can make is to make XML the preferred retrieval method. I also see value in providing a wrapper for netblast, since it should look an awful lot like running blast locally.
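A netblast wrapper could start by building a blastcl3 command line that asks for XML, so the report can go to the blastxml SearchIO driver. This is a minimal sketch. It assumes that blastcl3 accepts the same core switches as blastall: -p program, -d database, -i query, -e E-value, -o output, and -m 7 for XML output. Running `blastcl3 -` prints the switches a local build actually supports. `blastcl3_command` is a hypothetical helper, not an existing Bioperl method.

```perl
use strict;
use warnings;

# Build (but do not run) an argument list for NCBI's blastcl3 client.
# ASSUMPTION: switches mirror blastall (-p, -d, -i, -e, -o, -m 7 = XML).
sub blastcl3_command {
    my (%opt) = @_;
    die "a query file is required\n" unless defined $opt{query};
    my @cmd = (
        $opt{exe} || 'blastcl3',
        '-p', $opt{program}  || 'blastp',
        '-d', $opt{database} || 'nr',
        '-i', $opt{query},
        '-m', 7,                       # request XML output
    );
    push @cmd, '-e', $opt{evalue} if defined $opt{evalue};
    push @cmd, '-o', $opt{output} if defined $opt{output};
    return @cmd;
}

my @cmd = blastcl3_command(query => 'roa1.fa', output => 'roa1.xml');
print "@cmd\n";   # blastcl3 -p blastp -d nr -i roa1.fa -m 7 -o roa1.xml

# Running it needs blastcl3 on the PATH and network access, e.g.:
#   system(@cmd) == 0 or die "blastcl3 failed: $?";
# after which the report could be read with
#   Bio::SearchIO->new(-format => 'blastxml', -file => 'roa1.xml');
```

Returning a list rather than a single string lets `system(@cmd)` skip the shell, so quoting of file names is not an issue.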
Ideally I'd like to see a more extensible system, something like (and please feel free to come up with better names for the modules!): Bio::Tools::Run::Blast --> StandAlone (support for both WU-BLAST and NCBI-BLAST local binaries, and MPI-BLAST too if simple) --> RemoteNCBI (currently the RemoteBlast server) --> RemoteEBISOAP (EBI has a nice SOAP interface that works quite well, but may not provide all the same databases as what people expect from NCBI) --> RemoteNetBlast (blastcl3 or netblast local executable) (other things that people want) [note: Whether or not these ideas are appealing, someone should archive the discussions and decisions on the wiki page so we can rely less on people searching the mailing archives for how a decision was made. Perhaps Roger can do this sort of editing in addition to the planning for support of this module]. -jason On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote: > Hi Roger, > > I would definitely prefer a fully Perl-based implementation. For starters, I have not been successful in compiling the Toolkit that contains netblast for some platforms (e.g. AIX 5.2 w/gcc 4.0). > > I haven't been following the discussion: is there some compelling reason to prefer a netblast-based system that's come up recently? I'm guessing that adding a new non-perl dependency would only be done if there was considerable justification for this type of change, but I'm not clear from your message what that justification is. > > Paul > > ------------------------------ > > Message: 12 > Date: Mon, 6 Feb 2006 20:46:44 -0600 > From: "Roger Hall" > Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please reply > To: > Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> > Content-Type: text/plain; charset="us-ascii" > > To everyone who uses RemoteBlast.pm: > > Would anyone object to RemoteBlast being rewritten in a way that requires NCBI's blastcl3 executable?
> > Binary downloads of blastcl3 (column "netblast") are available for numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml > > Does anyone require or desire a "pure perl" implementation? If so, please explain the advantage you see with such an implementation. > > Thanks! > > Roger Hall > > Technical Director > > MidSouth Bioinformatics Center > > University of Arkansas at Little Rock > > (501) 569-8074 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From hubert.prielinger at gmx.at Fri Feb 10 11:26:47 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 10 Feb 2006 10:26:47 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output In-Reply-To: <43EC606A.20003@esat.kuleuven.be> References: <000e01c62dca$bc66df60$15327e82@pyrimidine> <43EBC03E.4040900@gmx.at> <43EC5497.3050505@esat.kuleuven.be> <43EC606A.20003@esat.kuleuven.be> Message-ID: <43ECBEC7.7040506@gmx.at> Hi, I'm sorry to bother you once more. Yesterday the script was working; today it isn't working at all, but I didn't change anything. I get the following error message: ------------- EXCEPTION ------------- MSG: Could not open comp80swiss2114.txt: No such file or directory STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273 STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213 STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135 STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167 STACK toplevel ./Blast.pl:14 -------------------------------------- The file exists, and I applied the bug fix yesterday. Thanks for your help, Hubert Pieter Monsieurs wrote: > Sorry for disturbing. It now works correctly with the bug fix of Chris.
> Thanx, > Pieter > > Pieter Monsieurs wrote: > >>Hi Chris, >> >>The parsing of the Blast output still doesn't work for me with the bug-fix download of blast.pm. The module keeps looping in the while loop at line 487, looking for a database or query size: >> >>while( defined ($_) ) { >> if( /^Database:/ ) { >> $self->_pushback($_); >> last; >> } >> chomp; >> if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) { >> $size = $1; >> $size =~ s/,//g; >> last; >> } else { >> $q .= " $_"; >> $q =~ s/ +/ /g; >> $q =~ s/^ | $//g; >> } >> $_ = $self->_readline; >>} >> >>The code keeps looking for the database information; however, as you mentioned, this information is given before the query line in the new Blast output format. This way, all hits and HSPs are stored in the query description ($hit->query_description), no hits are found, and query_length is 0. Because you already adapted the module to retrieve the database information at another position in the module, deleting the while loop and adding the following lines after $_ = $self->_readline (line 486) worked fine for me (using blastn and blastp): >> >>if (/Length=([\d,]+)/) { >> $size = $1; >> $size =~ s/,//g; >>} >> >>Regards, >>Pieter >> >>Chris Fields wrote: >> >>>From 'perldoc Bio::SearchIO::blast': >>> >>>DESCRIPTION >>> This object encapsulated the necessary methods for generating events suitable for building Bio::Search objects from a BLAST report file. Read the Bio::SearchIO for more information about how to use this. >>> >>> This driver can parse: >>> >>> o NCBI produced plain text BLAST reports from blastall, this also includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports. NCBI XML BLAST output is parsed with the blastxml SearchIO driver >>> >>> o WU-BLAST all reports >>> >>> o Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT) >>> >>> o BLAST-like output from Paracel BTK output >>> >>>So, it should.
Let us know if it doesn't. >>> >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote: >>> >>>>Hi Chris, >>>>I'm incredibly sorry for causing so much inconvenience. Yes, you are right: I only had to change the blast.pm file, and it is working fine now, thank you very much. And you are right, you mentioned earlier that I should change the file... ;) >>>> >>>>But I have another question: does it work with WU-BLAST output too? >>>>Regards >>>>Hubert >>>> >>>>Chris Fields wrote: >>>> >>>>>Ha! I come back from a meeting and there's a billion emails! What have we started? ;p . Sorry about this Jason; I know you're busy. >>>>> >>>>>Hubert, if you're out there, I sent you an email with an attachment. You said the output looks like what you were expecting. So I think we have two problems: >>>>> >>>>>1) I haven't delved into the file scanning, but the fact that it takes so long should tell you something's seriously wrong there. Strip that part out and start with a simple script, say, like the one Jason or I sent you; the script I used to generate that output works fine (on two OSes, WinXP and Mac OS X). Use it on one file at a time. Do everything on the command line (not through Eclipse). IDEs can be notoriously flaky about running scripts, esp. when they run debugging. >>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still not work whenever the text blast output has the following header, which comes from the new web version of BLAST: >>>>> >>>>>----------------------------------------------------- >>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>>RID: 1139501210-857-165793005128.BLASTQ1 >>>>> >>>>> >>>>>Database: All non-redundant GenBank CDS >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>>> 3,292,813 sequences; 1,128,164,434 total letters >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>>>tuberculosis H37Rv]. >>>>>Length=193 >>>>>....... >>>>>----------------------------------------------------- >>>>> >>>>>It will work if the text output has the following header (or is an >>>>>older >>>>>version of BLAST): >>>>> >>>>>----------------------------------------------------- >>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>> >>>>> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>protein database search >>>>>programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>>>tuberculosis H37Rv]. >>>>> (193 letters) >>>>> >>>>>Database: All non-redundant GenBank CDS >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>>> 2,895,325 sequences; 997,103,285 total letters >>>>>----------------------------------------------------- >>>>>You have the former (2.2.13) version. I know b/c I have your >>>>>BLAST files. >>>>>Therefore, even bioperl-1.5.1 will not work! >>>>> >>>>>If you want the really gory details on why this is a problem, look >>>>>here: >>>>> >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>> >>>>>So, any text output with the above header will not work; it will >>>>>either hang >>>>>or end abruptly (depending on OS, perl version, memory, >>>>>patience). If you >>>>>look in the above, I have added a preliminary fix for this. 
I'll reiterate for the billionth time: it hasn't been committed yet, so don't kill me if it blows your computer up ;> >>>>>Here's the direct link: >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's version 1.90, but it's lying; I didn't change the version, only the regex; sorry Jason). From what you've been posting it doesn't sound like you've tried this, and I believe I've suggested this fix before. >>>>> >>>>>Replace the one in your Bio/SearchIO directory (which looks like '/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev. message) with this file. Make sure the filename stays the same (blast.pm). >>>>> >>>>>Run everything again, one file at a time. Make sure you use Jason's script as well as the one I sent you. Do NOT rely on running through multiple files yet. Fix one bug at a time. And heed Joel's words about file checks. >>>>> >>>>>Here's a small chunk of output from one of your blast files using the modified script I sent you: >>>>> >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >>>>>Query: 1 RWKWKRKK 8 >>>>>Seq: 542 RWAWRRKK 549 >>>>> >>>>>Look familiar? >>>>> >>>>>Christopher Fields >>>>>Postdoctoral Researcher - Switzer Lab >>>>>Dept. of Biochemistry >>>>>University of Illinois Urbana-Champaign >>>>> >>>>>>-----Original Message----- >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, February 09, 2006 3:24 PM >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>> >>>>>>In other words, yes, I'm on the wrong trail. :} >>>>>> >>>>>>Sorry - I'll look at the output issue this evening (or realize that Chris already solved the issue). ;} >>>>>> >>>>>>Thanks!
>>>>>> >>>>>>Roger >>>>>> >>>>>>-----Original Message----- >>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >>>>>>Sent: Thursday, February 09, 2006 2:14 PM >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason Stajich >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>> >>>>>>Dear Roger, >>>>>>I got this error message when I tried to parse Blast output (version 2.2.12) with bioperl 1.5.1. But it doesn't matter, because I have a lot of Blast output files with version 2.2.13, and for those I don't get any error message... it just doesn't work. >>>>>> >>>>>>Hubert >>>>>> >>>>>>Roger Hall wrote: >>>>>> >>>>>>>Guys - I'm looking at the error message: >>>>>>> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>STACK toplevel >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>>>>> >>>>>>>This is my line of thought: >>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in one location only, at the point of a. reading three lines, b. dropping lines with spaces only, c. identifying the Query, Midline, and Match lines (0 <= $i < 3) >>>>>>>2. There is a regexp match that fails in order to reach that error message >>>>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression >>>>>>>4. It does anyway >>>>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast reports >>>>>>> >>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string anywhere has me thoroughly confused - I asked Hubert for the additional file, assuming that I didn't have it. >>>>>>> >>>>>>>My next thought is to write a quick script to test perl behavior on "Fedora Core 9". >>>>>>> >>>>>>>Thoughts? >>>>>>> >>>>>>>Did I misread the issue entirely?
:} >>>>>>> >>>>>>>Roger >>>>>>> >>>>>>>-----Original Message----- >>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>>>>>Cc: bioperl-l at bioperl.org >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>>> >>>>>>>>-----Original Message----- >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>>>>>To: Hubert Prielinger >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>>>> >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>>>>> >>>>>>>>>Hi Chris, >>>>>>>>>thanks, I have upgraded to version 1.5.1, but it still isn't working. Do you have any other idea? The problem I have is that I have to parse a lot of text files... or shall I look for another option to parse those files? >>>>>>>>> >>>>>>>>>Regards >>>>>>>>>Hubert >>>>>>>> >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast 2.2.13 reports, but unless you post your blast report we can't really determine the problem.
>>>>>>>> >>>>>>>>If you are still getting the same error like this, I am not convinced you have upgraded to 1.5.1, which includes a fix for the fact that NCBI changed the HSP result format to remove the ':' from the Query/Sbjct prefixes. We fixed this as soon as it was apparent, sometime in September. >>>>>>>> >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>>STACK toplevel >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>>>>>> >>>>>>>>If you are just getting no results but also no warnings wrt parsing, are you sure your logic is correct? >>>>>>>> >>>>>>>>If you remove your filters, do you see all the HSPs?
>>>>>>>> >>>>>>>>while (my $result = $search->next_result) { >>>>>>>> print $result->query_name, "\n"; >>>>>>>> # iterate over each hit on the query sequence >>>>>>>> while (my $hit = $result->next_hit) { >>>>>>>> print $hit->name, "\n"; >>>>>>>> # iterate over each HSP in the hit >>>>>>>> while (my $hsp = $hit->next_hsp) { >>>>>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n"; >>>>>>>> } >>>>>>>> } >>>>>>>>} >>>>>>> >>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a script similar to the above. I removed the file parsing logic and it seemed to work just fine. It may very well be a logic issue, or he hasn't installed the latest fix. >>>>>>> It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even though the returned output was from nr, the top of the blast output showed that it was v2.2.12: >>>>>>> >>>>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>>>> >>>>>>>I double-checked my local version and it's definitely v.2.2.13: >>>>>>>------------------------------------- >>>>>>>C:\Perl\Scripts>blastcl3 - >>>>>>> >>>>>>>blastcl3 2.2.13 arguments:...
>>>>>>>------------------------------------- >>>>>>> >>>>>>>If you use RemoteBlast with the same settings, the version in the header looks like this: >>>>>>> >>>>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>>> >>>>>>>I'm wondering if all the blast executables (blast and netblast) from NCBI have text output like v.2.2.12, while wwwblast outputs a new format (2.2.13). I'll ask blast-help at NCBI about this. >>>>>>> >>>>>>>>To clarify some stuff - >>>>>>>>Chris, I don't necessarily think XML is the best way forward for BLAST reports generated locally: it isn't as detailed as the text format, and the text format is what most people expect to be able to scroll through and parse -- it is also harder for the format to change dramatically if you have a static binary on your machine =). I think for remoteblast the XML format should be the way forward, but I expect Bioperl to maintain support of any plain text BLAST report format that people use on a regular basis. >>>>>>> >>>>>>>Does XML lack some specific info that text output has? Didn't know that. I believe that XML should be the default in RemoteBlast since it will not break, but I agree with you about text output. I also agree that it will need somebody to maintain it constantly, much like RemoteBlast.
>>>>>>> >>>>>>>>-jason >>>>>>>> >>>>>>>>>Chris Fields wrote: >>>>>>>>> >>>>>>>>>>My guess is you're running into text parsing problems in Bio::SearchIO::blast. Upgrade to the latest developer version (1.5.1) or bioperl-live (CVS), then see the bug below. >>>>>>>>>> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>>> >>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1; the last problem (more recent, not related to the first) has been fixed but hasn't been committed to bioperl-live yet. The fixed SearchIO::blast is available in the link above, but realize it hasn't been committed yet and may change. >>>>>>>>>> >>>>>>>>>>Christopher Fields >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry >>>>>>>>>>University of Illinois Urbana-Champaign >>>>>>>>>> >>>>>>>>>>>-----Original Message----- >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>>>>>To: bioperl-l at bioperl.org >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>>>>>>> >>>>>>>>>>>Hi, >>>>>>>>>>>If I try to parse a Blast output (version 2.2.12) with Bio::SearchIO, I get the following error message: >>>>>>>>>>> >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>>STACK toplevel >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>>>>>>>>> >>>>>>>>>>>Is that a bug? >>>>>>>>>>> >>>>>>>>>>>If I try to parse Blast output (version 2.2.13), I don't get anything at all.
>>>>>>>>>>>I'm using bioperl 1.4. >>>>>>>>>>> >>>>>>>>>>>Before I installed bioperl 1.4, it worked fine parsing Blast output (version 2.2.12), but I don't remember which bioperl version I had installed. >>>>>>>>>>> >>>>>>>>>>>Thanks in advance >>>>>>>>>>> >>>>>>>>>>>Hubert >>>>>>>>>>> >>>>>>>>>>>_______________________________________________ >>>>>>>>>>>Bioperl-l mailing list >>>>>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>>_______________________________________________ >>>>>>>>>Bioperl-l mailing list >>>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>>-- >>>>>>>>Jason Stajich >>>>>>>>Duke University >>>>>>>>http://www.duke.edu/~jes12 >>>>>>> >>>>>>>Christopher Fields >>>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>>Dept.
of Biochemistry >>>>>>>University of Illinois Urbana-Champaign >>>>>>> >>>>>>>_______________________________________________ >>>>>>>Bioperl-l mailing list >>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>>_______________________________________________ >>>>>>Bioperl-l mailing list >>>>>>Bioperl-l at lists.open-bio.org >>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>>Christopher Fields >>>Postdoctoral Researcher >>>Lab of Dr. Robert Switzer >>>Dept of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l at lists.open-bio.org >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more > information.
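The two format changes that broke parsing in this thread can be shown in a small standalone sketch. The first is the query length: older reports give "(193 letters)", while newer web output gives "Length=193". The second is that "Query:"/"Sbjct:" lost their colon. This is not the Bio::SearchIO::blast code. The helpers `query_length` and `parse_alignment_line` are hypothetical. The patterns are modeled on Pieter's regex and on the sample headers Chris posted.

```perl
use strict;
use warnings;

# Query length from a plain-text BLAST report, in either header layout:
#   older: "        (193 letters)"    newer web output: "Length=193"
# Returns undef if neither is found. Only the first match is used, which
# assumes the query header comes before any per-hit "Length=" lines.
sub query_length {
    my ($report) = @_;
    for my $line (split /\n/, $report) {
        my $n;
        if    ($line =~ /\((-?[\d,]+)\s+letters/) { $n = $1 }
        elsif ($line =~ /^Length=(-?[\d,]+)/)     { $n = $1 }
        next unless defined $n;
        $n =~ s/,//g;                 # drop thousands separators
        return $n;
    }
    return;
}

# Split an alignment line with or without the ':' after Query/Sbjct.
# Returns (tag, start, residues, end), or an empty list on no match.
sub parse_alignment_line {
    my ($line) = @_;
    return unless $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
    return ($1, $2, $3, $4);
}

my $new_style = join "\n",
    'Query= NP_215895 pyrimidine regulatory protein PyrR',
    'Length=193';
print query_length($new_style), "\n";                            # 193
print join(' | ', parse_alignment_line('Query 1 WWWKWRW 7')), "\n";  # Query | 1 | WWWKWRW | 7
```

Making the colon optional (`:?`) is what lets the same pattern accept both the older "Query: 1 ..." lines and the newer "Query 1 ..." lines quoted in the error messages above.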
> From cjfields at uiuc.edu Fri Feb 10 12:45:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 10 Feb 2006 11:45:32 -0600 Subject: [Bioperl-l] Remote BLAST support discussion In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu> Message-ID: <002201c62e69$ca8363d0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Jason Stajich > Sent: Friday, February 10, 2006 11:15 AM > To: Paul.Boutros at utoronto.ca > Cc: BioPerl Mailing List > Subject: [Bioperl-l] Remote BLAST support discussion > > Paul - > > The reason for suggesting a change has to do with the > instability of the CGI interface/format of the returned data, > the text format is not a stable format from the webserver > which reportedly will cease to be reliably parsed. Yes we > can keep hacking the blast parser code to handle this, but > the bioperl release cycle is certainly not tied to the NCBI > blast release cycle so I find it unsatisfying to know that we > are going to have broken code when they change the output > formats (but not know when). > > Mostly I think we need to try and support something that will > "ALWAYS" work so that individuals setting up webservices > which rely on remote blast functionality. In theory, > netblast/blastcl3 should always work since NCBI has to update > the exe when they change their server setup. > > In terms of the web-based queues - I think the best change we > can make is have the XML be the preferred retrieval method. > > I also see value in providing a wrapper for netblast since it > should look an awful lot like running blast locally. 
> > Ideally I'd like to see a more extensible system, something > like (and please feel free to come up with better names for > the modules!): > > Bio::Tools::Run::Blast > --> StandAlone (support for both WU-BLAST and NCBI-BLAST local binaries and MPI-BLAST too if simple) > --> RemoteNCBI (currently the RemoteBlast server) > --> RemoteEBISOAP (EBI has a nice SOAP interface that works quite well, but may not provide all the same databases as what people expect from NCBI) > --> RemoteNetBlast (blastcl3 or netblast local executable) > (other things that people want) Sounds good to me. I think any wrapper for netblast could most easily be based on StandAloneBlast; the parameters look pretty much identical, though it'll probably need a little configuring, as a quick text search through StandAloneBlast didn't show any 'xml' tags. Roger seemed to agree on this.
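To make the proposed layout concrete, here is a minimal sketch of how a single front end might dispatch to interchangeable backends. Everything in it is hypothetical: the `My::Blast::*` package names, the `-backend` argument and the `run` method are placeholders, not BioPerl APIs, and each `run` only returns a string where a real backend would launch blastall, poll the NCBI queue, or call blastcl3.

```perl
use strict;
use warnings;

# Hypothetical base class: every backend is constructed the same way
# and must provide run().
package My::Blast::Backend;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub run { die ref(shift) . " does not implement run()\n" }

# Placeholder backends mirroring the proposed module names.
package My::Blast::StandAlone;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $id) = @_; return "local blastall for $id" }

package My::Blast::RemoteNCBI;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $id) = @_; return "NCBI web queue (XML) for $id" }

package My::Blast::RemoteNetBlast;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $id) = @_; return "blastcl3 for $id" }

# Front end: maps a user-facing backend name onto an implementation class.
package My::Blast;
my %BACKEND = (
    standalone => 'My::Blast::StandAlone',
    remotencbi => 'My::Blast::RemoteNCBI',
    netblast   => 'My::Blast::RemoteNetBlast',
);

sub new {
    my ($class, %args) = @_;
    my $name = lc($args{-backend} || 'standalone');
    my $impl = $BACKEND{$name} or die "Unknown BLAST backend '$name'\n";
    return $impl->new(%args);
}

package main;
```

Keeping the name-to-class mapping in one table means that adding a backend (an EBI SOAP one, say) touches only that table and one new package.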
> > > > Paul > > > > > > > > ------------------------------ > > > > Message: 12 > > Date: Mon, 6 Feb 2006 20:46:44 -0600 > > From: "Roger Hall" > > Subject: [Bioperl-l] RemoteBlast users - potentially major changes - > > please reply > > To: > > Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> > > Content-Type: text/plain; charset="us-ascii" > > > > To everyone who uses RemoteBlast.pm: > > > > Would anyone object to RemoteBlast being rewritten in a way that > > requires NCBI's blastcl3 executable? > > > > Binary downloads of blastcl3 (column "netblast") are available for > > numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml > > > > Does anyone require or desire a "pure perl" implementation? If so, > > please explain the advantage you see with such an implementation. > > > > Thanks! > > > > > > Roger Hall > > > > Technical Director > > > > MidSouth Bioinformatics Center > > > > University of Arkansas at Little Rock > > > > (501) 569-8074 > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From rahall2 at ualr.edu Fri Feb 10 12:54:23 2006 From: rahall2 at ualr.edu (Roger Hall) Date: Fri, 10 Feb 2006 11:54:23 -0600 Subject: [Bioperl-l] Remote BLAST support discussion In-Reply-To: <002201c62e69$ca8363d0$15327e82@pyrimidine> Message-ID: <002501c62e6b$0686be30$d416a790@LIBERAL> It seems so obvious now. 
:} The only issue I see is likely obvious to those of you who have maintained this over the years: no backward compatibility, but I can live with that if y'all can. I will document this on the wiki as suggested and then build the RemoteNCBI module described. After that is tested and committed, I will contact Torsten to see if I can help with the rest. Thanks! Roger > > Bio::Tools::Run::Blast > --> StandAlone (support for both WU-BLAST and NCBI-BLAST local binaries and MPI-BLAST too if simple) > --> RemoteNCBI (currently the RemoteBlast server) > --> RemoteEBISOAP (EBI has a nice SOAP interface that works quite well, but may not provide all the same databases as what people expect from NCBI) > --> RemoteNetBlast (blastcl3 or netblast local executable) > (other things that people want) Sounds good to me. I think any wrapper for netblast could most easily be based on StandAloneBlast; the parameters look pretty much identical, though it'll probably need a little configuring, as a quick text search through StandAloneBlast didn't show any 'xml' tags. Roger seemed to agree on this. From rahall2 at ualr.edu Fri Feb 10 13:00:51 2006 From: rahall2 at ualr.edu (Roger Hall) Date: Fri, 10 Feb 2006 12:00:51 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output In-Reply-To: <43ECBEC7.7040506@gmx.at> Message-ID: <002701c62e6b$edd845b0$d416a790@LIBERAL> Hubert, I got the same message when I first ran your script. The issue for me was that "readdir(DIR)" doesn't return the full path, only the file name. I edited your script to include: $file = $directory . '/' . $file; just before the Bio::SearchIO call.
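Roger's one-line fix can be made a little more robust. The sketch below uses a hypothetical helper, `report_paths`, and assumes the reports live in a single directory and end in `.txt`. It uses `File::Spec->catfile` rather than a hard-coded '/', skips `.` and `..`, and returns full paths. A bare name from `readdir` is resolved against the current working directory, which would plausibly explain a script that works one day and fails the next if it is launched from a different directory.

```perl
use strict;
use warnings;
use File::Spec;

# readdir() returns bare entry names such as "comp80swiss2114.txt", not paths.
# Opening a bare name resolves it against the current working directory, so
# this hypothetical helper joins each name with the directory it came from.
sub report_paths {
    my ($dir, $suffix) = @_;
    $suffix = '.txt' unless defined $suffix;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @paths;
    while (defined(my $name = readdir $dh)) {
        next if $name eq '.' || $name eq '..';
        next unless $name =~ /\Q$suffix\E\z/;    # keep only report files
        my $path = File::Spec->catfile($dir, $name);
        push @paths, $path if -f $path;
    }
    closedir $dh;
    my @sorted = sort @paths;
    return @sorted;
}

# Typical use with BioPerl (not executed here):
#   use Bio::SearchIO;
#   for my $file (report_paths($directory)) {
#       my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
#       while (my $result = $in->next_result) { ... }
#   }
```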
Roger -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger Sent: Friday, February 10, 2006 10:27 AM To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; rahall2 at ualr.edu Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output Hi, I'm sorry for disturbing you once more. Yesterday the script was working; today it isn't working at all, but I didn't change anything. I get the following error message: ------------- EXCEPTION ------------- MSG: Could not open comp80swiss2114.txt: No such file or directory STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273 STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213 STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135 STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167 STACK toplevel ./Blast.pl:14 -------------------------------------- The file exists, and I fixed the bug yesterday. Thanks for the help, Hubert Pieter Monsieurs wrote: > Sorry for disturbing you. It now works correctly with Chris's bug fix. > Thanks, > Pieter > > Pieter Monsieurs wrote: > >>Hi Chris, >> >>The parsing of the Blast output still doesn't work for me with the downloaded bug-fix version of blast.pm. >>The module keeps looping in the while loop at line 487, looking for a database or query size: >> >>while( defined ($_) ) { >> if( /^Database:/ ) { >> $self->_pushback($_); >> last; >> } >> chomp; >> if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) { >> $size = $1; >> $size =~ s/,//g; >> last; >> } else { >> $q .= " $_"; >> $q =~ s/ +/ /g; >> $q =~ s/^ | $//g; >> } >> $_ = $self->_readline; >>} >> >>The code keeps looking for the database information; however, as you mentioned, this information is given before the query line in the new Blast output format.
>>This way, all hits and hsps are stored in the query_description ($hit->query_description), no hits are found, and query_length is 0. >>Because you already adapted the module to retrieve database information at another position in the module, deleting the while loop and adding the following lines after $_ = $self->_readline (line 486) worked fine for me (using blastn and blastp): >> >>if (/Length=([\d,]+)/) { >> $size = $1; >> $size =~ s/,//g; >>} >> >>Regards, >>Pieter >> >>Chris Fields wrote: >> >>>From 'perldoc Bio::SearchIO::blast': >>> >>>DESCRIPTION >>> This object encapsulated the necessary methods for generating events suitable for building Bio::Search objects from a BLAST report file. Read the Bio::SearchIO for more information about how to use this. >>> >>> This driver can parse: >>> >>> o NCBI produced plain text BLAST reports from blastall, this also includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports. NCBI XML BLAST output is parsed with the blastxml SearchIO driver >>> >>> o WU-BLAST all reports >>> >>> o Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT) >>> >>> o BLAST-like output from Paracel BTK output >>> >>>So, it should. Let us know if it doesn't. >>> >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote: >>> >>>>Hi Chris, >>>>I'm incredibly sorry for causing so much inconvenience. Yes, you are right: I only had to change the blast.pm file. It is working fine, thank you very much, and you are right, you had already mentioned earlier that I should change the file... ;) >>>> >>>>But I have another question: does it work with WU-BLAST output too? >>>>regards >>>>Hubert >>>> >>>>Chris Fields wrote: >>>> >>>>>Ha! I come back from a meeting and there's a billion emails! What have we started? ;p Sorry about this, Jason; I know you're busy.
>>>>> >>>>>Hubert, if you're out there, I sent you an email with an attachment. You said the output looks like what you were expecting. So I think we have two problems: >>>>> >>>>>1) I haven't delved into the file scanning, but the fact that it takes so long should tell you something's seriously wrong there. Strip that part out and start with a simple script, say, like the one Jason sent or the one I sent you; the script I used to generate that output works fine (on two OSes, WinXP and Mac OS X). Use it on one file at a time. Do everything on the command line (not through Eclipse). IDEs can be notoriously flaky about running scripts, especially when running them under the debugger. >>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still not work whenever the text blast output has the following header, which comes from the new web version of BLAST: >>>>> >>>>>----------------------------------------------------- >>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>>RID: 1139501210-857-165793005128.BLASTQ1 >>>>> >>>>> >>>>>Database: All non-redundant GenBank CDS >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>>> 3,292,813 sequences; 1,128,164,434 total letters >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>>>tuberculosis H37Rv]. >>>>>Length=193 >>>>>.......
>>>>>----------------------------------------------------- >>>>> >>>>>It will work if the text output has the following header (or is an older version of BLAST): >>>>> >>>>>----------------------------------------------------- >>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>> >>>>> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>>>tuberculosis H37Rv]. >>>>> (193 letters) >>>>> >>>>>Database: All non-redundant GenBank CDS >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>>> 2,895,325 sequences; 997,103,285 total letters >>>>>----------------------------------------------------- >>>>>You have the former (2.2.13) version. I know because I have your BLAST files. Therefore, even bioperl-1.5.1 will not work! >>>>> >>>>>If you want the really gory details on why this is a problem, look here: >>>>> >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>> >>>>>So, any text output with the above header will not work; it will either hang or end abruptly (depending on OS, perl version, memory, patience). If you look at the bug report above, I have added a preliminary fix for this. I'll reiterate for the billionth time: it hasn't been committed yet, so don't kill me if it blows your computer up ;> >>>>>Here's the direct link: >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's version 1.90, but it's lying; I didn't change the version, only the regex; sorry Jason). From what you've been posting it doesn't sound like you've tried this, and I believe I've suggested this fix before.
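The two headers Chris quotes differ in two ways that matter to a parser: the 2.2.13 web report prints the `Database:` block before `Query=`, and it gives the length as `Length=193` instead of `(193 letters)`. As a rough sketch of the idea behind the patched regex (the sub `query_info` is a hypothetical helper, not part of Bio::SearchIO::blast), a header scan can start at `Query=` and stop at whichever length form appears first, without waiting for a `Database:` line:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the query description and length out of the
# header of a plain-text BLAST report. Older reports give the length as
# "(193 letters)" after the Query= lines; 2.2.13 web reports give it as
# "Length=193" and print the Database: block before Query=.
sub query_info {
    my ($report) = @_;
    my ($name, $length);
    for my $line (split /\n/, $report) {
        if ($line =~ /^Query=\s*(.*?)\s*$/) {
            $name = $1;
            next;
        }
        next unless defined $name;              # still before the Query= line
        if ($line =~ /^\s*\((-?[\d,]+)\s+letters/ || $line =~ /^Length=(-?[\d,]+)/) {
            ($length = $1) =~ s/,//g;           # "1,234" -> "1234"
            last;
        }
        $line =~ s/^\s+|\s+$//g;                # wrapped description line
        $name .= " $line" if length $line;
    }
    return ($name, $length);
}
```

Both header styles then yield the same description and length, which is the behavior the bug-1934 patch is meant to restore inside the real parser.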
>>>>> >>>>>Replace the one in your Bio/SearchIO directory (which looks like '/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your previous message) with this file. Make sure the filename stays the same (blast.pm). >>>>> >>>>>Run everything again, one file at a time. Make sure you use Jason's script as well as the one I sent you. Do NOT rely on running through multiple files yet. Fix one bug at a time. And heed Joel's words about file checks. >>>>> >>>>>Here's a small chunk of output from one of your blast files using the modified script I sent you: >>>>> >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >>>>>Query: 1 RWKWKRKK 8 >>>>>Seq: 542 RWAWRRKK 549 >>>>> >>>>>Look familiar? >>>>> >>>>>Christopher Fields >>>>>Postdoctoral Researcher - Switzer Lab >>>>>Dept. of Biochemistry >>>>>University of Illinois Urbana-Champaign >>>>> >>>>>>-----Original Message----- >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, February 09, 2006 3:24 PM >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>> >>>>>>In other words, yes, I'm on the wrong trail. :} >>>>>> >>>>>>Sorry, I'll look at the output issue this evening (or realize that Chris already solved the issue). ;} >>>>>> >>>>>>Thanks!
>>>>>> >>>>>>Roger >>>>>> >>>>>>-----Original Message----- >>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >>>>>>Sent: Thursday, February 09, 2006 2:14 PM >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason Stajich >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>> >>>>>>Dear Roger, >>>>>>I got this error message when I tried to parse Blast output (version 2.2.12) with bioperl 1.5.1. It doesn't really matter, though, because I have a lot of Blast output files from version 2.2.13, and for those I don't get any error message... it just doesn't work. >>>>>> >>>>>>Hubert >>>>>> >>>>>>Roger Hall wrote: >>>>>> >>>>>>>Guys, I'm looking at the error message: >>>>>>> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>>>>> >>>>>>>This is my line of thought: >>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in one location only, at the point of a. reading three lines, b. dropping lines with spaces only, c. identifying the Query, Midline, and Match lines (0 <= $i < 3) >>>>>>>2.
A regexp match has to fail in order to reach that error message >>>>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression >>>>>>>4. It does anyway >>>>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast reports >>>>>>> >>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string anywhere has me thoroughly confused; I asked Hubert for the additional file, assuming that I didn't have it. >>>>>>> >>>>>>>My next thought is to write a quick script to test perl behavior on "Fedora Core 9". >>>>>>> >>>>>>>Thoughts? >>>>>>> >>>>>>>Did I misread the issue entirely?
:} >>>>>>> >>>>>>>Roger >>>>>>> >>>>>>>-----Original Message----- >>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>>>>>Cc: bioperl-l at bioperl.org >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>>> >>>>>>>>-----Original Message----- >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>>>>>To: Hubert Prielinger >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>>>> >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>>>>> >>>>>>>>>hi chris, >>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working. Do you have any other idea? The problem I have is that I have to parse a lot of text files.... >>>>>>>>>or shall I look for another option to parse those files... >>>>>>>>> >>>>>>>>>regards >>>>>>>>>Hubert >>>>>>>> >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast 2.2.13 reports, but unless you post your blast report we can't really determine the problem.
>>>>>>>> >>>>>>>>If you are still getting the same error like this, I am not convinced you have upgraded to 1.5.1, which includes a fix for the fact that NCBI changed the HSP result format to remove the ':' from the Query/Sbjct prefixes. We fixed this as soon as it was apparent, sometime in September. >>>>>>>> >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>>STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>>>>>> >>>>>>>>If you are just getting no results but also no warnings with respect to parsing, are you sure your logic is correct? >>>>>>>> >>>>>>>>If you remove your filters, do you see all the HSPs?
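Jason's explanation suggests why Roger could not find "Query 1 WWWKWRW 7" behaving as expected: once NCBI dropped the ':' after `Query`/`Sbjct`, a pattern that requires the colon no longer recognizes the row. A minimal illustration, assuming single-line alignment rows made of label, start, sequence and end (the sub `parse_hsp_row` is hypothetical and is not the regex used inside Bio::SearchIO::blast), is to make the colon optional:

```perl
use strict;
use warnings;

# Hypothetical sketch: match an HSP alignment row with or without the colon
# after the label, e.g. "Query: 1 RWKWKRKK 8" (older layout) and
# "Query  1  WWWKWRW  7" (newer layout). Returns undef in scalar context
# for lines that are not Query/Sbjct rows, such as the midline.
sub parse_hsp_row {
    my ($line) = @_;
    return unless $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
    return { label => $1, start => $2, seq => $3, end => $4 };
}
```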
>>>>>>>> >>>>>>>>while (my $result = $search->next_result) { >>>>>>>> print $result->query_name, "\n"; >>>>>>>> #iterate over each hit on the query sequence >>>>>>>> while (my $hit = $result->next_hit) { >>>>>>>> print $hit->name, "\n"; >>>>>>>> #iterate over each HSP in the hit >>>>>>>> while (my $hsp = $hit->next_hsp) { >>>>>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n"; >>>>>>>> } >>>>>>>> } >>>>>>>>} >>>>>>>> >>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a script similar to the above. I removed the file parsing logic and it seemed to work just fine. It may very well be a logic issue, or he may not have installed the latest fix. >>>>>>> It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even though the returned output was from nr, the top of the blast output showed that it was v2.2.12: >>>>>>> >>>>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>>>> >>>>>>>I double-checked my local version and it's definitely v.2.2.13: >>>>>>>------------------------------------- >>>>>>>C:\Perl\Scripts>blastcl3 - >>>>>>> >>>>>>>blastcl3 2.2.13 arguments:...
>>>>>>>------------------------------------- >>>>>>> >>>>>>>If you use RemoteBlast with the same settings, the version in the header looks like this: >>>>>>> >>>>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>>> >>>>>>>I'm wondering if all the blast executables (blast and netblast) from NCBI have text output like v.2.2.12, while wwwblast outputs a new format (2.2.13). I'll ask blast-help at NCBI about this. >>>>>>> >>>>>>>>To clarify some stuff: >>>>>>>>Chris, I don't necessarily think XML is the best way forward for BLAST reports generated locally; it isn't as detailed as the text format, which is what most people expect to be able to scroll through and parse. It is also harder for the format to change dramatically if you have a static binary on your machine =). I think for remoteblast the XML format should be the way forward, but I expect Bioperl to maintain support of any plain text BLAST report format that people use on a regular basis. >>>>>>> >>>>>>>Does XML lack some specific info that text output has? Didn't know that. I believe that XML should be the default in RemoteBlast since it will not break, but I agree with you about text output. I also agree that it will need somebody to maintain it constantly, much like RemoteBlast.
>>>>>>>>-jason >>>>>>>> >>>>>>>>>Chris Fields wrote: >>>>>>>>> >>>>>>>>>>My guess is you're running into text parsing problems in Bio::SearchIO::blast. Upgrade to the latest developer version (1.5.1) or bioperl-live (CVS), then see the bug below. >>>>>>>>>> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>>> >>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1. The last problem (more recent, not related to the first) has been fixed but hasn't been committed to bioperl-live yet. The fixed SearchIO::blast is available in the link above, but realize it hasn't been committed yet and may change. >>>>>>>>>> >>>>>>>>>>Christopher Fields >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry >>>>>>>>>>University of Illinois Urbana-Champaign >>>>>>>>>> >>>>>>>>>>>-----Original Message----- >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>>>>>To: bioperl-l at bioperl.org >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output >>>>>>>>>>> >>>>>>>>>>>Hi, >>>>>>>>>>>When I try to parse Blast output (version 2.2.12) with Bio::SearchIO, I get the following error message: >>>>>>>>>>> >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>>STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>>>>>>>>> >>>>>>>>>>>Is that a bug? >>>>>>>>>>> >>>>>>>>>>>When I try to parse Blast output (version 2.2.13), I don't get anything.
>>>>>>>>>>>I'm using bioperl 1.4. >>>>>>>>>>> >>>>>>>>>>>Before I installed bioperl 1.4, parsing Blast output (version 2.2.12) worked fine, but I don't remember which bioperl version I had installed. >>>>>>>>>>> >>>>>>>>>>>thanks in advance >>>>>>>>>>> >>>>>>>>>>>Hubert >>>>>>>> >>>>>>>>-- >>>>>>>>Jason Stajich >>>>>>>>Duke University >>>>>>>>http://www.duke.edu/~jes12 >>>>>>> >>>>>>>Christopher Fields >>>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>>Dept.
of Biochemistry >>>>>>>University of Illinois Urbana-Champaign >>>Christopher Fields >>>Postdoctoral Researcher >>>Lab of Dr. Robert Switzer >>>Dept of Biochemistry >>>University of Illinois Urbana-Champaign >>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From cjfields at uiuc.edu Fri Feb 10 13:08:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 10 Feb 2006 12:08:37 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output In-Reply-To: <002701c62e6b$edd845b0$d416a790@LIBERAL> Message-ID: <002501c62e6d$04158530$15327e82@pyrimidine> Makes sense. I didn't see this since I passed the files directly from the command line. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Roger Hall [mailto:rahall2 at ualr.edu] > Sent: Friday, February 10, 2006 12:01 PM > To: 'Hubert Prielinger'; 'Pieter Monsieurs'; bioperl-l at bioperl.org; 'Chris Fields' > Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output > > Hubert, > > I got the same message when I first ran your script. The issue for me was that "readdir(DIR)" doesn't return the full path, only the file name. > > I edited your script to include: > > $file = $directory . '/' . $file; > > just before the Bio::SearchIO call. > > Roger > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, February 10, 2006 10:27 AM > To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; rahall2 at ualr.edu > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output > > Hi, > I'm sorry for disturbing you once more. Yesterday the script was working; today it isn't working at all, but I didn't change anything. I get the following error message: > > ------------- EXCEPTION ------------- > MSG: Could not open comp80swiss2114.txt: No such file or directory > STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273 > STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213 > STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135 > STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167 > STACK toplevel ./Blast.pl:14 > > -------------------------------------- > > The file exists, and I fixed the bug yesterday. Thanks for the help, > > Hubert > > Pieter Monsieurs wrote: > > > Sorry for disturbing you. It now works correctly with Chris's bug fix.
> > Thanks, > > Pieter > > > > Pieter Monsieurs wrote: > > > >>Hi Chris, > >> > >>The parsing of the Blast output still doesn't work for me with the downloaded bug-fix version of blast.pm. > >>The module keeps looping in the while loop at line 487, looking for a database or query size: > >> > >>while( defined ($_) ) { > >> if( /^Database:/ ) { > >> $self->_pushback($_); > >> last; > >> } > >> chomp; > >> if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) { > >> $size = $1; > >> $size =~ s/,//g; > >> last; > >> } else { > >> $q .= " $_"; > >> $q =~ s/ +/ /g; > >> $q =~ s/^ | $//g; > >> } > >> $_ = $self->_readline; > >>} > >> > >>The code keeps looking for the database information; however, as you mentioned, this information is given before the query line in the new Blast output format. > >>This way, all hits and hsps are stored in the query_description ($hit->query_description), no hits are found, and query_length is 0. > >>Because you already adapted the module to retrieve database information at another position in the module, deleting the while loop and adding the following lines after $_ = $self->_readline (line 486) worked fine for me (using blastn and blastp): > >> > >>if (/Length=([\d,]+)/) { > >> $size = $1; > >> $size =~ s/,//g; > >>} > >> > >>Regards, > >>Pieter > >> > >>Chris Fields wrote: > >> > >>>From 'perldoc Bio::SearchIO::blast': > >>> > >>>DESCRIPTION > >>> This object encapsulated the necessary methods for generating events suitable for building Bio::Search objects from a BLAST report file. Read the Bio::SearchIO for more information about how to use this. > >>> > >>> This driver can parse: > >>> > >>> o NCBI produced plain text BLAST reports from blastall, this also includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.
NCBI > >>> XML BLAST output is parsed with the blastxml SearchIO driver > >>> > >>> o WU-BLAST all reports > >>> > >>> o Jim Kent's BLAST-like output from his programs (BLASTZ, > >>>BLAT) > >>> > >>> o BLAST-like output from Paracel BTK output > >>> > >>>So, it should. Let us know if it doesn't. > >>> > >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote: > >>> > >>>>Hi Chris, > >>>>I'm incredibly sorry for causing so much inconvenience, yes you are > >>>>right, I had only to change the blast.pm file, it is working very > >>>>fine, thank you very much, and you are right, you have mentioned it > >>>>earlier either to change the file... ;) > >>>> > >>>>but I have another question: does it work with the WU-Blast output > >>>>too? > >>>>regards > >>>>Hubert > >>>> > >>>>Chris Fields wrote: > >>>> > >>>>>Ha! I come back from meeting and there's a billion emails! What > >>>>>have we started? ;p . Sorry about this Jason; I know you're busy. > >>>>> > >>>>>Hubert, if you're out there, I sent you an email with an > >>>>>attachment. You said the output looks like what you were > >>>>>expecting. So I think we have two > >>>>>problems: > >>>>> > >>>>>1) I haven't delved into the file scanning, but the fact that it > >>>>>takes so long should tell you something's seriously wrong there. > >>>>>Strip that part out and start with a simple script, say, like the > >>>>>one Jason or that I sent you; the script I used to generate that > >>>>>output works fine (on two OS's, WinXP and Mac OS X). Use it on one > >>>>>file at a time. Do everything on command line (not through > >>>>>Eclipse). IDE's can be notoriously flaky about running scripts, > >>>>>esp. when they run debugging.
> >>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast > >>>>>will still not work whenever the text blast output has the > >>>>>following header, which comes from the new web version of BLAST: > >>>>> > >>>>>----------------------------------------------------- > >>>>>BLASTP 2.2.13 [Nov-27-2005] > >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. > >>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. > >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of > >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402. > >>>>> > >>>>>RID: 1139501210-857-165793005128.BLASTQ1 > >>>>> > >>>>> > >>>>>Database: All non-redundant GenBank CDS > >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples > >>>>> 3,292,813 sequences; 1,128,164,434 total letters > >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium > >>>>>tuberculosis H37Rv]. > >>>>>Length=193 > >>>>>....... > >>>>>----------------------------------------------------- > >>>>> > >>>>>It will work if the text output has the following header (or is an > >>>>>older version of BLAST): > >>>>> > >>>>>----------------------------------------------------- > >>>>>BLASTP 2.2.12 [Aug-07-2005] > >>>>> > >>>>> > >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. > >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. > >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of > >>>>>protein database search programs", Nucleic Acids Res. > >>>>>25:3389-3402. > >>>>> > >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium > >>>>>tuberculosis H37Rv].
> >>>>> (193 letters) > >>>>> > >>>>>Database: All non-redundant GenBank CDS > >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples > >>>>> 2,895,325 sequences; 997,103,285 total letters > >>>>>----------------------------------------------------- > >>>>>You have the former (2.2.13) version. I know b/c I have your BLAST > >>>>>files. > >>>>>Therefore, even bioperl-1.5.1 will not work! > >>>>> > >>>>>If you want the really gory details on why this is a problem, look > >>>>>here: > >>>>> > >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>>>> > >>>>>So, any text output with the above header will not work; it will > >>>>>either hang or end abruptly (depending on OS, perl version, memory, > >>>>>patience). If you look at the bug report above, I have added a preliminary > >>>>>fix for this. I'll reiterate for the billionth time, it hasn't > >>>>>been committed yet, so don't kill me if it blows your computer up ;> > >>>>>Here's the direct link: > >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view > >>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's > >>>>>version 1.90, but it's lying, I didn't change the version, only the > >>>>>regex; sorry Jason). From what you've been posting it doesn't > >>>>>sound like you've tried this, and I believe I've suggested this fix > >>>>>before. > >>>>> > >>>>>Replace the one in your Bio/SearchIO directory (which looks like > >>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your > >>>>>prev. > >>>>>message) with this file. Make sure the filename stays the same > >>>>>(blast.pm). > >>>>> > >>>>>Run everything again, one file at a time. Make sure you use > >>>>>Jason's script as well as the one I sent you. Do NOT rely on > >>>>>running through multiple files yet. Fix one bug at a time. And > >>>>>heed Joel's words about file checks.
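The two headers Chris contrasts differ in where the query length appears. In 2.2.12 output it is an `(193 letters)` line under `Query=`. In the 2.2.13 web output it is `Length=193`, and the `Database:` block comes before `Query=`. Below is a standalone sketch of a length scan that accepts both forms, reusing the two regular expressions from the loop Pieter quotes earlier in this thread. `query_length` is a hypothetical helper for illustration, not the code in the patched blast.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: given the lines that follow "Query=", return the
# query length. Accepts both the older "(193 letters)" line and the newer
# "Length=193" line; commas in large numbers are removed.
sub query_length {
    my @lines = @_;
    for my $line (@lines) {
        if ($line =~ /\((-?[\d,]+)\s+letters.*\)/ || $line =~ /^Length=(-?[\d,]+)/) {
            (my $size = $1) =~ tr/,//d;
            return $size;
        }
        last if $line =~ /^Database:/;   # next section reached, give up
    }
    return;                              # no length line found
}
```

Accepting both spellings in one place, rather than assuming a fixed order of header blocks, is what keeps the scan from running past the query section when NCBI rearranges the header.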
> >>>>> > >>>>>Here's a small chunk of output from one of your blast files using > >>>>>the modified script I sent you: > >>>>> > >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 > >>>>>Query: 1 RWKWKRKK 8 > >>>>>Seq: 542 RWAWRRKK 549 > >>>>> > >>>>>Look familiar? > >>>>> > >>>>>Christopher Fields > >>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry > >>>>>University of Illinois Urbana-Champaign > >>>>> > >>>>>>-----Original Message----- > >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] > >>>>>>Sent: Thursday, February 09, 2006 3:24 PM > >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' > >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing > >>>>>>Blast output > >>>>>> > >>>>>>In other words, yes, I'm on the wrong trail. :} > >>>>>> > >>>>>>Sorry - I'll look at the output issue this evening (or realize > >>>>>>that Chris already solved the issue). ;} > >>>>>> > >>>>>>Thanks!
> >>>>>> > >>>>>>Roger > >>>>>> > >>>>>>-----Original Message----- > >>>>>>From: bioperl-l-bounces at lists.open-bio.org > >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>>>>>Prielinger > >>>>>>Sent: Thursday, February 09, 2006 2:14 PM > >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason > >>>>>>Stajich > >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing > >>>>>>Blast output > >>>>>> > >>>>>>dear roger, > >>>>>>this error message I got, when I tried to parse Blast output (version > >>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have > >>>>>>a lot of Blast output files with version 2.2.13 and for that I > >>>>>>don't get any error message.....it just doesn't work > >>>>>> > >>>>>>Hubert > >>>>>> > >>>>>>Roger Hall wrote: > >>>>>> > >>>>>>>Guys - I'm looking at the error message: > >>>>>>> > >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>>>STACK Bio::SearchIO::blast::next_result > >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>>>STACK toplevel > >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ > >>>>>>>Blast.pl:21 > >>>>>>> > >>>>>>>This is my line of thought: > >>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in one > >>>>>>>location only, at the point of a. reading three lines b. dropping lines > >>>>>>>with spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3) > >>>>>>>2. There is a regexp match that fails in order to reach that error message > >>>>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression > >>>>>>>4. It does anyway > >>>>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast > >>>>>>>reports > >>>>>>> > >>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string > >>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional > >>>>>>>file, assuming that I didn't have it. 
> >>>>>>> > >>>>>>>My next thought is to write a quick script to test perl behavior > >>>>>>>on "Fedora Core 9". > >>>>>>> > >>>>>>>Thoughts? > >>>>>>> > >>>>>>>Did I misread the issue entirely?
:} > >>>>>>> > >>>>>>>Roger > >>>>>>> > >>>>>>>-----Original Message----- > >>>>>>>From: bioperl-l-bounces at lists.open-bio.org > >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM > >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger' > >>>>>>>Cc: bioperl-l at bioperl.org > >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>>>>>>parsing Blast output > >>>>>>> > >>>>>>>>-----Original Message----- > >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] > >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM > >>>>>>>>To: Hubert Prielinger > >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org > >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>>>>>>>parsing Blast output > >>>>>>>> > >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > >>>>>>>> > >>>>>>>>>hi chris, > >>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working, > >>>>>>>>>do you have any other idea, the problem I have is that I have to parse > >>>>>>>>>a lot of textfiles.... > >>>>>>>>>or shall I look for another option to parse those files... > >>>>>>>>> > >>>>>>>>>regards > >>>>>>>>>Hubert > >>>>>>>>> > >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast > >>>>>>>>2.2.13 reports but unless you post your blast report we can't really > >>>>>>>>determine the problem. > >>>>>>>> > >>>>>>>>If you are still getting the same error like this I am not convinced > >>>>>>>>you have upgraded to 1.5.1 which includes a fix in the fact that NCBI > >>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct > >>>>>>>>prefixes. We fixed this as soon as it was apparent sometime in > >>>>>>>>September. 
> >>>>>>>> > >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result > >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>>>>>>>STACK toplevel > >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ > >>>>>>>>>>>Blast.pl:21 > >>>>>>>> > >>>>>>>>If you are just getting no results but also no warnings wrt parsing, > >>>>>>>>are you sure your logic is correct? > >>>>>>>> > >>>>>>>>If you remove your filters do you see all the HSPS?
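Jason attributes the "no data for midline" error to NCBI dropping the ':' after the `Query`/`Sbjct` labels on alignment lines. A pattern that requires `Query:` therefore no longer matches a line such as `Query 1 WWWKWRW 7`. Below is a minimal sketch of a pattern that treats the colon as optional. `parse_hsp_line` is hypothetical and is not the regex BioPerl 1.5.1 actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: split an HSP alignment line into its parts.
# The colon after Query/Sbjct is optional, so both
#   "Query: 1   WWWKWRW 7"   (older NCBI text output) and
#   "Query  1   WWWKWRW  7"  (newer NCBI text output)
# are accepted. Returns (label, start, residues, end) or an empty list.
sub parse_hsp_line {
    my ($line) = @_;
    return unless $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
    return ($1, $2, $3, $4);
}
```

A line that does not match, such as the midline between Query and Sbjct, returns an empty list, so the caller can tell "not an alignment line" apart from a parse error.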
> >>>>>>>> > >>>>>>>>while (my $result = $search->next_result) { > >>>>>>>> print $result->query_name, "\n"; > >>>>>>>> #iterate over each hit on the query sequence > >>>>>>>> while (my $hit = $result->next_hit) { > >>>>>>>> print $hit->name, "\n"; > >>>>>>>> #iterate over each HSP in the hit > >>>>>>>> while (my $hsp = $hit->next_hsp) { > >>>>>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n"; > >>>>>>>> } > >>>>>>>> } > >>>>>>>>} > >>>>>>>> > >>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a > >>>>>>>similar script to the above. I removed the file parsing logic and it seemed > >>>>>>>to work just fine. It may very well be a logic issue or that he hasn't > >>>>>>>installed the latest fix. > >>>>>>> It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), > >>>>>>>even though the returned output was from nr, the top of the blast > >>>>>>>output showed that it was v2.2.12: > >>>>>>> > >>>>>>>BLASTP 2.2.12 [Aug-07-2005] > >>>>>>> > >>>>>>>I double-checked my local version and it's definitely v.2.2.13: > >>>>>>>------------------------------------- > >>>>>>>C:\Perl\Scripts>blastcl3 - > >>>>>>> > >>>>>>>blastcl3 2.2.13 arguments:...
> >>>>>>>------------------------------------- > >>>>>>> > >>>>>>>If you use RemoteBlast using the same settings, the version in > >>>>>>>the header looks like this: > >>>>>>> > >>>>>>>BLASTP 2.2.13 [Nov-27-2005] > >>>>>>> > >>>>>>>I'm wondering if all the blast executables (blast and netblast) > >>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a new > >>>>>>>format (2.2.13). I'll ask blast-help at NCBI about this. > >>>>>>> > >>>>>>>>To clarify some stuff - > >>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST > >>>>>>>>reports generated locally, it isn't as detailed as the Text format and > >>>>>>>>it is what most people expect to be able to scroll through and parse > >>>>>>>>-- it is also harder for the format to change dramatically if you have > >>>>>>>>a static binary on your machine =). I think for remoteblast the XML > >>>>>>>>format should be the way forward but I expect Bioperl to > >>>>>>>>maintain support of any plain text BLAST report format that > >>>>>>>>people use on a regular basis. > >>>>>>>> > >>>>>>>Does XML lack some specific info that text output has? Didn't know that. I > >>>>>>>believe that XML should be default in RemoteBlast since it will > >>>>>>>not break, but I agree with you about text output. I also agree > >>>>>>>that it will need somebody to maintain it constantly, much like > >>>>>>>RemoteBlast. > >>>>>>> > >>>>>>>>-jason > >>>>>>>> > >>>>>>>>>Chris Fields wrote: > >>>>>>>>> > >>>>>>>>>>My guess is you're running into text parsing problems in > >>>>>>>>>>Bio::SearchIO::blast. Upgrade to the latest developer version > >>>>>>>>>>(1.5.1) or > >>>>>>>>>>bioperl-live (CVS), then see the bug below. > >>>>>>>>>> > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>>>>>>>>> > >>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1, > >>>>>>>>>>the last problem (more recent, not related to the first) has > >>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet. 
> >>>>>>>>>>The fixed SearchIO::blast is available in the link above, but realize it hasn't > >>>>>>>>>>been committed yet and may change. > >>>>>>>>>> > >>>>>>>>>>Christopher Fields > >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry > >>>>>>>>>>University of Illinois Urbana-Champaign > >>>>>>>>>> > >>>>>>>>>>>-----Original Message----- > >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org > >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>>>>>>>>>>Prielinger > >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM > >>>>>>>>>>>To: bioperl-l at bioperl.org > >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast > >>>>>>>>>>>output > >>>>>>>>>>> > >>>>>>>>>>>Hi, > >>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with > >>>>>>>>>>>Bio::SearchIO, I get the following error message: > >>>>>>>>>>> > >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result > >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>>>>>>>STACK toplevel > >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ > >>>>>>>>>>>Blast.pl:21 > >>>>>>>>>>> > >>>>>>>>>>>is that a bug...... > >>>>>>>>>>> > >>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get > >>>>>>>>>>>anything..... > >>>>>>>>>>>I'm using bioperl 1.4 > >>>>>>>>>>> > >>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast > >>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version > >>>>>>>>>>>I had installed > >>>>>>>>>>> > >>>>>>>>>>>thanks in advance > >>>>>>>>>>> > >>>>>>>>>>>Hubert > >>>>>>>>>>> > >>>>>>>>>>>_______________________________________________ > >>>>>>>>>>>Bioperl-l mailing list > >>>>>>>>>>>Bioperl-l at lists.open-bio.org > >>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>>>>> > >>>>>>>>>_______________________________________________ > >>>>>>>>>Bioperl-l mailing list > >>>>>>>>>Bioperl-l at lists.open-bio.org > >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>>> > >>>>>>>>-- > 
>>>>>>>>Jason Stajich > >>>>>>>>Duke University > >>>>>>>>http://www.duke.edu/~jes12 > >>>>>>>> > >>>>>>>Christopher Fields > >>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry > >>>>>>>University of Illinois Urbana-Champaign > >>>>>>> > >>>>>>>_______________________________________________ > >>>>>>>Bioperl-l mailing list > >>>>>>>Bioperl-l at lists.open-bio.org > >>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>>_______________________________________________ > >>>>>>Bioperl-l mailing list > >>>>>>Bioperl-l at lists.open-bio.org > >>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>Christopher Fields > >>>Postdoctoral Researcher > >>>Lab of Dr. Robert Switzer > >>>Dept of Biochemistry > >>>University of Illinois Urbana-Champaign > >>> > >>>_______________________________________________ > >>>Bioperl-l mailing list > >>>Bioperl-l at lists.open-bio.org > >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm > >> > >>_______________________________________________ > >>Bioperl-l mailing list > >>Bioperl-l at lists.open-bio.org > >>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more > > information. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From victor.ruotti at gmail.com Fri Feb 10 15:09:16 2006 From: victor.ruotti at gmail.com (Victor) Date: Fri, 10 Feb 2006 14:09:16 -0600 Subject: [Bioperl-l] Running BLAT with BioPerl In-Reply-To: References: Message-ID: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com> Hi Jason, Well, in my env. BLATDIR was not setup at all. When setting BLATDIR to /usr/local/bin, I get the same problem.
I think this might have to do with the _run internal method/sub. If you look at that subroutine, you'll see that it is using both $self->executable and $self->program_name. The test passes fine, but we might need to write a better test for this particular case. Instead of saying: my $str= Bio::Root::IO->catfile($self->executable,$self->program_name); I think the author meant to say: my $str= Bio::Root::IO->catfile($self->program_dir,$self->program_name); I quickly used Data::Dumper on both executable and program_name and this is what I get: $VAR1 = 'blat'; $VAR1 = 'blat'; So the path is hardcoded to be /usr/local/bin/blat/blat when calling run through the factory. I'd like to change the constructor a bit to deal with the params a little better and include a config file using Config::General. Also, I noticed that there is another Blat.pm module, a parser module. Should we integrate this parser with the blat run module? Brian/Jason. Does that sound like a good idea? Victor On 2/10/06, Jason Stajich wrote: > > brian - just FYI - > > The AUTOLOAD stuff is present in a great number of the run modules so this > is standard per se in that set. > > I think Victor's problem may have been the BLATDIR env variable pointing > to /usr/local/bin/blat instead of /usr/local/bin - is that the case victor? > > tests passed for me before I did the 1.5.1 release for this module so it > basically works. It definitely needs a caretaker as a lot of these run > modules were built during the fugu group annotation project and never got > audited/revised after that. > > > -jason > On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote: > > Victor, > > Fantastic, this is certainly a module in need, in fact there was already a > note on this in the Wiki, I'll update it: > > http://bioperl.open-bio.org/wiki/Orphan_modules > > So all I did was: > > >cd bioperl-run > >perl -I. -w t/Blat.t > > This is the most recent bioperl-run, the live version, and all tests > passed.
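Victor's reading of `_run` is that joining `$self->executable` with `$self->program_name` repeats the binary name, because the executable value already names it. The result is `/usr/local/bin/blat/blat`. Below is a sketch of the intended directory-plus-name construction, assuming the directory comes from something like the `BLATDIR` environment variable and that an unset directory means "look the program up on PATH". `blat_command_path` is a hypothetical helper, not a method of Bio::Tools::Run::Alignment::Blat.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the path to the BLAT binary from a directory
# (e.g. $ENV{BLATDIR}) and the program name. The directory and the name are
# joined exactly once, so the name is never appended to a path that already
# ends in it (the "/usr/local/bin/blat/blat" symptom).
sub blat_command_path {
    my ($dir, $name) = @_;
    $name = 'blat' unless defined $name;
    return $name unless defined $dir && length $dir;   # fall back to $PATH
    return File::Spec->catfile($dir, $name);
}

# e.g. my $cmd = blat_command_path($ENV{BLATDIR}, 'blat');
```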
I'd downloaded the most recent binaries and put them in my > /usr/local/bin, already in my PATH. That's it. > > That's the saddest looking new() I've ever seen in Bioperl, a mixture of > named and unnamed parameters like that, how bizarre. The "proper" way, of > course, is to use _rearrange, and not use AUTOLOAD. > > Thanks again, > > Brian O. > > > On 2/10/06 11:02 AM, "Victor" wrote: > > Brian, > I'd be happy to do that. Can you send me a quick snap on how you got it to > work first. I'd like to see what is working first, before I start fixing > things. > > And yes I'll take a look at the Blat.t to see more on it. > > Victor > > > On 2/9/06, *Brian Osborne* wrote: > > Victor, > > Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is > working for me even though I haven't set BLATDIR. This is using the latest > blat, v. 33. > > There is a problem here though, you can see it if you read Blat.t. The > constructor does not look like your usual new(): > > my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet' => 1, > > -verbose => $verbose, > "DB" => $db); > > Unfortunate - would you be willing to do more than add a useful SYNOPSIS > and > actually fix new()? There is a subtext here, we're trying to find people > who > would be willing to maintain useful modules like these, the ideal person > in > this case would be someone who'd regularly use the module. > > Brian O. > > > On 2/9/06 6:22 PM, "Victor" wrote: > > > Hi, > > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to > date > > in the lastest bioperl release? 
> > > > > > > > use Bio::Tools::Run::Alignment::Blat; > > my $factory = Bio::Tools::Run::Alignment::Blat->new(); > > my $seq = > > "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA"; > > > > my @feats = $factory->run( $seq); > > > > Here is what I get when tring to use it: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Blat call (/usr/local/bin/blat/blat -out=blast TGAAATAAAACTCAGTA > > /tmp/fB09bp5F76) crashed: -1 > > > > Notice that it is using "blat' twice in the path. The way that I fixed > this > > is by going to the blat.pm module and > changing the following lines: > > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name); > > my $str= Bio::Root::IO->catfile($self->program_name); > > > > Any ideas, maybe I'm missing the $ENV variable somewhere? > > I'd like to avoid making this change. Also does anyone have a known > synopsis > > of this blat module (where to set the parameters, and whether it allows > you > > to have a config file). > > I'll be happy to add a better synopsis to the module if needed. > > > > Thanks in advance, > > Victor > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > From jason.stajich at duke.edu Fri Feb 10 15:36:04 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 10 Feb 2006 15:36:04 -0500 Subject: [Bioperl-l] Running BLAT with BioPerl In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com> References: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com> Message-ID: <7F520AFA-84C9-485B-A408-7A9DEFC1186E@duke.edu> On Feb 10, 2006, at 3:09 PM, Victor wrote: > Hi Jason, > Well, in my env. BLATDIR was not setup at all. When setting BLATDIR > to /usr/local/bin, I get the same problem. 
I think this might have > to do with the _run internal method/sub. If you look at that > subroutine, you'll see that it is using both $self->executable and > $self->program_name. The test passes fine, but we might need to > write a better test for this particular case. > > Instead of saying: > my $str= Bio::Root::IO->catfile($self->executable,$self- > >program_name); > I think the author meant to say: > my $str= Bio::Root::IO->catfile($self->program_dir,$self- > >program_name); > > I quickly used Data::Dumper on both executate and program_name and > this is what I get: > $VAR1 = 'blat'; > $VAR1 = 'blat'; > > So the path is hardcoded to be /usr/local/bin/blat/blat when > calling run though factory. > Hmm are you sure you are looking at the 1.5.1 code and/or what is in CVS? > I'd like to change the constructor a bit to deal with the params a > little better and include a config file using > Config::General. Also, I noticed that there is a another Blat.pm > module, a parser module. Should we integrate this parser with the > blat run module? > Well maybe as another parser option - I believe I added/edited it to use the PSL parser in Bio::SearchIO is that not what you see? Ick there are also some system commands in this module too which need to be removed and replaced with File::Copy or figure out how to remove them all together. > Brian/Jason. Does that sound like a good idea? But yes it needs some TLC I'm not sure I know enough about Config::General to say yes or no - but all of the run modules need some help in standardization so I would propose trying to integrate some changes into the base class (WrapperBase) that can be utilized by all the sub-classes -- if you want to use this as a model for how to do it that would be great too. thx, -j > > Victor > > > On 2/10/06, Jason Stajich wrote: > brian - > just FYI - > > The AUTOLOAD stuff is present a great number of the run modules so > this is standard per se in that set. 
> > I think Victor's problem may have been the BLATDIR env variable > pointing to /usr/local/bin/blat instead of /usr/local/bin - is that > the case victor? > > tests passed for me before I did the 1.5.1 release for this module > so it basically works. It definitely needs a caretaker as a lot of > these run modules were built during the fugu group annotation > project and never got audited/revised after that. > > > -jason > > On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote: > >> Victor, >> >> Fantastic, this is certainly a module in need, in fact there was >> already a note on this in the Wiki, I'll update it: >> >> http://bioperl.open-bio.org/wiki/Orphan_modules >> >> So all I did was: >> >> >cd bioperl-run >> >perl -I. -w t/Blat.t >> >> This is the most recent bioperl-run, the live version, and all >> tests passed. I'd downloaded the most recent binaries and put them >> in my /usr/local/bin, already in my PATH. That's it. >> >> That's the saddest looking new() I've ever seen in Bioperl, a >> mixture of named and unnamed parameters like that, how bizarre. >> The "proper" way, of course, is to use _rearrange, and not use >> AUTOLOAD. >> >> Thanks again, >> >> Brian O. >> >> >> On 2/10/06 11:02 AM, "Victor" wrote: >> >>> Brian, >>> I'd be happy to do that. Can you send me a quick snap on how you >>> got it to work first. I'd like to see what is working first, >>> before I start fixing things. >>> >>> And yes I'll take a look at the Blat.t to see more on it. >>> >>> Victor >>> >>> >>> On 2/9/06, Brian Osborne < osborne1 at optonline.net> wrote: >>>> Victor, >>>> >>>> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is >>>> working for me even though I haven't set BLATDIR. This is using >>>> the latest >>>> blat, v. 33. >>>> >>>> There is a problem here though, you can see it if you read >>>> Blat.t.
The >>>> constructor does not look like your usual new(): >>>> >>>> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet' => 1, >>>> >>>> -verbose => $verbose, >>>> "DB" => $db); >>>> >>>> Unfortunate - would you be willing to do more than add a useful >>>> SYNOPSIS and >>>> actually fix new()? There is a subtext here, we're trying to >>>> find people who >>>> would be willing to maintain useful modules like these, the >>>> ideal person in >>>> this case would be someone who'd regularly use the module. >>>> >>>> Brian O. >>>> >>>> >>>> On 2/9/06 6:22 PM, "Victor" wrote: >>>> >>>> > Hi, >>>> > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module >>>> is up to date >>>> > in the latest bioperl release? >>>> > >>>> > >>>> > >>>> > use Bio::Tools::Run::Alignment::Blat; >>>> > my $factory = Bio::Tools::Run::Alignment::Blat->new(); >>>> > my $seq = >>>> > >>>> "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA"; >>>> > >>>> > my @feats = $factory->run( $seq); >>>> > >>>> > Here is what I get when trying to use it: >>>> > >>>> > ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> > MSG: Blat call (/usr/local/bin/blat/blat -out=blast >>>> TGAAATAAAACTCAGTA >>>> > /tmp/fB09bp5F76) crashed: -1 >>>> > >>>> > Notice that it is using "blat" twice in the path. The way that >>>> I fixed this >>>> > is by going to the blat.pm module and >>>> changing the following lines: >>>> > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name); >>>> > my $str= Bio::Root::IO->catfile($self->program_name); >>>> > >>>> > Any ideas, maybe I'm missing the $ENV variable somewhere? >>>> > I'd like to avoid making this change. Also does anyone have a >>>> known synopsis >>>> > of this blat module (where to set the parameters, and whether >>>> it allows you >>>> > to have a config file). 
>>>> > I'll be happy to add a better synopsis to the module if needed.
>>>> >>>> > >>>> > Thanks in advance, >>>> > Victor >>>> > >>>> > _______________________________________________ >>>> > Bioperl-l mailing list >>>> > Bioperl-l at lists.open-bio.org >>>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From hlapp at gmx.net Fri Feb 10 16:39:39 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 10 Feb 2006 13:39:39 -0800 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: <000001c62e60$9acecca0$c2987ca5@pc13> References: <000001c62e60$9acecca0$c2987ca5@pc13> Message-ID: Sohel, please allow me to copy the list in my response. There are many good and insightful people on the list who may have something to add or different ideas. I've come across that problem myself, for instance with InterPro. What I've done so far simply is to stick it unstructured into the definition slot, which is not helpful if your purpose goes further than just displaying it in an unstructured fashion. I'm not sure you would want to create another class for this (like AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the implementation, probably not the interface) annotatable (i.e., implement Bio::AnnotatableI), which supposedly would be simple to do (AnnotationCollection is already implemented, you'd just return an instance of it). Even though tag/value pairs sound like a quick&fast way to go I'm leaning against it; in essence we're moving away from that elsewhere (SeqFeatureI) and hence I don't think we should restart it here. I'm not giving a definitive answer here, just my (initial) thoughts. Hope that helps nonetheless. Can you fancy yourself trying the Annotatable approach and let us know how it goes? -hilmar On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > Hi Hilmar, > How are you doing?
I am Sohel Merchant, a programmer at dictyBase, > Northwestern University. I am working on a parser for an ontology > file. I really like the ontology object model which you have > contributed to Bioperl. I think it's just Awesome!! One of the things which > I thought would be great to capture is the ontology headers. Right now > one can specify only the name, authority information. I was wondering > if there is any way I could also capture other ontology file headers > like version of the file, date when that ontology file was made. I was > thinking of making a header class or alternatively it could go as a Hash > of values in the Bio::Ontology::Ontology class itself. I wanted to > know what your thoughts are on this. > > Thanks, > Sohel Merchant > dictyBase > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From osborne1 at optonline.net Fri Feb 10 16:49:18 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 10 Feb 2006 16:49:18 -0500 Subject: [Bioperl-l] Running BLAT with BioPerl In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com> Message-ID: Victor, Just a note on "convention", excuse me if this is obvious. A few different greps on the modules in bioperl-run show that executable() gets or sets the full path to the program in question, program() or program_name() gets or sets the name of the app (e.g. "blat"). program_dir() does what it sounds like. So you're right, "($self->executable,$self->program_name)" doesn't make sense. I can't speak to Config::General but I'd say that my first concern would be that the thing works in the normal way, either by naming parameters or by passing an array of arguments, but not a mixture of both!
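A minimal core-Perl sketch of the accessor convention described above, assuming Bio::Root::IO->catfile behaves like File::Spec->catfile for this purpose. The directory and program name are hypothetical values, not anything read from Blat.pm. It shows how joining the full executable path with the program name a second time produces the doubled /usr/local/bin/blat/blat path from Victor's error, while joining directory and name does not:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical accessor values following the convention above:
# program_dir() is the directory, program_name() the bare name,
# executable() the full path built from the two.
my %wrapper = (
    program_dir  => '/usr/local/bin',
    program_name => 'blat',
);
$wrapper{executable} =
    File::Spec->catfile($wrapper{program_dir}, $wrapper{program_name});

# Buggy form: full path joined with the name again doubles "blat".
my $buggy = File::Spec->catfile($wrapper{executable}, $wrapper{program_name});

# Intended form: directory joined with name (or just use executable() as-is).
my $fixed = File::Spec->catfile($wrapper{program_dir}, $wrapper{program_name});

print "buggy: $buggy\n";    # on Unix: buggy: /usr/local/bin/blat/blat
print "fixed: $fixed\n";    # on Unix: fixed: /usr/local/bin/blat
```

Using executable() on its own would avoid the join entirely, which matches Brian's reading that executable() already holds the full path.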
Of course you're right in thinking that tying execution to parsing is a good idea, and it looks like this is done already, just glancing at t/Blat.t. Brian O. On 2/10/06 3:09 PM, "Victor" wrote: > Hi Jason, > Well, in my env. BLATDIR was not set up at all. When setting BLATDIR to > /usr/local/bin, I get the same problem. I think this might have to do with > the _run internal method/sub. If you look at that subroutine, you'll see > that it is using both $self->executable and $self->program_name. The test > passes fine, but we might need to write a better test for this particular > case. > > Instead of saying: > my $str= Bio::Root::IO->catfile($self->executable,$self->program_name); > I think the author meant to say: > my $str= > Bio::Root::IO->catfile($self->program_dir,$self->program_name); > > I quickly used Data::Dumper on both executable and program_name and this is > what I get: > $VAR1 = 'blat'; > $VAR1 = 'blat'; > > So the path is hardcoded to be /usr/local/bin/blat/blat when calling run > through the factory. > > I'd like to change the constructor a bit to deal with the params a little > better and include a config file using > Config::General. Also, I noticed that there is another Blat.pm module, a > parser module. Should we integrate this parser with the blat run module? > > Brian/Jason. Does that sound like a good idea? > > Victor > > > On 2/10/06, Jason Stajich wrote: >> >> brian - just FYI - >> >> The AUTOLOAD stuff is present in a great number of the run modules so this >> is standard per se in that set. >> >> I think Victor's problem may have been the BLATDIR env variable pointing >> to /usr/local/bin/blat instead of /usr/local/bin - is that the case, Victor? >> >> tests passed for me before I did the 1.5.1 release for this module so it >> basically works. It definitely needs a caretaker as a lot of these run >> modules were built during the fugu group annotation project and never got >> audited/revised after that.
>> >> >> -jason >> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote: >> >> Victor, >> >> Fantastic, this is certainly a module in need, in fact there was already a >> note on this in the Wiki, I'll update it: >> >> http://bioperl.open-bio.org/wiki/Orphan_modules >> >> So all I did was: >> >>> cd bioperl-run >>> perl -I. -w t/Blat.t >> >> This is the most recent bioperl-run, the live version, and all tests >> passed. I'd downloaded the most recent binaries and put them in my >> /usr/local/bin, already in my PATH. That's it. >> >> That's the saddest looking new() I've ever seen in Bioperl, a mixture of >> named and unnamed parameters like that, how bizarre. The "proper" way, of >> course, is to use _rearrange, and not use AUTOLOAD. >> >> Thanks again, >> >> Brian O. >> >> >> On 2/10/06 11:02 AM, "Victor" wrote: >> >> Brian, >> I'd be happy to do that. Can you send me a quick snap of how you got it to >> work first? I'd like to see what is working first, before I start fixing >> things. >> >> And yes I'll take a look at the Blat.t to see more on it. >> >> Victor >> >> >> On 2/9/06, Brian Osborne wrote: >> >> Victor, >> >> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is >> working for me even though I haven't set BLATDIR. This is using the latest >> blat, v. 33. >> >> There is a problem here though, you can see it if you read Blat.t. The >> constructor does not look like your usual new(): >> >> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet' => 1, >> >> -verbose => $verbose, >> "DB" => $db); >> >> Unfortunate - would you be willing to do more than add a useful SYNOPSIS >> and >> actually fix new()? There is a subtext here, we're trying to find people >> who >> would be willing to maintain useful modules like these, the ideal person >> in >> this case would be someone who'd regularly use the module. >> >> Brian O.
>> >> >> On 2/9/06 6:22 PM, "Victor" wrote: >> >>> Hi, >>> Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to >> date >>> in the latest bioperl release? >>> >>> >>> >>> use Bio::Tools::Run::Alignment::Blat; >>> my $factory = Bio::Tools::Run::Alignment::Blat->new(); >>> my $seq = >>> "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA"; >>> >>> my @feats = $factory->run( $seq); >>> >>> Here is what I get when trying to use it: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: Blat call (/usr/local/bin/blat/blat -out=blast TGAAATAAAACTCAGTA >>> /tmp/fB09bp5F76) crashed: -1 >>> >>> Notice that it is using "blat" twice in the path. The way that I fixed >> this >>> is by going to the blat.pm module and >> changing the following lines: >>> #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name); >>> my $str= Bio::Root::IO->catfile($self->program_name); >>> >>> Any ideas, maybe I'm missing the $ENV variable somewhere? >>> I'd like to avoid making this change. Also does anyone have a known >> synopsis >>> of this blat module (where to set the parameters, and whether it allows >> you >>> to have a config file). >>> I'll be happy to add a better synopsis to the module if needed.
>>> >>> Thanks in advance, >>> Victor >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> >> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki at sanbi.ac.za Sat Feb 11 01:54:51 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Sat, 11 Feb 2006 08:54:51 +0200 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: References: <000001c62e60$9acecca0$c2987ca5@pc13> Message-ID: <200602110854.52116.heikki@sanbi.ac.za> I second Hilmar's suggestion to use Bio::Annotation::Collection for database (ontology database in this case) metadata. While you are at it, why not define or use an existing (?) public ontology to do that? ;-) -Heikki On Friday 10 February 2006 23:39, Hilmar Lapp wrote: > Sohel, > > please allow me to copy the list in my response. There are many good and > insightful people on the list who may have something to add or > different ideas. > > I've come across that problem myself, for instance with InterPro. What > I've done so far simply is to stick it unstructured into the definition > slot, which is not helpful if your purpose goes further than just > displaying it in an unstructured fashion. > > I'm not sure you would want to create another class for this (like > AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the > implementation, probably not the interface) annotatable (i.e., > implement Bio::AnnotatableI), which supposedly would be simple to do > (AnnotationCollection is already implemented, you'd just return an > instance of it).
> > Even though tag/value pairs sound like a quick&fast way to go I'm leaning > against it; in essence we're moving away from that elsewhere > (SeqFeatureI) and hence I don't think we should restart it here. > > I'm not giving a definitive answer here, just my (initial) thoughts. > Hope that helps nonetheless. Can you fancy yourself trying the > Annotatable approach and let us know how it goes? > > -hilmar > > On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > > Hi Hilmar, > > How are you doing? I am Sohel Merchant, a programmer at dictyBase, > > Northwestern University. I am working on a parser for an ontology > > file. I really like the ontology object model which you have > > contributed to Bioperl. I think it's just Awesome!! One of the things which > > I thought would be great to capture is the ontology headers. Right now > > one can specify only the name, authority information. I was wondering > > if there is any way I could also capture other ontology file headers > > like version of the file, date when that ontology file was made. I was > > thinking of making a header class or alternatively it could go as a Hash > > of values in the Bio::Ontology::Ontology class itself. I wanted to > > know what your thoughts are on this. > >
> > Thanks, > > Sohel Merchant > > dictyBase -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From hlapp at gmx.net Sun Feb 12 00:10:35 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 11 Feb 2006 21:10:35 -0800 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: <000001c62e9a$4f82eee0$c2987ca5@pc13> References: <000001c62e9a$4f82eee0$c2987ca5@pc13> Message-ID: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net> Sohel, please do keep the discussion on the list, in your own interest as there's a multitude of people who can respond to you. SimpleValue would probably be what I'd use too. As Heikki hinted you might even create an ontology for annotating ontologies, which would allow you to use Annotation::OntologyTerm for annotation, but then there's no qualifier value ... Bioperl 1.5.1 has been released last year, please check the website. -hilmar On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote: > Hi Hilmar, > I really like your suggestion of implementing the Bio::AnnotatableI > interface in the Bio::Ontology::Ontology class. I am going to implement > this and play around a little with it. I am planning to use > Bio::Annotation::SimpleValue for annotating the header as it provides a > good way of specifying the Tag/value pair. What are your thoughts on > using this? > > Also, I was wondering if you have any idea about the scheduled date > for the Bioperl 1.51 release. I would like to contribute some stuff in > the next release. > > Thanks, > Sohel. 
> > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Friday, February 10, 2006 3:40 PM > To: Sohel Merchant > Cc: Bioperl > Subject: Re: Bio::Ontology::Ontology > > Sohel, > > please allow me to copy the list in my response. There are many good and > insightful people on the list who may have something to add or > different ideas. > > I've come across that problem myself, for instance with InterPro. What > I've done so far simply is to stick it unstructured into the definition > slot, which is not helpful if your purpose goes further than just > displaying it in an unstructured fashion. > > I'm not sure you would want to create another class for this (like > AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the > implementation, probably not the interface) annotatable (i.e., > implement Bio::AnnotatableI), which supposedly would be simple to do > (AnnotationCollection is already implemented, you'd just return an > instance of it). > > Even though tag/value pairs sound like a quick&fast way to go I'm leaning > against it; in essence we're moving away from that elsewhere > (SeqFeatureI) and hence I don't think we should restart it here. > > I'm not giving a definitive answer here, just my (initial) thoughts. > Hope that helps nonetheless. Can you fancy yourself trying the > Annotatable approach and let us know how it goes? > > -hilmar > > > On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > >> Hi Hilmar, >> How are you doing? I am Sohel Merchant, a programmer at dictyBase, >> Northwestern University. I am working on a parser for an ontology >> file. I really like the ontology object model which you have >> contributed to Bioperl. I think it's just Awesome!! One of the things which > >> I thought would be great to capture is the ontology headers. Right now > >> one can specify only the name, authority information.
I was wondering >> if there is any way I could also capture other ontology file headers >> like version of the file, date when that ontology file was made. I was > >> thinking of making a header class or alternatively it could go as a Hash > >> of values in the Bio::Ontology::Ontology class itself. I wanted to >> know what your thoughts are on this. >> >> Thanks, >> Sohel Merchant >> dictyBase >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hjm at tacgi.com Sun Feb 12 01:46:38 2006 From: hjm at tacgi.com (Harry Mangalam) Date: Sat, 11 Feb 2006 22:46:38 -0800 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs Message-ID: <200602112246.38926.hjm@tacgi.com> Hi All, After perusing the tutorial and other docs for an evening, I still can't find the answer to this. Forgive me if I've missed something obvious. This should not be a novel request, but I've not found it answered. If bioperl isn't the best way to do this, I'd be grateful for a pointer to a better way, especially if it includes an illuminating bit of code. The problem is to retrieve genomic sequences plus & minus some offset from a locus determined by HUGO keyword or GeneID. This would be a common followup chore for some extra analysis from a gene expression expt. Or maybe this is in the DBFetch routines, but I've missed the sequence type to specify...? TIA!
-- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com From osborne1 at optonline.net Sun Feb 12 11:37:39 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sun, 12 Feb 2006 11:37:39 -0500 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: <200602112246.38926.hjm@tacgi.com> Message-ID: Harry, Hope you're doing well. The approach could be based on Bio::DB::Fasta. So, from its documentation:

    use Bio::DB::Fasta;

    # create database from directory of fasta files
    my $db = Bio::DB::Fasta->new('/path/to/fasta/files');

    # simple access (for those without Bioperl)
    my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
    my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
    my @ids = $db->ids;
    my $length = $db->length('CHROMOSOME_I');
    my $alphabet = $db->alphabet('CHROMOSOME_I');
    my $header = $db->header('CHROMOSOME_I');

    # Bioperl-style access
    my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
    my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
    my $seq = $obj->seq;
    my $subseq = $obj->subseq(4_000_000 => 4_100_000);

Do you already have the offsets? Brian O. On 2/12/06 1:46 AM, "Harry Mangalam" wrote: > Hi All, > > After perusing the tutorial and other docs for an evening, I still can't > find the answer to this. Forgive me if I've missed something obvious. > > This should not be a novel request, but I've not found it answered. If > bioperl isn't the best way to do this, I'd be grateful for a pointer to a > better way, especially if it includes an illuminating bit of code. > > The problem is to retrieve genomic sequences plus & minus some offset from a > locus determined by HUGO keyword or GeneID. This would be a common followup > chore for some extra analysis from a gene expression expt. Or maybe this is > in the DBFetch routines, but I've missed the sequence type to specify...? > > > TIA!
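If the locus coordinates are already known (which is what Brian's question about offsets is getting at; mapping a HUGO symbol or GeneID to coordinates is not covered here), the "plus & minus some offset" step reduces to clamping a window to the chromosome ends. The core-Perl sketch below uses a hypothetical flank_window() helper and a toy in-memory sequence so it runs without a FASTA database. With Bio::DB::Fasta, the resulting coordinates would instead be passed to $db->seq($id, $from => $to), as in the documentation excerpt.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Hypothetical helper: 1-based, inclusive window around a locus,
# extended by $offset on each side and clamped to the chromosome ends.
sub flank_window {
    my ($start, $end, $offset, $chrom_length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;    # accept either order
    return ( max(1, $start - $offset), min($chrom_length, $end + $offset) );
}

# Toy 100 bp "chromosome" so the sketch runs without a FASTA database.
my $chrom = 'ACGT' x 25;

my ($from, $to) = flank_window(10, 20, 5, length $chrom);

# substr() is 0-based, so shift the 1-based start down by one.
my $window = substr($chrom, $from - 1, $to - $from + 1);

print "$from-$to $window\n";    # prints "5-25 ACGTACGTACGTACGTACGTA"
```

For a gene on the minus strand, the $revseq line in the excerpt suggests that passing the coordinates to seq() in reverse order returns the reverse strand. The helper itself always returns them in ascending order.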
From pmiguel at purdue.edu Sun Feb 12 15:05:47 2006 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Sun, 12 Feb 2006 15:05:47 -0500 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL> References: <004301c62db4$c9bcbab0$d416a790@LIBERAL> Message-ID: <43EF951B.4030601@purdue.edu> Roger, Just a data point, but in case you were not already aware of it, the characters W, K and R may be included in some DNA sequences. 'W' means 'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember correctly. These are ambiguous bases, where a basecaller isn't sure, for example, whether a particular peak is an A or a T. Although I see these ambiguous bases less frequently these days, even common modern basecallers (such as Applied Biosystems basecallers) can generally be configured so they will generate them. Downstream applications may not like them, however. I may be just stating the obvious, or this might be irrelevant to the issue at hand. If so, my apologies. Phillip Roger Hall wrote: > Guys - I'm looking at the error message: > > MSG: no data for midline Query 1 WWWKWRW 7 > STACK Bio::SearchIO::blast::next_result > /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > STACK toplevel > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > > This is my line of thought: > 1. "no data for midline $_" is a unique message generated by blast.pm in one > location only at the point of a. reading three lines b. dropping lines with > spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3) > 2. There is a regexp match that fails in order to reach that error message > 3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression > 4. It does anyway > 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast > reports > > I suspect a newline/chomp/metacharacter issue. 
Not finding the string > anywhere has me thoroughly confused - I asked Hubert for the additional > file, assuming that I didn't have it. > > My next thought is to write a quick script to test perl behavior on "Fedora > Core 9". > > Thoughts? > > Did I misread the issue entirely? :} > > Roger > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, February 09, 2006 10:16 AM > To: 'Jason Stajich'; 'Hubert Prielinger' > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast > output > > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Thursday, February 09, 2006 9:13 AM >> To: Hubert Prielinger >> Cc: Chris Fields; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >> parsing Blast output >> >> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >>> hi chris, >>> thanks, I have upgraded to version 1.5.1 but it isn't still >>> >> working, >> >>> do you have any ohter idea, the problem I have is that I >>> >> have to parse >> >>> a lot of textfiles.... >>> or shall I look for another option to parse those files... >>> >>> regards >>> Hubert >>> >> The code from Bioperl 1.5.1 works fine for me for blast >> 2.2.13 reports but unless you post your blast report we can't >> really determine the problem. >> >> If you are still getting the same error like this I am not >> convinced you have upgraded to 1.5.1 which includes a fix in >> the fact that NCBI changed the HSP result format to remove >> the ':' from the Query/Sbjct prefixes. We fixed this as soon >> as it was apparent sometime in September. 
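Jason's point about NCBI dropping the ':' from the Query/Sbjct prefixes can be illustrated with a small core-Perl sketch. The pattern and the parse_hsp_line() helper are hypothetical and are not the regular expression Bio::SearchIO::blast actually uses. They show only how making the colon optional (":?") lets one pattern accept both the older "Query: 1 ..." and the newer "Query 1 ..." alignment lines, while a midline (which has no prefix) is rejected.

```perl
use strict;
use warnings;

# Hypothetical pattern for a Query/Sbjct alignment line (not blast.pm's).
# ":?" makes the colon after the prefix optional, so both styles match.
my $HSP_LINE = qr/^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;

# Returns (prefix, start, aligned string, end), or an empty list.
sub parse_hsp_line {
    my ($line) = @_;
    return $line =~ $HSP_LINE ? ($1, $2, $3, $4) : ();
}

for my $line ('Query: 1   WWWKWRW 7', 'Query 1 WWWKWRW 7', '         WWWKWRW') {
    my @f   = parse_hsp_line($line);
    my $msg = @f ? "matched: @f" : "no match: '$line'";
    print "$msg\n";
}
```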
>> >> >>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>> STACK Bio::SearchIO::blast::next_result >>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>> STACK toplevel >>>>> >>>>> >> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >> If you are just getting no results but also no warnings wrt >> parsing, are you sure your logic is correct? >> >> If you remove your filters do you see all the HSPs? >> >>
>> while (my $result = $search->next_result) {
>>     print $result->query_name, "\n";
>>     #iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>         print $hit->name, "\n";
>>         #iterate over each HSP in the hit
>>         while (my $hsp = $hit->next_hsp) {
>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
>>         }
>>     }
>> }
>> > > I tested some of the BLAST results that Hubert sent Roger and me with a > similar script to the above. I removed the file parsing logic and it seemed > to work just fine. It may very well be a logic issue or that he hasn't > installed the latest fix. > > It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even > though the returned output was from nr, the top of the blast output showed > that it was v2.2.12: > > BLASTP 2.2.12 [Aug-07-2005] > > I double-checked my local version and it's definitely v.2.2.13: > ------------------------------------- > C:\Perl\Scripts>blastcl3 - > > blastcl3 2.2.13 arguments:... > ------------------------------------- > > If you use RemoteBlast using the same settings, the version in the header > looks like this: > > BLASTP 2.2.13 [Nov-27-2005] > > I'm wondering if all the blast executables (blast and netblast) from NCBI > have text output like v.2.2.12, while the wwwblast outputs a new format > (2.2.13). I'll ask blast-help at NCBI about this.
> > >> To clarify some stuff - >> Chris I don't necessarily think the XML is best way forward >> for BLAST reports generated locally, it isn't as detailed as >> the Text format and it is what most people expect to be able >> to scroll through and parse -- it is also harder for the >> format to change dramatically if you have a static binary on >> your machine =). I think for remoteblast the XML format >> should be the way forward but I expect Bioperl to maintain >> support of any plain text BLAST report format that people use >> on a regular basis. >> >> > > Does XML lack some specific info that text output has? Didn't know that. I > believe that XML should be default in RemoteBlast since it will not break, > but I agree with you about text output. I also agree that it will need > somebody to maintain it constantly, much like RemoteBlast. > > >> -jason >> >>> Chris Fields wrote: >>> >>> >>>> My guess is you're running into text parsing problems in >>>> Bio::SearchIO::blast. Upgrade to the latest developer version >>>> (1.5.1) or >>>> bioperl-live (CVS), then see the bug below. >>>> >>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>> >>>> I think the first problem you ran into is solved in bioperl 1.5.1, >>>> the last problem (more recent, not related to the first) has been >>>> fixed but hasn't been committed to bioperl-live yet. The fixed >>>> SearchIO::blast is available in the link above, but >>>> >> realize it hasn't >> >>>> been committed yet and may change. >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>> Prielinger >>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>> To: bioperl-l at bioperl.org >>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>> >> parsing Blast >> >>>>> output >>>>> >>>>> Hi, >>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>> Bio::SearchIO, I get the following error message: >>>>> >>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>> STACK Bio::SearchIO::blast::next_result >>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>> STACK toplevel >>>>> >>>>> >> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >>>>> is that a bug...... >>>>> >>>>> If I want to parse Blast Output (version 2.2.13), I don't get >>>>> anything..... >>>>> I'm using bioperl 1.4 >>>>> >>>>> before, I have installed bioperl 1.4, it worked fine >>>>> >> parsing Blast >> >>>>> Output (version 2.2.12), but I don't remember which >>>>> >> bioperl version >> >>>>> I had installed >>>>> >>>>> thanks in advance >>>>> >>>>> Hubert >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Sun Feb 12 17:30:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 12 Feb 2006 16:30:07 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <43EF951B.4030601@purdue.edu> References: <004301c62db4$c9bcbab0$d416a790@LIBERAL> <43EF951B.4030601@purdue.edu> Message-ID: <855DEC6F-8057-47BA-9D1D-9BDC16D1D83B@uiuc.edu> Sequences are converted to FASTA format in RemoteBlast using Bio::SeqIO, which I think includes IUPAC base and amino acid ambiguities like you mention, so my guess is any errors (like odd non- IUPAC letters in nucleotide or aa queries) are likely caught there. As long as it passes Bio::SeqIO it shouldn't be a problem. Haven't tried this myself, though, so I can't say that with absolute certainty. Chris On Feb 12, 2006, at 2:05 PM, Phillip SanMiguel wrote: > Roger, > Just a data point, but in case you were not already aware of it, the > characters W, K and R may be included in some DNA sequences. 'W' means > 'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember > correctly. These are ambiguous bases, where a basecaller isn't > sure, for > example, whether a particular peak is an A or a T. Although I see > these > ambiguous bases less frequently these days, even common modern > basecallers (such as Applied Biosystems basecallers) can generally be > configured so they will generate them. Downstream applications may not > like them, however. > I may be just stating the obvious, or this might be irrelevant to > the issue at hand. If so, my apologies. 
> > Phillip > Roger Hall wrote: >> Guys - I'm looking at the error message: >> >> MSG: no data for midline Query 1 WWWKWRW 7 >> STACK Bio::SearchIO::blast::next_result >> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >> STACK toplevel >> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >> This is my line of thought: >> 1. "no data for midline $_" is a unique message generated by >> blast.pm in one >> location only at the point of a. reading three lines b. dropping >> lines with >> spaces only c. identifying the Query, Midline, and Match lines (0 >> <= $i < 3) >> 2. There is a regexp match that fails in order to reach that error >> message >> 3. The $_ value "Query 1 WWWKWRW 7" should not fail the >> expression >> 4. It does anyway >> 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the >> blast >> reports >> >> I suspect a newline/chomp/metacharacter issue. Not finding the string >> anywhere has me thoroughly confused - I asked Hubert for the >> additional >> file, assuming that I didn't have it. >> >> My next thought is to write a quick script to test perl behavior >> on "Fedora >> Core 9". >> >> Thoughts? >> >> Did I misread the issue entirely? 
:} >> >> Roger >> >> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris >> Fields >> Sent: Thursday, February 09, 2006 10:16 AM >> To: 'Jason Stajich'; 'Hubert Prielinger' >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing >> Blast >> output >> >> >> >>> -----Original Message----- >>> From: Jason Stajich [mailto:jason.stajich at duke.edu] >>> Sent: Thursday, February 09, 2006 9:13 AM >>> To: Hubert Prielinger >>> Cc: Chris Fields; bioperl-l at bioperl.org >>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast output >>> >>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>> >>>> hi chris, >>>> thanks, I have upgraded to version 1.5.1 but it isn't still >>>> >>> working, >>> >>>> do you have any ohter idea, the problem I have is that I >>>> >>> have to parse >>> >>>> a lot of textfiles.... >>>> or shall I look for another option to parse those files... >>>> >>>> regards >>>> Hubert >>>> >>> The code from Bioperl 1.5.1 works fine for me for blast >>> 2.2.13 reports but unless you post your blast report we can't >>> really determine the problem. >>> >>> If you are still getting the same error like this I am not >>> convinced you have upgraded to 1.5.1 which includes a fix in >>> the fact that NCBI changed the HSP result format to remove >>> the ':' from the Query/Sbjct prefixes. We fixed this as soon >>> as it was apparent sometime in September. >>> >>> >>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>> STACK Bio::SearchIO::blast::next_result >>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>> STACK toplevel >>>>>> >>>>>> >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>> If you are just getting no results but also no warnings wrt >>> parsing, are you sure your logic is correct? 
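Roger's puzzle (why the row "Query 1 WWWKWRW 7" fails a match) and Jason's explanation (NCBI dropped the ':' after the Query/Sbjct labels) can be illustrated with a short, self-contained Perl sketch. This is a hypothetical illustration of one plausible way such a mismatch arises, not the regex actually used in Bio::SearchIO::blast; the names `$strict_re`, `$relaxed_re` and `parse_hsp_row` are invented here. A pattern that requires the colon rejects the newer row layout, while making the colon optional accepts both.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Older BLAST text reports label alignment rows "Query:" / "Sbjct:";
# newer NCBI reports drop the colon ("Query 1 WWWKWRW 7").
# Hypothetical patterns for illustration, not taken from Bio::SearchIO::blast.
my $strict_re  = qr/^(Query|Sbjct):\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;   # colon required
my $relaxed_re = qr/^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;  # colon optional

# Parse one alignment row into label, start, residues and end.
# Returns undef when the row does not match the given pattern.
sub parse_hsp_row {
    my ($line, $re) = @_;
    $line =~ s/\r?\n\z//;    # tolerate LF or CRLF line endings
    my @f = $line =~ $re;
    return unless @f;
    return { label => $f[0], start => $f[1], seq => $f[2], end => $f[3] };
}

for my $row ("Query: 1 WWWKWRW 7\n", "Query 1 WWWKWRW 7\n") {
    my $strict  = parse_hsp_row($row, $strict_re);
    my $relaxed = parse_hsp_row($row, $relaxed_re);
    (my $shown = $row) =~ s/\s+\z//;
    printf "%-20s strict: %-3s relaxed: %s\n",
        $shown, ($strict ? 'yes' : 'no'), ($relaxed ? $relaxed->{seq} : 'no');
}
```

Running it prints one line per sample row; only the relaxed pattern matches the colonless form.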
>>> >>> If you remove your filters do you see all the HSPs? >>> >>> >>> while (my $result = $search->next_result) { >>> print $result->query_name, "\n"; >>> #iterate over each hit on the query sequence >>> while (my $hit = $result->next_hit) { >>> print $hit->name, "\n"; >>> #iterate over each HSP in the hit >>> while (my $hsp = $hit->next_hsp) { >>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n"; >>> } >>> } >>> } >>> >> >> I tested some of the BLAST results that Hubert sent Roger and me >> with a >> similar script to the above. I removed the file parsing logic and >> it seemed >> to work just fine. It may very well be a logic issue or that he >> hasn't >> installed the latest fix. >> >> It's a funny thing, though. When I tried using blastcl3 (v. >> 2.2.13), even >> though the returned output was from nr, the top of the blast >> output showed >> that it was v2.2.12: >> >> BLASTP 2.2.12 [Aug-07-2005] >> >> I double-checked my local version and it's definitely v.2.2.13: >> ------------------------------------- >> C:\Perl\Scripts>blastcl3 - >> >> blastcl3 2.2.13 arguments:... >> ------------------------------------- >> >> If you use RemoteBlast using the same settings, the version in the >> header >> looks like this: >> >> BLASTP 2.2.13 [Nov-27-2005] >> >> I'm wondering if all the blast executables (blast and netblast) >> from NCBI >> have text output like v.2.2.12, while the wwwblast outputs a new >> format >> (2.2.13). I'll ask blast-help at NCBI about this. >> >> >>> To clarify some stuff - >>> Chris I don't necessarily think the XML is best way forward >>> for BLAST reports generated locally, it isn't as detailed as >>> the Text format and it is what most people expect to be able >>> to scroll through and parse -- it is also harder for the >>> format to change dramatically if you have a static binary on >>> your machine =).
I think for remoteblast the XML format >>> should be the way forward but I expect Bioperl to maintain >>> support of any plain text BLAST report format that people use >>> on a regular basis. >>> >>> >> >> Does XML lack some specific info that text output has? Didn't >> know that. I >> believe that XML should be default in RemoteBlast since it will >> not break, >> but I agree with you about text output. I also agree that it will >> need >> somebody to maintain it constantly, much like RemoteBlast. >> >> >>> -jason >>> >>>> Chris Fields wrote: >>>> >>>> >>>>> My guess is you're running into text parsing problems in >>>>> Bio::SearchIO::blast. Upgrade to the latest developer version >>>>> (1.5.1) or >>>>> bioperl-live (CVS), then see the bug below. >>>>> >>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>> >>>>> I think the first problem you ran into is solved in bioperl 1.5.1, >>>>> the last problem (more recent, not related to the first) has been >>>>> fixed but hasn't been committed to bioperl-live yet. The fixed >>>>> SearchIO::blast is available in the link above, but >>>>> >>> realize it hasn't >>> >>>>> been committed yet and may change. >>>>> >>>>> Christopher Fields >>>>> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>>> University of Illinois Urbana-Champaign >>>>> >>>>> >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>> Prielinger >>>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>>> To: bioperl-l at bioperl.org >>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>> >>> parsing Blast >>> >>>>>> output >>>>>> >>>>>> Hi, >>>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>>> Bio::SearchIO, I get the following error message: >>>>>> >>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>> STACK Bio::SearchIO::blast::next_result >>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>> STACK toplevel >>>>>> >>>>>> >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>>>>> is that a bug...... >>>>>> >>>>>> If I want to parse Blast Output (version 2.2.13), I don't get >>>>>> anything..... >>>>>> I'm using bioperl 1.4 >>>>>> >>>>>> before, I have installed bioperl 1.4, it worked fine >>>>>> >>> parsing Blast >>> >>>>>> Output (version 2.2.12), but I don't remember which >>>>>> >>> bioperl version >>> >>>>>> I had installed >>>>>> >>>>>> thanks in advance >>>>>> >>>>>> Hubert >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> -- >>> Jason Stajich >>> Duke University >>> http://www.duke.edu/~jes12 >>> >>> >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry >> University of Illinois Urbana-Champaign >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Sun Feb 12 18:56:32 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 13 Feb 2006 10:56:32 +1100 Subject: [Bioperl-l] RemoteBlast In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL> References: <004401c62c6e$da906a40$4301a8c0@LIBERAL> Message-ID: <1139788592.29375.13.camel@chauvel.csse.monash.edu.au> Roger, > I think that most core Bioperl folks have long since moved away from > RemoteBlast and are using the functionality in StandAloneBlast to run their > own local servers. Agreed. Even smaller centres like my workplace need the throughput that a local PC, SMP system or Cluster can provide. > wave of the future, but I think there is still some concern that not every > flavor of BLAST produces XML yet. Even so, the XML parser is considered to > be very strong, and only helps hasten the end of text-formatted support, > since parsing text-formatted reports is the primary source of pain. If BioPerl switches primarily to XML parsing, the tool authors will soon add support for XML (not very difficult really) due to BioPerl's pervasiveness? > I do, however, see the advantage in shifting to XML-formatted reporting and > parsing *only* as soon as every BLAST flavor supports it, if not before. 
> (Anyone - is this still an issue. Please educate me.) The four BLAST flavours I utilise all support XML output: 1) NCBI BLAST 2) WU-BLAST 3) MPI-BLAST 4) FSA-BLAST. > At the moment, I'm leaning towards adding an option to RemoteBlast. The > default (no option) would use a "pure perl" implementation, and the > enhancement (with explicit option) would merely wrap the NCBI executable. If the API is done correctly both of these could co-exist with very little redundant code. (I personally rarely use remote blast). -- Torsten Seemann Victorian Bioinformatics Consortium From torsten.seemann at infotech.monash.edu.au Sun Feb 12 19:35:06 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 13 Feb 2006 11:35:06 +1100 Subject: [Bioperl-l] Remote BLAST support discussion In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu> References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca> <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu> Message-ID: <1139790906.29375.27.camel@chauvel.csse.monash.edu.au> > Mostly I think we need to try and support something that will > "ALWAYS" work so that individuals setting up webservices which rely > on remote blast functionality. In theory, netblast/blastcl3 should > always work since NCBI has to update the exe when they change their > server setup. What usually happens when an older 'blastcl3' binary is used on a newer server setup? I guess it fails in a deterministic manner so the BioPerl user can throw a useful exception. > I also see value in providing a wrapper for netblast since it should > look an awful lot like running blast locally. Agreed - they are virtually indistinguishable. > Ideally I'd like to see a more extensible system, something like (and > please feel free to come up with better names for the modules!): Do BioPerl coding standards require "::Blast" over "::BLAST" ? 
(not important anyway) > Bio::Tools::Run::Blast > --> StandAlone (support for [..as many flavours as poss]) > --> RemoteNCBI (currently the RemoteBlast server) > --> RemoteEBISOAP (EBI has a nice SOAP interface that > --> RemoteNetBlast (blastcl3 or netblast local executable) > (other things that people want) Looks reasonable. I assume there's some interfaces in there like Bio::Tools::Blast::BlastI etc. Could probably call "RemoteNetBlast" just "RemoteNet" because it is already in the Blast:: namespace. (not important though) My only suggestion for StandAlone (and RemoteNetBlast) is that they both do a generic "run a local binary with env. vars and parameters and capture the stdout, stderr and return code". This needs to be abstracted away (or re-use existing code from bioperl-run?). Jason mentioned Ensembl::Runnable as a source of code we could incorporate into Bioperl. -- Torsten Seemann Victorian Bioinformatics Consortium From cjfields at uiuc.edu Mon Feb 13 11:45:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 13 Feb 2006 10:45:14 -0600 Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 In-Reply-To: <20060213152603.ed3f3118@dogwood.plantbio.uga.edu> Message-ID: <001801c630bc$dd35bff0$15327e82@pyrimidine> If you're using RemoteBlast 1.28, then you've likely updated from CVS which isn't the latest fix. Make sure that you check the following: 1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. 
A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v.2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13) but it should still save it. I believe as long as next_result() isn't called, it will work. 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whomever is in charge of Bio::SearchIO). They can be found in Bugzilla: http://bugzilla.bioperl.org/show_bug.cgi?id=1934 http://bugzilla.bioperl.org/show_bug.cgi?id=1935 The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so it isn't necessary if you don't plan on using this option. And, remember, they haven't been committed yet to CVS, which means that the final version will change to reflect the new version. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign _____ From: Guojun Yang [mailto:gyang at plantbio.uga.edu] Sent: Monday, February 13, 2006 9:26 AM To: Chris Fields Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 Hi, Chris Thanks for your suggestion, however, it doesn't seem to work for my cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion? Guojun Guojun Yang Department of Plant Biology University of Georgia Tel: 706-542-1857 Fax: 706-542-1805 http://www.arches.uga.edu/~guojun _____ From: Chris Fields [mailto:cjfields at uiuc.edu] To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org Sent: Fri, 03 Feb 2006 16:07:29 -0500 Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 I would say give the new code a try, but realize that it hasn't been checked in (like I said below).
I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine. If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back the last month or so there has been a bit of discussion here about it. Jason describes a bit on how to set up RemoteBlast for XML: http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > Sent: Friday, February 03, 2006 1:45 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > Hi, Everybody, > I see this post and am wondering if this is the reason for the > malfunctionning of my webserver. We set up a webserver named MAK, for MITE > sequence analysis. It was working very well until around November 2005, > when it stopped returning any result (the site is fine and seems to be > doing sth after submission). In the CGI script, I used remoteblast (that > work was done in 2003) to do searches. I currently do not have access to > the server because I moved. Quite several people sent emails to us about > its malfunctioning. Is there any suggestion on fixing the problem? Should > I simplily ask the remoteblast.pm be replaced with the new version? 
> Thanks a lot, > Guojun > > Department of Plant Biology > University of Georgia > Tel: 706-542-1857 > Fax: 706-542-1805 > http://www.arches.uga.edu/~guojun > _____ > > From: Chris Fields [mailto:cjfields at uiuc.edu] > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl- > l at bioperl.org] > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It > will > work for saving text output. However, it will not parse anything using > next_result (it will likely hang) and will not save XML format. See these > bugs: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > for explanations and possible fixes (changes to RemoteBlast and > Bio::SearchIO::blast). Note that these haven't been checked in yet so are > still not included in bioperl-live; they may be further modified before > committing to CVS. If you're not worried about XML, you could just try the > first fix, which is a change to SearchIO::blast. > > Nagesh, I remember you posting to the list a month ago using a script > which > had problems; the script you used saves the output but doesn't actually > parse it (i.e. you don't use next_result() to go through the data). Is the > version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried > parsing the output using "-readmethod => SearchIO" or "-readmethod => > blast" > using your version of RemoteBlast and method next_result()? Like below > (from > perldoc): > > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 ); > sleep 5; > } else { # parsing > starts here > my $result = $rc->next_result(); # it should hang > here > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > > My script hanged if I used next_result() in any way prior to the fixes. I > want to see how many others are having the same issues with parsing using > the CVS version of bioperl-live. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > Sent: Thursday, February 02, 2006 7:24 PM > > To: Huang Jian; bioperl-l > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > Hi Huang, > > Thanks for the message. The older version of RemoteBlast.pm works on the > > logic of checking the temporary file size to determine whether the Blast > > results are ready. This condition is not getting satisfied may be due to > > some changes brought about by NCBI. I had this problem recently and > > figured out that the solution was to use the latest version which has > > this problem fixed (does not use file size logic any more) which is not > > yet included in the BioPerl package. > > Cheers > > Nagesh > > > > Huang Jian wrote: > > > > > Dear Nagesh, > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send > > > me. Now it works perfectly!!! > > > > > > Thank you!! 
> > > > > > Huang > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still > > > via email > > > > > > > > >> Hi Huang, > > >> I see that you are submitting a sequence for a remote blast search. > Can > > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If > > >> not I have attached it with this email, try to replace it with the > old > > >> one which has a bug. > > >> Let me know if it works. > > >> Nagesh > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gyang at plantbio.uga.edu Mon Feb 13 13:32:14 2006 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Mon, 13 Feb 2006 13:32:14 -0500 Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 In-Reply-To: <001801c630bc$dd35bff0$15327e82@pyrimidine> Message-ID: <20060213183214.342b90da@dogwood.plantbio.uga.edu> Hi, Chris, I do have different versions of bioperl on my Linux machine (1.4. and 1.5.0), this may be the problem. Should I just install bioperl-1.5.1 or I need to uninstall and remove the previous versions. I could not find any hint on uninstalling bioperl on linux. Could you please give me some suggestion? 
Thanks, Guojun Department of Plant Biology University of Georgia _____ From: Chris Fields [mailto:cjfields at uiuc.edu] To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org Sent: Mon, 13 Feb 2006 11:45:14 -0500 Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 If you're using RemoteBlast 1.28, then you've likely updated from CVS which isn't the latest fix. Make sure that you check the following: 1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v.2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13) but it should still save it. I believe as long as next_result() isn't called, it will work. 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whomever is in charge of Bio::SearchIO). They can be found in Bugzilla: http://bugzilla.bioperl.org/show_bug.cgi?id=1934 http://bugzilla.bioperl.org/show_bug.cgi?id=1935 The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so it isn't necessary if you don't plan on using this option. And, remember, they haven't been committed yet to CVS, which means that the final version will change to reflect the new version. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign _____ From: Guojun Yang [mailto:gyang at plantbio.uga.edu] Sent: Monday, February 13, 2006 9:26 AM To: Chris Fields Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 Hi, Chris Thanks for your suggestion, however, it doesn't seem to work for my cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion? Guojun Guojun Yang Department of Plant Biology University of Georgia Tel: 706-542-1857 Fax: 706-542-1805 http://www.arches.uga.edu/~guojun _____ From: Chris Fields [mailto:cjfields at uiuc.edu] To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org Sent: Fri, 03 Feb 2006 16:07:29 -0500 Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 I would say give the new code a try, but realize that it hasn't been checked in (like I said below). I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine. If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back the last month or so there has been a bit of discussion here about it. Jason describes a bit on how to set up RemoteBlast for XML: http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > Sent: Friday, February 03, 2006 1:45 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > Hi, Everybody, > I see this post and am wondering if this is the reason for the > malfunctionning of my webserver. We set up a webserver named MAK, for MITE > sequence analysis. It was working very well until around November 2005, > when it stopped returning any result (the site is fine and seems to be > doing sth after submission). In the CGI script, I used remoteblast (that > work was done in 2003) to do searches. I currently do not have access to > the server because I moved. Quite several people sent emails to us about > its malfunctioning. Is there any suggestion on fixing the problem? Should > I simplily ask the remoteblast.pm be replaced with the new version? > Thanks a lot, > Guojun > > Department of Plant Biology > University of Georgia > Tel: 706-542-1857 > Fax: 706-542-1805 > http://www.arches.uga.edu/~guojun > _____ > > From: Chris Fields [mailto:cjfields at uiuc.edu] > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl- > l at bioperl.org] > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It > will > work for saving text output. However, it will not parse anything using > next_result (it will likely hang) and will not save XML format. See these > bugs: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > for explanations and possible fixes (changes to RemoteBlast and > Bio::SearchIO::blast). 
Note that these haven't been checked in yet so are > still not included in bioperl-live; they may be further modified before > committing to CVS. If you're not worried about XML, you could just try the > first fix, which is a change to SearchIO::blast. > > Nagesh, I remember you posting to the list a month ago using a script > which > had problems; the script you used saves the output but doesn't actually > parse it (i.e. you don't use next_result() to go through the data). Is the > version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried > parsing the output using "-readmethod => SearchIO" or "-readmethod => > blast" > using your version of RemoteBlast and method next_result()? Like below > (from > perldoc): > > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { # parsing > starts here > my $result = $rc->next_result(); # it should hang > here > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > > My script hanged if I used next_result() in any way prior to the fixes. I > want to see how many others are having the same issues with parsing using > the CVS version of bioperl-live. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > Sent: Thursday, February 02, 2006 7:24 PM > > To: Huang Jian; bioperl-l > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > Hi Huang, > > Thanks for the message. The older version of RemoteBlast.pm works on the > > logic of checking the temporary file size to determine whether the Blast > > results are ready. This condition is not getting satisfied may be due to > > some changes brought about by NCBI. I had this problem recently and > > figured out that the solution was to use the latest version which has > > this problem fixed (does not use file size logic any more) which is not > > yet included in the BioPerl package. > > Cheers > > Nagesh > > > > Huang Jian wrote: > > > > > Dear Nagesh, > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send > > > me. Now it works perfectly!!! > > > > > > Thank you!! > > > > > > Huang > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still > > > via email > > > > > > > > >> Hi Huang, > > >> I see that you are submitting a sequence for a remote blast search. > Can > > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If > > >> not I have attached it with this email, try to replace it with the > old > > >> one which has a bug. > > >> Let me know if it works. 
> > >> Nagesh

From cjfields at uiuc.edu Mon Feb 13 15:39:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 14:39:38 -0600
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
In-Reply-To: <20060213183214.342b90da@dogwood.plantbio.uga.edu>
Message-ID: <000901c630dd$9be54f40$15327e82@pyrimidine>

How do you know two versions are installed (i.e. how are you checking the version)? Do you have two complete bioperl distributions (in two separate directories), or are you looking at individual modules? Here's the way to check the version (from the FAQ):

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

If you have two full bioperl distributions on your computer, normally only one will be in use unless you have explicitly set the environment variable PERL5LIB. The PERL5LIB directories are prepended to perl's module search path (@INC), so they are searched before the default directories. You MAY get some mixing then, but only if perl can't find a particular module in the PERL5LIB directories; it then moves on to the remaining directories in @INC. This may happen if a module is unique to a particular release, but shouldn't happen for the majority of modules, including RemoteBlast. You can check what @INC and PERL5LIB are set to by running 'perl -V'. @INC will differ depending on your OS, perl build, etc.
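To see which copy of a module perl will actually load when two distributions are installed, you can repeat the @INC search yourself. The sketch below is illustrative only: `module_candidates` is a hypothetical helper (not part of BioPerl) that lists every file matching a module name in @INC search order, so the first entry is the one `require` would pick and any later entries are shadowed copies from another installation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list every file in @INC that matches a module name,
# in the order perl searches them. The first hit is what "require" loads;
# later hits are copies from other installations that are being shadowed.
sub module_candidates {
    my ($module) = @_;
    my @parts = split /::/, $module;    # Bio::Tools::Run::RemoteBlast -> path parts
    $parts[-1] .= '.pm';                # last part becomes RemoteBlast.pm
    my @found;
    for my $dir (@INC) {
        next if ref $dir;               # skip code-ref hooks placed in @INC
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

# Run as a script: report PERL5LIB and where the module would come from.
unless (caller) {
    my $module = @ARGV ? $ARGV[0] : 'Bio::Tools::Run::RemoteBlast';
    print 'PERL5LIB: ', (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
    my @copies = module_candidates($module);
    if (!@copies) {
        print "$module: not found in \@INC\n";
    } else {
        print "$module loads from: $copies[0]\n";
        print "  shadowed copy: $_\n" for @copies[1 .. $#copies];
    }
}
```

Pass another module name as the first argument (for example Bio::SearchIO::blast) to check it instead. Once a module has actually been loaded, %INC records the file that was used, e.g. perl -MBio::Tools::Run::RemoteBlast -e 'print $INC{"Bio/Tools/Run/RemoteBlast.pm"}, "\n"'.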
Regardless, if you follow the directions for installing bioperl on your system ('perl Makefile.PL', 'make', 'make test', 'make install') and don't explicitly change the installation directory when running 'perl Makefile.PL', then 'uninstalling' Bioperl shouldn't be necessary: the Bioperl distribution you downloaded will be installed over the old version in @INC. See this page for more details:

http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Monday, February 13, 2006 12:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>
> Hi, Chris,
> I do have different versions of bioperl on my Linux machine (1.4 and 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions? I could not find any hint on uninstalling bioperl on Linux. Could you please give me some suggestions?
> Thanks,
> Guojun
>
> Department of Plant Biology
> University of Georgia
> _____
>
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Sent: Mon, 13 Feb 2006 11:45:14 -0500
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>
> If you're using RemoteBlast 1.28, then you've likely updated from CVS, which doesn't include the latest fix.
>
> Make sure that you check the following:
>
> 1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
>
> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13), but it should still save it. I believe it will work as long as next_result() isn't called.
>
> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO). They can be found in Bugzilla:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>
> The RemoteBlast fix in Bugzilla (#1935) adds the option of saving XML output, so it isn't necessary if you don't plan on using that option. And, remember, they haven't been committed to CVS yet, so the final versions may still change.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> _____
>
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Monday, February 13, 2006 9:26 AM
> To: Chris Fields
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>
> Hi, Chris
>
> Thanks for your suggestion; however, it doesn't seem to work for my CGI even after I replaced both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion?
> Guojun
>
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> _____
>
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> Sent: Fri, 03 Feb 2006 16:07:29 -0500
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>
> I would say give the new code a try, but realize that it hasn't been checked in (like I said below). I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine.
>
> If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back over the last month or so, there has been a bit of discussion here about it. Jason describes how to set up RemoteBlast for XML:
>
> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > Sent: Friday, February 03, 2006 1:45 PM
> > To: bioperl-l at bioperl.org
> > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> >
> > Hi, Everybody,
> > I saw this post and am wondering if this is the reason for the malfunctioning of my web server. We set up a web server named MAK for MITE sequence analysis. It was working very well until around November 2005, when it stopped returning any results (the site is up and seems to be doing something after submission). In the CGI script, I used RemoteBlast (that work was done in 2003) to do searches. I currently do not have access to the server because I moved. Quite a few people have sent emails to us about its malfunctioning. Is there any suggestion on fixing the problem? Should I simply ask for remoteblast.pm to be replaced with the new version?
> > Thanks a lot,
> > Guojun
> >
> > Department of Plant Biology
> > University of Georgia
> > Tel: 706-542-1857
> > Fax: 706-542-1805
> > http://www.arches.uga.edu/~guojun

From gyang at plantbio.uga.edu Mon Feb 13 16:00:11 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 13 Feb 2006 16:00:11 -0500
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
Message-ID: <20060213160011.1e89108c@dogwood.plantbio.uga.edu>

Thanks, Chris,
I installed version 1.5.1 and replaced the blast.pm file with the one from your bug report. The running version is 1.5 when I use the command you sent me. But when I tried the script, it didn't change much.
My RemoteBlast code (portion) is here:

sub search {
    local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = "$ORGN";
    local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'} = 7;
    local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} = 5000;
    local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = 'no';
    local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '3 1';
    my $query = Bio::Seq->new(-seq  => "$_[0]",
                              -id   => "query",
                              -desc => "new seq");
    my $len = $query->length();
    @db = ('nr', 'htgs', 'wgs');
    foreach my $db (@db) {
        my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'   => 'blastn',
                                                        '-data'   => "$db",
                                                        '-expect' => "$E_value");
        my $blast_report = $factory->submit_blast($query);
        my @rids = $factory->each_rid();
        foreach my $rid ( @rids ) {
            print STDERR "$rid\n";
        }
        # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
        print STDERR "waiting...";
        sleep 60;
        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            while (!ref($rc)) {
                if ($rc < 0) {    # retrieve_blast returns -1 on error
                    $factory->remove_rid($rid);
                    print "Error!\n";
                    send_error($email, $function, $seqname, $queryname[$ST]);
                    die "Can't retrieve $rid";
                }
                if ($rc == 0) {   # retrieve_blast returns 0 on 'job not finished'
                    sleep 60;
                    $rc = $factory->retrieve_blast($rid);
                }
            }
            if (ref($rc)) {
                print STDERR "Done.\n";
                while (my $result = $rc->next_result) {
                    while (my $hit = $result->next_hit()) {
                        $hit_name = $hit->name;
                        $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
                        $name = $1;
                        @left_plus_start = (); @left_plus_end = ();
                        @left_minus_start = (); @left_minus_end = ();
                        @right_plus_start = (); @right_plus_end = ();
                        @right_minus_start = (); @right_minus_end = ();
                        if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
                            while (my $hsp = $hit->next_hsp()) {
                                ......

It was working quite well until around October last year, but it has stopped since then. When a submission is sent via the web page, the CGI starts to work and uses about 20 MB of memory. Then it hangs there; eventually the expected email is received, but without real results, although it does contain output from other parts of the script. Apparently the search sub did not return anything (I know something should be returned). Is it also possible that the format of the NCBI output for each result has changed?

Thank you,
Guojun

Department of Plant Biology
University of Georgia
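In the search sub above, the while (!ref($rc)) loop has no upper bound, so if NCBI never hands back a parseable report the CGI process simply keeps waiting. A minimal sketch of a bounded poll follows, assuming the return convention given in the script's own comments (a reference when the report is ready, 0 while the job is running, -1 on error). poll_until_ready and its option names are hypothetical, not a BioPerl API. Note that this only bounds the waiting: it will not help if the hang happens inside next_result() itself, which is what the Bio::SearchIO::blast fix in bug #1934 is about.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): call $check->() until it
# yields a reference (report ready), a negative number (error), or the
# attempt limit is reached, so a CGI process cannot wait forever.
# Status convention assumed from the script's comments:
#   ref => report object, 0 => job not finished, < 0 => error.
sub poll_until_ready {
    my (%arg) = @_;
    my $check     = $arg{check} or die "poll_until_ready: check => CODE ref required\n";
    my $max_tries = defined $arg{max_tries} ? $arg{max_tries} : 30;
    my $interval  = defined $arg{interval}  ? $arg{interval}  : 60;
    my $nap       = $arg{sleep} || sub { sleep $_[0] };   # injectable for testing

    for my $try (1 .. $max_tries) {
        my $rc = $check->();
        return (ok => $rc)                      if ref $rc;
        return (error => 'no status returned')  unless defined $rc;
        return (error => "server returned $rc") if $rc < 0;
        $nap->($interval)                       if $try < $max_tries;
    }
    return (error => "gave up after $max_tries attempts");
}

# Possible use inside the search sub ($factory and $rid as in the script):
#   my ($status, $rc) = poll_until_ready(
#       check     => sub { $factory->retrieve_blast($rid) },
#       max_tries => 20,
#       interval  => 60,
#   );
#   if ($status ne 'ok') { $factory->remove_rid($rid); die "$rid: $rc\n" }
```

Returning a (status, value) pair instead of dying inside the helper leaves the caller free to send the error email before giving up, as the original script does.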
From akarger at CGR.Harvard.edu Mon Feb 13 15:57:08 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 13 Feb 2006 15:57:08 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
Message-ID: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>

I'm trying to get the sequence of each exon in a gene. I have a GenBank file with mRNA and exon features (among others) that look like:

     mRNA            join(complement(22257..22386),complement(22067..22186),
                     complement(16753..17101),complement(13840..13962),
                     complement(10649..10820),complement(502..3028))
                     /gene="ENSG00000005812"
                     /note="transcript_id=ENST00000355619"
     exon            complement(13840..13962)
                     /note="exon_id=ENSE00000802462"

I want to make a FASTA file with 6 sequences corresponding to the 6 exons in the mRNA above. I tried writing the code below, but it doesn't do what I want. (You'll note that the code is stolen from the Bio::Seq and Feature HOWTOs.)
my $inseq = Bio::SeqIO->new(-file => "<$file", -format => $format );
while (my $seq = $inseq->next_seq) {
    my @features = $seq->get_SeqFeatures();   # just top level
    foreach my $feat ( @features ) {
        my $type = $feat->primary_tag;
        if ($type eq "mRNA") {
            print "Feature ", $feat->primary_tag,
                  " starts ", $feat->start, " ends ", $feat->end,
                  " strand ", $feat->strand, "\n";
            my @feats = $feat->get_SeqFeatures();
            print "Found ", scalar @feats, " sub-features\n";
        } elsif ($type eq "exon") {
            print "Feature ", $feat->primary_tag,
                  " starts ", $feat->start, " ends ", $feat->end,
                  " strand ", $feat->strand, "\n";
        }
    }
}

When I run the above, it says that the mRNA features have no sub-features. So how do I pull out the 6 sequences?

Thanks,
- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626

From cjfields at uiuc.edu Mon Feb 13 18:18:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 17:18:24 -0600
Subject: [Bioperl-l] INSTALL.WIN in wiki
Message-ID: <000001c630f3$c9efa5f0$15327e82@pyrimidine>

I just added "Installing Bioperl on Windows" to the wiki. It needs some major updating and changes in formatting:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Jason has mentioned changing up some of the INSTALL docs for the wiki (http://www.bioperl.org/wiki/Talk:Getting_BioPerl). Any thoughts?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From osborne1 at optonline.net Mon Feb 13 20:38:30 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 20:38:30 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID:

Amir,

The idea is to look at the sub-locations in the SplitLocation object; this is discussed in FAQ 5.2:

http://www.bioperl.org/wiki/FAQ#How_do_I_parse_the_CDS_join_or_complement_statements_in_GenBank_or_EMBL_files_to_get_the_sub-locations.3F

The sequence that the feature is attached to can be obtained with the entire_seq() method:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_Sequences

Brian O.
From hlapp at gmx.net  Mon Feb 13 18:58:46 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 13 Feb 2006 15:58:46 -0800
Subject: Re: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
References: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID: 

Why do you want subfeatures? This is GenBank format you're parsing, right?
Your mRNA features will have a split location. Loop over
$feat->location->each_Location() and get $seq->subseq() with the start and
end of each sublocation. If you don't know how to do this, check out the
implementation of $feature->spliced_seq().

This should be in the HOWTO. Is it not?

	-hilmar

On 2/13/06, Amir Karger wrote:
> I'm trying to get the sequences of each exon in a gene.
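Hilmar's suggestion (loop over the sub-locations of the mRNA's split location and take a subsequence of each) can be sketched roughly as follows. This is a minimal sketch assuming a reasonably recent BioPerl: it relies on each_Location() from Bio::LocationI and on the trunc() and revcom() methods that sequence objects inherit from Bio::PrimarySeqI. The helper name exon_seqs() and the "_exonN" ID suffix are made up for illustration. Pieces on the minus strand are reverse-complemented so each exon reads 5' to 3'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return one sequence object per sub-location of $feat (one per exon for
# a joined mRNA location). Minus-strand pieces are reverse-complemented.
sub exon_seqs {
    my ($seq, $feat) = @_;
    my @exons;
    my $n = 0;
    for my $loc ($feat->location->each_Location) {
        $n++;
        my $exon = $seq->trunc($loc->start, $loc->end);
        $exon = $exon->revcom if ($loc->strand || 0) == -1;
        $exon->display_id($seq->display_id . "_exon$n");
        push @exons, $exon;
    }
    return @exons;
}

# Usage: perl exons.pl input.gb > exons.fa
if (@ARGV) {
    my $in  = Bio::SeqIO->new(-file => "<$ARGV[0]", -format => 'genbank');
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
    while (my $seq = $in->next_seq) {
        for my $feat (grep { $_->primary_tag eq 'mRNA' } $seq->get_SeqFeatures) {
            $out->write_seq($_) for exon_seqs($seq, $feat);
        }
    }
}
```

The exons come out in the order the sub-locations are stored, which for the join() in Amir's record is the order written in the file. How the GenBank parser stores a join of complements may differ between BioPerl versions, so it is worth checking the first record by hand.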
-- 
----------------------------------------------------------
: Hilmar Lapp  -:-  San Diego, CA  -:-  hlapp at gmx dot net :
----------------------------------------------------------

From osborne1 at optonline.net  Mon Feb 13 21:11:33 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 21:11:33 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: 
Message-ID: 

Hilmar,

It could be spelled out a bit more explicitly.

Brian O.

On 2/13/06 6:58 PM, "Hilmar Lapp" wrote:

> This should be in the HOWTO. Is it not?

From rmb32 at cornell.edu  Mon Feb 13 17:12:10 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 13 Feb 2006 17:12:10 -0500
Subject: [Bioperl-l] game xml SeqIO
Message-ID: <43F1043A.2000205@cornell.edu>

Hi all,

Currently, the SeqIO for doing GAME XML does not seem to support writing
(or reading?) elements. Am I correct? If I am, are there any plans to add
this functionality? Can I help / do it?

If there are plans to add this, how would one distinguish SeqFeatures that
should be rendered as from SeqFeatures that should be rendered as ? Would
we do that with Bio::SeqFeature::Computation? I assume that a given Seq can
have SeqFeatures of different types associated with it (I don't know, I'm
a bioperl newb).
Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY 14853
Tel: 607-255-2360
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From heikki at sanbi.ac.za  Tue Feb 14 01:59:29 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 14 Feb 2006 08:59:29 +0200
Subject: Re: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602100906.11885.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
Message-ID: <200602140859.30136.heikki@sanbi.ac.za>

I've committed an interim solution to the sequence evolution problem:

  $newseq = Bio::SeqUtils->evolve
      ($seq, $similarity, $transition_transversion_rate);

I will go on to transform this code into a fully OO, extensible solution.

	-Heikki

On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> Ryan Golhar's mail got me thinking that we should have a simple framework
> for mutating sequences to a desired level. The model can then be extended
> to necessary complexity when needed by subclassing.
>
> To start with, I have been planning:
>
>
> Bio::SeqEvolution::EvolutionI - interface file
>     Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>     Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>         (defaults to Bio::PrimarySeq)
>     Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>     Bio::SeqEvolution::EvolutionI::each_seqs($count)
>         - returns an array of $count seqs
>     Bio::SeqEvolution::EvolutionI::_generate_seq()
>     Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring
>         converted to probabilities of change internally
>
>     various methods to define the extent of divergence:
>     only one to start with:
>     Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation
>         (= 100% - identity)
>
> Bio::SeqEvolution::Factory - core class to call,
>     instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
>     Bio::SeqEvolution::EvolutionI::type() - evolution model,
>         defaults to Bio::SeqEvolution::DNASimple for nucleotides
>
>
> Bio::SeqEvolution::DNASimple - default for nucleotides
>     Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>         e.g. 5 => 5:1, defaults to 1:1
>         simple alternative to a scoring matrix
>
>
> I am soliciting the usual comments and suggestions about naming and
> minimal functionality.
>
>
> 	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/     Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From gbazykin at Princeton.EDU  Tue Feb 14 09:34:54 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 14 Feb 2006 09:34:54 -0500
Subject: Re: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602140859.30136.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za> <200602140859.30136.heikki@sanbi.ac.za>
Message-ID: <214316262.20060214093454@princeton.edu>

Hi,

Just a thought: I really think that in perspective, it would be nice
to be able to evolve the sequence along a tree of given shape. I think
PAML's "evolver" has this functionality. I've already been doing this
in my scripts, but I am not sure how to couple the tree and the
sequence data properly.

Yegor (George) Bazykin


------------------------------

Tuesday, February 14, 2006, 1:59:29 AM, you wrote:

> I've committed an interim solution to the sequence evolution problem:

>   $newseq = Bio::SeqUtils->evolve
>       ($seq, $similarity, $transition_transversion_rate);
From maximilianh at gmail.com  Tue Feb 14 05:11:42 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Tue, 14 Feb 2006 11:11:42 +0100
Subject: [Bioperl-l] [BiO BB] Re: Tool to mutate DNA sequence
In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
Message-ID: <76f031ae0602140211n2a0bbf4fl@mail.gmail.com>

The tool ROSE also evolves sequences on a tree.
There is a web interface and downloadable source at
http://bibiserv.techfak.uni-bielefeld.de/rose/

Max

On 09/02/06, Jason Stajich wrote:
> Depending on whether or not you want to use evolutionarily realistic
> models...
> * evolver, which comes with PAML, lets you evolve sequences on a tree
> * SeqGen from Andrew Rambaut (http://evolve.zoo.ox.ac.uk/software.html?id=seqgen)
>   also lets you do this
> I believe there are PISE interfaces to both of these at the Pasteur
> bioweb site - http://bioweb.pasteur.fr/
>
> -jason
> On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote:
>
> > Does anyone know of a tool to mutate a DNA sequence by a specified
> > amount? For instance, say I have a DNA sequence 1000 bases long, and
> > I want to simulate mutations to make it 75% (or 80%, etc) similar to
> > the original.
> >
> > Ryan
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
> _______________________________________________
> Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>

-- 
Maximilian Haeussler, CNRS Gif-sur-Yvette, Paris
tel: +33 6 12 82 76 16
icq: 3825815 -- msn: maximilian.haeussler at hpi.uni-potsdam.de
skype: maximilianhaeussler

From heikki at sanbi.ac.za  Tue Feb 14 11:09:27 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 14 Feb 2006 18:09:27 +0200
Subject: Re: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <214316262.20060214093454@princeton.edu>
References: <200602100906.11885.heikki@sanbi.ac.za> <200602140859.30136.heikki@sanbi.ac.za> <214316262.20060214093454@princeton.edu>
Message-ID: <200602141809.28057.heikki@sanbi.ac.za>

Yegor,

Like you said, there are examples of how it is done.

It should be possible to evolve sequences based on a rooted tree.
You just walk the tree and evolve each sequence from its parent. If there
is agreement on how the branch lengths get translated to mutations, even
that could be done. Do you have any suggestions?

	-Heikki

On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
> Hi,
>
> Just a thought: I really think that in perspective, it would be nice
> to be able to evolve the sequence along a tree of given shape.
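Heikki's outline (walk a rooted tree and evolve each node's sequence from its parent's) can be sketched in plain Perl as below. The nested-hash tree and the rule that a branch length is used directly as the per-site probability of a substitution are assumptions made only for illustration. They are not how PAML's evolver, SeqGen, ROSE or any BioPerl module translate branch lengths; a real implementation would use a substitution model such as Jukes-Cantor.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @BASES = qw(A C G T);

# Substitute each site independently with probability $p; a substituted
# site always changes to one of the other three bases, chosen uniformly.
sub mutate_sites {
    my ($seq, $p) = @_;
    my @sites = split //, uc $seq;
    for my $base (@sites) {          # $base aliases the array element
        next unless rand() < $p;
        my @others = grep { $_ ne $base } @BASES;
        $base = $others[ int rand @others ];
    }
    return join '', @sites;
}

# Depth-first walk of a rooted tree. Each node is a hash with 'name',
# optional 'branch_length' and optional 'children'. A node's sequence is
# evolved from its parent's, using the branch length (capped at 1) as the
# per-site substitution probability; this is an illustrative rule only.
sub evolve_on_tree {
    my ($node, $parent_seq, $out) = @_;
    my $p = $node->{branch_length} // 0;
    $p = 1 if $p > 1;
    my $seq = mutate_sites($parent_seq, $p);
    $out->{ $node->{name} } = $seq;
    evolve_on_tree($_, $seq, $out) for @{ $node->{children} || [] };
    return $out;
}

# Example tree (A,(B,C)inner)root with made-up branch lengths.
my $tree = {
    name => 'root', branch_length => 0,
    children => [
        { name => 'A', branch_length => 0.10 },
        { name => 'inner', branch_length => 0.05,
          children => [ { name => 'B', branch_length => 0.20 },
                        { name => 'C', branch_length => 0.20 } ] },
    ],
};
srand(42);    # fixed seed so repeated runs give the same sequences
my $seqs = evolve_on_tree($tree, 'ACGT' x 15, {});
print "$_\t$seqs->{$_}\n" for sort keys %$seqs;
```

Internal nodes get sequences too, which makes it easy to check what each branch changed.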
From golharam at umdnj.edu  Tue Feb 14 12:01:38 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 14 Feb 2006 12:01:38 -0500
Subject: Re: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za>
Message-ID: <016401c63188$52c9d4b0$2f01a8c0@GOLHARMOBILE1>

Here are my two cents...

1. Allow sequences to be mutated by some percent amount.
2. Use mutation patterns implied by PAM matrices or some known models of
   mutation.
3. Have the output show the original sequence and the mutated sequence so
   you can easily identify what was mutated and what is conserved.

Ryan
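Ryan's points 1 and 3 (mutate to a given percent identity, then show what changed) can be sketched in plain Perl as below. This is not the code behind Bio::SeqUtils->evolve; the function names are hypothetical. The number of substituted sites is fixed at round((1 - identity) x length). $ts_tv is read as the ratio of transition events to transversion events, so each substitution is a transition with probability $ts_tv / ($ts_tv + 1).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Transition partner of each base, and its two transversion partners.
my %TRANSITION    = (A => 'G', G => 'A', C => 'T', T => 'C');
my %TRANSVERSIONS = (A => [qw(C T)], G => [qw(C T)],
                     C => [qw(A G)], T => [qw(A G)]);

# Substitute round((1 - $identity) * length) distinct A/C/G/T sites.
# $ts_tv is read as transitions:transversions per substitution event.
# Returns the mutated sequence and a list of [position, from, to].
sub mutate_to_identity {
    my ($seq, $identity, $ts_tv) = @_;
    my @s     = split //, uc $seq;
    my $n_mut = int((1 - $identity) * @s + 0.5);
    my $p_ts  = $ts_tv / ($ts_tv + 1);
    my @candidates = shuffle grep { exists $TRANSITION{ $s[$_] } } 0 .. $#s;
    $n_mut = @candidates if $n_mut > @candidates;

    my @changes;
    for my $i (sort { $a <=> $b } @candidates[0 .. $n_mut - 1]) {
        my $from = $s[$i];
        my $to   = rand() < $p_ts ? $TRANSITION{$from}
                                  : $TRANSVERSIONS{$from}[ int rand 2 ];
        $s[$i] = $to;
        push @changes, [ $i + 1, $from, $to ];    # 1-based position
    }
    return (join('', @s), \@changes);
}

# Three-line report: original, '*' under each changed site, mutated.
sub show_changes {
    my ($orig, $mut) = @_;
    my $marks = join '', map {
        substr($orig, $_, 1) eq substr($mut, $_, 1) ? ' ' : '*'
    } 0 .. length($orig) - 1;
    return "$orig\n$marks\n$mut\n";
}

srand(7);
my $orig = 'ACGTTGCAAC' x 4;
my ($mut, $changes) = mutate_to_identity($orig, 0.75, 2);
print show_changes($orig, $mut);
printf "%d substitutions: %s\n", scalar @$changes,
    join ' ', map { "$_->[1]$_->[0]$_->[2]" } @$changes;
```

Because every chosen site is changed exactly once and always to a different base, the identity of the result is exactly the requested value (up to rounding). A model that allows multiple hits per site would not guarantee this.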
From hjm at tacgi.com  Tue Feb 14 12:15:11 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Tue, 14 Feb 2006 09:15:11 -0800
Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To: 
References: 
Message-ID: <200602140915.11604.hjm@tacgi.com>

Hi Brian,

Thanks very much for the pointers and the speed of your reply, and apologies
for the speed of mine.

This looks good, but what I was looking for was a bioP approach for hooking
to an API at NCBI or EBI so I could get this info and seqs from them. In this
case, speed of retrieval is not critical and I'd rather not download the
entirety of the sequences to a local disk to hack at them.

I've determined a screen-scraping approach to get them and could script that,
but I thought that bioP had a method for using NCBI's external APIs, though
it may be that my memory is faulty or the approach is no longer supported
due to overload.

Does NCBI make such APIs available anymore? I searched a bit for docs on
them but couldn't find anything (unless it's buried in the NCBI toolkit,
which I haven't started to excavate).

Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
Harry

On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> Harry,
>
> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
> from its documentation:
>
>   use Bio::DB::Fasta;
>
>   # create database from directory of fasta files
>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   # simple access (for those without Bioperl)
>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>   my @ids      = $db->ids;
>   my $length   = $db->length('CHROMOSOME_I');
>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>   my $header   = $db->header('CHROMOSOME_I');
>
>   # Bioperl-style access
>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   my $obj    = $db->get_Seq_by_id('CHROMOSOME_I');
>   my $seq    = $obj->seq;
>   my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>
> Do you already have the offsets?
>
> Brian O.
>
> On 2/12/06 1:46 AM, "Harry Mangalam" wrote:
> > Hi All,
> >
> > After perusing the tutorial and other docs for an evening, I still
> > can't find the answer to this. Forgive me if I've missed something
> > obvious.
> >
> > This should not be a novel request, but I've not found it answered. If
> > bioperl isn't the best way to do this, I'd be grateful for a pointer to a
> > better way, especially if it includes an illuminating bit of code.
> >
> > The problem is to retrieve genomic sequences plus & minus some offset
> > from a locus determined by HUGO keyword or GeneID. This would be a
> > common followup chore for some extra analysis from a gene expression
> > expt. Or maybe this is in the DBFetch routines, but I've missed the
> > sequence type to specify...?
> >
> > TIA!
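For the "plus & minus some offset" part of Harry's question, Brian's Bio::DB::Fasta approach can be wrapped in a small helper, sketched below. It assumes the genomic FASTA files are already on local disk and that the locus coordinates are known. flanked_seq() is a hypothetical name, and the minus-strand branch relies on Bio::DB::Fasta's documented behaviour of returning the reverse complement when start > end.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Return the region $start..$end of sequence $id plus $flank bases on each
# side, clamped to the ends of the sequence. With $strand == -1 the reverse
# complement is returned (Bio::DB::Fasta does this when start > end).
sub flanked_seq {
    my ($db, $id, $start, $end, $flank, $strand) = @_;
    my $len = $db->length($id) or die "unknown sequence '$id'\n";
    my $from = $start - $flank;
    my $to   = $end + $flank;
    $from = 1    if $from < 1;
    $to   = $len if $to > $len;
    return ($strand // 1) == -1 ? $db->seq($id, $to => $from)
                                : $db->seq($id, $from => $to);
}

# Usage, with the directory given on the command line and the coordinates
# from the documentation example above plus 5 kb of flank:
if (@ARGV) {
    my $db = Bio::DB::Fasta->new($ARGV[0]);
    print flanked_seq($db, 'CHROMOSOME_I', 4_000_000, 4_100_000, 5_000), "\n";
}
```

Mapping a HUGO symbol or GeneID to chromosome coordinates is a separate step that this helper does not attempt.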
-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com
<>

From jason.stajich at duke.edu  Tue Feb 14 13:25:21 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 14 Feb 2006 13:25:21 -0500
Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
References: <200602140915.11604.hjm@tacgi.com>
Message-ID: <13B3724F-3716-4C4B-95A7-6849EF167A80@duke.edu>

Are you working with spp. that are in Ensembl? Is what you need not provided
by Ensembl/EnsMart? They seem to be doing the best job of integrating gene
IDs in a central place.

It is not exactly clear what API you are referring to - you can query
Entrez via Bio::DB::Query::GenBank, so if you can construct your query in
the Entrez syntax you can access and retrieve it in bioperl.

-jason

On Feb 14, 2006, at 12:15 PM, Harry Mangalam wrote:

> Hi Brian,
>
> Thanks very much for the pointers and the speed of your reply and
> apologies for the speed of mine.
>
> This looks good, but what I was looking for was a bioP approach for
> hooking to an API at NCBI or EBI so I could get this info and seqs from
> them.
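Jason's Entrez route might look roughly like this, following the Bio::DB::Query::GenBank / Bio::DB::GenBank pairing shown in their documentation. The query string built by the hypothetical gene_query_string() helper uses Entrez's [Gene Name] and [Organism] field tags. It returns whole GenBank records annotated with the gene, not a genomic window around the locus, so it covers only the retrieval half of Harry's problem. The network part runs only when a gene symbol is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Build an Entrez query for records annotated with a gene symbol in one
# organism, using Entrez's [Gene Name] and [Organism] field tags.
sub gene_query_string {
    my ($symbol, $organism) = @_;
    return sprintf '%s[Gene Name] AND "%s"[Organism]', $symbol, $organism;
}

# Contact NCBI only when a gene symbol is given on the command line:
#   perl entrez_gene.pl BRCA1 'Homo sapiens'
if (@ARGV) {
    my ($symbol, $organism) = (@ARGV, 'Homo sapiens')[0, 1];
    my $query = Bio::DB::Query::GenBank->new(
        -db    => 'nucleotide',
        -query => gene_query_string($symbol, $organism),
    );
    printf "%d matching records\n", $query->count;

    my $gb     = Bio::DB::GenBank->new;
    my $stream = $gb->get_Stream_by_query($query);
    while (my $seq = $stream->next_seq) {
        printf "%s\t%d bp\t%s\n",
            $seq->display_id, $seq->length, $seq->desc // '';
    }
}
```

A common gene symbol may match many records, and all of them are streamed, so narrowing the query (for example with extra Entrez terms) is advisable before running it against NCBI.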
-- 
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu  Tue Feb 14 13:40:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 12:40:31 -0600
Subject: [Bioperl-l] FW: more on RemoteBlast.pm version 1.2
Message-ID: <000e01c63196$225159d0$15327e82@pyrimidine>

Sorry, forgot to add that I didn't see the regex issue that you mentioned.
It could be a Perl-related issue. Try the fixes I mentioned and see what
happens.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, February 14, 2006 12:36 PM
> To: 'gyang at plantbio.uga.edu'
> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>
> It's a good habit to always put single quotes around bare words. The Perl
> interpreter may think a single bare word is a subroutine or perlfunc
> called with no args, so it will try to find a subroutine named blastp(). My
> debugger actually gives the error that the bare word blastp may conflict
> with a future reserved word. Like you said, 'use strict' will point that
> out.
>
> As for the regex, it should match all the BLAST programs at NCBI (blastp,
> blastn, blastx, tblastn, tblastx) and is built in to make sure nothing
> else passes through.
>
> So, if you are using the script below, there are several errors. The bare
> words for $prog and $db need quotes, and the flags for your @params array
> don't have a dash before them.
> I get this after adding quotes but before adding the dashes to @params:
>
> C:\Perl\Scripts>test_blast.pl
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG:
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> STACK: C:\Perl\Scripts\test_blast.pl:15
> -----------------------------------------------------------
>
> The last line indicates a problem with this line:
>
>   my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>
> Changing the @params to this:
>
>   my @params=( -prog=>$prog,
>                -data=>$db,
>                -expect=>$e_val,
>                -readmethod=>'SearchIO');
>
> fixes it, and I get output as expected.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > Sent: Tuesday, February 14, 2006 11:48 AM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> > Hi, Chris,
> > When I tried with the perldoc script, it did not work either. First it
> > says $prog cannot be a bare word if I "use strict". I added quotes on the
> > words, then it says the value for $prog does not match expression
> > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The script
> > is shown below. Why is the expression "t?blast[pnx]"?
> >
> > #!/usr/bin/perl
> >
> > use Bio::SeqIO;
> > use Bio::Seq;
> > use Bio::Tools::Run::RemoteBlast;
> > use Bio::SearchIO;
> >
> >
> > my $prog=blastp;
> > my $db=swissprot;
> > my $e_val=1e-10;
> > my @params=( prog=>$prog,
> >              data=>$db,
> >              expect=>$e_val,
> >              readmethod=>'SearchIO');
> > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> > my $v = 1;
> >
> > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >
> > while (my $input = $str->next_seq()){
> >   #Blast a sequence against a database:
> >   #Alternatively, you could pass in a file with many
> >   #sequences rather than loop through sequence one at a time
> >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> >   #and swap the two lines below for an example of that.
> >   my $r = $factory->submit_blast($input);
> >   #my $r = $factory->submit_blast('amino.fa');
> >   print STDERR "waiting..." if( $v > 0 );
> >   while ( my @rids = $factory->each_rid ) {
> >     foreach my $rid ( @rids ) {
> >       my $rc = $factory->retrieve_blast($rid);
> >       if( !ref($rc) ) {
> >         if( $rc < 0 ) {
> >           $factory->remove_rid($rid);
> >         }
> >         print STDERR "." if ( $v > 0 );
> >         sleep 5;
> >       } else {
> >         my $result = $rc->next_result();
> >         #save the output
> >         my $filename = $result->query_name()."\.out";
> >         $factory->save_output($filename);
> >         $factory->remove_rid($rid);
> >         print "\nQuery Name: ", $result->query_name(), "\n";
> >         while ( my $hit = $result->next_hit ) {
> >           next unless ( $v > 0);
> >           print "\thit name is ", $hit->name, "\n";
> >           while( my $hsp = $hit->next_hsp ) {
> >             print "\t\tscore is ", $hsp->score, "\n";
> >           }
> >         }
> >       }
> >     }
> >   }
> > }
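For comparison with the script above, here is a minimal sketch of the parameter setup with Chris's two fixes applied: string values are quoted, so nothing is a bareword under use strict, and every parameter name carries a leading dash. Only the factory construction is shown; the submission and polling loop from the script would follow unchanged. The values (blastp, swissprot, 1e-10, amino.fa) are the ones used in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;

# Quoted strings instead of the barewords blastp and swissprot.
my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = '1e-10';

# Named parameters need the leading dash (-prog, not prog).
my @params = (
    -prog       => $prog,
    -data       => $db,
    -expect     => $e_val,
    -readmethod => 'SearchIO',
);

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

# Submission would then proceed as in the original script, e.g.
# my $r = $factory->submit_blast('amino.fa');
```

With `use strict` in place, a stray bareword is reported at compile time rather than surfacing later as a confusing parameter-validation error.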
> > > > > > Guojun > > Department of Plant Biology > > University of Georgia > > > > ----- Original Message ----- > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > To: gyang at plantbio.uga.edu > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > Try two things: > > > > 1) Use a much simpler script, like the one in 'perldoc > > > Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something > > wrong > > > with the logic in your subroutine: > > > > my $v = 1; > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > > while (my $input = $str->next_seq()){ > > > #Blast a sequence against a database: > > > #Alternatively, you could pass in a file with many > > > #sequences rather than loop through sequence one at a time > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > #and swap the two lines below for an example of that. > > > my $r = $factory->submit_blast($input); > > > #my $r = $factory->submit_blast('amino.fa'); > > > print STDERR "waiting..." if( $v > 0 ); > > > while ( my @rids = $factory->each_rid ) { > > > foreach my $rid ( @rids ) { > > > my $rc = $factory->retrieve_blast($rid); > > > if( !ref($rc) ) { > > > if( $rc < 0 ) { > > > $factory->remove_rid($rid); > > > } > > > print STDERR "." if ( $v > 0 ); > > > sleep 5; > > > } else { > > > my $result = $rc->next_result(); > > > #save the output > > > my $filename = $result->query_name()."\.out"; > > > $factory->save_output($filename); > > > $factory->remove_rid($rid); > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > while ( my $hit = $result->next_hit ) { > > > next unless ( $v > 0); > > > print "\thit name is ", $hit->name, "\n"; > > > while( my $hsp = $hit->next_hsp ) { > > > print "\t\tscore is ", $hsp->score, "\n"; > > > } > > > } > > > } > > > } > > > } > > > } > > > > 2) Try the RemoteBlast from Bugzilla and see if that works. 
It > really > > > shouldn't make that much of a difference, but I noticed that the CVS > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was > > > released; the Bugzilla version is based off CVS. > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > -----Original Message----- > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > Sent: Monday, February 13, 2006 3:00 PM > > > > To: bioperl-l at lists.open-bio.org > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > Thanks, Chris, > > > > I installed version 1.5.1 and replaced the blast.pm file with the > one > > from > > > > your bug report. The running version is 1.5 when I use the command > you > > > > sent me. But when I tried the script, it doesn't change much. My > > > > remoteblast code (portion) is here: > > > > > > sub search { > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > > > > local > > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > > > > 'no'; > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > > > -id=>"query", > > > > -desc=>"new seq"); > > > > my $len=$query->length(); > > > > @db=('nr','htgs','wgs'); > > > > foreach my $db (@db) { > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn', > > > > '-data' =>"$db", > > > > '-expect'=>"$E_value"); > > > > > > > > my $blast_report = $factory->submit_blast($query); > > > > > > my @rids = $factory->each_rid(); > > > > foreach my $rid ( @rids ) { > > > > print STDERR "$rid\n"; > > > > } > > > > # RID = Remote Blast ID (e.g: 
1017772174-16400-6638) > > > > print STDERR "waiting..."; > > > > sleep 60; > > > > > > foreach my $rid ( @rids ) { > > > > my $rc = $factory->retrieve_blast($rid); > > > > while (!ref($rc) ) { > > > > if( $rc < 0 ) { > > > > # retrieve_blast returns -1 on error > > > > $factory->remove_rid($rid); > > > > print "Error!\n"; > > > > send_error($email,$function,$seqname,$queryname[$ST]); > > > > die "Can't retrieve $rid"; > > > > } if ($rc==0) { # retrieve_blast returns 0 on 'job not finished' > > > > sleep 60; > > > > $rc = $factory->retrieve_blast($rid); > > > > } > > > > } > > > > if (ref($rc)) { > > > > print STDERR "Done.\n"; > > > > while( my $result = $rc->next_result) { > > > > while( my $hit = $result->next_hit()) { > > > > $hit_name=$hit->name; > > > > $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > > > $name=$1; > > > > @left_plus_start=(); > > > > @left_plus_end=(); > > > > @left_minus_start=(); > > > > @left_minus_end=(); > > > > @right_plus_start=(); > > > > @right_plus_end=(); > > > > @right_minus_start=(); > > > > @right_minus_end=(); > > > > > > if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > > > while( my $hsp = $hit->next_hsp()) { > > > > ...... > > > > > > It was working quite well until around October last year, but it has stopped since then. When a submission is sent via the web page, the CGI starts running and uses about 20 MB of memory. Then it hangs there; eventually the expected email arrives, but without real results, although it does contain output from other parts of the script. Apparently the search sub did not return anything (I know something should be returned). Is it also possible that the format of the NCBI output for each result has changed?
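One way to keep a CGI request from waiting indefinitely is to bound the polling loop. The sketch below assumes the return convention described in the code comments above (a negative number on error, 0 while the job is unfinished, an object once results are ready). `poll_until_ready` and its `interval`/`max_tries` options are hypothetical names, and the `fetch` callback is where a call such as `$factory->retrieve_blast($rid)` would go; here a simulated server stands in for NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper: call $opt{fetch} until it returns a reference (done),
# a negative number (error), or the try budget is used up.
# Returns (result, undef) on success or (undef, reason) on failure.
sub poll_until_ready {
    my (%opt) = @_;
    my $fetch     = $opt{fetch};
    my $interval  = defined $opt{interval}  ? $opt{interval}  : 60;
    my $max_tries = defined $opt{max_tries} ? $opt{max_tries} : 30;

    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return ($rc, undef)         if ref $rc;   # results are ready
        return (undef, "error $rc") if $rc < 0;   # remote side reported failure
        sleep $interval if $try < $max_tries;     # 0 => job not finished yet
    }
    return (undef, "gave up after $max_tries tries");
}

# Simulated server: unfinished twice, then a result object.
my @replies = (0, 0, { hits => 3 });
my ($res, $why) = poll_until_ready(
    fetch     => sub { shift @replies },
    interval  => 0,
    max_tries => 5,
);
print defined $res ? "got $res->{hits} hits\n" : "failed: $why\n";
```

Returning an explicit failure reason lets the CGI report a timeout to the user instead of silently mailing an empty report.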
> > > > Thank you, > > > > Guojun > > > > > > > > Department of Plant Biology > > > > University of Georgia > > > > > > > > > > ----- Original Message ----- > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > How do you know two versions are installed (i.e. how are you checking the version)? Do you have two complete bioperl distributions (in two separate directories), or are you looking at individual modules? Here's the way to check the version (from the FAQ): > > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > > > > > > If you have two full bioperl distributions on your computer, normally only one will be in use unless you have explicitly set the environment variable PERL5LIB. The PERL5LIB directories are added to the front of perl's module search path (@INC), so they are searched before the default directories. You MAY get some mixing then, but only if perl can't find a particular module in the path designated in PERL5LIB; then it will progress through the remaining directories listed in @INC. This may happen if a module is unique to a particular release, but shouldn't happen for the majority of modules, including RemoteBlast. You can check what @INC and PERL5LIB are set to by using 'perl -V'. @INC will differ depending on your OS, perl build, etc.
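To see which copy of a module perl will actually load, you can walk @INC in order. This is a small sketch using only core Perl; `module_copies` is a hypothetical helper name, and it skips any code references that some setups place in @INC.

```perl
use strict;
use warnings;
use File::Spec;

# Return every file along @INC that would satisfy "use $module".
# perl loads the first match; any later copies are shadowed.
sub module_copies {
    my ($module) = @_;
    my $rel = join('/', split /::/, $module) . '.pm';
    my @found;
    for my $dir (@INC) {
        next if ref $dir;                     # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, split m{/}, $rel);
        push @found, $path if -f $path;
    }
    return @found;
}

print "PERL5LIB: ", (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
print "$_\n" for module_copies('Bio::Tools::Run::RemoteBlast');
```

If more than one path is printed, the first is the copy in use, which makes it easy to spot a leftover 1.4 installation shadowing a newer one (or the reverse).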
> > > > > > Regardless, if you follow the directions for installing bioperl for your system ('perl Makefile.PL', 'make', 'make test', 'make install'), then 'uninstalling' Bioperl shouldn't be a problem, as it will install the Bioperl distribution you downloaded over the old version in @INC (unless you explicitly change the installation directory when running 'perl Makefile.PL'). See this page: > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > > > > > for more details. > > > > > > Christopher Fields > > > > > Postdoctoral Researcher - Switzer Lab > > > > > Dept. of Biochemistry > > > > > University of Illinois Urbana-Champaign > > > > > > > > -----Original Message----- > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > Sent: Monday, February 13, 2006 12:32 PM > > > > > > To: bioperl-l at lists.open-bio.org > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > Hi, Chris, > > > > > > I do have different versions of bioperl on my Linux machine (1.4 and 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions? I could not find any hint on uninstalling bioperl on Linux. Could you please give me some suggestions?
> > > > > > Thanks, > > > > > > Guojun > > > > > > > > Department of Plant Biology > > > > > > University of Georgia > > > > > > _____ > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > > > > > > > > > > > If you're using RemoteBlast 1.28, then you've likely updated from CVS, which isn't the latest fix. > > > > > > > > Make sure that you check the following: > > > > > > > > 1) Always post to the mailing list: > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13), but it should still save it. I believe it will work as long as next_result() isn't called. > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO). They can be found in
They can be found in > > > > Bugzilla: > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the > > option > > > > of > > > > > > saving XML output, so isn't necessary if you don't plan on using > > this > > > > > > option. And, remember, they haven't been committed yet to CVS, > > which > > > > > > means that the final version will change to refle the new > version. > > > > > > > > > > Christopher Fields > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > Dept. of Biochemistry > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > _____ > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > > > > Sent: Monday, February 13, 2006 9:26 AM > > > > > > To: Chris Fields > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > version > > > > > > 1.28 > > > > > > > > > > Hi, Chris > > > > > > > > Thanks for your suggestion, however, it doesn't seem to work > > for > > > > my cgi > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I didn't > > even > > > > get > > > > > > any RID. Is there any suggestion? > > > > > > > > > > > > Guojun > > > > > > > > > > Guojun Yang > > > > > > Department of Plant Biology > > > > > > University of Georgia > > > > > > Tel: 706-542-1857 > > > > > > Fax: 706-542-1805 > > > > > > http://www.arches.uga.edu/~guojun > > > > > > _____ > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > version > > > > > > 1.28 > > > > > > > > I would say give the new code a try, but realize that it > > hasn't > > > > been > > > > > > checked > > > > > > in (like I said below). 
I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but at least it doesn't hang in the while() loop I described in the bug report below (bug #1934), and it seems to process everything fine. > > > > > > > > If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back over the last month or so, there has been a bit of discussion here about it. Jason describes how to set up RemoteBlast for XML: > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ > > > > > > > > Christopher Fields > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > Dept. of Biochemistry > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > -----Original Message----- > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > > Sent: Friday, February 03, 2006 1:45 PM > > > > > > > To: bioperl-l at bioperl.org > > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > Hi, Everybody, > > > > > > > I saw this post and am wondering if this is the reason for the malfunctioning of my web server. We set up a web server named MAK, for MITE sequence analysis.
It was working very well until around November 2005, when it stopped returning any results (the site is fine and seems to be doing something after submission). In the CGI script, I used RemoteBlast (that work was done in 2003) to do searches. I currently do not have access to the server because I moved. Quite a few people have sent us emails about its malfunctioning. Are there any suggestions on fixing the problem? Should I simply ask that RemoteBlast.pm be replaced with the new version? > > > > > > > Thanks a lot, > > > > > > > Guojun > > > > > > > > > > > > > > Department of Plant Biology > > > > > > > University of Georgia > > > > > > > Tel: 706-542-1857 > > > > > > > Fax: 706-542-1805 > > > > > > > http://www.arches.uga.edu/~guojun > > > > > > > _____ > > > > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org] > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will work for saving text output. However, it will not parse anything using next_result (it will likely hang) and will not save XML format. See these bugs: > > > > > > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > > > > > > > for explanations and possible fixes (changes to RemoteBlast and Bio::SearchIO::blast).
Note that these haven't been checked in > > yet > > > > so > > > > > > are > > > > > > > still not included in bioperl-live; they may be further > modified > > > > before > > > > > > > committing to CVS. If you're not worried about XML, you could > > just > > > > try > > > > > > the > > > > > > > first fix, which is a change to SearchIO::blast. > > > > > > > > > > > > > > Nagesh, I remember you posting to the list a month ago using a > > > > script > > > > > > > which > > > > > > > had problems; the script you used saves the output but doesn't > > > > actually > > > > > > > parse it (i.e. you don't use next_result() to go through the > > data). > > > > Is > > > > > > the > > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13? Have > you > > > > tried > > > > > > > parsing the output using "-readmethod => SearchIO" or "- > > readmethod > > > > => > > > > > > > blast" > > > > > > > using your version of RemoteBlast and method next_result()? > Like > > > > below > > > > > > > (from > > > > > > > perldoc): > > > > > > > > > > > > > > while ( my @rids = $factory->each_rid ) { > > > > > > > foreach my $rid ( @rids ) { > > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > if( !ref($rc) ) { > > > > > > > if( $rc < 0 ) { > > > > > > > $factory->remove_rid($rid); > > > > > > > } > > > > > > > print STDERR "." 
if ( $v > 0 ); > > > > > > > sleep 5; > > > > > > > } else { # parsing > > > > > > > starts here > > > > > > > my $result = $rc->next_result(); # it should hang > > > > > > > here > > > > > > > #save the output > > > > > > > my $filename = $result->query_name()."\.out"; > > > > > > > $factory->save_output($filename); > > > > > > > $factory->remove_rid($rid); > > > > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > > > > while ( my $hit = $result->next_hit ) { > > > > > > > next unless ( $v > 0); > > > > > > > print "\thit name is ", $hit->name, "\n"; > > > > > > > while( my $hsp = $hit->next_hsp ) { > > > > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > > > > } > > > > > > > } > > > > > > > } > > > > > > > } > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > > > > > > > > My script hanged if I used next_result() in any way prior to > the > > > > fixes. > > > > > > I > > > > > > > want to see how many others are having the same issues with > > parsing > > > > > > using > > > > > > > the CVS version of bioperl-live. > > > > > > > > > > > > > > Christopher Fields > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > Dept. of Biochemistry > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- > l- > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM > > > > > > > > To: Huang Jian; bioperl-l > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > > > Hi Huang, > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm > > works > > > > on > > > > > > the > > > > > > > > logic of checking the temporary file size to determine > whether > > the > > > > > > Blast > > > > > > > > results are ready. 
This condition is not being satisfied, possibly due to some changes made by NCBI. I had this problem recently and found that the solution was to use the latest version, which has this problem fixed (it no longer uses the file-size logic) but is not yet included in the BioPerl package. > > > > > > > > Cheers > > > > > > > > Nagesh > > > > > > > > > > > > > > > > Huang Jian wrote: > > > > > > > > > > > > > > > > > Dear Nagesh, > > > > > > > > > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent me. Now it works perfectly!!! > > > > > > > > > > > > > > > > > > Thank you!! > > > > > > > > > > > > > > > > > > Huang > > > > > > > > > > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > > > > > > > > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > > > > > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still via email > > > > > > > > > > > > > > > > > > > > > > > > > > >> Hi Huang, > > > > > > > > >> I see that you are submitting a sequence for a remote blast search. Can you check whether the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If not, I have attached it to this email; try replacing the old one, which has a bug, with it. > > > > > > > > >> Let me know if it works.
> > > > > > > > >> Nagesh From sdavis2 at mail.nih.gov Tue Feb 14 15:02:59 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 14 Feb 2006 15:02:59 -0500 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: <200602140915.11604.hjm@tacgi.com> Message-ID: You can get the upstream regions for genes via the Table Browser at UCSC. If you want to do it yourself, just download their refGene table (as a tab-delimited text file), which includes the HUGO gene name. Then use the method given by Brian to look up the locations. The genome just isn't THAT big to download and store locally.
Note that most of the big sites (like NCBI, for example) impose restrictions on the number and timing of hits, so using them for high-throughput analysis (like for gene expression studies) is not always feasible. I have found that having the data locally is almost always better. Sean On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > Hi Brian, > > Thanks very much for the pointers and the speed of your reply, and apologies > for the speed of mine. > > This looks good, but what I was looking for was a bioP approach for hooking to > an API at NCBI or EBI so I could get this info and seqs from them. In this > case, speed of retrieval is not critical and I'd rather not download the > entirety of the sequences to a local disk to hack at them. > > I've determined a screen-scraping approach to get them and could script that, > but I thought that bioP had a method for using NCBI's external APIs, though it > may be that my memory is faulty or the approach is no longer supported due to > overload. > > Does NCBI make such APIs available anymore? I searched a bit for docs on them > but couldn't find anything (unless it's buried in the NCBI toolkit, which I > haven't started to excavate). > > Failing that, would SEALS provide such a service? Any PerlPinipeds listening? > > Harry > > > > > > > On Sunday 12 February 2006 08:37, Brian Osborne wrote: >> Harry, >> >> Hope you're doing well. The approach could be based on Bio::DB::Fasta.
So, >> from its documentation: >> >> use Bio::DB::Fasta; >> >> # create database from directory of fasta files >> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >> >> # simple access (for those without Bioperl) >> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >> my @ids = $db->ids; >> my $length = $db->length('CHROMOSOME_I'); >> my $alphabet = $db->alphabet('CHROMOSOME_I'); >> my $header = $db->header('CHROMOSOME_I'); >> >> # Bioperl-style access >> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >> >> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >> my $seq = $obj->seq; >> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >> >> Do you already have the offsets? >> >> Brian O. >> >> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>> Hi All, >>> >>> After perusing the tutorial and other docs for an evening, I still >>> can't find the answer to this. Forgive me if I've missed something >>> obvious. >>> >>> This should not be a novel request, but I've not found it answered. If >>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a >>> better way, especially if it includes an illuminating bit of code. >>> >>> The problem is to retrieve genomic sequences plus & minus some offset >>> from a locus determined by HUGO keyword or GeneID. This would be a >>> common follow-up chore for some extra analysis from a gene expression >>> expt. Or maybe this is in the DBFetch routines, but I've missed the >>> sequence type to specify...? >>> >>> >>> TIA! 
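As a rough sketch of the refGene route Sean describes, the code below scans a locally downloaded refGene table for a gene symbol and computes a window around each transcript's start, taking strand into account. The column order (including a leading `bin` column), the 0-based half-open txStart/txEnd convention, and the helper names are assumptions to check against the UCSC schema of the table you actually download. The 1-based coordinates it returns are in the form used by `$db->seq($id, $start => $end)` in the Bio::DB::Fasta synopsis quoted above, assuming the FASTA IDs match the refGene chromosome names.

```perl
use strict;
use warnings;

# Assumed column layout of a UCSC refGene dump (with the leading 'bin'
# column); verify against the schema of the file you download.
my @COLS = qw(bin name chrom strand txStart txEnd cdsStart cdsEnd
              exonCount exonStarts exonEnds score name2);

sub parse_refgene_line {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    my %row;
    @row{@COLS} = @f[0 .. $#COLS];
    return \%row;
}

# Return (chrom, start, end), 1-based inclusive, covering $up bases upstream
# of the transcription start and $down bases from it (TSS included).
sub tss_window {
    my ($row, $up, $down) = @_;
    my ($start, $end);
    if ($row->{strand} eq '+') {
        my $tss = $row->{txStart} + 1;    # 0-based start -> 1-based position
        ($start, $end) = ($tss - $up, $tss + $down - 1);
    } else {
        my $tss = $row->{txEnd};          # half-open end == 1-based last base
        ($start, $end) = ($tss - $down + 1, $tss + $up);
    }
    $start = 1 if $start < 1;             # do not run off the chromosome start
    return ($row->{chrom}, $start, $end);
}

# Collect [transcript, chrom, start, end] for every row whose gene symbol
# (name2) matches; a symbol can have several transcripts.
sub windows_for_symbol {
    my ($fh, $symbol, $up, $down) = @_;
    my @hits;
    while (my $line = <$fh>) {
        my $row = parse_refgene_line($line);
        next unless defined $row->{name2} && $row->{name2} eq $symbol;
        push @hits, [ $row->{name}, tss_window($row, $up, $down) ];
    }
    return @hits;
}

# Example use (file name is hypothetical):
#   open my $fh, '<', 'refGene.txt' or die $!;
#   for my $hit (windows_for_symbol($fh, 'TP53', 2000, 500)) {
#       my ($tx, $chrom, $start, $end) = @$hit;
#       # e.g. $db->seq($chrom, $start => $end) with Bio::DB::Fasta
#   }
```

For minus-strand genes you may also want the reverse complement; the synopsis above indicates Bio::DB::Fasta returns it when the coordinates are given in reverse order.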
From cjfields at uiuc.edu Tue Feb 14 15:32:42 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 14 Feb 2006 14:32:42 -0600 Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db Message-ID: <001201c631a5$ce7496f0$15327e82@pyrimidine> Hilmar, Good News: I've added a section to the bioperl wiki on installing bioperl-db in Windows: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db Bad News: There's a new problem now. I updated from CVS yesterday; I walked through the steps and ran 'nmake test', with everything passing fine. However, load_seqdatabase.pl is extremely slow; it's loading a sequence every 5 minutes or so. I noticed (when using '-debug') that it is hanging up in Bio::DB::BioSQL::SpeciesAdaptor each time. If I create a database, load the biosql schema, and load sequences without loading taxonomy, the problem goes away. Here's the debugging output (I cut it off at the point it hangs up): ----------------------------------------------------------------------------------------------------- C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver mysql -namespace test -dbname biosql -dbuser root -dbpass ********** -format genbank -debug NP_252217.gpt Loading NP_252217.gpt ... 
attempting to load adaptor class for Bio::Seq::RichSeq attempting to load module Bio::DB::BioSQL::RichSeqAdaptor attempting to load adaptor class for Bio::Seq attempting to load module Bio::DB::BioSQL::SeqAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor attempting to load adaptor class for Bio::Species attempting to load module Bio::DB::BioSQL::SpeciesAdaptor instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor attempting to load adaptor class for Bio::Annotation::Collection attempting to load module Bio::DB::BioSQL::CollectionAdaptor attempting to load adaptor class for Bio::Root::Root attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::Root::RootI attempting to load module Bio::DB::BioSQL::RootIAdaptor attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::AnnotationCollectionI attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor attempting to load adaptor class for Bio::Annotation::TypeManager attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for Bio::Annotation::SimpleValue attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor attempting to load adaptor class for Bio::Annotation::Reference attempting to load module Bio::DB::BioSQL::ReferenceAdaptor instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor attempting to load adaptor class for Bio::Annotation::Comment attempting to load module Bio::DB::BioSQL::CommentAdaptor instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor attempting to load adaptor class for Bio::Annotation::DBLink attempting to load module Bio::DB::BioSQL::DBLinkAdaptor instantiating adaptor 
class Bio::DB::BioSQL::DBLinkAdaptor attempting to load adaptor class for Bio::PrimarySeq attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor attempting to load adaptor class for Bio::SeqFeature::Generic attempting to load module Bio::DB::BioSQL::GenericAdaptor attempting to load adaptor class for Bio::SeqFeatureI attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor attempting to load adaptor class for Bio::Location::Simple attempting to load module Bio::DB::BioSQL::SimpleAdaptor attempting to load adaptor class for Bio::Location::Atomic attempting to load module Bio::DB::BioSQL::AtomicAdaptor attempting to load adaptor class for Bio::LocationI attempting to load module Bio::DB::BioSQL::LocationIAdaptor attempting to load module Bio::DB::BioSQL::LocationAdaptor instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for BioNamespace attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor 
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? BioNamespaceAdaptor: binding UK column 1 to "test" (namespace) preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?) BioNamespaceAdaptor::insert: binding column 1 to "test" (namespace) BioNamespaceAdaptor::insert: binding column 2 to "" (authority) attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ? SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid) ---------------------------------------------------------------------------- ------------------------- Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Tue Feb 14 16:32:42 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 14 Feb 2006 16:32:42 -0500 Subject: [Bioperl-l] game xml SeqIO In-Reply-To: <43F1043A.2000205@cornell.edu> Message-ID: Robert, It looks like you're right that this data isn't handled by SeqIO/game. If you'd like to add this then feel free to do it, the modified files or patches can be submitted to bugzilla.bioperl.org. If you take this on then please add a test or 2 to t/game.t as well. Yes, Bio::SeqFeature::Computation sounds right - does it match the data you're trying to parse? 
SeqFeature::Generic is the most commonly used, and it's flexible, but if another type of SeqFeature fits your data more precisely, then that's the one you should use. Brian O. On 2/13/06 5:12 PM, "Robert Buels" wrote: > Hi all, > > Currently, the SeqIO for doing GAME XML does not seem to support writing > (or reading?) elements. Am I correct? > > If I am, are there any plans to add this functionality? Can I help / do it? > > If there are plans to add this, how would one distinguish SeqFeatures > that should be rendered as from SeqFeatures > that should be rendered as ? Would we do that with > Bio::SeqFeature::Computation? I assume that a given Seq can have > SeqFeatures of different types associated with it (I don't know, I'm a > bioperl newb). > > Rob From saldroubi at yahoo.com Tue Feb 14 22:54:42 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Tue, 14 Feb 2006 19:54:42 -0800 (PST) Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix Message-ID: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com> All, I am trying to use the Bio::Matrix::GenericMatrix module. I simply put this line in my program: use Bio::Matrix::GenericMatrix; but I get the following error: Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: /usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 /usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18. BEGIN failed--compilation aborted at sf.pl line 18. Using find, I found this module, which is called Generic.pm, in this directory: /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix Could someone tell me why it is not working? I have no trouble including these modules in my file: use Bio::SeqIO; use Bio::DB::GenBank; Thank you. Sincerely, Sam Al-Droubi, M.S.
saldroubi at yahoo.com From jason.stajich at duke.edu Tue Feb 14 23:10:56 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 14 Feb 2006 23:10:56 -0500 Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix In-Reply-To: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com> References: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com> Message-ID: Try: use Bio::Matrix::Generic; Apparently I screwed up the SYNOPSIS. Fixed that just now. -jason On Feb 14, 2006, at 10:54 PM, Sam Al-Droubi wrote: > All, > > I am trying to use the Bio::Matrix::GenericMatrix module. > I simply put this line in my program: > use Bio::Matrix::GenericMatrix; > > but I get the following error: > > Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: > /usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 > /usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl > /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18. > BEGIN failed--compilation aborted at sf.pl line 18. > > Using find, I found this module, which is called Generic.pm, in this > directory: > /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix > > Could someone tell me why it is not working? I have no trouble > including these modules in my file: > use Bio::SeqIO; > use Bio::DB::GenBank; > > Thank you. > > > > Sincerely, > Sam Al-Droubi, M.S.
> saldroubi at yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From daniel.lang at biologie.uni-freiburg.de Wed Feb 15 05:35:40 2006 From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang) Date: Wed, 15 Feb 2006 11:35:40 +0100 Subject: [Bioperl-l] distmat matrix Message-ID: <43F303FC.9000806@biologie.uni-freiburg.de> Hi, I need to go through an uncorrected distmat matrix (EMBOSS, run locally) to filter sequences from an MSA. I had a look around and didn't find an obvious candidate. Before I start writing something of my own: is there a bioperl parser for reading distmat matrices, or can I trick the Bio::MapIO parsers for scoring or PHYLIP into doing so? Of course, if anyone knows of a tool that generates an uncorrected distance matrix for protein MSAs and is supported by bioperl, that would also be OK for me :) I have no experience with the Pise (Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand it, it only executes the application on a remote web server? Or can I solve my task with Pise? Thanks in advance! Daniel From praveecbt at yahoo.co.in Wed Feb 15 03:57:44 2006 From: praveecbt at yahoo.co.in (Praveen Raj) Date: Wed, 15 Feb 2006 08:57:44 +0000 (GMT) Subject: [Bioperl-l] Help Message-ID: <20060215085744.14911.qmail@web8711.mail.in.yahoo.com> Dear Peter Schattner Sir, I have a problem with profile_align() of the Clustalw object. My code looks like this: ...... 12 @seq_array=($seqobj1,$seqobj2,$seqobj3); 13 $seq_array_ref=\@seq_array; 14 $aln=$factory->align($seq_array_ref); 15 print $out $aln; # this works fine 16 $sen = Bio::Seq->new(-display_id => '>gi|userdata|', 17 -seq => "MTKKPGGPGKNRA....", 18 -format => "fasta"); 19 $aln=$factory->profile_align($aln,$sen); #problem here 20 print $out1 $aln; I got the following error at Line No.
19 ERROR: Could not open sequence file (-profile) No. of seqs. read = -1. No alignment! How can I solve this problem? I hope you can provide a solution. Thanking you, Praveen Raj, Project Student, NIV, India. From jason.stajich at duke.edu Wed Feb 15 08:19:41 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed, 15 Feb 2006 08:19:41 -0500 Subject: [Bioperl-l] distmat matrix In-Reply-To: <43F303FC.9000806@biologie.uni-freiburg.de> References: <43F303FC.9000806@biologie.uni-freiburg.de> Message-ID: <550C115C-1216-4285-8BE5-EC217C3F1BE9@duke.edu> Bioperl can parse PHYLIP distance matrices; see Bio::Matrix::IO. I didn't write an EMBOSS distmat result parser, but that would be nice to have (check first whether EMBOSS can already write its output in PHYLIP format, though). There is a pure-Perl distance matrix calculation for an MSA: Bio::Align::DNAStatistics for DNA sequences and Bio::Align::ProteinStatistics for protein. There is some initial discussion on the website, but it could certainly use more detail: http://bioperl.org/wiki/Phylogenetics http://bioperl.org/wiki/HOWTO:Trees http://bioperl.org/wiki/Module:Bio::Align::DNAStatistics -jason On Feb 15, 2006, at 5:35 AM, Daniel Lang wrote: > Hi, > > I need to go through an uncorrected distmat matrix (EMBOSS, run locally) > to filter sequences from an MSA. > I had a look around and didn't find an obvious candidate. Before I start > writing something of my own: > is there a bioperl parser for reading distmat matrices, or can I trick > the Bio::MapIO parsers for scoring or PHYLIP into doing so?
> If anyone knows of course a tool to generate an uncorrected distance > matrix of protein MSAs that is supported by bioperl, would be also OK > for me:) > > I have no experience with the Pise > (Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand > it it's only to execute the application on a remote web server? Or > can I > solve my task with Pise? > > Thanks in advance! > > Daniel > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From michael.watson at bbsrc.ac.uk Wed Feb 15 10:06:29 2006 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 15 Feb 2006 15:06:29 -0000 Subject: [Bioperl-l] Website issues Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> Hi The links on the left of bioperl.org don't work in konqueror 3.1.1, which is a real b*gger because that's the browser I use on Linux... :-S Mick From rmb32 at cornell.edu Wed Feb 15 11:01:07 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 15 Feb 2006 11:01:07 -0500 Subject: [Bioperl-l] Bio::Tools::GFF parsing error Message-ID: <43F35043.7070705@cornell.edu> Hi all, I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using FeatureIO, except it purports not to support gff 2), and the file looks like: ##gff-version 2 ##date 2006-02-13 ##sequence-region C01HBa0088L02.seq 1 120525 C01HBa0088L02 RepeatMasker similarity 3537 4267 3.3 - . Target "Motif:bac_end_repeat_family_345" 1 740 C01HBa0088L02 RepeatMasker similarity 4172 4279 2.9 + . Target "Motif:HRSiTERT00100141" 1 104 C01HBa0088L02 RepeatMasker similarity 4267 4323 0.0 - . Target "Motif:k_29" 150 206 C01HBa0088L02 RepeatMasker similarity 4322 4492 26.6 + . Target "Motif:PRSiTERT00300001" 1960 2129 C01HBa0088L02 RepeatMasker similarity 4557 5124 29.5 + . 
Target "Motif:PRSiTERT00300001" 2142 2711 Notice the score column is padded with spaces. Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid score. My question is, who is wrong here, my input file or Bio::Tools::GFF? Should Bio::Tools::GFF be able to read this file? Rob -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 607-255-2360 rmb32 at cornell.edu http://www.sgn.cornell.edu From jason.stajich at duke.edu Wed Feb 15 11:12:59 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed, 15 Feb 2006 11:12:59 -0500 Subject: [Bioperl-l] Website issues In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu> Okay I guess someone will have to look into that. Can you normally browse on wikipedia, we're just using their software, maybe it is a javascript problem? Please send a system bug request to our helpdesk: support at open-bio.org -jason On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote: > Hi > > The links on the left of bioperl.org don't work in konqueror 3.1.1, > which is a real b*gger because that's the browser I use on > Linux... 
:-S > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From Marc.Logghe at DEVGEN.com Wed Feb 15 11:13:16 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed, 15 Feb 2006 17:13:16 +0100 Subject: [Bioperl-l] Bio::Tools::GFF parsing error Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B2E@ANTARESIA.be.devgen.com> Hi Rob, According to the GFF Specifications Document @ http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml : All of the above described fields should be separated by TAB characters ('\t'). All values of the mandatory fields should not include whitespace (i.e. the strings for , and fields). Reading that, I am afraid you have to pre-process your gff input file ... HTH, Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Robert Buels > Sent: Wednesday, February 15, 2006 5:01 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Bio::Tools::GFF parsing error > > Hi all, > > I'm parsing a GFF2 file with Bio::Tools::GFF (I would be > using FeatureIO, except it purports not to support gff 2), > and the file looks > like: > > ##gff-version 2 > ##date 2006-02-13 > ##sequence-region C01HBa0088L02.seq 1 120525 > C01HBa0088L02 RepeatMasker similarity 3537 4267 > 3.3 > - . Target "Motif:bac_end_repeat_family_345" 1 740 > C01HBa0088L02 RepeatMasker similarity 4172 4279 > 2.9 > + . Target "Motif:HRSiTERT00100141" 1 104 > C01HBa0088L02 RepeatMasker similarity 4267 4323 > 0.0 > - . Target "Motif:k_29" 150 206 > C01HBa0088L02 RepeatMasker similarity 4322 4492 > 26.6 > + . Target "Motif:PRSiTERT00300001" 1960 2129 > C01HBa0088L02 RepeatMasker similarity 4557 5124 > 29.5 > + . Target "Motif:PRSiTERT00300001" 2142 2711 > > Notice the score column is padded with spaces. 
> > Bio::Tools::GFF does not like this, and says that ' 3.3' is > not a valid score. My question is, who is wrong here, my > input file or Bio::Tools::GFF? Should Bio::Tools::GFF be > able to read this file? > > Rob > > -- > Robert Buels > SGN Bioinformatics Analyst > 252A Emerson Hall, Cornell University > Ithaca, NY 14853 > Tel: 607-255-2360 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Wed Feb 15 11:29:14 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed, 15 Feb 2006 11:29:14 -0500 Subject: [Bioperl-l] Website issues In-Reply-To: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu> References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu> Message-ID: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu> I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE 3.1.4-9) But it works fine for me on 3.2.2-8.FC2 .... So I'm going to go with this being a konqueror bug, sorry to say, but feel free to still report the bug to the helpdesk. -jason On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote: > Okay I guess someone will have to look into that. Can you normally > browse on wikipedia, we're just using their software, maybe it is a > javascript problem? > > Please send a system bug request to our helpdesk: > support at open-bio.org > > -jason > On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote: > >> Hi >> >> The links on the left of bioperl.org don't work in konqueror 3.1.1, >> which is a real b*gger because that's the browser I use on >> Linux... 
:-S >> >> Mick >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Wed Feb 15 11:57:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 15 Feb 2006 10:57:13 -0600 Subject: [Bioperl-l] Added 'Installing Bioperl for Unix' to wiki Message-ID: <000301c63250$de506120$15327e82@pyrimidine> I added an Installing Bioperl for Unix page, http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix which is a quick redo of the INSTALL text file in the bioperl distribution. It's in workable shape but needs links revisions etc. Please leave any comments on the discussion pages here. http://www.bioperl.org/wiki/Talk:Getting_BioPerl http://www.bioperl.org/wiki/Talk:Installing_Bioperl_for_Unix Thanks to Brian for helping out with the Windows install doc! Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From khoueiry at ibdm.univ-mrs.fr Wed Feb 15 12:23:21 2006 From: khoueiry at ibdm.univ-mrs.fr (khoueiry) Date: Wed, 15 Feb 2006 18:23:21 +0100 Subject: [Bioperl-l] Website issues In-Reply-To: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu> References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu> <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu> Message-ID: <1140024202.2689.45.camel@localhost> An embedded and charset-unspecified text was scrubbed... 
Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060215/a69052f0/attachment.ksh From heikki at sanbi.ac.za Wed Feb 15 13:55:07 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 15 Feb 2006 20:55:07 +0200 Subject: [Bioperl-l] Website issues In-Reply-To: <1140024202.2689.45.camel@localhost> References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu> <1140024202.2689.45.camel@localhost> Message-ID: <200602152055.07667.heikki@sanbi.ac.za> Konqueror 3.5.1. has no problems, either. Clearly, older konqueror had a bug that has been permanently fixed. Michael, time for you to upgrade. -Heikki On Wednesday 15 February 2006 19:23, khoueiry wrote: > I test it on konqueror 3.4.2 and it works well !!! > > On Wed, 2006-02-15 at 11:29 -0500, Jason Stajich wrote: > > I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE > > 3.1.4-9) > > > > But it works fine for me on 3.2.2-8.FC2 .... > > > > So I'm going to go with this being a konqueror bug, sorry to say, but > > feel free to still report the bug to the helpdesk. > > > > -jason > > > > On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote: > > > Okay I guess someone will have to look into that. Can you normally > > > browse on wikipedia, we're just using their software, maybe it is a > > > javascript problem? > > > > > > Please send a system bug request to our helpdesk: > > > support at open-bio.org > > > > > > -jason > > > > > > On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote: > > >> Hi > > >> > > >> The links on the left of bioperl.org don't work in konqueror 3.1.1, > > >> which is a real b*gger because that's the browser I use on > > >> Linux... 
:-S > > >> > > >> Mick > > > > > > -- > > > Jason Stajich > > > Duke University > > > http://www.duke.edu/~jes12 > > > > -- > > Jason Stajich > > Duke University > > http://www.duke.edu/~jes12 -- Heikki Lehvaslaiho heikki at sanbi.ac.za Associate Professor skype: heikki_lehvaslaiho SANBI, South African National Bioinformatics Institute University of Western Cape, South Africa Phone: +27 21 959 2096 FAX: +27 21 959 2512 From gyang at plantbio.uga.edu Wed Feb 15 14:39:41 2006 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Wed, 15 Feb 2006 14:39:41 -0500 Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28 Message-ID: <20060215143941.54e91487@dogwood.plantbio.uga.edu> Hi, Chris, Finally, the RemoteBlast test script works for the amino.fa query, but when I try a nucleic acid sequence (see below), an error occurs: " waiting........
------------- EXCEPTION ------------- MSG: no data for midline Features flanking this part of subject sequence: STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/Searc hIO/blast.pm:1172 STACK toplevel remoteblast_test:40 " The query sequence is: CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG The script (basically same as the remoteblast test, I only changed database to 'nr' and program to 'blastn' and filename to 'ost3'): #!/usr/bin/perl use Bio::SeqIO; use Bio::Seq; use Bio::Tools::Run::RemoteBlast; use Bio::SearchIO; use strict; my $prog='blastn'; my $db='nr'; my $e_val=1e-10; my @params=( -prog=>$prog, -data=>$db, -expect=>$e_val, -readmethod=>'SearchIO'); my $factory=Bio::Tools::Run::RemoteBlast->new(@params); my $v = 1; my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' ); while (my $input = $str->next_seq()){ #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. my $r = $factory->submit_blast($input); #my $r = $factory->submit_blast('amino.fa'); print STDERR "waiting..." if( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output my $filename = $result->query_name()."\.out"; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { next unless ( $v > 0); print "\thit name is ", $hit->name, "\n"; while( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } } } } } Do you think there might still be something in the NCBI output format? Thank you, Guojun Guojun Yang Department of Plant Biology University of Georgia Tel: 706-542-1857 Fax: 706-542-1805 http://www.arches.uga.edu/~guojun ----- Original Message ----- From: Chris Fields [mailto:cjfields at uiuc.edu] To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2 > Sorry, forgot to add that I didn't see the regex issue that you mentioned. > It could be a perl-related issue. Try the fixes I mentioned and see what > happens. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > Sent: Tuesday, February 14, 2006 12:36 PM > > To: 'gyang at plantbio.uga.edu' > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > It's a good habit to always add single quotes around words. The perl > > interpreter may think a single bare word is a subroutine or perlfunc > > called with no args so will try to find a subroutine named blastp(). My > > debugger actually gives the error that the bare word blastp may conflict > > with a future reserved word. Like you said, 'use strict' will point that > > out. > > > > As for the regex, it should match all the blast programs at NCBI (blastp, > > blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing > > else passes through. 
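A minimal pure-Perl sketch of the kind of program-name check described above. The anchored pattern /^t?blast[pnx]$/ and the helper name is_valid_blast_program are assumptions for illustration, not RemoteBlast.pm's actual code. The program names are quoted strings rather than barewords, so the script also compiles under use strict.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: accepts names matching t?blast[pnx].
# The ^...$ anchors are an assumption; RemoteBlast.pm may match differently.
sub is_valid_blast_program {
    my ($prog) = @_;
    my $ok = defined($prog) && $prog =~ /^t?blast[pnx]$/;
    return $ok ? 1 : 0;
}

# Quoted strings, not barewords, so this compiles under 'use strict'.
my @names = ('blastp', 'blastn', 'blastx', 'tblastn', 'tblastx',
             'tblastp', 'megablast');
for my $prog (@names) {
    printf "%-10s %s\n", $prog,
        is_valid_blast_program($prog) ? 'accepted' : 'rejected';
}
```

Running it reports blastp, blastn, blastx, tblastn and tblastx as accepted and megablast as rejected. Note that the pattern also accepts tblastp, which is not an NCBI program, so it works as a coarse filter rather than an exact list of valid programs.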
> > > > So, if you are using the script below, there are several errors. The bare > > words for $prog and $db need quotes, and the flags in your @params array > > don't have a dash before them. I get this after adding quotes but before > > adding the dashes to @params: > > > > C:\Perl\Scripts>test_blast.pl > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: > > STACK: Error::throw > > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328 > > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > > C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256 > > STACK: C:\Perl\Scripts\test_blast.pl:15 > > ----------------------------------------------------------- > > > > The last line indicates a problem with this line: > > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > > Changing the @params to this: > > > > my @params=( -prog=>$prog, > > -data=>$db, > > -expect=>$e_val, > > -readmethod=>'SearchIO'); > > > > fixes it, and I get output as expected. > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > -----Original Message----- > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > Sent: Tuesday, February 14, 2006 11:48 AM > > > To: Chris Fields; bioperl-l at lists.open-bio.org > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > > > Hi, Chris, > > > When I tried the perldoc script, it did not work either. First, it > > > says $prog cannot be a bareword if I "use strict". I added quotes around the > > > words; then it says the value for $prog does not match expression > > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The script > > > is shown below. Why is the expression "t?blast[pnx]"?
> > > > > > #!/usr/bin/perl > > > > > > use Bio::SeqIO; > > > use Bio::Seq; > > > use Bio::Tools::Run::RemoteBlast; > > > use Bio::SearchIO; > > > > > > > > > my $prog=blastp; > > > my $db=swissprot; > > > my $e_val=1e-10; > > > my @params=( prog=>$prog, > > > data=>$db, > > > expect=>$e_val, > > > readmethod=>'SearchIO'); > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > my $v = 1; > > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > > > > while (my $input = $str->next_seq()){ > > > #Blast a sequence against a database: > > > #Alternatively, you could pass in a file with many > > > #sequences rather than loop through sequence one at a time > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > #and swap the two lines below for an example of that. > > > my $r = $factory->submit_blast($input); > > > #my $r = $factory->submit_blast('amino.fa'); > > > print STDERR "waiting..." if( $v > 0 ); > > > while ( my @rids = $factory->each_rid ) { > > > foreach my $rid ( @rids ) { > > > my $rc = $factory->retrieve_blast($rid); > > > if( !ref($rc) ) { > > > if( $rc < 0 ) { > > > $factory->remove_rid($rid); > > > } > > > print STDERR "." if ( $v > 0 ); > > > sleep 5; > > > } else { > > > my $result = $rc->next_result(); > > > #save the output > > > my $filename = $result->query_name()."\.out"; > > > $factory->save_output($filename); > > > $factory->remove_rid($rid); > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > while ( my $hit = $result->next_hit ) { > > > next unless ( $v > 0); > > > print "\thit name is ", $hit->name, "\n"; > > > while( my $hsp = $hit->next_hsp ) { > > > print "\t\tscore is ", $hsp->score, "\n"; > > > } > > > } > > > } > > > } > > > } > > > } > > > > > > Thank you for your help! 
> > > > > > > > > Guojun > > > Department of Plant Biology > > > University of Georgia > > > > > > ----- Original Message ----- > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > To: gyang at plantbio.uga.edu > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > Try two things: > > > > > 1) Use a much simpler script, like the one in 'perldoc > > > > Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something > > > wrong > > > > with the logic in your subroutine: > > > > > my $v = 1; > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > > > while (my $input = $str->next_seq()){ > > > > #Blast a sequence against a database: > > > > #Alternatively, you could pass in a file with many > > > > #sequences rather than loop through sequence one at a time > > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > > #and swap the two lines below for an example of that. > > > > my $r = $factory->submit_blast($input); > > > > #my $r = $factory->submit_blast('amino.fa'); > > > > print STDERR "waiting..." if( $v > 0 ); > > > > while ( my @rids = $factory->each_rid ) { > > > > foreach my $rid ( @rids ) { > > > > my $rc = $factory->retrieve_blast($rid); > > > > if( !ref($rc) ) { > > > > if( $rc < 0 ) { > > > > $factory->remove_rid($rid); > > > > } > > > > print STDERR "." 
if ( $v > 0 ); > > > > sleep 5; > > > > } else { > > > > my $result = $rc->next_result(); > > > > #save the output > > > > my $filename = $result->query_name()."\.out"; > > > > $factory->save_output($filename); > > > > $factory->remove_rid($rid); > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > while ( my $hit = $result->next_hit ) { > > > > next unless ( $v > 0); > > > > print "\thit name is ", $hit->name, "\n"; > > > > while( my $hsp = $hit->next_hsp ) { > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > } > > > > } > > > > } > > > > } > > > > } > > > > } > > > > > 2) Try the RemoteBlast from Bugzilla and see if that works. It > > really > > > > shouldn't make that much of a difference, but I noticed that the CVS > > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was > > > > released; the Bugzilla version is based off CVS. > > > > > Christopher Fields > > > > Postdoctoral Researcher - Switzer Lab > > > > Dept. of Biochemistry > > > > University of Illinois Urbana-Champaign > > > > > > -----Original Message----- > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > Sent: Monday, February 13, 2006 3:00 PM > > > > > To: bioperl-l at lists.open-bio.org > > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > Thanks, Chris, > > > > > I installed version 1.5.1 and replaced the blast.pm file with the > > one > > > from > > > > > your bug report. The running version is 1.5 when I use the command > > you > > > > > sent me. But when I tried the script, it doesn't change much. 
My > > > > > remoteblast code (portion) is here: > > > > > > > sub search { > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > > > > > local > > > > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > > > > > 'no'; > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > > > > -id=>"query", > > > > > -desc=>"new seq"); > > > > > my $len=$query->length(); > > > > > @db=('nr','htgs','wgs'); > > > > > foreach my $db (@db) { > > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn', > > > > > '-data' =>"$db", > > > > > > '-expect'=>"$E_value"); > > > > > > > > > my $blast_report = $factory->submit_blast($query); > > > > > > > my @rids = $factory->each_rid(); > > > > > foreach my $rid ( @rids ) { > > > > > print STDERR "$rid\n"; > > > > > } > > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638) > > > > > print STDERR "waiting..."; > > > > > sleep 60; > > > > > > > foreach my $rid ( @rids ) { > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > while (!ref($rc) ) { > > > > > if( $rc < 0 ) { > > > > > # retrieve_blast returns -1 on error > > > > > $factory->remove_rid($rid); > > > > > print "Error!\n"; > > > > > send_error($email,$function,$seqname,$queryname[$ST]); > > > > > die "Can't retrieve $rid"; > > > > > } if ($rc==0) { # retrieve_blast returns 0 on 'job not > > finished' > > > > > sleep 60; > > > > > $rc = $factory->retrieve_blast($rid); > > > > > } > > > > > } > > > > > if (ref($rc)) { > > > > > print STDERR "Done.\n"; > > > > > while( my $result = $rc->next_result) { > > > > > while( my $hit = $result->next_hit()) { > > > > > $hit_name=$hit->name; > > > > > $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > > > > $name=$1; > > > > > @left_plus_start=(); > > > > 
> @left_plus_end=(); > > > > > @left_minus_start=(); > > > > > @left_minus_end=(); > > > > > @right_plus_start=(); > > > > > @right_plus_end=(); > > > > > @right_minus_start=(); > > > > > @right_minus_end=(); > > > > > > > if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > > > > while( my $hsp = $hit->next_hsp()) { > > > > > ...... > > > > > > > It was working quite well until around October last year, but it has > > > > > stopped since then. When a submission is sent via a webpage, the CGI > > > > > script starts to work and uses about 20 MB of memory. Then it hangs there; finally > > > > > the expected email is received, but without real results, although it does > > > > > contain something from other parts of the script. Apparently the search > > > > > sub did not return anything (I know something should be > > > > > returned). Is it also possible that the format of the NCBI output for each > > > > > result has changed? > > > > > Thank you, > > > > > Guojun > > > > > > > > > Department of Plant Biology > > > > > University of Georgia > > > > > > > > > > > ----- Original Message ----- > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > How do you know two versions are installed (i.e. how are you checking > > > > > > the version)? Do you have two complete bioperl distributions (in two > > > > > > separate directories), or are you looking in modules? Here's the way to > > > > > > check the version (from the FAQ): > > > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > > > > > > > If you have two full bioperl distributions on your computer, normally only > > > > > > one will be in use unless you have explicitly set the environment > > > > > variable
The PERL5LIB directories will be searched first before > > > your > > > > > > normal perl directory list (@INC) is searched. You MAY get some > > > mixing > > > > > > then, but only if perl can't find a particular module in the path > > > > > designated > > > > > > in PERL5LIB; then it will progress through the directories listed > > in > > > > > @INC. > > > > > > This may happen if a module is unique to a particular release, but > > > > > shouldn't > > > > > > happen for the majority of modules, including RemoteBlast. You > > can > > > > > check > > > > > > what @INC and PERL5LIB are set to by using 'perl -V'. @INC will > > > differ > > > > > > depending on your OS, perl build, etc. > > > > > > > Regardless, if you follow the directions for installing bioperl > > > for > > > > > your > > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make install', > > > unless > > > > > you > > > > > > explicitly change the installation directory when using 'perl > > > > > Makefile.PL'), > > > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will > > > install > > > > > the > > > > > > Bioperl distribution you downloaded over the old version in @INC. > > > See > > > > > this > > > > > > page: > > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > > > > > > for more details. > > > > > > > Christopher Fields > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > Dept. of Biochemistry > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > -----Original Message----- > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > > Sent: Monday, February 13, 2006 12:32 PM > > > > > > > To: bioperl-l at lists.open-bio.org > > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > Hi, Chris, > > > > > > > I do have different versions of bioperl on my Linux machine > > (1.4. 
and 1.5.0), this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions? I could not find any hint on uninstalling bioperl on Linux. Could you please give me some suggestions? > > > > > > > Thanks, > > > > > > > Guojun > > > > > > > > > Department of Plant Biology > > > > > > > University of Georgia > > > > > > > _____ > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > > > > > > > > > > > > If you're using RemoteBlast 1.28, then you've likely updated from CVS, which isn't the latest fix. > > > > > > > > > Make sure that you check the following: > > > > > > > > > 1) Always post to the mailing list: > > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13), but it should still save it. I believe that as long as next_result() isn't called, it will work.
> > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO). They can be found in Bugzilla: > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) adds the option of saving XML output, so it isn't necessary if you don't plan on using this option. And, remember, they haven't been committed to CVS yet, which means that the final version will change to reflect the new version. > > > > > > > > > > > Christopher Fields > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > Dept. of Biochemistry > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > _____ > > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > > > > > Sent: Monday, February 13, 2006 9:26 AM > > > > > > > To: Chris Fields > > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > > > > > > > > > > Hi, Chris > > > > > > > > > Thanks for your suggestion; however, it doesn't seem to work for my CGI even after I replaced both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion?
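Replacing blast.pm and RemoteBlast.pm had no visible effect, and the thread also warns about mixing 1.4 and 1.5.x modules. Both point to the same question: which copy of a module does perl actually load? The following is a minimal core-Perl sketch that walks @INC in search order, which puts PERL5LIB and -I directories first, and lists every copy of a module. It is not part of BioPerl. The sub names (module_to_file, all_copies, find_module) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only, not part of BioPerl; sub names are hypothetical.
# Convert "Bio::Tools::Run::RemoteBlast" into the relative file name
# perl searches for: "Bio/Tools/Run/RemoteBlast.pm".
sub module_to_file {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    return "$file.pm";
}

# Every copy of the module visible through @INC, in perl's search order
# (PERL5LIB and -I directories come first). Code-ref hooks are skipped.
sub all_copies {
    my ($module) = @_;
    my $file = module_to_file($module);
    return grep { -f $_ } map { "$_/$file" } grep { !ref $_ } @INC;
}

# The first copy found is the one a plain "use"/"require" would load.
sub find_module {
    my @copies = all_copies($_[0]);
    return @copies ? $copies[0] : undef;
}

my $module = @ARGV ? $ARGV[0] : 'Bio::Tools::Run::RemoteBlast';
my @copies = all_copies($module);
if (@copies) {
    print "perl would load: $copies[0]\n";
    print "other copy:      $_\n" for @copies[1 .. $#copies];
} else {
    print "$module not found in \@INC\n";
}
```

You can run it as, for example, `perl which_copy.pl Bio::SearchIO::blast`. A second copy listed under "other copy" suggests a mixed installation. Once a module is loaded, perl also records the path it actually used in %INC, for example `perl -MBio::SearchIO::blast -e 'print $INC{"Bio/SearchIO/blast.pm"}, "\n"'`.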
> > > > > > > > > > > > > Guojun > > > > > > > > > > > Guojun Yang > > > > > > > Department of Plant Biology > > > > > > > University of Georgia > > > > > > > Tel: 706-542-1857 > > > > > > > Fax: 706-542-1805 > > > > > > > http://www.arches.uga.edu/~guojun > > > > > > > _____ > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > > version > > > > > > > 1.28 > > > > > > > > > I would say give the new code a try, but realize that it > > > hasn't > > > > > been > > > > > > > checked > > > > > > > in (like I said below). I will try going over the modified > > > > > > > Bio::SearchIO::blast again this weekend to see if there is > > > anything I > > > > > > > might > > > > > > > have missed. The changed order in the header of BLAST text > > output > > > has > > > > > me a > > > > > > > bit worried that it might not catch everything, but it at least > > > > > doesn't > > > > > > > hang > > > > > > > in the while() loop I described in the bug report below (bug > > > #1934) > > > > > and > > > > > > > seems to process everything fine. > > > > > > > > > If you want more stability in the code, you might consider > > > > > changing over > > > > > > > to > > > > > > > XML output and parsing with Bio::SearchIO::blastxml. There are > > > some > > > > > > > changes > > > > > > > in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > > > saving > > > > > XML > > > > > > > output, but I believe it parses everything regardless. If you > > look > > > > > back > > > > > > > the > > > > > > > last month or so there has been a bit of discussion here about > > it. 
> > > > > Jason describes how to set up RemoteBlast for XML: > > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ > > > > > > > > > Christopher Fields > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > Dept. of Biochemistry > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > -----Original Message----- > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > > > Sent: Friday, February 03, 2006 1:45 PM > > > > > > > > To: bioperl-l at bioperl.org > > > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > > > Hi, Everybody, > > > > > > > > I saw this post and am wondering if this is the reason for the malfunctioning of my web server. We set up a web server named MAK for MITE sequence analysis. It was working very well until around November 2005, when it stopped returning any result (the site is fine and seems to be doing something after submission). In the CGI script, I used RemoteBlast (that work was done in 2003) to do searches. I currently do not have access to the server because I moved. Several people have sent us emails about its malfunctioning. Is there any suggestion on fixing the problem? Should I simply ask that RemoteBlast.pm be replaced with the new version?
> > > > > > > > Thanks a lot, > > > > > > > > Guojun > > > > > > > > > > > > > > > > Department of Plant Biology > > > > > > > > University of Georgia > > > > > > > > Tel: 706-542-1857 > > > > > > > > Fax: 706-542-1805 > > > > > > > > http://www.arches.uga.edu/~guojun > > > > > > > > _____ > > > > > > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang > > > Jian' > > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > > [mailto:bioperl- > > > > > > > > l at bioperl.org] > > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live > > > CVS. > > > > > It > > > > > > > > will > > > > > > > > work for saving text output. However, it will not parse > > anything > > > > > using > > > > > > > > next_result (it will likely hang) and will not save XML > > format. > > > See > > > > > > > these > > > > > > > > bugs: > > > > > > > > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > > > > > > > > > for explanations and possible fixes (changes to RemoteBlast > > and > > > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in > > > yet > > > > > so > > > > > > > are > > > > > > > > still not included in bioperl-live; they may be further > > modified > > > > > before > > > > > > > > committing to CVS. If you're not worried about XML, you could > > > just > > > > > try > > > > > > > the > > > > > > > > first fix, which is a change to SearchIO::blast. 
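The polling loops quoted in this thread call retrieve_blast() with no upper bound. If a job never finishes, a CGI process can sit there indefinitely. Below is a minimal sketch of a bounded poll. It assumes the return convention described in the thread: a reference when results are ready, 0 while the job is still running, and a negative value on error. poll_until_ready and its option names are hypothetical. In the demo, the fake callback stands in for a real RemoteBlast call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bounded polling loop (sketch). $args{retrieve} is a code ref that follows the
# return convention described in this thread for retrieve_blast():
#   a reference -> results are ready
#   0           -> job not finished yet
#   negative    -> error
# The sub name and option names (retrieve, interval, max_wait, sleep) are hypothetical.
sub poll_until_ready {
    my (%args) = @_;
    my $retrieve = $args{retrieve};
    my $interval = defined $args{interval} ? $args{interval} : 5;    # seconds between polls
    my $max_wait = defined $args{max_wait} ? $args{max_wait} : 600;  # total time budget
    my $sleeper  = $args{sleep} || sub { sleep $_[0] };              # injectable for testing

    my $waited = 0;
    while (1) {
        my $rc = $retrieve->();
        return ('ok', $rc)        if ref $rc;       # results are ready
        return ('error', $rc)     if $rc < 0;       # retrieval error
        return ('timeout', undef) if $waited >= $max_wait;
        $sleeper->($interval);                      # still running: wait and retry
        $waited += $interval;
    }
}

# Demo with a fake retriever: "not finished" twice, then a result.
my @fake_responses = (0, 0, { hits => 3 });
my ($status, $result) = poll_until_ready(
    retrieve => sub { shift @fake_responses },
    interval => 1,
    max_wait => 10,
    sleep    => sub { },    # skip real sleeping in the demo
);
print "status: $status\n";
```

In a RemoteBlast script, the callback would presumably wrap `$factory->retrieve_blast($rid)`, with $rid taken from each_rid(). On 'error' or 'timeout', the caller should still call remove_rid($rid), as the thread's examples do. A bounded wait only stops the loop. It does not address a parser that hangs inside next_result().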
> > > > > > > > > > > > > > > > Nagesh, I remember you posting to the list a month ago using a > > > > > script > > > > > > > > which > > > > > > > > had problems; the script you used saves the output but doesn't > > > > > actually > > > > > > > > parse it (i.e. you don't use next_result() to go through the > > > data). > > > > > Is > > > > > > > the > > > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13? Have > > you > > > > > tried > > > > > > > > parsing the output using "-readmethod => SearchIO" or "- > > > readmethod > > > > > => > > > > > > > > blast" > > > > > > > > using your version of RemoteBlast and method next_result()? > > Like > > > > > below > > > > > > > > (from > > > > > > > > perldoc): > > > > > > > > > > > > > > > > while ( my @rids = $factory->each_rid ) { > > > > > > > > foreach my $rid ( @rids ) { > > > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > > if( !ref($rc) ) { > > > > > > > > if( $rc < 0 ) { > > > > > > > > $factory->remove_rid($rid); > > > > > > > > } > > > > > > > > print STDERR "." 
if ( $v > 0 ); > > > > > > > > sleep 5; > > > > > > > > } else { # parsing starts here > > > > > > > > my $result = $rc->next_result(); # it should hang here > > > > > > > > #save the output > > > > > > > > my $filename = $result->query_name()."\.out"; > > > > > > > > $factory->save_output($filename); > > > > > > > > $factory->remove_rid($rid); > > > > > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > > > > > while ( my $hit = $result->next_hit ) { > > > > > > > > next unless ( $v > 0); > > > > > > > > print "\thit name is ", $hit->name, "\n"; > > > > > > > > while( my $hsp = $hit->next_hsp ) { > > > > > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > My script hung if I used next_result() in any way prior to the fixes. I want to see how many others are having the same issues with parsing using the CVS version of bioperl-live. > > > > > > > > Christopher Fields > > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > > Dept. of Biochemistry > > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > -----Original Message----- > > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM > > > > > > > > > To: Huang Jian; bioperl-l > > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > Hi Huang, > > > > > > > > > Thanks for the message.
The older version of RemoteBlast.pm works on the logic of checking the temporary file size to determine whether the BLAST results are ready. This condition is no longer being satisfied, possibly because of some changes made by NCBI. I had this problem recently and figured out that the solution was to use the latest version, which has this problem fixed (it no longer uses the file-size logic) but is not yet included in the BioPerl package. > > > > > > > > > Cheers > > > > > > > > > Nagesh > > > > > > > > > > > > > > > > > > Huang Jian wrote: > > > > > > > > > > Dear Nagesh, > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent me. Now it works perfectly!!! > > > > > > > > > > Thank you!! > > > > > > > > > > Huang > > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still via email > > > > > > > > > >> Hi Huang, > > > > > > > > > >> I see that you are submitting a sequence for a remote BLAST search. Can you check whether the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If not, I have attached it to this email; try using it to replace the old one, which has a bug.
> > > > > > > > > >> Let me know if it works. > > > > > > > > > >> Nagesh From cjfields at uiuc.edu Wed Feb 15 15:17:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 15 Feb 2006 14:17:27 -0600 Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pmversion 1.28 In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu> Message-ID: <000001c6326c$d72dd640$15327e82@pyrimidine> This looks like a genuine bug and may be something that changed in BLASTN text output; I'm getting it here, too. Running verbose shows that text output is returned, so, from that and from the stack trace it looks like another error in text parsing in Bio::SearchIO::blast.
Bio::SearchIO::blast line 1172 throws a conditional exception. I'm adding this to bug 1934 in Bugzilla (with a reference to your email and this response) for now. I'll try messing around with it when I can; I'm really busy this week. I'll also forward this to Roger Hall. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang > Sent: Wednesday, February 15, 2006 1:40 PM > To: Chris Fields; bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pmversion 1.28 > > Hi, Chris, > Finally the remoteblast test script works for the amino.fa query, but when I try a nucleic acid sequence (see below), an error occurs: > " > waiting........ > ------------- EXCEPTION ------------- > MSG: no data for midline Features flanking this part of subject sequence: > STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172 > STACK toplevel remoteblast_test:40 > " > The query sequence is: > CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC > GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA > AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG > AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG > > The script (basically the same as the remoteblast test; I only changed the database to 'nr', the program to 'blastn' and the filename to 'ost3'): > #!/usr/bin/perl > > use Bio::SeqIO; > use Bio::Seq; > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > use strict; > my $prog='blastn'; > my $db='nr'; > my $e_val=1e-10; > my @params=( -prog=>$prog, > -data=>$db, > -expect=>$e_val, > -readmethod=>'SearchIO'); > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > my $v = 1; > > my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' ); > > while (my
$input = $str->next_seq()){ > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > my $r = $factory->submit_blast($input); > #my $r = $factory->submit_blast('amino.fa'); > print STDERR "waiting..." if( $v > 0 ); > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { > my $result = $rc->next_result(); > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > > Do you think there might still be something in the NCBI output format? > > Thank you, > Guojun > > > > > Guojun Yang > Department of Plant Biology > University of Georgia > Tel: 706-542-1857 > Fax: 706-542-1805 > http://www.arches.uga.edu/~guojun > > > > ----- Original Message ----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > Sorry, forgot to add that I didn't see the regex issue that you > mentioned. > > It could be a perl-related issue. Try the fixes I mentioned and see > what > > happens. > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. 
of Biochemistry > > University of Illinois Urbana-Champaign > > > > > -----Original Message----- > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > Sent: Tuesday, February 14, 2006 12:36 PM > > > To: 'gyang at plantbio.uga.edu' > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > > It's a good habit to always put single quotes around string values. The perl interpreter may think a single bare word is a subroutine or perlfunc called with no args, so it will try to find a subroutine named blastp(). My debugger actually gives the error that the bare word blastp may conflict with a future reserved word. Like you said, 'use strict' will point that out. > > > > > As for the regex, it should match all the blast programs at NCBI (blastp, blastn, blastx, tblastn, tblastx) and is built in to make sure nothing else passes through. > > > > > So, if you are using the script below, there are several errors. The bare words for $prog and $db need quotes, and the flags for your @params array don't have a dash before them.
I get this after adding quotes but before adding the dashes to @params: > > > > > C:\Perl\Scripts>test_blast.pl > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: > > > STACK: Error::throw > > > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328 > > > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > > > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256 > > > STACK: C:\Perl\Scripts\test_blast.pl:15 > > > ----------------------------------------------------------- > > > > > The last line indicates a problem with this line: > > > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > > > Changing the @params to this: > > > > > my @params=( -prog=>$prog, > > > -data=>$db, > > > -expect=>$e_val, > > > -readmethod=>'SearchIO'); > > > > > fixes it, and I get output as expected. > > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > > > -----Original Message----- > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > > Sent: Tuesday, February 14, 2006 11:48 AM > > > > To: Chris Fields; bioperl-l at lists.open-bio.org > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > > > > > Hi, Chris, > > > > When I tried the perldoc script, it did not work either. First it says $prog cannot be a bare word if I "use strict". I added quotes around the words; then it says the value for $prog does not match expression t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The script is shown below. Why is the expression "t?blast[pnx]"?
> > > > > > > > #!/usr/bin/perl > > > > > > > > use Bio::SeqIO; > > > > use Bio::Seq; > > > > use Bio::Tools::Run::RemoteBlast; > > > > use Bio::SearchIO; > > > > > > > > > > > > my $prog=blastp; > > > > my $db=swissprot; > > > > my $e_val=1e-10; > > > > my @params=( prog=>$prog, > > > > data=>$db, > > > > expect=>$e_val, > > > > readmethod=>'SearchIO'); > > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > > > my $v = 1; > > > > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > > > > > > while (my $input = $str->next_seq()){ > > > > #Blast a sequence against a database: > > > > #Alternatively, you could pass in a file with many > > > > #sequences rather than loop through sequence one at a time > > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > > #and swap the two lines below for an example of that. > > > > my $r = $factory->submit_blast($input); > > > > #my $r = $factory->submit_blast('amino.fa'); > > > > print STDERR "waiting..." if( $v > 0 ); > > > > while ( my @rids = $factory->each_rid ) { > > > > foreach my $rid ( @rids ) { > > > > my $rc = $factory->retrieve_blast($rid); > > > > if( !ref($rc) ) { > > > > if( $rc < 0 ) { > > > > $factory->remove_rid($rid); > > > > } > > > > print STDERR "." if ( $v > 0 ); > > > > sleep 5; > > > > } else { > > > > my $result = $rc->next_result(); > > > > #save the output > > > > my $filename = $result->query_name()."\.out"; > > > > $factory->save_output($filename); > > > > $factory->remove_rid($rid); > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > while ( my $hit = $result->next_hit ) { > > > > next unless ( $v > 0); > > > > print "\thit name is ", $hit->name, "\n"; > > > > while( my $hsp = $hit->next_hsp ) { > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > } > > > > } > > > > } > > > > } > > > > } > > > > } > > > > > > > > Thank you for your help! 
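Chris points out two problems. First, the program and database names are bare words. Second, the option keys have no leading dash. Both can be reproduced without contacting NCBI. Below is a minimal core-Perl sketch. check_blast_params is a hypothetical helper that mimics these checks. It is not RemoteBlast's own validation code. Anchoring the t?blast[pnx] pattern with ^...$ is an assumption of this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical checker that mimics the two failures discussed above.
# It is NOT RemoteBlast's own code; it only illustrates the rules:
#   1. every option key should carry a leading dash (-prog, -data, ...)
#   2. the program name should match t?blast[pnx] (anchored here by assumption)
sub check_blast_params {
    my (%opts) = @_;
    my @problems;
    for my $key (sort keys %opts) {
        push @problems, "option '$key' has no leading dash" unless $key =~ /^-/;
    }
    my ($prog) = grep { defined } @opts{qw(-prog prog)};
    if (!defined $prog) {
        push @problems, "no program given";
    }
    elsif ($prog !~ /^t?blast[pnx]$/) {
        push @problems, "program '$prog' does not match t?blast[pnx]";
    }
    return @problems;
}

# Under 'use strict', an unquoted program name does not even compile.
my $compiled = eval q{ use strict; my $prog = blastp; 1 };
print "bareword rejected: $@" unless $compiled;

# Keys as written in the script above (no dashes) versus the corrected form.
print "$_\n" for check_blast_params(prog => 'blastp', data => 'swissprot');
print "clean\n" unless check_blast_params(-prog => 'blastn', -data => 'nr',
                                          -expect => 1e-10, -readmethod => 'SearchIO');
```

With the dashless keys from the script above, the helper reports both 'data' and 'prog'. With -prog/-data/-expect/-readmethod, it reports nothing. The pattern is slightly looser than the list of real NCBI programs, since it also admits "tblastp".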
> > > > > > > > > > > > Guojun > > > > Department of Plant Biology > > > > University of Georgia > > > > > > > > ----- Original Message ----- > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > To: gyang at plantbio.uga.edu > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > > > > Try two things: > > > > > > 1) Use a much simpler script, like the one in 'perldoc > > > > > Bio::Tools::Run::RemoteBlast'. If this fixes it, there's > something > > > > wrong > > > > > with the logic in your subroutine: > > > > > > my $v = 1; > > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' > ); > > > > > > while (my $input = $str->next_seq()){ > > > > > #Blast a sequence against a database: > > > > > #Alternatively, you could pass in a file with many > > > > > #sequences rather than loop through sequence one at a time > > > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > > > #and swap the two lines below for an example of that. > > > > > my $r = $factory->submit_blast($input); > > > > > #my $r = $factory->submit_blast('amino.fa'); > > > > > print STDERR "waiting..." if( $v > 0 ); > > > > > while ( my @rids = $factory->each_rid ) { > > > > > foreach my $rid ( @rids ) { > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > if( !ref($rc) ) { > > > > > if( $rc < 0 ) { > > > > > $factory->remove_rid($rid); > > > > > } > > > > > print STDERR "." 
if ( $v > 0 ); > > > > > sleep 5; > > > > > } else { > > > > > my $result = $rc->next_result(); > > > > > #save the output > > > > > my $filename = $result->query_name()."\.out"; > > > > > $factory->save_output($filename); > > > > > $factory->remove_rid($rid); > > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > > while ( my $hit = $result->next_hit ) { > > > > > next unless ( $v > 0); > > > > > print "\thit name is ", $hit->name, "\n"; > > > > > while( my $hsp = $hit->next_hsp ) { > > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > > } > > > > > } > > > > > } > > > > > } > > > > > } > > > > > } > > > > > > 2) Try the RemoteBlast from Bugzilla and see if that works. It > > > really > > > > > shouldn't make that much of a difference, but I noticed that the > CVS > > > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 > was > > > > > released; the Bugzilla version is based off CVS. > > > > > > Christopher Fields > > > > > Postdoctoral Researcher - Switzer Lab > > > > > Dept. of Biochemistry > > > > > University of Illinois Urbana-Champaign > > > > > > > -----Original Message----- > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > Sent: Monday, February 13, 2006 3:00 PM > > > > > > To: bioperl-l at lists.open-bio.org > > > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > Thanks, Chris, > > > > > > I installed version 1.5.1 and replaced the blast.pm file with > the > > > one > > > > from > > > > > > your bug report. The running version is 1.5 when I use the > command > > > you > > > > > > sent me. But when I tried the script, it doesn't change much. 
My > > > > > > remoteblast code (portion) is here: > > > > > > > > sub search { > > > > > > local > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > > > > > local > $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > > > > > > local > > > > > > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > > > > > > 'no'; > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > > > > > -id=>"query", > > > > > > -desc=>"new seq"); > > > > > > my $len=$query->length(); > > > > > > @db=('nr','htgs','wgs'); > > > > > > foreach my $db (@db) { > > > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' > =>'blastn', > > > > > > '-data' =>"$db", > > > > > > > > '-expect'=>"$E_value"); > > > > > > > > > > my $blast_report = $factory->submit_blast($query); > > > > > > > > my @rids = $factory->each_rid(); > > > > > > foreach my $rid ( @rids ) { > > > > > > print STDERR "$rid\n"; > > > > > > } > > > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638) > > > > > > print STDERR "waiting..."; > > > > > > sleep 60; > > > > > > > > foreach my $rid ( @rids ) { > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > while (!ref($rc) ) { > > > > > > if( $rc < 0 ) { > > > > > > # retrieve_blast returns -1 on error > > > > > > $factory->remove_rid($rid); > > > > > > print "Error!\n"; > > > > > > send_error($email,$function,$seqname,$queryname[$ST]); > > > > > > die "Can't retrieve $rid"; > > > > > > } if ($rc==0) { # retrieve_blast returns 0 on 'job not > > > finished' > > > > > > sleep 60; > > > > > > $rc = $factory->retrieve_blast($rid); > > > > > > } > > > > > > } > > > > > > if (ref($rc)) { > > > > > > print STDERR "Done.\n"; > > > > > > while( my $result = $rc->next_result) { > > > > > > while( my $hit = $result->next_hit()) { > > > > > > $hit_name=$hit->name; > > > 
> > > $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > > > > > $name=$1; > > > > > > @left_plus_start=(); > > > > > > @left_plus_end=(); > > > > > > @left_minus_start=(); > > > > > > @left_minus_end=(); > > > > > > @right_plus_start=(); > > > > > > @right_plus_end=(); > > > > > > @right_minus_start=(); > > > > > > @right_minus_end=(); > > > > > > > > if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > > > > > while( my $hsp = $hit->next_hsp()) { > > > > > > ...... > > > > > > > > It was working quite well until around October last year, but it has stopped since then. When a submission is sent via a webpage, the CGI starts to work and uses ~20 MB of memory. Then it hangs there; finally the expected email is received, but without real results, although it does contain something from other parts of the script. Apparently the search sub did not return anything (I know something should be returned). Is it also possible that the format of the NCBI output for each result has changed? > > > > > > Thank you, > > > > > > Guojun > > > > > > > > > > Department of Plant Biology > > > > > > University of Georgia > > > > > > > > > > > > ----- Original Message ----- > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > > How do you know two versions are installed (i.e. how are you checking the version)? Do you have two complete bioperl distributions (in two separate directories), or are you looking in modules? Here's
Here's the way to check the version (from the FAQ):

  perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

If you have two full bioperl distributions on your computer, normally only one will be in use unless you have explicitly set the environment variable PERL5LIB. The PERL5LIB directories are placed at the front of @INC, so they are searched before perl's default library directories. You MAY get some mixing then, but only if perl can't find a particular module in the PERL5LIB directories; it will then continue through the remaining directories listed in @INC. This may happen if a module is unique to a particular release, but shouldn't happen for the majority of modules, including RemoteBlast. You can check what @INC and PERL5LIB are set to by using 'perl -V'. @INC will differ depending on your OS, perl build, etc.

Regardless, if you follow the directions for installing bioperl for your system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you explicitly change the installation directory when running 'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem, as it will install the Bioperl distribution you downloaded over the old version in @INC. See this page:

http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL

for more details.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
Sent: Monday, February 13, 2006 12:32 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28

Hi, Chris,

I do have different versions of bioperl on my Linux machine (1.4 and 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions? I could not find any hint on uninstalling bioperl on Linux. Could you please give me some suggestions?
Thanks,
Guojun

Department of Plant Biology
University of Georgia
_____

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Sent: Mon, 13 Feb 2006 11:45:14 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

If you're using RemoteBlast 1.28, then you've likely updated from CVS, which doesn't include the latest fix.

Make sure that you check the following:

1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .

2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first.
Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13), but it should still save it. I believe that as long as next_result() isn't called, it will work.

3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO). They can be found in Bugzilla:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

The fix to RemoteBlast in Bugzilla (#1935) adds the option of saving XML output, so it isn't necessary if you don't plan on using this option. And, remember, they haven't been committed to CVS yet, which means that the final version will change to reflect the new version.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign
_____

From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
Sent: Monday, February 13, 2006 9:26 AM
To: Chris Fields
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

Hi, Chris

Thanks for your suggestion; however, it doesn't seem to work for my CGI even after I replaced both blast.pm and RemoteBlast.pm. I didn't even get any RID. Any suggestions?

Guojun

Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun
_____

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
Sent: Fri, 03 Feb 2006 16:07:29 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

I would say give the new code a try, but realize that it hasn't been checked in (like I said below). I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed.
The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but at least it doesn't hang in the while() loop I described in the bug report below (bug #1934), and it seems to process everything fine.

If you want more stability in the code, you might consider changing over to XML output and parsing it with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back over the last month or so, there has been a bit of discussion here about it. Jason describes how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
Sent: Friday, February 03, 2006 1:45 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

Hi, Everybody,
I see this post and am wondering if this is the reason for the malfunctioning of my webserver.
We set up a webserver named MAK for MITE sequence analysis. It was working very well until around November 2005, when it stopped returning any results (the site is fine and seems to be doing something after submission). In the CGI script, I used RemoteBlast (that work was done in 2003) to do searches. I currently do not have access to the server because I moved. Several people have sent us emails about its malfunctioning. Are there any suggestions for fixing the problem? Should I simply ask for RemoteBlast.pm to be replaced with the new version?
Thanks a lot,
Guojun

Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun
_____

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
Sent: Fri, 03 Feb 2006 10:45:23 -0500
Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will work for saving text output. However, it will not parse anything using next_result (it will likely hang) and will not save XML format.
See these bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and Bio::SearchIO::blast). Note that these haven't been checked in yet, so they are still not included in bioperl-live; they may be further modified before being committed to CVS. If you're not worried about XML, you could just try the first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago with a script that had problems; the script you used saves the output but doesn't actually parse it (i.e. you don't use next_result() to go through the data). Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried parsing the output using "-readmethod => SearchIO" or "-readmethod => blast" with your version of RemoteBlast and the method next_result()? Like below (from perldoc):

  # $v is the verbosity flag set earlier in the full perldoc example
  while ( my @rids = $factory->each_rid ) {
      foreach my $rid ( @rids ) {
          my $rc = $factory->retrieve_blast($rid);
          if ( !ref($rc) ) {
              if ( $rc < 0 ) {
                  $factory->remove_rid($rid);
              }
              print STDERR "." if ( $v > 0 );
              sleep 5;
          } else {
              # parsing starts here
              my $result = $rc->next_result();   # it should hang here
              # save the output
              my $filename = $result->query_name()."\.out";
              $factory->save_output($filename);
              $factory->remove_rid($rid);
              print "\nQuery Name: ", $result->query_name(), "\n";
              while ( my $hit = $result->next_hit ) {
                  next unless ( $v > 0 );
                  print "\thit name is ", $hit->name, "\n";
                  while ( my $hsp = $hit->next_hsp ) {
                      print "\t\tscore is ", $hsp->score, "\n";
                  }
              }
          }
      }
  }

My script hung if I used next_result() in any way prior to the fixes. I want to see how many others are having the same issues with parsing using the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
Sent: Thursday, February 02, 2006 7:24 PM
To: Huang Jian; bioperl-l
Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28

Hi Huang,
Thanks for the message.
The older version of RemoteBlast.pm decides whether the BLAST results are ready by checking the size of a temporary file. This condition is no longer being satisfied, perhaps due to some changes made by NCBI. I had this problem recently and found that the solution was to use the latest version, which fixes this problem (it no longer uses the file-size logic) but is not yet included in the BioPerl package.
Cheers
Nagesh

Huang Jian wrote:

> Dear Nagesh,
>
> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent me. Now it works perfectly!!!
>
> Thank you!!
>
> Huang
>
> ----- Original Message ----- From: "Nagesh Chakka"
> To: "Huang Jian" ; "bioperl-l"
> Sent: Friday, February 03, 2006 7:48 AM
> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still via email
>
>> Hi Huang,
>> I see that you are submitting a sequence for a remote blast search. Can you check whether the RemoteBlast.pm being used is v 1.28 (2005/12/09)?
>> If not, I have attached it to this email; try using it to replace the old one, which has a bug.
>> Let me know if it works.
>> Nagesh

From sdavis2 at mail.nih.gov Wed Feb 15 19:39:33 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 16 Feb 2006 00:39:33 -0000
Subject: [Bioperl-l] error running load_seqdatabase.pl
References:
Message-ID: <000c01c63291$5de08600$6601a8c0@WATSON>

----- Original Message -----
From: "Angshu Kar"
To: "bioperl-l"
Sent: Thursday, December 29, 2005 5:50 PM
Subject: [Bioperl-l] error running load_seqdatabase.pl

> Hi,
>
> I'm getting the following error while trying to run:
>
> ./load_seqdatabase.pl -host localhost -dbname USBA -dbuser postgres -format genbank NC_003076.gbk
>
> But I have a PostgreSQL db and not a MySQL one... could anyone please help me troubleshoot this?

Angshu,

I would probably start with:

perldoc load_seqdatabase.pl

I think that will likely give you your answer. Again, it is best to exhaust the resources at hand and to let the list know that you have done so (like--"I read the perldoc and tried this....").

Sean

From cain at cshl.edu Wed Feb 15 11:07:28 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 15 Feb 2006 11:07:28 -0500
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
In-Reply-To: <43F35043.7070705@cornell.edu>
References: <43F35043.7070705@cornell.edu>
Message-ID: <1140019648.2849.58.camel@localhost.localdomain>

Hi Robert,

No column should ever be padded with spaces; GFF columns should always be separated by a single tab. Therefore, I don't think Bio::Tools::GFF is at fault here.

Scott

On Wed, 2006-02-15 at 11:01 -0500, Robert Buels wrote:
> Hi all,
>
> I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using
> FeatureIO, except it purports not to support gff 2), and the file looks
> like:
>
> ##gff-version 2
> ##date 2006-02-13
> ##sequence-region C01HBa0088L02.seq 1 120525
> C01HBa0088L02 RepeatMasker similarity 3537 4267 3.3 - . Target "Motif:bac_end_repeat_family_345" 1 740
> C01HBa0088L02 RepeatMasker similarity 4172 4279 2.9 + . Target "Motif:HRSiTERT00100141" 1 104
> C01HBa0088L02 RepeatMasker similarity 4267 4323 0.0 - . Target "Motif:k_29" 150 206
> C01HBa0088L02 RepeatMasker similarity 4322 4492 26.6 + . Target "Motif:PRSiTERT00300001" 1960 2129
> C01HBa0088L02 RepeatMasker similarity 4557 5124 29.5 + . Target "Motif:PRSiTERT00300001" 2142 2711
>
> Notice the score column is padded with spaces.
>
> Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid
> score. My question is, who is wrong here, my input file or
> Bio::Tools::GFF? Should Bio::Tools::GFF be able to read this file?
>
> Rob
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net Wed Feb 15 20:54:01 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 17:54:01 -0800
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db
In-Reply-To: <001201c631a5$ce7496f0$15327e82@pyrimidine>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
Message-ID:

On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:

> Hilmar,
>
> Good News: I've added a section to the bioperl wiki on installing
> bioperl-db in Windows:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>
> Bad News: There's a new problem now. I updated from CVS yesterday; I walked
> through the steps and ran 'nmake test', with everything passing fine.
> However, load_seqdatabase.pl is extremely slow; it's loading a sequence
> every 5 minutes or so. I noticed (when using '-debug') that it is hanging
> up in Bio::DB::BioSQL::SpeciesAdaptor each time. If I create a database,
> load the biosql schema, and load sequences w/o loading taxonomy, the problem
> goes away.
>
> Here's the debugging output (I cut it off at the point it hangs up):
> [...]
> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id =
> ?
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)

I'm a bit surprised if this is the query where it hangs. Are the indexes all there? There should be a primary key index on taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name over (taxon_id,name,name_class). Also, there should be separate indexes on taxon_name.taxon_id and taxon_name.name. Are they all there? If you reinstantiated the schema from the DDL then it seems unlikely that somehow the indexes have vanished except if you messed with the schema or the DDL.

Putting an index on taxon_name.name_class really can't make sense, so let's assume it can't be that.

So really I suspect this has something to do with the state of the database and the version of MySQL. In particular, from some 4.x version of MySQL under certain circumstances you have to analyze the statistics of the tables in order to get the optimizer to pick up the indexes properly. Are you on MySQL 4.x and if so, have you done that?

There's the ANALYZE TABLE command:
http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html

Note the comment: "This statement works with MyISAM, BDB, and (as of MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?

Also, you can check the execution plan for the query using EXPLAIN.
http://dev.mysql.com/doc/refman/4.1/en/explain.html

This should show you whether the index would be picked up for the query or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to the db using the mysql shell (mysql).

I believe something similarly strange was encountered by someone using DB::GFF (or Chado) under MySQL, and if I recall correctly the solution was to optimize (analyze) the tables. Maybe someone who was in that thread reads this and can comment?
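To run the index, statistics and execution-plan checks described above in one go, here is a minimal sketch. It is a hypothetical helper, not part of bioperl-db, and it assumes the BioSQL taxon and taxon_name tables named in the debug output. It prints SHOW INDEX, ANALYZE TABLE and EXPLAIN statements for the slow SpeciesAdaptor look-up, with the two bound values from the debug output ("scientific name", 208964) substituted as literals. The output can be piped into the mysql shell, e.g. `perl biosql_diag.pl | mysql -u USER DBNAME`, where the script, user and database names are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tables touched by the taxon look-up in the -debug output.
my @tables = qw(taxon taxon_name);

# The unique-key query from the debug output, with the two bound
# values ("scientific name", 208964) substituted as literals.
my $lookup = join ' ',
    'SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id,',
    'taxon_name.name, NULL FROM taxon, taxon_name',
    'WHERE taxon.taxon_id = taxon_name.taxon_id',
    "AND name_class = 'scientific name' AND ncbi_taxon_id = 208964";

# Build the list of diagnostic statements, in the order they should run.
sub diagnostic_sql {
    my @stmts;
    # 1. Show which indexes actually exist on each table.
    push @stmts, "SHOW INDEX FROM $_;" for @tables;
    # 2. Refresh the key statistics the optimizer relies on.
    push @stmts, 'ANALYZE TABLE ' . join(', ', @tables) . ';';
    # 3. Show the execution plan for the slow query.
    push @stmts, "EXPLAIN $lookup;";
    return @stmts;
}

print "$_\n" for diagnostic_sql();
```

In the EXPLAIN output, a NULL in the key column for taxon or taxon_name means MySQL chose no index for that table. That would be consistent with the slow look-up reported above.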
-hilmar

> ------------------------------------------------------------------------------------------------
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
--
----------------------------------------------------------
: Hilmar Lapp  -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From cjfields at uiuc.edu Wed Feb 15 22:56:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 21:56:14 -0600
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db
In-Reply-To:
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
Message-ID: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>

On Feb 15, 2006, at 7:54 PM, Hilmar Lapp wrote:

>
> On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:
>
>> Hilmar,
>>
>> Good News: I've added a section to the bioperl wiki on installing
>> bioperl-db in Windows:
>>
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>>
>> Bad News: There's a new problem now. I updated from CVS yesterday; I
>> walked through the steps and ran 'nmake test', with everything passing fine.
>> However, load_seqdatabase.pl is extremely slow; it's loading a sequence
>> every 5 minutes or so. I noticed (when using '-debug') that it is hanging
>> up in Bio::DB::BioSQL::SpeciesAdaptor each time. If I create a database,
>> load the biosql schema, and load sequences w/o loading taxonomy, the
>> problem goes away.
>>
>> Here's the debugging output (I cut it off at the point it hangs up):
>> [...]
>
>> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
>> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
>> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id =
>> ?
>> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
>> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
>
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name
> over (taxon_id,name,name_class). Also, there should be separate indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

I looked in the mailing list archives and Barry mentions something here:

http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html

He rebuilt the database from scratch and got it working; no reason was given. I wouldn't be surprised if it is something MySQL-related that pops up. The strange thing is that only a few months ago everything ran well with this version of MySQL (v.5); this was with the first test database I installed on it. Another strange thing (I think I mentioned it) is that NOT loading the taxonomy with load_ncbi_taxonomy.pl worked (everything was entered). I'll try rebuilding the database from scratch to see what happens. I am running this on Windows, so this is new territory...

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
>
> So really I suspect this has something to do with the state of the
> database and the version of MySQL.
> In particular, from some 4.x version
> of MySQL under certain circumstances you have to analyze the statistics
> of the tables in order to get the optimizer to pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
>
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
>
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
>
> This should show you whether the index would be picked up for the query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).

I'll give these a shot and post what I find in the next few days.

> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?
>
> -hilmar

I wanted to also mention that we shouldn't check in the modifications to Bio::Root::Root until I confirm something (I'm at home and currently can't). I tried running a script on an unrelated module using the modified Bio::Root::Root (with the commas added after the 'throw $class' statements). Everything worked for $self->throw(), except the thrown message wasn't displayed. I'll dig into it a bit more to see what happens.

>> ---------------------------------------------------------------------------------------------
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
> --
> ----------------------------------------------------------
> : Hilmar Lapp  -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From osborne1 at optonline.net Thu Feb 16 00:16:04 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 00:16:04 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID:

Harry,

It's not clear to me that NCBI's eutils offers this capability directly. You can probably download Entrez Gene entries and parse them for coordinates, but I know of no way to remotely retrieve genomic sequences like this from NCBI (ENSEMBL API perhaps?).

What I had in mind uses the local approach that some of us favor, and to prove to myself that this is simple to do I wrote a script, extract_genes.pl, which I just added to examples/tools; it's based on Bio::DB::Fasta. Download the sequence files for a given species to some directory, download Entrez Gene's gene2accession file, and run it. It creates and stores a hash for lookups, so it won't read gene2accession each time it runs.

Brian O.

On 2/14/06 12:15 PM, "Harry Mangalam" wrote:

> Hi Brian,
>
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
>
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them. In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
>
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external APIs, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.
>
> Does NCBI make such APIs available anymore? I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
>
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
>
> Harry
>
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>>
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>>
>>   use Bio::DB::Fasta;
>>
>>   # create database from directory of fasta files
>>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids      = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>>
>>   # Bioperl-style access
>>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   my $obj    = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq    = $obj->seq;
>>   my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>>
>> Do you already have the offsets?
>>
>> Brian O.
>>
>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote:
>>> Hi All,
>>>
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this. Forgive me if I've missed something
>>> obvious.
>>>
>>> This should not be a novel request, but I've not found it answered. If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>>
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID. This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt. Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>>
>>> TIA!

From hlapp at gmx.net Thu Feb 16 01:31:54 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 22:31:54 -0800
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db
In-Reply-To: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine> <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
Message-ID:

On Feb 15, 2006, at 7:56 PM, Chris Fields wrote:

> [...]
> I looked in the mailing list archives and Barry mentions something
> here:
>
> http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html
>
> He rebuilt the database from scratch and got it working; no reason
> was given. I wouldn't be surprised if it is something MySQL-related
> that pops up.

Note, though, that he was using PostgreSQL. With Pg you definitely need to 'vacuum', which is their name for analyzing/optimizing the table(s).

> The strange thing is that only a few months ago
> everything ran well with this version of MySQL (v.5); this was with
> the first test database I installed on it. Another strange thing (I
> think I mentioned it) is that NOT loading the taxonomy with
> load_ncbi_taxonomy.pl worked (everything was entered).

That's not really strange; it is in fact consistent with the query you report as taking a long time.
If you don't pre-load the taxonomy, then the taxon and taxon_name tables are empty or almost empty, and look-ups and joins of empty tables are amazingly fast :-J

[...]

> I wanted to also mention that we shouldn't check in the modifications
> to Bio::Root::Root until I confirm something (I'm at home and
> currently can't).

OK, we'll hold off.

-hilmar
--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From michael.watson at bbsrc.ac.uk Thu Feb 16 05:31:54 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 16 Feb 2006 10:31:54 -0000
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I have two questions, really.

I fetched bacterial genome sequences from the NCBI using Bio::DB::GenBank. Some of these sequence entries are CONTIG sequences, i.e. they just point to other sequences that need to be joined together to form the entire genome. Looking at my downloads, it looks as if bioperl has done all the necessary joining for me, or maybe it was the NCBI that did the joining?

OK, so firstly, did bioperl do the joining, and if so, are all the co-ordinates of the features updated to reflect their new location on the new, joined sequence?

And secondly, sequence versions... I'm thinking that possibly the sequence version of the CONTIG may be 1 (as it hasn't changed), yet the versions of the sequences it refers to might have changed. So when I ask bioperl if these sequences have been updated, I will be told no because the CONTIG sequence version is 1, but I should be told yes because the underlying sequences have changed...? Make sense?
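For Mick's second question, one option is to compare the versions of the components a CONTIG record lists, rather than rely on the record's own version. The sketch makes two assumptions: the CONTIG line uses the join(ACCESSION.VERSION:start..end,gap(n),...) syntax, and you keep your own record of previously seen versions. parse_contig and changed_components are hypothetical helpers, not BioPerl methods. They do not answer whether Bio::DB::GenBank or NCBI assembled the sequence.

```perl
use strict;
use warnings;

# Extract ACCESSION.VERSION:start..end components from a GenBank CONTIG
# join(...) string; gap(...) entries carry no accession and are skipped.
sub parse_contig {
    my ($contig) = @_;
    my @parts;
    while ($contig =~ /([A-Z][A-Z0-9_]*)\.(\d+):(\d+)\.\.(\d+)/g) {
        push @parts, { acc => $1, version => $2, start => $3, end => $4 };
    }
    return @parts;
}

# Given a CONTIG string and a hashref of accession => version recorded on a
# previous run, return the accessions that are new or have a new version.
sub changed_components {
    my ($contig, $known) = @_;
    return map  { $_->{acc} }
           grep { !defined $known->{ $_->{acc} }
                  || $known->{ $_->{acc} } != $_->{version} }
           parse_contig($contig);
}
```

Storing the accession => version pairs after each download (for example with Storable) would give the $known hash for the next comparison.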
Thanks
Mick

From cjfields at uiuc.edu Thu Feb 16 07:51:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:51:50 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu> <43F449E1.80605@esat.kuleuven.be>
Message-ID: <369C1D1F-DBCB-4161-A24A-7C3E579D337A@uiuc.edu>

Yeah, it looks like that change broke parsing of nucleotide text output. XML output parsing still works, though (as expected). I'll give it a look.

Chris

On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:

> Hi,
>
> I have the same problem with the blast.pm file.
> The people at NCBI added some extra info to the BLAST
> output (see e.g. "Features flanking this part..." or "Features in
> this part ..."); an example is added below.
> The blast.pm module starts looking for the HSP alignment
> information, but it dies when it hits this feature information.
>
> Pieter
>
> >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group) chromosome 12, complete sequence
> Length=27492551
>
> Features flanking this part of subject sequence:
>   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
>   2655 bp at 3' side: hypothetical protein
>
> Score = 36.2 bits (18), Expect = 0.22
> Identities = 18/18 (100%), Gaps = 0/18 (0%)
> Strand=Plus/Minus
>
> Query  4         GTACTACTCTACTCTACT  21
>                  ||||||||||||||||||
> Sbjct  19257436  GTACTACTCTACTCTACT  19257419
>
>
> Features flanking this part of subject sequence:
>   2991 bp at 5' side: hypothetical protein
>   1131 bp at 3' side: hypothetical protein
>
> Score = 36.2 bits (18), Expect = 0.22
> Identities = 18/18 (100%), Gaps = 0/18 (0%)
> Strand=Plus/Minus
>
> Query  2         ATGTACTACTCTACTCTA  19
>                  ||||||||||||||||||
> Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
>
>
> Features in this part of subject sequence:
>   DHHC zinc finger domain, putative
>
> Score = 34.2 bits (17), Expect = 0.87
> Identities = 17/17 (100%), Gaps = 0/17 (0%)
> Strand=Plus/Plus
>
> Query  5         TACTACTCTACTCTACT  21
>                  |||||||||||||||||
> Sbjct  17616437  TACTACTCTACTCTACT  17616453
>
>
> Features flanking this part of subject sequence:
>   102 bp at 5' side: bZIP transcription factor, putative
>   3740 bp at 3' side: yeast dcp1, putative
>
> Score = 32.2 bits (16), Expect = 3.4
> Identities = 16/16 (100%), Gaps = 0/16 (0%)
> Strand=Plus/Plus
>
> Query  7        CTACTCTACTCTACTC  22
>                 ||||||||||||||||
> Sbjct  2775880  CTACTCTACTCTACTC  2775895
>
>
> Features flanking this part of subject sequence:
>
>   21 bp at 5' side: peptide transporter T17F3.11, putative
>   10230 bp at 3' side: transposon protein, putative, unclassified
>
> Score = 32.2 bits (16), Expect = 3.4
> Identities = 16/16 (100%), Gaps = 0/16 (0%)
> Strand=Plus/Minus
>
> Query  7         CTACTCTACTCTACTC  22
>                  ||||||||||||||||
> Sbjct  27323153  CTACTCTACTCTACTC  27323138
>
>
> Guojun Yang wrote:
>
>> Hi, Chris,
>> Finally the remoteblast test script works for the amino.fa query.
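Until Bio::SearchIO::blast handles these lines, one stopgap is to remove the "Features flanking/in this part of subject sequence" blocks from a saved text report before parsing it. This is an untested workaround sketch, not the parser fix Chris mentions. It assumes that every such block runs up to the next "Score =" line, as in Pieter's example. strip_feature_blocks is a hypothetical helper.

```perl
use strict;
use warnings;

# Workaround sketch: remove "Features flanking/in this part of subject
# sequence:" blocks from BLAST text output. Assumes each block ends at the
# next "Score =" line, as in the report quoted above.
sub strip_feature_blocks {
    my ($text) = @_;
    my @kept;
    my $skipping = 0;
    for my $line (split /\n/, $text, -1) {
        if ($line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/) {
            $skipping = 1;
            next;
        }
        if ($skipping) {
            next unless $line =~ /^\s*Score\s*=/;
            $skipping = 0;    # the HSP starts here; keep this line
        }
        push @kept, $line;
    }
    return join "\n", @kept;
}
```

The cleaned text could then be passed to Bio::SearchIO through an in-memory filehandle (open my $fh, '<', \$clean, then -fh => $fh, -format => 'blast'). Whether the parser accepts everything else in the newer report format is not checked here.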
>> but when I try a nucleic acid sequence (see below), Error occurs: " >> waiting........ >> ------------- EXCEPTION ------------- >> MSG: no data for midline Features flanking this part of subject >> sequence: >> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/ >> 5.8.3/Bio/Searc hIO/blast.pm:1172 >> STACK toplevel remoteblast_test:40 >> " >> The query sequence is: >> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC >> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA >> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG >> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG >> >> The script (basically same as the remoteblast test, I only changed >> database to 'nr' and program to 'blastn' and filename to 'ost3'): >> #!/usr/bin/perl >> >> use Bio::SeqIO; >> use Bio::Seq; >> use Bio::Tools::Run::RemoteBlast; >> use Bio::SearchIO; >> use strict; >> my $prog='blastn'; >> my $db='nr'; >> my $e_val=1e-10; >> my @params=( -prog=>$prog, >> -data=>$db, >> -expect=>$e_val, >> -readmethod=>'SearchIO'); >> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); >> >> my $v = 1; >> >> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' ); >> >> while (my $input = $str->next_seq()){ >> #Blast a sequence against a database: >> #Alternatively, you could pass in a file with many >> #sequences rather than loop through sequence one at a time >> #Remove the loop starting 'while (my $input = $str->next_seq())' >> #and swap the two lines below for an example of that. >> my $r = $factory->submit_blast($input); >> #my $r = $factory->submit_blast('amino.fa'); >> print STDERR "waiting..." if( $v > 0 ); >> while ( my @rids = $factory->each_rid ) { >> foreach my $rid ( @rids ) { >> my $rc = $factory->retrieve_blast($rid); >> if( !ref($rc) ) { >> if( $rc < 0 ) { >> $factory->remove_rid($rid); >> } >> print STDERR "." 
if ( $v > 0 ); >> sleep 5; >> } else { >> my $result = $rc->next_result(); >> #save the output >> my $filename = $result->query_name()."\.out"; >> $factory->save_output($filename); >> $factory->remove_rid($rid); >> print "\nQuery Name: ", $result->query_name(), "\n"; >> while ( my $hit = $result->next_hit ) { >> next unless ( $v > 0); >> print "\thit name is ", $hit->name, "\n"; >> while( my $hsp = $hit->next_hsp ) { >> print "\t\tscore is ", $hsp->score, "\n"; >> } >> } >> } >> } >> } >> } >> >> >> Do you think there might still be something in the NCBI output >> format? >> >> Thank you, >> Guojun >> >> >> >> >> Guojun Yang >> Department of Plant Biology >> University of Georgia >> Tel: 706-542-1857 >> Fax: 706-542-1805 >> http://www.arches.uga.edu/~guojun >> >> >> >> ----- Original Message ----- >> From: Chris Fields [mailto:cjfields at uiuc.edu] >> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org >> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2 >> >> >> >>> Sorry, forgot to add that I didn't see the regex issue that you >>> mentioned. >>> It could be a perl-related issue. Try the fixes I mentioned and >>> see what >>> happens. >>> >>>> Christopher Fields >>>> >>> Postdoctoral Researcher - Switzer Lab >>> Dept. of Biochemistry >>> University of Illinois Urbana-Champaign >>>>>> -----Original Message----- >>>>>> >>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>> Sent: Tuesday, February 14, 2006 12:36 PM >>>> To: 'gyang at plantbio.uga.edu' >>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 >>>> >>>>>> It's a good habit to always add single quotes around words. >>>>>> The perl >>>>>> >>>> interpreter may think a single bare word is a subroutine or >>>> perlfunc >>>> called with no args so will try to find a subroutine named blastp >>>> (). My >>>> debugger actually gives the error that the bare word blastp may >>>> conflict >>>> with a future reserved word. Like you said, 'use strict' will >>>> point that >>>> out. 
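Chris's two points (quote the values, put a dash before the parameter names) can be checked in isolation. The snippet compiles a one-line script with a string eval to show the error strict gives for the bareword blastp. It then builds @params in the dash-prefixed style used with Bio::Tools::Run::RemoteBlast->new. The RemoteBlast call itself is left out, so the sketch runs without BioPerl.

```perl
use strict;
use warnings;

# Under 'use strict', an unquoted word such as blastp (no sub of that name
# exists) is a compile-time error; a string eval shows perl's message.
my $compiled = eval q{ use strict; my $p = blastp; 1 };
print "Rejected: $@" unless $compiled;

# Quote the values. The -name keys need no quotes: a leading '-' on a
# bareword yields the string "-prog", which strict allows.
my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = 1e-10;
my @params = (
    -prog       => $prog,
    -data       => $db,
    -expect     => $e_val,
    -readmethod => 'SearchIO',
);

my %opts = @params;
print "$_ => $opts{$_}\n" for sort keys %opts;
```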
>>>> >>>>>> As for the regex, it should match all the blast programs at >>>>>> NCBI (blastp, >>>>>> >>>> blastn, blastx, tblastn, tblastx) and is built-in to make sure >>>> nothing >>>> else passes through. >>>> >>>>>> So, if you are using the script below, there are several >>>>>> errors. The bare >>>>>> >>>> words for $prog and $db need quotes, and the flags for you >>>> @params array >>>> don't have a dash before them. I get this after adding quotes >>>> but before >>>> adding the dashes to @params: >>>> >>>>>> C:\Perl\Scripts>test_blast.pl >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> >>>> MSG: >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl- >>>> live/Bio/Root/Root.pm:328 >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>> C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 >>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl >>>> \bioperl- >>>> live/Bio/Tools/Run/RemoteBlast.pm:256 >>>> STACK: C:\Perl\Scripts\test_blast.pl:15 >>>> ----------------------------------------------------------- >>>> >>>>>> The last line indicates a problem with this line: >>>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> Changing the @params to this: >>>>>> my @params=( -prog=>$prog, >>>>>> >>>> -data=>$db, >>>> -expect=>$e_val, >>>> -readmethod=>'SearchIO'); >>>> >>>>>> fixes it, and I get output as expected. >>>>>> Christopher Fields >>>>>> >>>> Postdoctoral Researcher - Switzer Lab >>>> Dept. of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>>>>>>> -----Original Message----- >>>>>>>>> >>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu] >>>>> Sent: Tuesday, February 14, 2006 11:48 AM >>>>> To: Chris Fields; bioperl-l at lists.open-bio.org >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 >>>>> >>>>> Hi, Chris, >>>>> When I tried with the perldoc script, It did not work either. 
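To see which program names the t?blast[pnx] check admits, here is an anchored version of that pattern applied to a few names. Whether RemoteBlast anchors its pattern this way is an assumption, and is_blast_program is a hypothetical helper. Note that the pattern also accepts tblastp, which is not an NCBI program, and rejects megablast.

```perl
use strict;
use warnings;

# Anchored variant of the t?blast[pnx] pattern quoted in the error message.
sub is_blast_program {
    my ($prog) = @_;
    return $prog =~ /^t?blast[pnx]$/ ? 1 : 0;
}

for my $p (qw(blastp blastn blastx tblastn tblastx tblastp megablast)) {
    printf "%-9s %s\n", $p, is_blast_program($p) ? 'accepted' : 'rejected';
}
```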
>>>>> First it >>>>> says $prog can not be bare word if I "use strict". I added >>>>> quotes on the >>>>> words, then it says the value for $prog does not match expression >>>>> t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The >>>>> >>>> script >>>> >>>>> is shown below. Why is the expression "t?blast[pnx]"? >>>>> >>>>> #!/usr/bin/perl >>>>> >>>>> use Bio::SeqIO; >>>>> use Bio::Seq; >>>>> use Bio::Tools::Run::RemoteBlast; >>>>> use Bio::SearchIO; >>>>> >>>>> >>>>> my $prog=blastp; >>>>> my $db=swissprot; >>>>> my $e_val=1e-10; >>>>> my @params=( prog=>$prog, >>>>> data=>$db, >>>>> expect=>$e_val, >>>>> readmethod=>'SearchIO'); >>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); >>>>> >>>>> my $v = 1; >>>>> >>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => >>>>> 'fasta' ); >>>>> >>>>> while (my $input = $str->next_seq()){ >>>>> #Blast a sequence against a database: >>>>> #Alternatively, you could pass in a file with many >>>>> #sequences rather than loop through sequence one at a time >>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>> #and swap the two lines below for an example of that. >>>>> my $r = $factory->submit_blast($input); >>>>> #my $r = $factory->submit_blast('amino.fa'); >>>>> print STDERR "waiting..." if( $v > 0 ); >>>>> while ( my @rids = $factory->each_rid ) { >>>>> foreach my $rid ( @rids ) { >>>>> my $rc = $factory->retrieve_blast($rid); >>>>> if( !ref($rc) ) { >>>>> if( $rc < 0 ) { >>>>> $factory->remove_rid($rid); >>>>> } >>>>> print STDERR "." 
if ( $v > 0 ); >>>>> sleep 5; >>>>> } else { >>>>> my $result = $rc->next_result(); >>>>> #save the output >>>>> my $filename = $result->query_name()."\.out"; >>>>> $factory->save_output($filename); >>>>> $factory->remove_rid($rid); >>>>> print "\nQuery Name: ", $result->query_name(), "\n"; >>>>> while ( my $hit = $result->next_hit ) { >>>>> next unless ( $v > 0); >>>>> print "\thit name is ", $hit->name, "\n"; >>>>> while( my $hsp = $hit->next_hsp ) { >>>>> print "\t\tscore is ", $hsp->score, "\n"; >>>>> } >>>>> } >>>>> } >>>>> } >>>>> } >>>>> } >>>>> >>>>> Thank you for your help! >>>>> >>>>> >>>>> Guojun >>>>> Department of Plant Biology >>>>> University of Georgia >>>>> >>>>> ----- Original Message ----- >>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>> To: gyang at plantbio.uga.edu >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>> >>>>> >>>>> >>>>>> Try two things: >>>>>> >>>>>>> 1) Use a much simpler script, like the one in 'perldoc >>>>>>> >>>>>> Bio::Tools::Run::RemoteBlast'. If this fixes it, there's >>>>>> something >>>>>> >>>>> wrong >>>>> >>>>>> with the logic in your subroutine: >>>>>> >>>>>>> my $v = 1; >>>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => >>>>>>> 'fasta' ); >>>>>>> while (my $input = $str->next_seq()){ >>>>>>> >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. >>>>>> my $r = $factory->submit_blast($input); >>>>>> #my $r = $factory->submit_blast('amino.fa'); >>>>>> print STDERR "waiting..." 
if( $v > 0 ); >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> foreach my $rid ( @rids ) { >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> if( !ref($rc) ) { >>>>>> if( $rc < 0 ) { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } else { >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> my $filename = $result->query_name()."\.out"; >>>>>> $factory->save_output($filename); >>>>>> $factory->remove_rid($rid); >>>>>> print "\nQuery Name: ", $result->query_name(), "\n"; >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> next unless ( $v > 0); >>>>>> print "\thit name is ", $hit->name, "\n"; >>>>>> while( my $hsp = $hit->next_hsp ) { >>>>>> print "\t\tscore is ", $hsp->score, "\n"; >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works. It >>>>>>> >>>> really >>>> >>>>>> shouldn't make that much of a difference, but I noticed that >>>>>> the CVS >>>>>> RemoteBlast (1.28) was changed in Dec 2005, after >>>>>> bioperl-1.5.1 was >>>>>> released; the Bugzilla version is based off CVS. >>>>>> >>>>>>> Christopher Fields >>>>>>> >>>>>> Postdoctoral Researcher - Switzer Lab >>>>>> Dept. of Biochemistry >>>>>> University of Illinois Urbana-Champaign >>>>>> >>>>>>>> -----Original Message----- >>>>>>>> >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang >>>>>>> Sent: Monday, February 13, 2006 3:00 PM >>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>>> >>>>>>>>> Thanks, Chris, >>>>>>>>> >>>>>>> I installed version 1.5.1 and replaced the blast.pm file with >>>>>>> the >>>>>>> >>>> one >>>> >>>>> from >>>>> >>>>>>> your bug report. The running version is 1.5 when I use the >>>>>>> command >>>>>>> >>>> you >>>> >>>>>>> sent me. 
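The perldoc loop above keeps polling for as long as a job reports that it is unfinished. A bounded variant can be sketched with a code reference standing in for $factory->retrieve_blast($rid). poll_until_ready is a hypothetical helper. It assumes the return convention described in this thread: a reference when the report is ready, 0 while the job is running, and a negative number on error. The pause callback exists only so the sleep can be replaced in tests.

```perl
use strict;
use warnings;

# Hypothetical bounded poller. $fetch must return a reference when the
# result is ready, 0 while the job is still running, or a negative number
# on error (the convention described for retrieve_blast in this thread).
sub poll_until_ready {
    my (%args) = @_;
    my $fetch     = $args{fetch} or die "fetch callback is required\n";
    my $interval  = $args{interval}  // 5;
    my $max_tries = $args{max_tries} // 60;
    my $pause     = $args{pause}     // sub { sleep $_[0] };   # injectable for tests
    for my $attempt (1 .. $max_tries) {
        my $rc = $fetch->();
        return $rc if ref $rc;
        die "remote job failed (rc=$rc)\n" if $rc < 0;
        $pause->($interval);
    }
    die "gave up after $max_tries attempts\n";
}
```

In the RemoteBlast loop this might be called as poll_until_ready(fetch => sub { $factory->retrieve_blast($rid) }) inside an eval. The caller would then call $factory->remove_rid($rid) whether the poll succeeds or dies.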
But when I tried the script, it doesn't change much. My >>>>>>> remoteblast code (portion) is here: >>>>>>> >>>>>>>>> sub search { >>>>>>>>> >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} >>>>>>> ="$ORGN"; >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} >>>>>>> =5000; >>>>>>> local >>>>>>> >>>>>>> >>>> $Bio::Tools::Run::RemoteBlast::HEADER >>>> {'COMPOSITION_BASED_STATISTICS'}= >>>> >>>>>>> 'no'; >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; >>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]", >>>>>>> -id=>"query", >>>>>>> -desc=>"new seq"); >>>>>>> my $len=$query->length(); >>>>>>> @db=('nr','htgs','wgs'); >>>>>>> foreach my $db (@db) { >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' >>>>>>> =>'blastn', >>>>>>> '-data' =>"$db", >>>>>>> >>>>>>> >>> '-expect'=>"$E_value"); >>> >>>>>>>>>>> my $blast_report = $factory->submit_blast($query); >>>>>>>>>>> >>>>>>>>> my @rids = $factory->each_rid(); >>>>>>>>> >>>>>>> foreach my $rid ( @rids ) { >>>>>>> print STDERR "$rid\n"; >>>>>>> } >>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638) >>>>>>> print STDERR "waiting..."; >>>>>>> sleep 60; >>>>>>> >>>>>>>>> foreach my $rid ( @rids ) { >>>>>>>>> >>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>> while (!ref($rc) ) { >>>>>>> if( $rc < 0 ) { >>>>>>> # retrieve_blast returns -1 on error >>>>>>> $factory->remove_rid($rid); >>>>>>> print "Error!\n"; >>>>>>> send_error($email,$function,$seqname,$queryname[$ST]); >>>>>>> die "Can't retrieve $rid"; >>>>>>> } if ($rc==0) { # retrieve_blast returns 0 on 'job not >>>>>>> >>>> finished' >>>> >>>>>>> sleep 60; >>>>>>> $rc = $factory->retrieve_blast($rid); >>>>>>> } >>>>>>> } >>>>>>> if (ref($rc)) { >>>>>>> print STDERR "Done.\n"; >>>>>>> while( my $result = $rc->next_result) { >>>>>>> while( my $hit = $result->next_hit()) { >>>>>>> $hit_name=$hit->name; >>>>>>> $hit_name =~ 
/\S+[|](\S+)[.]\d+[|].*/; >>>>>>> $name=$1; >>>>>>> @left_plus_start=(); >>>>>>> @left_plus_end=(); >>>>>>> @left_minus_start=(); >>>>>>> @left_minus_end=(); >>>>>>> @right_plus_start=(); >>>>>>> @right_plus_end=(); >>>>>>> @right_minus_start=(); >>>>>>> @right_minus_end=(); >>>>>>> >>>>>>>>> if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { >>>>>>>>> >>>>>>> while( my $hsp = $hit->next_hsp()) { >>>>>>> ...... >>>>>>> >>>>>>>>> It was working quite well before around October laster >>>>>>>>> year, but >>>>>>>>> >>>>> it has >>>>> >>>>>>> stopped since then, When a submission is sent via a webpage, >>>>>>> the cgi >>>>>>> starts to work and use a memory of ~20 Mb. Then it hangs there, >>>>>>> >>>>> finally >>>>> >>>>>>> the expected email is received but without real results >>>>>>> although it >>>>>>> >>>>> does >>>>> >>>>>>> contain something from other parts of the script. Apparently the >>>>>>> >>>>> search >>>>> >>>>>>> sub did not return anything (I know there is something should be >>>>>>> returned.). Is it also possible the format of the NCBI output >>>>>>> for >>>>>>> >>>> each >>>> >>>>>>> result has changed? >>>>>>> Thank you, >>>>>>> Guojun >>>>>>> >>>>>>>>>>> Department of Plant Biology >>>>>>>>>>> >>>>>>> University of Georgia >>>>>>> >>>>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>>> >>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org >>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>>> >>>>>>>>>>>> How do you know two versions are installed (i.e. how are >>>>>>>>>>>> >>>> you >>>> >>>>> checking >>>>> >>>>>>> the >>>>>>> >>>>>>>> version)? Do you see have two complete bioperl >>>>>>>> distributions (in >>>>>>>> >>>>> two >>>>> >>>>>>>> separate directories) or are you looking in modules? 
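One detail in the search subroutine quoted above: $name=$1 runs even when the hit-name pattern does not match. In that case $1 may still hold a value from an earlier successful match. The sketch below uses the capture only when the match succeeds. It keeps the same pattern minus the trailing .* and adds the two-letters-underscore test used above to skip RefSeq-style names. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the accession (without version) out of an NCBI
# hit name such as gi|...|gb|ACCESSION.VERSION|. Same pattern as the code
# above, but returns undef instead of reading $1 after a failed match.
sub accession_from_hit_name {
    my ($hit_name) = @_;
    return $hit_name =~ /\S+[|](\S+)[.]\d+[|]/ ? $1 : undef;
}

# The letter-letter-underscore-six-digits test used above to skip RefSeq-style names.
sub is_refseq_style {
    my ($acc) = @_;
    return $acc =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i ? 1 : 0;
}
```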
Here's >>>>>>>> the >>>>>>>> >>>> way >>>> >>>>> to >>>>> >>>>>>>> check the version (from the FAQ): >>>>>>>> >>>>>>>>> perl -MBio::Root::Version -e 'print >>>>>>>>> >>>>> $Bio::Root::Version::VERSION,"\n"' >>>>> >>>>>>>>> If you have two full bioperl distributions on your computer, >>>>>>>>> >>>>> normally >>>>> >>>>>>> only >>>>>>> >>>>>>>> one will be in use unless you have explicitly set the >>>>>>>> environment >>>>>>>> >>>>>>> variable >>>>>>> >>>>>>>> PERL5LIB. The PERL5LIB directories will be searched first >>>>>>>> before >>>>>>>> >>>>> your >>>>> >>>>>>>> normal perl directory list (@INC) is searched. You MAY get >>>>>>>> some >>>>>>>> >>>>> mixing >>>>> >>>>>>>> then, but only if perl can't find a particular module in the >>>>>>>> path >>>>>>>> >>>>>>> designated >>>>>>> >>>>>>>> in PERL5LIB; then it will progress through the directories >>>>>>>> listed >>>>>>>> >>>> in >>>> >>>>>>> @INC. >>>>>>> >>>>>>>> This may happen if a module is unique to a particular >>>>>>>> release, but >>>>>>>> >>>>>>> shouldn't >>>>>>> >>>>>>>> happen for the majority of modules, including RemoteBlast. You >>>>>>>> >>>> can >>>> >>>>>>> check >>>>>>> >>>>>>>> what @INC and PERL5LIB are set to by using 'perl -V'. @INC >>>>>>>> will >>>>>>>> >>>>> differ >>>>> >>>>>>>> depending on your OS, perl build, etc. >>>>>>>> >>>>>>>>> Regardless, if you follow the directions for installing >>>>>>>>> bioperl >>>>>>>>> >>>>> for >>>>> >>>>>>> your >>>>>>> >>>>>>>> system ('perl Makefile.PL', 'make', 'make test', 'make >>>>>>>> install', >>>>>>>> >>>>> unless >>>>> >>>>>>> you >>>>>>> >>>>>>>> explicitly change the installation directory when using 'perl >>>>>>>> >>>>>>> Makefile.PL'), >>>>>>> >>>>>>>> then 'uninstalling' Bioperl shouldn't be a problem as it will >>>>>>>> >>>>> install >>>>> >>>>>>> the >>>>>>> >>>>>>>> Bioperl distribution you downloaded over the old version in >>>>>>>> @INC. 
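Besides perl -V, a script can report which copy of a module it actually loaded. %INC records the file that each require resolved to. Entries from -I, use lib and PERL5LIB are searched before the built-in directories. A minimal sketch follows; the helper name where_is is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: load $module (if it is installed) and report the
# file perl actually used, plus the module's $VERSION if it defines one.
sub where_is {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    no strict 'refs';
    my $version = ${"${module}::VERSION"};
    return ($INC{$file}, $version);
}

# Directories are searched in this order; -I, 'use lib' and PERL5LIB
# entries come before the built-in ones.
print "Search order (\@INC):\n", map { "  $_\n" } grep { !ref } @INC;
```

With only one BioPerl installation on the path, the path and version returned by where_is('Bio::Root::Version') should agree with the FAQ one-liner above.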
>>>>>>>> >>>>> See >>>>> >>>>>>> this >>>>>>> >>>>>>>> page: >>>>>>>> >>>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL >>>>>>>>> for more details. >>>>>>>>> Christopher Fields >>>>>>>>> >>>>>>>> Postdoctoral Researcher - Switzer Lab >>>>>>>> Dept. of Biochemistry >>>>>>>> University of Illinois Urbana-Champaign >>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang >>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM >>>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>>>>> >>>>>>>>>>> Hi, Chris, >>>>>>>>>>> >>>>>>>>> I do have different versions of bioperl on my Linux machine >>>>>>>>> >>>> (1.4. >>>> >>>>> and >>>>> >>>>>>>>> 1.5.0), this may be the problem. Should I just install >>>>>>>>> bioperl- >>>>>>>>> >>>>> 1.5.1 >>>>> >>>>>>> or I >>>>>>> >>>>>>>>> need to uninstall and remove the previous versions. I could >>>>>>>>> not >>>>>>>>> >>>>> find >>>>> >>>>>>> any >>>>>>> >>>>>>>>> hint on uninstalling bioperl on linux. Could you please >>>>>>>>> give me >>>>>>>>> >>>>> some >>>>> >>>>>>>>> suggestion? >>>>>>>>> Thanks, >>>>>>>>> Guojun >>>>>>>>> >>>>>>>>>>> Department of Plant Biology >>>>>>>>>>> >>>>>>>>> University of Georgia >>>>>>>>> _____ >>>>>>>>> >>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>>>>>> >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org >>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500 >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding >>>>>>>>> RemoteBlast.pm >>>>>>>>> >>>>>>> version >>>>>>> >>>>>>>>> 1.28 >>>>>>>>> >>>>>>>>>>>>>>> If you're using RemoteBlast 1.28, then you've likely >>>>>>>>>>>>>>> >>>>>>> updated from CVS >>>>>>> >>>>>>>>> which isn't the latest fix. 
>>>>>>>>> >>>>>>>>>>> Make sure that you check the following: >>>>>>>>>>> 1) Always post to the mailing list: >>>>>>>>>>> >>>>>>>>> http://www.bioperl.org/wiki/ >>>>>>>>> HOWTO:Beginners#Getting_Assistance . >>>>>>>>> >>>>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live >>>>>>>>>>> >>>>> (CVS) >>>>> >>>>>>>>> installed first. Perform a clean installation; do not upgrade >>>>>>>>> >>>>> only >>>>> >>>>>>>>> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we >>>>>>>>> >>>> can't >>>> >>>>>>>>> guarantee that mixing modules from old and new distributions >>>>>>>>> >>>> (1.4 >>>> >>>>> and >>>>> >>>>>>>>> 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl- >>>>>>>>> live >>>>>>>>> installation will allow text output from BLAST v.2.2.12 to be >>>>>>>>> >>>>> saved >>>>> >>>>>>> and >>>>>>> >>>>>>>>> parsed; it will not parse the newest BLAST text output from >>>>>>>>> NCBI >>>>>>>>> >>>>>>> (v2.2.13) >>>>>>> >>>>>>>>> but it should still save it. I believe as long as >>>>>>>>> next_results() >>>>>>>>> >>>>> isn't >>>>> >>>>>>>>> called, it will work. >>>>>>>>> >>>>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST >>>>>>>>>>> >>>> 2.2.13 >>>> >>>>>>> text output >>>>>>> >>>>>>>>> are NOT in CVS; they haven't been cleared and checked in by >>>>>>>>> >>>> Roger >>>> >>>>> Hall >>>>> >>>>>>>>> (who's now taking care of RemoteBlast) and the powers that be >>>>>>>>> >>>>> (Jason >>>>> >>>>>>> or >>>>>>> >>>>>>>>> whomever is in charge of Bio::SearchIO). They can be found in >>>>>>>>> >>>>>>> Bugzilla: >>>>>>> >>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>>>> >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935 >>>>>>>>> >>>>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the >>>>>>>>>>> >>>>> option >>>>> >>>>>>> of >>>>>>> >>>>>>>>> saving XML output, so isn't necessary if you don't plan on >>>>>>>>> using >>>>>>>>> >>>>> this >>>>> >>>>>>>>> option. 
And, remember, they haven't been committed yet to >>>>>>>>> CVS, >>>>>>>>> >>>>> which >>>>> >>>>>>>>> means that the final version will change to refle the new >>>>>>>>> >>>> version. >>>> >>>>>>>>>>>>> Christopher Fields >>>>>>>>>>>>> >>>>>>>>> Postdoctoral Researcher - Switzer Lab >>>>>>>>> Dept. of Biochemistry >>>>>>>>> University of Illinois Urbana-Champaign >>>>>>>>> >>>>>>>>>>>>> _____ >>>>>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu] >>>>>>>>>>>>> >>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM >>>>>>>>> To: Chris Fields >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding >>>>>>>>> RemoteBlast.pm >>>>>>>>> >>>>>>> version >>>>>>> >>>>>>>>> 1.28 >>>>>>>>> >>>>>>>>>>>>> Hi, Chris >>>>>>>>>>>>> >>>>>>>>>>> Thanks for your suggestion, however, it doesn't seem to work >>>>>>>>>>> >>>>> for >>>>> >>>>>>> my cgi >>>>>>> >>>>>>>>> even after I replace both blast.pm and RemoteBlast.pm. I >>>>>>>>> didn't >>>>>>>>> >>>>> even >>>>> >>>>>>> get >>>>>>> >>>>>>>>> any RID. Is there any suggestion? >>>>>>>>> >>>>>>>>>>>>>>> Guojun >>>>>>>>>>>>>>> >>>>>>>>>>>>> Guojun Yang >>>>>>>>>>>>> >>>>>>>>> Department of Plant Biology >>>>>>>>> University of Georgia >>>>>>>>> Tel: 706-542-1857 >>>>>>>>> Fax: 706-542-1805 >>>>>>>>> http://www.arches.uga.edu/~guojun >>>>>>>>> _____ >>>>>>>>> >>>>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>>>>>>>> >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org >>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500 >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding >>>>>>>>> RemoteBlast.pm >>>>>>>>> >>>>>>> version >>>>>>> >>>>>>>>> 1.28 >>>>>>>>> >>>>>>>>>>> I would say give the new code a try, but realize that it >>>>>>>>>>> >>>>> hasn't >>>>> >>>>>>> been >>>>>>> >>>>>>>>> checked >>>>>>>>> in (like I said below). 
I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but at least it doesn't hang in the while() loop I described in the bug report below (bug #1934), and it seems to process everything fine.

If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back over the last month or so, there has been a bit of discussion here about it. Jason describes how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
Sent: Friday, February 03, 2006 1:45 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

Hi, Everybody,
I saw this post and am wondering if it explains why my web server is malfunctioning. We set up a web server named MAK for MITE sequence analysis. It worked very well until around November 2005, when it stopped returning any results (the site is up and seems to be doing something after submission). In the CGI script, I used RemoteBlast to do the searches (that work was done in 2003). I currently do not have access to the server because I moved. Quite a few people have emailed us about the malfunction. Is there any suggestion on fixing the problem? Should I simply ask for RemoteBlast.pm to be replaced with the new version?
Thanks a lot,
Guojun

Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun
_____

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
Sent: Fri, 03 Feb 2006 10:45:23 -0500
Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will work for saving text output. However, it will not parse anything using next_result (it will likely hang), and it will not save XML format. See these bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and Bio::SearchIO::blast). Note that these haven't been checked in yet, so they are still not included in bioperl-live; they may be modified further before they are committed to CVS. If you're not worried about XML, you could just try the first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting a script with problems to the list a month ago. The script you used saves the output but doesn't actually parse it (i.e. you don't use next_result() to go through the data). Is the BLAST version in your text output 2.2.12 or 2.2.13? Have you tried parsing the output with "-readmethod => SearchIO" or "-readmethod => blast", using your version of RemoteBlast and the method next_result()? Like below (from perldoc):

while ( my @rids = $factory->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $factory->retrieve_blast($rid);
    if( !ref($rc) ) {
      if( $rc < 0 ) {
        $factory->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {                               # parsing starts here
      my $result = $rc->next_result();     # it should hang here
      # save the output
      my $filename = $result->query_name()."\.out";
      $factory->save_output($filename);
      $factory->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}

Before the fixes, my script hung whenever I used next_result() at all.
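The perldoc loop above keeps polling for as long as a job is unfinished. The sketch below shows the same polling pattern with an upper bound on retries. It assumes the return convention used by the scripts in this thread: a negative number on error, 0 while the job is still running, and a reference once results are ready. poll_rid() and its fetch callback are hypothetical names, not RemoteBlast methods. With RemoteBlast, the callback would wrap $factory->retrieve_blast($rid).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): call $fetch until it returns a
# reference (results ready) or a negative number (error), giving up after
# max_tries attempts. Mirrors the retrieve_blast() convention in this thread:
# < 0 means error, 0 means "job not finished", a reference is the report.
sub poll_rid {
    my (%args)    = @_;
    my $fetch     = $args{fetch};
    my $max_tries = defined $args{max_tries} ? $args{max_tries} : 60;
    my $delay     = defined $args{delay}     ? $args{delay}     : 5;

    for my $try ( 1 .. $max_tries ) {
        my $rc = $fetch->();
        return ( 'done',  $rc ) if ref($rc);    # report is ready
        return ( 'error', $rc ) if $rc < 0;     # server-side failure
        sleep $delay if $delay > 0;             # not finished yet; wait, retry
    }
    return ( 'timeout', undef );
}

# Stand-in fetcher that reports "not finished" twice, then returns a report.
# With RemoteBlast this would be: fetch => sub { $factory->retrieve_blast($rid) }
my $calls = 0;
my ( $status, $report ) = poll_rid(
    fetch     => sub { ++$calls < 3 ? 0 : ['report'] },
    max_tries => 10,
    delay     => 0,
);
print "$status after $calls calls\n";    # prints "done after 3 calls"
```

Returning a status string rather than dying lets a CGI script like the one described in this thread report a timeout to the user instead of hanging.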
I want to see how many others are having the same parsing issues with the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
Sent: Thursday, February 02, 2006 7:24 PM
To: Huang Jian; bioperl-l
Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28

Hi Huang,
Thanks for the message. The older version of RemoteBlast.pm checks the size of a temporary file to determine whether the BLAST results are ready. That condition is no longer being satisfied, perhaps because of changes made by NCBI. I had this problem recently and found that the solution was to use the latest version. It fixes the problem (it no longer uses the file-size logic) but is not yet included in the BioPerl package.
Cheers
Nagesh

Huang Jian wrote:

Dear Nagesh,

I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent me. Now it works perfectly!!!

Thank you!!

Huang

----- Original Message -----
From: "Nagesh Chakka"
To: "Huang Jian" ; "bioperl-l"
Sent: Friday, February 03, 2006 7:48 AM
Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still via email

Hi Huang,
I see that you are submitting a sequence for a remote BLAST search. Can you check whether the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If not, I have attached it to this email; try replacing your old copy, which has a bug, with it.
Let me know if it works.
Nagesh

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Thu Feb 16 07:52:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:52:31 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To:
References:
Message-ID:

I think a method was recently implemented in Bio::DB::GenBank to retrieve a segment of DNA in GenBank format, given start and end coordinates; that should contain the features you need. I requested it on the mailing list around Nov-Dec but didn't get a chance to test it. Would that help?

On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:

> Harry,
>
> It's not clear to me that NCBI's eutils offers this capability directly. You can probably download Entrez Gene entries and parse them for coordinates, but I know of no way to remotely retrieve genomic sequences like this from NCBI (ENSEMBL API perhaps?). What I had in mind uses the local approach that some of us favor. To prove to myself that this is simple to do, I wrote a script and just added it to examples/tools. It's called extract_genes.pl and it's based on Bio::DB::Fasta. Download the sequence files for a given species to some dir, download Entrez Gene's gene2accession file, and run it. It creates and stores a hash for lookups, so it won't read gene2accession each time it runs.
>
> Brian O.
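Harry's underlying problem, quoted further down this thread, is retrieving genomic sequence plus and minus some offset around a locus. Once the locus coordinates are known (for example from gene2accession), the only arithmetic needed before calling Bio::DB::Fasta is a flanked window clamped to the sequence ends. flank_window() below is a hypothetical helper, and the coordinates and chromosome length in the usage line are made up.

The $revseq line in the Bio::DB::Fasta documentation quoted in this thread passes start > end. On that basis, the helper returns the pair swapped for minus-strand loci so the returned sequence comes back reverse-complemented. That behaviour is an assumption worth confirming in the module's documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: widen [start, end] by $flank bases on both sides,
# clamp the result to 1..$seq_len, and return the pair in reverse order for
# minus-strand loci (start > end, as in the $revseq documentation example).
sub flank_window {
    my ( $start, $end, $strand, $flank, $seq_len ) = @_;
    ( $start, $end ) = ( $end, $start ) if $start > $end;    # normalise order
    my $from = $start - $flank;
    my $to   = $end + $flank;
    $from = 1        if $from < 1;
    $to   = $seq_len if $to > $seq_len;
    return $strand < 0 ? ( $to, $from ) : ( $from, $to );
}

# Made-up locus on a made-up 20 Mb chromosome, minus strand, 5 kb flanks.
my ( $from, $to ) = flank_window( 4_000_000, 4_100_000, -1, 5_000, 20_000_000 );
print "$from..$to\n";    # prints "4105000..3995000"
# With Bio::DB::Fasta one would then call: $db->seq('CHROMOSOME_I', $from => $to);
```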
>
> On 2/14/06 12:15 PM, "Harry Mangalam" wrote:
>
>> Hi Brian,
>>
>> Thanks very much for the pointers and the speed of your reply, and apologies for the slowness of mine.
>>
>> This looks good, but what I was looking for was a bioP approach for hooking into an API at NCBI or EBI so I could get this info and the seqs from them. In this case, speed of retrieval is not critical, and I'd rather not download the entire set of sequences to a local disk to hack at them.
>>
>> I've worked out a screen-scraping approach to get them and could script that, but I thought bioP had a method for using NCBI's external APIs. It may be that my memory is faulty, or that the approach is no longer supported due to overload.
>>
>> Does NCBI make such APIs available anymore? I searched a bit for docs on them but couldn't find anything (unless it's buried in the NCBI toolkit, which I haven't started to excavate).
>>
>> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
>>
>> Harry
>>
>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>> Harry,
>>>
>>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So, from its documentation:
>>>
>>> use Bio::DB::Fasta;
>>>
>>> # create database from directory of fasta files
>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>> # simple access (for those without Bioperl)
>>> my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>> my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>> my @ids      = $db->ids;
>>> my $length   = $db->length('CHROMOSOME_I');
>>> my $alphabet = $db->alphabet('CHROMOSOME_I');
>>> my $header   = $db->header('CHROMOSOME_I');
>>>
>>> # Bioperl-style access
>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>> my $obj    = $db->get_Seq_by_id('CHROMOSOME_I');
>>> my $seq    = $obj->seq;
>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>>>
>>> Do you already have the offsets?
>>>
>>> Brian O.
>>>
>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote:
>>>> Hi All,
>>>>
>>>> After perusing the tutorial and other docs for an evening, I still can't find the answer to this. Forgive me if I've missed something obvious.
>>>>
>>>> This should not be a novel request, but I've not found it answered. If bioperl isn't the best way to do this, I'd be grateful for a pointer to a better way, especially if it includes an illuminating bit of code.
>>>>
>>>> The problem is to retrieve genomic sequences plus & minus some offset from a locus determined by HUGO keyword or GeneID. This would be a common follow-up chore for some extra analysis from a gene expression expt. Or maybe this is in the DBFetch routines, but I've missed the sequence type to specify...?
>>>>
>>>> TIA!

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From anst at kvl.dk Thu Feb 16 04:24:51 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 10:24:51 +0100
Subject: [Bioperl-l] searchIO bug?
Message-ID: <43F452F30200009B00000EC9@gwia.kvl.dk>

Hi!

I am blasting a protein seq against an identical protein, and I am trying to parse the protein header with the query_description method in the SearchIO module. After calling query_description, I use split / / so that I can easily access the different header components. This is where I discovered that query_description somehow introduces a space after the fifth comma in the list of exon chromosome positions, before the following position. This truncates the list of exon chromosome positions from 7 to 4, which later yields the wrong number of introns.

Is this a bug?

Attached are:
testblast1.pl: the blast program to run.
Q0045: the seq used as both the query and the database seq. (Q0045 has to be formatted before it can be used as a database: formatdb -i Q0045 -p T -o F)

Regards Anders.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastp5.pl
Type: application/octet-stream
Size: 50384 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/c1dd1ff5/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/c1dd1ff5/attachment-0003.obj

From anst at kvl.dk Thu Feb 16 05:20:06 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 11:20:06 +0100
Subject: [Bioperl-l] another searchIO bug?
Message-ID: <43F45FE60200009B00000ED6@gwia.kvl.dk>

Hi!

I am blasting a protein seq (query) against an identical seq with amino acid 61 deleted (subject). Then I print out each mismatched amino acid and its position. The mismatch for the query seq is G at position 61, which is correct. The mismatch for the subject seq is V at position 60, which is definitely not correct!?

Is this a bug?

testblast2.pl is the program to run.
Q0045 is the query seq.
Q0045del61 is the subject seq (it has to be formatted: formatdb -i Q0045del61 -p T -o F).

Regards Anders.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/5062b2cb/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testblast2.pl
Type: application/octet-stream
Size: 6109 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/5062b2cb/attachment-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045del61
Type: application/octet-stream
Size: 872 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/5062b2cb/attachment-0002.obj

From mcoyne at channing.harvard.edu Wed Feb 15 16:20:17 2006
From: mcoyne at channing.harvard.edu (Michael Coyne)
Date: Wed, 15 Feb 2006 16:20:17 -0500
Subject: [Bioperl-l] Primer maps?
Message-ID: <6.2.0.14.0.20060215155422.01d44a98@localhost>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060215/c777b31d/attachment-0001.html

From Pieter.Monsieurs at esat.kuleuven.be Thu Feb 16 04:46:09 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Thu, 16 Feb 2006 10:46:09 +0100
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <43F449E1.80605@esat.kuleuven.be>

Hi,

I have the same problem with the blast.pm file. NCBI has added some extra information to the BLAST output (e.g. "Features flanking this part..." or "Features in this part ..."); an example is below. The blast.pm module starts looking for the HSP alignment information, but it dies when it hits this feature information.

Pieter

>gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group) chromosome 12, complete sequence
Length=27492551

 Features flanking this part of subject sequence:
   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
   2655 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  4         GTACTACTCTACTCTACT  21
                 ||||||||||||||||||
Sbjct  19257436  GTACTACTCTACTCTACT  19257419

 Features flanking this part of subject sequence:
   2991 bp at 5' side: hypothetical protein
   1131 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  2         ATGTACTACTCTACTCTA  19
                 ||||||||||||||||||
Sbjct  27006915  ATGTACTACTCTACTCTA  27006898

 Features in this part of subject sequence:
   DHHC zinc finger domain, putative

 Score = 34.2 bits (17),  Expect = 0.87
 Identities = 17/17 (100%), Gaps = 0/17 (0%)
 Strand=Plus/Plus

Query  5         TACTACTCTACTCTACT  21
                 |||||||||||||||||
Sbjct  17616437  TACTACTCTACTCTACT  17616453

 Features flanking this part of subject sequence:
   102 bp at 5' side: bZIP transcription factor, putative
   3740 bp at 3' side: yeast dcp1, putative

 Score = 32.2 bits (16),  Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Plus

Query  7         CTACTCTACTCTACTC  22
                 ||||||||||||||||
Sbjct  2775880   CTACTCTACTCTACTC  2775895

 Features flanking this part of subject sequence:
   21 bp at 5' side: peptide transporter T17F3.11, putative
   10230 bp at 3' side: transposon protein, putative, unclassified

 Score = 32.2 bits (16),  Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Minus

Query  7         CTACTCTACTCTACTC  22
                 ||||||||||||||||
Sbjct  27323153  CTACTCTACTCTACTC  27323138

Guojun Yang wrote:

Hi, Chris,
The remoteblast test script finally works for the amino.fa query, but when I try a nucleic acid sequence (see below), this error occurs:

waiting........
------------- EXCEPTION -------------
MSG: no data for midline Features flanking this part of subject sequence:
STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
STACK toplevel remoteblast_test:40

The query sequence is:

CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG

The script is basically the same as the remoteblast test; I only changed the database to 'nr', the program to 'blastn' and the filename to 'ost3':

#!/usr/bin/perl

use Bio::SeqIO;
use Bio::Seq;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use strict;
my $prog='blastn';
my $db='nr';
my $e_val=1e-10;
my @params=( -prog=>$prog,
             -data=>$db,
             -expect=>$e_val,
             -readmethod=>'SearchIO');
my $factory=Bio::Tools::Run::RemoteBlast->new(@params);

my $v = 1;

my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );

while (my $input = $str->next_seq()){
  #Blast a sequence against a database:
  #Alternatively, you could pass in a file with many
  #sequences rather than loop through sequence one at a time
  #Remove the loop starting 'while (my $input = $str->next_seq())'
  #and swap the two lines below for an example of that.
  my $r = $factory->submit_blast($input);
  #my $r = $factory->submit_blast('amino.fa');
  print STDERR "waiting..." if( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save the output
        my $filename = $result->query_name()."\.out";
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}

Do you think there might still be something in the NCBI output format?

Thank you,
Guojun

Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun

----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2

Sorry, I forgot to add that I didn't see the regex issue you mentioned. It could be a Perl-related issue. Try the fixes I mentioned and see what happens.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu]
Sent: Tuesday, February 14, 2006 12:36 PM
To: 'gyang at plantbio.uga.edu'
Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2

It's a good habit to always put single quotes around words. The Perl interpreter may treat a single bare word as a subroutine or perlfunc called with no args, so it will try to find a subroutine named blastp(). My debugger actually warns that the bare word blastp may conflict with a future reserved word. As you said, 'use strict' will point that out.

As for the regex, it should match all the BLAST programs at NCBI (blastp, blastn, blastx, tblastn, tblastx); it is built in to make sure nothing else passes through.

So, if you are using the script below, there are several errors. The bare words for $prog and $db need quotes, and the flags in your @params array don't have a dash before them.
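Both points can be checked in plain Perl. Under 'use strict', a line such as my $prog = blastp; fails to compile with 'Bareword "blastp" not allowed while "strict subs" in use', so the program and database names must be quoted. The sketch below is a hypothetical stand-in for RemoteBlast's program-name check, not its actual code. It anchors the t?blast[pnx] pattern so that only whole names pass.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the program-name check discussed above
# (not RemoteBlast's own code). Anchored so only the whole name matches.
sub is_valid_blast_program {
    my ($prog) = @_;
    return ( defined $prog && $prog =~ /^t?blast[pnx]$/ ) ? 1 : 0;
}

# Quoted names compile under 'use strict'; an unquoted blastp would not.
for my $prog ( 'blastp', 'blastn', 'blastx', 'tblastn', 'tblastx', 'megablast' ) {
    printf "%-9s %s\n", $prog, is_valid_blast_program($prog) ? 'accepted' : 'rejected';
}
```

The pattern would also accept the non-existent name "tblastp". An explicit list of allowed names would be stricter, if that matters for a given script.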
I get this after adding the quotes but before adding the dashes to @params:

C:\Perl\Scripts>test_blast.pl
------------- EXCEPTION: Bio::Root::Exception -------------
MSG:
STACK: Error::throw
STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
STACK: C:\Perl\Scripts\test_blast.pl:15
-----------------------------------------------------------

The last line points to a problem with this line:

my $factory=Bio::Tools::Run::RemoteBlast->new(@params);

Changing @params to this:

my @params=( -prog=>$prog,
             -data=>$db,
             -expect=>$e_val,
             -readmethod=>'SearchIO');

fixes it, and I get output as expected.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
Sent: Tuesday, February 14, 2006 11:48 AM
To: Chris Fields; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2

Hi, Chris,
When I tried the perldoc script, it did not work either. First it says $prog cannot be a bare word if I "use strict". I added quotes around the words; then it says the value for $prog does not match the expression t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The script is shown below. Why is the expression "t?blast[pnx]"?

#!/usr/bin/perl

use Bio::SeqIO;
use Bio::Seq;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;


my $prog=blastp;
my $db=swissprot;
my $e_val=1e-10;
my @params=( prog=>$prog,
             data=>$db,
             expect=>$e_val,
             readmethod=>'SearchIO');
my $factory=Bio::Tools::Run::RemoteBlast->new(@params);

my $v = 1;

my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );

while (my $input = $str->next_seq()){
  #Blast a sequence against a database:
  #Alternatively, you could pass in a file with many
  #sequences rather than loop through sequence one at a time
  #Remove the loop starting 'while (my $input = $str->next_seq())'
  #and swap the two lines below for an example of that.
  my $r = $factory->submit_blast($input);
  #my $r = $factory->submit_blast('amino.fa');
  print STDERR "waiting..." if( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save the output
        my $filename = $result->query_name()."\.out";
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}

Thank you for your help!
Guojun
Department of Plant Biology
University of Georgia

----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu
Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28

Try two things:

1) Use a much simpler script, like the one in 'perldoc Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something wrong with the logic in your subroutine:

my $v = 1;
my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
while (my $input = $str->next_seq()){
  #Blast a sequence against a database:
  #Alternatively, you could pass in a file with many
  #sequences rather than loop through sequence one at a time
  #Remove the loop starting 'while (my $input = $str->next_seq())'
  #and swap the two lines below for an example of that.
  my $r = $factory->submit_blast($input);
  #my $r = $factory->submit_blast('amino.fa');
  print STDERR "waiting..." if( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save the output
        my $filename = $result->query_name()."\.out";
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}

2) Try the RemoteBlast from Bugzilla and see if that works. It really shouldn't make much of a difference, but I noticed that the CVS RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was released; the Bugzilla version is based on CVS.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
Sent: Monday, February 13, 2006 3:00 PM
To: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28

Thanks, Chris,
I installed version 1.5.1 and replaced the blast.pm file with the one from your bug report. The running version is 1.5 when I use the command you sent me. But when I tried the script, not much changed. A portion of my RemoteBlast code is here:

sub search {
  local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
  local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
  local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
  local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
  local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
  my $query = Bio::Seq -> new ( -seq=>"$_[0]",
                                -id=>"query",
                                -desc=>"new seq");
  my $len=$query->length();
  @db=('nr','htgs','wgs');
  foreach my $db (@db) {
    my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
                                                    '-data' =>"$db",
                                                    '-expect'=>"$E_value");
    my $blast_report = $factory->submit_blast($query);
    my @rids = $factory->each_rid();
    foreach my $rid ( @rids ) {
      print STDERR "$rid\n";
    }
    # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
    print STDERR "waiting...";
    sleep 60;
    foreach my $rid ( @rids ) {
      my $rc = $factory->retrieve_blast($rid);
      while (!ref($rc) ) {
        if( $rc < 0 ) {
          # retrieve_blast returns -1 on error
          $factory->remove_rid($rid);
          print "Error!\n";
          send_error($email,$function,$seqname,$queryname[$ST]);
          die "Can't retrieve $rid";
        }
        if ($rc==0) {    # retrieve_blast returns 0 on 'job not finished'
          sleep 60;
          $rc = $factory->retrieve_blast($rid);
        }
      }
      if (ref($rc)) {
        print STDERR "Done.\n";
        while( my $result = $rc->next_result) {
          while( my $hit = $result->next_hit()) {
            $hit_name=$hit->name;
            $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
            $name=$1;
            @left_plus_start=();
            @left_plus_end=();
            @left_minus_start=();
            @left_minus_end=();
            @right_plus_start=();
            @right_plus_end=();
            @right_minus_start=();
            @right_minus_end=();
            if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
              while( my $hsp = $hit->next_hsp()) {
              ......

It was working quite well until around October last year, but it has stopped working since then. When a submission is sent via the web page, the CGI starts to work and uses ~20 MB of memory. Then it hangs. Finally the expected email arrives, but without real results, although it does contain output from other parts of the script. Apparently the search sub did not return anything (I know something should have been returned). Is it also possible that the format of the NCBI output for each result has changed?
Thank you,
Guojun

Department of Plant Biology
University of Georgia

----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28

How do you know two versions are installed (i.e. how are you checking the version)? Do you have two complete bioperl distributions (in two separate directories), or are you looking in modules?
>>>>>>> Here's the way to check the version (from the FAQ):
>>>>>>>
>>>>>>> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>>>>>>>
>>>>>>> If you have two full bioperl distributions on your computer, normally only one will be in use unless you have explicitly set the environment variable PERL5LIB. The PERL5LIB directories will be searched first before your normal perl directory list (@INC) is searched. You MAY get some mixing then, but only if perl can't find a particular module in the path designated in PERL5LIB; then it will progress through the directories listed in @INC. This may happen if a module is unique to a particular release, but shouldn't happen for the majority of modules, including RemoteBlast. You can check what @INC and PERL5LIB are set to by using 'perl -V'. @INC will differ depending on your OS, perl build, etc.
>>>>>>>
>>>>>>> Regardless, if you follow the directions for installing bioperl for your system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you explicitly change the installation directory when using 'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem, as it will install the Bioperl distribution you downloaded over the old version in @INC.
>>>>>>> See this page:
>>>>>>>
>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>>>>>>>
>>>>>>> for more details.
>>>>>>>
>>>>>>> Christopher Fields
>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>> Dept. of Biochemistry
>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM
>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>> Hi, Chris,
>>>>>>>> I do have different versions of bioperl on my Linux machine (1.4 and 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions? I could not find any hint on uninstalling bioperl on Linux. Could you please give me some suggestions?
>>>>>>>> Thanks,
>>>>>>>> Guojun
>>>>>>>> Department of Plant Biology
>>>>>>>> University of Georgia
>>>>>>>> _____
>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500
>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>> If you're using RemoteBlast 1.28, then you've likely updated from CVS, which isn't the latest fix.
>>>>>>>>
>>>>>>>> Make sure that you check the following:
>>>>>>>> 1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance
>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13), but it should still save it. I believe as long as next_result() isn't called, it will work.
>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO).
>>>>>>>> They can be found in Bugzilla:
>>>>>>>>
>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>
>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so it isn't necessary if you don't plan on using this option. And, remember, they haven't been committed yet to CVS, which means that the final version will change to reflect the new version.
>>>>>>>>
>>>>>>>> Christopher Fields
>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>> Dept. of Biochemistry
>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>> _____
>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM
>>>>>>>> To: Chris Fields
>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>> Hi, Chris
>>>>>>>> Thanks for your suggestion; however, it doesn't seem to work for my CGI even after I replaced both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion?
>>>>>>>> Guojun
>>>>>>>>
>>>>>>>> Guojun Yang
>>>>>>>> Department of Plant Biology
>>>>>>>> University of Georgia
>>>>>>>> Tel: 706-542-1857
>>>>>>>> Fax: 706-542-1805
>>>>>>>> http://www.arches.uga.edu/~guojun
>>>>>>>> _____
>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500
>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>> I would say give the new code a try, but realize that it hasn't been checked in (like I said below). I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine.
>>>>>>>>
>>>>>>>> If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless.
>>>>>>>> If you look back over the last month or so, there has been a bit of discussion here about it. Jason describes how to set up RemoteBlast for XML:
>>>>>>>>
>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
>>>>>>>>
>>>>>>>> Christopher Fields
>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>> Dept. of Biochemistry
>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM
>>>>>>>>> To: bioperl-l at bioperl.org
>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>> Hi, Everybody,
>>>>>>>>> I saw this post and am wondering if this is the reason for the malfunctioning of my webserver. We set up a webserver named MAK for MITE sequence analysis. It was working very well until around November 2005, when it stopped returning any results (the site is fine and seems to be doing something after submission). In the CGI script, I used RemoteBlast (that work was done in 2003) to do searches. I currently do not have access to the server because I moved.
>>>>>>>>> Several people have sent us emails about its malfunctioning. Is there any suggestion on fixing the problem? Should I simply ask that RemoteBlast.pm be replaced with the new version?
>>>>>>>>> Thanks a lot,
>>>>>>>>> Guojun
>>>>>>>>>
>>>>>>>>> Department of Plant Biology
>>>>>>>>> University of Georgia
>>>>>>>>> Tel: 706-542-1857
>>>>>>>>> Fax: 706-542-1805
>>>>>>>>> http://www.arches.uga.edu/~guojun
>>>>>>>>> _____
>>>>>>>>>
>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500
>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will work for saving text output. However, it will not parse anything using next_result (it will likely hang) and will not save XML format. See these bugs:
>>>>>>>>>
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>
>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast and Bio::SearchIO::blast).
>>>>>>>>> Note that these haven't been checked in yet, so they are still not included in bioperl-live; they may be further modified before being committed to CVS. If you're not worried about XML, you could just try the first fix, which is a change to SearchIO::blast.
>>>>>>>>>
>>>>>>>>> Nagesh, I remember you posting to the list a month ago using a script which had problems; the script you used saves the output but doesn't actually parse it (i.e. you don't use next_result() to go through the data). Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried parsing the output using "-readmethod => SearchIO" or "-readmethod => blast" with your version of RemoteBlast and the method next_result()? Like below (from perldoc):
>>>>>>>>>
>>>>>>>>> while ( my @rids = $factory->each_rid ) {
>>>>>>>>> foreach my $rid ( @rids ) {
>>>>>>>>> my $rc = $factory->retrieve_blast($rid);
>>>>>>>>> if( !ref($rc) ) {
>>>>>>>>> if( $rc < 0 ) {
>>>>>>>>> $factory->remove_rid($rid);
>>>>>>>>> }
>>>>>>>>> print STDERR "." if ( $v > 0 );
>>>>>>>>> sleep 5;
>>>>>>>>> } else { # parsing starts here
>>>>>>>>> my $result = $rc->next_result(); # it should hang here
>>>>>>>>> #save the output
>>>>>>>>> my $filename = $result->query_name()."\.out";
>>>>>>>>> $factory->save_output($filename);
>>>>>>>>> $factory->remove_rid($rid);
>>>>>>>>> print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>> while ( my $hit = $result->next_hit ) {
>>>>>>>>> next unless ( $v > 0);
>>>>>>>>> print "\thit name is ", $hit->name, "\n";
>>>>>>>>> while( my $hsp = $hit->next_hsp ) {
>>>>>>>>> print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> My script hung if I used next_result() in any way prior to the fixes. I want to see how many others are having the same issues with parsing using the CVS version of bioperl-live.
>>>>>>>>>
>>>>>>>>> Christopher Fields
>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>> Dept. of Biochemistry
>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM
>>>>>>>>>> To: Huang Jian; bioperl-l
>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>
>>>>>>>>>> Hi Huang,
>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm works on the logic of checking the temporary file size to determine whether the Blast results are ready.
>>>>>>>>>> This condition is not getting satisfied, maybe due to some changes brought about by NCBI. I had this problem recently and figured out that the solution was to use the latest version, which has this problem fixed (it does not use the file size logic any more) and which is not yet included in the BioPerl package.
>>>>>>>>>> Cheers
>>>>>>>>>> Nagesh
>>>>>>>>>>
>>>>>>>>>> Huang Jian wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear Nagesh,
>>>>>>>>>>>
>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent me. Now it works perfectly!!!
>>>>>>>>>>>
>>>>>>>>>>> Thank you!!
>>>>>>>>>>>
>>>>>>>>>>> Huang
>>>>>>>>>>>
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Nagesh Chakka"
>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l"
>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM
>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still via email
>>>>>>>>>>>
>>>>>>>>>>>> Hi Huang,
>>>>>>>>>>>> I see that you are submitting a sequence for a remote BLAST search. Can you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If not, I have attached it to this email; try replacing the old one, which has a bug, with it.
>>>>>>>>>>>> Let me know if it works.
>>>>>>>>>>>> Nagesh

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

From jason.stajich at duke.edu  Thu Feb 16 09:00:01 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 16 Feb 2006 09:00:01 -0500
Subject: [Bioperl-l] searchIO bug?
In-Reply-To: <43F452F30200009B00000EC9@gwia.kvl.dk>
References: <43F452F30200009B00000EC9@gwia.kvl.dk>
Message-ID: <11B49C84-9C04-4F43-9278-A3AA09C9B773@duke.edu>

I think it would be more helpful if you posted the actual report rather than the protein, since this may depend on the version of BLAST you are using.

If you used split(/\s+/, $header) it wouldn't matter how many spaces there are.
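A minimal sketch of the whitespace point above. The header string is made up for illustration (it is not Anders's data) and has two spaces between some fields. split / / treats every single space as a separator, so each extra space yields an empty field and shifts every later index, while split /\s+/ treats any run of whitespace as one separator. Perl's special split ' ' form additionally skips leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical description line with two spaces between some fields.
my $header = 'gene1  chrX  3 exons';

my @by_single = split / /,   $header;   # splits on each single space
my @by_ws     = split /\s+/, $header;   # splits on any run of whitespace

print scalar(@by_single), " fields: ", join('|', @by_single), "\n";  # 6 fields: gene1||chrX||3|exons
print scalar(@by_ws),     " fields: ", join('|', @by_ws),     "\n";  # 4 fields: gene1|chrX|3|exons

# Leading whitespace: /\s+/ still yields an empty first field,
# while the special pattern ' ' (awk-style) skips it.
my $padded   = '  gene1 chrX';
my @ws_lead  = split /\s+/, $padded;    # ('', 'gene1', 'chrX')
my @awk_lead = split ' ',   $padded;    # ('gene1', 'chrX')
print scalar(@ws_lead), " vs ", scalar(@awk_lead), "\n";            # 3 vs 2
```

Neither form helps when a space ends up inside a single field (for example in the middle of a comma-separated coordinate list): that field is still split in two, so it would have to be rejoined or have its spaces stripped separately.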
On Feb 16, 2006, at 4:24 AM, Anders Stegmann wrote:

> Hi!
>
> I am blasting a protein seq against an identical protein.
> I am trying to parse the protein header by using the query_description method in the SearchIO module.
> After using the query_description method I use split / / in order to easily access the different header components.
> Here I discover that the query_description method is somehow introducing a space between the fifth comma and the following chromosome position number in the exon chromosome position list!?
> This truncates the list of exon chromosome positions from 7 to 4, later yielding a wrong number of introns counted.
>
> Is this a bug?
>
> Attached is:
>
> testblast1.pl: the BLAST program to run.
>
> Q0045: the seq that is used as both query and database seq.
> (Q0045 has to be formatted in order to be used as a database: formatdb -i Q0045 -p T -o F)
>
> Regards, Anders.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From cjfields at uiuc.edu  Thu Feb 16 10:50:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 09:50:04 -0600
Subject: [Bioperl-l] additional error message
In-Reply-To: <20060216100410.54a1a6d5@dogwood.plantbio.uga.edu>
Message-ID: <002901c63310$a7da1b20$15327e82@pyrimidine>

I don't think the apache error is related to the main issue here, but you could always try upgrading LWP to see if that fixes it. The second issue is a text-parsing problem in SearchIO specific to nucleotide BLAST output, which I'm looking into.

Jason has posted a bit on using XML.
Basically, do the following:

my $prog = 'blastn';
my $db = 'nr';
my $e_val=1e-10;
my $v = 1;
my @params=(-prog=>$prog,
            -data=>$db,
            -expect=>$e_val,
            -readmethod=>'xml');
my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
$factory->retrieve_parameter('FORMAT_TYPE', 'XML');

You'll also need to modify the following line:

my $filename = $result->query_name()."\.out";

because the XML tag for this feature is actually part of the RID for some reason, so you'll get a weird output file name. This is a problem with NCBI's XML output, not SearchIO XML parsing.

XML BLAST files can be really big (~5 MB and up, depending on how much information is returned), so it may take a little time to go through the data. Right now it is the only consistently reliable way the output can be parsed, as NCBI keeps changing the text output, sending us back into "SearchIO::blast hell," as J.S. puts it.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Thursday, February 16, 2006 9:04 AM
> To: Chris Fields; Pieter Monsieurs
> Cc: bioperl-l at lists.open-bio.org
> Subject: additional error message
>
> When I check my apache error_log, there is one line saying:
> "waiting...Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 137.,"
> I also see an error saying "MSG: no data for midline Features flanking this part of subject sequence:, " that is mentioned by Pieter.
> Chris, may I have your suggestion on changing it to XML parsing? I read Jason's comments/suggestions about it, but could not make it work.
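Chris's note above says that the output file name built from query_name() comes out odd with XML output, but it doesn't show a replacement line. One possible approach, sketched under the assumption that all you need is a filesystem-safe name: the helper safe_output_name below is hypothetical (it is not part of BioPerl). It collapses any run of characters other than letters, digits, '.', '_' and '-' into a single underscore, strips leading dots and underscores, and falls back to a name you supply if nothing usable is left.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: build a filesystem-safe
# output file name from whatever query_name() returns.
sub safe_output_name {
    my ($name, $fallback) = @_;
    $name = '' unless defined $name;
    (my $clean = $name) =~ s/[^A-Za-z0-9._-]+/_/g;  # collapse unsafe runs to '_'
    $clean =~ s/^[._]+//;                             # no leading dots or underscores
    $clean = $fallback if $clean eq '';               # nothing usable left
    return "$clean.out";
}

print safe_output_name('lcl|QUERY_1 new seq', 'query'), "\n";  # lcl_QUERY_1_new_seq.out
print safe_output_name('../etc', 'query'), "\n";               # etc.out
print safe_output_name(undef, 'query'), "\n";                  # query.out
```

In a loop like the ones in this thread, the fallback could be the input sequence's own ID, e.g. my $filename = safe_output_name($result->query_name(), $input->display_id); whether that naming suits your files is a judgment call.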
> Thanks
>
> Guojun
> Department of Plant Biology
> University of Georgia
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: Pieter Monsieurs [mailto:Pieter.Monsieurs at esat.kuleuven.be]
> Cc: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28
>
> > Yeah, looks like it broke text output nucleotide parsing with that. XML output parsing still works, though (as expected). I'll give it a look.
> >
> > Chris
> >
> > On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:
> >
> > > Hi,
> > >
> > > I have the same problem with the blast.pm file.
> > > The people at NCBI added some extra info to the BLAST output (see e.g. "Features flanking this part..." or "Features in this part ..."); an example is added below.
> > > The blast.pm module starts looking for the HSP alignment information, but it dies when it hits this Feature information.
> > >
> > > Pieter
> > >
> > > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group) chromosome 12, complete sequence
> > > Length=27492551
> > >
> > > Features flanking this part of subject sequence:
> > >   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> > >   2655 bp at 3' side: hypothetical protein
> > >
> > >  Score = 36.2 bits (18), Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  4         GTACTACTCTACTCTACT  21
> > >                  ||||||||||||||||||
> > > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >   2991 bp at 5' side: hypothetical protein
> > >   1131 bp at 3' side: hypothetical protein
> > >
> > >  Score = 36.2 bits (18), Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  2         ATGTACTACTCTACTCTA  19
> > >                  ||||||||||||||||||
> > > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> > >
> > >
> > > Features in this part of subject sequence:
> > >   DHHC zinc finger domain, putative
> > >
> > >  Score = 34.2 bits (17), Expect = 0.87
> > >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  5         TACTACTCTACTCTACT  21
> > >                  |||||||||||||||||
> > > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >   102 bp at 5' side: bZIP transcription factor, putative
> > >   3740 bp at 3' side: yeast dcp1, putative
> > >
> > >  Score = 32.2 bits (16), Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  7        CTACTCTACTCTACTC  22
> > >                 ||||||||||||||||
> > > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >   21 bp at 5' side: peptide transporter T17F3.11, putative
> > >   10230 bp at 3' side: transposon protein, putative, unclassified
> > >
> > >  Score = 32.2 bits (16), Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  7         CTACTCTACTCTACTC  22
> > >                  ||||||||||||||||
> > > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> > >
> > >
> > > Guojun Yang wrote:
> > >
> > >> Hi, Chris,
> > >> Finally the remoteblast test script works for the amino.fa query, but when I try a nucleic acid sequence (see below), an error occurs:
> > >> "
> > >> waiting........
> > >> ------------- EXCEPTION -------------
> > >> MSG: no data for midline Features flanking this part of subject sequence:
> > >> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > >> STACK toplevel remoteblast_test:40
> > >> "
> > >> The query sequence is:
> > >> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > >> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > >> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > >> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > >>
> > >> The script (basically the same as the remoteblast test; I only changed the database to 'nr', the program to 'blastn' and the filename to 'ost3'):
> > >> #!/usr/bin/perl
> > >>
> > >> use Bio::SeqIO;
> > >> use Bio::Seq;
> > >> use Bio::Tools::Run::RemoteBlast;
> > >> use Bio::SearchIO;
> > >> use strict;
> > >> my $prog='blastn';
> > >> my $db='nr';
> > >> my $e_val=1e-10;
> > >> my @params=( -prog=>$prog,
> > >>              -data=>$db,
> > >>              -expect=>$e_val,
> > >>              -readmethod=>'SearchIO');
> > >> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>
> > >> my $v = 1;
> > >>
> > >> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > >>
> > >> while (my $input = $str->next_seq()){
> > >> #Blast a sequence against a database:
> > >> #Alternatively, you could pass in a file with many
> > >> #sequences rather than loop through sequence one at a time
> > >> #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >> #and swap the two lines below for an example of that.
> > >> my $r = $factory->submit_blast($input);
> > >> #my $r = $factory->submit_blast('amino.fa');
> > >> print STDERR "waiting..." if( $v > 0 );
> > >> while ( my @rids = $factory->each_rid ) {
> > >> foreach my $rid ( @rids ) {
> > >> my $rc = $factory->retrieve_blast($rid);
> > >> if( !ref($rc) ) {
> > >> if( $rc < 0 ) {
> > >> $factory->remove_rid($rid);
> > >> }
> > >> print STDERR "." if ( $v > 0 );
> > >> sleep 5;
> > >> } else {
> > >> my $result = $rc->next_result();
> > >> #save the output
> > >> my $filename = $result->query_name()."\.out";
> > >> $factory->save_output($filename);
> > >> $factory->remove_rid($rid);
> > >> print "\nQuery Name: ", $result->query_name(), "\n";
> > >> while ( my $hit = $result->next_hit ) {
> > >> next unless ( $v > 0);
> > >> print "\thit name is ", $hit->name, "\n";
> > >> while( my $hsp = $hit->next_hsp ) {
> > >> print "\t\tscore is ", $hsp->score, "\n";
> > >> }
> > >> }
> > >> }
> > >> }
> > >> }
> > >> }
> > >>
> > >> Do you think there might still be something in the NCBI output format?
> > >>
> > >> Thank you,
> > >> Guojun
> > >>
> > >> Guojun Yang
> > >> Department of Plant Biology
> > >> University of Georgia
> > >> Tel: 706-542-1857
> > >> Fax: 706-542-1805
> > >> http://www.arches.uga.edu/~guojun
> > >>
> > >> ----- Original Message -----
> > >> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>
> > >>> Sorry, forgot to add that I didn't see the regex issue that you mentioned. It could be a perl-related issue. Try the fixes I mentioned and see what happens.
> > >>>
> > >>> Christopher Fields
> > >>> Postdoctoral Researcher - Switzer Lab
> > >>> Dept. of Biochemistry
> > >>> University of Illinois Urbana-Champaign
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>> Sent: Tuesday, February 14, 2006 12:36 PM
> > >>>> To: 'gyang at plantbio.uga.edu'
> > >>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>
> > >>>> It's a good habit to always add single quotes around words. The perl interpreter may think a single bare word is a subroutine or perlfunc called with no args, so it will try to find a subroutine named blastp(). My debugger actually gives the error that the bare word blastp may conflict with a future reserved word. Like you said, 'use strict' will point that out.
> > >>>>
> > >>>> As for the regex, it should match all the blast programs at NCBI (blastp, blastn, blastx, tblastn, tblastx) and is built in to make sure nothing else passes through.
> > >>>>
> > >>>> So, if you are using the script below, there are several errors. The bare words for $prog and $db need quotes, and the flags for your @params array don't have a dash before them.
I get this after adding quotes > > >>>> but before > > >>>> adding the dashes to @params: > > >>>> > > >>>>>> C:\Perl\Scripts>test_blast.pl > > >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > >>>>>> > > >>>> MSG: > > >>>> STACK: Error::throw > > >>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl- > > >>>> live/Bio/Root/Root.pm:328 > > >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > > >>>> C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > > >>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl > > >>>> \bioperl- > > >>>> live/Bio/Tools/Run/RemoteBlast.pm:256 > > >>>> STACK: C:\Perl\Scripts\test_blast.pl:15 > > >>>> ----------------------------------------------------------- > > >>>> > > >>>>>> The last line indicates a problem with this line: > > >>>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > >>>>>> Changing the @params to this: > > >>>>>> my @params=( -prog=>$prog, > > >>>>>> > > >>>> -data=>$db, > > >>>> -expect=>$e_val, > > >>>> -readmethod=>'SearchIO'); > > >>>> > > >>>>>> fixes it, and I get output as expected. > > >>>>>> Christopher Fields > > >>>>>> > > >>>> Postdoctoral Researcher - Switzer Lab > > >>>> Dept. of Biochemistry > > >>>> University of Illinois Urbana-Champaign > > >>>> > > >>>>>>>>> -----Original Message----- > > >>>>>>>>> > > >>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > >>>>> Sent: Tuesday, February 14, 2006 11:48 AM > > >>>>> To: Chris Fields; bioperl-l at lists.open-bio.org > > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > >>>>> > > >>>>> Hi, Chris, > > >>>>> When I tried with the perldoc script, It did not work either. > > >>>>> First it > > >>>>> says $prog can not be bare word if I "use strict". I added > > >>>>> quotes on the > > >>>>> words, then it says the value for $prog does not match expression > > >>>>> t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. 
The > > >>>>> > > >>>> script > > >>>> > > >>>>> is shown below. Why is the expression "t?blast[pnx]"? > > >>>>> > > >>>>> #!/usr/bin/perl > > >>>>> > > >>>>> use Bio::SeqIO; > > >>>>> use Bio::Seq; > > >>>>> use Bio::Tools::Run::RemoteBlast; > > >>>>> use Bio::SearchIO; > > >>>>> > > >>>>> > > >>>>> my $prog=blastp; > > >>>>> my $db=swissprot; > > >>>>> my $e_val=1e-10; > > >>>>> my @params=( prog=>$prog, > > >>>>> data=>$db, > > >>>>> expect=>$e_val, > > >>>>> readmethod=>'SearchIO'); > > >>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > >>>>> > > >>>>> my $v = 1; > > >>>>> > > >>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => > >>>>> > 'fasta' ); > > >>>>> > > >>>>> while (my $input = $str->next_seq()){ > > >>>>> #Blast a sequence against a database: > > >>>>> #Alternatively, you could pass in a file with many > > >>>>> #sequences rather than loop through sequence one at a time > > >>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > > >>>>> #and swap the two lines below for an example of that. > > >>>>> my $r = $factory->submit_blast($input); > > >>>>> #my $r = $factory->submit_blast('amino.fa'); > > >>>>> print STDERR "waiting..." if( $v > 0 ); > > >>>>> while ( my @rids = $factory->each_rid ) { > > >>>>> foreach my $rid ( @rids ) { > > >>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>> if( !ref($rc) ) { > > >>>>> if( $rc < 0 ) { > > >>>>> $factory->remove_rid($rid); > > >>>>> } > > >>>>> print STDERR "." 
if ( $v > 0 ); > > >>>>> sleep 5; > > >>>>> } else { > > >>>>> my $result = $rc->next_result(); > > >>>>> #save the output > > >>>>> my $filename = $result->query_name()."\.out"; > > >>>>> $factory->save_output($filename); > > >>>>> $factory->remove_rid($rid); > > >>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>>> while ( my $hit = $result->next_hit ) { > > >>>>> next unless ( $v > 0); > > >>>>> print "\thit name is ", $hit->name, "\n"; > > >>>>> while( my $hsp = $hit->next_hsp ) { > > >>>>> print "\t\tscore is ", $hsp->score, "\n"; > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> > > >>>>> Thank you for your help! > > >>>>> > > >>>>> > > >>>>> Guojun > > >>>>> Department of Plant Biology > > >>>>> University of Georgia > > >>>>> > > >>>>> ----- Original Message ----- > > >>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>> To: gyang at plantbio.uga.edu > > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>> > > >>>>> > > >>>>> > > >>>>>> Try two things: > > >>>>>> > > >>>>>>> 1) Use a much simpler script, like the one in 'perldoc > > >>>>>>> > > >>>>>> Bio::Tools::Run::RemoteBlast'. If this fixes it, there's > > >>>>>> something > > >>>>>> > > >>>>> wrong > > >>>>> > > >>>>>> with the logic in your subroutine: > > >>>>>> > > >>>>>>> my $v = 1; > > >>>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => > > >>>>>>> 'fasta' ); > > >>>>>>> while (my $input = $str->next_seq()){ > > >>>>>>> > > >>>>>> #Blast a sequence against a database: > > >>>>>> #Alternatively, you could pass in a file with many > > >>>>>> #sequences rather than loop through sequence one at a time > > >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > > >>>>>> #and swap the two lines below for an example of that. > > >>>>>> my $r = $factory->submit_blast($input); > > >>>>>> #my $r = $factory->submit_blast('amino.fa'); > > >>>>>> print STDERR "waiting..." 
if( $v > 0 ); > > >>>>>> while ( my @rids = $factory->each_rid ) { > > >>>>>> foreach my $rid ( @rids ) { > > >>>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>>> if( !ref($rc) ) { > > >>>>>> if( $rc < 0 ) { > > >>>>>> $factory->remove_rid($rid); > > >>>>>> } > > >>>>>> print STDERR "." if ( $v > 0 ); > > >>>>>> sleep 5; > > >>>>>> } else { > > >>>>>> my $result = $rc->next_result(); > > >>>>>> #save the output > > >>>>>> my $filename = $result->query_name()."\.out"; > > >>>>>> $factory->save_output($filename); > > >>>>>> $factory->remove_rid($rid); > > >>>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>>>> while ( my $hit = $result->next_hit ) { > > >>>>>> next unless ( $v > 0); > > >>>>>> print "\thit name is ", $hit->name, "\n"; > > >>>>>> while( my $hsp = $hit->next_hsp ) { > > >>>>>> print "\t\tscore is ", $hsp->score, "\n"; > > >>>>>> } > > >>>>>> } > > >>>>>> } > > >>>>>> } > > >>>>>> } > > >>>>>> } > > >>>>>> > > >>>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works. It > > >>>>>>> > > >>>> really > > >>>> > > >>>>>> shouldn't make that much of a difference, but I noticed that > > >>>>>> the CVS > > >>>>>> RemoteBlast (1.28) was changed in Dec 2005, after > > >>>>>> bioperl-1.5.1 was > > >>>>>> released; the Bugzilla version is based off CVS. > > >>>>>> > > >>>>>>> Christopher Fields > > >>>>>>> > > >>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>> Dept. 
of Biochemistry > > >>>>>> University of Illinois Urbana-Champaign > > >>>>>> > > >>>>>>>> -----Original Message----- > > >>>>>>>> > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>> Sent: Monday, February 13, 2006 3:00 PM > > >>>>>>> To: bioperl-l at lists.open-bio.org > > >>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>>> > > >>>>>>>>> Thanks, Chris, > > >>>>>>>>> > > >>>>>>> I installed version 1.5.1 and replaced the blast.pm file with > > >>>>>>> the > > >>>>>>> > > >>>> one > > >>>> > > >>>>> from > > >>>>> > > >>>>>>> your bug report. The running version is 1.5 when I use the > > >>>>>>> command > > >>>>>>> > > >>>> you > > >>>> > > >>>>>>> sent me. But when I tried the script, it doesn't change much. My > > >>>>>>> remoteblast code (portion) is here: > > >>>>>>> > > >>>>>>>>> sub search { > > >>>>>>>>> > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} > > >>>>>>> ="$ORGN"; > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} > > >>>>>>> =5000; > > >>>>>>> local > > >>>>>>> > > >>>>>>> > > >>>> $Bio::Tools::Run::RemoteBlast::HEADER > > >>>> {'COMPOSITION_BASED_STATISTICS'}= > > >>>> > > >>>>>>> 'no'; > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > >>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > >>>>>>> -id=>"query", > > >>>>>>> -desc=>"new seq"); > > >>>>>>> my $len=$query->length(); > > >>>>>>> @db=('nr','htgs','wgs'); > > >>>>>>> foreach my $db (@db) { > > >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' > > >>>>>>> =>'blastn', > > >>>>>>> '-data' =>"$db", > > >>>>>>> > > >>>>>>> > > >>> '-expect'=>"$E_value"); > > >>> > > >>>>>>>>>>> my $blast_report = $factory->submit_blast($query); > > >>>>>>>>>>> > > >>>>>>>>> my @rids = $factory->each_rid(); > > >>>>>>>>> 
> > >>>>>>> foreach my $rid ( @rids ) { > > >>>>>>> print STDERR "$rid\n"; > > >>>>>>> } > > >>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638) > > >>>>>>> print STDERR "waiting..."; > > >>>>>>> sleep 60; > > >>>>>>> > > >>>>>>>>> foreach my $rid ( @rids ) { > > >>>>>>>>> > > >>>>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>>>> while (!ref($rc) ) { > > >>>>>>> if( $rc < 0 ) { > > >>>>>>> # retrieve_blast returns -1 on error > > >>>>>>> $factory->remove_rid($rid); > > >>>>>>> print "Error!\n"; > > >>>>>>> send_error($email,$function,$seqname,$queryname[$ST]); > > >>>>>>> die "Can't retrieve $rid"; > > >>>>>>> } if ($rc==0) { # retrieve_blast returns 0 on 'job not > > >>>>>>> > > >>>> finished' > > >>>> > > >>>>>>> sleep 60; > > >>>>>>> $rc = $factory->retrieve_blast($rid); > > >>>>>>> } > > >>>>>>> } > > >>>>>>> if (ref($rc)) { > > >>>>>>> print STDERR "Done.\n"; > > >>>>>>> while( my $result = $rc->next_result) { > > >>>>>>> while( my $hit = $result->next_hit()) { > > >>>>>>> $hit_name=$hit->name; > > >>>>>>> $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > >>>>>>> $name=$1; > > >>>>>>> @left_plus_start=(); > > >>>>>>> @left_plus_end=(); > > >>>>>>> @left_minus_start=(); > > >>>>>>> @left_minus_end=(); > > >>>>>>> @right_plus_start=(); > > >>>>>>> @right_plus_end=(); > > >>>>>>> @right_minus_start=(); > > >>>>>>> @right_minus_end=(); > > >>>>>>> > > >>>>>>>>> if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > >>>>>>>>> > > >>>>>>> while( my $hsp = $hit->next_hsp()) { > > >>>>>>> ...... > > >>>>>>> > > >>>>>>>>> It was working quite well until around October last > > >>>>>>>>> year, but > > >>>>>>>>> > > >>>>> it has > > >>>>> > > >>>>>>> stopped since then. When a submission is sent via a webpage, > > >>>>>>> the CGI > > >>>>>>> starts to work and uses about 20 MB of memory.
Then it hangs there; > > >>>>>>> > > >>>>> finally > > >>>>> > > >>>>>>> the expected email is received but without real results > > >>>>>>> although it > > >>>>>>> > > >>>>> does > > >>>>> > > >>>>>>> contain something from other parts of the script. Apparently the > > >>>>>>> > > >>>>> search > > >>>>> > > >>>>>>> sub did not return anything (I know something should be > > >>>>>>> returned). Is it also possible the format of the NCBI output > > >>>>>>> for > > >>>>>>> > > >>>> each > > >>>> > > >>>>>>> result has changed? > > >>>>>>> Thank you, > > >>>>>>> Guojun > > >>>>>>> > > >>>>>>>>>>> Department of Plant Biology > > >>>>>>>>>>> > > >>>>>>> University of Georgia > > >>>>>>> > > >>>>>>>>>>>>> ----- Original Message ----- > > >>>>>>>>>>>>> > > >>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > >>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>>> > > >>>>>>>>>>>> How do you know two versions are installed (i.e. how are > > >>>>>>>>>>>> > > >>>> you > > >>>> > > >>>>> checking > > >>>>> > > >>>>>>> the > > >>>>>>> > > >>>>>>>> version)? Do you have two complete bioperl > > >>>>>>>> distributions (in > > >>>>>>>> > > >>>>> two > > >>>>> > > >>>>>>>> separate directories) or are you looking at individual modules? Here's > > >>>>>>>> the > > >>>>>>>> > > >>>> way > > >>>> > > >>>>> to > > >>>>> > > >>>>>>>> check the version (from the FAQ): > > >>>>>>>> > > >>>>>>>>> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > > >>>>> > > >>>>>>>>> If you have two full bioperl distributions on your computer, > > >>>>>>>>> > > >>>>> normally > > >>>>> > > >>>>>>> only > > >>>>>>> > > >>>>>>>> one will be in use unless you have explicitly set the > > >>>>>>>> environment > > >>>>>>>> > > >>>>>>> variable > > >>>>>>> > > >>>>>>>> PERL5LIB.
The PERL5LIB directories will be searched first > > >>>>>>>> before > > >>>>>>>> > > >>>>> your > > >>>>> > > >>>>>>>> normal perl directory list (@INC) is searched. You MAY get > > >>>>>>>> some > > >>>>>>>> > > >>>>> mixing > > >>>>> > > >>>>>>>> then, but only if perl can't find a particular module in the > > >>>>>>>> path > > >>>>>>>> > > >>>>>>> designated > > >>>>>>> > > >>>>>>>> in PERL5LIB; then it will progress through the directories > > >>>>>>>> listed > > >>>>>>>> > > >>>> in > > >>>> > > >>>>>>> @INC. > > >>>>>>> > > >>>>>>>> This may happen if a module is unique to a particular > > >>>>>>>> release, but > > >>>>>>>> > > >>>>>>> shouldn't > > >>>>>>> > > >>>>>>>> happen for the majority of modules, including RemoteBlast. You > > >>>>>>>> > > >>>> can > > >>>> > > >>>>>>> check > > >>>>>>> > > >>>>>>>> what @INC and PERL5LIB are set to by using 'perl -V'. @INC > > >>>>>>>> will > > >>>>>>>> > > >>>>> differ > > >>>>> > > >>>>>>>> depending on your OS, perl build, etc. > > >>>>>>>> > > >>>>>>>>> Regardless, if you follow the directions for installing > > >>>>>>>>> bioperl > > >>>>>>>>> > > >>>>> for > > >>>>> > > >>>>>>> your > > >>>>>>> > > >>>>>>>> system ('perl Makefile.PL', 'make', 'make test', 'make > > >>>>>>>> install', > > >>>>>>>> > > >>>>> unless > > >>>>> > > >>>>>>> you > > >>>>>>> > > >>>>>>>> explicitly change the installation directory when using 'perl > > >>>>>>>> > > >>>>>>> Makefile.PL'), > > >>>>>>> > > >>>>>>>> then 'uninstalling' Bioperl shouldn't be a problem as it will > > >>>>>>>> > > >>>>> install > > >>>>> > > >>>>>>> the > > >>>>>>> > > >>>>>>>> Bioperl distribution you downloaded over the old version in > > >>>>>>>> @INC. > > >>>>>>>> > > >>>>> See > > >>>>> > > >>>>>>> this > > >>>>>>> > > >>>>>>>> page: > > >>>>>>>> > > >>>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > >>>>>>>>> for more details. 
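A minimal sketch of the @INC/PERL5LIB check described above, using only core Perl (File::Spec). `find_module_copies` is a hypothetical helper, not a BioPerl function. It walks @INC, which already has any PERL5LIB entries at the front, and lists every copy of a module file it finds. On a machine with both bioperl-1.4 and 1.5.x installed, this should show more than one path for modules such as Bio::Tools::Run::RemoteBlast. Because `use` stops at the first hit, the first path listed is the one that gets loaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return every file on @INC
# that provides the given module, in the order perl searches @INC.
# PERL5LIB entries are already at the front of @INC.
sub find_module_copies {
    my ($module) = @_;
    # e.g. Bio::Tools::Run::RemoteBlast -> Bio/Tools/Run/RemoteBlast.pm
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks some setups add to @INC
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

my $module = @ARGV ? $ARGV[0] : 'Bio::Tools::Run::RemoteBlast';
print 'PERL5LIB: ', (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";

my @copies = find_module_copies($module);
if (@copies) {
    print "$module found in:\n";
    print "  $_\n" for @copies;    # the first path is the one 'use' loads
    print "Warning: more than one copy on \@INC; check for mixed installs\n"
        if @copies > 1;
}
else {
    print "$module not found on \@INC\n";
}
```

Pass a different module name as the first argument (for example Bio::SearchIO::blast) to check that module instead.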
> > >>>>>>>>> Christopher Fields > > >>>>>>>>> > > >>>>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>>>> Dept. of Biochemistry > > >>>>>>>> University of Illinois Urbana-Champaign > > >>>>>>>> > > >>>>>>>>>>> -----Original Message----- > > >>>>>>>>>>> > > >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM > > >>>>>>>>> To: bioperl-l at lists.open-bio.org > > >>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>>>>> > > >>>>>>>>>>> Hi, Chris, > > >>>>>>>>>>> > > >>>>>>>>> I do have different versions of bioperl on my Linux machine > > >>>>>>>>> > > >>>> (1.4. > > >>>> > > >>>>> and > > >>>>> > > >>>>>>>>> 1.5.0), this may be the problem. Should I just install > > >>>>>>>>> bioperl- > > >>>>>>>>> > > >>>>> 1.5.1 > > >>>>> > > >>>>>>> or I > > >>>>>>> > > >>>>>>>>> need to uninstall and remove the previous versions. I could > > >>>>>>>>> not > > >>>>>>>>> > > >>>>> find > > >>>>> > > >>>>>>> any > > >>>>>>> > > >>>>>>>>> hint on uninstalling bioperl on linux. Could you please > > >>>>>>>>> give me > > >>>>>>>>> > > >>>>> some > > >>>>> > > >>>>>>>>> suggestion? > > >>>>>>>>> Thanks, > > >>>>>>>>> Guojun > > >>>>>>>>> > > >>>>>>>>>>> Department of Plant Biology > > >>>>>>>>>>> > > >>>>>>>>> University of Georgia > > >>>>>>>>> _____ > > >>>>>>>>> > > >>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>>> > > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > >>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding > > >>>>>>>>> RemoteBlast.pm > > >>>>>>>>> > > >>>>>>> version > > >>>>>>> > > >>>>>>>>> 1.28 > > >>>>>>>>> > > >>>>>>>>>>>>>>> If you're using RemoteBlast 1.28, then you've likely > > >>>>>>>>>>>>>>> > > >>>>>>> updated from CVS > > >>>>>>> > > >>>>>>>>> which isn't the latest fix. 
> > >>>>>>>>> > > >>>>>>>>>>> Make sure that you check the following: > > >>>>>>>>>>> 1) Always post to the mailing list: > > >>>>>>>>>>> > > >>>>>>>>> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > > >>>>>>>>> > > >>>>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live > > >>>>>>>>>>> > > >>>>> (CVS) > > >>>>> > > >>>>>>>>> installed first. Perform a clean installation; do not upgrade > > >>>>>>>>> > > >>>>> only > > >>>>> > > >>>>>>>>> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we > > >>>>>>>>> > > >>>> can't > > >>>> > > >>>>>>>>> guarantee that mixing modules from old and new distributions > > >>>>>>>>> > > >>>> (1.4 > > >>>> > > >>>>> and > > >>>>> > > >>>>>>>>> 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live > > >>>>>>>>> installation will allow text output from BLAST v2.2.12 to be > > >>>>>>>>> > > >>>>> saved > > >>>>> > > >>>>>>> and > > >>>>>>> > > >>>>>>>>> parsed; it will not parse the newest BLAST text output from > > >>>>>>>>> NCBI > > >>>>>>>>> > > >>>>>>> (v2.2.13) > > >>>>>>> > > >>>>>>>>> but it should still save it. I believe as long as > > >>>>>>>>> next_result() > > >>>>>>>>> > > >>>>> isn't > > >>>>> > > >>>>>>>>> called, it will work. > > >>>>>>>>> > > >>>>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST > > >>>>>>>>>>> > > >>>> 2.2.13 > > >>>> > > >>>>>>> text output > > >>>>>>> > > >>>>>>>>> are NOT in CVS; they haven't been cleared and checked in by > > >>>>>>>>> > > >>>> Roger Hall > > >>>>>>>>> (who's now taking care of RemoteBlast) and the powers that be > > >>>>>>>>> > > >>>>> (Jason > > >>>>> > > >>>>>>> or > > >>>>>>> > > >>>>>>>>> whoever is in charge of Bio::SearchIO).
They can be found in > > >>>>>>>>> > > >>>>>>> Bugzilla: > > >>>>>>> > > >>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > >>>>>>>>>>> > > >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > >>>>>>>>> > > >>>>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the > > >>>>>>>>>>> > > >>>>> option > > >>>>> > > >>>>>>> of > > >>>>>>> > > >>>>>>>>> saving XML output, so it isn't necessary if you don't plan on > > >>>>>>>>> using > > >>>>>>>>> > > >>>>> this > > >>>>> > > >>>>>>>>> option. And, remember, they haven't been committed yet to > > >>>>>>>>> CVS, > > >>>>>>>>> > > >>>>> which > > >>>>> > > >>>>>>>>> means that the final version will change to reflect the new > > >>>>>>>>> > > >>>> version. > > >>>> > > >>>>>>>>>>>>> Christopher Fields > > >>>>>>>>>>>>> > > >>>>>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>>>>> Dept. of Biochemistry > > >>>>>>>>> University of Illinois Urbana-Champaign > > >>>>>>>>> > > >>>>>>>>>>>>> _____ > > >>>>>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > >>>>>>>>>>>>> > > >>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM > > >>>>>>>>> To: Chris Fields > > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding > > >>>>>>>>> RemoteBlast.pm > > >>>>>>>>> > > >>>>>>> version > > >>>>>>> > > >>>>>>>>> 1.28 > > >>>>>>>>> > > >>>>>>>>>>>>> Hi, Chris > > >>>>>>>>>>>>> > > >>>>>>>>>>> Thanks for your suggestion; however, it doesn't seem to work > > >>>>>>>>>>> > > >>>>> for > > >>>>> > > >>>>>>> my cgi > > >>>>>>> > > >>>>>>>>> even after I replace both blast.pm and RemoteBlast.pm. I > > >>>>>>>>> didn't > > >>>>>>>>> > > >>>>> even > > >>>>> > > >>>>>>> get > > >>>>>>> > > >>>>>>>>> any RID. Is there any suggestion?
> > >>>>>>>>> > > >>>>>>>>>>>>>>> Guojun > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>> Guojun Yang > > >>>>>>>>>>>>> > > >>>>>>>>> Department of Plant Biology > > >>>>>>>>> University of Georgia > > >>>>>>>>> Tel: 706-542-1857 > > >>>>>>>>> Fax: 706-542-1805 > > >>>>>>>>> http://www.arches.uga.edu/~guojun > > >>>>>>>>> _____ > > >>>>>>>>> > > >>>>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>>>>> > > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > >>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding > > >>>>>>>>> RemoteBlast.pm > > >>>>>>>>> > > >>>>>>> version > > >>>>>>> > > >>>>>>>>> 1.28 > > >>>>>>>>> > > >>>>>>>>>>> I would say give the new code a try, but realize that it > > >>>>>>>>>>> > > >>>>> hasn't > > >>>>> > > >>>>>>> been > > >>>>>>> > > >>>>>>>>> checked > > >>>>>>>>> in (like I said below). I will try going over the modified > > >>>>>>>>> Bio::SearchIO::blast again this weekend to see if there is > > >>>>>>>>> > > >>>>> anything I > > >>>>> > > >>>>>>>>> might > > >>>>>>>>> have missed. The changed order in the header of BLAST text > > >>>>>>>>> > > >>>> output > > >>>> > > >>>>> has > > >>>>> > > >>>>>>> me a > > >>>>>>> > > >>>>>>>>> bit worried that it might not catch everything, but it at > > >>>>>>>>> least > > >>>>>>>>> > > >>>>>>> doesn't > > >>>>>>> > > >>>>>>>>> hang > > >>>>>>>>> in the while() loop I described in the bug report below (bug > > >>>>>>>>> > > >>>>> #1934) > > >>>>> > > >>>>>>> and > > >>>>>>> > > >>>>>>>>> seems to process everything fine. > > >>>>>>>>> > > >>>>>>>>>>> If you want more stability in the code, you might consider > > >>>>>>>>>>> > > >>>>>>> changing over > > >>>>>>> > > >>>>>>>>> to > > >>>>>>>>> XML output and parsing with Bio::SearchIO::blastxml. 
There are > > >>>>>>>>> > > >>>>> some > > >>>>> > > >>>>>>>>> changes > > >>>>>>>>> in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > > >>>>>>>>> > > >>>>> saving > > >>>>> > > >>>>>>> XML > > >>>>>>> > > >>>>>>>>> output, but I believe it parses everything regardless. If you > > >>>>>>>>> > > >>>> look > > >>>> > > >>>>>>> back over > > >>>>>>> > > >>>>>>>>> the > > >>>>>>>>> last month or so, you will find a bit of discussion here about > > >>>>>>>>> > > >>>> it. > > >>>> > > >>>>>>> Jason > > >>>>>>> > > >>>>>>>>> describes how to set up RemoteBlast for XML: > > >>>>>>>>> > > >>>>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ > > >>>>>>> > > >>>>>>>>>>> Christopher Fields > > >>>>>>>>>>> > > >>>>>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>>>>> Dept. of Biochemistry > > >>>>>>>>> University of Illinois Urbana-Champaign > > >>>>>>>>> > > >>>>>>>>>>>> -----Original Message----- > > >>>>>>>>>>>> > > >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM > > >>>>>>>>>> To: bioperl-l at bioperl.org > > >>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm > > >>>>>>>>>> > > >>>>> version > > >>>>> > > >>>>>>> 1.28 > > >>>>>>> > > >>>>>>>>>> Hi, Everybody, > > >>>>>>>>>> I see this post and am wondering if this is the reason for > > >>>>>>>>>> the > > >>>>>>>>>> malfunctioning of my webserver. We set up a webserver named > > >>>>>>>>>> > > >>>>> MAK, > > >>>>> > > >>>>>>> for > > >>>>>>> > > >>>>>>>>> MITE > > >>>>>>>>> > > >>>>>>>>>> sequence analysis.
It was working very well until around > > >>>>>>>>>> > > >>>>> November > > >>>>> > > >>>>>>> 2005, > > >>>>>>> > > >>>>>>>>>> when it stopped returning any results (the site is fine and > > >>>>>>>>>> > > >>>> seems > > >>>> > > >>>>> to > > >>>>> > > >>>>>>> be > > >>>>>>> > > >>>>>>>>>> doing something after submission). In the CGI script, I used > > >>>>>>>>>> > > >>>>> RemoteBlast > > >>>>> > > >>>>>>> (that > > >>>>>>> > > >>>>>>>>>> work was done in 2003) to do searches. I currently do not > > >>>>>>>>>> have > > >>>>>>>>>> > > >>>>>>> access to > > >>>>>>> > > >>>>>>>>>> the server because I moved. Quite a few people have sent emails > > >>>>>>>>>> > > >>>> to > > >>>> > > >>>>> us > > >>>>> > > >>>>>>> about > > >>>>>>> > > >>>>>>>>>> its malfunctioning. Is there any suggestion on fixing the > > >>>>>>>>>> > > >>>>> problem? > > >>>>> > > >>>>>>>>> Should > > >>>>>>>>> > > >>>>>>>>>> I simply ask that RemoteBlast.pm be replaced with the new > > >>>>>>>>>> > > >>>>> version? > > >>>>> > > >>>>>>>>>> Thanks a lot, > > >>>>>>>>>> Guojun > > >>>>>>>>>> > > >>>>>>>>>> Department of Plant Biology > > >>>>>>>>>> University of Georgia > > >>>>>>>>>> Tel: 706-542-1857 > > >>>>>>>>>> Fax: 706-542-1805 > > >>>>>>>>>> http://www.arches.uga.edu/~guojun > > >>>>>>>>>> _____ > > >>>>>>>>>> > > >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' > > >>>>>>>>>> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > > >>>>>>>>>> [mailto:bioperl-l at bioperl.org] > > >>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > >>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > >>>>>>>>>> > > >>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl-live > > >>>>>>>>>> > > >>>>> CVS. > > >>>>> > > >>>>>>> It > > >>>>>>> > > >>>>>>>>>> will > > >>>>>>>>>> work for saving text output.
However, it will not parse > > >>>>>>>>>> > > >>>> anything > > >>>> > > >>>>>>> using > > >>>>>>> > > >>>>>>>>>> next_result (it will likely hang) and will not save XML > > >>>>>>>>>> > > >>>> format. > > >>>> > > >>>>> See > > >>>>> > > >>>>>>>>> these > > >>>>>>>>> > > >>>>>>>>>> bugs: > > >>>>>>>>>> > > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > >>>>>>>>>> > > >>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast > > >>>>>>>>>> > > >>>> and > > >>>> > > >>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been > > >>>>>>>>>> checked in > > >>>>>>>>>> > > >>>>> yet > > >>>>> > > >>>>>>> so > > >>>>>>> > > >>>>>>>>> are > > >>>>>>>>> > > >>>>>>>>>> still not included in bioperl-live; they may be further > > >>>>>>>>>> > > >>>> modified > > >>>> > > >>>>>>> before > > >>>>>>> > > >>>>>>>>>> committing to CVS. If you're not worried about XML, you could > > >>>>>>>>>> > > >>>>> just > > >>>>> > > >>>>>>> try > > >>>>>>> > > >>>>>>>>> the > > >>>>>>>>> > > >>>>>>>>>> first fix, which is a change to SearchIO::blast. > > >>>>>>>>>> > > >>>>>>>>>> Nagesh, I remember you posting to the list a month ago > > >>>>>>>>>> using a > > >>>>>>>>>> > > >>>>>>> script > > >>>>>>> > > >>>>>>>>>> which > > >>>>>>>>>> had problems; the script you used saves the output but > > >>>>>>>>>> doesn't > > >>>>>>>>>> > > >>>>>>> actually > > >>>>>>> > > >>>>>>>>>> parse it (i.e. you don't use next_result() to go through the > > >>>>>>>>>> > > >>>>> data). > > >>>>> > > >>>>>>> Is > > >>>>>>> > > >>>>>>>>> the > > >>>>>>>>> > > >>>>>>>>>> version of BLAST in your text output 2.2.12 or 2.2.13? 
Have > > >>>>>>>>>> > > >>>> you > > >>>> > > >>>>>>> tried > > >>>>>>> > > >>>>>>>>>> parsing the output using "-readmethod => SearchIO" or "- > > >>>>>>>>>> > > >>>>> readmethod > > >>>>> > > >>>>>>> => > > >>>>>>> > > >>>>>>>>>> blast" > > >>>>>>>>>> using your version of RemoteBlast and method next_result()? > > >>>>>>>>>> > > >>>> Like > > >>>> > > >>>>>>> below > > >>>>>>> > > >>>>>>>>>> (from > > >>>>>>>>>> perldoc): > > >>>>>>>>>> > > >>>>>>>>>> while ( my @rids = $factory->each_rid ) { > > >>>>>>>>>> foreach my $rid ( @rids ) { > > >>>>>>>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>>>>>>> if( !ref($rc) ) { > > >>>>>>>>>> if( $rc < 0 ) { > > >>>>>>>>>> $factory->remove_rid($rid); > > >>>>>>>>>> } > > >>>>>>>>>> print STDERR "." if ( $v > 0 ); > > >>>>>>>>>> sleep 5; > > >>>>>>>>>> } else { # parsing > > >>>>>>>>>> starts here > > >>>>>>>>>> my $result = $rc->next_result(); # it should hang > > >>>>>>>>>> here > > >>>>>>>>>> #save the output > > >>>>>>>>>> my $filename = $result->query_name()."\.out"; > > >>>>>>>>>> $factory->save_output($filename); > > >>>>>>>>>> $factory->remove_rid($rid); > > >>>>>>>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>>>>>>>> while ( my $hit = $result->next_hit ) { > > >>>>>>>>>> next unless ( $v > 0); > > >>>>>>>>>> print "\thit name is ", $hit->name, "\n"; > > >>>>>>>>>> while( my $hsp = $hit->next_hsp ) { > > >>>>>>>>>> print "\t\tscore is ", $hsp->score, "\n"; > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> My script hanged if I used next_result() in any way prior to > > >>>>>>>>>> > > >>>> the > > >>>> > > >>>>>>> fixes. > > >>>>>>> > > >>>>>>>>> I > > >>>>>>>>> > > >>>>>>>>>> want to see how many others are having the same issues with > > >>>>>>>>>> > > >>>>> parsing > > >>>>> > > >>>>>>>>> using > > >>>>>>>>> > > >>>>>>>>>> the CVS version of bioperl-live. 
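The polling loops quoted in this thread keep retrying until a result arrives, so a job that never finishes keeps a CGI process busy indefinitely. Below is a hedged sketch of a bounded loop in core Perl. `poll_until_done` is a hypothetical helper, not a RemoteBlast method. The only thing it assumes is the return convention stated in the thread for retrieve_blast(): a negative number on error, 0 while the job is still running, and an object once results are ready. A time limit stops the polling, but it will not help if the hang happens inside next_result() itself, as Chris suspects. The demo uses a stand-in retriever so it runs without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): poll a retriever that follows
# the convention described in this thread for retrieve_blast():
#   negative number => error, 0 => job not finished, reference => result.
# Gives up after max_wait seconds instead of looping forever.
sub poll_until_done {
    my (%args) = @_;
    my $retrieve = $args{retrieve};           # code ref returning -1 / 0 / ref
    my $interval = $args{interval} || 5;      # seconds between polls
    my $max_wait = $args{max_wait} || 300;    # overall time limit in seconds
    my $sleeper  = $args{sleep} || sub { sleep $_[0] };

    my $waited = 0;
    while (1) {
        my $rc = $retrieve->();
        return ($rc, undef) if ref $rc;                     # finished
        return (undef, "error code $rc") if $rc < 0;        # server-side error
        return (undef, "timed out after ${waited}s") if $waited >= $max_wait;
        $sleeper->($interval);
        $waited += $interval;
    }
}

# Demo with a stand-in retriever: "not finished" twice, then a result.
my @answers = (0, 0, { query => 'ROA1_HUMAN' });
my ($result, $err) = poll_until_done(
    retrieve => sub { shift @answers },
    interval => 1,
    max_wait => 10,
    sleep    => sub { },    # skip real sleeping in this demo
);
print defined $result ? "got result for $result->{query}\n" : "failed: $err\n";
```

With a real factory, the retriever would presumably be `sub { $factory->retrieve_blast($rid) }`, followed by `remove_rid($rid)` as in the perldoc example. Treat that wiring as an untested assumption.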
> > >>>>>>>>>> > > >>>>>>>>>> Christopher Fields > > >>>>>>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>>>>>> Dept. of Biochemistry > > >>>>>>>>>> University of Illinois Urbana-Champaign > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>>> -----Original Message----- > > >>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > >>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM > > >>>>>>>>>>> To: Huang Jian; bioperl-l > > >>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > >>>>>>>>>>> > > >>>>>>>>>>> Hi Huang, > > >>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm works on the > > >>>>>>>>>>> logic of checking the temporary file size to determine whether the Blast > > >>>>>>>>>>> results are ready. This condition is not being satisfied, possibly due to > > >>>>>>>>>>> some changes brought about by NCBI. I had this problem recently and > > >>>>>>>>>>> figured out that the solution was to use the latest version, which has > > >>>>>>>>>>> this problem fixed (it no longer uses the file-size logic) but is not > > >>>>>>>>>>> yet included in the BioPerl package.
> > >>>>>>>>>>> Cheers > > >>>>>>>>>>> Nagesh > > >>>>>>>>>>> > > >>>>>>>>>>> Huang Jian wrote: > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>>> Dear Nagesh, > > >>>>>>>>>>>> > > >>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 > > >>>>>>>>>>>> > > >>>>> you > > >>>>> > > >>>>>>> sent > > >>>>>>> > > >>>>>>>>>>>> me. Now it works perfectly!!! > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thank you!! > > >>>>>>>>>>>> > > >>>>>>>>>>>> Huang > > >>>>>>>>>>>> > > >>>>>>>>>>>> ----- Original Message ----- From: "Nagesh Chakka" > > >>>>>>>>>>>> > > >>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l" > > >>>>>>>>>>>> > > >>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM > > >>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the > > >>>>>>>>>>>> > > >>>> net, > > >>>> > > >>>>> so > > >>>>> > > >>>>>>> still > > >>>>>>> > > >>>>>>>>>>>> via email > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>>> Hi Huang, > > >>>>>>>>>>>>> I see that you are submitting a sequence for a remote > > >>>>>>>>>>>>> > > >>>> blast > > >>>> > > >>>>>>> search. > > >>>>>>> > > >>>>>>>>>> Can > > >>>>>>>>>> > > >>>>>>>>>>>>> you check if the RemoteBlast.pm being used is v 1.28 > > >>>>>>>>>>>>> > > >>>>>>> (2005/12/09). > > >>>>>>> > > >>>>>>>>> If > > >>>>>>>>> > > >>>>>>>>>>>>> not, I have attached it to this email; use it to replace the old > > >>>>>>>>>>>>> one, which has a bug. > > >>>>>>>>>>>>> Let me know if it works.
> > >>>>>>>>>>>>> Nagesh > > > > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm > > > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr.
Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign From Marc.Logghe at DEVGEN.com Thu Feb 16 10:47:13 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Thu, 16 Feb 2006 16:47:13 +0100 Subject: [Bioperl-l] Primer maps? Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com> Hi Mike, Another route you might take is mapping your primers into Bio::SeqFeature::Generic objects and adding them to the seq object. Then you dump the object into a rich sequence format like GenBank and pass that to EMBOSS's showseq application. Or you might do it completely with showseq. Here the only thing you need is an annotation file containing the positions of the primers, followed by any text (e.g. the primer name). Then you do: showseq -translate - -format 4 -annotation Have a look at http://emboss.sourceforge.net/apps/showseq.html for more options. HTH, Marc Marc Logghe, PhD Expert Scientist Bioinformatics deVGen NV Technologiepark 30 B - 9052 Ghent-Zwijnaarde Tel. +32 9 324 24 83 Fax. +32 9 324 24 25 Web: www.devgen.com --- Disclaimer start --- This e-mail and any attachments thereto may contain information which is confidential and/or which is proprietary to the sender. Accordingly, this e-mail and any attachments thereto, as well as any and all information contained therein, are intended for the sole use of the recipient or recipients designated above. Any use of this e-mail, of any attachments thereto, of any and all information contained therein, and/or of any part(s) thereof (including, without limitation, total or partial reproduction, communication and/or distribution in any form) by persons other than the designated recipient(s) is prohibited.
If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you for your cooperation. --- Disclaimer end --- ________________________________ From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne Sent: Wednesday, February 15, 2006 10:20 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Primer maps? Hello all -- I'm having a devil of a time figuring out how to make restriction maps using BioPerl. What I'm going for is output similar to GCG's map program, but instead of using a set of defined restriction enzymes, I'd like to use a set of primers, to create a primer map rather than a restriction map. I do not need a table of restriction enzymes that cut or don't cut (or primers that match or don't match, in this case), but an honest-to-goodness map, something like: FKP-5-> | CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT 1921 ---------+---------+---------+---------+---------+---------+ 1980 GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA a M E I V S T F D E L Q D Y - I also need translations of orfs, but I can use GenBank files as input to the program and thus the CDS translations are already there, so I'm guessing that shouldn't be too hard.... How does one create such a map using the BioPerl modules? There are intriguing indications out there that such a thing is possible (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I can't find a single example of code that creates such a basic, bread-and-butter thing as a restriction map with orf translations. The documentation to these modules is fairly useless to me, consisting mostly of internal methods and function prototypes. Perhaps my skills as a Perl programmer are to blame, but a clear example of how a map like this is constructed would be a big help. 
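Whatever ends up drawing the map (showseq, tacg, or custom code), the core step is locating each primer on both strands. A minimal pure-Perl sketch of that step follows; it is not a BioPerl API. The names primer_sites, primer_regex and revcomp are hypothetical, degenerate IUPAC codes are expanded into regex character classes, and coordinates are 1-based and inclusive. The resulting triples could then be turned into Bio::SeqFeature::Generic objects as Marc suggests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to regex character classes, so degenerate
# primers can be searched directly.
my %IUPAC = (
    A => 'A',    C => 'C',    G => 'G',    T => 'T',    U => 'T',
    R => '[AG]', Y => '[CT]', S => '[CG]', W => '[AT]',
    K => '[GT]', M => '[AC]', B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Reverse complement that also complements ambiguity codes
# (S, W and N are their own complements, so tr leaves them alone).
sub revcomp {
    my $rc = reverse uc shift;
    $rc =~ tr/ACGTUMRYKVHDBN/TGCAAKYRMBDHVN/;
    return $rc;
}

# Compile a primer into a zero-width lookahead so overlapping sites are found.
sub primer_regex {
    my ($primer) = @_;
    my $body = '';
    for my $base (split //, uc $primer) {
        my $class = $IUPAC{$base};
        die "Unknown base '$base' in primer $primer\n" unless defined $class;
        $body .= $class;
    }
    return qr/(?=$body)/;
}

# Return [start, end, strand] triples (1-based, inclusive) for every site of
# $primer on either strand of $seq, sorted by start (forward strand first).
sub primer_sites {
    my ($seq, $primer) = @_;
    $seq = uc $seq;
    my $len = length $primer;
    my @hits;
    for my $strand (1, -1) {
        my $re = primer_regex($strand == 1 ? $primer : revcomp($primer));
        while ($seq =~ /$re/g) {
            my $start = $-[0] + 1;    # $-[0] is the 0-based match offset
            push @hits, [$start, $start + $len - 1, $strand];
        }
    }
    return sort { $a->[0] <=> $b->[0] || $b->[2] <=> $a->[2] } @hits;
}

# Usage with the 60-bp block from the example map above; primer names are made up.
my $block   = 'CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT';
my %primers = (fwd_demo => 'ATGGAAATAG', rev_demo => 'AATTCATCRA');
for my $name (sort keys %primers) {
    for my $hit (primer_sites($block, $primers{$name})) {
        printf "%-8s %3d..%-3d strand %+d\n", $name, @$hit;
    }
}
```

With these two made-up primers the block should give fwd_demo at 22..31 on the forward strand and rev_demo at 41..50 on the reverse strand (relative to the block, not to position 1921). The lookahead is what lets overlapping sites in runs such as AAAA all be reported.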
Right now, I'm generating primer maps with system calls to EMBOSS's remap, pointing it at a file of primer sequences rather than a file of restriction enzyme sequences, but the results are less than desired. I'm considering trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my needs, but this seems like a lot of work for an operation I suspect is possible in BioPerl. Any help greatly appreciated... Mike --------------------------------------------------------------------- //=\ Michael J. Coyne phone: (617) 525-7820 \=// Channing Laboratory FAX: (617) 264-5193 //=\ EBRC, Room 617 \=// 221 Longwood Avenue email:mcoyne at channing.harvard.edu //=\ Boston, MA 02115 mjcoyne at comcast.net \=// --------------------------------------------------------------------- From sdavis2 at mail.nih.gov Thu Feb 16 09:43:45 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 16 Feb 2006 09:43:45 -0500 Subject: [Bioperl-l] Primer maps? In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost> Message-ID: Do you mean that you want to use Bio::Graphics to make a picture, or just map your primers onto a sequence? Sean On 2/15/06 4:20 PM, "Michael Coyne" wrote: > Hello all -- > > I'm having a devil of a time figuring out how to make restriction maps using > BioPerl. What I'm going for is output similar to GCG's map program, but > instead of using a set of defined restriction enzymes, I'd like to use a set > of primers, to create a primer map rather than a restriction map. 
I do not > need a table of restriction enzymes that cut or don't cut (or primers that > match or don't match, in this case), but an honest-to-goodness map, something > like: > > FKP-5-> > | > CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT > 1921 ---------+---------+---------+---------+---------+---------+ 1980 > GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA > > a M E I V S T F D E L Q D Y - > > I also need translations of orfs, but I can use GenBank files as input to the > program and thus the CDS translations are already there, so I'm guessing that > shouldn't be too hard.... How does one create such a map using the BioPerl > modules? > > There are intriguing indications out there that such a thing is possible (e.g. > the Bio::Map:: * and Bio::Restriction:: * modules), but I can't find a single > example of code that creates such a basic, bread-and-butter thing as a > restriction map with orf translations. The documentation to these modules is > fairly useless to me, consisting mostly of internal methods and function > prototypes. Perhaps my skills as a Perl programmer are to blame, but a clear > example of how a map like this is constructed would be a big help. > > Right now, I'm generating primer maps with system calls to EMBOSS's remap, > pointing it at a file of primer sequences rather than a file of restriction > enzyme sequences, but the results are less than desired. I'm considering > trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my > needs, but this seems like a lot of work for an operation I suspect is > possible in BioPerl. > > Any help greatly appreciated... > > Mike > > --------------------------------------------------------------------- > //=\ Michael J. 
Coyne phone: (617) 525-7820 > \=// Channing Laboratory FAX: (617) 264-5193 > //=\ EBRC, Room 617 > \=// 221 Longwood Avenue email:mcoyne at channing.harvard.edu > //=\ Boston, MA 02115 mjcoyne at comcast.net > \=// > --------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Thu Feb 16 11:27:13 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 16 Feb 2006 11:27:13 -0500 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: <200602140915.11604.hjm@tacgi.com> Message-ID: Harry, I've long suspected, but never demonstrated, that the easiest way to do something like this is through ENSEMBL, and Jason hinted at this as well. In fact your question is something of a FAQ, and my previous responses always included a plea to some anonymous ENSEMBL API expert, always unheeded. At any rate, here is an example script I made:

#!/usr/bin/perl
use strict;
use lib "/Users/bosborne/ensembl/modules";
use DBI;
use Getopt::Long;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

my $name;
GetOptions( "n=s" => \$name );

my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -user   => "anonymous",
    -dbname => "homo_sapiens_core_37_35j",
    -host   => "ensembldb.ensembl.org",
    -pass   => "",
    -driver => 'mysql'
);

my $gene_adaptor  = $db->get_GeneAdaptor;
my $slice_adaptor = $db->get_SliceAdaptor;

my @genes = @{$gene_adaptor->fetch_all_by_external_name($name)};
for my $gene (@genes) {
    for my $trans (@{$gene->get_all_Transcripts}) {
        my $seq = $slice_adaptor->fetch_by_region("chromosome",
            $trans->seq_region_name, $trans->start, $trans->end);
        print "\n",$seq->seq,"\n";
    }
}

There are some issues, the largest of which is that though this script prints out big sequences it's completely untested!
Another is that it makes assumptions about transcripts; you should verify for yourself that ENSEMBL's definition of a transcript fits yours. Finally, the fetch_all_by_external_name() method does not seem to accept a second argument, i.e. a namespace. I found this surprising. Anyway, if more than one gene is retrieved using some name or ID, you're in a quandary. For more on this API see: http://www.ensembl.org/info/software/core/core_tutorial.html There are tons of modules and methods in this API; I've barely scratched the surface here. Brian O. On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > Hi Brian, > > Thanks very much for the pointers and the speed of your reply and apologies > for the speed of mine. > > This looks good, but what I was looking for was a bioP approach for hooking to > an API at NCBI or EBI so I could get this info and seqs from them. In this > case, speed of retrieval is not critical and I'd rather not download the > entirety of the sequences to a local disk to hack at them. > > I've determined a screen-scraping approach to get them and could script that, > but I thought that bioP had a method for using NCBI's external APIs, tho it > may be that my memory is faulty or the approach is no longer supported due to > overload. > > Does NCBI make such APIs available anymore? I searched a bit for docs on them > but couldn't find anything (unless it's buried in the NCBI toolkit, which I > haven't started to excavate). > > Failing that, would SEALS provide such a service? Any PerlPinipeds listening? > > Harry > > > > > > > On Sunday 12 February 2006 08:37, Brian Osborne wrote: >> Harry, >> >> Hope you're doing well. The approach could be based on Bio::DB::Fasta.
So, >> from its documentation: >> >> use Bio::DB::Fasta; >> >> # create database from directory of fasta files >> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >> >> # simple access (for those without Bioperl) >> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >> my @ids = $db->ids; >> my $length = $db->length('CHROMOSOME_I'); >> my $alphabet = $db->alphabet('CHROMOSOME_I'); >> my $header = $db->header('CHROMOSOME_I'); >> >> # Bioperl-style access >> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >> >> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >> my $seq = $obj->seq; >> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >> >> Do you already have the offsets? >> >> Brian O. >> >> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>> Hi All, >>> >>> After perusing the tutorial and other docs for an evening, I still >>> can't find the answer to this. Forgive me if I've missed something >>> obvious. >>> >>> This should not be a novel request, but I've not found it answered. If >>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a >>> better way, especially if it includes an illuminating bit of code. >>> >>> The problem is to retrieve genomic sequences plus & minus some offset >>> from a locus determined by HUGO keyword or GeneID. This would be a >>> common followup chore for some extra analysis from a gene expression >>> expt. Or maybe this is in the DBFetch routines, but I've missed the >>> sequence type to specify...? >>> >>> >>> TIA! From heikki at sanbi.ac.za Thu Feb 16 12:32:51 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 16 Feb 2006 19:32:51 +0200 Subject: [Bioperl-l] Primer maps? In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com> Message-ID: <200602161932.51552.heikki@sanbi.ac.za> Mike, Marc's suggestion is the best I've heard.
We really do not have any kind of pretty print functionality within BioPerl. I guess there has not been a pressing need. Bio::Graphics has filled the need for sequence display. I think Bio::Seq::PrettyPrint could be a great way to design pretty printing in a very modular way, so that it can print out anything mapped to a sequence location. The EMBOSS showseq would be a great help there. A student project? Would anyone be interested? -Heikki On Thursday 16 February 2006 17:47, Marc Logghe wrote: > Hi Mike, > Another route you might take is mapping your primers into > Bio::SeqFeature::Generic objects and add them to the seq object. Then > you dump the object into a rich sequence format like genbank and pass > that to EMBOSS's showseq application > Or you might do it completely with showseq. Here the only thing you need > is an annotation file containing the positions of the primers, followed > by any text (e.g. primer name). > Then you do: > showseq -translate - -format 4 > -annotation > Have a look at http://emboss.sourceforge.net/apps/showseq.html for more > options > > HTH, > Marc > > > Marc Logghe, PhD > Expert Scientist Bioinformatics > deVGen NV > Technologiepark 30 > B - 9052 Ghent-Zwijnaarde > Tel. +32 9 324 24 83 > Fax. +32 9 324 24 25 > Web: www.devgen.com >
> ________________________________ > > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne > Sent: Wednesday, February 15, 2006 10:20 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Primer maps? > > > Hello all -- > > I'm having a devil of a time figuring out how to make > restriction maps using BioPerl. What I'm going for is output similar to > GCG's map program, but instead of using a set of defined restriction > enzymes, I'd like to use a set of primers, to create a primer map rather > than a restriction map. I do not need a table of restriction enzymes > that cut or don't cut (or primers that match or don't match, in this > case), but an honest-to-goodness map, something like: > > FKP-5-> > > > CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT > 1921 > ---------+---------+---------+---------+---------+---------+ 1980 > > GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA > > a M E I V S T F D E L Q D Y > - > > I also need translations of orfs, but I can use GenBank files as > input to the program and thus the CDS translations are already there, so > I'm guessing that shouldn't be too hard.... How does one create such a > map using the BioPerl modules? > > There are intriguing indications out there that such a thing is > possible (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I > can't find a single example of code that creates such a basic, > bread-and-butter thing as a restriction map with orf translations. The > documentation to these modules is fairly useless to me, consisting > mostly of internal methods and function prototypes.
Perhaps my skills > as a Perl programmer are to blame, but a clear example of how a map like > this is constructed would be a big help. > > Right now, I'm generating primer maps with system calls to > EMBOSS's remap, pointing it at a file of primer sequences rather than a > file of restriction enzyme sequences, but the results are less than > desired. I'm considering trying to adapt tacg 4.1.0 or sequence > extractor 1.1 web-based code to my needs, but this seems like a lot of > work for an operation I suspect is possible in BioPerl. > > Any help greatly appreciated... > > Mike > > > > --------------------------------------------------------------------- > //=\ Michael J. Coyne phone: (617) 525-7820 > \=// Channing Laboratory FAX: (617) 264-5193 > //=\ EBRC, Room 617 > \=// 221 Longwood Avenue email:mcoyne at channing.harvard.edu > //=\ Boston, MA 02115 mjcoyne at comcast.net > \=// > > --------------------------------------------------------------------- -- Heikki Lehvaslaiho, heikki at sanbi.ac.za, Associate Professor, skype: heikki_lehvaslaiho, SANBI, South African National Bioinformatics Institute, University of Western Cape, South Africa, Phone: +27 21 959 2096, FAX: +27 21 959 2512 From osborne1 at optonline.net Thu Feb 16 12:59:37 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 16 Feb 2006 12:59:37 -0500 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: <200602160823.03534.hjm@tacgi.com> Message-ID: Chris and Harry, I'm writing a Wiki page on this; it's linked from the FAQ, since the Wiki is complaining that the FAQ is getting too big.
I'll fill in the ENSEMBL API and Bio::DB::Fasta approaches; if you could comment on the BioPerl/eutils approach at some point, that would be superb: http://bioperl.open-bio.org/wiki/Getting_Genomic_Sequences Brian O. On 2/16/06 11:23 AM, "Harry Mangalam" wrote: > Yes, I'm going to try this 1st. Also the pointer to the NCBI eutils page was > helpful. They describe the same thing and I think that API will give me what > I need. I'll post back to report. > > Sorry for the delay in answering - this is a side project and as such is going > slow. > > Many thanks to you guys, especially Brian for the example code - much more > than I had a right to expect. Virtual Beers all round and real ones should > we ever meet up. > > Harry > > > On Thursday 16 February 2006 04:52, Chris Fields wrote: >> I think a method was recently implemented in Bio::DB::GenBank to >> retrieve a segment of DNA given start and end coordinates in GenBank >> format; that should contain the features you need. I requested it >> ~Nov-Dec in the mailing list but didn't get a chance to test it. >> Would that help? >> >> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: >>> Harry, >>> >>> It's not clear to me that NCBI's eutils offers this capability >>> directly. You >>> can probably download Entrez Gene entries and parse them for >>> coordinates but >>> I know of no way to remotely retrieve genomic sequences like this >>> from NCBI >>> (ENSEMBL API perhaps?). What I had in mind uses the local approach >>> that some >>> of us favor and to prove to myself that this is simple to do I wrote a >>> script that I just added to examples/tools, it's called >>> extract_genes.pl and >>> it's based on Bio::DB::Fasta. Download the sequence files for a given >>> species to some dir, download Entrez Gene's gene2accession file, >>> and run. It >>> creates and stores a hash for lookups, it won't read gene2accession >>> each >>> time it runs. >>> >>> Brian O.
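Harry's original problem (genomic sequence plus and minus some offset around a locus) reduces to coordinate arithmetic once a gene's start, end and strand are known. A minimal sketch, assuming those coordinates come from elsewhere (e.g. Entrez Gene or Ensembl); flanked_window and subseq_1based are hypothetical helpers. Returning the pair high-to-low for minus-strand loci follows the $revseq line in the Bio::DB::Fasta documentation quoted in this thread, which requests a region with start greater than end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: widen a locus by $arg{flank} bases on each side,
# clamped to 1..$arg{length} when the sequence length is known. For a
# minus-strand locus the pair comes back high => low.
sub flanked_window {
    my (%arg) = @_;
    my ($lo, $hi) = sort { $a <=> $b } @arg{qw(start end)};
    my $flank = $arg{flank} // 0;
    $lo = $lo - $flank < 1 ? 1 : $lo - $flank;
    $hi += $flank;
    $hi = $arg{length} if defined $arg{length} && $hi > $arg{length};
    return ($arg{strand} // 1) < 0 ? ($hi, $lo) : ($lo, $hi);
}

# In-memory stand-in for $db->seq($id, $from => $to): 1-based, inclusive,
# reverse-complemented when $from > $to. Lets the logic be checked offline.
sub subseq_1based {
    my ($seq, $from, $to) = @_;
    return substr($seq, $from - 1, $to - $from + 1) if $from <= $to;
    (my $rc = reverse substr($seq, $to - 1, $from - $to + 1)) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my @win = flanked_window(start => 4_000_000, end => 4_005_000,
                         flank => 2_000, strand => -1);
print "window: @win\n";

# With an indexed directory of FASTA files (path and ID are placeholders):
#   use Bio::DB::Fasta;
#   my $db  = Bio::DB::Fasta->new('/path/to/fasta/files');
#   my $dna = $db->seq('CHROMOSOME_I', @win);
```

Clamping at the sequence ends matters for genes near a chromosome end, where a naive start minus offset would go below 1.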
>>> >>> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: >>>> Hi Brian, >>>> >>>> Thanks very much for the pointers and the speed of your reply and >>>> apologies >>>> for the speed of mine. >>>> >>>> This looks good, but what I was looking for was a bioP approach >>>> for hooking to >>>> an API at NCBI or EBI so I could get this info and seqs from >>>> them. In this >>>> case, speed of retrieval is not critical and I'd rather not >>>> download the >>>> entirety of the sequences to a local disk to hack at them. >>>> >>>> I've determined a screen-scraping approach to get them and could >>>> script that, >>>> but I thought that bioP had a method for using NCBI's external >>>> API's, tho it >>>> may be that my memory is faulty or the approach is no longer >>>> supported due to >>>> overload. >>>> >>>> Does NCBI make such APIs available anymore? I searched a bit for >>>> docs on them >>>> but couldn't find anything (unless it's buried in the NCBI tookit, >>>> which I >>>> haven't started to excavate). >>>> >>>> Failing that, would SEALS provide such a service? Any PerlPinipeds >>>> listening? >>>> >>>> Harry >>>> >>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: >>>>> Harry, >>>>> >>>>> Hope you're doing well. The approach could be based on >>>>> Bio::DB::Fasta. 
So, >>>>> from its documentation: >>>>> >>>>> use Bio::DB::Fasta; >>>>> >>>>> # create database from directory of fasta files >>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>>> >>>>> # simple access (for those without Bioperl) >>>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >>>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >>>>> my @ids = $db->ids; >>>>> my $length = $db->length('CHROMOSOME_I'); >>>>> my $alphabet = $db->alphabet('CHROMOSOME_I'); >>>>> my $header = $db->header('CHROMOSOME_I'); >>>>> >>>>> # Bioperl-style access >>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>>> >>>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >>>>> my $seq = $obj->seq; >>>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >>>>> >>>>> Do you already have the offsets? >>>>> >>>>> Brian O. >>>>> >>>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>>>>> Hi All, >>>>>> >>>>>> After perusing the tutorial and other docs for a an evening, I >>>>>> still >>>>>> can't find the answer to this. Forgive me if I've missed something >>>>>> obvious. >>>>>> >>>>>> This should not be a novel request, but I've not found it >>>>>> answered. If >>>>>> bioperl isn't the best way to do this, I'd be grateful to a >>>>>> pointer to a >>>>>> better way, especially if it includes an illuminating bit of code. >>>>>> >>>>>> The problem is to retrieve genomic sequences plus & minus some >>>>>> offset >>>>>> from a locus determined by HUGO keyword or GeneID. This would be a >>>>>> common followup chore for some extra analysis from a gene >>>>>> expression >>>>>> expt. Or maybe this is in the DBFetch routines, but I've missed >>>>>> the >>>>>> sequence type to specify...? >>>>>> >>>>>> >>>>>> TIA! >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign From hjm at tacgi.com Thu Feb 16 12:02:07 2006 From: hjm at tacgi.com (Harry Mangalam) Date: Thu, 16 Feb 2006 09:02:07 -0800 Subject: [Bioperl-l] Primer maps? In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost> References: <6.2.0.14.0.20060215155422.01d44a98@localhost> Message-ID: <200602160902.07383.hjm@tacgi.com> A bit off the bioperl topic - if you must have bioperl, ignore this (or just system() wrap the command) - but you can do exactly this mapping and in-line translation with a thing I wrote called tacg. You make a GCG-formatted file of primers, i.e. for each pattern you need a line like:

; Top Bottom
;Name Offset Recognition Pattern Offset ! comments
primer1 0 tcgggywmkkgg 0 ! ...
primer2 0 gcttggctgaggag 0 ! . . .

Obviously the offsets can be set to 0 for non-REs. There's no limit to the number of primer patterns (tho I think there's a compiled-in limit of 30 chars in the pattern - easily changed in the header), no limit to the amount of seq searched; it handles degeneracies and searches at ~4Mbases/s on a 2G opteron (120 patterns). It also does searching with errors (slowly), regexes (at pcre speeds), and matrices. Other neat stuff, too. The output is sort of as you describe - replace the RE names with your primer labels and you'll have it. 6 frame xl with 3 letter abbrevs.
BsrGI BsrGI AflII DraI \ \ \ \ 121 gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt 180 3453 cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa 3512 ^ * ^ * ^ * ^ * ^ * ^ * 1 ValCysIleCysThrLeuCysThrLeuLysThrTyrThrPheHisCysValTerIleIle 2 CysValPheValHisPheValHisLeuArgProThrHisPheIleValPheLysLeuLeu 3 ValTyrLeuTyrThrLeuTyrThrTerAspLeuHisIleSerLeuCysLeuAsnTyrTyr 4 HisIleGlnValSerGlnValSerLeuTerValValAsnTerGlnThrTerIleIleVal 5 ThrTyrLysTyrValLysTyrValTerArgSerCysMetGluAsnHisLysPheTerTer 6 HisThrAsnThrCysLysThrCysLysGlyLeuValCysLysMetThrAsnLeuAsnAsn or 3 frames with 1 letter abbrevs: BsrGI BsrGI AflII DraI \ \ \ \ 121 gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt 180 3453 cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa 3512 ^ * ^ * ^ * ^ * ^ * ^ * 1 V C I C T L C T L K T Y T F H C V * I I 2 C V F V H F V H L R P T H F I V F K L L 3 V Y L Y T L Y T * D L H I S L C L N Y Y Read more at tacg.sf.net or reply to me for the latest docs and version - I have to admit the sf site is a bit moldy. hjm On Wednesday 15 February 2006 13:20, Michael Coyne wrote: > Hello all -- > > I'm having a devil of a time figuring out how to make restriction maps > using BioPerl. What I'm going for is output similar to GCG's map program, > but instead of using a set of defined restriction enzymes, I'd like to use > a set of primers, to create a primer map rather than a restriction map. I > do not need a table of restriction enzymes that cut or don't cut (or > primers that match or don't match, in this case), but an honest-to-goodness > map, something like: > > FKP-5-> > | > CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT > 1921 ---------+---------+---------+---------+---------+---------+ 1980 > GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA > > a M E I V S T F D E L Q D Y
- > > I also need translations of orfs, but I can use GenBank files as input to > the program and thus the CDS translations are already there, so I'm > guessing that shouldn't be too hard.... How does one create such a map > using the BioPerl modules? > > There are intriguing indications out there that such a thing is possible > (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I can't find > a single example of code that creates such a basic, bread-and-butter thing > as a restriction map with orf translations. The documentation to these > modules is fairly useless to me, consisting mostly of internal methods and > function prototypes. Perhaps my skills as a Perl programmer are to blame, > but a clear example of how a map like this is constructed would be a big > help. > > Right now, I'm generating primer maps with system calls to EMBOSS's remap, > pointing it at a file of primer sequences rather than a file of restriction > enzyme sequences, but the results are less than desired. I'm considering > trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my > needs, but this seems like a lot of work for an operation I suspect is > possible in BioPerl. > > Any help greatly appreciated... > > Mike > > --------------------------------------------------------------------- > //=\ Michael J. Coyne phone: (617) 525-7820 > \=// Channing Laboratory FAX: (617) 264-5193 > //=\ EBRC, Room 617 > \=// 221 Longwood Avenue email:mcoyne at channing.harvard.edu > //=\ Boston, MA 02115 mjcoyne at comcast.net >
\=// > --------------------------------------------------------------------- -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com <> From hjm at tacgi.com Thu Feb 16 11:23:02 2006 From: hjm at tacgi.com (Harry Mangalam) Date: Thu, 16 Feb 2006 08:23:02 -0800 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: References: Message-ID: <200602160823.03534.hjm@tacgi.com> Yes, I'm going to try this 1st. Also the pointer to the NCBI eutils page was helpful. They describe the same thing and I think that API will give me what I need. I'll post back to report. Sorry for the delay in answering - this is a side project and as such is going slow. Many thanks to you guys, especially Brian for the example code - much more than I had a right to expect. Virtual Beers all round and real ones should we ever meet up. Harry On Thursday 16 February 2006 04:52, Chris Fields wrote: > I think a method was recently implemented in Bio::DB::GenBank to > retrieve a segment of DNA given start and end coordinates in GenBank > format; that should contain the features you need. I requested it > ~Nov-Dec in the mailing list but didn't get a chance to test it. > Would that help? > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: > > Harry, > > > > It's not clear to me that NCBI's eutils offers this capability > > directly. You > > can probably download Entrez Gene entries and parse them for > > coordinates but > > I know of no way to remotely retrieve genomic sequences like this > > from NCBI > > (ENSEMBL API perhaps?). What I had in mind uses the local approach > > that some > > of us favor and to prove to myself that this is simple to do I wrote a > > script that I just added to examples/tools, it's called > > extract_genes.pl and > > it's based on Bio::DB::Fasta. Download the sequence files for a given > > species to some dir, download Entrez Gene's gene2accession file, > > and run. 
It > > creates and stores a hash for lookups, it won't read gene2accession > > each > > time it runs. > > > > Brian O. > > > > On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > >> Hi Brian, > >> > >> Thanks very much for the pointers and the speed of your reply and > >> apologies > >> for the speed of mine. > >> > >> This looks good, but what I was looking for was a bioP approach > >> for hooking to > >> an API at NCBI or EBI so I could get this info and seqs from > >> them. In this > >> case, speed of retrieval is not critical and I'd rather not > >> download the > >> entirety of the sequences to a local disk to hack at them. > >> > >> I've determined a screen-scraping approach to get them and could > >> script that, > >> but I thought that bioP had a method for using NCBI's external > >> API's, tho it > >> may be that my memory is faulty or the approach is no longer > >> supported due to > >> overload. > >> > >> Does NCBI make such APIs available anymore? I searched a bit for > >> docs on them > >> but couldn't find anything (unless it's buried in the NCBI tookit, > >> which I > >> haven't started to excavate). > >> > >> Failing that, would SEALS provide such a service? Any PerlPinipeds > >> listening? > >> > >> Harry > >> > >> On Sunday 12 February 2006 08:37, Brian Osborne wrote: > >>> Harry, > >>> > >>> Hope you're doing well. The approach could be based on > >>> Bio::DB::Fasta. 
So, > >>> from its documentation: > >>> > >>> use Bio::DB::Fasta; > >>> > >>> # create database from directory of fasta files > >>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > >>> > >>> # simple access (for those without Bioperl) > >>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); > >>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); > >>> my @ids = $db->ids; > >>> my $length = $db->length('CHROMOSOME_I'); > >>> my $alphabet = $db->alphabet('CHROMOSOME_I'); > >>> my $header = $db->header('CHROMOSOME_I'); > >>> > >>> # Bioperl-style access > >>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > >>> > >>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); > >>> my $seq = $obj->seq; > >>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); > >>> > >>> Do you already have the offsets? > >>> > >>> Brian O. > >>> > >>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: > >>>> Hi All, > >>>> > >>>> After perusing the tutorial and other docs for a an evening, I > >>>> still > >>>> can't find the answer to this. Forgive me if I've missed something > >>>> obvious. > >>>> > >>>> This should not be a novel request, but I've not found it > >>>> answered. If > >>>> bioperl isn't the best way to do this, I'd be grateful to a > >>>> pointer to a > >>>> better way, especially if it includes an illuminating bit of code. > >>>> > >>>> The problem is to retrieve genomic sequences plus & minus some > >>>> offset > >>>> from a locus determined by HUGO keyword or GeneID. This would be a > >>>> common followup chore for some extra analysis from a gene > >>>> expression > >>>> expt. Or maybe this is in the DBFetch routines, but I've missed > >>>> the > >>>> sequence type to specify...? > >>>> > >>>> > >>>> TIA! > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com <> From cjfields at uiuc.edu Thu Feb 16 16:37:25 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 16 Feb 2006 15:37:25 -0600 Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28 In-Reply-To: <43F449E1.80605@esat.kuleuven.be> Message-ID: <000301c63341$2e015d50$15327e82@pyrimidine> As an update for those interested, I checked on this today by feeding SearchIO the XML and text output from all of NCBI's BLAST flavors. All of the XML output parses fine, and so does all of the text output except blastn and tblastx. Those two contain extra lines starting with 'Features in this part of subject sequence:'. I'll be looking into SearchIO::blast, but I don't know when I'll get around to posting a fix. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs > Sent: Thursday, February 16, 2006 3:46 AM > To: gyang at plantbio.uga.edu > Cc: bioperl-l at lists.open-bio.org; Chris Fields > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm > version 1.28 > > Hi, > > I have the same problem with the blast.pm-file. > The people of NCBI added some extra info when giving the Blast-output. > (see e.g. "Features flanking this part..." or "Features in this part > ..."), example added. > The blast.pm module starts looking for the hsp-alignement-information, > but it dies when it hits this Feature-information. > > Pieter > > ......
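Until SearchIO::blast handles these lines, one possible stopgap is to strip the feature annotation blocks from the text report before parsing it. This is a hypothetical sketch, not part of BioPerl: the subroutine name strip_feature_blocks is made up, and it assumes each block begins with a 'Features in this part of subject sequence:' or 'Features flanking this part of subject sequence:' header and ends at the next blank line. If alignment lines ever follow such a block without a blank line in between, this filter would drop them, so check it against real reports first.

```perl
use strict;
use warnings;

# Hypothetical workaround, not a BioPerl API: drop the feature annotation
# blocks that newer NCBI BLASTN/TBLASTX text reports place before HSPs.
# ASSUMPTION: a block starts with a "Features in/flanking this part of
# subject sequence:" header and ends at the next blank line (which is kept).
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $in_block = 0;

    # the -1 limit keeps trailing empty fields, so join restores the newlines
    for my $line (split /\n/, $report, -1) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $in_block = 1;    # start skipping at the header line
            next;
        }
        if ($in_block) {
            next unless $line =~ /^\s*$/;    # skip the annotation text
            $in_block = 0;                   # a blank line ends the block
        }
        push @kept, $line;
    }
    return join "\n", @kept;
}

# Example (commented out so nothing blocks on STDIN):
# local $/; my $filtered = strip_feature_blocks(<>);
```

The filtered text could then be read back through an in-memory filehandle (open(my $fh, '<', \$filtered)) and handed to Bio::SearchIO with -format => 'blast' and -fh => $fh. Filtering the text rather than adding more regexes to the parser keeps the workaround out of SearchIO::blast, at the cost of discarding the feature annotations.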
From osborne1 at optonline.net Thu Feb 16 17:19:16 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 16 Feb 2006 17:19:16 -0500 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: Message-ID: Chris, Yes. The question now is where to easily get the coordinates. Brian O. On 2/16/06 7:52 AM, "Chris Fields" wrote: > I think a method was recently implemented in Bio::DB::GenBank to > retrieve a segment of DNA given start and end coordinates in GenBank > format; that should contain the features you need. I requested it > ~Nov-Dec in the mailing list but didn't get a chance to test it. > Would that help? > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: > >> Harry, >> >> It's not clear to me that NCBI's eutils offers this capability >> directly. You >> can probably download Entrez Gene entries and parse them for >> coordinates but >> I know of no way to remotely retrieve genomic sequences like this >> from NCBI >> (ENSEMBL API perhaps?). What I had in mind uses the local approach >> that some >> of us favor and to prove to myself that this is simple to do I wrote a >> script that I just added to examples/tools, it's called >> extract_genes.pl and >> it's based on Bio::DB::Fasta. Download the sequence files for a given >> species to some dir, download Entrez Gene's gene2accession file, >> and run. It >> creates and stores a hash for lookups, it won't read gene2accession >> each >> time it runs. >> >> Brian O. >> >> >> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: >> >>> Hi Brian, >>> >>> Thanks very much for the pointers and the speed of your reply and >>> apologies >>> for the speed of mine. >>> >>> This looks good, but what I was looking for was a bioP approach >>> for hooking to >>> an API at NCBI or EBI so I could get this info and seqs from >>> them. In this >>> case, speed of retrieval is not critical and I'd rather not >>> download the >>> entirety of the sequences to a local disk to hack at them. 
>>> >>> I've determined a screen-scraping approach to get them and could >>> script that, >>> but I thought that bioP had a method for using NCBI's external >>> API's, tho it >>> may be that my memory is faulty or the approach is no longer >>> supported due to >>> overload. >>> >>> Does NCBI make such APIs available anymore? I searched a bit for >>> docs on them >>> but couldn't find anything (unless it's buried in the NCBI tookit, >>> which I >>> haven't started to excavate). >>> >>> Failing that, would SEALS provide such a service? Any PerlPinipeds >>> listening? >>> >>> Harry >>> >>> >>> >>> >>> >>> >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: >>>> Harry, >>>> >>>> Hope you're doing well. The approach could be based on >>>> Bio::DB::Fasta. So, >>>> from its documentation: >>>> >>>> use Bio::DB::Fasta; >>>> >>>> # create database from directory of fasta files >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>> >>>> # simple access (for those without Bioperl) >>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >>>> my @ids = $db->ids; >>>> my $length = $db->length('CHROMOSOME_I'); >>>> my $alphabet = $db->alphabet('CHROMOSOME_I'); >>>> my $header = $db->header('CHROMOSOME_I'); >>>> >>>> # Bioperl-style access >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>> >>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >>>> my $seq = $obj->seq; >>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >>>> >>>> Do you already have the offsets? >>>> >>>> Brian O. >>>> >>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>>>> Hi All, >>>>> >>>>> After perusing the tutorial and other docs for a an evening, I >>>>> still >>>>> can't find the answer to this. Forgive me if I've missed something >>>>> obvious. >>>>> >>>>> This should not be a novel request, but I've not found it >>>>> answered. 
If >>>>> bioperl isn't the best way to do this, I'd be grateful to a >>>>> pointer to a >>>>> better way, especially if it includes an illuminating bit of code. >>>>> >>>>> The problem is to retrieve genomic sequences plus & minus some >>>>> offset >>>>> from a locus determined by HUGO keyword or GeneID. This would be a >>>>> common followup chore for some extra analysis from a gene >>>>> expression >>>>> expt. Or maybe this is in the DBFetch routines, but I've missed >>>>> the >>>>> sequence type to specify...? >>>>> >>>>> >>>>> TIA! >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Feb 16 17:29:15 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 16 Feb 2006 16:29:15 -0600 Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO text parsing? Message-ID: <000001c63348$6b8136d0$15327e82@pyrimidine> I'm floating this to see what people think... I'm beginning to wonder, especially while wading through the regex/parsing nightmare in SearchIO::blast, whether we should require a minimum BLAST version for parsing in SearchIO::blast to work. I could add a '$self->throw("Requires BLAST v2.x.x")', or at least give a warning if the BLAST version number is below the minimum, so people will at least know what the problem is (not us!). The regexes are really piling up, and the latest changes in blastn and tblastx will require adding a few more. I also think this would help remind everybody running the latest Bioperl that there are also newer versions of BLAST.
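A minimal sketch of the proposed version guard, written as standalone Perl rather than as a patch to SearchIO::blast. The subroutine names and the 2.2.12 threshold (the version Chris mentions testing against) are illustrative assumptions, and the header regex assumes NCBI-style first lines such as 'BLASTN 2.2.13 [Nov-27-2005]'; WU-BLAST headers would need their own handling. Inside the parser, the plain warn could become $self->warn or $self->throw.

```perl
use strict;
use warnings;

# Illustrative minimum; not an official BioPerl requirement.
my $MIN_BLAST_VERSION = '2.2.12';

# Pull program and version out of an NCBI BLAST header line such as
# "BLASTN 2.2.13 [Nov-27-2005]". Returns an empty list if nothing matches.
sub blast_version {
    my ($header) = @_;
    return unless $header =~ /^\s*(T?BLAST[NPX])\s+(\d+(?:\.\d+)*)/;
    return ($1, $2);
}

# Compare dotted versions field by field, numerically; missing fields
# count as 0. Returns -1, 0 or 1, like the <=> operator.
sub cmp_version {
    my ($va, $vb) = @_;
    my @x = split /\./, $va;
    my @y = split /\./, $vb;
    while (@x || @y) {
        my $c = (shift(@x) || 0) <=> (shift(@y) || 0);
        return $c if $c;
    }
    return 0;
}

# Warn and return false when the report is older than the minimum;
# also returns false (silently) when the header is not recognized.
sub check_blast_version {
    my ($header) = @_;
    my ($prog, $ver) = blast_version($header);
    return 0 unless defined $ver;
    if (cmp_version($ver, $MIN_BLAST_VERSION) < 0) {
        warn "$prog $ver is older than $MIN_BLAST_VERSION; parsing may fail\n";
        return 0;
    }
    return 1;
}
```

Comparing numerically field by field matters here: a plain string comparison would rank 2.2.9 above 2.2.12.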
My current thought is to get it working for the latest text output from NCBI, check it against the last version of BLAST (v. 2.2.12, which, luckily, blastcl3 generates), and not worry too much about older ones. Any thoughts on this? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Feb 16 17:45:52 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 16 Feb 2006 16:45:52 -0600 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: Message-ID: <000101c6334a$bd80a900$15327e82@pyrimidine> If I know the start, end, and strand info for a list of features (personal preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew up), couldn't I try pulling out the surrounding region? My thought is this, though I haven't coded it yet:
1) Draw up a list of SeqFeatures, with accession, start and stop coordinates (an array of hashes), based on what I get from the RNAMotif objects.
2) Pull each sequence from NCBI using Bio::DB::GenBank, with x bp upstream and downstream, one at a time, using get_Seq_by_id(). I could add a sleep in there somewhere so as not to tick off the NCBI curators.
The reason I'm interested in this is that I want to know where the RNA motif sits in relation to surrounding features. If it is very close to a coding region, then the motif likely indicates translational regulation; a motif further away may indicate transcriptional termination or another mechanism. The files returned should include the features, as long as they are in the full-length GenBank record. I tried it out using the web form but not through Bio::DB::GenBank yet. If I can get it to work I'll add it to the page. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Brian Osborne [mailto:osborne1 at optonline.net] > Sent: Thursday, February 16, 2006 4:19 PM > To: Chris Fields > Cc: Harry Mangalam; bioperl-l > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or > GeneIDs > > Chris, > > Yes. The question now is where to easily get the coordinates. > > Brian O. > > > On 2/16/06 7:52 AM, "Chris Fields" wrote: > > > I think a method was recently implemented in Bio::DB::GenBank to > > retrieve a segment of DNA given start and end coordinates in GenBank > > format; that should contain the features you need. I requested it > > ~Nov-Dec in the mailing list but didn't get a chance to test it. > > Would that help? > > > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: > > > >> Harry, > >> > >> It's not clear to me that NCBI's eutils offers this capability > >> directly. You > >> can probably download Entrez Gene entries and parse them for > >> coordinates but > >> I know of no way to remotely retrieve genomic sequences like this > >> from NCBI > >> (ENSEMBL API perhaps?). What I had in mind uses the local approach > >> that some > >> of us favor and to prove to myself that this is simple to do I wrote a > >> script that I just added to examples/tools, it's called > >> extract_genes.pl and > >> it's based on Bio::DB::Fasta. Download the sequence files for a given > >> species to some dir, download Entrez Gene's gene2accession file, > >> and run. It > >> creates and stores a hash for lookups, it won't read gene2accession > >> each > >> time it runs. > >> > >> Brian O. > >> > >> > >> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > >> > >>> Hi Brian, > >>> > >>> Thanks very much for the pointers and the speed of your reply and > >>> apologies > >>> for the speed of mine. 
> >>> > >>> This looks good, but what I was looking for was a bioP approach > >>> for hooking to > >>> an API at NCBI or EBI so I could get this info and seqs from > >>> them. In this > >>> case, speed of retrieval is not critical and I'd rather not > >>> download the > >>> entirety of the sequences to a local disk to hack at them. > >>> > >>> I've determined a screen-scraping approach to get them and could > >>> script that, > >>> but I thought that bioP had a method for using NCBI's external > >>> API's, tho it > >>> may be that my memory is faulty or the approach is no longer > >>> supported due to > >>> overload. > >>> > >>> Does NCBI make such APIs available anymore? I searched a bit for > >>> docs on them > >>> but couldn't find anything (unless it's buried in the NCBI tookit, > >>> which I > >>> haven't started to excavate). > >>> > >>> Failing that, would SEALS provide such a service? Any PerlPinipeds > >>> listening? > >>> > >>> Harry > >>> > >>> > >>> > >>> > >>> > >>> > >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: > >>>> Harry, > >>>> > >>>> Hope you're doing well. The approach could be based on > >>>> Bio::DB::Fasta. 
So, > >>>> from its documentation: > >>>> > >>>> use Bio::DB::Fasta; > >>>> > >>>> # create database from directory of fasta files > >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > >>>> > >>>> # simple access (for those without Bioperl) > >>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); > >>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); > >>>> my @ids = $db->ids; > >>>> my $length = $db->length('CHROMOSOME_I'); > >>>> my $alphabet = $db->alphabet('CHROMOSOME_I'); > >>>> my $header = $db->header('CHROMOSOME_I'); > >>>> > >>>> # Bioperl-style access > >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > >>>> > >>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); > >>>> my $seq = $obj->seq; > >>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); > >>>> > >>>> Do you already have the offsets? > >>>> > >>>> Brian O. > >>>> > >>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: > >>>>> Hi All, > >>>>> > >>>>> After perusing the tutorial and other docs for a an evening, I > >>>>> still > >>>>> can't find the answer to this. Forgive me if I've missed something > >>>>> obvious. > >>>>> > >>>>> This should not be a novel request, but I've not found it > >>>>> answered. If > >>>>> bioperl isn't the best way to do this, I'd be grateful to a > >>>>> pointer to a > >>>>> better way, especially if it includes an illuminating bit of code. > >>>>> > >>>>> The problem is to retrieve genomic sequences plus & minus some > >>>>> offset > >>>>> from a locus determined by HUGO keyword or GeneID. This would be a > >>>>> common followup chore for some extra analysis from a gene > >>>>> expression > >>>>> expt. Or maybe this is in the DBFetch routines, but I've missed > >>>>> the > >>>>> sequence type to specify...? > >>>>> > >>>>> > >>>>> TIA! 
> >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hjm at tacgi.com Thu Feb 16 18:10:59 2006 From: hjm at tacgi.com (Harry Mangalam) Date: Thu, 16 Feb 2006 15:10:59 -0800 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine> References: <000101c6334a$bd80a900$15327e82@pyrimidine> Message-ID: <200602161510.59679.hjm@tacgi.com> This is essentially what I want to do and my [only in pseudocode] approach is basically what you describe, except that currently I only have HUGO descriptors, not Genbank UIDs. If you know of an index that lists both, that would be the entire shot. I'm also interested in tracking transcriptional control elements and cross-correlating & why I wrote the 'rules' chunk of the recently (self-promoted) tacg. Best Harry On Thursday 16 February 2006 14:45, Chris Fields wrote: > If I know the start, end, and strand info for a list of features (personal > preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew > up), couldn't I try pulling out the surrounding region? My thought is > this, though I haven't coded it yet: > > 1) Draw up a list of Seqfeatures, with accession, start, stop coordinates > (array of hashes) based off what I get from RNAMotif objects. > 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream > and downstream, one at a time, using get_Seq_by_ID(). I could add a sleep > in there somewhere to not tick off the NCBI curators. 
> > Reason I'm interested in this is b/c I want to know where the RNA motif is > in context to surrounding features. If it is very close to a coding region, > then the motif likely indicates translational regulation. Further away may > indicate transcriptional termination or another mechanism. > > The files returned should have the features included as long as they are in > the full length GenBank record. I tried it out using the web form but not > through Bio::DB::GenBank yet. If I can get it to work I'll add it to the > page. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: Brian Osborne [mailto:osborne1 at optonline.net] > > Sent: Thursday, February 16, 2006 4:19 PM > > To: Chris Fields > > Cc: Harry Mangalam; bioperl-l > > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names > > or GeneIDs > > > > Chris, > > > > Yes. The question now is where to easily get the coordinates. > > > > Brian O. > > > > On 2/16/06 7:52 AM, "Chris Fields" wrote: > > > I think a method was recently implemented in Bio::DB::GenBank to > > > retrieve a segment of DNA given start and end coordinates in GenBank > > > format; that should contain the features you need. I requested it > > > ~Nov-Dec in the mailing list but didn't get a chance to test it. > > > Would that help? > > > > > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: > > >> Harry, > > >> > > >> It's not clear to me that NCBI's eutils offers this capability > > >> directly. You > > >> can probably download Entrez Gene entries and parse them for > > >> coordinates but > > >> I know of no way to remotely retrieve genomic sequences like this > > >> from NCBI > > >> (ENSEMBL API perhaps?). 
What I had in mind uses the local approach > > >> that some > > >> of us favor and to prove to myself that this is simple to do I wrote a > > >> script that I just added to examples/tools, it's called > > >> extract_genes.pl and > > >> it's based on Bio::DB::Fasta. Download the sequence files for a given > > >> species to some dir, download Entrez Gene's gene2accession file, > > >> and run. It > > >> creates and stores a hash for lookups, it won't read gene2accession > > >> each > > >> time it runs. > > >> > > >> Brian O. > > >> > > >> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > > >>> Hi Brian, > > >>> > > >>> Thanks very much for the pointers and the speed of your reply and > > >>> apologies > > >>> for the speed of mine. > > >>> > > >>> This looks good, but what I was looking for was a bioP approach > > >>> for hooking to > > >>> an API at NCBI or EBI so I could get this info and seqs from > > >>> them. In this > > >>> case, speed of retrieval is not critical and I'd rather not > > >>> download the > > >>> entirety of the sequences to a local disk to hack at them. > > >>> > > >>> I've determined a screen-scraping approach to get them and could > > >>> script that, > > >>> but I thought that bioP had a method for using NCBI's external > > >>> API's, tho it > > >>> may be that my memory is faulty or the approach is no longer > > >>> supported due to > > >>> overload. > > >>> > > >>> Does NCBI make such APIs available anymore? I searched a bit for > > >>> docs on them > > >>> but couldn't find anything (unless it's buried in the NCBI tookit, > > >>> which I > > >>> haven't started to excavate). > > >>> > > >>> Failing that, would SEALS provide such a service? Any PerlPinipeds > > >>> listening? > > >>> > > >>> Harry > > >>> > > >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: > > >>>> Harry, > > >>>> > > >>>> Hope you're doing well. The approach could be based on > > >>>> Bio::DB::Fasta. 
So, > > >>>> from its documentation: > > >>>> > > >>>> use Bio::DB::Fasta; > > >>>> > > >>>> # create database from directory of fasta files > > >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > > >>>> > > >>>> # simple access (for those without Bioperl) > > >>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); > > >>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); > > >>>> my @ids = $db->ids; > > >>>> my $length = $db->length('CHROMOSOME_I'); > > >>>> my $alphabet = $db->alphabet('CHROMOSOME_I'); > > >>>> my $header = $db->header('CHROMOSOME_I'); > > >>>> > > >>>> # Bioperl-style access > > >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > > >>>> > > >>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); > > >>>> my $seq = $obj->seq; > > >>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); > > >>>> > > >>>> Do you already have the offsets? > > >>>> > > >>>> Brian O. > > >>>> > > >>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: > > >>>>> Hi All, > > >>>>> > > >>>>> After perusing the tutorial and other docs for a an evening, I > > >>>>> still > > >>>>> can't find the answer to this. Forgive me if I've missed something > > >>>>> obvious. > > >>>>> > > >>>>> This should not be a novel request, but I've not found it > > >>>>> answered. If > > >>>>> bioperl isn't the best way to do this, I'd be grateful to a > > >>>>> pointer to a > > >>>>> better way, especially if it includes an illuminating bit of code. > > >>>>> > > >>>>> The problem is to retrieve genomic sequences plus & minus some > > >>>>> offset > > >>>>> from a locus determined by HUGO keyword or GeneID. This would be a > > >>>>> common followup chore for some extra analysis from a gene > > >>>>> expression > > >>>>> expt. Or maybe this is in the DBFetch routines, but I've missed > > >>>>> the > > >>>>> sequence type to specify...? > > >>>>> > > >>>>> > > >>>>> TIA! 
> > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > Christopher Fields > > > Postdoctoral Researcher > > > Lab of Dr. Robert Switzer > > > Dept of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com <> From anst at kvl.dk Fri Feb 17 04:18:18 2006 From: anst at kvl.dk (Anders Stegmann) Date: Fri, 17 Feb 2006 10:18:18 +0100 Subject: [Bioperl-l] another searchIO bug? with blast report In-Reply-To: <43F45FE60200009B00000ED6@gwia.kvl.dk> References: <43F45FE60200009B00000ED6@gwia.kvl.dk> Message-ID: <43F5A2EA0200009B00000F45@gwia.kvl.dk> >>>Anders Stegmann 02/16/06 11:20 am >>> Hi! I am BLASTing a protein sequence (query) against an identical sequence from which amino acid no. 61 has been deleted (subject). Then I print out the 'nomatch' amino acids and their positions. The nomatch for the query sequence is amino acid G at position 61, which is correct. The nomatch for the subject sequence is V at position 60, which is definitely not correct!? Is this a bug? testblast2.pl is the program to run. Q0045 is the query sequence. Q0045del61 is the subject sequence (it has to be formatted first: formatdb -i Q0045del61 -p T -o F). Regards Anders. -------------- next part -------------- A non-text attachment was scrubbed... Name: Q0045 Type: application/octet-stream Size: 873 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060217/74838520/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed...
Name: Q0045del61 Type: application/octet-stream Size: 872 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060217/74838520/attachment-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: testblast2.pl Type: application/octet-stream Size: 6109 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060217/74838520/attachment-0002.obj -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060217/74838520/attachment.html From saldroubi at yahoo.com Fri Feb 17 12:49:40 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Fri, 17 Feb 2006 09:49:40 -0800 (PST) Subject: [Bioperl-l] Count or weight matrix in bioperl? In-Reply-To: <43EAAEEF.3000304@infotech.monash.edu.au> Message-ID: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com> Torsten and all, I don't think this will work for me, because it only generates statistics for a single sequence. What I need is a count matrix, by position, for a number of DNA sequences. In other words, if I pass these 3 sequences to the function, it should return the count of each nucleotide at each position. For example, if I pass an array of sequences such as ATC, CCC, TTT, I should get back a matrix with the counts at positions 1, 2 and 3 for each of A, C, T and G, like this:

     1  2  3
  A  1  0  0
  C  1  1  2
  T  1  2  1
  G  0  0  0

Any idea if this is already built somewhere in bioperl? Thank you. Torsten Seemann wrote:> Say I have an array of nucleotide sequences of of length N. I want to calculate the count matrix (weight matrix). That is for each position 1..N, I want to know how many As, Cs ,Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings? > Please excuse my lack of knowledge as I am a new comer to bioinformatics. Use the Bio::Tools::SeqStats module.
The PDoc documentation even has an example similar to what you want to do: http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html --Torsten Seemann Sincerely, Sam Al-Droubi, M.S. saldroubi at yahoo.com From muratem at eng.uah.edu Fri Feb 17 12:45:30 2006 From: muratem at eng.uah.edu (Mike Muratet) Date: Fri, 17 Feb 2006 11:45:30 -0600 (CST) Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO text parsing? In-Reply-To: <000001c63348$6b8136d0$15327e82@pyrimidine> References: <000001c63348$6b8136d0$15327e82@pyrimidine> Message-ID: On Thu, 16 Feb 2006, Chris Fields wrote: > I'm floating this to see what people think... > > I'm beginning to wonder, especially when I'm wading through the > regex/parsing nightmare in SearchIO::blast, if we should either require a > minimal BLAST version number for parsing to work in SearchIO::blast. I > could add a '$self->throw("Requires BLAST v2.x.x")' or at least give a > warning if the blast version number is below a minimal version, so at least > people will know what the problem is (not us!). > > The regexes are really piling up, and the latest changes in blastn and > tblastx will require adding a few more. I also think that this would help > remind everybody running the latest Bioperl that there are also newer > versions of BLAST. My current thought is to get it working for the latest > text output from NCBI, check it against the last version of BLAST (v. > 2.2.12, which, luckily, blastcl3 generates), and not worry too much about > older ones. > > Any thoughts on this? > Chris I could live with it. I think most of the world runs on NCBI or WUBLAST and it's easy to download/update either of those. Thanks for the effort. I use SearchIO a lot. Mike > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From MEC at stowers-institute.org Fri Feb 17 13:15:53 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 17 Feb 2006 12:15:53 -0600 Subject: [Bioperl-l] Count or weight matrix in bioperl? Message-ID:
TFBS (http://forkhead.cgb.ki.se/TFBS/) provides the ability to generate a position frequency matrix from a list of (presumably aligned) sequences, as follows:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use TFBS::PatternGen::SimplePFM;

    # read one sequence per line from STDIN or from files named on the command line
    my @sequences = <>;
    chomp @sequences;

    # build the position frequency matrix and print it in raw form
    print TFBS::PatternGen::SimplePFM->new(-seq_list => \@sequences)->pattern->rawprint;
    exit 0;

When run on your example input, the output shows that the nucleotides are not in the order you expect (they are alphabetical, so the rows are A, C, G, T):

    1 0 0
    1 1 2
    0 0 0
    1 2 1

Good luck; TFBS installation requires significant dependencies, including BioPerl and PDL. Malcolm Cook
>-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sam >Al-Droubi >Sent: Friday, February 17, 2006 11:50 AM >To: Torsten Seemann >Cc: BioPerl list >Subject: Re: [Bioperl-l] Count or weight matrix in bioperl? > > >Torsten and all, > > I don't think this will work for me for it only generates >statistics for a single sequence. What I need is a count >matrix for each position for a number of DNA sequences. In >other words, if I pass there 3 sequences to this function then >it returns the count for each postion for each nucleotide. > > For example if I pass an array of sequences say: ATC,CCC,TTT > then I should get a matrix back that will have count for >postion 1,2,3 for each A,C,T, or G like this: > > > 1 2 3 > A 1 0 0 > C 1 1 2 > T 1 2 1 > G 0 0 0 > > Any idea of this is already built somewhere in bioperl? > > Thank you.
>
>
>Torsten Seemann wrote:
>> Say I have an array of nucleotide sequences of length N. I want to
>> calculate the count matrix (weight matrix). That is, for each position
>> 1..N, I want to know how many As, Cs, Ts and Gs there are. Is the code
>> to do this already written in bioperl to build this matrix if I pass it
>> those strings?
>> Please excuse my lack of knowledge as I am a newcomer to bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>Sincerely,
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jason.stajich at duke.edu Fri Feb 17 14:01:45 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 17 Feb 2006 14:01:45 -0500
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk>
References: <43F45FE60200009B00000ED6@gwia.kvl.dk> <43F5A2EA0200009B00000F45@gwia.kvl.dk>
Message-ID: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu>

In case people on the list think that my speaking up about this question means they should ignore it... Hopefully someone else can help debug this; I really don't have time, I'm afraid.

-jason

On Feb 17, 2006, at 4:18 AM, Anders Stegmann wrote:

>
>>>>> Anders Stegmann 02/16/06 11:20 am >>>
> Hi!
>
> I am blasting a protein seq (query) against an identical seq with a
> deletion of Aa nr 61 (subject).
> Then I print out the type of nomatch Aa and its position.
> The nomatch for the query seq is Aa G at position 61, which is
> correct.
> The nomatch for the subject seq is V at position 60, which is
> definitely
> not correct!?
>
> Is this a bug?
>
> testblast2.pl is the program to run
>
> Q0045 is the query seq.
>
> Q0045del61 is the subject seq (it has to be formatted: formatdb -i
> Q0045del61 -p T -o F).
>
> Regards Anders.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu Fri Feb 17 14:17:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Feb 2006 13:17:32 -0600
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu>
Message-ID: <000001c633f6$cd391740$15327e82@pyrimidine>

No, I haven't ignored it. I've just been busy going through SearchIO::blast again (I've perltidy'd it), since BLASTN and TBLASTX output (v2.2.13) don't work; it looks like all the others should. I'm trying to fix one problem at a time. I'll look at this next. Don't worry about it. ;>

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Friday, February 17, 2006 1:02 PM
> To: Anders Stegmann
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] another searchIO bug? with blast report
>
> In case people on the list think that by my speaking up about
> question means they should ignore it...
>
> Hopefully someone else can help debug this - I really don't have time
> I'm afraid.
>
> -jason
>
>
> On Feb 17, 2006, at 4:18 AM, Anders Stegmann wrote:
>
> >
> > >>>> Anders Stegmann 02/16/06 11:20 am >>>
> > Hi!
> >
> > I am blasting a protein seq (query) against an identical seq with a
> > deletion of Aa nr 61 (subject).
> > Then I print out the type of nomatch Aa and its position.
> > The nomatch for the query seq is Aa G at position 61, which is
> > correct.
> > The nomatch for the subject seq is V at position 60, which is
> > definitely
> > not correct!?
> >
> > Is this a bug?
> >
> > testblast2.pl is the program to run
> >
> > Q0045 is the query seq.
> >
> > Q0045del61 is the subject seq (it has to be formatted: formatdb -i
> > Q0045del61 -p T -o F).
> >
> > Regards Anders.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From skirov at utk.edu Fri Feb 17 13:09:00 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Fri, 17 Feb 2006 13:09:00 -0500
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
References: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
Message-ID: <43F6113C.6070501@utk.edu>

If you have bioperl-live, write a file:

>seqgroup1
ATC
CCC
TTT

and read it with Bio::Matrix::PSM::IO:

use strict;
use warnings;
use Bio::Matrix::PSM::IO;

my $filename = 'seqgroup1.masta';    # hypothetical name of the file written above
my $mio = Bio::Matrix::PSM::IO->new(-format => 'masta', -file => $filename);
while (my $matrix = $mio->next_matrix) {
    # returns a Bio::Matrix::PSM::SiteMatrix object;
    # do something with the matrix...
    print $matrix->consensus, "\n";
}

This is not going to give you the raw counts, but it will give you the frequency for each position/letter. See the docs for Bio::Matrix::PSM::SiteMatrix.

Hope this helps
Stefan

Sam Al-Droubi wrote:

>Torsten and all,
>
> I don't think this will work for me, because it only generates statistics for a single sequence. What I need is a count matrix for each position across a number of DNA sequences. In other words, if I pass these 3 sequences to this function, then it returns the count for each position for each nucleotide.
>
> For example, if I pass an array of sequences, say ATC, CCC, TTT, then I should get back a matrix with the count at positions 1, 2, 3 for each of A, C, T, and G, like this:
>
>      1  2  3
>   A  1  0  0
>   C  1  1  2
>   T  1  2  1
>   G  0  0  0
>
> Any idea if this is already built somewhere in bioperl?
>
> Thank you.
>
>
> Torsten Seemann wrote:
>> Say I have an array of nucleotide sequences of length N. I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings?
>>
>> Please excuse my lack of knowledge as I am a newcomer to bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>Sincerely,
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Fri Feb 17 18:02:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Feb 2006 17:02:02 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names orGeneIDs
In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine>
Message-ID: <000601c63416$2a14aa00$15327e82@pyrimidine>

Brian,

I added some sample code to the page. See what you think.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, February 16, 2006 4:46 PM > To: 'Brian Osborne' > Cc: 'Harry Mangalam'; 'bioperl-l' > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names > orGeneIDs > > If I know the start, end, and strand info for a list of features (personal > preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew > up), couldn't I try pulling out the surrounding region? My thought is > this, > though I haven't coded it yet: > > 1) Draw up a list of Seqfeatures, with accession, start, stop coordinates > (array of hashes) based off what I get from RNAMotif objects. > 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream > and downstream, one at a time, using get_Seq_by_ID(). I could add a sleep > in there somewhere to not tick off the NCBI curators. > > Reason I'm interested in this is b/c I want to know where the RNA motif is > in context to surrounding features. If it is very close to a coding > region, > then the motif likely indicates translational regulation. Further away > may > indicate transcriptional termination or another mechanism. > > The files returned should have the features included as long as they are > in > the full length GenBank record. I tried it out using the web form but not > through Bio::DB::GenBank yet. If I can get it to work I'll add it to the > page. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > > -----Original Message----- > > From: Brian Osborne [mailto:osborne1 at optonline.net] > > Sent: Thursday, February 16, 2006 4:19 PM > > To: Chris Fields > > Cc: Harry Mangalam; bioperl-l > > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names > or > > GeneIDs > > > > Chris, > > > > Yes. The question now is where to easily get the coordinates. > > > > Brian O. > > > > > > On 2/16/06 7:52 AM, "Chris Fields" wrote: > > > > > I think a method was recently implemented in Bio::DB::GenBank to > > > retrieve a segment of DNA given start and end coordinates in GenBank > > > format; that should contain the features you need. I requested it > > > ~Nov-Dec in the mailing list but didn't get a chance to test it. > > > Would that help? > > > > > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: > > > > > >> Harry, > > >> > > >> It's not clear to me that NCBI's eutils offers this capability > > >> directly. You > > >> can probably download Entrez Gene entries and parse them for > > >> coordinates but > > >> I know of no way to remotely retrieve genomic sequences like this > > >> from NCBI > > >> (ENSEMBL API perhaps?). What I had in mind uses the local approach > > >> that some > > >> of us favor and to prove to myself that this is simple to do I wrote > a > > >> script that I just added to examples/tools, it's called > > >> extract_genes.pl and > > >> it's based on Bio::DB::Fasta. Download the sequence files for a given > > >> species to some dir, download Entrez Gene's gene2accession file, > > >> and run. It > > >> creates and stores a hash for lookups, it won't read gene2accession > > >> each > > >> time it runs. > > >> > > >> Brian O. > > >> > > >> > > >> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > > >> > > >>> Hi Brian, > > >>> > > >>> Thanks very much for the pointers and the speed of your reply and > > >>> apologies > > >>> for the speed of mine. 
> > >>> > > >>> This looks good, but what I was looking for was a bioP approach > > >>> for hooking to > > >>> an API at NCBI or EBI so I could get this info and seqs from > > >>> them. In this > > >>> case, speed of retrieval is not critical and I'd rather not > > >>> download the > > >>> entirety of the sequences to a local disk to hack at them. > > >>> > > >>> I've determined a screen-scraping approach to get them and could > > >>> script that, > > >>> but I thought that bioP had a method for using NCBI's external > > >>> API's, tho it > > >>> may be that my memory is faulty or the approach is no longer > > >>> supported due to > > >>> overload. > > >>> > > >>> Does NCBI make such APIs available anymore? I searched a bit for > > >>> docs on them > > >>> but couldn't find anything (unless it's buried in the NCBI tookit, > > >>> which I > > >>> haven't started to excavate). > > >>> > > >>> Failing that, would SEALS provide such a service? Any PerlPinipeds > > >>> listening? > > >>> > > >>> Harry > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: > > >>>> Harry, > > >>>> > > >>>> Hope you're doing well. The approach could be based on > > >>>> Bio::DB::Fasta. 
So,
> > >>>> from its documentation:
> > >>>>
> > >>>> use Bio::DB::Fasta;
> > >>>>
> > >>>> # create database from directory of fasta files
> > >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>> # simple access (for those without Bioperl)
> > >>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
> > >>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
> > >>>> my @ids = $db->ids;
> > >>>> my $length = $db->length('CHROMOSOME_I');
> > >>>> my $alphabet = $db->alphabet('CHROMOSOME_I');
> > >>>> my $header = $db->header('CHROMOSOME_I');
> > >>>>
> > >>>> # Bioperl-style access
> > >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
> > >>>> my $seq = $obj->seq;
> > >>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000);
> > >>>>
> > >>>> Do you already have the offsets?
> > >>>>
> > >>>> Brian O.
> > >>>>
> > >>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote:
> > >>>>> Hi All,
> > >>>>>
> > >>>>> After perusing the tutorial and other docs for an evening, I still
> > >>>>> can't find the answer to this. Forgive me if I've missed something
> > >>>>> obvious.
> > >>>>>
> > >>>>> This should not be a novel request, but I've not found it answered. If
> > >>>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
> > >>>>> better way, especially if it includes an illuminating bit of code.
> > >>>>>
> > >>>>> The problem is to retrieve genomic sequences plus & minus some offset
> > >>>>> from a locus determined by HUGO keyword or GeneID. This would be a
> > >>>>> common followup chore for some extra analysis from a gene expression
> > >>>>> expt. Or maybe this is in the DBFetch routines, but I've missed the
> > >>>>> sequence type to specify...?
> > >>>>>
> > >>>>>
> > >>>>> TIA!
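Once the locus coordinates are known, the "plus & minus some offset" part of Harry's question comes down to widening the window and clipping it to the sequence ends before calling $db->seq. A minimal sketch under that assumption; flank_window is a hypothetical helper, not part of Bio::DB::Fasta, and the example numbers are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: extend a locus by $offset bases on each side and clip
# the window to 1..$seq_length (1-based, inclusive coordinates). If the locus
# is given as (end, start), i.e. on the minus strand, the window is returned
# in the same reversed order, matching the reversed-coordinate $db->seq call
# in the Bio::DB::Fasta documentation quoted above.
sub flank_window {
    my ($start, $end, $offset, $seq_length) = @_;
    my $minus = $start > $end;
    my ($lo, $hi) = $minus ? ($end, $start) : ($start, $end);
    $lo -= $offset;
    $hi += $offset;
    $lo = 1           if $lo < 1;
    $hi = $seq_length if $hi > $seq_length;
    return $minus ? ($hi, $lo) : ($lo, $hi);
}

# Made-up example: a locus at 4,000,000..4,100,000 on a 10 Mb sequence,
# plus/minus 5 kb.
my ($from, $to) = flank_window(4_000_000, 4_100_000, 5_000, 10_000_000);
print "$from..$to\n";    # prints 3995000..4105000

# With a Bio::DB::Fasta database ($db) this could then be used as:
#   my ($f, $t) = flank_window($start, $end, $offset, $db->length($id));
#   my $region  = $db->seq($id, $f => $t);
```

Clipping matters near chromosome ends, where an unclipped start below 1 or an end past the sequence length would not be a valid range.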
> > >>
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From osborne1 at optonline.net Fri Feb 17 23:01:14 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:01:14 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID:

Chris,

That's nice. Now what I'm puzzling over is how to get the genomic coordinates given an id, like a Gene id. The raw query is something like:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&rettype=xml

This is _something_ like the queries used within Bio::DB::Query::GenBank, but not exactly. Now I'm taking a look at how the returned text is transformed into objects...

Brian O.

On 2/17/06 6:02 PM, "Chris Fields" wrote:

> Brian,
>
> I added some sample code to the page. See what you think.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept.
of Biochemistry > University of Illinois Urbana-Champaign > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Chris Fields >> Sent: Thursday, February 16, 2006 4:46 PM >> To: 'Brian Osborne' >> Cc: 'Harry Mangalam'; 'bioperl-l' >> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names >> orGeneIDs >> >> If I know the start, end, and strand info for a list of features (personal >> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew >> up), couldn't I try pulling out the surrounding region? My thought is >> this, >> though I haven't coded it yet: >> >> 1) Draw up a list of Seqfeatures, with accession, start, stop coordinates >> (array of hashes) based off what I get from RNAMotif objects. >> 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream >> and downstream, one at a time, using get_Seq_by_ID(). I could add a sleep >> in there somewhere to not tick off the NCBI curators. >> >> Reason I'm interested in this is b/c I want to know where the RNA motif is >> in context to surrounding features. If it is very close to a coding >> region, >> then the motif likely indicates translational regulation. Further away >> may >> indicate transcriptional termination or another mechanism. >> >> The files returned should have the features included as long as they are >> in >> the full length GenBank record. I tried it out using the web form but not >> through Bio::DB::GenBank yet. If I can get it to work I'll add it to the >> page. >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry >> University of Illinois Urbana-Champaign >> >> >>> -----Original Message----- >>> From: Brian Osborne [mailto:osborne1 at optonline.net] >>> Sent: Thursday, February 16, 2006 4:19 PM >>> To: Chris Fields >>> Cc: Harry Mangalam; bioperl-l >>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names >> or >>> GeneIDs >>> >>> Chris, >>> >>> Yes. The question now is where to easily get the coordinates. >>> >>> Brian O. >>> >>> >>> On 2/16/06 7:52 AM, "Chris Fields" wrote: >>> >>>> I think a method was recently implemented in Bio::DB::GenBank to >>>> retrieve a segment of DNA given start and end coordinates in GenBank >>>> format; that should contain the features you need. I requested it >>>> ~Nov-Dec in the mailing list but didn't get a chance to test it. >>>> Would that help? >>>> >>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: >>>> >>>>> Harry, >>>>> >>>>> It's not clear to me that NCBI's eutils offers this capability >>>>> directly. You >>>>> can probably download Entrez Gene entries and parse them for >>>>> coordinates but >>>>> I know of no way to remotely retrieve genomic sequences like this >>>>> from NCBI >>>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach >>>>> that some >>>>> of us favor and to prove to myself that this is simple to do I wrote >> a >>>>> script that I just added to examples/tools, it's called >>>>> extract_genes.pl and >>>>> it's based on Bio::DB::Fasta. Download the sequence files for a given >>>>> species to some dir, download Entrez Gene's gene2accession file, >>>>> and run. It >>>>> creates and stores a hash for lookups, it won't read gene2accession >>>>> each >>>>> time it runs. >>>>> >>>>> Brian O. >>>>> >>>>> >>>>> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks very much for the pointers and the speed of your reply and >>>>>> apologies >>>>>> for the speed of mine. 
>>>>>> >>>>>> This looks good, but what I was looking for was a bioP approach >>>>>> for hooking to >>>>>> an API at NCBI or EBI so I could get this info and seqs from >>>>>> them. In this >>>>>> case, speed of retrieval is not critical and I'd rather not >>>>>> download the >>>>>> entirety of the sequences to a local disk to hack at them. >>>>>> >>>>>> I've determined a screen-scraping approach to get them and could >>>>>> script that, >>>>>> but I thought that bioP had a method for using NCBI's external >>>>>> API's, tho it >>>>>> may be that my memory is faulty or the approach is no longer >>>>>> supported due to >>>>>> overload. >>>>>> >>>>>> Does NCBI make such APIs available anymore? I searched a bit for >>>>>> docs on them >>>>>> but couldn't find anything (unless it's buried in the NCBI tookit, >>>>>> which I >>>>>> haven't started to excavate). >>>>>> >>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds >>>>>> listening? >>>>>> >>>>>> Harry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: >>>>>>> Harry, >>>>>>> >>>>>>> Hope you're doing well. The approach could be based on >>>>>>> Bio::DB::Fasta. 
So, >>>>>>> from its documentation: >>>>>>> >>>>>>> use Bio::DB::Fasta; >>>>>>> >>>>>>> # create database from directory of fasta files >>>>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>>>>> >>>>>>> # simple access (for those without Bioperl) >>>>>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >>>>>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >>>>>>> my @ids = $db->ids; >>>>>>> my $length = $db->length('CHROMOSOME_I'); >>>>>>> my $alphabet = $db->alphabet('CHROMOSOME_I'); >>>>>>> my $header = $db->header('CHROMOSOME_I'); >>>>>>> >>>>>>> # Bioperl-style access >>>>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>>>>> >>>>>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >>>>>>> my $seq = $obj->seq; >>>>>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >>>>>>> >>>>>>> Do you already have the offsets? >>>>>>> >>>>>>> Brian O. >>>>>>> >>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> After perusing the tutorial and other docs for a an evening, I >>>>>>>> still >>>>>>>> can't find the answer to this. Forgive me if I've missed >> something >>>>>>>> obvious. >>>>>>>> >>>>>>>> This should not be a novel request, but I've not found it >>>>>>>> answered. If >>>>>>>> bioperl isn't the best way to do this, I'd be grateful to a >>>>>>>> pointer to a >>>>>>>> better way, especially if it includes an illuminating bit of code. >>>>>>>> >>>>>>>> The problem is to retrieve genomic sequences plus & minus some >>>>>>>> offset >>>>>>>> from a locus determined by HUGO keyword or GeneID. This would be >> a >>>>>>>> common followup chore for some extra analysis from a gene >>>>>>>> expression >>>>>>>> expt. Or maybe this is in the DBFetch routines, but I've missed >>>>>>>> the >>>>>>>> sequence type to specify...? >>>>>>>> >>>>>>>> >>>>>>>> TIA! 
>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From osborne1 at optonline.net Fri Feb 17 23:56:08 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 17 Feb 2006 23:56:08 -0500 Subject: [Bioperl-l] CONTIG sequence files from the NCBI In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: Michael, Yes, BioPerl has done this for you. Essentially what it does it take all the ids in the CONTIG section and query for each individually, then use the sequences and the location data to create the single large sequence. This sequence is appended to the annotation and feature section of the initial Genbank entry. If you want to study this yourself take a look at Bio::DB::NCBIHelper::postprocess_data. OK, to answer your first question with my assumption: what NCBI is doing is simply providing a shorthand rather than an entire large sequence, therefore no feature coordinates change, whether it's shorthand, CONTIG, or longhand, ORIGIN. Second, my explanation tells you that all the sequences are the very latest versions of each sequence, that's how eutils works by default. However, I don't think I've answered your question because I'm not sure I understand what you mean by "when I ask bioperl if these sequences have been updated, I will be told no". 
All Bioperl does is read the file provided by GenBank and use its stated version, nothing fancy.

Brian O.

On 2/16/06 5:31 AM, "michael watson (IAH-C)" wrote:

> Hi
>
> I have two questions really. I fetched bacterial genome sequences from
> the NCBI using Bio::DB::GenBank.
>
> Some of these sequence entries are CONTIG sequences, i.e. they just point
> to other sequences that need to be joined together to form the entire
> genome.
>
> Looking at my downloads, it looks as if bioperl has done all the
> necessary joining for me - or maybe it was the NCBI that did the
> joining?
>
> OK, so firstly, did bioperl do the joining, and if so, are all the
> co-ordinates of the features updated to reflect their new location on
> the new, joined sequence?
>
> And secondly, sequence versions... I'm thinking that possibly the
> sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
> versions of the sequences it refers to might have changed, so when I ask
> bioperl if these sequences have been updated, I will be told no because
> the CONTIG sequence version is 1, but I should be told yes because the
> underlying sequences have...?
>
> Make sense?
>
> Thanks
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From pedro.fabre at gmail.com Fri Feb 17 13:36:37 2006
From: pedro.fabre at gmail.com (pedro fabre)
Date: Fri, 17 Feb 2006 18:36:37 +0000
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID:

>Torsten and all,
>
> I don't think this will work for me, because it only generates
>statistics for a single sequence. What I need is a count matrix for
>each position across a number of DNA sequences. In other words, if I
>pass these 3 sequences to this function, then it returns the count
>for each position for each nucleotide.
>
> For example, if I pass an array of sequences, say ATC, CCC, TTT,
>then I should get back a matrix with the count at positions
>1, 2, 3 for each of A, C, T, and G, like this:
>
>      1  2  3
>   A  1  0  0
>   C  1  1  2
>   T  1  2  1
>   G  0  0  0
>
> Any idea if this is already built somewhere in bioperl?
>
> Thank you.
>
>

Sam,

What about this? I worked on something like that some time ago for SNP calculation, and it looks to me like you are going the same way.

If you have aligned sequences like

A C G T C C A - T
C G G T A G T G C
C C C C C G T G C
C G C T C G T G C

convert them to numbers, reading down each column: 0 for the first base seen, 1 for the first different base, 2 for the second different base, and so on. Deletions can be considered as another base if you like. After that:

0 0 0 0 0 0 0 0 0
1 1 0 0 1 1 1 1 1
1 0 1 1 0 1 1 1 1
1 1 1 0 0 1 1 1 1

Once we have the haplotypes converted to numbers, we have to generate the SNP type information for the haplotypes:

SNP code = SUM ( value * multiplicity ^ position )

where:
SUM is the sum over the haplotypes (rows) for that SNP (column);
value is the SNP number code (0 [generally for the major allele], 1 [for the minor allele]);
multiplicity is the base of the code (2 in this example);
position is the position (row) of the haplotype in the block, counting from 0.

For this example the codes are:

 0  0  0  0  0  0  0  0  0
 1  1  0  0  1  1  1  1  1
 1  0  1  1  0  1  1  1  1
 1  1  1  0  0  1  1  1  1
--------------------------
14 10 12  4  2 14 14 14 14

14 = 0*2^0 + 1*2^1 + 1*2^2 + 1*2^3
10 = 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3
....

Once we have classified the SNPs into families (identical codes), we keep just one SNP per family:

14 10 12 4 2

If you want to look into the code, follow this link:
http://users.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/PopGen/HtSNP.pm?rev=1.4&content-type=text/vnd.viewcvs-markup

HTH
Pedro

> Torsten Seemann wrote:
> >Say I have an array of nucleotide sequences of length N. I want
> >to calculate the count matrix (weight matrix). That is, for each
> >position 1..N, I want to know how many As, Cs, Ts and Gs there are.
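Pedro's coding scheme can be checked mechanically. A minimal sketch, assuming aligned haplotypes of equal length with alleles numbered by first appearance in each column; snp_codes and distinct_codes are hypothetical helpers written for illustration, not the Bio::PopGen::HtSNP implementation:

```perl
use strict;
use warnings;

# Hypothetical sketch of the SNP coding described above. For each column,
# alleles are numbered in order of first appearance (0, 1, 2, ...) and the
# column code is SUM over rows i of value_i * multiplicity**i, where i is
# the haplotype's 0-based row index in the block.
sub snp_codes {
    my ($multiplicity, @haplotypes) = @_;
    die "no haplotypes given\n" unless @haplotypes;
    my @rows = map { [ split //, $_ ] } @haplotypes;
    my $ncol = scalar @{ $rows[0] };
    die "haplotypes differ in length\n" if grep { @$_ != $ncol } @rows;

    my @codes;
    for my $col (0 .. $ncol - 1) {
        my %allele_number;    # base => number, in order of first appearance
        my $code = 0;
        for my $i (0 .. $#rows) {
            my $base = $rows[$i][$col];
            unless (exists $allele_number{$base}) {
                my $next = scalar keys %allele_number;
                $allele_number{$base} = $next;
            }
            $code += $allele_number{$base} * $multiplicity**$i;
        }
        push @codes, $code;
    }
    return @codes;
}

# Keep one representative per family of identical codes, in original order.
sub distinct_codes {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# The four haplotypes from Pedro's example, one string per row.
my @haplotypes = qw(ACGTCCA-T CGGTAGTGC CCCCCGTGC CGCTCGTGC);
my @codes = snp_codes(2, @haplotypes);
print "@codes\n";                                 # 14 10 12 4 2 14 14 14 14
print join(' ', distinct_codes(@codes)), "\n";    # 14 10 12 4 2
```

With multiplicity 2 the codes are only unambiguous for biallelic columns; a column with a third allele (value 2) needs a larger multiplicity, otherwise different columns can end up with the same code.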
> >Is the code to do this already written in bioperl to build this
> >matrix if I pass it those strings?
>> Please excuse my lack of knowledge as I am a newcomer to bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>Sincerely,
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Sat Feb 18 18:35:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 18 Feb 2006 17:35:22 -0600
Subject: [Bioperl-l] Bio::SearchIO fix posted in Bugzilla
Message-ID: <97C946BE-8410-4B7F-9FA3-97A01641E20E@uiuc.edu>

Added a fix for the blastn and tblastx problems with Bio::SearchIO text parsing of BLAST 2.2.13 output:

http://bugzilla.open-bio.org/show_bug.cgi?id=1934

The extra lines "Features in this part of subject sequence" and the descriptive lines that follow them are skipped over using a loop. See the bug report for specifics.

Cheers,

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From osborne1 at optonline.net Sun Feb 19 00:47:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 19 Feb 2006 00:47:44 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID:

Chris and Harry,

OK, I've put the missing link in place. This is Bio::DB::EntrezGene, so you can get NCBI Genes as objects, perfectly analogous to Bio::DB::GenBank and the related modules:

use Bio::DB::EntrezGene;

my $db  = Bio::DB::EntrezGene->new;
my $seq = $db->get_Seq_by_id(2);    # Entrez Gene id 2

So, starting with just a Gene id and then using Bio::DB::GenBank as Chris showed, you can get the sequence.
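Because the location only turns up inside the Evidence Viewer URL of the Entrez Gene record (for example http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559), one way to get from a Gene id to coordinates is to pull the contig, from and to query parameters out of that URL. A minimal sketch, assuming the URL keeps that shape and needs no percent-decoding; evv_location is a hypothetical helper, and getting the URL out of the EntrezGene object is not shown:

```perl
use strict;
use warnings;

# Hypothetical helper: extract the contig accession and coordinates from an
# Entrez Gene "Evidence Viewer" URL. Returns a hash ref, or undef if the URL
# has no query string or lacks contig/from/to. Values are not URL-decoded.
sub evv_location {
    my ($url) = @_;
    my (undef, $query) = split /\?/, $url, 2;
    return unless defined $query;

    my %param;
    for my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        $param{$key} = $value if defined $value;
    }
    return unless defined $param{contig}
               && defined $param{from}
               && defined $param{to};
    return { contig => $param{contig}, from => $param{from}, to => $param{to} };
}

my $url = 'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606'
        . '&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559';
my $loc = evv_location($url);
printf "%s:%d..%d\n", @{$loc}{qw(contig from to)};   # NT_079573.2:6657835..6682559
```

The contig accession and range could then be handed to Bio::DB::GenBank's -seq_start/-seq_stop options, subject to the NT_ restriction in NCBIHelper that Brian raises in the same message.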
What's a little odd is how Entrez Gene stores the positional information and the sequence identifier. You might have thought they'd create a special set of fields for this, but no; as far as I can tell it's only available as part of a URL:

Bio::Annotation::DBLink=HASH()
   '_root_verbose' => 0
   'database' => 'Evidence Viewer'
   'primary_id' => 4693
   'url' => 'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559'

Question: are NT_* sequences going to be a problem for Bio::DB::GenBank? I see this in NCBIHelper:

  # NT contigs can not be retrieved
  $self->throw("NT_ contigs are whole chromosome files which are not part of regular".
    "database distributions. Go to ftp://ftp.ncbi.nih.gov/genomes/.")
    if $ids =~ /NT_/;

Perhaps we can modify this so there's no throw() when a seq_start and seq_stop are specified.

Brian O.

On 2/17/06 6:02 PM, "Chris Fields" wrote:

> Brian,
>
> I added some sample code to the page. See what you think.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 16, 2006 4:46 PM
>> To: 'Brian Osborne'
>> Cc: 'Harry Mangalam'; 'bioperl-l'
>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> orGeneIDs
>>
>> If I know the start, end, and strand info for a list of features (personal
>> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
>> up), couldn't I try pulling out the surrounding region? My thought is
>> this, though I haven't coded it yet:
>>
>> 1) Draw up a list of Seqfeatures, with accession, start, stop coordinates
>> (array of hashes) based off what I get from RNAMotif objects.
>> 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
>> and downstream, one at a time, using get_Seq_by_id(). I could add a sleep
>> in there somewhere to not tick off the NCBI curators.
>>
>> Reason I'm interested in this is b/c I want to know where the RNA motif is
>> in context to surrounding features. If it is very close to a coding region,
>> then the motif likely indicates translational regulation. Further away may
>> indicate transcriptional termination or another mechanism.
>>
>> The files returned should have the features included as long as they are in
>> the full length GenBank record. I tried it out using the web form but not
>> through Bio::DB::GenBank yet. If I can get it to work I'll add it to the
>> page.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>> -----Original Message-----
>>> From: Brian Osborne [mailto:osborne1 at optonline.net]
>>> Sent: Thursday, February 16, 2006 4:19 PM
>>> To: Chris Fields
>>> Cc: Harry Mangalam; bioperl-l
>>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or
>>> GeneIDs
>>>
>>> Chris,
>>>
>>> Yes. The question now is where to easily get the coordinates.
>>>
>>> Brian O.
>>>
>>>
>>> On 2/16/06 7:52 AM, "Chris Fields" wrote:
>>>
>>>> I think a method was recently implemented in Bio::DB::GenBank to
>>>> retrieve a segment of DNA given start and end coordinates in GenBank
>>>> format; that should contain the features you need. I requested it
>>>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>>>> Would that help?
>>>>
>>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>>>
>>>>> Harry,
>>>>>
>>>>> It's not clear to me that NCBI's eutils offers this capability
>>>>> directly.
>>>>> You can probably download Entrez Gene entries and parse them for
>>>>> coordinates, but I know of no way to remotely retrieve genomic sequences
>>>>> like this from NCBI (ENSEMBL API perhaps?). What I had in mind uses the
>>>>> local approach that some of us favor, and to prove to myself that this is
>>>>> simple to do I wrote a script that I just added to examples/tools. It's
>>>>> called extract_genes.pl and it's based on Bio::DB::Fasta. Download the
>>>>> sequence files for a given species to some dir, download Entrez Gene's
>>>>> gene2accession file, and run. It creates and stores a hash for lookups,
>>>>> so it won't read gene2accession each time it runs.
>>>>>
>>>>> Brian O.
>>>>>
>>>>>
>>>>> On 2/14/06 12:15 PM, "Harry Mangalam" wrote:
>>>>>
>>>>>> Hi Brian,
>>>>>>
>>>>>> Thanks very much for the pointers and the speed of your reply, and
>>>>>> apologies for the speed of mine.
>>>>>>
>>>>>> This looks good, but what I was looking for was a bioP approach for
>>>>>> hooking to an API at NCBI or EBI so I could get this info and seqs from
>>>>>> them. In this case, speed of retrieval is not critical and I'd rather
>>>>>> not download the entirety of the sequences to a local disk to hack at
>>>>>> them.
>>>>>>
>>>>>> I've determined a screen-scraping approach to get them and could script
>>>>>> that, but I thought that bioP had a method for using NCBI's external
>>>>>> APIs, tho it may be that my memory is faulty or the approach is no
>>>>>> longer supported due to overload.
>>>>>>
>>>>>> Does NCBI make such APIs available anymore? I searched a bit for docs
>>>>>> on them but couldn't find anything (unless it's buried in the NCBI
>>>>>> toolkit, which I haven't started to excavate).
>>>>>>
>>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>>>> listening?
>>>>>>
>>>>>> Harry
>>>>>>
>>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>>>> Harry,
>>>>>>>
>>>>>>> Hope you're doing well. The approach could be based on Bio::DB::Fasta.
>>>>>>> So, from its documentation:
>>>>>>>
>>>>>>>   use Bio::DB::Fasta;
>>>>>>>
>>>>>>>   # create database from directory of fasta files
>>>>>>>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>>
>>>>>>>   # simple access (for those without Bioperl)
>>>>>>>   my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>>>   my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>>>   my @ids = $db->ids;
>>>>>>>   my $length = $db->length('CHROMOSOME_I');
>>>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>>>   my $header = $db->header('CHROMOSOME_I');
>>>>>>>
>>>>>>>   # Bioperl-style access
>>>>>>>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>>
>>>>>>>   my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>>>   my $seq = $obj->seq;
>>>>>>>   my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>>>>>>>
>>>>>>> Do you already have the offsets?
>>>>>>>
>>>>>>> Brian O.
>>>>>>>
>>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> After perusing the tutorial and other docs for an evening, I still
>>>>>>>> can't find the answer to this. Forgive me if I've missed something
>>>>>>>> obvious.
>>>>>>>>
>>>>>>>> This should not be a novel request, but I've not found it answered.
>>>>>>>> If bioperl isn't the best way to do this, I'd be grateful for a
>>>>>>>> pointer to a better way, especially if it includes an illuminating
>>>>>>>> bit of code.
>>>>>>>>
>>>>>>>> The problem is to retrieve genomic sequences plus & minus some offset
>>>>>>>> from a locus determined by HUGO keyword or GeneID. This would be a
>>>>>>>> common followup chore for some extra analysis from a gene expression
>>>>>>>> expt.
>>>>>>>> Or maybe this is in the DBFetch routines, but I've missed the
>>>>>>>> sequence type to specify...?
>>>>>>>>
>>>>>>>> TIA!
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign

From maximilianh at gmail.com Sun Feb 19 08:52:37 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Sun, 19 Feb 2006 14:52:37 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <76f031ae0602190552v5f2542dbv@mail.gmail.com>

Hi bio-mailinglists,

does anyone here know of a tool or a library to display two (or more) sequences at the same time with coloured features? Possibly with lines connecting some features from one sequence to the other (synteny plot)? Or to display two multiple alignments, one on top of the other, with colored features added?

It's not that it would be difficult to write, but programming visualisation usually takes a lot of time. Bio::Graphics seems mainly concerned with one main sequence and features on it. Well, I could copy together two of these gif images, but then there would be no connecting lines. The same applies to the graphics in Biojava, the gff2ps tool, and all the multiple alignment viewers that I know (Bioedit, ClustalX).
There is something called Toucan in Java, which displays at least several lines of GFF-style features, but no visible sequences and, more importantly, no connecting lines. A recent program, Djinn lite, uses a similar kind of visualization to compare different spliced genes from various species, but it's mainly aimed at splicing and written in Visual Basic. I guess a good compromise might be the 3D viewer Sockeye, but I haven't seen any synteny lines in Sockeye yet.

I guess I must have missed something here. I cannot be the first one who would like to compare, say, two GFF files or two multiple alignments?

Thanks a lot for any idea,
Max

From lutfullah at upesh.edu Sun Feb 19 12:01:05 2006
From: lutfullah at upesh.edu (Dr. Lutfullah)
Date: Sun, 19 Feb 2006 22:01:05 +0500
Subject: [Bioperl-l] bioperl in jail
Message-ID: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>

Hello,

I am trying to create a situation where users can ssh login to a chrooted jailed account with limited functionality.
I created the chroot jail on my Fedora Core 4 installation using a script available at:
http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/

The script has a line:
======================
APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server"
=======================
to which I added everything I could get with /bin/perl to make it:

APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
/usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
/usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"

perl becomes available inside the jail, but I cannot use the line "use Bio::Perl" inside the jail.

The script produces an error on including /usr/lib or /usr/lib/perl5:

Copying necessary library-files to jail (may take some time)
cp: omitting directory `/usr/lib'
ldd: /usr/lib: No such file or directory
Copying files from /etc/pam.d/ to jail
Copying PAM-Modules to jail

In the jailed account the little test program:

use Bio::Perl;
print 2+4;

generated this error:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
............................................

Any help would be much appreciated. Thanks in advance.
LK

From boris.steipe at utoronto.ca Sun Feb 19 17:34:52 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Sun, 19 Feb 2006 17:34:52 -0500
Subject: [Bioperl-l] bioperl in jail
In-Reply-To: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
References: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
Message-ID:

The path that perl uses internally to search its modules (@INC) is not the same thing as the path your shell uses. You have to modify @INC either within running scripts, or by setting the PERL5LIB environment variable upon login.

e.g. see http://modperlbook.org/html/ch03_09.html

HTH,
B.

On 19 Feb 2006, at 12:01, Dr. Lutfullah wrote:

> Hello,
>
> I am trying to create a situation where users can ssh login to a chrooted
> jailed account with limited functionality.
> I created the chroot jail on my Fedora Core 4 installation using a script
> available at:
> http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
> The script has a line:
> ======================
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server"
> =======================
> to which I added everything I could get with /bin/perl to make it:
>
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
> /usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
> /usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"
>
> perl becomes available inside the jail but I cannot use the line "use
> Bio::Perl" inside the jail.
>
> The script produces an error on including /usr/lib or /usr/lib/perl5:
>
> Copying necessary library-files to jail (may take some time)
> cp: omitting directory `/usr/lib'
> ldd: /usr/lib: No such file or directory
> Copying files from /etc/pam.d/ to jail
> Copying PAM-Modules to jail
>
> In the jailed account the little test program:
>
> use Bio::Perl;
> print 2+4;
>
> generated this error:
>
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
> ............................................
>
> Any help would be much appreciated. Thanks in advance.
>
> LK

From khoueiry at ibdm.univ-mrs.fr Mon Feb 20 04:27:07 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Mon, 20 Feb 2006 10:27:07 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <1140427628.10569.10.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060220/fc7e2fc8/attachment.ksh

From shameer at ncbs.res.in Mon Feb 20 01:21:01 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 20 Feb 2006 11:51:01 +0530 (IST)
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>

Hi all,

Is there any program/module to calculate the average of a BLOSUM/PAM (or any other) matrix?

I have a matrix and I need to see the average, for example:

11 22 43 54 50 27 87 74 32 10 66 58 98 78 20 22 23 44 16 34

I have gone through Bio::Matrix::MatrixI and Bio::Matrix::GenericMatrix and other Perl modules like Math::Matrix (http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm) and Math::Cephes::Matrix, but none of them has a provision for calculating a matrix average.

Any help?

Thanks in advance,
Happy biocomputing!
--
Shameer Khadar
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road
Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." MM

From cjfields at uiuc.edu Mon Feb 20 12:01:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 11:01:26 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
Message-ID: <000e01c6363f$494bc5e0$15327e82@pyrimidine>

I have added a preliminary bugfix for the problems seen with nucleotide BLAST parsing of BLAST 2.2.13 reports. I passed SearchIO::blast through perltidy to space out the blocks (really for my own purposes; it's a pretty complex module). The fix bypasses the extra lines output for blastn and tblastx and now seems to parse the text output for those reports correctly.
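The idea behind this fix is to pass over the "Features in/flanking this part of subject sequence" block before looking for the HSP's "Score =" line. The standalone filter below illustrates that idea. It is a sketch, not the patch attached to bug 1934. `strip_feature_blocks` is a hypothetical name, and the filter assumes every feature block ends at the next " Score =" line, as in the sample output Pieter posted.

```perl
use strict;
use warnings;

# Sketch: drop the "Features in/flanking this part of subject sequence"
# blocks that BLAST 2.2.13 blastn/tblastx text reports add before an HSP.
# Assumption: each such block ends at the next " Score = " line.
sub strip_feature_blocks {
    my @kept;
    my $skipping = 0;
    for my $line (@_) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence/) {
            $skipping = 1;    # start of an annotation block: drop it
            next;
        }
        $skipping = 0 if $line =~ /^\s*Score\s*=/;    # HSP starts here
        push @kept, $line unless $skipping;
    }
    return @kept;
}

my @report = (
    "Length=27492551\n",
    "\n",
    " Features flanking this part of subject sequence:\n",
    "   3726 bp at 5' side: transposon protein, putative\n",
    "\n",
    " Score = 36.2 bits (18),  Expect = 0.22\n",
    " Identities = 18/18 (100%), Gaps = 0/18 (0%)\n",
);

print for strip_feature_blocks(@report);
```

Only the Length line, the blank line after it, and the two HSP lines survive. Bio::SearchIO::blast reads the report as a stream rather than as a list of lines, so the real change will look different.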
I tested it using all NCBI BLAST flavors for the last two versions of BLAST (2.2.12 and 2.2.13). The fix was simple, so it shouldn't break any other BLAST report parsing, such as WU-BLAST, RPS-BLAST, or Paracel. It has only been tested on Mac OS X at the moment, so I need people out there to test it on anything they can to make sure it works before committing. I'll be trying it on Windows today. Report back to me and I'll post anything on Bugzilla. Here it is:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> Sent: Thursday, February 16, 2006 3:46 AM
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
>
> Hi,
>
> I have the same problem with the blast.pm file.
> The people at NCBI added some extra info to the BLAST output
> (see e.g. "Features flanking this part..." or "Features in this part
> ..."); example added.
> The blast.pm module starts looking for the HSP alignment information,
> but it dies when it hits this feature information.
>
> Pieter
>
>
> >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> chromosome 12, complete sequence
> Length=27492551
>
>  Features flanking this part of subject sequence:
>    3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
>    2655 bp at 3' side: hypothetical protein
>
>  Score = 36.2 bits (18),  Expect = 0.22
>  Identities = 18/18 (100%), Gaps = 0/18 (0%)
>  Strand=Plus/Minus
>
> Query  4         GTACTACTCTACTCTACT  21
>                  ||||||||||||||||||
> Sbjct  19257436  GTACTACTCTACTCTACT  19257419
>
>
>  Features flanking this part of subject sequence:
>    2991 bp at 5' side: hypothetical protein
>    1131 bp at 3' side: hypothetical protein
>
>  Score = 36.2 bits (18),  Expect = 0.22
>  Identities = 18/18 (100%), Gaps = 0/18 (0%)
>  Strand=Plus/Minus
>
> Query  2         ATGTACTACTCTACTCTA  19
>                  ||||||||||||||||||
> Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
>
>
>  Features in this part of subject sequence:
>    DHHC zinc finger domain, putative
>
>  Score = 34.2 bits (17),  Expect = 0.87
>  Identities = 17/17 (100%), Gaps = 0/17 (0%)
>  Strand=Plus/Plus
>
> Query  5         TACTACTCTACTCTACT  21
>                  |||||||||||||||||
> Sbjct  17616437  TACTACTCTACTCTACT  17616453
>
>
>  Features flanking this part of subject sequence:
>    102 bp at 5' side: bZIP transcription factor, putative
>    3740 bp at 3' side: yeast dcp1, putative
>
>  Score = 32.2 bits (16),  Expect = 3.4
>  Identities = 16/16 (100%), Gaps = 0/16 (0%)
>  Strand=Plus/Plus
>
> Query  7        CTACTCTACTCTACTC  22
>                 ||||||||||||||||
> Sbjct  2775880  CTACTCTACTCTACTC  2775895
>
>
>  Features flanking this part of subject sequence:
>    21 bp at 5' side: peptide transporter T17F3.11,
>    putative
>    10230 bp at 3' side: transposon protein, putative, unclassified
>
>  Score = 32.2 bits (16),  Expect = 3.4
>  Identities = 16/16 (100%), Gaps = 0/16 (0%)
>  Strand=Plus/Minus
>
> Query  7         CTACTCTACTCTACTC  22
>                  ||||||||||||||||
> Sbjct  27323153  CTACTCTACTCTACTC  27323138
>
>
> Guojun Yang wrote:
>
> >Hi, Chris,
> >Finally the remoteblast test script works for the amino.fa query, but
> >when I try a nucleic acid sequence (see below), an error occurs:
> >"
> >waiting........
> >------------- EXCEPTION -------------
> >MSG: no data for midline Features flanking this part of subject sequence:
> >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> >STACK toplevel remoteblast_test:40
> >"
> >The query sequence is:
> >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> >
> >The script (basically the same as the remoteblast test; I only changed
> >the database to 'nr', the program to 'blastn' and the filename to 'ost3'):
> >
> >#!/usr/bin/perl
> >
> >use Bio::SeqIO;
> >use Bio::Seq;
> >use Bio::Tools::Run::RemoteBlast;
> >use Bio::SearchIO;
> >use strict;
> >my $prog='blastn';
> >my $db='nr';
> >my $e_val=1e-10;
> >my @params=( -prog=>$prog,
> >             -data=>$db,
> >             -expect=>$e_val,
> >             -readmethod=>'SearchIO');
> >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> >my $v = 1;
> >
> >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> >
> >while (my $input = $str->next_seq()){
> >  #Blast a sequence against a database:
> >  #Alternatively, you could pass in a file with many
> >  #sequences rather than loop through sequence one at a time
> >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >  #and swap the two lines below for an example of that.
> >  my $r = $factory->submit_blast($input);
> >  #my $r = $factory->submit_blast('amino.fa');
> >  print STDERR "waiting..." if( $v > 0 );
> >  while ( my @rids = $factory->each_rid ) {
> >    foreach my $rid ( @rids ) {
> >      my $rc = $factory->retrieve_blast($rid);
> >      if( !ref($rc) ) {
> >        if( $rc < 0 ) {
> >          $factory->remove_rid($rid);
> >        }
> >        print STDERR "." if ( $v > 0 );
> >        sleep 5;
> >      } else {
> >        my $result = $rc->next_result();
> >        #save the output
> >        my $filename = $result->query_name()."\.out";
> >        $factory->save_output($filename);
> >        $factory->remove_rid($rid);
> >        print "\nQuery Name: ", $result->query_name(), "\n";
> >        while ( my $hit = $result->next_hit ) {
> >          next unless ( $v > 0);
> >          print "\thit name is ", $hit->name, "\n";
> >          while( my $hsp = $hit->next_hsp ) {
> >            print "\t\tscore is ", $hsp->score, "\n";
> >          }
> >        }
> >      }
> >    }
> >  }
> >}
> >
> >
> >Do you think there might still be something in the NCBI output format?
> >
> >Thank you,
> >Guojun
> >
> >
> >Guojun Yang
> >Department of Plant Biology
> >University of Georgia
> >Tel: 706-542-1857
> >Fax: 706-542-1805
> >http://www.arches.uga.edu/~guojun
> >
> >
> >----- Original Message -----
> >From: Chris Fields [mailto:cjfields at uiuc.edu]
> >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> >>Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> >>It could be a perl-related issue. Try the fixes I mentioned and see what
> >>happens.
> >>
> >>Christopher Fields
> >>Postdoctoral Researcher - Switzer Lab
> >>Dept. of Biochemistry
> >>University of Illinois Urbana-Champaign
> >>
> >>>-----Original Message-----
> >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>Sent: Tuesday, February 14, 2006 12:36 PM
> >>>To: 'gyang at plantbio.uga.edu'
> >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >>>
> >>>It's a good habit to always add single quotes around words. The perl
> >>>interpreter may think a single bare word is a subroutine or perlfunc
> >>>called with no args, so it will try to find a subroutine named blastp(). My
> >>>debugger actually gives the error that the bare word blastp may conflict
> >>>with a future reserved word. Like you said, 'use strict' will point that
> >>>out.
> >>>
> >>>As for the regex, it should match all the blast programs at NCBI (blastp,
> >>>blastn, blastx, tblastn, tblastx) and is built in to make sure nothing
> >>>else passes through.
> >>>
> >>>So, if you are using the script below, there are several errors. The bare
> >>>words for $prog and $db need quotes, and the flags for your @params array
> >>>don't have a dash before them.
> >>>I get this after adding quotes but before adding the dashes to @params:
> >>>
> >>>C:\Perl\Scripts>test_blast.pl
> >>>------------- EXCEPTION: Bio::Root::Exception -------------
> >>>MSG:
> >>>STACK: Error::throw
> >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> >>>-----------------------------------------------------------
> >>>
> >>>The last line indicates a problem with this line:
> >>>
> >>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >>>
> >>>Changing the @params to this:
> >>>
> >>>my @params=( -prog=>$prog,
> >>>             -data=>$db,
> >>>             -expect=>$e_val,
> >>>             -readmethod=>'SearchIO');
> >>>
> >>>fixes it, and I get output as expected.
> >>>
> >>>Christopher Fields
> >>>Postdoctoral Researcher - Switzer Lab
> >>>Dept. of Biochemistry
> >>>University of Illinois Urbana-Champaign
> >>>
> >>>>-----Original Message-----
> >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >>>>
> >>>>Hi, Chris,
> >>>>When I tried with the perldoc script, it did not work either. First it
> >>>>says $prog can not be a bare word if I "use strict". I added quotes on the
> >>>>words, then it says the value for $prog does not match expression
> >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The script
> >>>>is shown below. Why is the expression "t?blast[pnx]"?
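The pattern in that error message can be checked on its own, together with Chris's two corrections above: quote the values, and give the option names leading dashes. In the sketch below, `check_prog` is a hypothetical helper. The `^...$` anchors are an assumption about how RemoteBlast applies the pattern, not taken from its source.

```perl
use strict;
use warnings;

# Hypothetical helper applying the pattern from RemoteBlast's error message.
# The ^...$ anchors are an assumption made for this sketch.
sub check_prog {
    my ($prog) = @_;
    return $prog =~ /^t?blast[pnx]$/ ? 1 : 0;
}

# Quoted strings, not barewords: under "use strict", a bare blastp
# is a compile-time error.
my $prog = 'blastp';
my $db   = 'swissprot';

# Option names carry a leading dash, as in the corrected @params.
my @params = (-prog => $prog, -data => $db, -expect => 1e-10, -readmethod => 'SearchIO');
my %opt = @params;
print "options: ", join(', ', sort keys %opt), "\n";

for my $p (qw(blastp blastn blastx tblastn tblastx megablast)) {
    printf "%-9s %s\n", $p, check_prog($p) ? 'accepted' : 'rejected';
}
```

The pattern also accepts `tblastp`, which NCBI does not offer. It is therefore a coarse filter rather than an exact list of program names.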
> >>>>
> >>>>#!/usr/bin/perl
> >>>>
> >>>>use Bio::SeqIO;
> >>>>use Bio::Seq;
> >>>>use Bio::Tools::Run::RemoteBlast;
> >>>>use Bio::SearchIO;
> >>>>
> >>>>
> >>>>my $prog=blastp;
> >>>>my $db=swissprot;
> >>>>my $e_val=1e-10;
> >>>>my @params=( prog=>$prog,
> >>>>             data=>$db,
> >>>>             expect=>$e_val,
> >>>>             readmethod=>'SearchIO');
> >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>
> >>>>my $v = 1;
> >>>>
> >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >>>>
> >>>>while (my $input = $str->next_seq()){
> >>>>  #Blast a sequence against a database:
> >>>>  #Alternatively, you could pass in a file with many
> >>>>  #sequences rather than loop through sequence one at a time
> >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >>>>  #and swap the two lines below for an example of that.
> >>>>  my $r = $factory->submit_blast($input);
> >>>>  #my $r = $factory->submit_blast('amino.fa');
> >>>>  print STDERR "waiting..." if( $v > 0 );
> >>>>  while ( my @rids = $factory->each_rid ) {
> >>>>    foreach my $rid ( @rids ) {
> >>>>      my $rc = $factory->retrieve_blast($rid);
> >>>>      if( !ref($rc) ) {
> >>>>        if( $rc < 0 ) {
> >>>>          $factory->remove_rid($rid);
> >>>>        }
> >>>>        print STDERR "." if ( $v > 0 );
> >>>>        sleep 5;
> >>>>      } else {
> >>>>        my $result = $rc->next_result();
> >>>>        #save the output
> >>>>        my $filename = $result->query_name()."\.out";
> >>>>        $factory->save_output($filename);
> >>>>        $factory->remove_rid($rid);
> >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>        while ( my $hit = $result->next_hit ) {
> >>>>          next unless ( $v > 0);
> >>>>          print "\thit name is ", $hit->name, "\n";
> >>>>          while( my $hsp = $hit->next_hsp ) {
> >>>>            print "\t\tscore is ", $hsp->score, "\n";
> >>>>          }
> >>>>        }
> >>>>      }
> >>>>    }
> >>>>  }
> >>>>}
> >>>>
> >>>>Thank you for your help!
> >>>>
> >>>>Guojun
> >>>>Department of Plant Biology
> >>>>University of Georgia
> >>>>
> >>>>----- Original Message -----
> >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>To: gyang at plantbio.uga.edu
> >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>
> >>>>>Try two things:
> >>>>>
> >>>>>1) Use a much simpler script, like the one in 'perldoc
> >>>>>Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something wrong
> >>>>>with the logic in your subroutine:
> >>>>>
> >>>>>my $v = 1;
> >>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >>>>>while (my $input = $str->next_seq()){
> >>>>>  #Blast a sequence against a database:
> >>>>>  #Alternatively, you could pass in a file with many
> >>>>>  #sequences rather than loop through sequence one at a time
> >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >>>>>  #and swap the two lines below for an example of that.
> >>>>>  my $r = $factory->submit_blast($input);
> >>>>>  #my $r = $factory->submit_blast('amino.fa');
> >>>>>  print STDERR "waiting..." if( $v > 0 );
> >>>>>  while ( my @rids = $factory->each_rid ) {
> >>>>>    foreach my $rid ( @rids ) {
> >>>>>      my $rc = $factory->retrieve_blast($rid);
> >>>>>      if( !ref($rc) ) {
> >>>>>        if( $rc < 0 ) {
> >>>>>          $factory->remove_rid($rid);
> >>>>>        }
> >>>>>        print STDERR "." if ( $v > 0 );
> >>>>>        sleep 5;
> >>>>>      } else {
> >>>>>        my $result = $rc->next_result();
> >>>>>        #save the output
> >>>>>        my $filename = $result->query_name()."\.out";
> >>>>>        $factory->save_output($filename);
> >>>>>        $factory->remove_rid($rid);
> >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>        while ( my $hit = $result->next_hit ) {
> >>>>>          next unless ( $v > 0);
> >>>>>          print "\thit name is ", $hit->name, "\n";
> >>>>>          while( my $hsp = $hit->next_hsp ) {
> >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> >>>>>          }
> >>>>>        }
> >>>>>      }
> >>>>>    }
> >>>>>  }
> >>>>>}
> >>>>>
> >>>>>2) Try the RemoteBlast from Bugzilla and see if that works. It really
> >>>>>shouldn't make that much of a difference, but I noticed that the CVS
> >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> >>>>>released; the Bugzilla version is based off CVS.
> >>>>>
> >>>>>Christopher Fields
> >>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>Dept. of Biochemistry
> >>>>>University of Illinois Urbana-Champaign
> >>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> >>>>>>To: bioperl-l at lists.open-bio.org
> >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>Thanks, Chris,
> >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the one from
> >>>>>>your bug report. The running version is 1.5 when I use the command you
> >>>>>>sent me. But when I tried the script, it doesn't change much.
> >>>>>>My remoteblast code (portion) is here:
> >>>>>>
> >>>>>>sub search {
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> >>>>>>                              -id=>"query",
> >>>>>>                              -desc=>"new seq");
> >>>>>>my $len=$query->length();
> >>>>>>@db=('nr','htgs','wgs');
> >>>>>>foreach my $db (@db) {
> >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> >>>>>>                                                '-data' =>"$db",
> >>>>>>                                                '-expect'=>"$E_value");
> >>>>>>my $blast_report = $factory->submit_blast($query);
> >>>>>>my @rids = $factory->each_rid();
> >>>>>>foreach my $rid ( @rids ) {
> >>>>>>  print STDERR "$rid\n";
> >>>>>>}
> >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> >>>>>>print STDERR "waiting...";
> >>>>>>sleep 60;
> >>>>>>
> >>>>>>foreach my $rid ( @rids ) {
> >>>>>>  my $rc = $factory->retrieve_blast($rid);
> >>>>>>  while (!ref($rc) ) {
> >>>>>>    if( $rc < 0 ) {
> >>>>>>      # retrieve_blast returns -1 on error
> >>>>>>      $factory->remove_rid($rid);
> >>>>>>      print "Error!\n";
> >>>>>>      send_error($email,$function,$seqname,$queryname[$ST]);
> >>>>>>      die "Can't retrieve $rid";
> >>>>>>    } if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> >>>>>>      sleep 60;
> >>>>>>      $rc = $factory->retrieve_blast($rid);
> >>>>>>    }
> >>>>>>  }
> >>>>>>  if (ref($rc)) {
> >>>>>>    print STDERR "Done.\n";
> >>>>>>    while( my $result = $rc->next_result) {
> >>>>>>      while( my $hit = $result->next_hit()) {
> >>>>>>        $hit_name=$hit->name;
> >>>>>>        $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> >>>>>>        $name=$1;
> >>>>>>        @left_plus_start=();
> >>>>>>        @left_plus_end=();
> >>>>>>        @left_minus_start=();
> >>>>>>        @left_minus_end=();
> >>>>>>        @right_plus_start=();
> >>>>>>        @right_plus_end=();
> >>>>>>        @right_minus_start=();
> >>>>>>        @right_minus_end=();
> >>>>>>        if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> >>>>>>          while( my $hsp = $hit->next_hsp()) {
> >>>>>>......
> >>>>>>
> >>>>>>It was working quite well before around October last year, but it has
> >>>>>>stopped since then. When a submission is sent via a webpage, the cgi
> >>>>>>starts to work and uses ~20 Mb of memory. Then it hangs there; finally
> >>>>>>the expected email is received, but without real results, although it does
> >>>>>>contain something from other parts of the script. Apparently the search
> >>>>>>sub did not return anything (I know there is something that should be
> >>>>>>returned). Is it also possible the format of the NCBI output for each
> >>>>>>result has changed?
> >>>>>>Thank you,
> >>>>>>Guojun
> >>>>>>
> >>>>>>Department of Plant Biology
> >>>>>>University of Georgia
> >>>>>>
> >>>>>>----- Original Message -----
> >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>>How do you know two versions are installed (i.e. how are you checking the
> >>>>>>>version)?
Do you see have two complete bioperl distributions (in > >>>>>>> > >>>>>>> > >>>>two > >>>> > >>>> > >>>>>>>separate directories) or are you looking in modules? Here's the > >>>>>>> > >>>>>>> > >>>way > >>> > >>> > >>>>to > >>>> > >>>> > >>>>>>>check the version (from the FAQ): > >>>>>>> > >>>>>>> > >>>>>>>>perl -MBio::Root::Version -e 'print > >>>>>>>> > >>>>>>>> > >>>>$Bio::Root::Version::VERSION,"\n"' > >>>> > >>>> > >>>>>>>>If you have two full bioperl distributions on your computer, > >>>>>>>> > >>>>>>>> > >>>>normally > >>>> > >>>> > >>>>>>only > >>>>>> > >>>>>> > >>>>>>>one will be in use unless you have explicitly set the environment > >>>>>>> > >>>>>>> > >>>>>>variable > >>>>>> > >>>>>> > >>>>>>>PERL5LIB. The PERL5LIB directories will be searched first before > >>>>>>> > >>>>>>> > >>>>your > >>>> > >>>> > >>>>>>>normal perl directory list (@INC) is searched. You MAY get some > >>>>>>> > >>>>>>> > >>>>mixing > >>>> > >>>> > >>>>>>>then, but only if perl can't find a particular module in the path > >>>>>>> > >>>>>>> > >>>>>>designated > >>>>>> > >>>>>> > >>>>>>>in PERL5LIB; then it will progress through the directories listed > >>>>>>> > >>>>>>> > >>>in > >>> > >>> > >>>>>>@INC. > >>>>>> > >>>>>> > >>>>>>>This may happen if a module is unique to a particular release, but > >>>>>>> > >>>>>>> > >>>>>>shouldn't > >>>>>> > >>>>>> > >>>>>>>happen for the majority of modules, including RemoteBlast. You > >>>>>>> > >>>>>>> > >>>can > >>> > >>> > >>>>>>check > >>>>>> > >>>>>> > >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'. @INC will > >>>>>>> > >>>>>>> > >>>>differ > >>>> > >>>> > >>>>>>>depending on your OS, perl build, etc. 
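Beyond 'perl -V', perl also records in %INC the exact file it loaded for each module, which answers "which RemoteBlast.pm am I really using?" directly. The sketch below is not part of BioPerl (module_location is a hypothetical helper name); it assumes each module can be loaded with a plain require, prints the @INC search order, and reports the load path and $VERSION of each module named on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (path, version) for a module as perl actually loads it, or an
# empty list if it cannot be loaded. module_location is a hypothetical
# helper name, not a BioPerl function.
sub module_location {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;    # Bio::SearchIO::blast -> Bio/SearchIO/blast
    $file .= '.pm';
    eval { require $file; 1 } or return;  # missing, or failed to compile
    no strict 'refs';                     # allow the symbolic ${"Pkg::VERSION"} lookup
    my $version = ${"${module}::VERSION"};
    return ($INC{$file}, $version);
}

# Show the search order perl uses; PERL5LIB entries come first in @INC.
print 'PERL5LIB: ', (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
print "\@INC search order:\n";
print "  $_\n" for @INC;

# Default to a core module so the script runs even without BioPerl installed.
my @modules = @ARGV ? @ARGV : ('List::Util');
for my $m (@modules) {
    my ($path, $version) = module_location($m);
    if (defined $path) {
        printf "%s %s loaded from %s\n", $m,
            (defined $version ? $version : '(no $VERSION)'), $path;
    } else {
        print "$m could not be loaded\n";
    }
}
```

For example, run it as `perl whichmod.pl Bio::Tools::Run::RemoteBlast Bio::SearchIO::blast` (the script name is arbitrary). For a CGI, run it under the same user and environment as the web server, since the server's PERL5LIB may differ from your shell's.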
>
> Regardless, if you follow the directions for installing bioperl for your
> system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
> explicitly change the installation directory when using 'perl
> Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem, as it
> will install the Bioperl distribution you downloaded over the old version
> in @INC. See this page:
>
> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>
> for more details.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Monday, February 13, 2006 12:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>
> Hi, Chris,
>
> I do have different versions of bioperl on my Linux machine (1.4 and
> 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do
> I need to uninstall and remove the previous versions? I could not find any
> hint on uninstalling bioperl on Linux. Could you please give me some
> suggestions?
> Thanks,
> Guojun
>
> Department of Plant Biology
> University of Georgia
> _____
>
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Sent: Mon, 13 Feb 2006 11:45:14 -0500
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>
> If you're using RemoteBlast 1.28, then you've likely updated from CVS,
> which isn't the latest fix.
>
> Make sure that you check the following:
>
> 1) Always post to the mailing list:
> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
>
> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> installed first. Perform a clean installation; do not upgrade only
> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> guarantee that mixing modules from old and new distributions (1.4 and
> 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live
> installation will allow text output from BLAST v.2.2.12 to be saved and
> parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13),
> but it should still save it. I believe as long as next_result() isn't
> called, it will work.
>
> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
> are NOT in CVS; they haven't been cleared and checked in by Roger Hall
> (who's now taking care of RemoteBlast) and the powers that be (Jason or
> whoever is in charge of Bio::SearchIO). They can be found in Bugzilla:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>
> The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving
> XML output, so it isn't necessary if you don't plan on using this option.
> And, remember, they haven't been committed yet to CVS, which means that the
> final version will change to reflect the new version.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> _____
>
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Monday, February 13, 2006 9:26 AM
> To: Chris Fields
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>
> Hi, Chris
>
> Thanks for your suggestion; however, it doesn't seem to work for my CGI
> even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
> any RID. Is there any suggestion?
>
> Guojun
>
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> _____
>
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> Sent: Fri, 03 Feb 2006 16:07:29 -0500
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>
> I would say give the new code a try, but realize that it hasn't been
> checked in (like I said below). I will try going over the modified
> Bio::SearchIO::blast again this weekend to see if there is anything I might
> have missed. The changed order in the header of BLAST text output has me a
> bit worried that it might not catch everything, but it at least doesn't
> hang in the while() loop I described in the bug report below (bug #1934)
> and seems to process everything fine.
>
> If you want more stability in the code, you might consider changing over
> to XML output and parsing with Bio::SearchIO::blastxml. There are some
> changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving
> XML output, but I believe it parses everything regardless. If you look back
> over the last month or so, there has been a bit of discussion here about
> it. Jason describes a bit on how to set up RemoteBlast for XML:
>
> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>
> Hi, Everybody,
> I see this post and am wondering if this is the reason for the
> malfunctioning of my webserver. We set up a webserver named MAK for MITE
> sequence analysis. It was working very well until around November 2005,
> when it stopped returning any result (the site is fine and seems to be
> doing something after submission). In the CGI script, I used remoteblast
> (that work was done in 2003) to do searches. I currently do not have access
> to the server because I moved. Quite a few people have sent emails to us
> about its malfunctioning. Is there any suggestion on fixing the problem?
> Should I simply ask that remoteblast.pm be replaced with the new version?
>
> Thanks a lot,
> Guojun
>
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> _____
>
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will
> work for saving text output. However, it will not parse anything using
> next_result (it will likely hang) and will not save XML format. See these
> bugs:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet, so they
> are still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
>
> Nagesh, I remember you posting to the list a month ago using a script which
> had problems; the script you used saves the output but doesn't actually
> parse it (i.e. you don't use next_result() to go through the data). Is the
> version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> blast" using your version of RemoteBlast and method next_result()? Like
> below (from perldoc):
>
> while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>         my $rc = $factory->retrieve_blast($rid);
>         if( !ref($rc) ) {
>             if( $rc < 0 ) {
>                 $factory->remove_rid($rid);
>             }
>             print STDERR "." if ( $v > 0 );
>             sleep 5;
>         } else {   # parsing starts here
>             my $result = $rc->next_result();   # it should hang here
>             #save the output
>             my $filename = $result->query_name()."\.out";
>             $factory->save_output($filename);
>             $factory->remove_rid($rid);
>             print "\nQuery Name: ", $result->query_name(), "\n";
>             while ( my $hit = $result->next_hit ) {
>                 next unless ( $v > 0);
>                 print "\thit name is ", $hit->name, "\n";
>                 while( my $hsp = $hit->next_hsp ) {
>                     print "\t\tscore is ", $hsp->score, "\n";
>                 }
>             }
>         }
>     }
> }
>
> My script hung if I used next_result() in any way prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> Sent: Thursday, February 02, 2006 7:24 PM
> To: Huang Jian; bioperl-l
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>
> Hi Huang,
> Thanks for the message. The older version of RemoteBlast.pm works on the
> logic of checking the temporary file size to determine whether the Blast
> results are ready. This condition is not getting satisfied, maybe due to
> some changes brought about by NCBI. I had this problem recently and figured
> out that the solution was to use the latest version, which has this problem
> fixed (it does not use the file size logic any more) but is not yet
> included in the BioPerl package.
> Cheers
> Nagesh
>
> Huang Jian wrote:
>
> Dear Nagesh,
>
> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
> me. Now it works perfectly!!!
>
> Thank you!!
>
> Huang
>
> ----- Original Message -----
> From: "Nagesh Chakka"
> To: "Huang Jian" ; "bioperl-l"
> Sent: Friday, February 03, 2006 7:48 AM
> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still via email
>
> Hi Huang,
> I see that you are submitting a sequence for a remote blast search. Can
> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If not,
> I have attached it to this email; try replacing your old one, which has a
> bug, with it.
> Let me know if it works.
> Nagesh
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
From valiente at lsi.upc.edu  Mon Feb 20 13:51:35 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 20 Feb 2006 19:51:35 +0100
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <43FA0FB7.6060904@lsi.upc.edu>

The local flat file implementation of Bio::DB::Taxonomy seems to be fine:

use Bio::DB::Taxonomy;
my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => $nodesfile,
                                -namesfile => $namesfile);
my $taxonid = $db->get_taxonid('Homo sapiens');

Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

raises:

-------------------- WARNING ---------------------
MSG: can't create a species object for Homo sapiens (human) because it isn't a species but is a '' instead
---------------------------------------------------

Thanks,
Gabriel
From boris.steipe at utoronto.ca  Mon Feb 20 13:40:19 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 20 Feb 2006 13:40:19 -0500
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <76f031ae0602190552v5f2542dbv@mail.gmail.com> <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
Message-ID: <92CF0104-0524-4BA3-B039-3CEECF68E20B@utoronto.ca>

Assuming you mean the arithmetic average of all elements in a matrix, you could do the following (using your numbers):

#!/usr/bin/perl -w
use strict;

my @matrix;
push(@matrix, [(11,22,43,54,50)]); # [(...)] :a list passed as an anonymous array
push(@matrix, [(27,87,74,32,10)]);
push(@matrix, [(66,58,98,78,20)]);
push(@matrix, [(22,23,44,16,34)]);

my $sum = 0;
my $number = 0;
foreach my $row (@matrix) {
    foreach my $element (@{$row}) {
        $sum += $element;
        $number++;
    }
}
print "Average of $number elements = ", $sum/$number, "\n";

exit;

HTH,
B.

On 20 Feb 2006, at 01:21, Shameer Khadar wrote:

> Hi all,
> Is there any program/module to calculate the average of a blosum/pam (or any) matrix?
>
> I have a matrix and I need to see the average.
>
> For example:
>
> 11 22 43 54 50
> 27 87 74 32 10
> 66 58 98 78 20
> 22 23 44 16 34
>
> I have gone through Bio::Matrix::MatrixI and Bio::Matrix::GenericMatrix
> and other perl modules like Math::Matrix
> http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm
> and Math::Cephes::Matrix, but none of them have a provision to do matrix
> average calculation.
>
> Any help ???
> thanks in advance,
> Happy biocomputing !!!
>
> --
> Shameer Khadar
> National Centre for Biological Sciences (TIFR)
> UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
> T - 91-080-23636420-32 EXT 4241
> F - 91-080-23636662/23636675
> W - http://www.ncbs.res.in
> --------------------------------------------------
> "Refrain from illusions, insist on work and not words,
> patiently seek divine and scientific truth."
> MM
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
From cjfields at uiuc.edu  Mon Feb 20 17:01:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 16:01:15 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pmversion 1.28
In-Reply-To: <000e01c6363f$494bc5e0$15327e82@pyrimidine>
Message-ID: <000001c63669$2bf06a80$15327e82@pyrimidine>

Guojun Yang pointed out that his BLAST output was still not parsed correctly, so I posted another change:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

The direct link for the module is:

http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view

Note that all caveats (can't sue if computer blows up, this is a very preliminary bugfix, etc.) apply.
Apparently, NCBI has changed blastn and tblastx output to show features in the region for each HSP, starting with either one of the following lines:

Features in this part of subject sequence:
Features flanking this part of subject sequence:

If you're using a Bio::SearchIO::blast from before Dec. 2005 on BLAST 2.2.13 output, parsing of most blastn or tblastx reports seems to choke on these lines unless you are pretty lucky. This extra little feature was introduced a while back for large contigs and chromosomes (~BLAST 2.2.10) but was not set by default and hadn't started affecting web output until this last fall. The first fix I posted caught only the first of these lines but not the second. The new fix includes a loop with debugging output to bypass these lines for now. If you use SearchIO directly for parsing (not through RemoteBlast), you can see the bypassed lines by setting the '-verbose' flag to 1.

Thanks to Guojun Yang for pointing this out.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Monday, February 20, 2006 11:01 AM
> To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pmversion 1.28
>
> I have added a preliminary bugfix for the problems seen with nucleotide
> blast parsing for BLAST 2.2.13 reports. I passed SearchIO::blast through
> perltidy to space out the blocks (really for my own purposes; it's a pretty
> complex module). The fix bypasses the extra lines output for blastn and
> tblastx and now seems to parse the text output for those reports correctly.
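As a stopgap until a parser fix is installed, one could also strip those feature blocks from a saved text report before handing it to the parser. This is a minimal sketch, not the Bugzilla #1934 patch; it assumes each block starts with one of the two header lines quoted above, is followed by indented feature descriptions, and ends at the next blank line. Check that assumption against your own saved reports before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: a feature block is one of these header lines, followed by
# feature descriptions, and ends at the next blank line.
my $FEATURE_HEADER =
    qr/^\s*Features (?:in|flanking) this part of subject sequence:/;

# Return the report text with each feature block (and the blank line that
# closes it) removed; every other line is passed through unchanged.
# Note: a block with no closing blank line swallows the rest of the text.
sub strip_feature_blocks {
    my ($text) = @_;
    my @kept;
    my $in_block = 0;
    for my $line (split /\n/, $text, -1) {     # -1 keeps trailing empty fields
        if ($line =~ $FEATURE_HEADER) {
            $in_block = 1;                      # drop the header line
            next;
        }
        if ($in_block) {
            $in_block = 0 if $line =~ /^\s*$/;  # blank line closes the block
            next;                               # drop descriptions and the closing blank
        }
        push @kept, $line;
    }
    return join "\n", @kept;
}

# Filter a saved report named on the command line (if any) to STDOUT.
if (@ARGV) {
    local $/;                                   # slurp the whole file
    my $report = <>;
    print strip_feature_blocks($report);
}
```

For example, save the report with $factory->save_output($filename) as in the scripts above, run `perl strip_features.pl report.out > clean.out`, and parse clean.out with `Bio::SearchIO->new(-format => 'blast', -file => 'clean.out')`. Working line by line keeps the filter independent of SearchIO internals; the trade-off is that the feature annotations are simply discarded.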
>
> I tested it using all NCBI BLAST flavors for the last two versions of BLAST
> (2.2.12 and 2.2.13); the fix was simple, so it shouldn't break any other
> BLAST report parsing, such as WU-BLAST, RPS-BLAST, or Paracel. It has only
> been tested on MacOSX at the moment, so I need people out there to test it
> on anything they can to make sure it works before committing. I'll be
> trying it on Windows today. Report back to me and I'll post anything on
> bugzilla.
>
> Here it is:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Bioch