From Russell.Smithies at agresearch.co.nz Sun Mar 1 18:06:22 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 2 Mar 2009 12:06:22 +1300 Subject: [Bioperl-l] Remote Blast and Report In-Reply-To: References: <1F1240778FB0AF46B4E5A72C44D2C74722EB61D7@exch1-hi.accelrys.net> Message-ID: <18DF7D20DFEC044098A1062202F5FFF321B220D7C7@exchsth.agresearch.co.nz> Works fine for me using the code below. Took about 30 seconds to return a result. ============================================== #!perl -w use Bio::Tools::Run::RemoteBlast; use Bio::SearchIO; use Data::Dumper; #Here i set the parameters for blast $prog = "tblastx"; $db = "nr"; $e_val = "1e-10"; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO' ); my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); #Select the file and make the balst. $infile = 'infile.fasta'; $r = $remoteBlast->submit_blast($infile); my $v = 1; print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS TO RETURN!!!!! while ( my @rids = $remoteBlast->each_rid ) { foreach my $rid ( @rids ) { my $rc = $remoteBlast->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $remoteBlast->remove_rid($rid); } print STDERR "." if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output my $filename = $result->query_name()."\.out"; $remoteBlast->save_output($filename); $remoteBlast->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { next unless ( $v > 0); print "\thit name is ", $hit->name, "\n"; while( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } } } } ================================================================= > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Ocar Campos > Sent: Friday, 27 February 2009 9:29 a.m. 
> To: Scott Markel > Cc: Bioperl Mailing List. > Subject: Re: [Bioperl-l] Remote Blast and Report > > Hello, > > I was reading the documentation, and I tried some new code, but when > retrieving I get an error, this is the error: Can't call method "query_name" > on an undefined value at ./aer2.pl line 44, line 185. I'm working > with only one sequence first, but then I am suppose to work with more than > 50 sequences. Here is my code, that looks quite much as the one in the > documentation and some examples I found. Any idea or help that you could > give me please? > > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > > #Here i set the parameters for blast > $prog = "tblastx"; > $db = "nr"; > $e_val = "1e-10"; > $remoteBlast = Bio::Tools::Run::RemoteBlast->new(-prog => $prog, > -data => $db, > -expect => $e_val > -readmethod => 'txt' > ); > > #Select the file and make the balst. > $infile = infile.fasta'; > $r = $remoteBlast->submit_blast($infile); > > #Here i was suppose to get the blast report. > while (@reqIDs = $remoteBlast->each_rid ) > { > print STDERR join(" ", "\nINFO RIDs: ", @reqIDs), "\n"; > > foreach $reqID (@reqIDs) > { > $rc = $remoteBlast->retrieve_blast($reqID); #With this I should get the > report. > if( !ref($rc) ) > { > if( $rc < 0 ) { #If there's no hits. > $remoteBlast->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else > { > $result = $rc->next_result(); > $filename = $result->query_name()."\.out"; #this should save the > report, but here i get error. > $remoteBlast->save_output($filename); > $remoteBlast->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( $hit = $result->next_hit ) > { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( $hsp = $hit->next_hsp ) > { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > > > } > > } > > > 2009/2/25 Scott Markel > > > O'car, > > > > There's a polling mechanism you need to use. 
See the example in the > > Bio::Tools::Run::RemoteBlast module. Start looking around line 60. > > > > Scott > > > > Scott Markel, Ph.D. > > Principal Bioinformatics Architect email: smarkel at accelrys.com > > Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > > San Diego, CA 92121 fax: +1 858 799 5222 > > USA web: http://www.accelrys.com > > > > http://www.linkedin.com/in/smarkel > > Vice President, Board of Directors: > > International Society for Computational Biology > > Co-chair: ISCB Publications Committee > > Associate Editor: PLoS Computational Biology > > Editorial Board: Briefings in Bioinformatics > > > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Ocar Campos > > > Sent: Wednesday, 25 February 2009 4:04 PM > > > To: Bioperl Mailing List. > > > Subject: [Bioperl-l] Remote Blast and Report > > > > > > Hello: > > > > > > I'm working in a script to remote blast a file with some sequences, I > > > already got the part of sending the query to blast, but I do not get the > > > idea of how retrieve a txt report, I mean, like the one you get by > > running > > > a > > > blast via web and you can read in a plane text editor. > > > > > > This is what I've done so far: > > > > > > > > > use Bio::Tools::Run::RemoteBlast; > > > use Bio::SearchIO; > > > > > > $prog = "tblastx"; > > > $db = "nr"; > > > $e_val = "1e-10"; > > > $remoteBlast = Bio::Tools::Run::RemoteBlast->new(-prog => $prog, > > > -data => $db, > > > -expect => $e_val > > > -readmethod => 'Blast'); > > > > > > #I select the file to make que query and do the blast. > > > $infile = 'file.input.fasta'; > > > $r = $remoteBlast->submit_blast($infile); > > > > > > #this should be the report i get. 
> > > $outfile = 'got.output';
> > >
> > > further than this I've tried some things but none of them work, anybody
> > > who
> > > could give an idea of how retrieving the plane text reports please?
> > >
> > > Cheers.
> > >
> > > O'car
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> --
> ...the pain is momentary, the glory is forever...
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From ocarnorsk138 at gmail.com Sun Mar 1 19:51:01 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Sun, 1 Mar 2009 21:51:01 -0300
Subject: [Bioperl-l] Remote Blast and Report
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321B220D7C7@exchsth.agresearch.co.nz>
References: <1F1240778FB0AF46B4E5A72C44D2C74722EB61D7@exch1-hi.accelrys.net> <18DF7D20DFEC044098A1062202F5FFF321B220D7C7@exchsth.agresearch.co.nz>
Message-ID:

Hello,

I tried your script, Russell, but I still got the same error in the console:
"Can't call method "query_name" on an undefined value at ./aer2.pl line 39, line 185."
I didn't change anything in it; I just copied, pasted and ran it.
So what I assumed is that the object for the SearchIO module was not initialized, so I created it, but now I get an Exeption while parsing the report: ------------- EXCEPTION ------------- MSG: Could not open Bio::SearchIO::blast=HASH(0x8bb79bc): Doesn't exist the file or directory. STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.8/Bio/Root/IO.pm:273 STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.8/Bio/Root/IO.pm:213 STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO.pm:135 STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO.pm:167 STACK toplevel ./aer2.pl:45 -------------------------------------- aer2.pl is my script, Any Idea what it could be? The script: #!/usr/bin/perl use Bio::Tools::Run::RemoteBlast; use Bio::SearchIO; use Data::Dumper; ########HERE I SET THE PARAMETERS $prog = "tblastx"; $db = "nr"; $e_val = "1e-10"; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO' ); my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); ########SELECT FILE AND RUN THE BLAST. $infile = 'secuencia.fasta'; $r = $remoteBlast->submit_blast($infile); my $v = 1; print STDERR "waiting...\n" if( $v > 0 ); ######## WAIT FOR THE RESULTS TO RETURN!!!!! while ( my @rids = $remoteBlast->each_rid ) { foreach my $rid ( @rids ) { my $rc = $remoteBlast->retrieve_blast($rid); ###I RETRIEVE THE REPORT. print $rc, "\n"; ########ONLY FOR CHECKING if( !ref($rc) ) { if( $rc < 0 ) { $remoteBlast->remove_rid($rid); } print STDERR "." 
if ( $v > 0 );
      sleep 5;
    }
    else
    {
      ##########HERE I CREATE THE SEARCHIO OBJECT FOR WORKING WITH THE REPORT
      $report = new Bio::SearchIO (-format => 'blast',
                                   -file   => $rc   #########$rc SHOULD CONTAIN THE REPORT
                                  );
      my $result = $report->next_result();
      #########SAVE THE OUTPUT
      my $filename = $result->query_name()."\.out";
      $remoteBlast->save_output($filename);
      $remoteBlast->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit )
      {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp )
        {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}

Thanks in advance.
Cheers.
O'car.

From jason at bioperl.org Sun Mar 1 20:19:50 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 1 Mar 2009 17:19:50 -0800
Subject: [Bioperl-l] Remote Blast and Report
In-Reply-To:
References: <1F1240778FB0AF46B4E5A72C44D2C74722EB61D7@exch1-hi.accelrys.net> <18DF7D20DFEC044098A1062202F5FFF321B220D7C7@exchsth.agresearch.co.nz>
Message-ID: <053E9CBD-5784-47F7-8F3F-255449752AC6@bioperl.org>

I think $rc is already a Bio::SearchIO object, so you shouldn't have to
instantiate another Bio::SearchIO object; that is what the -readmethod in
the @params at the top is for.

Try printing ref($rc) to see what it is.

-js

On Mar 1, 2009, at 4:51 PM, Ocar Campos wrote:

> Hello, I tried your script Russel, but I still got the same error in the
> console, "Can't call method "query_name" on an undefined value at ./aer2.pl
> line 39, line 185.", I didn't do anything to it, just copy/paste and
> ran it. So what I assumed is that the object for the SearchIO module was not
> initialized, so I created it, but now I get an Exeption while parsing the
> report:
>
>
> ------------- EXCEPTION -------------
> MSG: Could not open Bio::SearchIO::blast=HASH(0x8bb79bc): Doesn't exist the
> file or directory.
> STACK Bio::Root::IO::_initialize_io > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/IO.pm:273 > STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.8/Bio/Root/ > IO.pm:213 > STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/ > SearchIO.pm:135 > STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/ > SearchIO.pm:167 > STACK toplevel ./aer2.pl:45 > > -------------------------------------- > > aer2.pl is my script, Any Idea what it could be? > > The script: > > #!/usr/bin/perl > > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > use Data::Dumper; > > ########HERE I SET THE PARAMETERS > $prog = "tblastx"; > $db = "nr"; > $e_val = "1e-10"; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > > my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); > > > ########SELECT FILE AND RUN THE BLAST. > $infile = 'secuencia.fasta'; > $r = $remoteBlast->submit_blast($infile); > > my $v = 1; > > print STDERR "waiting...\n" if( $v > 0 ); ######## WAIT FOR THE > RESULTS > TO RETURN!!!!! > while ( my @rids = $remoteBlast->each_rid ) > { > foreach my $rid ( @rids ) > { > my $rc = $remoteBlast->retrieve_blast($rid); ###I RETRIEVE THE > REPORT. > print $rc, "\n"; ########ONLY FOR CHECKING > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $remoteBlast->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 ); > sleep 5; > } > else > { > ##########HERE I CREATE THE SEARCHIO OBJECT FOR WORKING WITH > THE > REPORT > $report = new Bio::SearchIO (-format => 'blast', > -file => $rc #########$rc > SHOULD > CONTAIN THE REPORT > ); > my $result = $report->next_result(); > #########SAVE THE OUTPUT > my $filename = $result->query_name()."\.out"; > $remoteBlast->save_output($filename); > $remoteBlast->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) > { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) > { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > > > Thanks in advance. > Cheers. > O'car. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From ocarnorsk138 at gmail.com Sun Mar 1 20:35:37 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Sun, 1 Mar 2009 22:35:37 -0300 Subject: [Bioperl-l] Remote Blast and Report In-Reply-To: <053E9CBD-5784-47F7-8F3F-255449752AC6@bioperl.org> References: <1F1240778FB0AF46B4E5A72C44D2C74722EB61D7@exch1-hi.accelrys.net> <18DF7D20DFEC044098A1062202F5FFF321B220D7C7@exchsth.agresearch.co.nz> <053E9CBD-5784-47F7-8F3F-255449752AC6@bioperl.org> Message-ID: Hello Jason, I printed the ref($ref), this is what i got: Bio::SearchIO::blast I'm going to update from version 1.4.2 of bioperl to 1.6.0, maybe that's why is not working. O'car 2009/3/1 Jason Stajich > I think $rc is already a Bio::SearchIO object - you shouldn't have to > instantiate a Bio::SearchIO object that is what the -readmethod in the > @params at the top is for. > > Try printing ref($ref) to see what it is. 
> > -js > > On Mar 1, 2009, at 4:51 PM, Ocar Campos wrote: > > Hello, I tried your script Russel, but I still got the same error in the >> console, "Can't call method "query_name" on an undefined value at >> ./aer2.pl >> line 39, line 185.", I didn't do anything to it, just copy/paste >> and >> ran it. So what I assumed is that the object for the SearchIO module was >> not >> initialized, so I created it, but now I get an Exeption while parsing the >> report: >> >> >> ------------- EXCEPTION ------------- >> MSG: Could not open Bio::SearchIO::blast=HASH(0x8bb79bc): Doesn't exist >> the >> file or directory. >> STACK Bio::Root::IO::_initialize_io >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/IO.pm:273 >> STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.8/Bio/Root/IO.pm:213 >> STACK Bio::SearchIO::new >> /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO.pm:135 >> STACK Bio::SearchIO::new >> /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO.pm:167 >> STACK toplevel ./aer2.pl:45 >> >> -------------------------------------- >> >> aer2.pl is my script, Any Idea what it could be? >> >> The script: >> >> #!/usr/bin/perl >> >> use Bio::Tools::Run::RemoteBlast; >> use Bio::SearchIO; >> use Data::Dumper; >> >> ########HERE I SET THE PARAMETERS >> $prog = "tblastx"; >> $db = "nr"; >> $e_val = "1e-10"; >> >> my @params = ( '-prog' => $prog, >> '-data' => $db, >> '-expect' => $e_val, >> '-readmethod' => 'SearchIO' ); >> >> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); >> >> >> ########SELECT FILE AND RUN THE BLAST. >> $infile = 'secuencia.fasta'; >> $r = $remoteBlast->submit_blast($infile); >> >> my $v = 1; >> >> print STDERR "waiting...\n" if( $v > 0 ); ######## WAIT FOR THE RESULTS >> TO RETURN!!!!! >> while ( my @rids = $remoteBlast->each_rid ) >> { >> foreach my $rid ( @rids ) >> { >> my $rc = $remoteBlast->retrieve_blast($rid); ###I RETRIEVE THE >> REPORT. 
>> print $rc, "\n"; ########ONLY FOR CHECKING >> if( !ref($rc) ) >> { >> if( $rc < 0 ) >> { >> $remoteBlast->remove_rid($rid); >> } >> print STDERR "." if ( $v > 0 ); >> sleep 5; >> } >> else >> { >> ##########HERE I CREATE THE SEARCHIO OBJECT FOR WORKING WITH THE >> REPORT >> $report = new Bio::SearchIO (-format => 'blast', >> -file => $rc #########$rc SHOULD >> CONTAIN THE REPORT >> ); >> my $result = $report->next_result(); >> #########SAVE THE OUTPUT >> my $filename = $result->query_name()."\.out"; >> $remoteBlast->save_output($filename); >> $remoteBlast->remove_rid($rid); >> print "\nQuery Name: ", $result->query_name(), "\n"; >> while ( my $hit = $result->next_hit ) >> { >> next unless ( $v > 0); >> print "\thit name is ", $hit->name, "\n"; >> while( my $hsp = $hit->next_hsp ) >> { >> print "\t\tscore is ", $hsp->score, "\n"; >> } >> } >> } >> } >> } >> >> >> Thanks in advance. >> Cheers. >> O'car. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > Jason Stajich > jason at bioperl.org > > > > -- O'car Campos C. Bioinformatics Engineering Student. Universidad de Talca. From ocarnorsk138 at gmail.com Sun Mar 1 21:29:40 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Sun, 1 Mar 2009 23:29:40 -0300 Subject: [Bioperl-l] Remote Blast and Report In-Reply-To: References: <1F1240778FB0AF46B4E5A72C44D2C74722EB61D7@exch1-hi.accelrys.net> <18DF7D20DFEC044098A1062202F5FFF321B220D7C7@exchsth.agresearch.co.nz> <053E9CBD-5784-47F7-8F3F-255449752AC6@bioperl.org> Message-ID: Hello Everybody! 
Finally I got the script working. It was a problem with the version of my
BioPerl: I updated to 1.6 and now the script works fine. I used the one that
Russell sent, which is this one, and had no problems:

#!perl -w

use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

#Here I set the parameters for blast
$prog  = "tblastx";
$db    = "nr";
$e_val = "1e-10";

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

#Select the file and run the blast.
$infile = 'infile.fasta';
$r = $remoteBlast->submit_blast($infile);

my $v = 1;

print STDERR "waiting..." if( $v > 0 );

######## WAIT FOR THE RESULTS TO RETURN!!!!!
while ( my @rids = $remoteBlast->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $remoteBlast->retrieve_blast($rid);
    if( !ref($rc) ) {
      if( $rc < 0 ) {
        $remoteBlast->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {
      my $result = $rc->next_result();
      #save the output
      my $filename = $result->query_name()."\.out";
      $remoteBlast->save_output($filename);
      $remoteBlast->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}

Thanks, everybody, for the help and the ideas!

Cheers!

--
O'car Campos C.
Bioinformatics Engineering Student.
Universidad de Talca.

From abhishek.vit at gmail.com Mon Mar 2 02:09:42 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Mon, 2 Mar 2009 02:09:42 -0500
Subject: [Bioperl-l] Implementing Logistic Regression for data anaylsis
Message-ID:

Hi All

Does anyone know if there is a logistic regression module in BioPerl, or in
Perl for that matter? I can't find anything relevant on CPAN. I want to
analyze some data with this algorithm.
Thanks,
-Abhi

From vecchi.b at gmail.com Mon Mar 2 13:53:41 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Mon, 2 Mar 2009 16:53:41 -0200
Subject: [Bioperl-l] Offering to help
Message-ID: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com>

Hi,

I've been using BioPerl for a couple of years now, so first of all,
thanks to all the devs that made it possible.

Although I have only used a fraction of it, I think I am now acquainted
with it enough to be useful in some way. I therefore offer to help with any
issue that needs to be taken care of.

I have written a wrapper for the Qcons (http://tsailab.tamu.edu/Qcons/)
application (it calculates protein-protein contacts); you can look at the
code at my github repository (http://github.com/brunoV/qcons/tree/master).
I don't know if it will be useful or not (the same goes for the other
modules that I've written for my personal use; feel free to look).

If there is anything else that you feel that I would be able to do, I'd
be happy to. My current field is more involved with wet-lab work
(antibody engineering), but I like programming a lot and I think this
will be a great chance to keep learning while still contributing.

Cheers,

Bruno.

From alperyilmaz at gmail.com Mon Mar 2 22:49:47 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Mon, 2 Mar 2009 22:49:47 -0500
Subject: [Bioperl-l] Bio::DB::GFF again
Message-ID:

I have another question about the Bio::DB::GFF module. I went through the
manual but couldn't find the answer. Maybe there is an easy solution that I
don't know of.

Is it possible to overlay protein domain coordinates on genomic coordinates
in Bio::DB::GFF? Let's say I want to extract the nucleotide sequence of a
protein domain. If my input is the protein domain location (as amino acid
coordinates), can I get the DNA sequence for the same region using
Bio::DB::GFF (excluding intron sequences)?
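
The coordinate arithmetic behind a lookup like this can be sketched in plain
Perl. This is only a minimal sketch, not a Bio::DB::GFF feature: it assumes
the CDS segments of the transcript are already known as genomic start/end
pairs (1-based, inclusive) and that translation starts at the first CDS
nucleotide. The helper name domain_to_genomic and all coordinates are made up
for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): map a protein-domain range,
# given in amino-acid coordinates (1-based, inclusive), onto genomic
# coordinates, skipping introns. @cds holds [start, end] genomic pairs.
sub domain_to_genomic {
    my ($aa_start, $aa_end, $strand, @cds) = @_;

    # Amino acid N is encoded by CDS nucleotides 3N-2 .. 3N.
    my $nt_start = 3 * ($aa_start - 1) + 1;
    my $nt_end   = 3 * $aa_end;

    # Walk the segments in transcript order (5' to 3').
    my @ordered = sort { $a->[0] <=> $b->[0] } @cds;
    @ordered = reverse @ordered if $strand < 0;

    my @pieces;
    my $offset = 0;    # CDS nucleotides consumed before the current segment
    for my $seg (@ordered) {
        my ($g_start, $g_end) = @$seg;
        my $len      = $g_end - $g_start + 1;
        my $seg_from = $offset + 1;       # first CDS position in this segment
        my $seg_to   = $offset + $len;    # last CDS position in this segment
        $offset += $len;

        # Overlap between the domain and this segment, in CDS coordinates.
        my $from = $nt_start > $seg_from ? $nt_start : $seg_from;
        my $to   = $nt_end   < $seg_to   ? $nt_end   : $seg_to;
        next if $from > $to;

        if ($strand >= 0) {
            push @pieces, [ $g_start + ($from - $seg_from),
                            $g_start + ($to   - $seg_from) ];
        }
        else {
            # On the minus strand, CDS position 1 sits at the highest coordinate.
            push @pieces, [ $g_end - ($to   - $seg_from),
                            $g_end - ($from - $seg_from) ];
        }
    }
    return @pieces;
}

# Made-up example: two CDS segments (30 nt and 60 nt) on the plus strand,
# and a domain covering residues 8..15 (CDS nucleotides 22..45).
my @cds    = ([101, 130], [201, 260]);
my @pieces = domain_to_genomic(8, 15, 1, @cds);
print join(', ', map { "$_->[0]-$_->[1]" } @pieces), "\n";
```

Each returned pair is an intron-free genomic stretch; fetching and joining the
DNA for those stretches (and reverse-complementing it for a minus-strand gene)
would then give the nucleotide sequence of the domain. How the stretches are
fetched depends on how the GFF database was loaded, so that step is left out
of the sketch.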
thanks,
Alper Yilmaz

From heikki.lehvaslaiho at gmail.com Tue Mar 3 02:11:24 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Tue, 3 Mar 2009 09:11:24 +0200
Subject: [Bioperl-l] Offering to help
In-Reply-To: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com>
References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com>
Message-ID:

Hi Bruno,

I had a quick look at your code. You have used Moose, which would be a
new dependency for BioPerl. I am not too sure what to do with it. The
problem is that since you are not inheriting from Bio::Root::RootI,
none of the standard BioPerl methods work.

Maybe it is time to start adding code to the trunk that replicates
Bio::Root::* functionality using Moose, which would then lead the way to
Perl6?

Most probably we will need the full functionality of Moose,
don't you think? Meaning that its cuter and faster friends Mouse and
Squirrel are not enough? :)

-Heikki

2009/3/2 Bruno Vecchi :
> Hi,
>
> I've been using BioPerl for a couple of years now, so first of all,
> thanks to all the devs that made it possible.
>
> Although I have only used a fraction of it, I think I am now acquainted
> with it enough to be useful in some way. I offer then to help with any
> issue that needs taken care of.
>
> I have written a wrapper for the Qcons (
> http://tsailab.tamu.edu/Qcons/)
> application (calculates protein-protein contacts), you can look at the
> code at my github repository (
> http://github.com/brunoV/qcons/tree/master).
> Don't know if it will be useful or not (or any of the other modules that
> I've
> written for my personal use, feel free to look).
>
> If there is anything else that you feel that I would be able to do, I'd
> be happy to. My current field is more involved with wet-lab work
> (antibody engineering), but I like programming a lot and I think this
> will be a great chance to keep learning while still contributing.
>
> Cheers,
>
> Bruno.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- -Heikki Heikki Lehvaslaiho - heikki lehvaslaiho gmail com Sent from: Cape Town Western Cape South Africa. From cjfields at illinois.edu Tue Mar 3 08:45:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 3 Mar 2009 07:45:43 -0600 Subject: [Bioperl-l] Offering to help In-Reply-To: References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> Message-ID: <1EBE7DF6-C36B-4DDA-91E3-6B55CDB7BAFB@illinois.edu> On Mar 3, 2009, at 1:11 AM, Heikki Lehvaslaiho wrote: > Hi Bruno, > > I had a quick look at your code. You have used Moose which would be a > new dependency to BioPerl. I am not too sure what to with it. The > problem is that since you are not inheriting from Bio::Root::RootI, > none of the standard BioPerl methods work. > > Maybe it is time to start adding code to the trunk that replicates > Bio::Root::* functionality using Moose which will then lead the way to > Perl6? Been thinking the same thing. Maybe biomoose? > Most probably we will be needing the full functionality of Moose, > don't you think? Meaning that his cuter and faster friends Mouse and > Squirrel are not enough? :) > > > -Heikki This would definitely be an easier transition to perl6 and, at the same time, would allow us to refactor code where it's needed (using roles instead of interfaces, etc). 
chris From kanzure at gmail.com Tue Mar 3 08:55:00 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Tue, 3 Mar 2009 07:55:00 -0600 Subject: [Bioperl-l] Offering to help In-Reply-To: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> Message-ID: <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> On Mon, Mar 2, 2009 at 12:53 PM, Bruno Vecchi wrote: > Although I have only used a fraction of it, I think I am now acquainted > with it enough to be useful in some way. I offer then to help with any > issue that needs taken care of. That's quite a dangerous proposition you're making there. Does that mean I get a free ride to request any possible program or bugfix and so on? I'll have to think about this some more. > I have written a wrapper for the Qcons ( > http://tsailab.tamu.edu/Qcons/) > application (calculates protein-protein contacts), you can look at the > code at my github repository ( > http://github.com/brunoV/qcons/tree/master). Thank you for the git repo :-). > If there is anything else that you feel that I would be able to do, I'd > be happy to. My current field is more involved with wet-lab work > (antibody engineering), but I like programming a lot and I think this > will be a great chance to keep learning while still contributing. I've been doing some work on XMLization of lab protocols. Would you be interested in this? It's kind of a cross-section of managing an experiment and doing some programming, meanwhile bioperl has traditionally focused on access to bioinformatics datasets, which sadly doesn't include protocols (yikes!). 
- Bryan
http://heybryan.org/
1 512 203 0507

From cjfields at illinois.edu Tue Mar 3 09:34:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Mar 2009 08:34:47 -0600
Subject: [Bioperl-l] Offering to help
In-Reply-To: <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com>
References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com>
Message-ID:

On Mar 3, 2009, at 7:55 AM, Bryan Bishop wrote:

> ...
>> If there is anything else that you feel that I would be able to do,
>> I'd
>> be happy to. My current field is more involved with wet-lab work
>> (antibody engineering), but I like programming a lot and I think this
>> will be a great chance to keep learning while still contributing.
>
> I've been doing some work on XMLization of lab protocols. Would you be
> interested in this? It's kind of a cross-section of managing an
> experiment and doing some programming, meanwhile bioperl has
> traditionally focused on access to bioinformatics datasets, which
> sadly doesn't include protocols (yikes!).
>
> - Bryan
> http://heybryan.org/
> 1 512 203 0507

Bryan,

For an example of how to get involved with BioPerl, search the mail
archives for the recent Bio::DB::HIV work by Mark Jensen, now in
BioPerl 1.6.

We would happily support lab protocols if we have a standard to go by
(i.e. schema to work with) and someone willing to code and maintain
modules related to their use. As for a place to fit in, we have the
various Bio::Biblio modules; I could easily see protocols fitting into
that namespace, though I have to admit I'm unfamiliar with the overall
structure/purpose of Bio::Biblio.
chris From gopu_36 at yahoo.com Tue Mar 3 09:36:53 2009 From: gopu_36 at yahoo.com (gopu_36) Date: Tue, 3 Mar 2009 06:36:53 -0800 (PST) Subject: [Bioperl-l] Re mote Blast and Report In-Reply-To: References: Message-ID: <22309761.post@talk.nabble.com> Hi, I am kind of new to bioperl and will definitely check the version before I start. I have a very similar situation to run remote blast but against refseq database. I have a set of accession numbers for which I am intending to run remote blast and parse the results. I was looking for some example to blast against refsef. Hence do I have to modify as below? $prog = "blastn"; $db = "refseq_rna"; My next doubt is since I have around 800 ACCESSION nos, do I have to instantiate the below line that many times inside the loop? $remoteBlast = Bio::Tools::Run::RemoteBlast->new(-prog => $prog, -data => $db, -expect => $e_val -readmethod => 'Blast'); Thanks and Regards. Ocar Campos-2 wrote: > > Hello: > > I'm working in a script to remote blast a file with some sequences, I > already got the part of sending the query to blast, but I do not get the > idea of how retrieve a txt report, I mean, like the one you get by running > a > blast via web and you can read in a plane text editor. > > This is what I've done so far: > > > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > > $prog = "tblastx"; > $db = "nr"; > $e_val = "1e-10"; > $remoteBlast = Bio::Tools::Run::RemoteBlast->new(-prog => $prog, > -data => $db, > -expect => $e_val > -readmethod => 'Blast'); > > #I select the file to make que query and do the blast. > $infile = 'file.input.fasta'; > $r = $remoteBlast->submit_blast($infile); > > #this should be the report i get. > $outfile = 'got.output'; > > further than this I've tried some things but none of them work, anybody > who > could give an idea of how retrieving the plane text reports please? > > Cheers. 
> > O'car > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Remote-Blast-and-Report-tp22214949p22309761.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From caroline.johnston at iop.kcl.ac.uk Tue Mar 3 09:14:30 2009 From: caroline.johnston at iop.kcl.ac.uk (Caroline) Date: Tue, 03 Mar 2009 14:14:30 +0000 Subject: [Bioperl-l] Offering to help In-Reply-To: References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> Message-ID: <1236089670.7303.19.camel@clive> On Tue, 2009-03-03 at 09:11 +0200, Heikki Lehvaslaiho wrote: > > Maybe it is time to start adding code to the trunk that replicates > Bio::Root::* functionality using Moose which will then lead the way to > Perl6? > > Most probably we will be needing the full functionality of Moose, > don't you think? Meaning that his cuter and faster friends Mouse and > Squirrel are not enough? :) Was that a proposal to port Bioperl to Moose? Want a hand? I've used Bioperl & Moose a bit but am very far from expert in either of them and this seems like a good project to learn on. Cheers, Cass From kanzure at gmail.com Tue Mar 3 09:56:36 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Tue, 3 Mar 2009 08:56:36 -0600 Subject: [Bioperl-l] Offering to help In-Reply-To: References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> Message-ID: <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> On Tue, Mar 3, 2009 at 8:34 AM, Chris Fields wrote: > On Mar 3, 2009, at 7:55 AM, Bryan Bishop wrote: >>> If there is anything else that you feel that I would be able to do, I'd >>> be happy to. 
My current field is more involved with wet-lab work >>> (antibody engineering), but I like programming a lot and I think this >>> will be a great chance to keep learning while still contributing. >> >> I've been doing some work on XMLization of lab protocols. Would you be >> interested in this? It's kind of a cross-section of managing an >> experiment and doing some programming, meanwhile bioperl has >> traditionally focused on access to bioinformatics datasets, which >> sadly doesn't include protocols (yikes!). > > For an example of how to get involved with BioPerl search the mail archives > for the recent Bio::DB::HIV work by Mark Jensen, now in BioPerl 1.6. Neat. I'll have to look into that. > We would happily support lab protocols if we have a standard to go by (i.e. > schema to work with) and someone willing to code and maintain modules Have you looked into CLP-ML? I wrote up a pcr.xml example the other day, I think I linked to it in my second to last post (maybe). (I'm on the run, sorry for not linking at the moment.) > related to their use. As for a place to fit in, we have the various > Bio::Biblio modules; I could easily see protocols fitting into that > namespace, though I have to admit I'm unfamiliar with the overall > structure/purpose of Bio::Biblio. What? Is that a BibTeX parser module? - Bryan http://heybryan.org/ 1 512 203 0507 From ocarnorsk138 at gmail.com Tue Mar 3 10:40:44 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Tue, 3 Mar 2009 12:40:44 -0300 Subject: [Bioperl-l] Re mote Blast and Report In-Reply-To: <22309761.post@talk.nabble.com> References: <22309761.post@talk.nabble.com> Message-ID: Hello: According to the documentation, you change the -data parameter depending on the database you want to use, so I think what you did should work. There is a list of the databases you can run a remote blast against here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html I think General Databases is where you should look.
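A minimal sketch of that parameter change, assuming the same Bio::Tools::Run::RemoteBlast calls used in the scripts earlier in this thread (new and submit_blast) and assuming that refseq_rna is accepted as a remote database name, which has not been checked here. The helper names blast_params and submit_all are made up for illustration. It builds the factory once and reuses it for every input file given on the command line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the constructor arguments in one place. The parameter names
# (-prog, -data, -expect, -readmethod) are the ones used in the scripts
# earlier in this thread; blast_params itself is a hypothetical helper.
sub blast_params {
    my ($prog, $db, $e_val) = @_;
    return (
        '-prog'       => $prog,
        '-data'       => $db,
        '-expect'     => $e_val,
        '-readmethod' => 'SearchIO',
    );
}

# Submit every input file through the SAME factory object.
# $factory only needs a submit_blast() method, so a stub object
# can stand in for it during a dry run.
sub submit_all {
    my ($factory, @files) = @_;
    my $count = 0;
    for my $file (@files) {
        $factory->submit_blast($file);
        $count++;
    }
    return $count;
}

# BioPerl is loaded, and the network used, only when files are given.
if (@ARGV) {
    require Bio::Tools::Run::RemoteBlast;
    my $factory = Bio::Tools::Run::RemoteBlast->new(
        blast_params('blastn', 'refseq_rna', '1e-10') );   # assumed db name
    my $n = submit_all($factory, @ARGV);
    print STDERR "submitted $n file(s); results still have to be polled\n";
}
```

Run as, for example, `perl submit.pl first.fasta second.fasta` (file names are placeholders). Retrieving the reports would still need the each_rid / retrieve_blast polling loop from Russell's message.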
About the 800 accession numbers: whether you have them all in one file or in different files, I think creating the object only once should work. What you should have in the loop is the line that sends the file to blast and the lines that parse the report, and that only if you have the accession numbers in more than one file. Hope it helps. Cheers O'car Campos C. Bioinformatics Engineering Student. Universidad de Talca. -- 2009/3/3 gopu_36 > > Hi, > > I am kind of new to bioperl and will definitely check the version before I > start. I have a very similar situation to run remote blast but against > refseq database. I have a set of accession numbers for which I am intending > to run remote blast and parse the results. I was looking for some example > to > blast against refsef. Hence do I have to modify as below? > > $prog = "blastn"; > $db = "refseq_rna"; > > > My next doubt is since I have around 800 ACCESSION nos, do I have to > instantiate the below line that many times inside the loop? > > $remoteBlast = Bio::Tools::Run::RemoteBlast->new(-prog => $prog, > -data => $db, > -expect => $e_val > -readmethod => 'Blast'); > > > Thanks and Regards. > > > Ocar Campos-2 wrote: > > > > Hello: > > > > I'm working in a script to remote blast a file with some sequences, I > > already got the part of sending the query to blast, but I do not get the > > idea of how retrieve a txt report, I mean, like the one you get by > running > > a > > blast via web and you can read in a plane text editor. > > > > This is what I've done so far: > > > > > > use Bio::Tools::Run::RemoteBlast; > > use Bio::SearchIO; > > > > $prog = "tblastx"; > > $db = "nr"; > > $e_val = "1e-10"; > > $remoteBlast = Bio::Tools::Run::RemoteBlast->new(-prog => $prog, > > -data => $db, > > -expect => $e_val > > -readmethod => 'Blast'); > > > > #I select the file to make que query and do the blast.
> > $infile = 'file.input.fasta'; > > $r = $remoteBlast->submit_blast($infile); > > > > #this should be the report i get. > > $outfile = 'got.output'; > > > > further than this I've tried some things but none of them work, anybody > > who > > could give an idea of how retrieving the plane text reports please? > > > > Cheers. > > > > O'car > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > View this message in context: > http://www.nabble.com/Remote-Blast-and-Report-tp22214949p22309761.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ajmackey at gmail.com Tue Mar 3 11:06:08 2009 From: ajmackey at gmail.com (Aaron Mackey) Date: Tue, 3 Mar 2009 11:06:08 -0500 Subject: [Bioperl-l] Bio::DB::GFF again In-Reply-To: References: Message-ID: <24c96eca0903030806m3db1ffaeha4e586fc1981f1f5@mail.gmail.com> The short answer is no, not directly. But, take a look at Bio::Coordinate::GeneMapper; you can use this module (in combination with features retrieved from a Bio::DB::GFF data source) to do the necessary translation of coordinates from protein domain space into genomic/CDS coordinates. -Aaron On Mon, Mar 2, 2009 at 10:49 PM, Alper Yilmaz wrote: > I have another question about Bio::DB::GFF module. I went through the > manual > couldn't find the answer. Maybe it's easy solution and I don't know it. > > Is it possible to overlay protein domain coordinates on genomic coordinates > in Bio::DB::GFF? > Let's say I want to extract -nucleotide sequence- of a protein domain. If > my > input is protein domain location (as amino acid coordinates) can I have dna > sequence by using Bio::DB::GFF for the same region, (excluding intron > sequences)? 
> > thanks, > > Alper Yilmaz > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Mar 3 12:14:19 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 3 Mar 2009 11:14:19 -0600 Subject: [Bioperl-l] Offering to help In-Reply-To: <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> Message-ID: On Mar 3, 2009, at 8:56 AM, Bryan Bishop wrote: > On Tue, Mar 3, 2009 at 8:34 AM, Chris Fields > wrote: >> ... >> For an example of how to get involved with BioPerl search the mail >> archives >> for the recent Bio::DB::HIV work by Mark Jensen, now in BioPerl 1.6. > > Neat. I'll have to look into that. > >> We would happily support lab protocols if we have a standard to go >> by (i.e. >> schema to work with) and someone willing to code and maintain modules > > Have you looked into CLP-ML? I wrote up a pcr.xml example the other > day, I think I linked to it in my second to last post (maybe). (I'm on > the run, sorry for not linking at the moment.) The key issues for inclusion into a stable bioperl release are (1) support and (2) stability; we can't have something going to CPAN that may have a fluctuating API w/o a decent deprecation cycle. We can make, however, make the space for in-development modules and module API changes (see recent talk of bioperl-dev). One could: 1) install a stable bioperl release (1.6) 2) optionally install or set PERL5LIB to bioperl-dev (for the bleeding edge) For a Moose-based bioperl implementation I suggest a separate repo completely. We're using svn currently on dev.open-bio.org, though I and a few others are also using git. 
I'm neutral on the matter but it's possible the consensus may be to keep everything in the open-bio svn repo (not everyone has git or uses it). >> related to their use. As for a place to fit in, we have the various >> Bio::Biblio modules; I could easily see protocols fitting into that >> namespace, though I have to admit I'm unfamiliar with the overall >> structure/purpose of Bio::Biblio. > > What? Is that a BibTeX parser module? > > - Bryan > http://heybryan.org/ > 1 512 203 0507 I think Bio::Biblio is the generic class structure for various bibliographic sources; the parsers would be in Bio::Biblio::IO (no BibTeX AFAIK). They're probably in need of some work. chris From kanzure at gmail.com Tue Mar 3 12:21:46 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Tue, 3 Mar 2009 11:21:46 -0600 Subject: [Bioperl-l] Offering to help In-Reply-To: References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> Message-ID: <55ad6af70903030921p349a972dxfe38e89f17cbda56@mail.gmail.com> On Tue, Mar 3, 2009 at 11:14 AM, Chris Fields wrote: > For a Moose-based bioperl implementation I suggest a separate repo > completely. We're using svn currently on dev.open-bio.org, though I and a > few others are also using git. I'm neutral on the matter but it's possible > the consensus may be to keep everything in the open-bio svn repo (not > everyone has git or uses it). Yes, I've been figuring that the protocols work should be done in a separate repository for now anyway, I just haven't got around to starting the code, although with XML::Simple it should be easier or quicker to do than I'm making it out to be. Part of the issue is that there needs to be some good thought put into what exactly the schema should be or how exactly to represent the information, which I've always had a little bit of anxiety over.
It would be fantastic if I could somehow come up with an easy format and a wizard creation tool front-end for something like protocol-online.org, since all of those protocols truly deserve to be in some sort of microformat, whether CLP-ML or something based off of YAML. Maybe I'll post a thread soon outlining what a discussion would have to go over to make sure there's no shooting of self in the foot, to come up with that relatively mature API or microformat, and see what others on the list have to say about it? My background is more computer science than bioinformatics and biology (even though I've been in biology labs longer (strange)), so it would be great to get some support on that front. - Bryan http://heybryan.org/ 1 512 203 0507 From vecchi.b at gmail.com Tue Mar 3 12:30:27 2009 From: vecchi.b at gmail.com (Bruno Vecchi) Date: Tue, 3 Mar 2009 18:30:27 +0100 Subject: [Bioperl-l] Offering to help In-Reply-To: References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> Message-ID: <1a0c1b750903030930le321a67w83a34e4740a2d167@mail.gmail.com> > > That's quite a dangerous proposition you're making there. Does that > mean I get a free ride to request any possible program or bugfix and > so on? I'll have to think about this some more. If I can wrap my head around the subject, then yes, why not? I am willing to help. 2009/3/3 Chris Fields cjfields at illinois.edu > For a Moose-based bioperl implementation I suggest a separate repo > completely. We're using svn currently on dev.open-bio.org, though I and a > few others are also using git. I'm neutral on the matter but it's possible > the consensus may be to keep everything in the open-bio svn repo (not > everyone has git or uses it). > If it's ok with you, I'd like to help porting modules to Moose. 
As for VCS, my vote (FWIW) goes to git and github, there's been a lot of migration in that direction lately (perl5, rakudo). From cjfields at illinois.edu Tue Mar 3 12:43:12 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 3 Mar 2009 11:43:12 -0600 Subject: [Bioperl-l] Offering to help In-Reply-To: <1a0c1b750903030930le321a67w83a34e4740a2d167@mail.gmail.com> References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> <1a0c1b750903030930le321a67w83a34e4740a2d167@mail.gmail.com> Message-ID: On Mar 3, 2009, at 11:30 AM, Bruno Vecchi wrote: > That's quite a dangerous proposition you're making there. Does that > mean I get a free ride to request any possible program or bugfix and > so on? I'll have to think about this some more. > > If I can wrap my head around the subject, then yes, why not? I am > willing to help. > > 2009/3/3 Chris Fields cjfields at illinois.edu > For a Moose-based bioperl implementation I suggest a separate repo > completely. We're using svn currently on dev.open-bio.org, though I > and a few others are also using git. I'm neutral on the matter but > it's possible the consensus may be to keep everything in the open- > bio svn repo (not everyone has git or uses it). > > If it's ok with you, I'd like to help porting modules to Moose. As > for VCS, my vote (FWIW) goes to git and github, there's been a lot > of migration in that direction lately (perl5, rakudo). I agree, but remember not all the bioperl devs have git nor know the idiosyncrasies git has vs svn. I'm still wrapping my brain around it myself. Also, Rakudo had some initial headaches dealing with being on git and syncing with ongoing parrot development, recently migrated over to the parrot svn server. 
That's primarily b/c getting rakudo implemented drove much of the current run of parrot development, though I think the situation is somewhat rectified (I think pmichaud indicated rakudo targets specific parrot versions now). BTW, I managed to donate some code to Rakudo (the .trans implementation, which is pretty hacky, and a few bug fixes). I'll be posting an update to that one soon now that the Setting is being worked on and Grammars are somewhat easier to work with. chris From cjfields at illinois.edu Tue Mar 3 13:29:50 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 3 Mar 2009 12:29:50 -0600 Subject: [Bioperl-l] Offering to help In-Reply-To: <1a0c1b750903030930le321a67w83a34e4740a2d167@mail.gmail.com> References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> <1a0c1b750903030930le321a67w83a34e4740a2d167@mail.gmail.com> Message-ID: <0072AB03-1DF0-4ECA-97A9-0CFAAC3A9B10@illinois.edu> On Mar 3, 2009, at 11:30 AM, Bruno Vecchi wrote: > ... > If it's ok with you, I'd like to help porting modules to Moose. As > for VCS, > my vote (FWIW) goes to git and github, there's been a lot of > migration in > that direction lately (perl5, rakudo). Forgot to mention in the last post, but it might be wise to think ahead a bit about namespace conflicts. For instance, I wouldn't port modules into Bio::* due to potential conflicts with non-Moose BioPerl modules (unless you want to stick everything into a Bio::Moose, which I don't think is a good idea for a base namespace). One could possibly use a variation on Bio* for Moose-related development (BioMoose or similar). 
chris From awitney at sgul.ac.uk Tue Mar 3 13:49:26 2009 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 3 Mar 2009 18:49:26 +0000 Subject: [Bioperl-l] Offering to help In-Reply-To: <55ad6af70903030921p349a972dxfe38e89f17cbda56@mail.gmail.com> References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> <55ad6af70903030921p349a972dxfe38e89f17cbda56@mail.gmail.com> Message-ID: <0F42665A-075D-452C-80D4-17354FB994F4@sgul.ac.uk> On 3 Mar 2009, at 17:21, Bryan Bishop wrote: > On Tue, Mar 3, 2009 at 11:14 AM, Chris Fields > wrote: >> For a Moose-based bioperl implementation I suggest a separate repo >> completely. We're using svn currently on dev.open-bio.org, though >> I and a >> few others are also using git. I'm neutral on the matter but it's >> possible >> the consensus may be to keep everything in the open-bio svn repo (not >> everyone has git or uses it). > > Yes, I've been figuring that the protocols work should be done in a > separate repository for now anyway, I just haven't got around to > starting the code, although with XML::Simple it should be easier or > quicker to do than I'm making it out to be. Part of the issue is that > there needs to be some good thought put into what exactly the schema > should be or how exactly to represent the information, which I've > always had a little bit of anxiety over. It would be fantastic if I > could somehow come up with an easy format and a wizard creation tool > front-end for something like protocol-online.org, since all of those > protocols truly deserve to be in some sort of microformat, whether > CLP-ML or something based off of YAML. 
Maybe I'll post a thread soon > outlining what a discussion would have to go over to make sure there's > no shooting of self in the foot, to come up with that relatively > mature API or microformat, and see what others on the list have to say > about it? My background is more computer science than bioinformatics > and biology (even though I've been in biology labs longer (strange)), > so it would be great to get some support on that front. The FuGE folks have modeled Protocols as part of their object model. I haven't looked at it recently but it is supposed to be fairly generic for any kind of protocol. We have used the MAGE (FuGE's almost predecessor) version of Protocol within our database for several years and it covers most cases. The FuGE project pages can be found here: http://fuge.sourceforge.net/ Maybe that would be a good place to start? adam From kanzure at gmail.com Tue Mar 3 15:05:22 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Tue, 3 Mar 2009 14:05:22 -0600 Subject: [Bioperl-l] Offering to help In-Reply-To: <0F42665A-075D-452C-80D4-17354FB994F4@sgul.ac.uk> References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> <55ad6af70903030921p349a972dxfe38e89f17cbda56@mail.gmail.com> <0F42665A-075D-452C-80D4-17354FB994F4@sgul.ac.uk> Message-ID: <55ad6af70903031205v45e9637ex9f665afad6e2b861@mail.gmail.com> On Tue, Mar 3, 2009 at 12:49 PM, Adam Witney wrote: > On 3 Mar 2009, at 17:21, Bryan Bishop wrote: >> On Tue, Mar 3, 2009 at 11:14 AM, Chris Fields wrote: >>> For a Moose-based bioperl implementation I suggest a separate repo >>> completely. We're using svn currently on dev.open-bio.org, though I and >>> a >>> few others are also using git. I'm neutral on the matter but it's >>> possible >>> the consensus may be to keep everything in the open-bio svn repo (not >>> everyone has git or uses it).
>> >> Yes, I've been figuring that the protocols work should be done in a >> separate repository for now anyway, I just haven't got around to >> starting the code, although with XML::Simple it should be easier or >> quicker to do than I'm making it out to be. Part of the issue is that >> there needs to be some good thought put into what exactly the schema >> should be or how exactly to represent the information, which I've >> always had a little bit of anxiety over. It would be fantastic if I >> could somehow come up with an easy format and a wizard creation tool >> front-end for something like protocol-online.org, since all of those >> protocols truly deserve to be in some sort of microformat, whether >> CLP-ML or something based off of YAML. Maybe I'll post a thread soon >> outlining what a discussion would have to go over to make sure there's >> no shooting of self in the foot, to come up with that relatively >> mature API or microformat, and see what others on the list have to say >> about it? My background is more computer science than bioinformatics >> and biology (even though I've been in biology labs longer (strange)), >> so it would be great to get some support on that front. > > The FuGE folks have modeled Protocols as part of their object model. I > haven't looked at it recently but it is supposed to be fairly generic for > any kind of protocol. We have used the MAGE (FuGE's almost predecessor) > version of Protocol within our database for several years and it covers most > cases. > > The FuGE project pages can be found here: > > http://fuge.sourceforge.net/ > > Maybe that would be a good place to start? Thanks Adam, I hadn't seen that before. I glanced over their API documentation. http://fuge.sourceforge.net/dev/V1Final/FuGEv1-refManual.html#Action So, it looks like they are defining individual steps with an ordinal number (which is good), also equipment via make and model tags. 
One of the ideas that I have been working on is the concept of instructions being "solved" by equipment: so you can either execute a PCR protocol by manually dunking test tubes into warm and cold baths, or you can use a thermocycler or microfluidic chip using some standardized interface. This is also the same problem that "semantic web" advocates have to put up with: UDDI, WSDL, B2B, "service discovery", etc., without overly restricting the standards to the point that nobody can do anything new. But anyway, getting off track. There is a link to "example files"- which is broken- http://fuge.sourceforge.net/dev/V1Final/Instances/ Although this one does work: http://fuge.sourceforge.net/presentation/master_example.xml though it might be too meta for me :-). Ultimately I'd like to come up with a system that I can dump into EXPO (like The Robot Scientist) but also into a human readable format (comparably, instructables?). http://expo.sf.net/ videos: http://www.aber.ac.uk/compsci/Research/bio/robotsci/video/ - Bryan http://heybryan.org/ 1 512 203 0507 From xuy at agr.gc.ca Wed Mar 4 15:52:25 2009 From: xuy at agr.gc.ca (MADSGENE) Date: Wed, 4 Mar 2009 12:52:25 -0800 (PST) Subject: [Bioperl-l] problem of bioperl on mac In-Reply-To: <22214912.post@talk.nabble.com> References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu> <22214912.post@talk.nabble.com> Message-ID: <22339072.post@talk.nabble.com> Hi bioperlers, Thanks for all responses. I removed the bioperl and reinstalled it by fink. It works. Regards Xia MADSGENE wrote: > > > I installed the bioperl-pm586 version 1.5.2-4 in my macbook(MAC OS X > version 10.5.6) using fink. I also found it in /sw/share/bioperl-pm586. > However I failed to get it when I typed "which". When I typed >perl -e > 'print @INC', it displayed > /sw/lib/perl5/5.8.6/darwin-thread-multi-21evel...but no information about > bioperl-pm586. 
I took the suggstion and added "setenv PERL5LIB > ${PERL5LIB}:/sw/share/bioperl-pm586 " into .bashrc, it still did not work. > > I am grateful to you if anyone could help me out. > I note there are two .bashrc files, one is in root and the other in home. > If your suggestion is to modify these files, please let me know which one. > > Regards > > Xia Wang > > > > -- View this message in context: http://www.nabble.com/bioperl-on-mac-tp15101357p22339072.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From Kevin.M.Brown at asu.edu Wed Mar 4 15:56:44 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 4 Mar 2009 13:56:44 -0700 Subject: [Bioperl-l] problem of bioperl on mac In-Reply-To: <22339072.post@talk.nabble.com> References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu><22214912.post@talk.nabble.com> <22339072.post@talk.nabble.com> Message-ID: <1A4207F8295607498283FE9E93B775B405CF197A@EX02.asurite.ad.asu.edu> And for those that are interested, here's what version of bioperl are available via fink for various levels of OSX http://pdb.finkproject.org/pdb/package.php/bioperl-pm586 > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of MADSGENE > Sent: Wednesday, March 04, 2009 1:52 PM > To: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] problem of bioperl on mac > > > Hi bioperlers, > > Thanks for all responses. > I removed the bioperl and reinstalled it by fink. It works. > > Regards > > Xia > > > > > > MADSGENE wrote: > > > > > > I installed the bioperl-pm586 version 1.5.2-4 in my macbook(MAC OS X > > version 10.5.6) using fink. I also found it in > /sw/share/bioperl-pm586. > > However I failed to get it when I typed "which". When I > typed >perl -e > > 'print @INC', it displayed > > /sw/lib/perl5/5.8.6/darwin-thread-multi-21evel...but no > information about > > bioperl-pm586. 
I took the suggstion and added "setenv PERL5LIB > > ${PERL5LIB}:/sw/share/bioperl-pm586 " into .bashrc, it > still did not work. > > > > I am grateful to you if anyone could help me out. > > I note there are two .bashrc files, one is in root and the > other in home. > > If your suggestion is to modify these files, please let me > know which one. > > > > Regards > > > > Xia Wang > > > > > > > > > > -- > View this message in context: > http://www.nabble.com/bioperl-on-mac-tp15101357p22339072.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From kvddrift at earthlink.net Wed Mar 4 16:27:54 2009 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 4 Mar 2009 16:27:54 -0500 (GMT-05:00) Subject: [Bioperl-l] problem of bioperl on mac Message-ID: <9373793.1236202075292.JavaMail.root@elwamui-ovcar.atl.sa.earthlink.net> >And for those that are interested, here's what version of bioperl are >available via fink for various levels of OSX >http://pdb.finkproject.org/pdb/package.php/bioperl-pm586 I am the maintainer of the fink package for bioperl. Just like to add that the fink database website is not completely accurate, some kind of bug, I believe. So just to be complete, both bioperl and bioperl-run version 1.6 are available for OS X systems with perl 5.8.6 and perl 5.8.8. Cheers, - Koen. 
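For anyone hitting the same @INC problem from this thread, a small sketch that reports whether a module can be found on Perl's search path. It uses only core Perl; find_module is a made-up helper name, and Bio::Perl is used as the default probe on the assumption that a BioPerl install provides it. Note also that setenv is csh/tcsh syntax, so in a .bashrc the variable would have to be set with export instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first path under the given directories that holds the module,
# or undef if none does. Code references that may sit in @INC are skipped.
sub find_module {
    my ($module, @dirs) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;   # Bio::Perl -> Bio/Perl.pm
    for my $dir (@dirs) {
        next if ref $dir;
        my $path = "$dir/$rel";
        return $path if -e $path;
    }
    return;
}

# Module to look for: first command-line argument, else Bio::Perl.
my $module = shift(@ARGV) || 'Bio::Perl';
my $path   = find_module($module, @INC);

if (defined $path) {
    print "$module found at $path\n";
}
else {
    print "$module not found; directories searched:\n";
    print "  $_\n" for grep { !ref } @INC;
}
```

If the module is reported as missing, the printed directory list shows whether the PERL5LIB setting actually reached Perl.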
From kpclancy at hotmail.com Thu Mar 5 00:49:35 2009 From: kpclancy at hotmail.com (Kevin Clancy) Date: Wed, 4 Mar 2009 22:49:35 -0700 Subject: [Bioperl-l] Offering to help In-Reply-To: <55ad6af70903031205v45e9637ex9f665afad6e2b861@mail.gmail.com> References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> <55ad6af70903030921p349a972dxfe38e89f17cbda56@mail.gmail.com> <0F42665A-075D-452C-80D4-17354FB994F4@sgul.ac.uk> <55ad6af70903031205v45e9637ex9f665afad6e2b861@mail.gmail.com> Message-ID: Hi Guys You might also want to check out the Ontology for Biomedical Investigations (OBI) http://obi-ontology.org/page/Main_Page I've been following this thread and am intrigued by the use of roles in Moose. So if I have a defined ontology with classes, each of which has roles, can that be dropped into the Moose framework? The main reason for asking is that a lot of work has gone into OBI, including normalizing the ontology to others in the National Center for Biomedical Ontology http://bioontology.org/ and the OBO Foundry http://www.obofoundry.org/. Having some of this ontological work extended into the behaviour of the Bio* languages would be an interesting extension of these projects. My main concern with FuGE is that I don't know how compatible it is with these communities. kevin > Date: Tue, 3 Mar 2009 14:05:22 -0600 > From: kanzure at gmail.com > To: awitney at sgul.ac.uk > CC: cjfields at illinois.edu; bioperl-l at lists.open-bio.org; kanzure at gmail.com > Subject: Re: [Bioperl-l] Offering to help > > On Tue, Mar 3, 2009 at 12:49 PM, Adam Witney wrote: > > On 3 Mar 2009, at 17:21, Bryan Bishop wrote: > >> On Tue, Mar 3, 2009 at 11:14 AM, Chris Fields wrote: > >>> For a Moose-based bioperl implementation I suggest a separate repo > >>> completely.
We're using svn currently on dev.open-bio.org, though I and > >>> a > >>> few others are also using git. I'm neutral on the matter but it's > >>> possible > >>> the consensus may be to keep everything in the open-bio svn repo (not > >>> everyone has git or uses it). > >> > >> Yes, I've been figuring that the protocols work should be done in a > >> separate repository for now anyway, I just haven't got around to > >> starting the code, although with XML::Simple it should be easier or > >> quicker to do than I'm making it out to be. Part of the issue is that > >> there needs to be some good thought put into what exactly the schema > >> should be or how exactly to represent the information, which I've > >> always had a little bit of anxiety over. It would be fantastic if I > >> could somehow come up with an easy format and a wizard creation tool > >> front-end for something like protocol-online.org, since all of those > >> protocols truly deserve to be in some sort of microformat, whether > >> CLP-ML or something based off of YAML. Maybe I'll post a thread soon > >> outlining what a discussion would have to go over to make sure there's > >> no shooting of self in the foot, to come up with that relatively > >> mature API or microformat, and see what others on the list have to say > >> about it? My background is more computer science than bioinformatics > >> and biology (even though I've been in biology labs longer (strange)), > >> so it would be great to get some support on that front. > > > > The FuGE folks have modeled Protocols as part of their object model. I > > haven't looked at it recently but it is supposed to be fairly generic for > > any kind of protocol. We have used the MAGE (FuGE's almost predecessor) > > version of Protocol within our database for several years and it covers most > > cases. > > > > The FuGE project pages can be found here: > > > > http://fuge.sourceforge.net/ > > > > Maybe that would be a good place to start? 
> > Thanks Adam, I hadn't seen that before. I glanced over their API documentation. > > http://fuge.sourceforge.net/dev/V1Final/FuGEv1-refManual.html#Action > > So, it looks like they are defining individual steps with an ordinal > number (which is good), also equipment via make and model tags. One of > the ideas that I have been working on is the concept of instructions > being "solved" by equipment: so you can either execute a PCR protocol > by manually dunking test tubes into warm and cold baths, or you can > use a thermocycler or microfluidic chip using some standardized > interface. This is also the same problem that "semantic web" advocates > have to put up with: UDDI, WSDL, B2B, "service discovery", etc., > without overly restricting the standards to the point that nobody can > do anything new. But anyway, getting off track. > > There is a link to "example files"- which is broken- > http://fuge.sourceforge.net/dev/V1Final/Instances/ > > Although this one does work: > http://fuge.sourceforge.net/presentation/master_example.xml > though it might be too meta for me :-). > > Ultimately I'd like to come up with a system that I can dump into EXPO > (like The Robot Scientist) but also into a human readable format > (comparably, instructables?). > > http://expo.sf.net/ > videos: http://www.aber.ac.uk/compsci/Research/bio/robotsci/video/ > > - Bryan > http://heybryan.org/ > 1 512 203 0507 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From kvddrift at earthlink.net Thu Mar 5 08:17:01 2009 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 5 Mar 2009 08:17:01 -0500 (GMT-05:00) Subject: [Bioperl-l] bioperl-run version Message-ID: <1484293.1236259021799.JavaMail.root@elwamui-ovcar.atl.sa.earthlink.net> Hi, I downloaded bioperl-run 1.6.1, but when I unpacked the tarball, the containing folder said 1.6.0. 
No big deal I guess, just a typo, but thought I'd mention it here.

cheers,

- Koen.

From jason at bioperl.org Thu Mar 5 06:35:53 2009
From: jason at bioperl.org (ajay)
Date: Thu, 5 Mar 2009 16:35:53 +0500
Subject: [Bioperl-l] How to run this "bp_search2gff.pl"
Message-ID: <20090305112844.M50039@nrcpb.org>

Respected sir, good morning!

I am Ajay, a researcher in bioinformatics. Please guide me on how to run this BioPerl script. I have completed a local BLASTP search of 255 FASTA files, and the BLAST results were saved in .txt format. I need to convert the BLAST results to GFF3 format using the script "bp_search2gff.pl", but I cannot work out how to supply my BLAST result file as input and get the output file in GFF3 format. Please tell me the steps for supplying the BLAST result file and what changes I have to make in the program. Please help me; I shall be very thankful to you.

Waiting for your kind response.

Sincerely yours,
Ajay

Ajay Kumar Mahato (Research Associate)
National Research Centre for Plant Biotechnology (IARI)
(http://www.nrcpb.org)

From cjfields at illinois.edu Thu Mar 5 15:59:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Mar 2009 14:59:38 -0600
Subject: [Bioperl-l] bioperl-run version
In-Reply-To: <1484293.1236259021799.JavaMail.root@elwamui-ovcar.atl.sa.earthlink.net>
References: <1484293.1236259021799.JavaMail.root@elwamui-ovcar.atl.sa.earthlink.net>
Message-ID: <387CA230-4A47-4D49-A0FC-EA822630101A@illinois.edu>

That's known and was an issue on my part (it was mentioned in the announcement). Apparently bioperl-run 1.6.0 had a bug in the test suite, an off-by-one error I introduced when fixing a Primer3 test. This came out via CPAN Testers; it fails if the executable isn't installed. The error was fixed, but as PAUSE only accepts uniquely named versions, I went ahead and bumped the bioperl-run tarball to 1.6.1, but the stated version (I assume derived from Bio::Root::Root, via core) is 1.006000.
It requires bioperl core 1.6.0. I'll likely bump the next release versions to 1.6.2 to sync everything when the next release is due (sometime in mid-April), then follow incremental bumps along the way. chris On Mar 5, 2009, at 7:17 AM, Koen van der Drift wrote: > Hi, > > I downloaded bioperl-run 1.6.1, but when I unpacked the tarball, the > containing folder said 1.6.0. No big deal I guess, just a typo, but > though I'd mentioned it here. > > cheers, > > - Koen. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjm at berkeleybop.org Thu Mar 5 18:48:07 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Thu, 5 Mar 2009 15:48:07 -0800 Subject: [Bioperl-l] Offering to help In-Reply-To: References: <1a0c1b750903021053o80eef69q35ecacd26df17739@mail.gmail.com> <55ad6af70903030555n42bda636ta00ceb0f252ea5cd@mail.gmail.com> <55ad6af70903030656t20171ce8p2168de9ad97ee336@mail.gmail.com> <55ad6af70903030921p349a972dxfe38e89f17cbda56@mail.gmail.com> <0F42665A-075D-452C-80D4-17354FB994F4@sgul.ac.uk> <55ad6af70903031205v45e9637ex9f665afad6e2b861@mail.gmail.com> Message-ID: On Mar 4, 2009, at 9:49 PM, Kevin Clancy wrote: > > Hi Guys > You might also want to check out Ontologies for Biomedical > Investigations - OBI > http://obi-ontology.org/page/Main_Page > I've been following this thread and am intreagued by the use of > roles in Moose. So if I have a defined ontology with classes, each > of which has roles, can that be dropped into the Moose framework? Hi Kevin Can you explain what you have in mind here? Particularly with respect to classes in ontologies having roles. The idea of having the ontology drive the object model is appealing, but there are a number of gotchas here. There's a significant impedance mismatch between the semantics of classes, roles and so in between ontology formalisms and object-oriented concepts. 
> The main reason for asking is that a lot of work has gone into OBI > including normalizing the ontology to others in the national center > for biomedical ontologies http://bioontology.org/ and obo foundry http://www.obofoundry.org/ > . having some of this ontological work extended into the behaviour > of the Bio* languages would be an interesting extension of these > projects. My main concern with FuGE is that I don't know how > compatable it is with these communities. I don't really know much about FuGE. But speaking of ontologies and Moose, I have been thinking it's about time to update the creaky old go-perl framework, fixing it to support some of the more modern features of OBO and OWL. I'd like this to be done in such a way that bioperl2 can use this. If Moose is the way to go we'll move in this direction, though probably not until after I'm convinced that Moose and DBIx::Class can play well together. > > kevin > >> Date: Tue, 3 Mar 2009 14:05:22 -0600 >> From: kanzure at gmail.com >> To: awitney at sgul.ac.uk >> CC: cjfields at illinois.edu; bioperl-l at lists.open-bio.org; kanzure at gmail.com >> Subject: Re: [Bioperl-l] Offering to help >> >> On Tue, Mar 3, 2009 at 12:49 PM, Adam Witney wrote: >>> On 3 Mar 2009, at 17:21, Bryan Bishop wrote: >>>> On Tue, Mar 3, 2009 at 11:14 AM, Chris Fields wrote: >>>>> For a Moose-based bioperl implementation I suggest a separate repo >>>>> completely. We're using svn currently on dev.open-bio.org, >>>>> though I and >>>>> a >>>>> few others are also using git. I'm neutral on the matter but it's >>>>> possible >>>>> the consensus may be to keep everything in the open-bio svn repo >>>>> (not >>>>> everyone has git or uses it). >>>> >>>> Yes, I've been figuring that the protocols work should be done in a >>>> separate repository for now anyway, I just haven't got around to >>>> starting the code, although with XML::Simple it should be easier or >>>> quicker to do than I'm making it out to be. 
Part of the issue is >>>> that >>>> there needs to be some good thought put into what exactly the >>>> schema >>>> should be or how exactly to represent the information, which I've >>>> always had a little bit of anxiety over. It would be fantastic if I >>>> could somehow come up with an easy format and a wizard creation >>>> tool >>>> front-end for something like protocol-online.org, since all of >>>> those >>>> protocols truly deserve to be in some sort of microformat, whether >>>> CLP-ML or something based off of YAML. Maybe I'll post a thread >>>> soon >>>> outlining what a discussion would have to go over to make sure >>>> there's >>>> no shooting of self in the foot, to come up with that relatively >>>> mature API or microformat, and see what others on the list have >>>> to say >>>> about it? My background is more computer science than >>>> bioinformatics >>>> and biology (even though I've been in biology labs longer >>>> (strange)), >>>> so it would be great to get some support on that front. >>> >>> The FuGE folks have modeled Protocols as part of their object >>> model. I >>> haven't looked at it recently but it is supposed to be fairly >>> generic for >>> any kind of protocol. We have used the MAGE (FuGE's almost >>> predecessor) >>> version of Protocol within our database for several years and it >>> covers most >>> cases. >>> >>> The FuGE project pages can be found here: >>> >>> http://fuge.sourceforge.net/ >>> >>> Maybe that would be a good place to start? >> >> Thanks Adam, I hadn't seen that before. I glanced over their API >> documentation. >> >> http://fuge.sourceforge.net/dev/V1Final/FuGEv1-refManual.html#Action >> >> So, it looks like they are defining individual steps with an ordinal >> number (which is good), also equipment via make and model tags. 
One >> of >> the ideas that I have been working on is the concept of instructions >> being "solved" by equipment: so you can either execute a PCR protocol >> by manually dunking test tubes into warm and cold baths, or you can >> use a thermocycler or microfluidic chip using some standardized >> interface. This is also the same problem that "semantic web" >> advocates >> have to put up with: UDDI, WSDL, B2B, "service discovery", etc., >> without overly restricting the standards to the point that nobody can >> do anything new. But anyway, getting off track. >> >> There is a link to "example files"- which is broken- >> http://fuge.sourceforge.net/dev/V1Final/Instances/ >> >> Although this one does work: >> http://fuge.sourceforge.net/presentation/master_example.xml >> though it might be too meta for me :-). >> >> Ultimately I'd like to come up with a system that I can dump into >> EXPO >> (like The Robot Scientist) but also into a human readable format >> (comparably, instructables?). >> >> http://expo.sf.net/ >> videos: http://www.aber.ac.uk/compsci/Research/bio/robotsci/video/ >> >> - Bryan >> http://heybryan.org/ >> 1 512 203 0507 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ocarnorsk138 at gmail.com Fri Mar 6 04:51:15 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Fri, 6 Mar 2009 06:51:15 -0300 Subject: [Bioperl-l] Remote Blast and Report In-Reply-To: References: <1F1240778FB0AF46B4E5A72C44D2C74722EB61D7@exch1-hi.accelrys.net> <18DF7D20DFEC044098A1062202F5FFF321B220D7C7@exchsth.agresearch.co.nz> <053E9CBD-5784-47F7-8F3F-255449752AC6@bioperl.org> Message-ID: Hello: Try checking on what BioPerl version you are working with, I had the same problem, I was working 
with version 1.4; version 1.6 has since been released, so I updated and it worked fine.

Cheers.

O'car Campos C.
Bioinformatics Engineering Student.
Universidad de Talca.

From MEC at stowers.org Fri Mar 6 09:49:35 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 6 Mar 2009 08:49:35 -0600
Subject: [Bioperl-l] Opportunity: bioinformatics programmer/analyst at Stowers Institute for Medical Research
Message-ID:

My colleague, Dr. Julia Zeitlinger, here at the Stowers Institute for Medical Research in Kansas City, is seeking a programmer/analyst to join her lab (http://www.stowers-institute.org/labs/ZeitlingerLab.asp).

The position description, repeated below, is posted at http://www.stowers-institute.org/ScientistsSought/ScientistsSought.asp, where the application process and job benefits are also described.

I might add that though 'minimum requirements' are stated, more educationally and professionally advanced candidates are encouraged to consider the position as well.

Regards,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

---------------------------

Programmer/Analyst

The Stowers Institute for Medical Research has an opening for a Programmer/Analyst to support scientific data analysis and assist with computational biology tasks, in particular with the analysis of high-throughput sequencing data and microarray experiments.

Responsibilities include the following:

Writing code and developing solutions to computational biology problems, typically using PERL, Python and R;

Developing and/or maintaining software for analyzing biological data, and documenting how packages can be used with the existing computing infrastructure.

In addition to excellent communication skills and experience in PERL, Python and R, the successful candidate will also have a background in statistics and basic biology, and be highly motivated to solve scientific problems.
Minimum requirements include an undergraduate degree in science, math, computer science, engineering, or a related field; at least one year of programming experience. From paolo.pavan at gmail.com Fri Mar 6 12:00:08 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 6 Mar 2009 18:00:08 +0100 Subject: [Bioperl-l] Primer3 help Message-ID: <56be91b60903060900i4b3c87cfs45ce87485932d4db@mail.gmail.com> Hi all, Is the first time I need to use Primer3 with bioperl and I was trying to investigate the API. I found the snip of code below but it reports to me: Can't locate Bio/Tools/Run/Primer3.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at primers.3.pl line 8. BEGIN failed--compilation aborted at primers.3.pl line 8. In fact the file: /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Primer3.pm does not exist in my system. My bioperl's instalation dir is /usr/lib/perl5/site_perl/5.8.8/Bio that is in the path and the bioperl distribution is 1.6. Do I haven't understood how to use Primer3 or is a distribution problem? 
Thank you in advance,
Paolo

# The easiest way to use this is probably either, (i), get the
# output from Bio::Tools::Run::Primer3, Bio::Tools::Primer3, or
# Bio::Tools::PCRSimulation:

# For example, start with a fasta file
use Bio::SeqIO;
use Bio::Tools::Run::Primer3;

my $file = shift || die "need a file to read";
my $seqin = Bio::SeqIO->new(-file => $file);
my $seq = $seqin->next_seq;

# use primer3 to design some primers
my $primer3run = Bio::Tools::Run::Primer3->new(-seq => $seq);
$primer3run -> run; # run it with the default parameters

# create a file to write the results to
my $seqout = Bio::SeqIO->new(-file => ">primed_sequence.gbk",
                             -format => 'genbank');

# now just get all the results and write them out.
while (my $results = $primer3run->next_primer) {
    $seqout->write_seq($results->annotated_seq);
}

From cjfields at illinois.edu Fri Mar 6 14:51:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Mar 2009 13:51:45 -0600
Subject: [Bioperl-l] Primer3 help
In-Reply-To: <56be91b60903060900i4b3c87cfs45ce87485932d4db@mail.gmail.com>
References: <56be91b60903060900i4b3c87cfs45ce87485932d4db@mail.gmail.com>
Message-ID:

You need bioperl-run:

http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/

chris

On Mar 6, 2009, at 11:00 AM, Paolo Pavan wrote:

> Hi all,
> Is the first time I need to use Primer3 with bioperl and I was
> trying to
> investigate the API.
I found the snip of code below but it reports > to me: > Can't locate Bio/Tools/Run/Primer3.pm in @INC (@INC contains: > /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi > /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi > /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi > /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 > /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 > /usr/lib/perl5/site_perl > /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi > /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi > /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi > /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 > /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 > /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux- > thread-multi > /usr/lib/perl5/5.8.8 .) at primers.3.pl line 8. > BEGIN failed--compilation aborted at primers.3.pl line 8. > > In fact the file: > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Primer3.pm > does not exist in my system. > > My bioperl's instalation dir is /usr/lib/perl5/site_perl/5.8.8/Bio > that is > in the path and the bioperl distribution is 1.6. > Do I haven't understood how to use Primer3 or is a distribution > problem? 
> > Thank you in advance, > Paolo > > > > # The easiest way to use this is probably either, (i), get the > # output from Bio::Tools::Run::Primer3, Bio::Tools::Primer3, or > # Bio::Tools::PCRSimulation: > > # For example, start with a fasta file > use Bio::SeqIO; > use Bio::Tools::Run::Primer3; > > my $file = shift || die "need a file to read"; > my $seqin = Bio::SeqIO->new(-file => $file); > my $seq = $seqin->next_seq; > > # use primer3 to design some primers > my $primer3run = Bio::Tools::Run::Primer3->new(-seq => $seq); > $primer3run -> run; # run it with the default parameters > > # create a file to write the results to > my $seqout = Bio::SeqIO->new(-file => ">primed_sequence.gbk", > -format => 'genbank'); > > # now just get all the results and write them out. > while (my $results = $primer3run->next_primer) { > $seqout->write_seq($results->annotated_seq); > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Sun Mar 8 15:13:44 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 8 Mar 2009 12:13:44 -0700 Subject: [Bioperl-l] Regarding help for Bioperl on window server 2003 In-Reply-To: <31bb4380903080859y229a099ew8894e7b88e4c8f38@mail.gmail.com> References: <31bb4380903080859y229a099ew8894e7b88e4c8f38@mail.gmail.com> Message-ID: <555EFCBE-FB65-4763-8F55-68C91F608EB0@bioperl.org> please ask your questions on the mailing list and include more detailed error messages. On Mar 8, 2009, at 8:59 AM, Sanjay Harke wrote: > Dear Jason, > > > here i am sanjay harke from India. Presently i am working at MGM > School of Health Sciences, India as associate Professor.i am working > on development of database, In that regard i am working on bioperl. > But couldn't get image on window with Bio::Graphic.it is not execute. > Kindly help me out. 
>
> thanking you
>
> yours truly
>
> sanjay harke

Jason Stajich
jason at bioperl.org

From sanjay.harke at gmail.com Mon Mar 9 03:04:21 2009
From: sanjay.harke at gmail.com (Sanjay Harke)
Date: Mon, 9 Mar 2009 12:34:21 +0530
Subject: [Bioperl-l] Regarding Bioperl help for windows server 2003
Message-ID: <31bb4380903090004k6b23a574uca9d95598ad4e2fa@mail.gmail.com>

I am Sanjay Harke from India. I work at MGM School of Health Sciences, India, as an associate professor in the Department of Bioinformatics and Biotechnology. At present I am working on the development of a database of plant proteins, and in that regard I am working with BioPerl scripts. I know Perl, PHP and CGI very well, but I am not getting an image from Bio::Graphics when I run the BioPerl module on Windows Server 2003. Could any of you help me out? The following script is not working properly on Windows Server 2003.

----------------------------------------------------------------------------
#!/usr/local/bin/perl

# This is code example 1 in the Graphics-HOWTO
# data1.txt file

use strict;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $panel = Bio::Graphics::Panel->new(-length => 1000,-width => 800);

my $track = $panel->add_track(-glyph => 'generic',-label => 1);

while (<>) { # read blast file
    chomp;
    next if /^\#/; # ignore comments
    my($name,$score,$start,$end) = split /\t+/;
    my $feature = Bio::SeqFeature::Generic->new(
        -display_name => $name,
        -score => $score,
        -start => $start,
        -end => $end
    );
    $track->add_feature($feature);
}

print $panel->png;
------------------------------------------------------------

So, kindly help me out with this problem.
thanking you yours truly sanjay harke From alden.huang at gmail.com Mon Mar 9 04:14:02 2009 From: alden.huang at gmail.com (Alden Huang) Date: Mon, 9 Mar 2009 01:14:02 -0700 Subject: [Bioperl-l] Regarding Bioperl help for windows server 2003 In-Reply-To: <31bb4380903090004k6b23a574uca9d95598ad4e2fa@mail.gmail.com> References: <31bb4380903090004k6b23a574uca9d95598ad4e2fa@mail.gmail.com> Message-ID: <9e408d720903090114j39f7ec4fhb130966f2594e8eb@mail.gmail.com> did u try writing with binmode STDOUT; Just a guess tho, I don't really know. On Mon, Mar 9, 2009 at 12:04 AM, Sanjay Harke wrote: > I am sanjay harke from India. I am working in MGM School of Health > Sciences, India as associate Professor in department of Bioinformatics > and Biotecnology. Presently i am working on database development of > Plant proteins In that regard i am presuing Bioperl script. I know > very well Perl, PHP, CGI... But i am not getting image with the help of > Bio::Graphics for bioperl module execution on window 2003 server. Is > it possible to you all to help me out.following script is not working > properly on window 2003 server. > > ---------------------------------------------------------------------------- > #!/usr/local/bin/perl > > # This is code example 1 in the Graphics-HOWTO > # data1.txt file > > use strict; > use Bio::Graphics; > use Bio::SeqFeature::Generic; > > my $panel = Bio::Graphics::Panel->new(-length => 1000,-width => 800); > > my $track = $panel->add_track(-glyph => 'generic',-label => 1); > > while (<>) { # read blast file > chomp; > next if /^\#/; # ignore comments > my($name,$score,$start,$end) = split /\t+/; > my $feature = Bio::SeqFeature::Generic->new( > -display_name => $name, > -score => $score, > -start => $start, > -end => $end > ); > $track->add_feature($feature); > } > > print $panel->png; > ------------------------------------------------------------ > > So, kindly help me out for this problem. 
> > thanking you > > yours truly > > sanjay harke > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From spiros at lokku.com Mon Mar 9 04:19:36 2009 From: spiros at lokku.com (Spiros Denaxas) Date: Mon, 9 Mar 2009 08:19:36 +0000 Subject: [Bioperl-l] Regarding Bioperl help for windows server 2003 In-Reply-To: <31bb4380903090004k6b23a574uca9d95598ad4e2fa@mail.gmail.com> References: <31bb4380903090004k6b23a574uca9d95598ad4e2fa@mail.gmail.com> Message-ID: On Mon, Mar 9, 2009 at 7:04 AM, Sanjay Harke wrote: > So, kindly help me out for this problem. > Hello, What is the output of the script when you run in locally? Instinctively, I would say you have a problem with GD not being installed properly since Bio::Graphics requires it. More info, http://www.libgd.org/Main_Page Hope this helps out, Spiros Denaxas > > thanking you > > yours truly > > sanjay harke > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From markus.liebscher at gmx.de Mon Mar 9 11:39:43 2009 From: markus.liebscher at gmx.de (manni122) Date: Mon, 9 Mar 2009 08:39:43 -0700 (PDT) Subject: [Bioperl-l] Help in Programming a substitution routine Message-ID: <22413520.post@talk.nabble.com> Hi there, i need a little help in programming. I am still stucking in a problem and have no idea how to get out a solution. I have a fasta alignment file (with two aligned sequences) and I want to optimize the genetic code for a better alignment score. Everything is fine up to the line where I'd like to compare the codons of each sequence. 
If both codons are equal I push them into a new array; if not, I'd like to look up all possible codons for the respective amino acid and compare every codon for the amino acid at one site with every codon for the corresponding amino acid in the other sequence. Let's say sequence 1 has an "A" (GCC, GCA, GCG, GCT) and sequence 2 has an "E" (GAA, GAG). Then the best hit should be the combination GCA and GAA, or GCG and GAG. But I don't have a clue how to get this working - as you might see in the code... I've probably made a lot of mistakes. But I would feel better if someone could suggest some code...

#the code starts here
use Bio::SeqIO;
use Bio::AlignIO;

my ($i, $j, $seq1, $seq2, $seqName1, $seqName2);

#the complete hash is not shown for the sake of clarity
my(%cod2aa) = (
    'TCA' => 'S',    # Serine
    'TCC' => 'S',    # Serine
    'TCG' => 'S',    # Serine
    'TCT' => 'S',    # Serine
    'TTC' => 'F',    # Phenylalanine
    'TTT' => 'F',    # Phenylalanine
    'TTA' => 'L',    # Leucine
    ...
    ...
);
my %aa2cod = reverse %cod2aa;

my $input = Bio::AlignIO->new(-file => 'align1.fas' , '-format' => 'fasta');

while (my $aln = $input->next_aln()) {
    push @nuc_seqs, $aln;
    $seq1 = $aln->get_seq_by_pos(1)->seq();
    $seq2 = $aln->get_seq_by_pos(2)->seq();
}

#create an array for the sequences
push @sequences, $seq1;
push @sequences, $seq2;

#separate whole sequence into single triplets (subroutine is not shown)
foreach (@sequences) {
    push @tmpArray, CreateTripletArray($_);
}

#count number of triplets in the array
$count = @tmpArray;

#compare pairwise and substitute if possible (assuming the length of both genes is equal)
foreach $i(0 ..
(($count/2)+1)) {
    $value1=@tmpArray[$i];
    $value2=@tmpArray[($count/2)+$i];
    if ($value1 eq $value2) {
        #push equal triplets in new arrays
        push @newArray1, $value1;
        push @newArray2, $value2;
    }
    else {
        #split codons to compare each base in a triplet
        @zwvalue1 = split(//, $value1);
        @zwvalue2 = split(//, $value2);
        #I want to get back all codons for one amino acid
        $aminoacid1 = $cod2aa{$value1};
        $aminoacid2 = $cod2aa{$value2};
        push @codon1, $aa2cod{$aminoacid1};
        push @codon2, $aa2cod{$aminoacid2};
        #then something like go through all bases of both codons a every position to find the best fitting partners
        if (exists $cod2aa{$value1} and $cod2aa{$value2} ) {
            if (@zwvalue1[0] eq @zwvalue2[0]){
                push @newArray1, @zwvalue1[0];
                push @newArray2, @zwvalue2[0];
            }
        }
    }
}

--
View this message in context: http://www.nabble.com/Help-in-Programming-a-substitution-routine-tp22413520p22413520.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From torsten.seemann at infotech.monash.edu.au Mon Mar 9 15:31:44 2009
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 10 Mar 2009 06:31:44 +1100
Subject: [Bioperl-l] Help in Programming a substitution routine
In-Reply-To: <22413520.post@talk.nabble.com>
References: <22413520.post@talk.nabble.com>
Message-ID:

Markus

> #the complete hash is not shown for the sake of clarity
> my(%cod2aa) = (
>    'TCA' => 'S',    # Serine
>    'TCC' => 'S',    # Serine
>    'TCG' => 'S',    # Serine
>    'TCT' => 'S',    # Serine
>    'TTC' => 'F',    # Phenylalanine
>    'TTT' => 'F',    # Phenylalanine
>    'TTA' => 'L',    # Leucine
> );
> my %aa2cod = reverse %cod2aa;

Your hash %cod2aa is many-to-one - it has many keys (codons) which map to the same value (amino acid). When you reverse this hash, it will NOT become one-to-many; it will become one-to-one. $aa2cod{'S'} will only return ONE value (which one is random and depends on how Perl orders them).

Is this what you intended?
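
One way to get the one-to-many lookup is to build it explicitly as a hash of array references. The following is only a minimal sketch, assuming the goal is to pick, for two amino acids, the pair of synonymous codons that agree at the most base positions; the names %aa2cods, shared_bases and best_codon_pair are made up for illustration, and the table is just the partial one quoted above, not the full genetic code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Partial codon table copied from the quoted snippet (NOT the full genetic code).
my %cod2aa = (
    'TCA' => 'S', 'TCC' => 'S', 'TCG' => 'S', 'TCT' => 'S',
    'TTC' => 'F', 'TTT' => 'F',
    'TTA' => 'L',
);

# Build the one-to-many reverse lookup by hand: amino acid => [codons].
# Sorting the keys makes the codon order reproducible between runs.
my %aa2cods;
for my $codon (sort keys %cod2aa) {
    push @{ $aa2cods{ $cod2aa{$codon} } }, $codon;
}

# Count the positions at which two equal-length codons carry the same base.
sub shared_bases {
    my ($c1, $c2) = @_;
    return scalar grep { substr($c1, $_, 1) eq substr($c2, $_, 1) }
                       0 .. length($c1) - 1;
}

# For two amino acids, return the codon pair that shares the most bases.
# Ties go to the first pair found; an amino acid missing from the table
# yields an empty list.
sub best_codon_pair {
    my ($aa1, $aa2) = @_;
    my ($best_score, @best) = (-1);
    for my $c1 (@{ $aa2cods{$aa1} || [] }) {
        for my $c2 (@{ $aa2cods{$aa2} || [] }) {
            my $score = shared_bases($c1, $c2);
            ($best_score, @best) = ($score, $c1, $c2) if $score > $best_score;
        }
    }
    return @best;
}

print join(',', @{ $aa2cods{S} }), "\n";            # TCA,TCC,TCG,TCT
print join('/', best_codon_pair('S', 'F')), "\n";   # TCC/TTC
```

With a complete table the same call would presumably cover the A/E case from the original post; which of several equally good pairs is returned depends only on the sort order used here.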
-- --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA From toniher at softcatala.cat Mon Mar 9 20:58:26 2009 From: toniher at softcatala.cat (Toni Hermoso Pulido) Date: Tue, 10 Mar 2009 01:58:26 +0100 Subject: [Bioperl-l] Indexing HMMER reports Message-ID: <49B5BB32.7020808@softcatala.cat> Hello, I'm trying to index HMMER reports using: http://www.bioperl.org/wiki/Module:Bio::Index::Hmmer However, I have not found enough information in the module code examples for being able to index and get info from those reports. I would like to get HSP alignments associated to hit accession codes from reports distributed in several files. Does anyone have experience on this? After dumping variables, I think I'm just indexing report files, but not its contents. Best regards, -- Toni Hermoso Pulido http://www.cau.cat From hlapp at gmx.net Mon Mar 9 23:34:07 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 9 Mar 2009 23:34:07 -0400 Subject: [Bioperl-l] Google Summer of Code: Call for Bio* Volunteers In-Reply-To: References: Message-ID: <06AC1C3A-B128-44A5-900C-50EBC22D36B5@gmx.net> You may recall my message to the developer lists of several O|B|F projects in February about the idea of O|B|F applying to Google Summer of Code as a mentoring organization [1]. I felt that the response to this was very positive and encouraging. Although late (sorry, been swamped too much), I've now put up the skeleton of an ideas page at http://open-bio.org/wiki/Google_Summer_Code_2009 I basically modeled (in fact, largely copied) this page after the NESCent Phyloinformatics Summer of Code ideas pages, which I think worked pretty well. We can completely rework this, though - any feedback and suggestions are very much welcome. In the meantime, I need all developers to double check the information under 'Contact'. Would the open-bio-l mailing list indeed reach the prospective mentors and other devs? 
Will you be fine with students asking for feedback on their applications on the developers (i.e., this) list? Is the IRC channel (#bioperl), where at least some of the prospective mentors hang out, available for students to ask questions during the time they apply?

I also need space for the reference information for all projects that will participate with at least one project idea (I would hope that that's all projects) to be added in the 'Open-Bio projects involved' section.

***** Most important of all, if you can volunteer to mentor a project, please post a project idea to the page in the respective section, using the idea template that's there already (copy, paste, and edit). *****

The deadline for organization applications is Friday this week, Mar 13, which is very soon. The ideas page is a major factor and component in how Google scores new mentoring organizations - the more we can show the resourcefulness and diversity of our member projects the more competitive I think we'll be. So all those who responded with ideas or willingness to help out as primary or secondary mentors earlier, I need you to think about and put up your idea(s) now.

Cheers,

-hilmar

[1] http://tinyurl.com/ck7tqe

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From jason at bioperl.org Mon Mar 9 23:54:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 9 Mar 2009 20:54:54 -0700
Subject: [Bioperl-l] Indexing HMMER reports
In-Reply-To: <49B5BB32.7020808@softcatala.cat>
References: <49B5BB32.7020808@softcatala.cat>
Message-ID:

that's right - it is really just for indexing reports and not particularly useful beyond that. you can parse a report with Bio::SearchIO or Bio::Tools::HMMer to get domains out.
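
For the "HSP alignments for given hit accessions, across several report files" case, parsing each file directly with Bio::SearchIO might look roughly like the sketch below. It assumes BioPerl is installed and that the reports are in a format the 'hmmer' SearchIO parser understands; the print_hsps helper and the command-line layout are invented for illustration, and whether the accession field is actually filled in for a given HMMER report is not guaranteed, so the filter also matches on the hit name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Print the HSP alignments of the wanted hits found in each report file.
# $wanted is a hash ref keyed by hit name or accession (hypothetical filter).
sub print_hsps {
    my ($wanted, @files) = @_;
    for my $file (@files) {
        my $in = Bio::SearchIO->new(-format => 'hmmer', -file => $file);
        while (my $result = $in->next_result) {
            while (my $hit = $result->next_hit) {
                my $acc = $hit->accession;
                next unless $wanted->{ $hit->name }
                         || (defined $acc && $wanted->{$acc});
                while (my $hsp = $hit->next_hsp) {
                    print join("\t", $result->query_name, $hit->name,
                                     $hsp->evalue), "\n";
                    # the three rows of the alignment block
                    print $hsp->query_string,    "\n",
                          $hsp->homology_string, "\n",
                          $hsp->hit_string,      "\n\n";
                }
            }
        }
    }
}

# usage (assumed): script.pl HIT_ID report1.out report2.out ...
if (@ARGV >= 2) {
    my $id = shift @ARGV;
    print_hsps({ $id => 1 }, @ARGV);
}
```

This reads every file from start to finish, so it trades the speed of an index for simplicity; for a handful of reports that is probably acceptable.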
On Mar 9, 2009, at 5:58 PM, Toni Hermoso Pulido wrote: > Hello, > > I'm trying to index HMMER reports using: > http://www.bioperl.org/wiki/Module:Bio::Index::Hmmer > > However, I have not found enough information in the module code > examples for being able to index and get info from those reports. > > I would like to get HSP alignments associated to hit accession codes > from reports distributed in several files. > > Does anyone have experience on this? > > After dumping variables, I think I'm just indexing report files, but > not its contents. > > Best regards, > > -- > Toni Hermoso Pulido > http://www.cau.cat > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From cjfields at illinois.edu Tue Mar 10 00:01:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Mar 2009 23:01:39 -0500 Subject: [Bioperl-l] Indexing HMMER reports In-Reply-To: <49B5BB32.7020808@softcatala.cat> References: <49B5BB32.7020808@softcatala.cat> Message-ID: <96BD2748-63EC-4EEC-BFFD-698300648FB4@illinois.edu> I'm not familiar with that one, but I assume the indexing is very likely by the query ID (similar to BLAST report indexing, which I've worked on). So, you would index the report first: -------------------------------- #!/usr/bin/perl -w use strict; use Bio::Index::Hmmer; my $indexfile = shift; my $index = Bio::Index::Hmmer->new( -filename => $indexfile, -write_flag => 1 ); $index->make_index(@ARGV); -------------------------------- Then, retrieve the HMMER report using the query ID and fetch_report(), which just advances the file pointer to the start of the indicated report prior to parsing, and returns the Bio::Search::Result::ResultI, I believe. 
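
The retrieval half could then look something like the following. This is an untested sketch that assumes the index file was already built as shown above and that reports are keyed by query ID; show_reports and the argument layout are illustrative names only, and what fetch_report does for an ID that is not in the index (return nothing or throw) is not something I have checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::Hmmer;

# Fetch each report by query ID from an existing index and list its hits.
sub show_reports {
    my ($indexfile, @query_ids) = @_;
    my $index = Bio::Index::Hmmer->new(-filename => $indexfile);
    for my $id (@query_ids) {
        my $result = $index->fetch_report($id);
        unless ($result) {
            # guard in case nothing comes back for this ID
            warn "no report found for '$id'\n";
            next;
        }
        print "Query: ", $result->query_name, "\n";
        while (my $hit = $result->next_hit) {
            print "\t", $hit->name, "\n";
            while (my $hsp = $hit->next_hsp) {
                print "\t\tscore ", $hsp->score,
                      ", E-value ", $hsp->evalue, "\n";
            }
        }
    }
}

# usage (assumed): script.pl index_file query_id [query_id ...]
show_reports(@ARGV) if @ARGV >= 2;
```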
chris On Mar 9, 2009, at 7:58 PM, Toni Hermoso Pulido wrote: > Hello, > > I'm trying to index HMMER reports using: > http://www.bioperl.org/wiki/Module:Bio::Index::Hmmer > > However, I have not found enough information in the module code > examples for being able to index and get info from those reports. > > I would like to get HSP alignments associated to hit accession codes > from reports distributed in several files. > > Does anyone have experience on this? > > After dumping variables, I think I'm just indexing report files, but > not its contents. > > Best regards, > > -- > Toni Hermoso Pulido > http://www.cau.cat > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Tue Mar 10 00:11:43 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Mar 2009 00:11:43 -0400 Subject: [Bioperl-l] Google Summer of Code: Call for Bio* Volunteers In-Reply-To: <37ABB1B8019D480CAC42C26A00DA97CE@NewLife> References: <37ABB1B8019D480CAC42C26A00DA97CE@NewLife> Message-ID: Hi Mark, On Feb 13, 2009, at 12:14 PM, Mark A. Jensen wrote: > If my newbie status is not a barrier, I would be pleased to mentor a > student. If it is a barrier, I would be pleased to look at > applications > or what have you. as a developer with commit privileges you're not a newbie anymore for the purposes of the program :-) Thanks for offering your help - to help out with mentoring, either put up a project idea yourself as a primary mentor on the page I just posted, or team up as secondary (backup) mentor with someone else. If we are accepted I do hope that we'd get enough applications to also need help with reviewing those. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Mar 10 00:20:01 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Mar 2009 00:20:01 -0400 Subject: [Bioperl-l] Google Summer of Code: Call for Bio* Volunteers In-Reply-To: <52cea20c0902130925x6d831303q5144020f06a42638@mail.gmail.com> References: <37ABB1B8019D480CAC42C26A00DA97CE@NewLife> <52cea20c0902130925x6d831303q5144020f06a42638@mail.gmail.com> Message-ID: Thanks Josh, appreciate your offer - I added both you and Mark to the list of mentors on the ideas page I posted in the other email. (Feel free to hyperlink your name to some kind of home page, or email address.) If you find one of the project ideas that I hope will start accumulating soon on the page particularly suited for you to help with mentoring, feel free to add yourself as a mentor for that idea. Cheers, -hilmar On Feb 13, 2009, at 12:25 PM, Joshua Udall wrote: > Ditto here. I would be happy to mentor a student or pitch in some > other way. > > Josh > > On Fri, Feb 13, 2009 at 10:14 AM, Mark A. Jensen > wrote: >> If my newbie status is not a barrier, I would be pleased to mentor a >> student. If it is a barrier, I would be pleased to look at >> applications >> or what have you. >> Mark > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Mar 10 00:20:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Mar 2009 23:20:55 -0500 Subject: [Bioperl-l] Google Summer of Code: Call for Bio* Volunteers In-Reply-To: References: <37ABB1B8019D480CAC42C26A00DA97CE@NewLife> Message-ID: <662DB6E4-34B9-4456-BA91-F521927F1037@illinois.edu> Hilmar, Let me know if you need help reviewing. 
-chris On Mar 9, 2009, at 11:11 PM, Hilmar Lapp wrote: > Hi Mark, > > On Feb 13, 2009, at 12:14 PM, Mark A. Jensen wrote: > >> If my newbie status is not a barrier, I would be pleased to mentor >> a student. If it is a barrier, I would be pleased to look at >> applications >> or what have you. > > > as a developer with commit privileges you're not a newbie anymore > for the purposes of the program :-) Thanks for offering your help - > to help out with mentoring, either put up a project idea yourself as > a primary mentor on the page I just posted, or team up as secondary > (backup) mentor with someone else. > > If we are accepted I do hope that we'd get enough applications to > also need help with reviewing those. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Mar 10 00:18:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 10 Mar 2009 00:18:46 -0400 Subject: [Bioperl-l] Google Summer of Code: Call for Bio* Volunteers In-Reply-To: References: <37ABB1B8019D480CAC42C26A00DA97CE@NewLife> Message-ID: <47C5A68DF6694CCBBD7C5295057579C4@NewLife> Hilmar- I am speechless with gratitude. (Ok, I'm rarely speechless...) I had a brainstorm, and will be posting it shortly. Will be happy to pitch in in all ways, and I encourage others to step on up- cheers all- Mark ----- Original Message ----- From: "Hilmar Lapp" To: "Mark A. Jensen" Cc: "bioPerl List" Sent: Tuesday, March 10, 2009 12:11 AM Subject: Re: [Bioperl-l] Google Summer of Code: Call for Bio* Volunteers > Hi Mark, > > On Feb 13, 2009, at 12:14 PM, Mark A. Jensen wrote: > >> If my newbie status is not a barrier, I would be pleased to mentor a >> student. 
If it is a barrier, I would be pleased to look at >> applications >> or what have you. > > > as a developer with commit privileges you're not a newbie anymore for > the purposes of the program :-) Thanks for offering your help - to > help out with mentoring, either put up a project idea yourself as a > primary mentor on the page I just posted, or team up as secondary > (backup) mentor with someone else. > > If we are accepted I do hope that we'd get enough applications to also > need help with reviewing those. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hlapp at gmx.net Tue Mar 10 00:40:55 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Mar 2009 00:40:55 -0400 Subject: [Bioperl-l] Google Summer of Code: Call for Bio* Volunteers In-Reply-To: <0F41DEF6-1A63-4F4E-A31F-E3D515D474BA@illinois.edu> References: <37ABB1B8019D480CAC42C26A00DA97CE@NewLife> <52cea20c0902130925x6d831303q5144020f06a42638@mail.gmail.com> <0F41DEF6-1A63-4F4E-A31F-E3D515D474BA@illinois.edu> Message-ID: <568E0364-EEE2-4D9E-9893-0F9122713BC8@gmx.net> On Feb 13, 2009, at 3:20 PM, Chris Fields wrote: > We had co-mentors last year for most projects (though in general > there is one primary mentor). Not sure if the same will occur for > this year. We should aim for the same, I think it helped and having a backup mentor allows the primary mentor to be away at a conference etc without running the risk of not being able to help the student when he/ she gets stuck. 
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Tue Mar 10 00:43:59 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 10 Mar 2009 00:43:59 -0400
Subject: [Bioperl-l] Google Summer of Code: Call for Bio* Volunteers
In-Reply-To: <7C21CA74-9694-4962-9C64-A7D95D06CD53@illinois.edu>
References: <7C21CA74-9694-4962-9C64-A7D95D06CD53@illinois.edu>
Message-ID: <603F4091-8121-4DD9-9BD9-DB1D269E6088@gmx.net>

On Feb 13, 2009, at 3:17 PM, Chris Fields wrote:

> Hilmar, is there a particular focus on projects this year?

The focus for the O|B|F participation would be the O|B|F's member
projects (most prominently Bio*, but also EMBOSS and DAS I suppose).
(The focus for NESCent is all things phyloinformatics, as it was in
previous years.)

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From mauricio at open-bio.org Tue Mar 10 01:39:23 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Mon, 09 Mar 2009 23:39:23 -0600
Subject: [Bioperl-l] Indexing HMMER reports
In-Reply-To: <49B5BB32.7020808@softcatala.cat>
References: <49B5BB32.7020808@softcatala.cat>
Message-ID: <49B5FD0B.4070907@open-bio.org>

I've been kind of disconnected from this stuff for a while but I "think"
you could use (or take ideas from) a script of mine in the core
distribution:

http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/searchio/parse_hmmsearch.PLS

If you give the script a file with the list of reports to parse, it will
print out most of the useful data from them. `bp_parse_hmmsearch.pl -h`
will provide more help about its use.

Mauricio.
Toni Hermoso Pulido wrote: > Hello, > > I'm trying to index HMMER reports using: > http://www.bioperl.org/wiki/Module:Bio::Index::Hmmer > > However, I have not found enough information in the module code examples > for being able to index and get info from those reports. > > I would like to get HSP alignments associated to hit accession codes > from reports distributed in several files. > > Does anyone have experience on this? > > After dumping variables, I think I'm just indexing report files, but not > its contents. > > Best regards, > From toniher at softcatala.cat Tue Mar 10 09:49:35 2009 From: toniher at softcatala.cat (Toni Hermoso Pulido) Date: Tue, 10 Mar 2009 14:49:35 +0100 Subject: [Bioperl-l] Indexing HMMER reports In-Reply-To: <49B5FD0B.4070907@open-bio.org> References: <49B5BB32.7020808@softcatala.cat> <49B5FD0B.4070907@open-bio.org> Message-ID: <49B66FEF.1080902@softcatala.cat> Thanks Mauricio. I will take a look at your script and I will adapt my own ones. As Jason commented before, as this module is documented, it only indexes reports (not their contents) and so it's not very useful in the end. En/na Mauricio Herrera Cuadra ha escrit: > I've been kind of disconnected from this stuff for a while but I "think" > you could use (or take ideas from) a script of mine in the core > distribution: > > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/searchio/parse_hmmsearch.PLS > > > If you give the script a file with the list of reports to parse, it will > print out most of the useful data from them. `bp_parse_hmmsearch.pl -h` > will provide more help about its use. > > Mauricio. > > > Toni Hermoso Pulido wrote: >> Hello, >> >> I'm trying to index HMMER reports using: >> http://www.bioperl.org/wiki/Module:Bio::Index::Hmmer >> >> However, I have not found enough information in the module code >> examples for being able to index and get info from those reports. 
>> >> I would like to get HSP alignments associated to hit accession codes >> from reports distributed in several files. >> >> Does anyone have experience on this? >> >> After dumping variables, I think I'm just indexing report files, but >> not its contents. >> >> Best regards, >> -- Toni Hermoso Pulido http://www.cau.cat From shwetakagliwal at gmail.com Tue Mar 10 02:57:02 2009 From: shwetakagliwal at gmail.com (shweta kagliwal) Date: Tue, 10 Mar 2009 12:27:02 +0530 Subject: [Bioperl-l] Emboss factory script Message-ID: <16b96b950903092357k7792836eg29507a4736671f74@mail.gmail.com> I want to use Emboss programs. I tried running the attached script. But I get a warning: Msg: Application [water] is not available. can't call method "run" on an undefined value at bpe.pl line 15. I am not able to understand the problem. I just want to make pairwise alignments using emboss programs 'water' and 'needle' programatically. Please help me. Thanks. -------------- next part -------------- A non-text attachment was scrubbed... Name: bpe1.pl Type: application/octet-stream Size: 1103 bytes Desc: not available URL: From cjfields at illinois.edu Tue Mar 10 10:22:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Mar 2009 09:22:42 -0500 Subject: [Bioperl-l] Indexing HMMER reports In-Reply-To: <49B66FEF.1080902@softcatala.cat> References: <49B5BB32.7020808@softcatala.cat> <49B5FD0B.4070907@open-bio.org> <49B66FEF.1080902@softcatala.cat> Message-ID: I wouldn't say it's not useful; just depends on your purpose. The similar Bio::Index::Blast was very useful when I ran MCL clustering and needed to pull out a single BLAST report (out of a very large concatenated BLAST file) via the query ID. Saying that, it would be nice to come up with a way to index the hit ID but that may require reconfiguring how parsed IDs are stored. chris On Mar 10, 2009, at 8:49 AM, Toni Hermoso Pulido wrote: > Thanks Mauricio. I will take a look at your script and I will adapt my > own ones. 
> > As Jason commented before, as this module is documented, it only > indexes > reports (not their contents) and so it's not very useful in the > end. > > En/na Mauricio Herrera Cuadra ha escrit: >> I've been kind of disconnected from this stuff for a while but I >> "think" >> you could use (or take ideas from) a script of mine in the core >> distribution: >> >> http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/searchio/parse_hmmsearch.PLS >> >> >> If you give the script a file with the list of reports to parse, it >> will >> print out most of the useful data from them. `bp_parse_hmmsearch.pl >> -h` >> will provide more help about its use. >> >> Mauricio. >> >> >> Toni Hermoso Pulido wrote: >>> Hello, >>> >>> I'm trying to index HMMER reports using: >>> http://www.bioperl.org/wiki/Module:Bio::Index::Hmmer >>> >>> However, I have not found enough information in the module code >>> examples for being able to index and get info from those reports. >>> >>> I would like to get HSP alignments associated to hit accession codes >>> from reports distributed in several files. >>> >>> Does anyone have experience on this? >>> >>> After dumping variables, I think I'm just indexing report files, but >>> not its contents. >>> >>> Best regards, >>> > > > -- > Toni Hermoso Pulido > http://www.cau.cat > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue Mar 10 10:02:48 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 10 Mar 2009 10:02:48 -0400 Subject: [Bioperl-l] Indexing HMMER reports In-Reply-To: <49B66FEF.1080902@softcatala.cat> References: <49B5BB32.7020808@softcatala.cat> <49B5FD0B.4070907@open-bio.org> <49B66FEF.1080902@softcatala.cat> Message-ID: <0B820B48-AE0B-4B7C-A0FA-B9A0A15D81F4@verizon.net> Toni, It's not designed to parse the contents, that's the job of SearchIO. 
Check the SearchIO HOWTO, it may be informative. You would use this if, for example, you had many hmmsearch results and you had the corresponding ids and you wanted to retrieve the results quickly and directly. Brian O. On Mar 10, 2009, at 9:49 AM, Toni Hermoso Pulido wrote: > Thanks Mauricio. I will take a look at your script and I will adapt my > own ones. > > As Jason commented before, as this module is documented, it only > indexes > reports (not their contents) and so it's not very useful in the > end. > > En/na Mauricio Herrera Cuadra ha escrit: >> I've been kind of disconnected from this stuff for a while but I >> "think" >> you could use (or take ideas from) a script of mine in the core >> distribution: >> >> http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/searchio/parse_hmmsearch.PLS >> >> >> If you give the script a file with the list of reports to parse, it >> will >> print out most of the useful data from them. `bp_parse_hmmsearch.pl >> -h` >> will provide more help about its use. >> >> Mauricio. >> >> >> Toni Hermoso Pulido wrote: >>> Hello, >>> >>> I'm trying to index HMMER reports using: >>> http://www.bioperl.org/wiki/Module:Bio::Index::Hmmer >>> >>> However, I have not found enough information in the module code >>> examples for being able to index and get info from those reports. >>> >>> I would like to get HSP alignments associated to hit accession codes >>> from reports distributed in several files. >>> >>> Does anyone have experience on this? >>> >>> After dumping variables, I think I'm just indexing report files, but >>> not its contents. 
>>>
>>> Best regards,
>>>
>
>
> --
> Toni Hermoso Pulido
> http://www.cau.cat
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From SMarkel at accelrys.com Tue Mar 10 13:15:27 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 10 Mar 2009 13:15:27 -0400
Subject: [Bioperl-l] Emboss factory script
In-Reply-To: <16b96b950903092357k7792836eg29507a4736671f74@mail.gmail.com>
References: <16b96b950903092357k7792836eg29507a4736671f74@mail.gmail.com>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74726CF83BF@exch1-hi.accelrys.net>

Shweta,

You should check the return value in line 6.

When we use BioPerl to control EMBOSS programs we need to set some
environment variables. We set the following: EMBOSS_ROOT,
EMBOSS_ACDROOT, EMBOSS_DB_DIR, EMBOSS_DATA, and PATH. Not all of these
are needed in every case.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shweta kagliwal
> Sent: Monday, 09 March 2009 11:57 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Emboss factory script
>
> I want to use Emboss programs.
> I tried running the attached script.
> But I get a warning:
>
> Msg: Application [water] is not available.
> can't call method "run" on an undefined value at bpe.pl line 15.
>
> I am not able to understand the problem.
I just want to make pairwise > alignments using emboss programs 'water' and 'needle' programatically. > Please help me. > Thanks. From maj at fortinbras.us Tue Mar 10 17:06:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 10 Mar 2009 17:06:45 -0400 Subject: [Bioperl-l] CPAN module? In-Reply-To: References: Message-ID: <234E65D3511C415B8F70E1A4DBC1EDC8@NewLife> Hi Caitlin- [I'm copying your question to the bioperl list-- you will find this a great resource for all questions bioperl...] B:D:HIV is part of the core, so it should have installed for you. You may want to upgrade to the stable 1.6 release (see http://www.bioperl.org/wiki/Release_1.6); it's definitely there. Post again to the list if you have trouble-- cheers, Mark ----- Original Message ----- From: Caitlin Gibbons To: maj at fortinbras.us Sent: Tuesday, March 10, 2009 4:54 PM Subject: CPAN module? Hi Dr. Jensen, I recently upgraded to BioPerl 1.5.9_RC4(Win32 ppm) and I was wondering if the ' Bio::DB::HIV' module was built-in or required a separate download? I received the following error when I attempted to use it: C:\Perl\bin>perl -e "use Bio::DB::HIV" Can't locate Bio/DB/HIV.pm in @INC (@INC contains: C:/Perl/site/lib C:/Perl/lib .) at -e line 1. BEGIN failed--compilation aborted at -e line 1. Thanks, ~Caitlin From hartzell at alerce.com Tue Mar 10 20:31:49 2009 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Mar 2009 17:31:49 -0700 Subject: [Bioperl-l] Bio::Location::{Simple,Fuzzy} and "IN-BETWEEN" Message-ID: <18871.1653.500578.292183@almost.alerce.com> I just tripped over the $self->throw() in B::L::Fuzzy->start where it won't let me use a Fuzzy if the start and end are adjacent. I think that this is going to be one of those More Than One Way To Do It kind of things, but I don't understand the restriction. I have some data for insertions that get a location between two adjacent bases, e.g. s^e, an IN-BETWEEN that starts at s and ends at e. 
Then I map that location via an alignment to a second sequence and it maps into a gap on the second sequence. In this case the sequences are no longer adjacent, e.g. the left edge of the gap is l and the right edge is r. At this point I know less than I did, and am trying to represent it as >l^ Hi all, I wrote a script that reverse translates a Profam-like protein motif into its correspondent degenerate nucleotide sequence, using Chris' and Brian's suggestions on this thread. You can download it from here: http://github.com/brunoV/revtrans-motif/tree/master Even though right now it's built as a script, I wrote it thinking of adding it as a method; maybe to Bio::Tools::CodonTable. What do you think? The script is a thin wrapper over a module temporarily named "Revtrans", it already has tests. Cheers, Bruno. > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: 08 December 2008 16:41 > To: Samantha Thompson > Cc: bioperl-l List > Subject: Re: [Bioperl-l] Degenerate primer calculation > > > On Dec 8, 2008, at 9:59 AM, Samantha Thompson wrote: > >> Hi, >> >> I also have another similar sequence analysis/primer problem. >> >> What I'd like to do is produce degenerate primers from amino acid >> sequences. >> >> What I did initially was take the codon usage table and rewrite it >> in a >> hash in perl in the form of degenerate codon usage e.g Lysine/K >> would be >> AAR, its reverse complement would be YTT. So my form then takes an >> amino >> acid sequence (derived as a consensus from multiple the alignment of >> homologous proteins) and converts them into degenerate codons and then >> that degenerate primer (actually several primers synthesised with >> different bases pooled together), in order to search for homologues to >> the protein in unsequenced organisms. 
>> >> I would like to improve this by being able to take a consensus >> described >> more in the form of a Prosite motif (I think thats the right one) such >> as [TS]YW[RKSD] and then develop a degenerate nucleotide sequence >> corresponding to this. >> >> So I'm wondering if bioperl contains anything like this (both prosite >> motif format parsing and degenerate code from multiple alignments or >> such a motif), or if I need to write this myself (which I want to if >> it >> doesn't exist already). >> >> Thanks again, >> >> Sam > > Bio::Tools::CodonTable reverse translates, but I don't think it > accepts patterns. Maybe a pipeline including Bio::Tools::SeqPattern? > Might be an interesting programming challenge if it isn't already set > up for that. > > Chris > ........... > Hi, > > I'm trying to have a go at solving this problem and I'm looking at > Bio::Tools::SeqPattern. What I would like to be able to obtain from a > motif is a list of all the sequences that that sequence could correspond > to. E.g IKL[GP]NM could be IKLGNM or IKLPNM ... so I take both of these > sequences and turn them into degenerate codons for each amino acid. The > complicated part (I thought) here is creating a degenerate codon that > corresponds to either G or P. The way I will do this is by producing > each of the 3 degenerate bases and creating a new codon by creating each > of the 3 degenerate bases separately based on a 2D matrix which contains > the result of 'crossing' each of the nucleotide bases of the degenerate > code with each other. So when you cross the codon for G (GGN) with the > codon for P (CCN) you get a codon that contains the degeneracy of both > (SSN). So then you have a degenerate nucleotide sequence for your > peptide motif. > I have written this part already but I am wondering about the expand > function of Bio::Tools::SeqPattern . I'm not quite sure what it means by > the expanded sequence (if there is just one?) that it returns. 
I'm
> trying to get every possible permutation of the motif is there any
> function that does this or will I have to write one to parse it myself?
> .....
> This would be great, but what would make things even better would be if
> I could take multiple sequence alignments and produce patterns/motifs
> from them. Is there a part of BioPerl that does something like this?
>
> Thanks,
>
> Sam
>

From gopu_36 at yahoo.com Wed Mar 11 02:44:08 2009
From: gopu_36 at yahoo.com (gopu_36)
Date: Tue, 10 Mar 2009 23:44:08 -0700 (PDT)
Subject: [Bioperl-l] Gene name to Accession id
Message-ID: <22449585.post@talk.nabble.com>

Hi

I have a gene list as below:

Ets2
Vegfc
Capg
Ly6c1
Pdlim2
Sema3f
Tes
Arsj
Figf
Osr1
Stc1
Ptgs1
6330406I15Rik
Fosl2
Ptgs1

How do I get the accession ID, like NM_001025602, for each gene in the
above list? Thanks.

Regards
--
View this message in context: http://www.nabble.com/Gene-name-to-Accession-id-tp22449585p22449585.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From markus.liebscher at gmx.de Wed Mar 11 05:14:45 2009
From: markus.liebscher at gmx.de (manni122)
Date: Wed, 11 Mar 2009 02:14:45 -0700 (PDT)
Subject: [Bioperl-l] How to access an array of codons
Message-ID: <22451211.post@talk.nabble.com>

Hi there, if I want to access my hash of arrays %aa2cod I can print out
the array key and the first value by

for my $key (keys %aa2cod) {
    print "$key => $aa2cod{$key}[0]\n";
}

How can I find a specific codon within this hash of arrays and extract
the corresponding array? For instance, I am looking for GCA and want to
get the whole array [GCA GCC GCG GCT].

I appreciate any comment!!
my %aa2cod = ( 'A' => ['GCA', 'GCC', 'GCG', 'GCT'], # Alanine 'C' => ['TGC', 'TGT'], # Cysteine 'D' => ['GAC', 'GAT'], # Aspartic Acid 'E' => ['GAA', 'GAG'], # Glutamic Acid 'F' => ['TTC', 'TTT'], # Phenylalanine 'G' => ['GGA', 'GGC', 'GGG', 'GGT'], # Glycine 'H' => ['CAC', 'CAT'], # Histidine 'I' => ['ATA', 'ATC', 'ATT'], # Isoleucine 'K' => ['AAA', 'AAG'], # Lysine 'L' => ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'], # Leucine 'M' => ['ATG'], # Methionine 'N' => ['AAC', 'AAT'], # Asparagine 'P' => ['CCA', 'CCC', 'CCG', 'CCT'], # Proline 'Q' => ['CAA', 'CAG'], # Glutamine 'R' => ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], # Arginine 'S' => ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], # Serine 'T' => ['ACA', 'ACC', 'ACG', 'ACT'], # Threonine 'V' => ['GTA', 'GTC', 'GTG', 'GTT'], # Valine 'W' => ['TGG'], # Tryptophan 'Y' => ['TAC', 'TAT'], # Tyrosine '_' => ['TAA', 'TAG', 'TGA'] # Stop ); -- View this message in context: http://www.nabble.com/How-to-access-an-array-of-codons-tp22451211p22451211.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From David.Messina at sbc.su.se Wed Mar 11 06:13:28 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 11 Mar 2009 10:13:28 +0000 Subject: [Bioperl-l] How to access an array of codons In-Reply-To: <22451211.post@talk.nabble.com> References: <22451211.post@talk.nabble.com> Message-ID: <628aabb70903110313i3bad787ag73bdcc7e92a6692e@mail.gmail.com> > > > How can I find a specific codon within this hash of arrays and extract the > corresponding array. You can't. Hashes are used for looking up a value based on the key. Here, your keys are the amino acids, and the values are the codons. The simplest solution for you, I think, would be to maintain a separate hash with codons as keys and amino acids as values. 
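
The reverse hash does not have to be typed by hand. A minimal sketch,
assuming the %aa2cod hash from the question (abbreviated here to three
entries; the full table works the same way), derives it in one loop.
The sub name synonymous_codons is made up for this example.

```perl
use strict;
use warnings;

# Abbreviated copy of the hash of arrays from the question.
my %aa2cod = (
    'A' => ['GCA', 'GCC', 'GCG', 'GCT'],    # Alanine
    'C' => ['TGC', 'TGT'],                  # Cysteine
    'M' => ['ATG'],                         # Methionine
);

# Build the reverse lookup: codon => amino acid.
my %cod2aa;
for my $aa (keys %aa2cod) {
    $cod2aa{$_} = $aa for @{ $aa2cod{$aa} };
}

# Return all codons that encode the same amino acid as $codon,
# or an empty list if the codon is not in the table.
sub synonymous_codons {
    my ($codon) = @_;
    my $aa = $cod2aa{$codon};
    return defined $aa ? @{ $aa2cod{$aa} } : ();
}

print join(' ', synonymous_codons('GCA')), "\n";    # prints "GCA GCC GCG GCT"
```

Building %cod2aa once costs a single pass over the table; every later
lookup is then a plain hash access instead of a search through all the
arrays.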
So, if you start with a given codon and want to find the other codons
that code for the same amino acid, you could do something like:

my %cod2aa = (
    'GCA' => 'A',
    'GCC' => 'A',
    # and so on
);

my $codon  = 'GCA';
my $aa     = $cod2aa{$codon};
my @codons = @{ $aa2cod{$aa} };    # dereference the array ref to get the list

If you haven't already, you might want to take a look at Jim Tisdall's
book "Beginning Perl for Bioinformatics". It has some good discussion of
these sorts of things.

Dave

From vecchi.b at gmail.com Wed Mar 11 06:15:07 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 11 Mar 2009 08:15:07 -0200
Subject: [Bioperl-l] How to access an array of codons
In-Reply-To: <22451211.post@talk.nabble.com>
References: <22451211.post@talk.nabble.com>
Message-ID: <1a0c1b750903110315i69c0e3f9wea655d6ca846d8a0@mail.gmail.com>

Hi,

I think this will print out what you want.

[%aa2cod declaration here]

foreach my $key (keys %aa2cod) {
    print "@{$aa2cod{$key}}\n";
}

Cheers,
Bruno.

2009/3/11 manni122

>
> Hi there, if I want to access my hash of arrays %aa2cod I can print out the
> array key and the first value by
>
> for my $key (keys %aa2cod) {
> print "$key => $aa2cod{$key}[0]\n";
> }
>
> How can I find a specific codon within this hash of arrays and extract the
> corresponding array. For instance I am looking for GCA and want to get the
> whole array [GCA GCC GCG GCT].
>
> I appreciate any comment!!
> > > my %aa2cod = ( > 'A' => ['GCA', 'GCC', 'GCG', 'GCT'], # Alanine > 'C' => ['TGC', 'TGT'], # Cysteine > 'D' => ['GAC', 'GAT'], # Aspartic Acid > 'E' => ['GAA', 'GAG'], # Glutamic Acid > 'F' => ['TTC', 'TTT'], # Phenylalanine > 'G' => ['GGA', 'GGC', 'GGG', 'GGT'], # Glycine > 'H' => ['CAC', 'CAT'], # Histidine > 'I' => ['ATA', 'ATC', 'ATT'], # Isoleucine > 'K' => ['AAA', 'AAG'], # Lysine > 'L' => ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'], # Leucine > 'M' => ['ATG'], # Methionine > 'N' => ['AAC', 'AAT'], # Asparagine > 'P' => ['CCA', 'CCC', 'CCG', 'CCT'], # Proline > 'Q' => ['CAA', 'CAG'], # Glutamine > 'R' => ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], # Arginine > 'S' => ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], # Serine > 'T' => ['ACA', 'ACC', 'ACG', 'ACT'], # Threonine > 'V' => ['GTA', 'GTC', 'GTG', 'GTT'], # Valine > 'W' => ['TGG'], # Tryptophan > 'Y' => ['TAC', 'TAT'], # Tyrosine > '_' => ['TAA', 'TAG', 'TGA'] # Stop > ); > > -- > View this message in context: > http://www.nabble.com/How-to-access-an-array-of-codons-tp22451211p22451211.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From markus.liebscher at gmx.de Wed Mar 11 08:37:11 2009 From: markus.liebscher at gmx.de (manni122) Date: Wed, 11 Mar 2009 05:37:11 -0700 (PDT) Subject: [Bioperl-l] How to access an array of codons In-Reply-To: <22451211.post@talk.nabble.com> References: <22451211.post@talk.nabble.com> Message-ID: <22454228.post@talk.nabble.com> thanks for helping me. I understand! Cheers, Markus. manni122 wrote: > > Hi there, if I want to access my hash of arrays %aa2cod I can print out > the array key and the first value by > > for my $key (keys %aa2cod) { > print "$key => $aa2cod{$key}[0]\n"; > } > > How can I find a specific codon within this hash of arrays and extract the > corresponding array. 
For instance I am looking for GCA and want to get the > whole array [GCA GCC GCG GCT]. > > I appreciate any comment!! > > > my %aa2cod = ( > 'A' => ['GCA', 'GCC', 'GCG', 'GCT'], # Alanine > 'C' => ['TGC', 'TGT'], # Cysteine > 'D' => ['GAC', 'GAT'], # Aspartic Acid > 'E' => ['GAA', 'GAG'], # Glutamic Acid > 'F' => ['TTC', 'TTT'], # Phenylalanine > 'G' => ['GGA', 'GGC', 'GGG', 'GGT'], # Glycine > 'H' => ['CAC', 'CAT'], # Histidine > 'I' => ['ATA', 'ATC', 'ATT'], # Isoleucine > 'K' => ['AAA', 'AAG'], # Lysine > 'L' => ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'], # Leucine > 'M' => ['ATG'], # Methionine > 'N' => ['AAC', 'AAT'], # Asparagine > 'P' => ['CCA', 'CCC', 'CCG', 'CCT'], # Proline > 'Q' => ['CAA', 'CAG'], # Glutamine > 'R' => ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], # Arginine > 'S' => ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], # Serine > 'T' => ['ACA', 'ACC', 'ACG', 'ACT'], # Threonine > 'V' => ['GTA', 'GTC', 'GTG', 'GTT'], # Valine > 'W' => ['TGG'], # Tryptophan > 'Y' => ['TAC', 'TAT'], # Tyrosine > '_' => ['TAA', 'TAG', 'TGA'] # Stop > ); > > -- View this message in context: http://www.nabble.com/How-to-access-an-array-of-codons-tp22451211p22454228.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Wed Mar 11 10:04:14 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 11 Mar 2009 09:04:14 -0500 Subject: [Bioperl-l] How to access an array of codons In-Reply-To: <22451211.post@talk.nabble.com> References: <22451211.post@talk.nabble.com> Message-ID: <5208CFBD-D7A2-4731-9F0D-F24BCDC0E641@illinois.edu> On Mar 11, 2009, at 4:14 AM, manni122 wrote: > Hi there, if I want to access my hash of arrays %aa2cod I can print > out the > array key and the first value by > > for my $key (keys %aa2cod) { > print "$key => $aa2cod{$key}[0]\n"; > } > > How can I find a specific codon within this hash of arrays and > extract the > corresponding array. 
For instance I am looking for GCA and want to
> get the
> whole array [GCA GCC GCG GCT].
>
> I appreciate any comment!!

You would have to iterate through all the keys in the hash to get what
you want. TIMTOWTDI:

------------------------
my ($codons) =
    map  { $aa2cod{$_} }
    grep { my ($hit) = grep { $_ eq 'GCA' } @{ $aa2cod{$_} }; }
    sort keys %aa2cod;

print join(',',@$codons)."\n" if ref $codons;
------------------------
while (my ($aa, $codons) = each %aa2cod) {
    if (grep {$_ eq 'GCA'} @{$codons}) {
        print join(',',@{$codons})."\n";
        last;
    }
}
------------------------

chris

From markus.liebscher at gmx.de Wed Mar 11 14:16:26 2009
From: markus.liebscher at gmx.de (manni122)
Date: Wed, 11 Mar 2009 11:16:26 -0700 (PDT)
Subject: [Bioperl-l] walk through array of two sequences splitted into codons
Message-ID: <22461343.post@talk.nabble.com>

I have an array holding two sequences that have been split into their codons.
Is it possible to walk through these sequences in parallel and compare them?
I have tried the following (but this doesn't work):

foreach my $codon (@tmpArray) {
    my $count = @tmpArray; #length of the array
    #start at the beginning of both sequences, assuming the same length (pretty dirty way)
    for (my $i=0; $i <= (($count/2)+1); $i++) {
        $value1=@tmpArray[$i];
        $value2=@tmpArray[($count/2)+$i];
        if ($value1 eq $value2) {
            #push equal triplets in new arrays, but here it stores all found triplets
            push @newArray1, $value1;
            push @newArray2, $value2;
        }
    }next;
};

thanks again for any comment on this...
-- 
View this message in context: http://www.nabble.com/walk-through-array-of-two-sequences-splitted-into-codons-tp22461343p22461343.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
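
One possible reading of the question above: if the first half of @tmpArray holds the codons of sequence 1 and the second half those of sequence 2 (the layout the posted loop appears to assume), a single index loop is enough and the outer foreach is not needed. The following is only a sketch under that assumption; the codons and the names @same1, @same2 and @positions are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: codons of sequence 1 followed by codons of sequence 2.
my @tmpArray = qw(ATG GCA TTT TAA ATG GCC TTT TAA);

# Number of codons per sequence (assumes both sequences have equal length).
my $half = @tmpArray / 2;

my (@same1, @same2, @positions);
for my $i (0 .. $half - 1) {
    my $codon1 = $tmpArray[$i];            # codon i of sequence 1
    my $codon2 = $tmpArray[$half + $i];    # codon i of sequence 2
    next unless $codon1 eq $codon2;        # keep only identical triplets
    push @positions, $i;
    push @same1, $codon1;
    push @same2, $codon2;
}

print "identical codons at positions: @positions\n";
```

Running each index exactly once avoids the repeated pushes that the nested loops in the posted code produce. If the two sequences can differ in length, the halving trick no longer applies and keeping them in two separate arrays would probably be cleaner.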
From Kevin.M.Brown at asu.edu Wed Mar 11 14:27:30 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 11 Mar 2009 11:27:30 -0700
Subject: [Bioperl-l] How to access an array of codons
In-Reply-To: <22451211.post@talk.nabble.com>
References: <22451211.post@talk.nabble.com>
Message-ID: <1A4207F8295607498283FE9E93B775B405D61C56@EX02.asurite.ad.asu.edu>

You can, using the Bio::Perl method translate_as_string:

@array = @{$aa2cod{translate_as_string($codon)}};

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of manni122
> Sent: Wednesday, March 11, 2009 2:15 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] How to access an array of codons
>
>
> Hi there, if I want to access my hash of arrays %aa2cod I can
> print out the
> array key and the first value by
>
> for my $key (keys %aa2cod) {
> print "$key => $aa2cod{$key}[0]\n";
> }
>
> How can I find a specific codon within this hash of arrays
> and extract the
> corresponding array. For instance I am looking for GCA and
> want to get the
> whole array [GCA GCC GCG GCT].
>
> I appreciate any comment!!
> > > my %aa2cod = ( > 'A' => ['GCA', 'GCC', 'GCG', 'GCT'], # Alanine > 'C' => ['TGC', 'TGT'], # Cysteine > 'D' => ['GAC', 'GAT'], # Aspartic Acid > 'E' => ['GAA', 'GAG'], # Glutamic Acid > 'F' => ['TTC', 'TTT'], # Phenylalanine > 'G' => ['GGA', 'GGC', 'GGG', 'GGT'], # Glycine > 'H' => ['CAC', 'CAT'], # Histidine > 'I' => ['ATA', 'ATC', 'ATT'], # Isoleucine > 'K' => ['AAA', 'AAG'], # Lysine > 'L' => ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'], # Leucine > 'M' => ['ATG'], # Methionine > 'N' => ['AAC', 'AAT'], # Asparagine > 'P' => ['CCA', 'CCC', 'CCG', 'CCT'], # Proline > 'Q' => ['CAA', 'CAG'], # Glutamine > 'R' => ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], # Arginine > 'S' => ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], # Serine > 'T' => ['ACA', 'ACC', 'ACG', 'ACT'], # Threonine > 'V' => ['GTA', 'GTC', 'GTG', 'GTT'], # Valine > 'W' => ['TGG'], # Tryptophan > 'Y' => ['TAC', 'TAT'], # Tyrosine > '_' => ['TAA', 'TAG', 'TGA'] # Stop > ); > > -- > View this message in context: > http://www.nabble.com/How-to-access-an-array-of-codons-tp22451 > 211p22451211.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Wed Mar 11 15:26:46 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 12 Mar 2009 08:26:46 +1300 Subject: [Bioperl-l] Gene name to Accession id In-Reply-To: <22449585.post@talk.nabble.com> References: <22449585.post@talk.nabble.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BCD8142D@exchsth.agresearch.co.nz> The easiest way is to grab the gene2accession or gene2refseq lists from here: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ --Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E? russell.smithies at agresearch.co.nz Invermay? Research Centre Puddle Alley, Mosgiel, New Zealand T? 
+64 3 489 3809?? F? +64 3 489 9174? www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of gopu_36 > Sent: Wednesday, 11 March 2009 7:44 p.m. > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Gene name to Accession id > > > Hi > I have a gene list as below: > > Ets2 > Vegfc > Capg > Ly6c1 > Pdlim2 > Sema3f > Tes > Arsj > Figf > Osr1 > Stc1 > Ptgs1 > 6330406I15Rik > Fosl2 > Ptgs1 > > How do I get the accesion id like NM_001025602 for the above gene list? > Thanks. > > Regards > -- > View this message in context: http://www.nabble.com/Gene-name-to-Accession-id- > tp22449585p22449585.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From Russell.Smithies at agresearch.co.nz Wed Mar 11 15:55:49 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Mar 2009 08:55:49 +1300
Subject: [Bioperl-l] How to access an array of codons
In-Reply-To: <22454228.post@talk.nabble.com>
References: <22451211.post@talk.nabble.com> <22454228.post@talk.nabble.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BCD81462@exchsth.agresearch.co.nz>

Might be a bit hacky but how about a temp hash for the reverse lookup?

my %aa2cod = (
    'A' => ['GCA', 'GCC', 'GCG', 'GCT'],               # Alanine
    'C' => ['TGC', 'TGT'],                             # Cysteine
    'D' => ['GAC', 'GAT'],                             # Aspartic Acid
    'E' => ['GAA', 'GAG'],                             # Glutamic Acid
    'F' => ['TTC', 'TTT'],                             # Phenylalanine
    'G' => ['GGA', 'GGC', 'GGG', 'GGT'],               # Glycine
    'H' => ['CAC', 'CAT'],                             # Histidine
    'I' => ['ATA', 'ATC', 'ATT'],                      # Isoleucine
    'K' => ['AAA', 'AAG'],                             # Lysine
    'L' => ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'], # Leucine
    'M' => ['ATG'],                                    # Methionine
    'N' => ['AAC', 'AAT'],                             # Asparagine
    'P' => ['CCA', 'CCC', 'CCG', 'CCT'],               # Proline
    'Q' => ['CAA', 'CAG'],                             # Glutamine
    'R' => ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], # Arginine
    'S' => ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], # Serine
    'T' => ['ACA', 'ACC', 'ACG', 'ACT'],               # Threonine
    'V' => ['GTA', 'GTC', 'GTG', 'GTT'],               # Valine
    'W' => ['TGG'],                                    # Tryptophan
    'Y' => ['TAC', 'TAT'],                             # Tyrosine
    '_' => ['TAA', 'TAG', 'TGA']                       # Stop
);

# reverse lookup: codon => amino acid
my %cod2aa;
foreach my $aa (keys %aa2cod) {
    $cod2aa{$_} = $aa for @{ $aa2cod{$aa} };
}

# parentheses keep "\n" out of the joined list (no trailing comma)
print join(",", @{ $aa2cod{ $cod2aa{GCA} } }), "\n";

--Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of manni122 > Sent: Thursday, 12 March 2009 1:37 a.m. > To: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] How to access an array of codons > > > thanks for helping me. I understand! > Cheers, Markus. > > > > manni122 wrote: > > > > Hi there, if I want to access my hash of arrays %aa2cod I can print out > > the array key and the first value by > > > > for my $key (keys %aa2cod) { > > print "$key => $aa2cod{$key}[0]\n"; > > } > > > > How can I find a specific codon within this hash of arrays and extract the > > corresponding array. For instance I am looking for GCA and want to get the > > whole array [GCA GCC GCG GCT]. > > > > I appreciate any comment!! > > > > > > my %aa2cod = ( > > 'A' => ['GCA', 'GCC', 'GCG', 'GCT'], # Alanine > > 'C' => ['TGC', 'TGT'], # Cysteine > > 'D' => ['GAC', 'GAT'], # Aspartic Acid > > 'E' => ['GAA', 'GAG'], # Glutamic Acid > > 'F' => ['TTC', 'TTT'], # Phenylalanine > > 'G' => ['GGA', 'GGC', 'GGG', 'GGT'], # Glycine > > 'H' => ['CAC', 'CAT'], # Histidine > > 'I' => ['ATA', 'ATC', 'ATT'], # Isoleucine > > 'K' => ['AAA', 'AAG'], # Lysine > > 'L' => ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'], # Leucine > > 'M' => ['ATG'], # Methionine > > 'N' => ['AAC', 'AAT'], # Asparagine > > 'P' => ['CCA', 'CCC', 'CCG', 'CCT'], # Proline > > 'Q' => ['CAA', 'CAG'], # Glutamine > > 'R' => ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], # Arginine > > 'S' => ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], # Serine > > 'T' => ['ACA', 'ACC', 'ACG', 'ACT'], # Threonine > > 'V' => ['GTA', 'GTC', 'GTG', 'GTT'], # Valine > > 'W' => ['TGG'], # Tryptophan > > 'Y' => ['TAC', 'TAT'], # Tyrosine > > '_' => ['TAA', 'TAG', 'TGA'] # Stop > > ); > > > > > > -- > View this message in context: http://www.nabble.com/How-to-access-an-array-of- > codons-tp22451211p22454228.html > Sent from the Perl - 
Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From gopu_36 at yahoo.com Wed Mar 11 23:24:08 2009 From: gopu_36 at yahoo.com (gopu_36) Date: Wed, 11 Mar 2009 20:24:08 -0700 (PDT) Subject: [Bioperl-l] Gene name to Accession id In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321BCD8142D@exchsth.agresearch.co.nz> References: <22449585.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF321BCD8142D@exchsth.agresearch.co.nz> Message-ID: <22469209.post@talk.nabble.com> Thanks Russell. A great piece of information. Smithies, Russell wrote: > > The easiest way is to grab the gene2accession or gene2refseq lists from > here: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ > > > > --Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E? russell.smithies at agresearch.co.nz > > Invermay? Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T? +64 3 489 3809?? > F? +64 3 489 9174? > www.agresearch.co.nz > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of gopu_36 >> Sent: Wednesday, 11 March 2009 7:44 p.m. 
>> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Gene name to Accession id >> >> >> Hi >> I have a gene list as below: >> >> Ets2 >> Vegfc >> Capg >> Ly6c1 >> Pdlim2 >> Sema3f >> Tes >> Arsj >> Figf >> Osr1 >> Stc1 >> Ptgs1 >> 6330406I15Rik >> Fosl2 >> Ptgs1 >> >> How do I get the accesion id like NM_001025602 for the above gene list? >> Thanks. >> >> Regards >> -- >> View this message in context: >> http://www.nabble.com/Gene-name-to-Accession-id- >> tp22449585p22449585.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Gene-name-to-Accession-id-tp22449585p22469209.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From markus.liebscher at gmx.de Thu Mar 12 04:36:28 2009
From: markus.liebscher at gmx.de (manni122)
Date: Thu, 12 Mar 2009 01:36:28 -0700 (PDT)
Subject: [Bioperl-l] How to access an array of codons
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321BCD81462@exchsth.agresearch.co.nz>
References: <22451211.post@talk.nabble.com> <22454228.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF321BCD81462@exchsth.agresearch.co.nz>
Message-ID: <22471256.post@talk.nabble.com>

From the first entries I thought it wasn't possible ... But now!
Thank you for helping, Markus

Smithies, Russell wrote:
>
> Might be a bit hacky but how about a temp hash for the reverse lookup?
>
> my %aa2cod = (
> 'A' => ['GCA', 'GCC', 'GCG', 'GCT'], # Alanine
> 'C' => ['TGC', 'TGT'], # Cysteine
> 'D' => ['GAC', 'GAT'], # Aspartic Acid
> 'E' => ['GAA', 'GAG'], # Glutamic Acid
> 'F' => ['TTC', 'TTT'], # Phenylalanine
> 'G' => ['GGA', 'GGC', 'GGG', 'GGT'], # Glycine
> 'H' => ['CAC', 'CAT'], # Histidine
> 'I' => ['ATA', 'ATC', 'ATT'], # Isoleucine
> 'K' => ['AAA', 'AAG'], # Lysine
> 'L' => ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'], # Leucine
> 'M' => ['ATG'], # Methionine
> 'N' => ['AAC', 'AAT'], # Asparagine
> 'P' => ['CCA', 'CCC', 'CCG', 'CCT'], # Proline
> 'Q' => ['CAA', 'CAG'], # Glutamine
> 'R' => ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], # Arginine
> 'S' => ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], # Serine
> 'T' => ['ACA', 'ACC', 'ACG', 'ACT'], # Threonine
> 'V' => ['GTA', 'GTC', 'GTG', 'GTT'], # Valine
> 'W' => ['TGG'], # Tryptophan
> 'Y' => ['TAC', 'TAT'], # Tyrosine
> '_' => ['TAA', 'TAG', 'TGA'] # Stop
> );
>
>
> foreach $cod (keys %aa2cod) {
> map{$cod2aa{$_} = $cod }(@{$aa2cod{$cod}});
> }
>
> print join ",", @{$aa2cod{$cod2aa{GCA}}} ,"\n";
>
>
>
> --Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz > > > --Russell > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of manni122 >> Sent: Thursday, 12 March 2009 1:37 a.m. >> To: Bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] How to access an array of codons >> >> >> thanks for helping me. I understand! >> Cheers, Markus. >> >> >> >> manni122 wrote: >> > >> > Hi there, if I want to access my hash of arrays %aa2cod I can print out >> > the array key and the first value by >> > >> > for my $key (keys %aa2cod) { >> > print "$key => $aa2cod{$key}[0]\n"; >> > } >> > >> > How can I find a specific codon within this hash of arrays and extract >> the >> > corresponding array. For instance I am looking for GCA and want to get >> the >> > whole array [GCA GCC GCG GCT]. >> > >> > I appreciate any comment!! >> > >> > >> > my %aa2cod = ( >> > 'A' => ['GCA', 'GCC', 'GCG', 'GCT'], # Alanine >> > 'C' => ['TGC', 'TGT'], # Cysteine >> > 'D' => ['GAC', 'GAT'], # Aspartic Acid >> > 'E' => ['GAA', 'GAG'], # Glutamic Acid >> > 'F' => ['TTC', 'TTT'], # Phenylalanine >> > 'G' => ['GGA', 'GGC', 'GGG', 'GGT'], # Glycine >> > 'H' => ['CAC', 'CAT'], # Histidine >> > 'I' => ['ATA', 'ATC', 'ATT'], # Isoleucine >> > 'K' => ['AAA', 'AAG'], # Lysine >> > 'L' => ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'], # Leucine >> > 'M' => ['ATG'], # Methionine >> > 'N' => ['AAC', 'AAT'], # Asparagine >> > 'P' => ['CCA', 'CCC', 'CCG', 'CCT'], # Proline >> > 'Q' => ['CAA', 'CAG'], # Glutamine >> > 'R' => ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], # Arginine >> > 'S' => ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], # Serine >> > 'T' => ['ACA', 'ACC', 'ACG', 'ACT'], # Threonine >> > 'V' => ['GTA', 'GTC', 'GTG', 'GTT'], # Valine >> > 'W' => ['TGG'], # Tryptophan >> > 'Y' => ['TAC', 'TAT'], # Tyrosine >> > '_' => ['TAA', 'TAG', 'TGA'] # Stop >> > ); >> > >> > >> >> -- >> View this message in context: >> 
http://www.nabble.com/How-to-access-an-array-of-
>> codons-tp22451211p22454228.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

-- 
View this message in context: http://www.nabble.com/How-to-access-an-array-of-codons-tp22451211p22471256.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From gopu_36 at yahoo.com Thu Mar 12 09:59:50 2009
From: gopu_36 at yahoo.com (gopu_36)
Date: Thu, 12 Mar 2009 06:59:50 -0700 (PDT)
Subject: [Bioperl-l] script to calculate percentage identity for BLAT psl
Message-ID: <22476983.post@talk.nabble.com>

Hi,
I went through the BLAT FAQ on how to calculate the percentage
identity: http://genome.ucsc.edu/FAQ/FAQblat#blat4
As a newcomer, I don't understand how to implement this. Please let me know
how to plug the script into my output.psl file. It would be of great help.
Thanks and Regards.
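
For readers following along: the FAQ entry linked above gives a C routine (pslCalcMilliBad) for this calculation. What follows is an unofficial Perl transcription of that logic, offered as a sketch to check against the FAQ rather than a reference implementation. It assumes the standard 21-column tab-separated PSL layout; the function name psl_percent_identity and the sample alignment line are made up for illustration.

```perl
use strict;
use warnings;

# Percent identity for one PSL line, following the pslCalcMilliBad logic
# described in the UCSC BLAT FAQ (unofficial transcription).
#   $is_mrna    : true for mRNA/EST queries (target inserts are not penalised)
#   $is_protein : true for protein queries (query coordinates are scaled by 3)
sub psl_percent_identity {
    my ($line, $is_mrna, $is_protein) = @_;
    chomp $line;
    my @f = split /\t/, $line;

    # Assumed PSL column positions (0-based).
    my ($match, $mismatch, $rep_match) = @f[0, 1, 2];
    my ($q_num_insert, $t_num_insert)  = @f[4, 6];
    my ($q_start, $q_end)              = @f[11, 12];
    my ($t_start, $t_end)              = @f[15, 16];

    my $size_mul   = $is_protein ? 3 : 1;
    my $q_ali_size = $size_mul * ($q_end - $q_start);
    my $t_ali_size = $t_end - $t_start;
    my $ali_size   = $q_ali_size < $t_ali_size ? $q_ali_size : $t_ali_size;
    return 100 if $ali_size <= 0;    # the C code returns milliBad = 0 here

    my $size_dif = $q_ali_size - $t_ali_size;
    if ($size_dif < 0) {
        $size_dif = $is_mrna ? 0 : -$size_dif;
    }

    my $insert_factor = $q_num_insert;
    $insert_factor += $t_num_insert unless $is_mrna;

    my $total     = $size_mul * ($match + $rep_match + $mismatch);
    my $milli_bad = 0;
    if ($total != 0) {
        # int(x + 0.5) rounds to nearest; the outer int() mimics C integer division.
        my $gap_penalty = int(3 * log(1 + $size_dif) + 0.5);
        $milli_bad = int(1000 * ($mismatch * $size_mul + $insert_factor + $gap_penalty) / $total);
    }
    return 100.0 - $milli_bad * 0.1;
}

# Hypothetical alignment: 90 matches, 10 mismatches, no gaps.
my $demo = join "\t", 90, 10, 0, 0, 0, 0, 0, 0, '+',
                      'query1', 100, 0, 100, 'target1', 1000, 0, 100,
                      1, '100,', '0,', '0,';
printf "%.1f\n", psl_percent_identity($demo, 1, 0);    # prints 90.0
```

To apply this to a file such as output.psl, one would read it line by line, skip the header lines (for example with `next unless /^\d/`), and call the function on each remaining line.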
-- View this message in context: http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22476983.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bosborne11 at verizon.net Thu Mar 12 10:25:17 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 12 Mar 2009 10:25:17 -0400 Subject: [Bioperl-l] [Bioperl-guts-l] [Bug 2789] Script submission: reverse translate Profam motifs In-Reply-To: <200903121139.n2CBddH4018476@portal.open-bio.org> References: <200903121139.n2CBddH4018476@portal.open-bio.org> Message-ID: Bruno, Thank you for this script, I've added it to the scripts/utilities directory. It will be installed as part of Bioperl by default. Brian O. On Mar 12, 2009, at 7:39 AM, bugzilla-daemon at portal.open-bio.org wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2789 > > > > > > ------- Comment #1 from vecchi.b at gmail.com 2009-03-12 07:39 EST > ------- > Created an attachment (id=1261) > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1261&action=view) > The script > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi? > tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l From Russell.Smithies at agresearch.co.nz Thu Mar 12 18:42:08 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 13 Mar 2009 11:42:08 +1300 Subject: [Bioperl-l] reading gff3? In-Reply-To: References: <200903121139.n2CBddH4018476@portal.open-bio.org> Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BD22E45B@exchsth.agresearch.co.nz> What's the trick to reading the fasta attached to gff files? 
Bio::FeatureIO and Bio::Tools::GFF both seem to ignore it (unless I'm doing it wrong)

What I'm trying to do is read in a gff3 file (with attached fasta) then get the sequence for the CDS features contained within.

Any ideas?

Thanx,

--Russell


Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From Kevin.M.Brown at asu.edu Thu Mar 12 19:19:54 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 12 Mar 2009 16:19:54 -0700
Subject: [Bioperl-l] reading gff3?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321BD22E45B@exchsth.agresearch.co.nz>
References: <200903121139.n2CBddH4018476@portal.open-bio.org> <18DF7D20DFEC044098A1062202F5FFF321BD22E45B@exchsth.agresearch.co.nz>
Message-ID: <1A4207F8295607498283FE9E93B775B405D61E6D@EX02.asurite.ad.asu.edu>

http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3AFeatureIO%3A%3Agff&sort_order=by+method&search_string=Bio%3A%3AFeatureIO%3A%3Agff

Method: fasta_mode

And comment in the next_seq() method:

"access the FASTA section (if any) at the end of the GFF stream. note that this method
will return undef if not all features in the stream have been handled"

From a quick read through the code, it seems that once you've gotten all the features, you should be able to call next_seq() to get the fasta information.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Smithies, Russell
> Sent: Thursday, March 12, 2009 3:42 PM
> To: 'BioPerl List'
> Subject: [Bioperl-l] reading gff3?
>
> What's the trick to reading the fasta attached to gff files?
> Bio::FeatureIO and Bio::Tools::GFF both seem to ignore it
> (unless I'm doing it wrong)
>
> What I'm trying to do is read in a gff3 file (with attached
> fasta) then get the sequence for the CDS features contained within.
>
> Any ideas?
>
> Thanx,
>
> --Russell
>
>
> Russell Smithies
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
> Toitu te whenua, Toitu te tangata
> Sustain the land, Sustain the people
>
>
> ==============================================================
> =========
> Attention: The information contained in this message and/or
> attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or
> privileged
> material. Any review, retransmission, dissemination or other
> use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by
> AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> ============================================================== > ========= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cain.cshl at gmail.com Thu Mar 12 19:23:45 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 12 Mar 2009 19:23:45 -0400 Subject: [Bioperl-l] reading gff3? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321BD22E45B@exchsth.agresearch.co.nz> References: <200903121139.n2CBddH4018476@portal.open-bio.org> <18DF7D20DFEC044098A1062202F5FFF321BD22E45B@exchsth.agresearch.co.nz> Message-ID: <812022F8-CCB7-444B-A97F-5E31E09C4F7A@gmail.com> Hi Russell, I'm away from my computer at the moment, but I'm pretty sure you can call next_seq on a Bio::FeatureIO object. Scott -- Scott Cain, Ph. D. scott at scottcain dot net Ontario Institute for Cancer Research http://gmod.org/ 216 392 3087 On Mar 12, 2009, at 6:42 PM, "Smithies, Russell" wrote: > What's the trick to reading the fasta attached to gff files? > Bio:FeatureIO and Bio::Tools::GFF both seem to ignore it (unless I'm > doing it wrong) > > What I'm trying to do is read in a gff3 file (with attached fasta) > then get the sequence for the CDS features contained within. > > Any ideas? > > Thanx, > > --Russell > > > Russell Smithies > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > Toitu te whenua, Toitu te tangata > Sustain the land, Sustain the people > > > === > ==================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. 
Any review, retransmission, dissemination or other use of,
> or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by
> AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> ===
> ====================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Thu Mar 12 20:44:14 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 13 Mar 2009 13:44:14 +1300
Subject: [Bioperl-l] reading gff3?
In-Reply-To: <1A4207F8295607498283FE9E93B775B405D61E6D@EX02.asurite.ad.asu.edu>
References: <200903121139.n2CBddH4018476@portal.open-bio.org> <18DF7D20DFEC044098A1062202F5FFF321BD22E45B@exchsth.agresearch.co.nz> <1A4207F8295607498283FE9E93B775B405D61E6D@EX02.asurite.ad.asu.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BD22E500@exchsth.agresearch.co.nz>

Thanx guys, I was fairly sure it _should_ work :-)
The trick is to call next_seq() AFTER you read all the features.
Also, the gff output from GBrowse doesn't work with Bio::FeatureIO as it has a few extra pragmas and is missing the ##FASTA line.

Here's what I ended up with:

-------------------------
#!perl -w

use Bio::FeatureIO;
use Bio::SeqIO;
use Bio::PrimarySeq;

my $gff_in  = Bio::FeatureIO->new(-file => "test.gff" , -format => "GFF");
my $seq_out = Bio::SeqIO->new(-fh => \*STDOUT, -format => "fasta");

# collect the CDS features first
my @cds;
while ( my $feat = $gff_in->next_feature() ) {
    push (@cds, $feat) if $feat->primary_tag =~ /CDS/;
}

## MUST be after you've read the features!!
my $seq = $gff_in->next_seq();

# write the sub-sequence of each CDS as fasta
foreach my $c (@cds){
    my $seqobj = Bio::PrimarySeq->new (
        -seq => $seq->subseq($c->location),
        -id  => join("_",$c->primary_tag,$c->start, $c->end),
    );
    $seq_out->write_seq($seqobj);
}
-----------------------------------

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Kevin Brown
> Sent: Friday, 13 March 2009 12:20 p.m.
> To: BioPerl List
> Subject: Re: [Bioperl-l] reading gff3?
>
> http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3AFeatureIO%3A%3Agff&sort_order=by+method&search_string=Bio%3A%3AFeatureIO%3A%3Agff
>
> Method: fasta_mode
>
> And comment in the next_seq() method:
>
> "access the FASTA section (if any) at the end of the GFF stream. note that
> this method
> will return undef if not all features in the stream have been handled"
>
> From a quick read through the code, it seems that once you've gotten all the
> features, you should be able to call next_seq() to get the fasta information.
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > Smithies, Russell
> > Sent: Thursday, March 12, 2009 3:42 PM
> > To: 'BioPerl List'
> > Subject: [Bioperl-l] reading gff3?
> >
> > What's the trick to reading the fasta attached to gff files?
> > Bio::FeatureIO and Bio::Tools::GFF both seem to ignore it
> > (unless I'm doing it wrong)
> >
> > What I'm trying to do is read in a gff3 file (with attached
> > fasta) then get the sequence for the CDS features contained within.
> >
> > Any ideas?
> >
> > Thanx,
> >
> > --Russell
> >
> >
> > Russell Smithies
> > Bioinformatics Applications Developer
> > T +64 3 489 9085
> > E russell.smithies at agresearch.co.nz
> > Invermay Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T +64 3 489 3809
> > F
+64 3 489 9174 > > www.agresearch.co.nz > > > > Toitu te whenua, Toitu te tangata > > Sustain the land, Sustain the people > > > > > > ============================================================== > > ========= > > Attention: The information contained in this message and/or > > attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or > > privileged > > material. Any review, retransmission, dissemination or other > > use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by > > AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ============================================================== > > ========= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Thu Mar 12 22:07:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 13 Mar 2009 15:07:30 +1300 Subject: [Bioperl-l] minor bug in Bio::FeatureIO::gff Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BD22E569@exchsth.agresearch.co.nz> I think there's a bug in Bio::FeatureIO::GFF when it's reading fasta from a gff file. If there's no ##FASTA directive in the gff file, it ignores the fasta header and takes the first line of sequence as the primary_id and display_id Eg: Here's some gff: super_1:34972746,34974962 BlastN barley_ta_match 1558 1764 . + . Parent=barley_transgrp_blast:TC135274;Note=%22%22 super_1:34972746,34974962 BlastN barley_ta_match 1911 2262 . + . 
Parent=barley_transgrp_blast:TC135274;Note=%22%22
>super_1:34972746,34974962
ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC
GTTGCCGCCGCGGCGGACGCCTCCGCGGGCGACGCCGATCCGGTCTACAGGTCAGTGGTT

This is what I get from DataDumper:

$VAR1 = bless( {
    'primary_id' => 'ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC',
    'primary_seq' => bless( {
        'display_id' => 'ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC
',
        'primary_id' => 'ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC
',
        'desc' => '',
        'seq' => 'GTTGCCGCCGCGGCGGACGCCTCCGCGGGCGACGCCGATCCGGTCTACAGGTCAGTGGTT',
        'alphabet' => 'dna'
    }, 'Bio::PrimarySeq' )
}, 'Bio::Seq' );

If I put the ##FASTA directive back in the gff file, I get this (which is correct) from DataDumper:

$VAR1 = bless( {
    'primary_id' => 'super_1:34972746,34974962',
    'primary_seq' => bless( {
        'display_id' => 'super_1:34972746,34974962',
        'primary_id' => 'super_1:34972746,34974962',
        'desc' => '',
        'seq' => 'ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACCGTTGCCGCCGCGGCGGACGCCTCCGCGGGCGACGCCGATCCGGTCTACAGGTCAGTGGTT',
        'alphabet' => 'dna'
    }, 'Bio::PrimarySeq' )
}, 'Bio::Seq' );

It also breaks other stuff as now the $seq->end coord is longer than the sequence length.
Also, I think _handle_feature should warn rather than stack dump when it gets an unknown directive type, if only to stop it dying when reading gff dumped from GBrowse.

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited.
If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Fri Mar 13 00:01:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Mar 2009 23:01:10 -0500
Subject: [Bioperl-l] minor bug in Bio::FeatureIO::gff
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321BD22E569@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF321BD22E569@exchsth.agresearch.co.nz>
Message-ID: <94F7F333-CB92-45B2-B57B-6ED6CD216B0F@illinois.edu>

Russell,

I would file that in Bugzilla. We need to take that into consideration when refactoring Bio::FeatureIO.

chris

From cjfields at illinois.edu Fri Mar 13 00:11:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Mar 2009 23:11:10 -0500
Subject: [Bioperl-l] Bio::Location::{Simple,Fuzzy} and "IN-BETWEEN"
In-Reply-To: <18871.1653.500578.292183@almost.alerce.com>
References: <18871.1653.500578.292183@almost.alerce.com>
Message-ID: <3C511B14-4B6D-4BBF-9562-F1378075D10E@illinois.edu>

(responding to get the discussion going, maybe one of the LocationI designers can respond...)

On Mar 10, 2009, at 7:31 PM, George Hartzell wrote:

> I just tripped over the $self->throw() in B::L::Fuzzy->start where it
> won't let me use a Fuzzy if the start and end are adjacent.
>
> I think that this is going to be one of those More Than One Way To Do
> It kind of things, but I don't understand the restriction.
>
> I have some data for insertions that get a location between two
> adjacent bases, e.g. s^e, an IN-BETWEEN that starts at s and ends at
> e.
>
> Then I map that location via an alignment to a second sequence and it
> maps into a gap on the second sequence. In this case the sequences
> are no longer adjacent, e.g. the left edge of the gap is l and the
> right edge is r. At this point I know less than I did, and am trying
> to represent it as >l^ after l and ends somewhere before r. This seems to work.

Something like the following?
foo/1-17       aataaataaaagggcca
bar/1-14       aataaa---aagggcaa
                      ^
So that would be 8^9 for 'foo' (excuse the arrow if you don't have fixed-width text, it's pointing at pos 8 on 'foo'). I would argue that there is no similar feature there at all for 'bar'. The relevant sequence is missing for the insertion, so no feature can be reliably assigned. That would be interesting by itself (at least to me).

If one had to mark it I would guess 'bar' is 6^7, not >6^7< as the ends are both known and present (no '<' or '>'), but the sequence maps between the two coordinates.

> If I have something on the second sequence in the gap region and am
> mapping it back to the first then it's going to end up with adjacent
> start and end.

I don't think the converse works for the reasons stated above; the position of interest lies in a gap, so it's lossy. If I understand this correctly, we wouldn't know exactly which gap position the insertion would be in; the feature would map back as somewhere within 7-9 (or 7.9). BTW, I believe that latter WITHIN designation is deprecated in the Feature Table definition.

At least at this point, the only way I can think of to reliably translate the position back is if the start/end is referring to the alignment column position, not the sequence. SimpleAlign does allow features but I'm not sure if they point to the alignment position or to individual sequences within the alignment.

> It seems like it'd be useful for me to just use Bio::Location::Fuzzy's
> everywhere and use exact info when I have it. Unfortunately several
> methods in Bio::Location::Fuzzy check for the first case and throw an
> exception.
>
> I'm hoping that a history lesson or other insight might help me
> understand why those checks are there. There don't seem to be any
> other checks that prevent one from specifying something exact in a
> Fuzzy and there doesn't seem to be any restriction about specifying an
> IN-BETWEEN Fuzzy....
>
> g.

Anyone else have thoughts?
I had some issues with Location a while back, dealing with (I think) how split locations deal with strandedness, but I've slept since then...

chris

From gopu_36 at yahoo.com Fri Mar 13 06:55:14 2009
From: gopu_36 at yahoo.com (gopu_36)
Date: Fri, 13 Mar 2009 03:55:14 -0700 (PDT)
Subject: [Bioperl-l] script to calculate percentage identity for BLAT psl
In-Reply-To: <22476983.post@talk.nabble.com>
References: <22476983.post@talk.nabble.com>
Message-ID: <22494163.post@talk.nabble.com>

Hi

As suggested on that page, I tried the script below, but I always get 100% identity. Please let me know what is wrong with my code. Since I don't use any options and am mapping DNA sequences, I made the following changes to the code:

1
my $sizeMul = 1;
and commented:
#if ($option{p}) {
#    $sizeMul = 3;
#} else {
#    $sizeMul = 1;
#}

2
if ($sizeDiff < 0) {
    $sizeDiff = 0;
    #if ($option{m}) {
    #$sizeDiff = 0;
    #} else {
    #$sizeDiff = -($sizeDiff);
    #}
}
===================================================
And the Perl script is as below:

Thanks.

#!/usr/bin/perl
while(<>) {
    chomp $_;
    my @v = split(/\t/,$_);
    get_pid($_);
}

sub get_pid {
    my @line = @_;
    my $pid = (100.0 - (&pslCalcMilliBad(@line) * 0.1));
    print "The percentage: $pid\n";
    #return $pid;
}

sub pslCalcMilliBad {
    my @cols = @_;

    # sizeMul depends on dna/prot
    my $sizeMul = 1;
    #if ($option{p}) {
    #    $sizeMul = 3;
    #} else {
    #    $sizeMul = 1;
    #}

    # cols[0] matches
    # cols[1] misMatches
    # cols[2] repMatches
    # cols[4] qNumInsert
    # cols[6] tNumInsert
    # cols[11] qStart
    # cols[12] qEnd
    # cols[15] tStart
    # cols[16] tEnd

    my $qAliSize = $sizeMul * ($cols[12] - $cols[11]);
    my $tAliSize = $cols[16] - $cols[15];

    # I want the minimum of qAliSize and tAliSize
    my $aliSize;
    $qAliSize < $tAliSize ? $aliSize = $qAliSize : $aliSize = $tAliSize;

    # return 0 if aliSize == 0
    return 0 if ($aliSize <= 0);

    # size diff
    my $sizeDiff = $qAliSize - $tAliSize;
    if ($sizeDiff < 0) {
        $sizeDiff = 0;
        #if ($option{m}) {
        #$sizeDiff = 0;
        #} else {
        #$sizeDiff = -($sizeDiff);
        #}
    }

    # insert Factor
    my $insertFactor = $cols[4];
    $insertFactor += $cols[6] unless ($option{m});

    my $milliBad = (1000 * ($cols[1]*$sizeMul + $insertFactor + &round(3*log( 1 + $sizeDiff)))) / ($sizeMul * ($cols[0] + $cols[2] + $cols[1]));
    return $milliBad;
}

sub round {
    my $number = shift;
    return int($number + .5);
}

gopu_36 wrote:
>
> Hi,
>
> I did go through the BLAT FAQ on how to calculate the percentage identity
> at http://genome.ucsc.edu/FAQ/FAQblat#blat4
> As a newcomer, I don't understand how to implement this. Please let
> me know how to plug in the script for my output.psl file.
> It would be of great help.
>
> Thanks and Regards.
>

--
View this message in context: http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22494163.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From sufei at sjtu.edu.cn Fri Mar 13 06:55:25 2009
From: sufei at sjtu.edu.cn (=?gb2312?B?y9Xssw==?=)
Date: Fri, 13 Mar 2009 18:55:25 +0800
Subject: [Bioperl-l] problem about format convertion
Message-ID: <200903131855242656851@sjtu.edu.cn>

hi,
I am new to bioperl. Recently, I finished a Solexa assembly. I converted it to .ace with velvet's amos2ace. I want to convert it to a .phd file. But when I open it with Consed, it tells me that "BS line refers to a read that isn't in a AF line for this contig". So I wrote a script like this:

#!/bin/perl -w
use Bio::SeqIO;
open TEMP, ">xp.phd" or die "can not create the file xp.phd";
close TEMP;
$in = Bio::SeqIO->new ( -file => "xp_4.ace", -format => 'ace');
$out = Bio::SeqIO->new (-file => ">xp.phd", -format => 'phd');
while( $seq = $in->next_seq())
{
print $seq->
$out->write_seq($seq);
}

But nothing happens.
What should I do?

2009-03-13

From heikki.lehvaslaiho at gmail.com Fri Mar 13 09:24:16 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Fri, 13 Mar 2009 15:24:16 +0200
Subject: [Bioperl-l] Bio::Location::{Simple,Fuzzy} and "IN-BETWEEN"
In-Reply-To: <3C511B14-4B6D-4BBF-9562-F1378075D10E@illinois.edu>
References: <18871.1653.500578.292183@almost.alerce.com> <3C511B14-4B6D-4BBF-9562-F1378075D10E@illinois.edu>
Message-ID:

George,

Chris is right. You are not supposed to use fuzzy ever! It was introduced only because in the olden times sequencing was difficult and you knew that your sequence feature started before your actual sequence. The early EMBL/GenBank design decision was to mark that with something like "CDS <1..2344" when you knew that your sequence did not start from the start of the coding region.

You always annotate something in relation to the reference sequence. If there is something, like an insertion in Chris' example, you use IN-BETWEEN notation where the start and end have to be adjacent residues. There is nothing fuzzy in that location, so do not try to add it.

Yours,

-Heikki

--
-Heikki
Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
Sent from: London Greater London United Kingdom.

From heikki.lehvaslaiho at gmail.com Fri Mar 13 09:53:55 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Fri, 13 Mar 2009 15:53:55 +0200
Subject: [Bioperl-l] Degenerate primer calculation
In-Reply-To: <1a0c1b750903102016o78950150ge1d42721dd12db9a@mail.gmail.com>
References: <1a0c1b750903102016o78950150ge1d42721dd12db9a@mail.gmail.com>
Message-ID:

Bruno,

Adding your code to Bio::Tools::CodonTable might make it slower. We do not want that. Test that first. Even better, why not put it in as a separate module: Bio::Tools::MotifReverseTranslate?

Yours,

-Heikki

2009/3/11 Bruno Vecchi :
> Hi all,
>
> I wrote a script that reverse translates a Profam-like protein motif
> into its correspondent degenerate nucleotide sequence, using Chris' and
> Brian's suggestions on this thread. You can download it from here:
> http://github.com/brunoV/revtrans-motif/tree/master
>
> Even though right now it's built as a script, I wrote it thinking of
> adding it as a method; maybe to Bio::Tools::CodonTable.
> What do you think? The script is a thin wrapper over a module
> temporarily named "Revtrans", it already has tests.
>
> Cheers,
>
> Bruno.
>
>
>> From: Chris Fields [mailto:cjfields at illinois.edu]
>> Sent: 08 December 2008 16:41
>> To: Samantha Thompson
>> Cc: bioperl-l List
>> Subject: Re: [Bioperl-l] Degenerate primer calculation
>>
>>
>> On Dec 8, 2008, at 9:59 AM, Samantha Thompson wrote:
>>
>>> Hi,
>>>
>>> I also have another similar sequence analysis/primer problem.
>>>
>>> What I'd like to do is produce degenerate primers from amino acid
>>> sequences.
>>>
>>> What I did initially was take the codon usage table and rewrite it in a
>>> hash in perl in the form of degenerate codon usage e.g Lysine/K would be
>>> AAR, its reverse complement would be YTT. So my form then takes an amino
>>> acid sequence (derived as a consensus from multiple the alignment of
>>> homologous proteins) and converts them into degenerate codons and then
>>> that degenerate primer (actually several primers synthesised with
>>> different bases pooled together), in order to search for homologues to
>>> the protein in unsequenced organisms.
>>>
>>> I would like to improve this by being able to take a consensus described
>>> more in the form of a Prosite motif (I think thats the right one) such
>>> as [TS]YW[RKSD] and then develop a degenerate nucleotide sequence
>>> corresponding to this.
>>>
>>> So I'm wondering if bioperl contains anything like this (both prosite
>>> motif format parsing and degenerate code from multiple alignments or
>>> such a motif), or if I need to write this myself (which I want to if it
>>> doesn't exist already).
>>>
>>> Thanks again,
>>>
>>> Sam
>>
>> Bio::Tools::CodonTable reverse translates, but I don't think it
>> accepts patterns. Maybe a pipeline including Bio::Tools::SeqPattern?
>> Might be an interesting programming challenge if it isn't already set
>> up for that.
>>
>> Chris
>> ...........
>> Hi,
>>
>> I'm trying to have a go at solving this problem and I'm looking at
>> Bio::Tools::SeqPattern. What I would like to be able to obtain from a
>> motif is a list of all the sequences that that sequence could correspond
>> to. E.g IKL[GP]NM could be IKLGNM or IKLPNM ... so I take both of these
>> sequences and turn them into degenerate codons for each amino acid. The
>> complicated part (I thought) here is creating a degenerate codon that
>> corresponds to either G or P. The way I will do this is by producing
>> each of the 3 degenerate bases and creating a new codon by creating each
>> of the 3 degenerate bases separately based on a 2D matrix which contains
>> the result of 'crossing' each of the nucleotide bases of the degenerate
>> code with each other. So when you cross the codon for G (GGN) with the
>> codon for P (CCN) you get a codon that contains the degeneracy of both
>> (SSN). So then you have a degenerate nucleotide sequence for your
>> peptide motif.
>> I have written this part already but I am wondering about the expand
>> function of Bio::Tools::SeqPattern . I'm not quite sure what it means by
>> the expanded sequence (if there is just one?) that it returns. I'm
>> trying to get every possible permutation of the motif is there any
>> function that does this or will I have to write one to parse it myself?
>> .....
>> This would be great, but what would make things even better would be if
>> I could take multiple sequence alignments and produce patterns/motifs
>> from them. Is there a part of BioPerl that does something like this?
>>
>> Thanks,
>>
>> Sam
>>

--
-Heikki
Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
Sent from: London Greater London United Kingdom.
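
A minimal sketch of the codon-crossing idea quoted above (crossing GGN for Gly with CCN for Pro to get SSN), in plain Perl with no BioPerl dependency. Everything in it is an assumption made for illustration: the sub names (cross_bases, cross_codons, motif_to_degenerate) and the small %DEGENERATE_CODON table are hypothetical and are not part of Bio::Tools::CodonTable or Bio::Tools::SeqPattern. The table assumes the standard genetic code and covers only a few residues; Leu is written as YTN, which also matches the Phe codons TTY. Only single residues and [..] alternatives are parsed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the (alphabetically sorted) bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Reverse lookup: sorted base set -> IUPAC code.
my %CODE_FOR = map { $IUPAC{$_} => $_ } keys %IUPAC;

# Hypothetical hand-written table: one degenerate codon per amino acid,
# standard genetic code assumed, only the residues used below.
# L => 'YTN' over-covers (it also matches Phe, TTY).
my %DEGENERATE_CODON = (
    I => 'ATH', K => 'AAR', L => 'YTN',
    G => 'GGN', P => 'CCN', N => 'AAY', M => 'ATG',
);

# Union of the base sets of two IUPAC codes, returned as one IUPAC code.
sub cross_bases {
    my ($x, $y) = @_;
    my %seen = map { $_ => 1 } split //, $IUPAC{$x} . $IUPAC{$y};
    return $CODE_FOR{ join '', sort keys %seen };
}

# Cross any number of 3-letter degenerate codons position by position.
sub cross_codons {
    my @codons = @_;
    my $merged = shift @codons;
    for my $codon (@codons) {
        $merged = join '',
            map { cross_bases(substr($merged, $_, 1), substr($codon, $_, 1)) } 0 .. 2;
    }
    return $merged;
}

# Turn a motif such as IKL[GP]NM into a degenerate nucleotide string.
sub motif_to_degenerate {
    my ($motif) = @_;
    my $dna = '';
    while ($motif =~ /\G(?:\[([A-Z]+)\]|([A-Z]))/gc) {
        my @residues = split //, defined $1 ? $1 : $2;
        $dna .= cross_codons(
            map { $DEGENERATE_CODON{$_} // die "no codon for $_\n" } @residues
        );
    }
    die "could not parse motif '$motif'\n"
        if (pos($motif) // 0) != length $motif;
    return $dna;
}

print motif_to_degenerate('IKL[GP]NM'), "\n";   # prints ATHAARYTNSSNAAYATG
```

Crossing is done per codon position by taking the union of the base sets, so an alternative like [GP] costs one degenerate codon instead of one primer per residue. The price is that the merged codon can encode residues outside the set: SSN also covers Ala (GCN) and Arg (CGN).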
From lichtenj at ohio.edu Fri Mar 13 10:54:42 2009
From: lichtenj at ohio.edu (Jens Lichtenberg)
Date: Fri, 13 Mar 2009 10:54:42 -0400
Subject: [Bioperl-l] Extraction of exon, intron, promoter, UTR
Message-ID: <004e01c9a3eb$a45d74c0$ed185e40$@edu>

Hello everyone,

I am trying to find a way to automatically extract feature information (exons, introns, promoter, utrs, etc.) for given gene ids. Is there a way that I can do that with bioperl? I found structures to hold the information but no extraction features.

Jens

From David.Messina at sbc.su.se Fri Mar 13 13:47:53 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 13 Mar 2009 17:47:53 +0000
Subject: [Bioperl-l] Extraction of exon, intron, promoter, UTR
In-Reply-To: <004e01c9a3eb$a45d74c0$ed185e40$@edu>
References: <004e01c9a3eb$a45d74c0$ed185e40$@edu>
Message-ID: <628aabb70903131047s1c64b6fcg15b0e338970072c0@mail.gmail.com>

Hey Jens,

From what type of data are you trying to extract feature information?

BioPerl can extract features from many different types of data sources, such as Genbank, UniProt, Ensembl, NCBI. See the feature HOWTO for some examples.

Dave

From David.Messina at sbc.su.se Fri Mar 13 13:50:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 13 Mar 2009 17:50:44 +0000
Subject: [Bioperl-l] problem about format convertion
In-Reply-To: <200903131855242656851@sjtu.edu.cn>
References: <200903131855242656851@sjtu.edu.cn>
Message-ID: <628aabb70903131050g1369d2c8wdb1f881af2549fcc@mail.gmail.com>

Hi,

Try:

    #!/bin/perl -w
    use Bio::SeqIO;

    $in = Bio::SeqIO->new ( -file => "xp_4.ace", -format => 'ace');
    $out = Bio::SeqIO->new (-file => ">xp.phd", -format => 'phd');
    while( $seq = $in->next_seq())
    {
        $out->write_seq($seq);
    }

This is covered in the SeqIO HOWTO <http://www.bioperl.org/wiki/HOWTO:SeqIO>.

Dave

From lichtenj at ohio.edu Fri Mar 13 14:40:17 2009
From: lichtenj at ohio.edu (Jens Lichtenberg)
Date: Fri, 13 Mar 2009 14:40:17 -0400
Subject: [Bioperl-l] Extraction of exon, intron, promoter, UTR
In-Reply-To: <628aabb70903131047s1c64b6fcg15b0e338970072c0@mail.gmail.com>
References: <004e01c9a3eb$a45d74c0$ed185e40$@edu> <628aabb70903131047s1c64b6fcg15b0e338970072c0@mail.gmail.com>
Message-ID: <007301c9a40b$285feae0$791fc0a0$@edu>

Dave,

I am trying to extract in particular the exon/intron information stored in the Gene Table view of the Genbank entry at NCBI. It seems that this information does not get integrated into the information extracted for the SeqIO object retrieved from Genbank. I would like to use the exons and introns from the feature object of the SeqIO, but often the annotation in the Genbank entry is not as complete as the one in the Gene Table.

The Ensembl API provides good features for this, but Ensembl is limited in the organisms that are currently supported.

Jens

From David.Messina at sbc.su.se Fri Mar 13 15:02:12 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 13 Mar 2009 19:02:12 +0000
Subject: [Bioperl-l] Extraction of exon, intron, promoter, UTR
In-Reply-To: <007301c9a40b$285feae0$791fc0a0$@edu>
References: <004e01c9a3eb$a45d74c0$ed185e40$@edu> <628aabb70903131047s1c64b6fcg15b0e338970072c0@mail.gmail.com> <007301c9a40b$285feae0$791fc0a0$@edu>
Message-ID: <628aabb70903131202u1a880694m56b811678e95c38@mail.gmail.com>

Hmm, I bet the Gene Table info can be retrieved via EUtilities, for which there is a BioPerl module. Can Chris or someone more familiar with EUtilities confirm that?

Dave

From lzlgboy at gmail.com Fri Mar 13 23:05:21 2009
From: lzlgboy at gmail.com (kenzy ken)
Date: Sat, 14 Mar 2009 11:05:21 +0800
Subject: [Bioperl-l] Sliding window alignment How?
Message-ID:

Hi,

How can I calculate the alignment identity in each defined sliding window, i.e. do a sliding-window alignment?

Email:lzlgboy at gmail.com ; chenkn at mail2.sysu.edu.cn

From hlapp at gmx.net Sat Mar 14 00:42:10 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 14 Mar 2009 00:42:10 -0400
Subject: [Bioperl-l] Extraction of exon, intron, promoter, UTR
In-Reply-To: <007301c9a40b$285feae0$791fc0a0$@edu>
References: <004e01c9a3eb$a45d74c0$ed185e40$@edu> <628aabb70903131047s1c64b6fcg15b0e338970072c0@mail.gmail.com> <007301c9a40b$285feae0$791fc0a0$@edu>
Message-ID:

On Mar 13, 2009, at 2:40 PM, Jens Lichtenberg wrote:

> I am trying to extract in particular the exon/intron information
> stored in the Gene Table view of the Genbank entry at NCBI.

Is there a reason you can't obtain the record from NCBI in genbank format? BioPerl lets you do this based on an accession# as input.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From sanjay.harke at gmail.com Sat Mar 14 01:22:50 2009
From: sanjay.harke at gmail.com (Sanjay Harke)
Date: Sat, 14 Mar 2009 10:52:50 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 6
In-Reply-To:
References:
Message-ID: <31bb4380903132222n5effd96coec1847d3f272bfd8@mail.gmail.com>

Dear sir,

I need to know how to use BioPerl with a CGI script.

sanjay

From David.Messina at sbc.su.se Sat Mar 14 05:46:50 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 14 Mar 2009 09:46:50 +0000
Subject: [Bioperl-l] problem about format convertion
In-Reply-To: <200903140952361715591@sjtu.edu.cn>
References: <200903131855242656851@sjtu.edu.cn> <628aabb70903131050g1369d2c8wdb1f881af2549fcc@mail.gmail.com> <200903140952361715591@sjtu.edu.cn>
Message-ID: <628aabb70903140246o146416baw7836358504757499@mail.gmail.com>

Hi again,

Please keep replies on the mailing list so everyone can follow and contribute to the discussion.

It's impossible to tell what the problem is with the information you've given us.

It could be:
1) The .ace file is empty
2) The .ace file is not named "xp_4.ace"
3) The .ace file is not in ace format
4) Bioperl isn't properly installed
etcetera

I don't have BioPerl installed on my computer at the moment; otherwise you could send me your input file and I could test it here.

But really the solution is for you to check your assumptions. Use a debugger.

Find a known valid ace file (there's one in the t/data directory that comes with BioPerl) -- what happens if you run that through your code?

Think about all of the possible reasons why this may not be working, and test them.

Dave

On Sat, Mar 14, 2009 at 01:52, wrote:
>
> hi,
> nothing happened as before. The size of xp.phd is zero.
>
> 2009-03-14

From David.Messina at sbc.su.se Sat Mar 14 08:52:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 14 Mar 2009 12:52:31 +0000
Subject: [Bioperl-l] problem about format convertion
In-Reply-To: <20090314202543.79trcn9jdcc48ggc@webmail1.sjtu.edu.cn>
References: <200903131855242656851@sjtu.edu.cn> <628aabb70903131050g1369d2c8wdb1f881af2549fcc@mail.gmail.com> <200903140952361715591@sjtu.edu.cn> <628aabb70903140246o146416baw7836358504757499@mail.gmail.com> <20090314202543.79trcn9jdcc48ggc@webmail1.sjtu.edu.cn>
Message-ID: <628aabb70903140552q6bf5750ehc768d1511d0a4210@mail.gmail.com>

Hi,

I strongly urge you to use another ace file to test whether the problem is with the code or with your ace file.

I don't know of another ace to phd converter; perhaps google or someone else on the list can suggest one.

Dave

On Sat, Mar 14, 2009 at 12:25, wrote:

> hi,
> My ace file is too big to send to you. My data comes from Solexa, so the ace
> file is more than 1.5G. Could you tell me another way or other tools
> to convert an ace file to a phd file?

From cjfields at illinois.edu Sat Mar 14 10:08:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 14 Mar 2009 09:08:47 -0500
Subject: [Bioperl-l] problem about format convertion
In-Reply-To: <628aabb70903140552q6bf5750ehc768d1511d0a4210@mail.gmail.com>
References: <200903131855242656851@sjtu.edu.cn> <628aabb70903131050g1369d2c8wdb1f881af2549fcc@mail.gmail.com> <200903140952361715591@sjtu.edu.cn> <628aabb70903140246o146416baw7836358504757499@mail.gmail.com> <20090314202543.79trcn9jdcc48ggc@webmail1.sjtu.edu.cn> <628aabb70903140552q6bf5750ehc768d1511d0a4210@mail.gmail.com>
Message-ID:

Just to note, the Bio::Assembly modules aren't really geared towards using Solexa data as far as I know (and there is the problem of the ulimit bug that still isn't fixed). There is interest in getting them up-to-speed but it needs someone with the time to do it.

chris
>>> >>> It could be: >>> 1) The .ace file is empty >>> 2) The .ace file is not named "xp_4.ace" >>> 3) The .ace file is not in ace format >>> 4) Bioperl isn't properly installed >>> etcetera >>> >>> I don't have BioPerl installed on my computer at the moment; >>> otherwise you >>> could send me your input file and I could test it here. >>> >>> But really the solution is for you to check your assumptions. Use a >>> debugger. >>> >>> Find a known valid ace file (there's one in the t/data directory >>> that >>> comes >>> with BioPerl) -- what happens if you run that through your code? >>> >>> Think about all of the possible reasons why this may not be >>> working, and >>> test them. >>> >>> >>> Dave >>> >>> >>> >>> >>> >>> On Sat, Mar 14, 2009 at 01:52, ?? wrote: >>> >>> >>>> hi, >>>> nothing happened as before. The size of xp.phd is zero. >>>> >>>> 2009-03-14 >>>> ------------------------------ >>>> ------------------------------ >>>> *????* Dave Messina >>>> *?????* 2009-03-14 01:50:55 >>>> *????* ?? >>>> *???* bioperl-l >>>> *???* Re: [Bioperl-l] problem about format convertion >>>> Hi, >>>> Try: >>>> >>>> #!/bin/perl -w >>>> >>>>> use Bio::SeqIO; >>>>> >>>>> >>>> >>>> $in = Bio::SeqIO->new ( -file => "xp_4.ace", -format => 'ace'); >>>>> $out = Bio::SeqIO->new (-file => ">xp.phd", -format => 'phd'); >>>>> while( $seq = $in->next_seq()) >>>>> { >>>>> $out->write_seq($seq); >>>>> } >>>>> >>>>> >>>> >>>> This is covered in the SeqIO HOWTO< >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO> >>>> . 
>>>> >>>> >>>> Dave >>>> >>>> >>> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Mar 14 10:09:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 14 Mar 2009 09:09:54 -0500 Subject: [Bioperl-l] Extraction of exon, intron, promoter, UTR In-Reply-To: <628aabb70903131202u1a880694m56b811678e95c38@mail.gmail.com> References: <004e01c9a3eb$a45d74c0$ed185e40$@edu> <628aabb70903131047s1c64b6fcg15b0e338970072c0@mail.gmail.com> <007301c9a40b$285feae0$791fc0a0$@edu> <628aabb70903131202u1a880694m56b811678e95c38@mail.gmail.com> Message-ID: On Mar 13, 2009, at 2:02 PM, Dave Messina wrote: > Hmm, I bet the Gene Table info can be retrieved via EUtilities, for > which > there is a BioPerl module. > Can Chris or someone more familiar with EUtilities confim that? > > > > Dave This can be done via EUtilities, but the data can be retrieved via Bio::DB::EntrezGene as a Bio::Seq. See this HOWTO: http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences chris From hlapp at gmx.net Sat Mar 14 18:59:51 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 14 Mar 2009 18:59:51 -0400 Subject: [Bioperl-l] Google Summer of Code: application submitted, action needed In-Reply-To: <06AC1C3A-B128-44A5-900C-50EBC22D36B5@gmx.net> References: <06AC1C3A-B128-44A5-900C-50EBC22D36B5@gmx.net> Message-ID: <18780940-2491-412A-881E-250C549ACA5F@gmx.net> Hi all, I have submitted the application yesterday for O|B|F participating in the 2009 Google Summer of Code as a mentoring organization. The application is at http://docs.google.com/Doc?id=dhs98hzv_7zn8bxqjm and is also linked to from the ideas page at http://open-bio.org/wiki/Google_Summer_of_Code_2009 Now keep your fingers crossed, Google is slated to announce acceptances on March 18. 
This is the last cross-project message re: Summer of Code that addresses mentors and our projects; future messages that I'll post across projects will be primarily for students such as announcing whether we are accepted or not and issuing calls for application. **What we need most and right now is action from our projects' developers and from possible mentors.** Google admins will start reviewing organization applications on Monday. The ideas page has 6 project ideas right now - though the ideas are good ones, the quantity won't be particularly impressive to Google. Therefore, if you have an idea for a summer project for a student please use the C& template (it is commented out now but you'll see it when you pull the Ideas section into the editor) and put it up there ASAP. If you're not sure yet who'll mentor, put tentative names there. We don't need a full commitment from mentors until the student application period starts (March 23). Next, for all projects, the leads and/or volunteers should check the reference information for their project: http://open-bio.org/wiki/Google_Summer_of_Code_2009#Open-Bio_projects_involved I just culled these links from the various project websites - it'd be much appreciated if going forward everyone can lend a hand in this. Please review what's there and add or fix as you see fit. *These links must be correct and complete - otherwise potential students may not find you.* Finally, all prospective mentors, primary or secondary, committed or not, and anyone else who would like to volunteer to help out, should subscribe themselves ASAP to the mailing list for communicating GSoC- related administrivia: http://lists.open-bio.org/mailman/listinfo/gsoc I will *not* cross-post all administrative announcements or requests for information, and so you *will* miss information if you don't subscribe yourself there. (Note: students will be subscribed there only *after* acceptance). 
Those who are considering mentoring, whether as a primary mentor or helping
out, please also add yourselves to the Mentors section on the Ideas page
(and check your link if you're already there):

http://open-bio.org/wiki/Google_Summer_of_Code_2009#Mentors

Cheers everyone, and fingers crossed!

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From gopu_36 at yahoo.com Sat Mar 14 21:04:32 2009
From: gopu_36 at yahoo.com (gopu_36)
Date: Sat, 14 Mar 2009 18:04:32 -0700 (PDT)
Subject: [Bioperl-l] script to calculate percentage identity for BLAT psl
In-Reply-To: <22476983.post@talk.nabble.com>
References: <22476983.post@talk.nabble.com>
Message-ID: <22519073.post@talk.nabble.com>

Please let me know whether there is any other way of calculating the
percentage identity for command-line BLAT results. Thanks.

gopu_36 wrote:
>
> Hi,
>
> I did go through the BLAT FAQ on how to calculate the percentage identity:
> http://genome.ucsc.edu/FAQ/FAQblat#blat4
> As a newcomer, I don't understand how to implement this. Please let
> me know how to plug in the script for my output.psl file. It would be
> of great help.
>
> Thanks and Regards.
>

--
View this message in context: http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519073.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jason at bioperl.org Sat Mar 14 21:11:01 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 14 Mar 2009 18:11:01 -0700
Subject: [Bioperl-l] script to calculate percentage identity for BLAT psl
In-Reply-To: <22519073.post@talk.nabble.com>
References: <22476983.post@talk.nabble.com> <22519073.post@talk.nabble.com>
Message-ID:

Well, you could just output it in blast9 format; the 3rd column is then
the percent identity.
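The blast9 suggestion can be sketched in plain Perl, without BioPerl. This is only a minimal sketch under assumptions: it assumes the BLAT results are in tab-delimited blast9 form (comment lines start with '#', and the third column is percent identity), the helper name percent_identities is hypothetical, and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes lines of blast9 (tabular, commented) output
# and returns one [query, subject, percent_identity] triple per hit line.
sub percent_identities {
    my @lines = @_;
    my @hits;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blank lines
        my @f = split /\t/, $line;
        next unless @f >= 3;                      # not a hit line
        push @hits, [ @f[0, 1, 2] ];              # 3rd column is % identity
    }
    return @hits;
}

# Made-up sample data standing in for a real blast9 file.
my @sample = (
    "# comment line (made up)\n",
    "query1\tchr1\t98.50\t200\t3\t0\t1\t200\t1000\t1199\t1e-100\t380.0\n",
    "query2\tchr4\t91.00\t100\t9\t0\t1\t100\t5000\t5099\t2e-40\t150.0\n",
);

for my $hit ( percent_identities(@sample) ) {
    printf "%s\t%s\t%.2f\n", @$hit;
}
```

To run it on a real file, the sample array could be replaced by the lines read from an opened filehandle, for example `percent_identities(<$fh>)`.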
On Mar 14, 2009, at 6:04 PM, gopu_36 wrote: > > Please let me know whether any other way of calculating the precentage > identity for commandline BLAT results? . Thanks. > > gopu_36 wrote: >> >> Hi, >> >> I did go through FAQ from BLAT on how to calculate the precentage >> identity >> from http://genome.ucsc.edu/FAQ/FAQblat#blat4 >> As a new comer, I don;t usederstand on how to implement this. >> Please let >> me know how to plugin the script for my output.psl file. Please let >> me >> know. It would be of great help. >> >> Thanks and Regards. >> > > -- > View this message in context: http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519073.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From gopu_36 at yahoo.com Sat Mar 14 22:38:31 2009 From: gopu_36 at yahoo.com (gopu_36) Date: Sat, 14 Mar 2009 19:38:31 -0700 (PDT) Subject: [Bioperl-l] script to calculate percentage identity for BLAT psl In-Reply-To: References: <22476983.post@talk.nabble.com> <22519073.post@talk.nabble.com> Message-ID: <22519508.post@talk.nabble.com> Hi Jason, Thanks for your help. I did try as below for testing purpose: blat database.fa query.fa output.psl blat database.fa query.fa output.blast Both the results looks the same and the output.blast also lokks lo=ike normal psl format. As I have always used .psl format, I am not sure what the problem while getting the blast like output? Could you please let me know? Thanks you so much. Jason Stajich-3 wrote: > > well you could just output it in blast9 output and then the 3rd column > is percent identity. > > On Mar 14, 2009, at 6:04 PM, gopu_36 wrote: > >> >> Please let me know whether any other way of calculating the precentage >> identity for commandline BLAT results? . Thanks. 
>> >> gopu_36 wrote: >>> >>> Hi, >>> >>> I did go through FAQ from BLAT on how to calculate the precentage >>> identity >>> from http://genome.ucsc.edu/FAQ/FAQblat#blat4 >>> As a new comer, I don;t usederstand on how to implement this. >>> Please let >>> me know how to plugin the script for my output.psl file. Please let >>> me >>> know. It would be of great help. >>> >>> Thanks and Regards. >>> >> >> -- >> View this message in context: >> http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519073.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519508.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From umylny at apbri.org Sat Mar 14 22:55:39 2009 From: umylny at apbri.org (Boris Umylny) Date: Sun, 15 Mar 2009 11:55:39 +0900 Subject: [Bioperl-l] Consulting project Message-ID: <15D6E5BCFD99417FB0B6C0DE71316813@Boris> I apologize in advance if this is not a proper forum for this kind of post. We are looking for an individual to help us integrate some of the Bioconductor tools and a few other algorithms into our product. The work would be done in Chiang Mai, Thailand and have an estimated duration of 3-4 months. In addition to being a commercial venture, we expect that this work would result in a publishable, web-accessible application. If anyone is interested, please feel free to contact me for further details. 
Sincerely, Boris Umylny From jason at bioperl.org Sat Mar 14 23:09:01 2009 From: jason at bioperl.org (Jason Stajich) Date: Sat, 14 Mar 2009 20:09:01 -0700 Subject: [Bioperl-l] script to calculate percentage identity for BLAT psl In-Reply-To: <26043046.1018481237084765557.JavaMail.nabble@isper.nabble.com> References: <26043046.1018481237084765557.JavaMail.nabble@isper.nabble.com> Message-ID: you should read the blat documentation - if you type blat alone on the command-line or read any of the documentation UCSC provides for the tool it gives you lots of detailed information on how to run it. I bet you can figure out how to specify a different output format based on reading that documentation. -jason On Mar 14, 2009, at 7:39 PM, gopu_36 at yahoo.com wrote: > Hi Jason, > Thanks for your help. > > I did try as below for testing purpose: > > blat database.fa query.fa output.psl > blat database.fa query.fa output.blast > > Both the results looks the same and the output.blast also lokks > lo=ike normal psl format. As I have always used .psl format, I am > not sure what the problem while getting the blast like output? Could > you please let me know? > Jason Stajich-3 wrote: >> >> well you could just output it in blast9 output and then the 3rd >> column >> is percent identity. >> >> On Mar 14, 2009, at 6:04 PM, gopu_36 wrote: >> >>> >>> Please let me know whether any other way of calculating the >>> precentage >>> identity for commandline BLAT results? . Thanks. >>> >>> gopu_36 wrote: >>>> >>>> Hi, >>>> >>>> I did go through FAQ from BLAT on how to calculate the precentage >>>> identity >>>> from http://genome.ucsc.edu/FAQ/FAQblat#blat4 >>>> As a new comer, I don;t usederstand on how to implement this. >>>> Please let >>>> me know how to plugin the script for my output.psl file. Please let >>>> me >>>> know. It would be of great help. >>>> >>>> Thanks and Regards. 
>>>> >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519073.html >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Jason Stajich >> jason at bioperl.org >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > Quoted from: > http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519116.html > Jason Stajich jason at bioperl.org From gopu_36 at yahoo.com Sat Mar 14 23:33:01 2009 From: gopu_36 at yahoo.com (gopu_36) Date: Sat, 14 Mar 2009 20:33:01 -0700 (PDT) Subject: [Bioperl-l] script to calculate percentage identity for BLAT psl In-Reply-To: References: <22476983.post@talk.nabble.com> Message-ID: <22519795.post@talk.nabble.com> Yes Jason, Thanks and I did it using -out=type. Thanks. Jason Stajich-3 wrote: > > you should read the blat documentation - if you type blat alone on the > command-line or read any of the documentation UCSC provides for the > tool it gives you lots of detailed information on how to run it. I > bet you can figure out how to specify a different output format based > on reading that documentation. > > -jason > On Mar 14, 2009, at 7:39 PM, gopu_36 at yahoo.com wrote: > >> Hi Jason, >> Thanks for your help. >> >> I did try as below for testing purpose: >> >> blat database.fa query.fa output.psl >> blat database.fa query.fa output.blast >> >> Both the results looks the same and the output.blast also lokks >> lo=ike normal psl format. As I have always used .psl format, I am >> not sure what the problem while getting the blast like output? Could >> you please let me know? 
>> Jason Stajich-3 wrote: >>> >>> well you could just output it in blast9 output and then the 3rd >>> column >>> is percent identity. >>> >>> On Mar 14, 2009, at 6:04 PM, gopu_36 wrote: >>> >>>> >>>> Please let me know whether any other way of calculating the >>>> precentage >>>> identity for commandline BLAT results? . Thanks. >>>> >>>> gopu_36 wrote: >>>>> >>>>> Hi, >>>>> >>>>> I did go through FAQ from BLAT on how to calculate the precentage >>>>> identity >>>>> from http://genome.ucsc.edu/FAQ/FAQblat#blat4 >>>>> As a new comer, I don;t usederstand on how to implement this. >>>>> Please let >>>>> me know how to plugin the script for my output.psl file. Please let >>>>> me >>>>> know. It would be of great help. >>>>> >>>>> Thanks and Regards. >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519073.html >>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Jason Stajich >>> jason at bioperl.org >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> Quoted from: >> http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519116.html >> > > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519795.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
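For readers who want to stay with PSL output, a simple match-based identity can also be computed directly from the PSL columns in plain Perl. This is a minimal sketch, not the more involved calculation described in the UCSC FAQ linked in this thread: it ignores gaps, the helper name psl_percent_identity is hypothetical, and the sample line is made up. It relies only on the first three PSL columns (matches, misMatches, repMatches).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: simple percent identity for one PSL line.
# PSL columns used (0-based): 0 = matches, 1 = misMatches, 2 = repMatches.
# Returns undef for header lines or anything that is not an alignment line.
sub psl_percent_identity {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    return unless @f >= 21 && $f[0] =~ /^\d+$/;   # PSL alignment lines have 21 columns
    my ( $matches, $mismatches, $rep_matches ) = @f[ 0, 1, 2 ];
    my $aligned = $matches + $mismatches + $rep_matches;
    return unless $aligned > 0;
    return 100 * ( $matches + $rep_matches ) / $aligned;
}

# Made-up PSL alignment line: 95 matches, 5 mismatches, no repeat matches.
my $sample = join( "\t",
    95, 5, 0, 0, 0, 0, 0, 0, '+',
    'query1', 100, 0, 100,
    'chr1', 5000, 1000, 1100,
    1, '100,', '0,', '1000,' ) . "\n";

my $pid = psl_percent_identity($sample);
printf "query1\t%.2f\n", $pid if defined $pid;
```

Because gaps are not penalized here, the value can differ from the percent identity reported in BLAST-style output for the same alignment.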
From gopu_36 at yahoo.com Sun Mar 15 06:08:42 2009
From: gopu_36 at yahoo.com (gopu_36)
Date: Sun, 15 Mar 2009 03:08:42 -0700 (PDT)
Subject: [Bioperl-l] script to calculate percentage identity for BLAT psl
In-Reply-To: <22519795.post@talk.nabble.com>
References: <22476983.post@talk.nabble.com> <22519795.post@talk.nabble.com>
Message-ID: <22521531.post@talk.nabble.com>

Hi Jason,
I have a question now. I ran blat with psl output as well as with blast
output format, but when I compare the results, the total number of
results differs. For the blast output, when I grep for "Identities", the
count is higher than the number of psl results. Why is that happening? I
thought only the output format would change to be BLAST-like, so why does
the number of results change? Am I missing something?

Basically, I want to get some fields from the psl output and am trying to
get the identity score from the blast output file. Is that not possible?
Or can I use one of the formulas below?

percent_id = 100 x ($matches + $rep_matches) / ($matches + $mismatches + $rep_matches)

or

percentage identity = (matches not in repeats + matches in repeats) / query length

Please let me know, as I am keen on parsing psl; I don't have BioPerl to
parse the blast output.
Thanks.

gopu_36 wrote:
>
> Yes Jason, Thanks and I did it using -out=type. Thanks.
>
> Jason Stajich-3 wrote:
>>
>> you should read the blat documentation - if you type blat alone on the
>> command-line or read any of the documentation UCSC provides for the
>> tool it gives you lots of detailed information on how to run it. I
>> bet you can figure out how to specify a different output format based
>> on reading that documentation.
>>
>> -jason
>> On Mar 14, 2009, at 7:39 PM, gopu_36 at yahoo.com wrote:
>>
>>> Hi Jason,
>>> Thanks for your help.
>>> >>> I did try as below for testing purpose: >>> >>> blat database.fa query.fa output.psl >>> blat database.fa query.fa output.blast >>> >>> Both the results looks the same and the output.blast also lokks >>> lo=ike normal psl format. As I have always used .psl format, I am >>> not sure what the problem while getting the blast like output? Could >>> you please let me know? >>> Jason Stajich-3 wrote: >>>> >>>> well you could just output it in blast9 output and then the 3rd >>>> column >>>> is percent identity. >>>> >>>> On Mar 14, 2009, at 6:04 PM, gopu_36 wrote: >>>> >>>>> >>>>> Please let me know whether any other way of calculating the >>>>> precentage >>>>> identity for commandline BLAT results? . Thanks. >>>>> >>>>> gopu_36 wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I did go through FAQ from BLAT on how to calculate the precentage >>>>>> identity >>>>>> from http://genome.ucsc.edu/FAQ/FAQblat#blat4 >>>>>> As a new comer, I don;t usederstand on how to implement this. >>>>>> Please let >>>>>> me know how to plugin the script for my output.psl file. Please let >>>>>> me >>>>>> know. It would be of great help. >>>>>> >>>>>> Thanks and Regards. >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519073.html >>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> Jason Stajich >>>> jason at bioperl.org >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> Quoted from: >>> http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22519116.html >>> >> >> Jason Stajich >> jason at bioperl.org >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- View this message in context: http://www.nabble.com/script-to-calculate-percentage-identity-for-BLAT-psl-tp22476983p22521531.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From hlapp at gmx.net Sun Mar 15 09:27:28 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 15 Mar 2009 09:27:28 -0400 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 References: Message-ID: Just forwarding this here in case someone has a moment to check whether there might be anything here that can be improved from the BioPerl side of things. (Note that in Isabelle's case she wouldn't need to install GBrowse and BioPerl in concert.) -hilmar Begin forwarded message: From: iphan Date: March 15, 2009 12:29:10 AM EDT To: "gmod-ajax at lists.sourceforge.net" Subject: [Gmod-ajax] Jbrowse on mac os 10.5 Hello I've managed to install Jbrowse on my linux box, but somehow the tricks that worked there don't seem to apply to the Mac. What are the minimal set of Gbrowse libraries required for Jbrowse? Installing GD with Fink was the only step that worked (see big rant below). 
Is there a pre-packaged Gbrowse perl library that I can just download and put in my PERL5LIB path to get Jbrowse running? That would really make my day! Thanks, Isabelle I've spent several hours trying to install Gbrowse on my mac (10.5.6) and can't get it to go past the blasted cpan nightmare. Sorry, had to let my frustration out. I really hate cpan with a vengance, and it probably knows it ;-) Errors are inconsistent: cpan can't install either LWP, or Harness, or GD, or Compress::Zlib. I've first tried to execute the Gbrowse installer without any options. It crashes with: Force getting a BioPerl nightly build; the most recent release is too old *** Installing BioPerl *** Downloading bioperl-live... Can't locate LWP.pm in @INC (@INC contains: /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread- multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/ 5.8.1) at /System/Library/Perl/Extras/5.8.8/LWP/Simple.pm line 37. I tried to install LWP from cpan and get: Can't make directory /Users/iphan/.cpan/build/YAML-0.68 read+writeable: Operation not permitted at /System/Library/Perl/5.8.8/CPAN.pm line 966 I understand that error means I shouldn't have run the gbrowse installer script with 'sudo', but since it won't run otherwise, what the hell am I supposed to do??! Manually do a chmod of .cpan after each installer crash? I've tried pre-installing Bioperl 7 times, from the tarball, from cpan, from cvs, version 1.6.0, 1.5.x version, dev version (guess that's the same as downloading from cvs?), bioperl-live. 
None of them seem to play with the Gbrowse perl installer: *** (back in Bioperl Build.PL) *** Cannot chdir() back to /Users/iphan/.cpan/build/BioPerl-1.6.0: No such file or directory at Bio/Root/Build.pm line 461. Couldn't run Build.PL: /System/Library/Perl/Extras/5.8.8/Module/Build/Compat.pm line 200. Running make test Make had some problems, maybe interrupted? Won't test Running make install Make had some problems, maybe interrupted? Won't install ------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ Gmod-ajax mailing list Gmod-ajax at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/gmod-ajax -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From robert.citek at gmail.com Sun Mar 15 10:54:40 2009 From: robert.citek at gmail.com (Robert Citek) Date: Sun, 15 Mar 2009 09:54:40 -0500 Subject: [Bioperl-l] genome position mapping of RefSeq IDs In-Reply-To: <4145b6790902251240o4a876a0auba0c2dd5ddc9cdfe@mail.gmail.com> References: <4145b6790902251240o4a876a0auba0c2dd5ddc9cdfe@mail.gmail.com> Message-ID: <4145b6790903150754m275164a3iff22fe8269d0c240@mail.gmail.com> Thanks, Chris and Alden, Turns out that what I was looking for was here: ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/mapview/ Regards, - Robert On Wed, Feb 25, 2009 at 3:40 PM, Robert Citek wrote: > I have a list of RefSeq IDs for which I can parse out all the > annotation (e.g. exons, SNPs, etc.). 
For this one project, I need the
> same coordinate information relative to the genome rather than the
> transcript. Is such mapping information available?

From cjfields at illinois.edu Sun Mar 15 11:27:24 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 15 Mar 2009 10:27:24 -0500
Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5
In-Reply-To:
References:
Message-ID: <71F9ACDB-B249-40A0-866C-7F574D329F4B@illinois.edu>

On Mar 15, 2009, at 8:27 AM, Hilmar Lapp wrote:

> Just forwarding this here in case someone has a moment to check
> whether there might be anything here that can be improved from the
> BioPerl side of things. (Note that in Isabelle's case she wouldn't
> need to install GBrowse and BioPerl in concert.)
>
> -hilmar
>
> Begin forwarded message:
>
> From: iphan
> Date: March 15, 2009 12:29:10 AM EDT
> To: "gmod-ajax at lists.sourceforge.net"
> Subject: [Gmod-ajax] Jbrowse on mac os 10.5
>
> Hello
>
> I've managed to install Jbrowse on my linux box, but somehow the tricks
> that worked there don't seem to apply to the Mac. What is the minimal
> set of Gbrowse libraries required for Jbrowse?
>
> Installing GD with Fink was the only step that worked (see big rant
> below).
>
> Is there a pre-packaged Gbrowse perl library that I can just download
> and put in my PERL5LIB path to get Jbrowse running?
>
> That would really make my day!
>
> Thanks,
>
> Isabelle
>
>
> I've spent several hours trying to install Gbrowse on my mac (10.5.6)
> and can't get it to go past the blasted cpan nightmare. Sorry, had to
> let my frustration out. I really hate cpan with a vengeance, and it
> probably knows it ;-)

On Mac OS X one has to use 'sudo' to install anything from CPAN unless
installing to a location the user has read-write privileges for, like a
local directory. You can set up BioPerl by just downloading it, unpacking
the tarball, and adding the unpacked directory to PERL5LIB (that's how I
run it).
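Whether an unpacked tarball is actually being picked up can be checked from Perl itself. The following is a minimal sketch under assumptions: the directory $HOME/src/BioPerl-1.6.0 is a hypothetical unpack location (the BioPerl-1.6.0 name matches the build directory shown in the forwarded log), and it assumes the Bio/ directory sits at the top level of the unpacked tarball. The `use lib` line has the same effect for this one script as adding that directory to PERL5LIB in the shell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical unpack location; adjust to wherever the tarball was extracted.
# This is the per-script equivalent of adding the directory to PERL5LIB.
use lib ( $ENV{HOME} || '.' ) . '/src/BioPerl-1.6.0';

# Try to load a core BioPerl module without dying if it is missing.
if ( eval { require Bio::SeqIO; 1 } ) {
    print "Bio::SeqIO loaded from $INC{'Bio/SeqIO.pm'}\n";
}
else {
    print "Bio::SeqIO not found; check the directory above or PERL5LIB\n";
}
```

No 'sudo' is involved in this approach, since nothing is written outside the user's own directories.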
> Errors are inconsistent: cpan can't install either LWP, or Harness, > or GD, > or Compress::Zlib. I've first tried to execute the Gbrowse installer > without > any options. It crashes with: > > Force getting a BioPerl nightly build; the most recent release is > too old > > *** Installing BioPerl *** > Downloading bioperl-live... > Can't locate LWP.pm in @INC (@INC contains: > /System/Library/Perl/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread- > multi-2level > /Library/Perl/5.8.8 /Library/Perl > /Network/Library/Perl/5.8.8/darwin-thread-multi-2level > /Network/Library/Perl/5.8.8 /Network/Library/Perl > /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/ > 5.8.1) > at /System/Library/Perl/Extras/5.8.8/LWP/Simple.pm line 37. > > I tried to install LWP from cpan and get: > Can't make directory /Users/iphan/.cpan/build/YAML-0.68 read > +writeable: > Operation not permitted at /System/Library/Perl/5.8.8/CPAN.pm line 966 Appears to be the same issue described above (need to use 'sudo'). > I understand that error means I shouldn't have run the gbrowse > installer > script with 'sudo', but since it won't run otherwise, what the hell > am I > supposed to do??! Manually do a chmod of .cpan after each installer > crash? > > I've tried pre-installing Bioperl 7 times, from the tarball, from > cpan, > from cvs, version 1.6.0, 1.5.x version, dev version (guess that's > the same > as downloading from cvs?), bioperl-live. None of them seem to play > with the > Gbrowse perl installer: > > *** (back in Bioperl Build.PL) *** > Cannot chdir() back to /Users/iphan/.cpan/build/BioPerl-1.6.0: No > such file > or directory at Bio/Root/Build.pm line 461. > Couldn't run Build.PL: > /System/Library/Perl/Extras/5.8.8/Module/Build/Compat.pm line 200. > Running make test > Make had some problems, maybe interrupted? 
Won't test
> Running make install
> Make had some problems, maybe interrupted? Won't install
>
>
References: <31bb4380903132222n5effd96coec1847d3f272bfd8@mail.gmail.com>
Message-ID:

You need to be more specific. Most modules work for me just using
'use Bio::SeqIO' or similar.

chris

On Mar 14, 2009, at 12:22 AM, Sanjay Harke wrote:

> Dear sir,
>
> I need to know how to use BioPerl with a CGI script.
>
> sanjay

From MAB at stowers.org Sun Mar 15 16:01:04 2009
From: MAB at stowers.org (Blanchette, Marco)
Date: Sun, 15 Mar 2009 15:01:04 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature oddities
Message-ID:

One more Bio::DB::SeqFeature oddity...

I am trying to fetch the genes underlying a given location in the genome.
The MySQL database was populated with the most recent Drosophila gff3
annotation using bp_seqfeature_load.pl in fast mode (-f). Somehow, there
is something I don't get with the 'gene' features.

For example, I can get 10 mRNAs from the fourth chromosome using:

    use Bio::DB::SeqFeature::Store;

    our $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql',
                                               -dsn     => 'dbi:mysql:dmel_5_15');

    map{print $_->name, "\n"} (map{$_->get_SeqFeatures('mRNA')}
        $db->get_features_by_location(-seq_id => '4'))[1..10];

Results:
CG11231-RA
CG33797-RA
CG11077-RA
JYalpha-RB
RpS3A-RA
RpS3A-RB
RpS3A-RD
CG32006-RA
CG31997-RA
CG2219-RA

However, if I try to fetch the 'gene' features, I get an empty array, as in

    map{print $_->name, "\n"} (map{$_->get_SeqFeatures('gene')}
        $db->get_features_by_location(-seq_id => '4'))[1..10];

>>NOTHING

And I get the same whether I use 'Gene' or 'GENE'. Any idea what's going
on here?

Thanks

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018

From MEC at stowers.org Sun Mar 15 16:28:29 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Sun, 15 Mar 2009 15:28:29 -0500
Subject: [Bioperl-l] :DB::SeqFeature oddities
In-Reply-To:
References:
Message-ID:

Marco,

It looks to me like in your first call, get_features_by_location is returning many features, the first 10 of which happen to be gene features that have subordinate mRNA features, which are collected in the map.

In the 2nd call, you are now trying to collect the 'gene' features subordinate to the first 10 features. But these 10 features ARE themselves gene features and so don't have subordinate gene features.

Depending on what you are trying to do, you might want instead:

map{print $_->name, "\n"} ($db->get_features_by_location(-seq_id =>'4', -types => [qw(mRNA)]))[1..10];

and

map{print $_->name, "\n"} ($db->get_features_by_location(-seq_id =>'4', -types => [qw(gene)]))[1..10];

--Malcolm

________________________________________
From: bioperl-l-bounces at lists.open-bio.org [bioperl-l-bounces at lists.open-bio.org] On Behalf Of Blanchette, Marco [MAB at stowers.org]
Sent: Sunday, March 15, 2009 3:01 PM
To: BioPerl mailing list
Subject: [Bioperl-l] Bio::DB::SeqFeature oddities

One more Bio::DB::SeqFeature oddity...

I am trying to fetch the genes underlying a given location in the genome. The MySQL database was populated with the most recent Drosophila gff3 annotation using bp_seqfeature_load.pl in fast mode (-f). Somehow, there is something I don't get with the 'gene' features.
For example, I can get 10 mRNAs from the fourth chromosome using:

use Bio::DB::SeqFeature::Store;
our $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql',
                                           -dsn     => 'dbi:mysql:dmel_5_15');

map{print $_->name, "\n"} (map{$_->get_SeqFeatures('mRNA')} $db->get_features_by_location(-seq_id => '4'))[1..10];

Results:
CG11231-RA
CG33797-RA
CG11077-RA
JYalpha-RB
RpS3A-RA
RpS3A-RB
RpS3A-RD
CG32006-RA
CG31997-RA
CG2219-RA

However, if I try to fetch the 'gene' features, I get an empty array as in

map{print $_->name, "\n"} (map{$_->get_SeqFeatures('gene')} $db->get_features_by_location(-seq_id => '4'))[1..10];

>>NOTHING

And I get the same whether I use 'Gene' or 'GENE'.

Any idea what's going on here?

Thanks

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Sun Mar 15 18:41:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 15 Mar 2009 17:41:37 -0500
Subject: [Bioperl-l] CGI and BioPerl Re: Bioperl-l Digest, Vol 71, Issue 6
In-Reply-To: <31bb4380903150833h46f65345n9c80dbbe12ac5391@mail.gmail.com>
References: <31bb4380903132222n5effd96coec1847d3f272bfd8@mail.gmail.com> <31bb4380903150833h46f65345n9c80dbbe12ac5391@mail.gmail.com>
Message-ID:

Sanjay,

This needs to stay on the list for archiving.

I'm unsure about how CGI works on Windows but several things about this script appear fundamentally incorrect. Are you trying to comment out lines c-style? Perl comments start with '#'. Also, under 'use strict' you must declare scope on all variables using 'my', 'our', etc. You also don't 'use Bio::DB::RefSeq' (a compile-time fail) and 'use DBI' w/o actually using DBI.
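[Editorial sketch, not part of the original reply.] The points above can be illustrated with a minimal rewrite of the quoted script. It is only a sketch under stated assumptions: the shebang path, the text/plain content type, and the eval-based error report are choices made here for illustration, and whether the web server can reach the remote RefSeq service is not something the thread establishes.

```perl
#!/usr/bin/perl
# Assumption: adjust the shebang to the local Perl (the original post used c:\perl\bin\perl.exe).
use strict;
use warnings;

use Bio::DB::RefSeq;

# Emit the HTTP header before any other output.
print "Content-type: text/plain\n\n";

# Variables are declared with 'my', as 'use strict' requires.
my $db  = Bio::DB::RefSeq->new;
my $seq = eval { $db->get_Seq_by_acc('NM_007304') };

if ($seq) {
    print $seq->seq, "\n";
}
else {
    # Report the failure in the response instead of dying silently.
    print "Could not retrieve NM_007304: ", ($@ || 'no sequence returned'), "\n";
}
```

Lines that should be disabled are either deleted or commented out with '#'; the unused 'use DBI' is dropped.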
I'll say this nicely: I'm wary about proceeding with this further unless you can establish you understand some of the basic fundamentals of Perl programming. The issues above make me think you need to brush up on that a bit.

chris

On Mar 15, 2009, at 10:33 AM, Sanjay Harke wrote:

> Dear charis,
>
> following script s not working with cgi script
>
> #!c:\perl\bin\perl.exe
>
> print "Content-type: text/html\n\n";
>
> //use warnings;
> //use CGI;
> use strict;
> //use CGI qw(standard);
> use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
> use DBI;
>
> use Bio::DB::RefSeq;
>
> $gb = new Bio::DB::RefSeq;
>
> $seq = $gb->get_Seq_by_acc('NM_007304');
>
> print $seq->seq();
>
> kindly guide me in this script execution
>
> but one thing i want clear following script is working with command line
> #!/usr/local/bin/perl
>
> # Get a sequence from RefSeq by accession number
>
> use Bio::DB::RefSeq;
>
> $gb = new Bio::DB::RefSeq;
>
> $seq = $gb->get_Seq_by_acc('NM_007304');
>
> print $seq->seq();
>
> kindly help me for cgi execution
>
> sanjay

From Russell.Smithies at agresearch.co.nz Sun Mar 15 23:30:24 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 16 Mar 2009 16:30:24 +1300
Subject: [Bioperl-l] reformating dna sequence containing ambiguity symbols
Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BD22E951@exchsth.agresearch.co.nz>

I've just been reformatting some fasta from dbSNP that contains ambiguity symbols and had to come up with a non-standard solution as I needed to turn validation off in Bio::Seq but couldn't see an obvious way to do it.
dbSNP format the fasta for their rs* SNPs like this: >gnl|dbSNP|rs29025902 rs=29025902|pos=251|len=501|taxid=9913|mol="genomic"|class=1|alleles="A/G"|build=125 AGGCTACCAA TAGGACATCA CTGACTGTGA GGCTGGGAAG AAAGACCGAG AAGCACCCCA GGACACAGAA TCTCCTTCAC ATACAGAGGC AGTGGACACA TAGAGTACAG GCAGCGGTAA AATGGAGTAA AAAATTAGAT AGTGGTGAGT TTTGAGTATG AATTGCCTTT GTTTTTAAAT TAGTTCTAAG TTTATAAGAC AAGTTTTATT TTTTTATTTT ATTTATTTGC CACATGGCTT GTGGGTTTGC R GGATCTTAGT TCCCTGACCA GGGATTGAAC CTGTGCCCTC AGCAGTGAAA ACATGGAGTC CTAACAACTG GACCACCAGG GAATTCCCTA TATGACTTAA TTTTTAATAA TATTTGTAGC TAACAATTGA CATGCAGAGT TCCTGAAGAT TTAAGCATGG GCTCCCATGA ACCAGTATGA ACCAGCTCCA GCACAGCACA GGTTTTGTTT TACTTTTGGA GGGGAGGTTT GATTGTGTCT ACATGCTAAT But I needed it (for the Sequenom) with the ambiguity symbol on brackets like this: >gnl|dbSNP|rs29025902 AGGCTACCAATAGGACATCACTGACTGTGAGGCTGGGAAGAAAGACCGAGAAGCACCCCAGGACACAGAATCTCCTTCACATACAGAGGCAGTGGACACATAGAGTACAGGCAGCGGTAAAATGGAGTAAAAAATTAGATAGTGGTGAGTTTTGAGTATGAATTGCCTTTGTTTTTAAATTAGTTCTAAGTTTATAAGACAAGTTTTATTTTTTTATTTTATTTATTTGCCACATGGCTTGTGGGTTTGC[A/G]GGATCTTAGTTCCCTGACCAGGGATTGAACCTGTGCCCTCAGCAGTGAAAACATGGAGTCCTAACAACTGGACCACCAGGGAATTCCCTATATGACTTAATTTTTAATAATATTTGTAGCTAACAATTGACATGCAGAGTTCCTGAAGATTTAAGCATGGGCTCCCATGAACCAGTATGAACCAGCTCCAGCACAGCACAGGTTTTGTTTTACTTTTGGAGGGGAGGTTTGATTGTGTCTACATGCTAAT I came up with a 50% BioPerl solution using Bio::SeqIO and Bio::Tools::IUPAC but the final printing of the fasta is dome 'manually'. 
It's a bit hacky but I'm particularly proud of the obscurity I managed in my switch statement :-)

###############################
#!perl -w

use Bio::SeqIO;
use Bio::Tools::IUPAC;
use Switch;

my $seq_in = Bio::SeqIO->new(-file=>$ARGV[0], -format=>"fasta" ) or die $!;

while (my $seqobj = $seq_in->next_seq) {
    my $seq = sprintf ">%s\n", $seqobj->display_id;
    my $iupac_seq = new Bio::Tools::IUPAC(-seq => $seqobj);
    foreach (@{$iupac_seq->{_alpha}}){
        switch($#{@{$_}}){
            case 0{$seq.= @{$_}[0]}
            case 1{$seq .= sprintf "[%s]",join("/",@{$_})}
            else {$seq .= 'N'}
        }
    }
    print "$seq\n";
}
#######################

:-)

--Russell

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
======================================================================= From Russell.Smithies at agresearch.co.nz Sun Mar 15 23:52:55 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 16 Mar 2009 16:52:55 +1300 Subject: [Bioperl-l] reformating dna sequence containing ambiguity symbols In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321BD22E951@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF321BD22E951@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BD22E970@exchsth.agresearch.co.nz> Typical Microsoft Outlook changed my formatting. This is what dbSNP fasta looks like: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=snp&report=fasta&mode=text&id=29011166 --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > Sent: Monday, 16 March 2009 4:30 p.m. > To: 'Bioperl-l at lists.open-bio.org' > Subject: [Bioperl-l] reformating dna sequence containing ambiguity symbols > > I've just been reformatting some fasta from dbSNP that contains ambiguity > symbols and had to come up with a non-standard solution as I needed to turn > validation off in Bio::Seq but couldn't see an obvious way to do it. 
> > dbSNP format the fasta for their rs* SNPs like this: > > >gnl|dbSNP|rs29025902 > rs=29025902|pos=251|len=501|taxid=9913|mol="genomic"|class=1|alleles="A/G"|bui > ld=125 > AGGCTACCAA TAGGACATCA CTGACTGTGA GGCTGGGAAG AAAGACCGAG AAGCACCCCA GGACACAGAA > TCTCCTTCAC > ATACAGAGGC AGTGGACACA TAGAGTACAG GCAGCGGTAA AATGGAGTAA AAAATTAGAT AGTGGTGAGT > TTTGAGTATG > AATTGCCTTT GTTTTTAAAT TAGTTCTAAG TTTATAAGAC AAGTTTTATT TTTTTATTTT ATTTATTTGC > CACATGGCTT > GTGGGTTTGC > R > GGATCTTAGT TCCCTGACCA GGGATTGAAC CTGTGCCCTC AGCAGTGAAA ACATGGAGTC CTAACAACTG > GACCACCAGG > GAATTCCCTA TATGACTTAA TTTTTAATAA TATTTGTAGC TAACAATTGA CATGCAGAGT TCCTGAAGAT > TTAAGCATGG > GCTCCCATGA ACCAGTATGA ACCAGCTCCA GCACAGCACA GGTTTTGTTT TACTTTTGGA GGGGAGGTTT > GATTGTGTCT > ACATGCTAAT > > But I needed it (for the Sequenom) with the ambiguity symbol on brackets like > this: > > >gnl|dbSNP|rs29025902 > AGGCTACCAATAGGACATCACTGACTGTGAGGCTGGGAAGAAAGACCGAGAAGCACCCCAGGACACAGAATCTCCTTC > ACATACAGAGGCAGTGGACACATAGAGTACAGGCAGCGGTAAAATGGAGTAAAAAATTAGATAGTGGTGAGTTTTGAG > TATGAATTGCCTTTGTTTTTAAATTAGTTCTAAGTTTATAAGACAAGTTTTATTTTTTTATTTTATTTATTTGCCACA > TGGCTTGTGGGTTTGC[A/G]GGATCTTAGTTCCCTGACCAGGGATTGAACCTGTGCCCTCAGCAGTGAAAACATGGA > GTCCTAACAACTGGACCACCAGGGAATTCCCTATATGACTTAATTTTTAATAATATTTGTAGCTAACAATTGACATGC > AGAGTTCCTGAAGATTTAAGCATGGGCTCCCATGAACCAGTATGAACCAGCTCCAGCACAGCACAGGTTTTGTTTTAC > TTTTGGAGGGGAGGTTTGATTGTGTCTACATGCTAAT > > I came up with a 50% BioPerl solution using Bio::SeqIO and Bio::Tools::IUPAC > but the final printing of the fasta is dome 'manually'. 
> It's a bit hacky but I'm particularly proud of the obscurity I managed in my > switch statement :-) > > ############################### > #!perl -w > > use Bio::SeqIO; > use Bio::Tools::IUPAC; > use Switch; > > my $seq_in = Bio::SeqIO->new(-file=>$ARGV[0], -format=>"fasta" ) or die $!; > > while (my $seqobj = $seq_in->next_seq) { > my $seq = sprintf ">%s\n", $seqobj->display_id; > my $iupac_seq = new Bio::Tools::IUPAC(-seq => $seqobj); > foreach (@{$iupac_seq->{_alpha}}){ > switch($#{@{$_}}){ > case 0{$seq.= @{$_}[0]} > case 1{$seq .= sprintf "[%s]",join("/",@{$_})} > else {$seq .= 'N'} > } > } > print "$seq\n"; > } > ####################### > > :-) > > --Russell > > > Russell Smithies > Bioinformatics Applications Developer > T +64 3 489 9085 > E? russell.smithies at agresearch.co.nz > Invermay? Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T? +64 3 489 3809 > F? +64 3 489 9174 > www.agresearch.co.nz > > Toitu te whenua, Toitu te tangata > Sustain the land, Sustain the people > > > > > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Mon Mar 16 05:57:55 2009 From: bernd.web at gmail.com (Bernd Web) Date: Mon, 16 Mar 2009 10:57:55 +0100 Subject: [Bioperl-l] reformating dna sequence containing ambiguity symbols In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321BD22E951@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF321BD22E951@exchsth.agresearch.co.nz> Message-ID: <716af09c0903160257v14e9d424kf2232cf876b581b4@mail.gmail.com> Hi Russel, Sometimes I have a non-standard symbol too (though these relate to residue letters generally and do not change sequence length). I add these letters to the variable used for matching: $Bio::PrimarySeq::MATCHPATTERN = 'A-Za-z\-\.\*\?'; Bernd On Mon, Mar 16, 2009 at 4:30 AM, Smithies, Russell wrote: > I've just been reformatting some fasta from dbSNP that contains ambiguity symbols and had to come up with a non-standard solution as I needed to turn validation off in Bio::Seq but couldn't see an obvious way to do it. 
> > dbSNP format the fasta for their rs* SNPs like this: > >>gnl|dbSNP|rs29025902 rs=29025902|pos=251|len=501|taxid=9913|mol="genomic"|class=1|alleles="A/G"|build=125 > AGGCTACCAA TAGGACATCA CTGACTGTGA GGCTGGGAAG AAAGACCGAG AAGCACCCCA GGACACAGAA TCTCCTTCAC > ATACAGAGGC AGTGGACACA TAGAGTACAG GCAGCGGTAA AATGGAGTAA AAAATTAGAT AGTGGTGAGT TTTGAGTATG > AATTGCCTTT GTTTTTAAAT TAGTTCTAAG TTTATAAGAC AAGTTTTATT TTTTTATTTT ATTTATTTGC CACATGGCTT > GTGGGTTTGC > R > GGATCTTAGT TCCCTGACCA GGGATTGAAC CTGTGCCCTC AGCAGTGAAA ACATGGAGTC CTAACAACTG GACCACCAGG > GAATTCCCTA TATGACTTAA TTTTTAATAA TATTTGTAGC TAACAATTGA CATGCAGAGT TCCTGAAGAT TTAAGCATGG > GCTCCCATGA ACCAGTATGA ACCAGCTCCA GCACAGCACA GGTTTTGTTT TACTTTTGGA GGGGAGGTTT GATTGTGTCT > ACATGCTAAT > > But I needed it (for the Sequenom) with the ambiguity symbol on brackets like this: > >>gnl|dbSNP|rs29025902 > AGGCTACCAATAGGACATCACTGACTGTGAGGCTGGGAAGAAAGACCGAGAAGCACCCCAGGACACAGAATCTCCTTCACATACAGAGGCAGTGGACACATAGAGTACAGGCAGCGGTAAAATGGAGTAAAAAATTAGATAGTGGTGAGTTTTGAGTATGAATTGCCTTTGTTTTTAAATTAGTTCTAAGTTTATAAGACAAGTTTTATTTTTTTATTTTATTTATTTGCCACATGGCTTGTGGGTTTGC[A/G]GGATCTTAGTTCCCTGACCAGGGATTGAACCTGTGCCCTCAGCAGTGAAAACATGGAGTCCTAACAACTGGACCACCAGGGAATTCCCTATATGACTTAATTTTTAATAATATTTGTAGCTAACAATTGACATGCAGAGTTCCTGAAGATTTAAGCATGGGCTCCCATGAACCAGTATGAACCAGCTCCAGCACAGCACAGGTTTTGTTTTACTTTTGGAGGGGAGGTTTGATTGTGTCTACATGCTAAT > > I came up with a 50% BioPerl solution using Bio::SeqIO and Bio::Tools::IUPAC but the final printing of the fasta is dome 'manually'. 
> It's a bit hacky but I'm particularly proud of the obscurity I managed in my switch statement :-) > > ############################### > #!perl -w > > use Bio::SeqIO; > use Bio::Tools::IUPAC; > use Switch; > > my $seq_in = Bio::SeqIO->new(-file=>$ARGV[0], -format=>"fasta" ) or die $!; > > while (my $seqobj = $seq_in->next_seq) { > my $seq = sprintf ">%s\n", $seqobj->display_id; > my $iupac_seq = new Bio::Tools::IUPAC(-seq => $seqobj); > foreach (@{$iupac_seq->{_alpha}}){ > switch($#{@{$_}}){ > case 0{$seq.= @{$_}[0]} > case 1{$seq .= sprintf "[%s]",join("/",@{$_})} > else {$seq .= 'N'} > } > } > print "$seq\n"; > } > ####################### > > :-) > > --Russell > > > Russell Smithies > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > Toitu te whenua, Toitu te tangata > Sustain the land, Sustain the people > > > > > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From markus.liebscher at gmx.de Mon Mar 16 07:29:14 2009
From: markus.liebscher at gmx.de (manni122)
Date: Mon, 16 Mar 2009 04:29:14 -0700 (PDT)
Subject: [Bioperl-l] Problems with a while loop, please help
Message-ID: <22536301.post@talk.nabble.com>

Hi there, I need a little bit of help with some simple programming. I have two arrays in which codons are stored, and I need to compare them. If I find the same codon at the same position in both arrays, everything is fine; if not, I split each codon further into its bases.

@zwvalue1 = split(//, $value1);
@zwvalue2 = split(//, $value2);

And then I want to compare base 1 from array 1 with base 1 from array 2 and so on. But if I look at the values from the following while loop

my $k = 0;

while ($k <= 2) {
    $value1a = @zwvalue1[$k];
    $value1b = @zwvalue2[$k];
    $value2a = @zwvalue1[$k+1];
    $value2b = @zwvalue2[$k+1];
    $value3a = @zwvalue1[$k+2];
    $value3b = @zwvalue2[$k+2];

    print "@zwvalue1, $value1a, $value2a, $value3a\n";
}
continue {
    $k++;

I get as output for example:
T A A, T, A, A
T A A, A, A,
T A A, A, ,

Why is this loop running 3 times? I just need it to run once, to compare all three bases at one time.
Any help is appreciated.
--
View this message in context: http://www.nabble.com/Problems-with-a-while-loop%2C-please-help-tp22536301p22536301.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
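[Editorial sketch, not part of the original post.] The base-by-base comparison described above can be written so that each of the three positions is visited exactly once. This is a minimal sketch assuming each codon is a three-letter string; the helper name differing_positions and the sample codons are hypothetical and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based positions at which two codons differ.
sub differing_positions {
    my ($codon1, $codon2) = @_;
    my @bases1 = split //, $codon1;
    my @bases2 = split //, $codon2;
    # Each index 0, 1, 2 is examined once; no manual counter is needed.
    return grep { $bases1[$_] ne $bases2[$_] } 0 .. 2;
}

my @diff = differing_positions('TAA', 'TGA');
print "codons differ at position(s): @diff\n";
```

Single array elements are accessed as $bases1[$_] rather than @bases1[$_], which also avoids the "Scalar value ... better written as" warning that 'use warnings' reports for the slice form.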
From David.Messina at sbc.su.se Mon Mar 16 08:16:43 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 16 Mar 2009 13:16:43 +0100
Subject: [Bioperl-l] Problems with a while loop, please help
In-Reply-To: <22536301.post@talk.nabble.com>
References: <22536301.post@talk.nabble.com>
Message-ID: <628aabb70903160516re20e5d6n80424e471f2e2d2b@mail.gmail.com>

> Why is this loop running 3times ?

You designed the loop to execute three times with your while:

while ($k <= 2) {

If $k=0 initially, and you add 1 to k each time through the loop, how many times will the above statement be true?

By explicitly writing $k+1 and $k+2 within the loop, you're accomplishing the same thing that the loop is designed to do, namely accessing each individual element of the arrays.

The questions you are asking are about fundamental aspects of Perl programming and are not specific to BioPerl or even biological questions answered with Perl. This will sound harsh, but you really need to spend some time learning how to program in Perl rather than asking questions on this mailing list.

Start by turning on warnings. If your program had the line

use warnings;

near the top of the program, you would see the following output:

Scalar value @zwvalue1[$k] better written as $zwvalue1[$k] at test.pl line 16.
Scalar value @zwvalue2[$k] better written as $zwvalue2[$k] at test.pl line 17.
Scalar value @zwvalue1[$k+1] better written as $zwvalue1[$k+1] at test.pl line 18.
Scalar value @zwvalue2[$k+1] better written as $zwvalue2[$k+1] at test.pl line 19.
Scalar value @zwvalue1[$k+2] better written as $zwvalue1[$k+2] at test.pl line 20.
Scalar value @zwvalue2[$k+2] better written as $zwvalue2[$k+2] at test.pl line 21.
Name "main::value1b" used only once: possible typo at test.pl line 17.
Name "main::value2b" used only once: possible typo at test.pl line 19.
Name "main::value3b" used only once: possible typo at test.pl line 21.
T A A, T, A, A
Use of uninitialized value in concatenation (.)
or string at test.pl line 23.
T A A, A, A,
Use of uninitialized value in concatenation (.) or string at test.pl line 23.
Use of uninitialized value in concatenation (.) or string at test.pl line 23.
T A A, A, ,

All of those warning messages are clues to some of the problems in your code.

Dave

From maj at fortinbras.us Mon Mar 16 07:55:36 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 16 Mar 2009 07:55:36 -0400
Subject: [Bioperl-l] Problems with a while loop, please help
In-Reply-To: <22536301.post@talk.nabble.com>
References: <22536301.post@talk.nabble.com>
Message-ID: <9E9894F070A44DB3B7C268BC31364B69@NewLife>

Markus:
I think you want to increment $k by 3 in the continue block; i.e., $k+=3, not $k++
cheers
MAJ

----- Original Message -----
From: "manni122"
To:
Sent: Monday, March 16, 2009 7:29 AM
Subject: [Bioperl-l] Problems with a while loop, please help

>
> Hi there, I need a little bit help in simple programming. I have two arrays
> in which codons are stored. These arrays I need to compare. If I find
> similar codons at the same position in both arrays everything is fine, if
> not I am splitting every codon further into the bases.
>
> @zwvalue1 = split(//, $value1);
> @zwvalue2 = split(//, $value2);
>
> And then I want to compare base 1 from array 1 with base 1 from array 2 and
> so on. But if I am looking at the values from the following while loop
>
> my $k = 0;
>
> while ($k <= 2) {
> $value1a = @zwvalue1[$k];
> $value1b = @zwvalue2[$k];
> $value2a = @zwvalue1[$k+1];
> $value2b = @zwvalue2[$k+1];
> $value3a = @zwvalue1[$k+2];
> $value3b = @zwvalue2[$k+2];
>
> print "@zwvalue1, $value1a, $value2a, $value3a\n";
> }
> continue {
> $k++;
>
> I get as output for example:
> T A A, T, A, A
> T A A, A, A,
> T A A, A, ,
>
> Why is this loop running 3times? I just need this loop running once to
> compare all three bases at one time.
> Any help is appreciated.
> -- > View this message in context: > http://www.nabble.com/Problems-with-a-while-loop%2C-please-help-tp22536301p22536301.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Mar 16 08:48:18 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Mar 2009 07:48:18 -0500 Subject: [Bioperl-l] reformating dna sequence containing ambiguity symbols In-Reply-To: <716af09c0903160257v14e9d424kf2232cf876b581b4@mail.gmail.com> References: <18DF7D20DFEC044098A1062202F5FFF321BD22E951@exchsth.agresearch.co.nz> <716af09c0903160257v14e9d424kf2232cf876b581b4@mail.gmail.com> Message-ID: I wouldn't rely completely on that functionality for anything in 1.7 and beyond (and that may include trunk). It's very likely that particular global will change to an instance variable in the future to deal with seqs coming from same alphabet but different symbol tables; it causes all sorts of subtle scoping problems within code. See here for some reasoning re: that and other Bio::Align issues (the page is a stub for now): http://www.bioperl.org/wiki/Align_Refactor chris On Mar 16, 2009, at 4:57 AM, Bernd Web wrote: > Hi Russel, > > Sometimes I have a non-standard symbol too (though these relate to > residue letters generally and do not change sequence length). I add > these letters to the variable used for matching: > > $Bio::PrimarySeq::MATCHPATTERN = 'A-Za-z\-\.\*\?'; > > > Bernd > > On Mon, Mar 16, 2009 at 4:30 AM, Smithies, Russell > wrote: >> I've just been reformatting some fasta from dbSNP that contains >> ambiguity symbols and had to come up with a non-standard solution >> as I needed to turn validation off in Bio::Seq but couldn't see an >> obvious way to do it. 
>> >> dbSNP format the fasta for their rs* SNPs like this: >> >>> gnl|dbSNP|rs29025902 rs=29025902|pos=251|len=501|taxid=9913| >>> mol="genomic"|class=1|alleles="A/G"|build=125 >> AGGCTACCAA TAGGACATCA CTGACTGTGA GGCTGGGAAG AAAGACCGAG AAGCACCCCA >> GGACACAGAA TCTCCTTCAC >> ATACAGAGGC AGTGGACACA TAGAGTACAG GCAGCGGTAA AATGGAGTAA AAAATTAGAT >> AGTGGTGAGT TTTGAGTATG >> AATTGCCTTT GTTTTTAAAT TAGTTCTAAG TTTATAAGAC AAGTTTTATT TTTTTATTTT >> ATTTATTTGC CACATGGCTT >> GTGGGTTTGC >> R >> GGATCTTAGT TCCCTGACCA GGGATTGAAC CTGTGCCCTC AGCAGTGAAA ACATGGAGTC >> CTAACAACTG GACCACCAGG >> GAATTCCCTA TATGACTTAA TTTTTAATAA TATTTGTAGC TAACAATTGA CATGCAGAGT >> TCCTGAAGAT TTAAGCATGG >> GCTCCCATGA ACCAGTATGA ACCAGCTCCA GCACAGCACA GGTTTTGTTT TACTTTTGGA >> GGGGAGGTTT GATTGTGTCT >> ACATGCTAAT >> >> But I needed it (for the Sequenom) with the ambiguity symbol on >> brackets like this: >> >>> gnl|dbSNP|rs29025902 >> AGGCTACCAATAGGACATCACTGACTGTGAGGCTGGGAAGAAAGACCGAGAAGCACCCCAGGACACAGAATCTCCTTCACATACAGAGGCAGTGGACACATAGAGTACAGGCAGCGGTAAAATGGAGTAAAAAATTAGATAGTGGTGAGTTTTGAGTATGAATTGCCTTTGTTTTTAAATTAGTTCTAAGTTTATAAGACAAGTTTTATTTTTTTATTTTATTTATTTGCCACATGGCTTGTGGGTTTGC[A/G]GGATCTTAGTTCCCTGACCAGGGATTGAACCTGTGCCCTCAGCAGTGAAAACATGGAGTCCTAACAACTGGACCACCAGGGAATTCCCTATATGACTTAATTTTTAATAATATTTGTAGCTAACAATTGACATGCAGAGTTCCTGAAGATTTAAGCATGGGCTCCCATGAACCAGTATGAACCAGCTCCAGCACAGCACAGGTTTTGTTTTACTTTTGGAGGGGAGGTTTGATTGTGTCTACATGCTAAT >> >> I came up with a 50% BioPerl solution using Bio::SeqIO and >> Bio::Tools::IUPAC but the final printing of the fasta is dome >> 'manually'. 
>> It's a bit hacky but I'm particularly proud of the obscurity I >> managed in my switch statement :-) >> >> ############################### >> #!perl -w >> >> use Bio::SeqIO; >> use Bio::Tools::IUPAC; >> use Switch; >> >> my $seq_in = Bio::SeqIO->new(-file=>$ARGV[0], -format=>"fasta" ) or >> die $!; >> >> while (my $seqobj = $seq_in->next_seq) { >> my $seq = sprintf ">%s\n", $seqobj->display_id; >> my $iupac_seq = new Bio::Tools::IUPAC(-seq => $seqobj); >> foreach (@{$iupac_seq->{_alpha}}){ >> switch($#{@{$_}}){ >> case 0{$seq.= @{$_}[0]} >> case 1{$seq .= sprintf >> "[%s]",join("/",@{$_})} >> else {$seq .= 'N'} >> } >> } >> print "$seq\n"; >> } >> ####################### >> >> :-) >> >> --Russell >> >> >> Russell Smithies >> Bioinformatics Applications Developer >> T +64 3 489 9085 >> E russell.smithies at agresearch.co.nz >> Invermay Research Centre >> Puddle Alley, >> Mosgiel, >> New Zealand >> T +64 3 489 3809 >> F +64 3 489 9174 >> www.agresearch.co.nz >> >> Toitu te whenua, Toitu te tangata >> Sustain the land, Sustain the people >> >> >> >> >> >> >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. 
>> =======================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Mon Mar 16 08:56:27 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 16 Mar 2009 08:56:27 -0400
Subject: [Bioperl-l] Resources for Perl newbies
Message-ID: <5CFEB599D11A4D5697D6A5825274E7EE@NewLife>

Hi All-

I thought that I would run down a couple of nice resources full of friendly patient folk for biologists new to Perl. While many of us are happy to chime in on Perl-specific questions, this list is mainly for questions that relate directly to BioPerl modules, or difficulties/ideas for applying those modules to specific tasks. Since BioPerl is written at a high level in sometimes terse code, the new perler jumping in is apt to stumble. No problem--just have a look at some of the following:

http://learn.perl.org/faq/beginners.html - here you can find excellent links, plus directions for signing up to http://www.nntp.perl.org/group/perl.beginners/, a newsgroup devoted to questions at all levels

Extremely sane, erudite, helpful, and patient responses can be had at http://www.perlmonks.org/ where there are perl resources for all user levels. When I google an error message, and see a perlmonks link, that's the one I click first (unless Hilmar appears in the links, of course).

I've found that the best way to learn any language and make it stay learned is to work through a smarter/more experienced person's code. There are plenty of opportunities for that in the BioPerl distributions. Wondering about that error?
Read the error message, which contains the line number in the module that threw it, then open that module and dig in!

Another great resource for seeing how the developers expect their modules to work are the regression tests, found in the ../t directories of the BioPerl distribution directory (find it at the Subversion repository: http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t.) These are just Perl programs; nothing scary about them. Another repos directory of interest is "examples": http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/examples where many many examples of BioPerl module use reside.

And don't forget the many introductory resources on the BioPerl wiki (http://www.bioperl.org/wiki), including the HOWTOs (http://www.bioperl.org/wiki/HOWTO) and the Scrapbook (http://www.bioperl.org/wiki/Category:Scrapbook). There are many short and simple code snippets in these places.

Happy Coding-
Mark

From dan.bolser at gmail.com Mon Mar 16 09:03:53 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 16 Mar 2009 13:03:53 +0000
Subject: [Bioperl-l] How to run this "bp_search2gff.pl"
In-Reply-To: <20090305112844.M50039@nrcpb.org>
References: <20090305112844.M50039@nrcpb.org>
Message-ID: <2c8757af0903160603r66ac7d0coeda072b69a43b0cf@mail.gmail.com>

2009/3/5 ajay :
> Respected sir,
>
>
> good morning!
sir my self ajay a resercher in the bioinformatics ,sir please > guide me how to run this bioperl scripts > > sir i have compleated the local BLAST P serch ?of 255 fasta files and the > blast result ?was saved in the (.txt) format ?sir ?i need to convert the > blast result to GFF3 format using this script "bp_search2gff.pl" > > > but i can not understand how to ?input my BLAST result file and get the > output file in the ?GFF3 format > > so please tell me ?the steps ?to input the blast result file > > what changes i have to made in the program If you run "bp_search2gff.pl -h" you get the following output, which should help answer your question: SYNOPSIS Usage: search2gff [-o outputfile] [-f reportformat] [-i inputfilename] OR file1 file2 .. DESCRIPTION This script will turn a protein Search report (BLASTP, FASTP, SSEARCH, AXT, WABA) into a GFF File. The options are: -i infilename - (optional) inputfilename, will read either ARGV files or from STDIN -o filename - the output filename [default STDOUT] -f format - search result format (blast, fasta,waba,axt) (ssearch is fasta format). default is blast. 
    -t/--type seqtype   - if you want to see query or hit information in the GFF report
    -s/--source         - specify the source (will be algorithm name otherwise like BLASTN)
    --method            - the method tag (primary_tag) of the features (default is similarity)
    --scorefunc         - a string or a file that when parsed evaluates to a closure which
                          will be passed a feature object and that returns the score to be printed
    --locfunc           - a string or a file that when parsed evaluates to a closure which
                          will be passed two features, query and hit, and returns the location
                          (Bio::LocationI compliant) for the GFF3 feature created for each HSP;
                          the closure may use the clone_loc() and create_loc() functions for
                          convenience, see their PODs
    --onehsp            - only print the first HSP feature for each hit
    -p/--parent         - the parent to which HSP features should refer if not the name of
                          the hit or query (depending on --type)
    --target/--notarget - whether to always add the Target tag or not
    -h                  - this help menu
    --version           - GFF version to use (put a 3 here to use gff 3)
    --component         - generate GFF component fields (chromosome)
    -m/--match          - generate a 'match' line which is a container of all the similarity HSPs
    --addid             - add ID tag in the absence of --match
    -c/--cutoff         - specify an evalue cutoff

    Additionally specify the filenames you want to process on the command-line. If no files are specified then STDIN input is assumed. You specify this by doing: search2gff < file1 file2 file3

AUTHOR
    Jason Stajich, jason-at-bioperl-dot-org

Contributors
    Hilmar Lapp, hlapp-at-gmx-dot-net

clone_loc
    Title   : clone_loc
    Usage   : my $l = clone_loc($feature->location);
    Function: Helper function to simplify the task of cloning locations for --locfunc closures. Presently simply implemented using Storable::dclone().
    Example :
    Returns : A L object of the same type and with the same properties as the argument, but physically different. All structured properties will be cloned as well.
Args : A L compliant object create_loc Title : create_loc Usage : my $l = create_loc("10..12"); Function: Helper function to simplify the task of creating locations for --locfunc closures. Creates a location from a feature- table formatted string. Example : Returns : A L object representing the location given as formatted string. Args : A GenBank feature-table formatted string. perl v5.8.8 2009-01-14 BP_SEARCH2GFF(1) Dan. > sir plesae help me > i sahll be very thankfull to you > > > waiting for your kind responce > > sincerly your!s > Ajay > > > > > > Ajay Kumar Mahato > (Research Associate) > National Research Centre for Plant Biotechnology (IARI) > (http://www.nrcpb.org) > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From isabelle.phan at sbri.org Mon Mar 16 09:39:29 2009 From: isabelle.phan at sbri.org (Isabelle Phan) Date: Mon, 16 Mar 2009 06:39:29 -0700 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 In-Reply-To: <71F9ACDB-B249-40A0-866C-7F574D329F4B@illinois.edu> Message-ID: Hello Hilmar Many thanks for your help. I executed the gbrowse installer (which deals with installing bioperl) with 'sudo'. However, I explicitely set --install_param_str and --build_param_str To a directory in my home directory, e.g. I didn't want to install the libraries in the default locations, so the package can be wiped out without breaking my system. Best greetings Isabelle On 3/15/09 11:27 AM, "Chris Fields" wrote: > On Mar 15, 2009, at 8:27 AM, Hilmar Lapp wrote: > >> Just forwarding this here in case someone has a moment to check >> whether there might be anything here that can be improved from the >> BioPerl side of things. (Note that in Isabelle's case she wouldn't >> need to install GBrowse and BioPerl in concert.) 
>> >> -hilmar >> >> Begin forwarded message: >> >> From: iphan >> Date: March 15, 2009 12:29:10 AM EDT >> To: "gmod-ajax at lists.sourceforge.net" > ajax at lists.sourceforge.net> >> Subject: [Gmod-ajax] Jbrowse on mac os 10.5 >> >> Hello >> >> >> I've managed to install Jbrowse on my linux box, but somehow the >> tricks that >> worked there don't seem to apply to the Mac. What are the minimal >> set of >> Gbrowse libraries required for Jbrowse? >> >> Installing GD with Fink was the only step that worked (see big rant >> below). >> >> Is there a pre-packaged Gbrowse perl library that I can just >> download and >> put in my PERL5LIB path to get Jbrowse running? >> >> That would really make my day! >> >> Thanks, >> >> Isabelle >> >> >> >> I've spent several hours trying to install Gbrowse on my mac >> (10.5.6) and >> can't get it to go past the blasted cpan nightmare. Sorry, had to >> let my >> frustration out. I really hate cpan with a vengance, and it probably >> knows >> it ;-) > > On Mac OS X one has to use 'sudo' for installation of anything from > CPAN unless installing to a location the user has read-write privs to, > like a local directory. You can set up BioPerl by just downloading > it, unpacking the tarball, and adding it to PERL5LIB (that's how I run > it). > >> Errors are inconsistent: cpan can't install either LWP, or Harness, >> or GD, >> or Compress::Zlib. I've first tried to execute the Gbrowse installer >> without >> any options. It crashes with: >> >> Force getting a BioPerl nightly build; the most recent release is >> too old >> >> *** Installing BioPerl *** >> Downloading bioperl-live... 
>> Can't locate LWP.pm in @INC (@INC contains: >> /System/Library/Perl/5.8.8/darwin-thread-multi-2level >> /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread- >> multi-2level >> /Library/Perl/5.8.8 /Library/Perl >> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level >> /Network/Library/Perl/5.8.8 /Network/Library/Perl >> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level >> /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/ >> 5.8.1) >> at /System/Library/Perl/Extras/5.8.8/LWP/Simple.pm line 37. >> >> I tried to install LWP from cpan and get: >> Can't make directory /Users/iphan/.cpan/build/YAML-0.68 read >> +writeable: >> Operation not permitted at /System/Library/Perl/5.8.8/CPAN.pm line 966 > > Appears to be the same issue described above (need to use 'sudo'). > >> I understand that error means I shouldn't have run the gbrowse >> installer >> script with 'sudo', but since it won't run otherwise, what the hell >> am I >> supposed to do??! Manually do a chmod of .cpan after each installer >> crash? >> >> I've tried pre-installing Bioperl 7 times, from the tarball, from >> cpan, >> from cvs, version 1.6.0, 1.5.x version, dev version (guess that's >> the same >> as downloading from cvs?), bioperl-live. None of them seem to play >> with the >> Gbrowse perl installer: >> >> *** (back in Bioperl Build.PL) *** >> Cannot chdir() back to /Users/iphan/.cpan/build/BioPerl-1.6.0: No >> such file >> or directory at Bio/Root/Build.pm line 461. >> Couldn't run Build.PL: >> /System/Library/Perl/Extras/5.8.8/Module/Build/Compat.pm line 200. >> Running make test >> Make had some problems, maybe interrupted? Won't test >> Running make install >> Make had some problems, maybe interrupted? Won't install >> >> > I may test this out and let you know how it goes; will report back > what I find. 
>
> chris
>

From jay at jays.net Mon Mar 16 09:40:54 2009
From: jay at jays.net (Jay Hannah)
Date: Mon, 16 Mar 2009 08:40:54 -0500
Subject: [Bioperl-l] Resources for Perl newbies
In-Reply-To: <5CFEB599D11A4D5697D6A5825274E7EE@NewLife>
References: <5CFEB599D11A4D5697D6A5825274E7EE@NewLife>
Message-ID: <49BE56E6.4050303@jays.net>

Mark A. Jensen wrote:
> I thought that I would run down a couple of nice resources
> full of friendly patient folk for biologists new to Perl.

You may also find IRC helpful:

http://www.bioperl.org/wiki/IRC

It's very quiet. Welcoming new lurkers of all skill levels. :)

Cheers,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From cjfields at illinois.edu Mon Mar 16 10:27:30 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Mar 2009 09:27:30 -0500
Subject: [Bioperl-l] Resources for Perl newbies
In-Reply-To: <5CFEB599D11A4D5697D6A5825274E7EE@NewLife>
References: <5CFEB599D11A4D5697D6A5825274E7EE@NewLife>
Message-ID: <08CCB4A7-30A5-4BBF-B9DD-968286D24C3C@illinois.edu>

Don't forget the very useful (shout out to Dave M!) Deobfuscator:

http://www.bioperl.org/wiki/Deobfuscator

chris

On Mar 16, 2009, at 7:56 AM, Mark A. Jensen wrote:

> Hi All-
>
> I thought that I would run down a couple of nice resources
> full of friendly patient folk for biologists new to Perl. While
> many of us are happy to chime in on Perl-specific questions,
> this list is mainly for questions that relate directly to
> BioPerl modules, or difficulties/ideas for applying those
> modules to specific tasks. Since BioPerl is written at a
> high level in sometimes terse code, the new perler jumping
> in is apt to stumble.
No problem--just have a look at some > of the following: > > http://learn.perl.org/faq/beginners.html > - here you can find excellent links, plus directions for signing up > to > http://www.nntp.perl.org/group/perl.beginners/, > a newsgroup devoted to questions at all levels > > Extremely sane, erudite, helpful, and patient responses can be had at > > http://www.perlmonks.org/ > > where there are perl resources for all user levels. When I > google an error message, and see a perlmonks link, that's > the one I click first (unless Hilmar appears in the links, of > course). > > I've found that the best way to learn any language and > make it stay learned is to work through a smarter/more > experienced person's code. There are plenty of opportunities > for that in the BioPerl distributions. Wondering about that > error? Read the error message, which contains the line number > in the module that threw it, then open that module and dig in! > Another great resource for seeing how the developers expect > their modules to work are the regression tests, found in the > ../t directories of the BioPerl distribution directory (find it > at the Subversion repository: > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t.) > These are just Perl programs; nothing scary about them. > Another repos directory of interest is "examples": > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/examples > where many many examples of BioPerl module use reside. > > And don't forget the many introductory resources on the > BioPerl wiki (http://www.bioperl.org/wiki), including the > HOWTOs (http://www.bioperl.org/wiki/HOWTO) and > the Scrapbook (http://www.bioperl.org/wiki/Category:Scrapbook). > There are many short and simple code snippets in these places. 
> > Happy Coding- > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Mar 16 10:35:34 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 16 Mar 2009 07:35:34 -0700 Subject: [Bioperl-l] Fwd: How to run this "bp_search2gff.pl" References: <2c8757af0903160605y7c683fdey36ba1341007f3706@mail.gmail.com> Message-ID: <4BEC0745-5D16-4180-9F28-F4ECA465FC92@bioperl.org> redirecting message again. Begin forwarded message: > From: Dan Bolser > Date: March 16, 2009 6:05:19 AM PDT > To: jason at bioperl.org > Subject: Fwd: [Bioperl-l] How to run this "bp_search2gff.pl" > > Ajay? > > > ---------- Forwarded message ---------- > From: Dan Bolser > Date: 2009/3/16 > Subject: Re: [Bioperl-l] How to run this "bp_search2gff.pl" > To: bioperl-l at lists.open-bio.org > > > 2009/3/5 ajay : >> Respected sir, >> >> >> good morning! sir my self ajay a resercher in the >> bioinformatics ,sir please >> guide me how to run this bioperl scripts >> >> sir i have compleated the local BLAST P serch of 255 fasta files >> and the >> blast result was saved in the (.txt) format sir i need to >> convert the >> blast result to GFF3 format using this script "bp_search2gff.pl" >> >> >> but i can not understand how to input my BLAST result file and get >> the >> output file in the GFF3 format >> >> so please tell me the steps to input the blast result file >> >> what changes i have to made in the program > > If you run "bp_search2gff.pl -h" you get the following output, which > should help answer your question: > > SYNOPSIS > Usage: > search2gff [-o outputfile] [-f reportformat] [-i > inputfilename] OR > file1 file2 .. > > DESCRIPTION > This script will turn a protein Search report (BLASTP, FASTP, > SSEARCH, > AXT, WABA) into a GFF File. 
> > The options are: > > -i infilename - (optional) inputfilename, will read > either ARGV files or from STDIN > -o filename - the output filename [default STDOUT] > -f format - search result format (blast, > fasta,waba,axt) > (ssearch is fasta format). default is > blast. > -t/--type seqtype - if you want to see query or hit > information > in the GFF report > -s/--source - specify the source (will be algorithm > name > otherwise like BLASTN) > --method - the method tag (primary_tag) of the > features > (default is similarity) > --scorefunc - a string or a file that when parsed > evaluates > to a closure which will be passed a > feature > object and that returns the score to > be printed > --locfunc - a string or a file that when parsed > evaluates > to a closure which will be passed two > features, query and hit, and returns the > location (Bio::LocationI compliant) > for the > GFF3 feature created for each HSP; the > closure > may use the clone_loc() and create_loc() > functions for convenience, see their > PODs > --onehsp - only print the first HSP feature for > each hit > -p/--parent - the parent to which HSP features > should refer > if not the name of the hit or query > (depending > on --type) > --target/--notarget - whether to always add the Target tag > or not > -h - this help menu > --version - GFF version to use (put a 3 here to > use gff 3) > --component - generate GFF component fields > (chromosome) > -m/--match - generate a ?match? line which is a > container > of all the similarity HSPs > --addid - add ID tag in the absence of --match > -c/--cutoff - specify an evalue cutoff > > Additionally specify the filenames you want to process on the > com- > mand-line. If no files are specified then STDIN input is > assumed. 
You > specify this by doing: search2gff < file1 file2 file3 > > AUTHOR > Jason Stajich, jason-at-bioperl-dot-org > > Contributors > Hilmar Lapp, hlapp-at-gmx-dot-net > > clone_loc > > Title : clone_loc > Usage : my $l = clone_loc($feature->location); > Function: Helper function to simplify the task of cloning > locations > for --locfunc closures. > > Presently simply implemented using > Storable::dclone(). > Example : > Returns : A L object of the same type and > with the > same properties as the argument, but physically > different. > All structured properties will be cloned as well. > Args : A L compliant object > > create_loc > > Title : create_loc > Usage : my $l = create_loc("10..12"); > Function: Helper function to simplify the task of creating > locations > for --locfunc closures. Creates a location from a > feature- > table formatted string. > > Example : > Returns : A L object representing the > location given > as formatted string. > Args : A GenBank feature-table formatted string. > > perl v5.8.8 2009-01-14 > BP_SEARCH2GFF(1) > > > Dan. > >> sir plesae help me >> i sahll be very thankfull to you >> >> >> waiting for your kind responce >> >> sincerly your!s >> Ajay >> >> >> >> >> >> Ajay Kumar Mahato >> (Research Associate) >> National Research Centre for Plant Biotechnology (IARI) >> (http://www.nrcpb.org) >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> Jason Stajich jason at bioperl.org From maj at fortinbras.us Mon Mar 16 10:46:16 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 16 Mar 2009 10:46:16 -0400 Subject: [Bioperl-l] Resources for Perl newbies In-Reply-To: <08CCB4A7-30A5-4BBF-B9DD-968286D24C3C@illinois.edu> References: <5CFEB599D11A4D5697D6A5825274E7EE@NewLife> <08CCB4A7-30A5-4BBF-B9DD-968286D24C3C@illinois.edu> Message-ID: Hey All- Have collated this thread in a refactor of http://www.bioperl.org/wiki/Getting_Started under http://www.bioperl.org/wiki/Getting_Started#For_Perl_newbies cheers- MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: Sent: Monday, March 16, 2009 10:27 AM Subject: Re: [Bioperl-l] Resources for Perl newbies > Don't forget the very useful (shout out to Dave M!) Deobfuscator: > > http://www.bioperl.org/wiki/Deobfuscator > > chris > > On Mar 16, 2009, at 7:56 AM, Mark A. Jensen wrote: > >> Hi All- >> >> I thought that I would run down a couple of nice resources >> full of friendly patient folk for biologists new to Perl. While >> many of us are happy to chime in on Perl-specific questions, >> this list is mainly for questions that relate directly to >> BioPerl modules, or difficulties/ideas for applying those >> modules to specific tasks. Since BioPerl is written at a >> high level in sometimes terse code, the new perler jumping >> in is apt to stumble. No problem--just have a look at some >> of the following: >> >> http://learn.perl.org/faq/beginners.html >> - here you can find excellent links, plus directions for signing up >> to >> http://www.nntp.perl.org/group/perl.beginners/, >> a newsgroup devoted to questions at all levels >> >> Extremely sane, erudite, helpful, and patient responses can be had at >> >> http://www.perlmonks.org/ >> >> where there are perl resources for all user levels. When I >> google an error message, and see a perlmonks link, that's >> the one I click first (unless Hilmar appears in the links, of >> course). 
>> >> I've found that the best way to learn any language and >> make it stay learned is to work through a smarter/more >> experienced person's code. There are plenty of opportunities >> for that in the BioPerl distributions. Wondering about that >> error? Read the error message, which contains the line number >> in the module that threw it, then open that module and dig in! >> Another great resource for seeing how the developers expect >> their modules to work are the regression tests, found in the >> ../t directories of the BioPerl distribution directory (find it >> at the Subversion repository: >> http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t.) >> These are just Perl programs; nothing scary about them. >> Another repos directory of interest is "examples": >> http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/examples >> where many many examples of BioPerl module use reside. >> >> And don't forget the many introductory resources on the >> BioPerl wiki (http://www.bioperl.org/wiki), including the >> HOWTOs (http://www.bioperl.org/wiki/HOWTO) and >> the Scrapbook (http://www.bioperl.org/wiki/Category:Scrapbook). >> There are many short and simple code snippets in these places. 
>>
>> Happy Coding-
>> Mark
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From jason at bioperl.org Mon Mar 16 11:34:14 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 16 Mar 2009 08:34:14 -0700
Subject: [Bioperl-l] BLAST + BioPerl
In-Reply-To: <689750f30903160751wa2ed84fie90676fff15ba2eb@mail.gmail.com>
References: <689750f30903160751wa2ed84fie90676fff15ba2eb@mail.gmail.com>
Message-ID: <1DD9F3B1-D899-4D2E-A325-91599B731725@bioperl.org>

I think it is best if you ask this question on the mailing list, but
have a look at the script search_overview.PLS, which was previously in
the scripts/biographics directory and is now in the Bio-Graphics/scripts
directory on sourceforge.

Examples like this are also the bread and butter of the Bio-Graphics
HOWTO on the bioperl website.

-jason

On Mar 16, 2009, at 7:51 AM, Christian M. Probst wrote:

> Hi, Jason.
>
> I have found your presentation about BioPerl I and II (CSHL
> Programming for Biology 2008) when I was looking for a way to format
> BLAST plain text output.
>
> I am looking specifically on creating the graphical view of HSPs.
> Although your code in that presentation does not create this, it has
> an example, mentioning Bio::Graphics as the way to do that (as it
> would be my first guess).
>
> My question is: how can I create a panel within the HTML page, showing
> the HSPs from a BLAST standalone txt result file? Is there a
> tutorial/file/presentation discussing this?
>
> Thanks in advance.
>
> Christian Probst
> Bioinformatics Lab.
> Carlos Chagas Institute
> Curitiba - PR Brasil

Jason Stajich
jason at bioperl.org

From lincoln.stein at gmail.com Mon Mar 16 14:44:30 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 16 Mar 2009 14:44:30 -0400
Subject: [Bioperl-l] Problems with a while loop, please help
In-Reply-To: <22536301.post@talk.nabble.com>
References: <22536301.post@talk.nabble.com>
Message-ID: <6dce9a0b0903161144w76adc79cw1df378e5aa440ea6@mail.gmail.com>

One problem is that this notation:

  $value2a = @zwvalue1[$k+1];

does not do what you expect. You want this:

  $value2a = $zwvalue1[$k+1];

Another problem is that you are running the loop three times: k=0, k=1,
k=2. The while loop only stops when k is no longer <= 2.

Lincoln

On Mon, Mar 16, 2009 at 7:29 AM, manni122 wrote:

>
> Hi there, I need a little bit help in simple programming. I have two arrays
> in which codons are stored. These arrays I need to compare. If I find
> similar codons at the same position in both arrays everything is fine, if
> not I am splitting every codon further into the bases.
>
> @zwvalue1 = split(//, $value1);
> @zwvalue2 = split(//, $value2);
>
> And then I want to compare base 1 from array 1 with base 1 from array 2 and
> so on. But if I am looking at the values from the following while loop
>
> my $k = 0;
>
> while ($k <= 2) {
>     $value1a = @zwvalue1[$k];
>     $value1b = @zwvalue2[$k];
>     $value2a = @zwvalue1[$k+1];
>     $value2b = @zwvalue2[$k+1];
>     $value3a = @zwvalue1[$k+2];
>     $value3b = @zwvalue2[$k+2];
>
>     print "@zwvalue1, $value1a, $value2a, $value3a\n";
> }
> continue {
>     $k++;
>
> I get as output for example:
> T A A, T, A, A
> T A A, A, A,
> T A A, A, ,
>
> Why is this loop running 3times? I just need this loop running once to
> compare all three bases at one time.
> Any help is appreciated.
> --
> View this message in context:
> http://www.nabble.com/Problems-with-a-while-loop%2C-please-help-tp22536301p22536301.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
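
A minimal sketch of the two points made in this reply, not code from the thread itself. It assumes both codons are three-letter strings, and the helper name codon_differences is made up for illustration. A single element is read with the scalar sigil ($bases1[$k]), and one pass over positions 0..2 replaces the while loop that advanced $k three times:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): compare two three-letter
# codons base by base and return the 0-based positions where they differ.
sub codon_differences {
    my ($codon1, $codon2) = @_;
    my @bases1 = split //, $codon1;
    my @bases2 = split //, $codon2;

    my @diffs;
    # One pass over the three positions; no outer loop that re-reads
    # $k, $k+1 and $k+2 on every iteration.
    for my $k (0 .. 2) {
        # $bases1[$k] fetches one element; @bases1[$k] would be a
        # one-element array slice, which is what the warning is about.
        push @diffs, $k if $bases1[$k] ne $bases2[$k];
    }
    return @diffs;
}

my @diffs = codon_differences('TAA', 'TGA');
print "differing positions: @diffs\n";
```

Called once per mismatching codon pair, this returns an empty list when the codons are identical, so the caller can treat "no differences" and "some differences" uniformly.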
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From Russell.Smithies at agresearch.co.nz Mon Mar 16 15:29:10 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 17 Mar 2009 08:29:10 +1300
Subject: [Bioperl-l] reformating dna sequence containing ambiguity symbols
In-Reply-To: <1a0c1b750903160304j32280202sfff4a3b5db834e72@mail.gmail.com>
References: <18DF7D20DFEC044098A1062202F5FFF321BD22E951@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF321BD22E970@exchsth.agresearch.co.nz>
 <1a0c1b750903160304j32280202sfff4a3b5db834e72@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BD22E9C7@exchsth.agresearch.co.nz>

Thanks Bruno,
I hadn't looked at Bio::Tools::SeqPattern before but will take a closer look.

--Russell

From: Bruno Vecchi [mailto:vecchi.b at gmail.com]
Sent: Monday, 16 March 2009 11:05 p.m.
To: Smithies, Russell
Subject: Re: [Bioperl-l] reformating dna sequence containing ambiguity symbols

Please correct me if I didn't understand your problem correctly, but I
think that the solution to your problem is Bio::Tools::SeqPattern.
Here's how you would do away with ambiguity codes:

use Bio::Tools::SeqPattern;
my $seq = 'GTNATAARCC';

my $pattern = Bio::Tools::SeqPattern->new(
    -seq  => $seq,
    -TYPE => 'Dna',
);

print $pattern->expand, "\n";
# outputs: GT[ATCG]ATAA[AG]CC


2009/3/16 Smithies, Russell:
> Typical Microsoft Outlook changed my formatting.
> This is what dbSNP fasta looks like: > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=snp&report=fasta&mode=text&id=29011166 > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell >> Sent: Monday, 16 March 2009 4:30 p.m. >> To: 'Bioperl-l at lists.open-bio.org' >> Subject: [Bioperl-l] reformating dna sequence containing ambiguity symbols >> >> I've just been reformatting some fasta from dbSNP that contains ambiguity >> symbols and had to come up with a non-standard solution as I needed to turn >> validation off in Bio::Seq but couldn't see an obvious way to do it. >> >> dbSNP format the fasta for their rs* SNPs like this: >> >> >gnl|dbSNP|rs29025902 >> rs=29025902|pos=251|len=501|taxid=9913|mol="genomic"|class=1|alleles="A/G"|bui >> ld=125 >> AGGCTACCAA TAGGACATCA CTGACTGTGA GGCTGGGAAG AAAGACCGAG AAGCACCCCA GGACACAGAA >> TCTCCTTCAC >> ATACAGAGGC AGTGGACACA TAGAGTACAG GCAGCGGTAA AATGGAGTAA AAAATTAGAT AGTGGTGAGT >> TTTGAGTATG >> AATTGCCTTT GTTTTTAAAT TAGTTCTAAG TTTATAAGAC AAGTTTTATT TTTTTATTTT ATTTATTTGC >> CACATGGCTT >> GTGGGTTTGC >> R >> GGATCTTAGT TCCCTGACCA GGGATTGAAC CTGTGCCCTC AGCAGTGAAA ACATGGAGTC CTAACAACTG >> GACCACCAGG >> GAATTCCCTA TATGACTTAA TTTTTAATAA TATTTGTAGC TAACAATTGA CATGCAGAGT TCCTGAAGAT >> TTAAGCATGG >> GCTCCCATGA ACCAGTATGA ACCAGCTCCA GCACAGCACA GGTTTTGTTT TACTTTTGGA GGGGAGGTTT >> GATTGTGTCT >> ACATGCTAAT >> >> But I needed it (for the Sequenom) with the ambiguity symbol on brackets like >> this: >> >> >gnl|dbSNP|rs29025902 >> AGGCTACCAATAGGACATCACTGACTGTGAGGCTGGGAAGAAAGACCGAGAAGCACCCCAGGACACAGAATCTCCTTC >> ACATACAGAGGCAGTGGACACATAGAGTACAGGCAGCGGTAAAATGGAGTAAAAAATTAGATAGTGGTGAGTTTTGAG >> TATGAATTGCCTTTGTTTTTAAATTAGTTCTAAGTTTATAAGACAAGTTTTATTTTTTTATTTTATTTATTTGCCACA >> TGGCTTGTGGGTTTGC[A/G]GGATCTTAGTTCCCTGACCAGGGATTGAACCTGTGCCCTCAGCAGTGAAAACATGGA >> 
GTCCTAACAACTGGACCACCAGGGAATTCCCTATATGACTTAATTTTTAATAATATTTGTAGCTAACAATTGACATGC >> AGAGTTCCTGAAGATTTAAGCATGGGCTCCCATGAACCAGTATGAACCAGCTCCAGCACAGCACAGGTTTTGTTTTAC >> TTTTGGAGGGGAGGTTTGATTGTGTCTACATGCTAAT >> >> I came up with a 50% BioPerl solution using Bio::SeqIO and Bio::Tools::IUPAC >> but the final printing of the fasta is dome 'manually'. >> It's a bit hacky but I'm particularly proud of the obscurity I managed in my >> switch statement :-) >> >> ############################### >> #!perl -w >> >> use Bio::SeqIO; >> use Bio::Tools::IUPAC; >> use Switch; >> >> my $seq_in = Bio::SeqIO->new(-file=>$ARGV[0], -format=>"fasta" ) or die $!; >> >> while (my $seqobj = $seq_in->next_seq) { >> my $seq = sprintf ">%s\n", $seqobj->display_id; >> my $iupac_seq = new Bio::Tools::IUPAC(-seq => $seqobj); >> foreach (@{$iupac_seq->{_alpha}}){ >> switch($#{@{$_}}){ >> case 0{$seq.= @{$_}[0]} >> case 1{$seq .= sprintf "[%s]",join("/",@{$_})} >> else {$seq .= 'N'} >> } >> } >> print "$seq\n"; >> } >> ####################### >> >> :-) >> >> --Russell >> >> >> Russell Smithies >> Bioinformatics Applications Developer >> T +64 3 489 9085 >> E russell.smithies at agresearch.co.nz >> Invermay Research Centre >> Puddle Alley, >> Mosgiel, >> New Zealand >> T +64 3 489 3809 >> F +64 3 489 9174 >> www.agresearch.co.nz >> >> Toitu te whenua, Toitu te tangata >> Sustain the land, Sustain the people >> >> >> >> >> >> >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. 
If you have received this message in error, please notify the
>> sender immediately.
>> =======================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Mon Mar 16 15:50:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Mar 2009 14:50:29 -0500
Subject: [Bioperl-l] reformating dna sequence containing ambiguity symbols
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF321BD22E9C7@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF321BD22E951@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF321BD22E970@exchsth.agresearch.co.nz>
 <1a0c1b750903160304j32280202sfff4a3b5db834e72@mail.gmail.com>
 <18DF7D20DFEC044098A1062202F5FFF321BD22E9C7@exchsth.agresearch.co.nz>
Message-ID: <02448A7A-9DF9-4D0C-91F0-2C68365C2287@illinois.edu>

That's probably your best bet. You could follow it up with a
s/(\w{60})/$1\n/g; to add in line breaks every 60 bases if you need it.
(The parentheses are needed so that $1 holds the matched 60 characters.)

chris

On Mar 16, 2009, at 2:29 PM, Smithies, Russell wrote:

> Thanx Bruno,
> I hadn't looked at Bio::Tools::SeqPattern before but will take a
> closer look.
>
> --Rusell
>
> From: Bruno Vecchi [mailto:vecchi.b at gmail.com]
> Sent: Monday, 16 March 2009 11:05 p.m.
> To: Smithies, Russell
> Subject: Re: [Bioperl-l] reformating dna sequence containing
> ambiguity symbols
>
> Please correct me if I didn't understand your problem correctly, but
> I think that the solution to your problem is Bio::Tools::SeqPattern.
> Here's how you would do away with ambiguity codons: > > use Bio::Tools::SeqPattern; > my $seq = 'GTNATAARCC'; > > my $pattern = Bio::Tools::SeqPattern->new( > -seq => $seq > -TYPE => 'Dna' > ) > > $pattern->expand; > # outputs: GT[ATCG]ATAA[AG]CC > > > 2009/3/16 Smithies, Russell >>: >> Typical Microsoft Outlook changed my formatting. >> This is what dbSNP fasta looks like: >> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=snp&report=fasta&mode=text&id=29011166 >> >> --Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org>> > [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On >>> Behalf Of Smithies, Russell >>> Sent: Monday, 16 March 2009 4:30 p.m. >>> To: 'Bioperl-l at lists.open-bio.org>> >' >>> Subject: [Bioperl-l] reformating dna sequence containing ambiguity >>> symbols >>> >>> I've just been reformatting some fasta from dbSNP that contains >>> ambiguity >>> symbols and had to come up with a non-standard solution as I >>> needed to turn >>> validation off in Bio::Seq but couldn't see an obvious way to do it. 
>>> >>> dbSNP format the fasta for their rs* SNPs like this: >>> >>>> gnl|dbSNP|rs29025902 >>> rs=29025902|pos=251|len=501|taxid=9913|mol="genomic"|class=1| >>> alleles="A/G"|bui >>> ld=125 >>> AGGCTACCAA TAGGACATCA CTGACTGTGA GGCTGGGAAG AAAGACCGAG AAGCACCCCA >>> GGACACAGAA >>> TCTCCTTCAC >>> ATACAGAGGC AGTGGACACA TAGAGTACAG GCAGCGGTAA AATGGAGTAA AAAATTAGAT >>> AGTGGTGAGT >>> TTTGAGTATG >>> AATTGCCTTT GTTTTTAAAT TAGTTCTAAG TTTATAAGAC AAGTTTTATT TTTTTATTTT >>> ATTTATTTGC >>> CACATGGCTT >>> GTGGGTTTGC >>> R >>> GGATCTTAGT TCCCTGACCA GGGATTGAAC CTGTGCCCTC AGCAGTGAAA ACATGGAGTC >>> CTAACAACTG >>> GACCACCAGG >>> GAATTCCCTA TATGACTTAA TTTTTAATAA TATTTGTAGC TAACAATTGA CATGCAGAGT >>> TCCTGAAGAT >>> TTAAGCATGG >>> GCTCCCATGA ACCAGTATGA ACCAGCTCCA GCACAGCACA GGTTTTGTTT TACTTTTGGA >>> GGGGAGGTTT >>> GATTGTGTCT >>> ACATGCTAAT >>> >>> But I needed it (for the Sequenom) with the ambiguity symbol on >>> brackets like >>> this: >>> >>>> gnl|dbSNP|rs29025902 >>> AGGCTACCAATAGGACATCACTGACTGTGAGGCTGGGAAGAAAGACCGAGAAGCACCCCAGGACACAGAATCTCCTTC >>> ACATACAGAGGCAGTGGACACATAGAGTACAGGCAGCGGTAAAATGGAGTAAAAAATTAGATAGTGGTGAGTTTTGAG >>> TATGAATTGCCTTTGTTTTTAAATTAGTTCTAAGTTTATAAGACAAGTTTTATTTTTTTATTTTATTTATTTGCCACA >>> TGGCTTGTGGGTTTGC[A/ >>> G]GGATCTTAGTTCCCTGACCAGGGATTGAACCTGTGCCCTCAGCAGTGAAAACATGGA >>> GTCCTAACAACTGGACCACCAGGGAATTCCCTATATGACTTAATTTTTAATAATATTTGTAGCTAACAATTGACATGC >>> AGAGTTCCTGAAGATTTAAGCATGGGCTCCCATGAACCAGTATGAACCAGCTCCAGCACAGCACAGGTTTTGTTTTAC >>> TTTTGGAGGGGAGGTTTGATTGTGTCTACATGCTAAT >>> >>> I came up with a 50% BioPerl solution using Bio::SeqIO and >>> Bio::Tools::IUPAC >>> but the final printing of the fasta is dome 'manually'. 
>>> It's a bit hacky but I'm particularly proud of the obscurity I >>> managed in my >>> switch statement :-) >>> >>> ############################### >>> #!perl -w >>> >>> use Bio::SeqIO; >>> use Bio::Tools::IUPAC; >>> use Switch; >>> >>> my $seq_in = Bio::SeqIO->new(-file=>$ARGV[0], -format=>"fasta" ) >>> or die $!; >>> >>> while (my $seqobj = $seq_in->next_seq) { >>> my $seq = sprintf ">%s\n", $seqobj->display_id; >>> my $iupac_seq = new Bio::Tools::IUPAC(-seq => $seqobj); >>> foreach (@{$iupac_seq->{_alpha}}){ >>> switch($#{@{$_}}){ >>> case 0{$seq.= @{$_}[0]} >>> case 1{$seq .= sprintf >>> "[%s]",join("/",@{$_})} >>> else {$seq .= 'N'} >>> } >>> } >>> print "$seq\n"; >>> } >>> ####################### >>> >>> :-) >>> >>> --Russell >>> >>> >>> Russell Smithies >>> Bioinformatics Applications Developer >>> T +64 3 489 9085 >>> E russell.smithies at agresearch.co.nz>> > >>> Invermay Research Centre >>> Puddle Alley, >>> Mosgiel, >>> New Zealand >>> T +64 3 489 3809 >>> F +64 3 489 9174 >>> www.agresearch.co.nz >>> >>> Toitu te whenua, Toitu te tangata >>> Sustain the land, Sustain the people >>> >>> >>> >>> >>> >>> >>> = >>> = >>> = >>> ==================================================================== >>> Attention: The information contained in this message and/or >>> attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or >>> privileged >>> material. Any review, retransmission, dissemination or other use >>> of, or >>> taking of any action in reliance upon, this information by persons >>> or >>> entities other than the intended recipients is prohibited by >>> AgResearch >>> Limited. If you have received this message in error, please notify >>> the >>> sender immediately. 
>>> ====================================================================
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From clements at nescent.org Mon Mar 16 16:53:51 2009
From: clements at nescent.org (Dave Clements)
Date: Mon, 16 Mar 2009 13:53:51 -0700
Subject: [Bioperl-l] 2009 GMOD Summer Schools - Americas & Europe
In-Reply-To:
References:
Message-ID:

We are now accepting applications for the 2009 GMOD Summer Schools:

Americas, 16-19 July
 - National Evolutionary Synthesis Center (NESCent), Durham, NC, USA
 - Student tuition is free, thanks to NIH grant 1R01HG004483-01.
 - http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas

Europe, 3-6 August
 - University of Oxford, Oxford, United Kingdom
 - Part of GMOD Europe 2009, which includes the next GMOD Meeting
 - Student tuition is £95
 - http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe

GMOD (http://gmod.org/) is a collection of interoperable open source
software components for managing, visualizing, annotating and
integrating biological, mostly genomic, data. GMOD is also a community
of developers and users dealing with similar challenges. GMOD is used
in diverse contexts, with both emerging and established model organisms.

GMOD Summer Schools (http://gmod.org/wiki/GMOD_Summer_School) introduce
new GMOD users to the GMOD project and feature several days of hands-on
training on how to install, configure and administer GMOD tools.
The courses include training on several GMOD components:
 * GBrowse - the widely used Generic Genome Browser
 * Chado - a modular and extensible database schema for biological data
 * Apollo - genome annotation editor
 * BioMart - biological data warehouse system
 * GBrowse_syn - a GBrowse based synteny viewer
 * JBrowse - a brand new Web 2.0 genome browser
 * Artemis-Chado Integration (Europe only)
 * MAKER - Genome annotation pipeline (Americas only)
 * Tripal - Web front end for Chado (Americas only)

*** Please submit an application by the end of 6 April 2009 if you are
interested in attending. ***

Enrollment is limited to 25 students in each course. If applications
exceed capacity (and we expect they will) then applicants will be picked
based on the strength of their application. Applicants will be notified
of their admission status in mid-April.

Thanks,

Dave Clements
GMOD Help Desk
help at gmod.org

http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas
http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe
http://gmod.org/wiki/GMOD_Europe_2009

From cjfields at illinois.edu Tue Mar 17 00:09:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Mar 2009 23:09:00 -0500
Subject: [Bioperl-l] Primer3 Redux
Message-ID:

All,

I have been working on a Primer3 refactor (for $job) that is pretty
much ready to go; I'll be adding it to bioperl-dev in a few days along
with tests. This'll also include a Primer3 wrapper. It's a fairly major
overhaul, so I'm using the namespace Bio::Tools::Primer3Redux in the
meantime so there are no API collisions (any other suggestions for a
namespace are welcome).

The main changes, for those interested:

Bio::Tools::Run::Primer3Redux (the wrapper):
* Compatible with both version 1 and the (very recently released) v2.0
  alpha now on SourceForge (features generated should have the same tags).
* Uses either 'primer3' or 'primer3_core' by default, unless
  program_name is specified.
* No sequence caching; Bio::SeqI objects are simply passed in (like most
  other wrappers) to run().
* run() now accepts multiple sequences and returns the parser; each seq
  result is iteratively parsed and returned as a separate AnalysisResultI.
* primer3 parameters are getter/setters (these vary between v1 and v2).
* Any primer3 parameter can accept multiple values (for instance, if
  multiple regions need to be excluded).
* No parameter checking yet.
* No arguments hash (just refer to the primer3 docs for now). I'm not
  planning on implementing this one, at least for now.

Bio::Tools::Primer3Redux (the AnalysisParserI parser):
* Parses multiple results.
* Each result is a Bio::Tools::Primer3Redux::Result.

Bio::Tools::Primer3Redux::Result (seq-specific result set):
* Each result contains 0 or more Primers (for/rev/internal) and,
  additionally, possibly PrimerPairs, a Bio::Seq, and runtime parameters.
* One can iterate through either left/right primers, internal oligos,
  or pairs.
* Features are attached to either a default Bio::Seq or a passed-in
  Bio::SeqI (via attach_seq()).

Bio::Tools::Primer3Redux::Primer (oligo-specific class):
* Primer instances are simply Bio::SeqFeature::Generic with a few
  additional convenience methods.
* Tags are data related directly to the primer.
* Features are attached to either a default Bio::Seq or a passed-in
  Bio::SeqI (via attach_seq()).
* Primer sequences (via seq()) can be validated against the one
  returned by primer3. This comes in useful when attaching a Bio::Seq to
  the Result.

Bio::Tools::Primer3Redux::PrimerPair (amplicon/pair-specific class):
* PrimerPair instances are simply Bio::SeqFeature::Generic with a few
  additional convenience methods.
* Contain left/right primers and internal oligos as subfeatures.
* Tags are data related directly to the product/amplicon.
* seq() returns the amplicon sequence.
chris From philip.xiang at roche.com Tue Mar 17 19:54:18 2009 From: philip.xiang at roche.com (Xiang, Philip) Date: Tue, 17 Mar 2009 16:54:18 -0700 Subject: [Bioperl-l] RemoteBlast filter option Message-ID: <8697393B889CD74C99B2E77056523781032C557B@rpbmsem01.nala.roche.com> Does anyone know what the RemoteBlast FILTER options stand for? http://doc.bioperl.org/releases/bioperl-current/bioperl-live/ %PUTPARAMS = ( ... 'FILTER' => '[LRm]', # L or R or m ... } Phil Xiang From Russell.Smithies at agresearch.co.nz Tue Mar 17 20:24:39 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 18 Mar 2009 13:24:39 +1300 Subject: [Bioperl-l] RemoteBlast filter option In-Reply-To: <8697393B889CD74C99B2E77056523781032C557B@rpbmsem01.nala.roche.com> References: <8697393B889CD74C99B2E77056523781032C557B@rpbmsem01.nala.roche.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF321BD22EF63@exchsth.agresearch.co.nz> Filters for masking the query sequence. L = low complexity, R = repeats, m = masking. A good explanation here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node80.html --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Xiang, Philip > Sent: Wednesday, 18 March 2009 12:54 p.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] RemoteBlast filter option > > Does anyone know what the RemoteBlast FILTER options stand for? > > > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/ > > > > %PUTPARAMS = ( > > ... > > 'FILTER' => '[LRm]', # > L or R or m > > ... 
>
> }
>
> Phil Xiang
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From shwetakagliwal at gmail.com Tue Mar 17 07:01:41 2009
From: shwetakagliwal at gmail.com (shweta kagliwal)
Date: Tue, 17 Mar 2009 16:31:41 +0530
Subject: [Bioperl-l] bl2seq
Message-ID: <16b96b950903170401o56c7126bg535db7ab8088097a@mail.gmail.com>

I want to carry out a pairwise blast using the bl2seq program in BioPerl.
I have installed bioperl-1.5.9. I have also installed standalone blast
from the NCBI FTP site in my perl/bin folder. But when I run the attached
script I get the following error:

ref:
cant locate method "next feature" via package "Bio:SearchIO:blast" at
bl2seq1.pl line 20, line 1.
Error removing C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\qPx195u5TQ at
c:/perl/site/lib/File/Temp.pm line 890.

I can't understand the error. Please help me.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bl2seq1.pl Type: application/octet-stream Size: 674 bytes Desc: not available URL: From isabelle.phan at sbri.org Tue Mar 17 20:27:19 2009 From: isabelle.phan at sbri.org (iphan) Date: Tue, 17 Mar 2009 17:27:19 -0700 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 In-Reply-To: <71F9ACDB-B249-40A0-866C-7F574D329F4B@illinois.edu> Message-ID: On 3/15/09 11:27 AM, "Chris Fields" wrote: > On Mac OS X one has to use 'sudo' for installation of anything from > CPAN unless installing to a location the user has read-write privs to, > like a local directory. You can set up BioPerl by just downloading > it, unpacking the tarball, and adding it to PERL5LIB (that's how I run > it). If it is that simple, then it would help immensely if the INSTALL instruction would be updated, because the latest release says: Download, then unpack the tar file. For example: >gunzip bioperl-1.5.2_100.tar.gz >tar xvf bioperl-1.5.2_100.tar >cd bioperl-1.5.2_100 Now issue the build commands: >perl Build.PL >./Build test >./Build install I've followed the instructions, ran cpan as 'sudo' and could not get past the required libraries: HTML::Tagset Digest::MD5 Cpan crashes trying to install these, returns to BioPerl libraries, fails, and we are back to the beginning: cpan asks for installation of HTML::Tagset and Digest::MD5, and enters an infinite loop of endlessly crashing installs. I had assumed one HAD to run the build in order to resolve all the dependencies. Going through the DEPENDENCIES document is not really helpful, because a) you are warned that the list is not complete, and b) you realize the actual length of the list of dependencies is depressingly long... I am aware of the heroic efforts of the GMOD people to distribute virtual machines with everything prepackaged, but somehow I wonder if this is a sustainable solution to the Perl dependency issue? 
Isabelle From cjfields at illinois.edu Tue Mar 17 23:30:53 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Mar 2009 22:30:53 -0500 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 In-Reply-To: References: Message-ID: <88F11AA1-6AAB-4C07-8F90-F50BA5D7D27C@illinois.edu> On Mar 17, 2009, at 7:27 PM, iphan wrote: > > On 3/15/09 11:27 AM, "Chris Fields" wrote: > >> On Mac OS X one has to use 'sudo' for installation of anything from >> CPAN unless installing to a location the user has read-write privs >> to, >> like a local directory. You can set up BioPerl by just downloading >> it, unpacking the tarball, and adding it to PERL5LIB (that's how I >> run >> it). > > If it is that simple, then it would help immensely if the INSTALL > instruction would be updated, because the latest release says: > > > Download, then unpack the tar file. For example: > >> gunzip bioperl-1.5.2_100.tar.gz >> tar xvf bioperl-1.5.2_100.tar >> cd bioperl-1.5.2_100 > > Now issue the build commands: > >> perl Build.PL >> ./Build test > > >> ./Build install The following, from the INSTALL file, alludes to the use of 'sudo' but could be made clearer: "To './Build install' you need write permission in the perl5/site_perl/source area (or similar, depending on your environment). Usually this will require you becoming root, so you will want to talk to your systems manager if you don't have the necessary privileges." > > > I've followed the instructions, ran cpan as 'sudo' and could not get > past > the required libraries: > > HTML::Tagset > Digest::MD5 > > Cpan crashes trying to install these, returns to BioPerl libraries, > fails, > and we are back to the beginning: cpan asks for installation of > HTML::Tagset > and Digest::MD5, and enters an infinite loop of endlessly crashing > installs. It's hard to narrow down the problem w/o some actual output. Can you send output that the installation is producing? 
The infinite loop is particularly worrisome; I'm assuming that stems from the automated CPAN installation within Bio::Root::Build module. If this is true my inclination is to turn that functionality off (or at least make it optional) until the problem is resolved; it's a definite bug. Just to note: the two above packages you are having problems with are not direct requirements for BioPerl 1.6 (aren't listed in Build.PL nor in DEPENDENCIES). That's not to say they aren't somewhere in the dependency tree, however. > I had assumed one HAD to run the build in order to resolve all the > dependencies. Going through the DEPENDENCIES document is not really > helpful, > because a) you are warned that the list is not complete, and b) you > realize > the actual length of the list of dependencies is depressingly long... You can run Build.PL directly and it will attempt to check dependencies (required and recommended dependencies are listed in the Build.PL file). We do NOT require you to install everything, hence the division between 'required' vs 'recommends', the former required for the most critical core classes. The 'recommends' modules are only useful if you want full functionality, and my guess is you will never use every single module in BioPerl's core modules. For Gbrowse functionality, you would need some minimal stuff (GD for Bio::Graphics, Text::Parsewords, and maybe one or two more). I'm not sure what you would need for JBrowse, you would need to ask the devs for that. > I am aware of the heroic efforts of the GMOD people to distribute > virtual > machines with everything prepackaged, but somehow I wonder if this > is a > sustainable solution to the Perl dependency issue? > > > Isabelle Sorry that you're having problems with the installation. You are more than welcome to file a bug in bugzilla: http://bugzilla.open-bio.org/ chris From maj at fortinbras.us Tue Mar 17 23:34:32 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 17 Mar 2009 23:34:32 -0400 Subject: [Bioperl-l] bl2seq In-Reply-To: <16b96b950903170401o56c7126bg535db7ab8088097a@mail.gmail.com> References: <16b96b950903170401o56c7126bg535db7ab8088097a@mail.gmail.com> Message-ID: <78ACE59DC1ED489998C4E72E8ED118CA@NewLife> Looks like the "use Bio::SearchIO::blast;" statement is commented out. Put it back in and give it a try. MAJ ----- Original Message ----- From: "shweta kagliwal" To: Sent: Tuesday, March 17, 2009 7:01 AM Subject: [Bioperl-l] bl2seq >I want to carry out pairwise blast using bl2seq program in bioperl. I have > installed bioperl-1.5.9. > I have also installed standalone blast from ncbi ftp in my perl/bin folder. > But when I run the attached script I get the following error- > > ref: > cant locate method "next feature" via package "Bio:SearchIO:blast" at > bl2seq1.pl line 20, line 1. > Error removing C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\qPx195u5TQ at > c:/perl/site/lib/File/Temp.pm line 890. > > > I cant get the error. Please help me. > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From isabelle.phan at sbri.org Wed Mar 18 12:35:15 2009 From: isabelle.phan at sbri.org (Isabelle Phan) Date: Wed, 18 Mar 2009 09:35:15 -0700 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 In-Reply-To: <88F11AA1-6AAB-4C07-8F90-F50BA5D7D27C@illinois.edu> Message-ID: Hello Chris > The following, from the INSTALL file, alludes to the use of > 'sudo' but > could be made clearer: Perhaps I've missed something, but we seem to be writing at cross purposes: I am running the build with 'sudo' and I have admin privileges on my machine. I am not installing any modules that the bioperl Build flags as optional. My question was: Do I have to issue the Build command? I understand from the INSTALL that I have to. 
Your previous message seems to imply I do NOT have to. It would help me immensely to get a clear answer: after I unpack the Bioperl tarball, do I have to run the Build, Yes or No? Many thanks for your patience, Isabelle From hlapp at gmx.net Wed Mar 18 13:57:15 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 18 Mar 2009 13:57:15 -0400 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 In-Reply-To: References: Message-ID: <88D0224A-834B-4743-817A-4AE045D7748C@gmx.net> On Mar 18, 2009, at 12:35 PM, Isabelle Phan wrote: > It would help me immensely to get a clear answer: after I unpack the > Bioperl tarball, do I have to run the Build, Yes or No? You don't. I myself only do so to check that it does build (and to get the tests run). I know that several other devs do the same. Specifically, I never *install* BioPerl (because I want to be able to switch versions easily). I believe Build sets up a config file for Bio::DB::GFF, which is used by GBrowse, but I may be mistaken on that. If I'm not you'd have to either create that file by hand, or run Build but you still don't have to *install*. The only caveat if you don't install BioPerl is that you'll have to include its location in the PERL5LIB environment variable for every script that uses it (or add -I, or add 'use lib' to the script). 
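As a minimal sketch of the 'use lib' route just mentioned: the script below points Perl at an unpacked, uninstalled BioPerl directory and reports whether Bio::Perl can be loaded from it. The path is a hypothetical example, not something taken from this thread; it should be whichever directory contains the Bio/ folder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: hypothetical location of an unpacked (not installed)
# BioPerl tarball; change it to the directory that contains Bio/.
use lib "$ENV{HOME}/src/BioPerl-1.6.0";

# require succeeds only if Bio/Perl.pm can be found via @INC.
my $have_bioperl = eval { require Bio::Perl; 1 } ? 1 : 0;

print $have_bioperl
    ? "BioPerl found at $INC{'Bio/Perl.pm'}\n"
    : "BioPerl not found; check the 'use lib' path or PERL5LIB\n";
```

The same effect should be available without touching the script, either by running it as `perl -I/path/to/BioPerl-1.6.0 script.pl` or by adding that directory to the PERL5LIB environment variable in the shell.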
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Wed Mar 18 14:11:34 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 18 Mar 2009 13:11:34 -0500 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 In-Reply-To: References: Message-ID: <95735307-F6FC-4C30-985C-CFE1F86879E8@illinois.edu> On Mar 18, 2009, at 11:35 AM, Isabelle Phan wrote: > Hello Chris > >> The following, from the INSTALL file, alludes to the use of >> 'sudo' but >> could be made clearer: > > Perhaps I've missed something, but we seem to be writing at cross > purposes: > > I am running the build with 'sudo' and I have admin privileges on my > machine. I am not installing any modules that the bioperl Build > flags as optional. My question is where do HTML::Tagset and Digest::MD5 sneak in during the installation. They are not listed as prereqs for BioPerl > My question was: > > Do I have to issue the Build command? I understand from the INSTALL > that I have to. Your previous message seems to imply I do NOT have to. If you are installing using the package tarball, you would use: sudo perl Build.PL ./Build ./Build test sudo ./Build install if you want the Build.PL script to install the required modules (it uses CPAN.pm to do this), and if you want to install to a location that requires admin privileges. The four modules absolutely required are: DB_File, Data::Stag, IO::String, Scalar::Util. > It would help me immensely to get a clear answer: after I unpack the > Bioperl tarball, do I have to run the Build, Yes or No? > > Many thanks for your patience, > > Isabelle As Hilmar points out, you don't have to actually install BioPerl (his reasons for not installing are the same as mine). 
If you have your env set up correctly you could use it from wherever
(maybe even a copy within a secure folder for web apps, for instance).

chris

From hlapp at gmx.net Wed Mar 18 14:45:50 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Mar 2009 14:45:50 -0400
Subject: [Bioperl-l] OBF application for Summer of Code has been rejected
Message-ID: <44D1FAFD-B5D7-418B-9FDA-6945219A5481@gmx.net>

I hope to find out later why, but our Google Summer of Code
application as an umbrella org has been rejected.

However, NESCent has been accepted. If you can give your project idea
a phylogenetics/phyloinformatics focus, go and put it up on the
NESCent ideas page at

http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

Do so pretty much **now** - we will start broadcasting and reaching
out to students tonight and tomorrow. If someone comes to the site and
they don't see a Bio* project that they would have been interested in,
they may not check back for updates.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Wed Mar 18 15:08:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Mar 2009 14:08:48 -0500
Subject: [Bioperl-l] [BioSQL-l] OBF application for Summer of Code has been rejected
In-Reply-To: <44D1FAFD-B5D7-418B-9FDA-6945219A5481@gmx.net>
References: <44D1FAFD-B5D7-418B-9FDA-6945219A5481@gmx.net>
Message-ID:

Hilmar,

The idea was floated on the Google SoC list that language-specific
organizations that have been accepted may potentially take
bioinformatics-related applications. Specifically, Jonathan Leto (from
The Perl Foundation) indicated that bioinformatics-related projects
using BioPerl might be able to apply through them. Not sure about
others (Python Software Foundation, etc.) but it might be worth checking
into.

Any idea on who's been accepted beyond NESCent?
chris

On Mar 18, 2009, at 1:45 PM, Hilmar Lapp wrote:

> I hope to find out later why, but our Google Summer of Code
> application as an umbrella org has been rejected.
>
> However, NESCent has been accepted. If you can give your project
> idea a phylogenetics/phyloinformatics focus, go and put it up on the
> NESCent ideas page at
>
> http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009
>
> Do so pretty much **now** - we will start broadcasting and reaching
> out to students tonight and tomorrow. If someone comes to the site
> and they don't see a Bio* project that they would have been
> interested in, they may not check back for updates.
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

From dan.bolser at gmail.com Wed Mar 18 15:31:01 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 18 Mar 2009 19:31:01 +0000
Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output...
Message-ID: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com>

Hi,

I'm sure this question comes up again and again, but searching the
BioPerl mailing list didn't turn up any answers (to the second
question).

Basically I want to manually merge HSPs into 'contiguous hits', and I
want to look at the effect of various parameters on an algorithm to do
this. This task prompted me to run a 'sanity check' on the blast data
that I had, and I found that this check fails to fulfil my expectation
of the data. This means that either I don't understand the data or the
results are buggy.

Can someone clarify the definition of the 'gaps' column in the blast -m8
output format for me?
I thought that the column 'gaps' was basically the number of columns in
the HSP that contain a gap character. To test this on my data, I checked
the following equality:

GAPS + 2 = LENGTH - abs(QUERY_END - QUERY_START) +
           LENGTH - abs(HIT_END - HIT_START)

This says that the number of GAPS should equal the sum, over the QUERY
and the HIT, of the LENGTH of the alignment minus the distance between
the START and END point (+2 for the 'off by one' error introduced by the
two END-START calculations).

e.g.

10-> MMMMMMMM**MMMM*M <-22
     ||||  ||  | |  |
20-> MMMM**MMMMM*M*MM <-31

where GAPS = 7, LENGTH = 16, QUERY_END - QUERY_START = 12, and
HIT_END - HIT_START = 11. The formula gives:

7 + 2 (= 9) = 16 - 12 (= 4) + 16 - 11 (= 5)

The formula is correct for 11,282 out of 12,745 HSPs in my dataset
(89%); however, it fails for 1,463 cases (11%). Each of these cases has
a value of GAPS smaller than calculated by the formula. The difference
is usually 1 or 2, but is seen to go as high as 96, and scales roughly
linearly with the size of GAPS.

Did I misunderstand what the value of GAPS is supposed to mean? How come
it does apparently mean what I thought for so much of the data?

Thanks very much for any help on the above.

Dan.

From pmiguel at purdue.edu Wed Mar 18 16:12:19 2009
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Wed, 18 Mar 2009 16:12:19 -0400
Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output...
In-Reply-To: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com>
References: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com>
Message-ID: <49C155A3.204@purdue.edu>

Dan Bolser wrote:
>
> Can someone clarify the definition of the 'gaps' column in the blast -m8
> output format for me?
>
> I thought that the column 'gaps' was basically the number of columns in the
> HSP that contains a gap character.

Hi Dan,
"gaps", to me, denotes the number of gaps. Not the total length of all
the gaps.
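To make the two readings concrete, here is a minimal sketch that counts both quantities for the example alignment in Dan's message: the number of gap characters (what his equality assumes) and the number of contiguous gap runs (the "gap openings" reading suggested in the reply). The '*' gap symbol and the coordinates come from that example; the helper name gap_stats is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The example alignment from the thread, with '*' marking a gap character.
my $query = 'MMMMMMMM**MMMM*M';   # query positions 10..22
my $hit   = 'MMMM**MMMMM*M*MM';   # hit positions   20..31

# Hypothetical helper: returns (number of gap characters,
# number of contiguous runs of gap characters) for one alignment row.
sub gap_stats {
    my ($row) = @_;
    my $chars = () = $row =~ /\*/g;    # every gap character
    my $runs  = () = $row =~ /\*+/g;   # each contiguous run counted once
    return ($chars, $runs);
}

my ($q_chars, $q_runs) = gap_stats($query);
my ($h_chars, $h_runs) = gap_stats($hit);

my $gap_columns  = $q_chars + $h_chars;   # total gap characters
my $gap_openings = $q_runs  + $h_runs;    # total gap runs

# Right-hand side of the equality, using the example coordinates.
my $length = length $query;
my $rhs = ($length - abs(22 - 10)) + ($length - abs(31 - 20));

printf "gap columns = %d, gap openings = %d, formula RHS - 2 = %d\n",
    $gap_columns, $gap_openings, $rhs - 2;
```

For this alignment the equality holds for the gap-character count (7) but not for the run count (5). If the column really reports gap openings, the two numbers agree only when every gap has length 1, which would fit the observation that the reported value is never larger than the one the formula predicts.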
Just my interpretation, but given your results my guess is that whomever wrote blastall was thinking the way I do. Phillip From maj at fortinbras.us Wed Mar 18 16:29:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 18 Mar 2009 16:29:12 -0400 Subject: [Bioperl-l] OBF application for Summer of Code has been rejected In-Reply-To: <44D1FAFD-B5D7-418B-9FDA-6945219A5481@gmx.net> References: <44D1FAFD-B5D7-418B-9FDA-6945219A5481@gmx.net> Message-ID: <4164752F4ECA4A929AFAF4995FEDC3C8@NewLife> Done. MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "BioPerl List" ; "BioJava" ; "Biopython List" ; "Bioruby" ; "BioLib Project" ; "BioSQL" ; Sent: Wednesday, March 18, 2009 2:45 PM Subject: [Bioperl-l] OBF application for Summer of Code has been rejected >I hope to find out later why, but our Google Summer of Code application as an >umbrella org has been rejected. > > However, NESCent has been accepted. If you can give your project idea a > phylogenetics/phyloinformatics focus, go and put it up on the NESCent ideas > page at > > http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 > > Do so pretty much **now** - we will start broadcasting and reaching out to > students tonight and tomorrow. If someone comes to the site and they don't > see a Bio* project that they would have been interested in, they may not > check back for updates. > > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.bolser at gmail.com Wed Mar 18 18:30:16 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 18 Mar 2009 22:30:16 +0000 Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output... 
In-Reply-To: <49C155A3.204@purdue.edu> References: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com> <49C155A3.204@purdue.edu> Message-ID: <2c8757af0903181530l1c3c9f3ct4dc9f8852f627dd@mail.gmail.com> 2009/3/18 Phillip San Miguel > Dan Bolser wrote: > >> >> Can someone clarify the definition of the 'gaps' column in the blast -m8 >> output format for me? >> >> I thought that the column 'gaps' was basically the number of columns in >> the >> HSP that contains a gap character. >> > Hi Dan, > "gaps", to me, denotes the number of gaps. Not the total length of all the > gaps. > Just my interpretation, but given your results my guess is that whomever > wrote blastall was thinking the way I do. Yeah, I'll have to go look at the HSPs to confirm this... I'm just surprised that there are not more gaps of length >1. i.e. my data (given your interpretation) suggests that 90% of the HSPs have no gaps > length 1. However, it would make sense given the roughly linear relationship between the discrepancy and the total number of gaps. I'll let you know. Dan. Phillip > From hlapp at gmx.net Wed Mar 18 18:50:26 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 18 Mar 2009 18:50:26 -0400 Subject: [Bioperl-l] [BioSQL-l] OBF application for Summer of Code has been rejected In-Reply-To: References: <44D1FAFD-B5D7-418B-9FDA-6945219A5481@gmx.net> Message-ID: Yes, thanks for mentioning that, was going to do so too. The Perl Foundation and the Python foundation have been accepted. I guess there isn't a Java Foundation, and if there is a Ruby one it hasn't been accepted or hasn't applied. However, Ruby on Rails has been accepted. Don't know how open they would be a Bioruby project. -hilmar On Mar 18, 2009, at 3:08 PM, Chris Fields wrote: > Hilmar, > > The idea was floated on the google SOC list that language-specific > organizations that have been accepted may potentially take > bioinformatics-related applications. 
Specifically, Jonathan Leto > (from The Perl Foundation) indicated that bioinformatics-related > projects using BioPerl might be able to apply through them. Not > sure about others (Python Software Foundation, etc) but might be > worth checking into. > > Any idea on who's been accepted beyond NEScent? > > chris > > On Mar 18, 2009, at 1:45 PM, Hilmar Lapp wrote: > >> I hope to find out later why, but our Google Summer of Code >> application as an umbrella org has been rejected. >> >> However, NESCent has been accepted. If you can give your project >> idea a phylogenetics/phyloinformatics focus, go and put it up on >> the NESCent ideas page at >> >> http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 >> >> Do so pretty much **now** - we will start broadcasting and reaching >> out to students tonight and tomorrow. If someone comes to the site >> and they don't see a Bio* project that they would have been >> interested in, they may not check back for updates. >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Wed Mar 18 21:20:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 18 Mar 2009 20:20:26 -0500 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 In-Reply-To: References: Message-ID: <23920CAC-464D-44E9-AB0F-E454AD08D3B0@illinois.edu> On Mar 18, 2009, at 7:40 PM, iphan wrote: > Hello Chris and Hilmar > > Thanks very much for the explanations, now the fog is lifting :-) > >> The four modules 
absolutely required are: DB_File, Data::Stag, IO::String, Scalar::Util.
>
> Could this perhaps be explicitly stated in the INSTALL?

Yes, along with the use of 'sudo' where needed.

>> My question is where do HTML::Tagset and Digest::MD5 sneak in during
>> the installation. They are not listed as prereqs for BioPerl
>
> (See below)
>
> I copied the Bio directory to $HOME/perl , which is included in my
> PERL5LIB.
> Despite the fact that I don't have Data::Stag installed, I can get
> jbrowse to work on my Mac :-)
>
> Isabelle

Glad to hear you got JBrowse running!

As for Data::Stag, it is used for one of the Bio::Annotation objects. I
think it's mainly used in UniProt sequence and (possibly) Stockholm
alignment parsing. You could possibly get by w/o it; if you face
problems you can attempt installing it separately (it is pure perl, so
there shouldn't be a significant issue).

As for the rest...

> --
> Here goes:
>
> In the unpacked BioPerl-1.6.0 directory, I issued the command:
>
> sudo perl Build.PL --install_base=/Users/iphan/Documents/bioperl
>
> This goes through cpan installs, until I get prompted:
>
> *** (back in Bioperl Build.PL) ***
> Install [a]ll optional external modules, [n]one, or choose
> [i]nteractively?
> [n]
> - ERROR: You chose to install Test::Harness but it failed to install
> - ERROR: You chose to install CPAN but it failed to install
> - ERROR: You chose to install Data::Stag but it failed to install

The above may play a part in the looping (it is attempting to forcibly
install the modules).

> ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the
> versions of the modules indicated above before proceeding with this
> installation
>
> Checking features:
> Network.................. - ERROR: LWP::UserAgent is not installed
> Running install for module LWP::UserAgent
> Running make for G/GA/GAAS/libwww-perl-5.825.tar.gz

Now that's not good, it shouldn't attempt to install this (it's not
required).
> CPAN: MD5 security checks disabled because Digest::MD5 not installed. > Please consider installing the Digest::MD5 module. > > ... > > CPAN.pm: Going to build G/GA/GAAS/libwww-perl-5.825.tar.gz > > WARNING: LICENSE is not a known parameter. > Checking if your kit is complete... > Looks good > Warning: prerequisite Digest::MD5 0 not found. > Warning: prerequisite HTML::Tagset 0 not found. > 'LICENSE' is not a known MakeMaker parameter name. > Writing Makefile for LWP > ---- Unsatisfied dependencies detected during > [G/GA/GAAS/libwww-perl-5.825.tar.gz] ----- > HTML::Tagset > Digest::MD5 > Shall I follow them and prepend them to the queue > of modules we are processing right now? [yes] > > If I hit return, I end up in an endless loop, where it keeps asking > whether > I want to install HTML::Tagset and Digest::MD5. > > I then updated CPAN with 'install Bundle::CPAN', and tried to build > bioperl > again, but that made no difference. If you try installing libwww-perl does it give you similar errors? Regardless, we'll probably need to shut down the automatic CPAN module installation via 'perl Build.PL' or make it optional. I'll file a bug to track this. chris From isabelle.phan at sbri.org Wed Mar 18 20:40:59 2009 From: isabelle.phan at sbri.org (iphan) Date: Wed, 18 Mar 2009 17:40:59 -0700 Subject: [Bioperl-l] Fwd: [Gmod-ajax] Jbrowse on mac os 10.5 In-Reply-To: <95735307-F6FC-4C30-985C-CFE1F86879E8@illinois.edu> Message-ID: Hello Chris and Hilmar Thanks very much for the explanations, now the fog is lifting :-) > The four modules absolutely required > are: DB_File, Data::Stag, IO::String, Scalar::Util. Could this perhaps be explicitly stated in the INSTALL? > My question is where do HTML::Tagset and Digest::MD5 sneak in during > the installation. They are not listed as prereqs for BioPerl (See below) I copied the Bio directory to $HOME/perl , which is included in my PERL5LIB. 
Despite the fact that I don't have Data::Stag installed, I can get jbrowse to work on my Mac :-) Isabelle -- Here goes: In the unpacked BioPerl-1.6.0 directory, I issued the command: sudo perl Build.PL --install_base=/Users/iphan/Documents/bioperl This goes through cpan installs, until I get prompted: *** (back in Bioperl Build.PL) *** Install [a]ll optional external modules, [n]one, or choose [i]nteractively? [n] - ERROR: You chose to install Test::Harness but it failed to install - ERROR: You chose to install CPAN but it failed to install - ERROR: You chose to install Data::Stag but it failed to install * Optional prerequisite Ace is not installed (wanted for access of ACeDB database, used by Bio::DB::Ace and Bio::DB::GFF::Adaptor::ace) * Optional prerequisite Spreadsheet::ParseExcel is not installed (wanted for parsing Excel files, used by Bio::SeqIO::excel) * XML::SAX (0.14) is installed, but we prefer to have 0.15 (wanted for parsing xml, used by Bio::SearchIO::blastxml, Bio::SeqIO::tigrxml and Bio::SeqIO::bsml_sax) * Optional prerequisite Math::Random is not installed (wanted for Random Phylogenetic Networks, used by Bio::PhyloNetwork::RandomFactory) * Optional prerequisite Graph is not installed (wanted for ontology engine implementation for the GO parser, used by Bio::PhyloNetwork) * Optional prerequisite SVG::Graph is not installed (wanted for creating SVG images, used by Bio::TreeIO::svggraph) * Optional prerequisite SOAP::Lite is not installed (wanted for Bibliographic queries, used by Bio::DB::Biblio::soap) * Optional prerequisite Bio::ASN1::EntrezGene is not installed (wanted for parsing entrezgene, used by Bio::SeqIO::entrezgene [circular dependency!]) * Optional prerequisite GraphViz is not installed (wanted for Phylogenetic Network Visulization, used by Bio::PhyloNetwork::GraphViz) * Optional prerequisite Array::Compare is not installed (wanted for Phylogenetic Networks, used by Bio::PhyloNetwork) * Optional prerequisite Convert::Binary::C is not 
installed (wanted for strider functionality, used by Bio::SeqIO::strider) * Optional prerequisite Algorithm::Munkres is not installed (wanted for Phylogenetic Networks, used by Bio::PhyloNetwork) * Optional prerequisite XML::Twig is not installed (wanted for parsing xml, used by Bio::Variation::IO::xml, Bio::DB::Taxonomy::entrez and Bio::DB::Biblio::eutils) * HTML::HeadParser (2.22) is installed, but we prefer to have 3 (wanted for parsing section of HTML docs, used by Bio::Tools::Analysis::DNA::ESEfinder) * Optional prerequisite HTTP::Request::Common is not installed (wanted for GenBank+GenPept sequence retrieval, remote http Blast jobs, used by Bio::DB::*, Bio::Tools::Run::RemoteBlast, Bio::Tools::Analysis::Protein* and Bio::Tools::Analysis::DNA*) * Optional prerequisite Set::Scalar is not installed (wanted for proper operation, used by Bio::Tree::Compatible) * Optional prerequisite LWP::UserAgent is not installed (wanted for remote access, used by Bio::DB::*, Bio::Tools::Run::RemoteBlast and Bio::WebAgent) * Optional prerequisite XML::Parser::PerlSAX is not installed (wanted for parsing xml, used by Bio::SeqIO::tinyseq, Bio::SeqIO::game::gameSubs, Bio::OntologyIO::InterProParser and Bio::ClusterIO::dbsnp) * Optional prerequisite XML::SAX::Writer is not installed (wanted for writing xml, used by Bio::SeqIO::tigrxml) * Optional prerequisite Clone is not installed (wanted for cloning objects, used by Bio::Tools::Primer3) * Optional prerequisite XML::DOM::XPath is not installed (wanted for parsing interpro features, used by Bio::FeatureIO::interpro) * Optional prerequisite PostScript::TextBlock is not installed (wanted for EPS output, used by Bio::Tree::Draw::Cladogram) ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the versions of the modules indicated above before proceeding with this installation Checking features: Network.................. 
- ERROR: LWP::UserAgent is not installed Running install for module LWP::UserAgent Running make for G/GA/GAAS/libwww-perl-5.825.tar.gz CPAN: MD5 security checks disabled because Digest::MD5 not installed. Please consider installing the Digest::MD5 module. libwww-perl-5.825/ libwww-perl-5.825/AUTHORS libwww-perl-5.825/bin/ libwww-perl-5.825/bin/lwp-download libwww-perl-5.825/bin/lwp-mirror libwww-perl-5.825/bin/lwp-request libwww-perl-5.825/bin/lwp-rget libwww-perl-5.825/Changes libwww-perl-5.825/lib/ libwww-perl-5.825/lib/Bundle/ libwww-perl-5.825/lib/Bundle/LWP.pm libwww-perl-5.825/lib/File/ libwww-perl-5.825/lib/File/Listing.pm libwww-perl-5.825/lib/HTML/ libwww-perl-5.825/lib/HTML/Form.pm libwww-perl-5.825/lib/HTTP/ libwww-perl-5.825/lib/HTTP/Config.pm libwww-perl-5.825/lib/HTTP/Cookies/ libwww-perl-5.825/lib/HTTP/Cookies/Microsoft.pm libwww-perl-5.825/lib/HTTP/Cookies/Netscape.pm libwww-perl-5.825/lib/HTTP/Cookies.pm libwww-perl-5.825/lib/HTTP/Daemon.pm libwww-perl-5.825/lib/HTTP/Date.pm libwww-perl-5.825/lib/HTTP/Headers/ libwww-perl-5.825/lib/HTTP/Headers/Auth.pm libwww-perl-5.825/lib/HTTP/Headers/ETag.pm libwww-perl-5.825/lib/HTTP/Headers/Util.pm libwww-perl-5.825/lib/HTTP/Headers.pm libwww-perl-5.825/lib/HTTP/Message.pm libwww-perl-5.825/lib/HTTP/Negotiate.pm libwww-perl-5.825/lib/HTTP/Request/ libwww-perl-5.825/lib/HTTP/Request/Common.pm libwww-perl-5.825/lib/HTTP/Request.pm libwww-perl-5.825/lib/HTTP/Response.pm libwww-perl-5.825/lib/HTTP/Status.pm libwww-perl-5.825/lib/LWP/ libwww-perl-5.825/lib/LWP/Authen/ libwww-perl-5.825/lib/LWP/Authen/Basic.pm libwww-perl-5.825/lib/LWP/Authen/Digest.pm libwww-perl-5.825/lib/LWP/Authen/Ntlm.pm libwww-perl-5.825/lib/LWP/ConnCache.pm libwww-perl-5.825/lib/LWP/Debug.pm libwww-perl-5.825/lib/LWP/DebugFile.pm libwww-perl-5.825/lib/LWP/media.types libwww-perl-5.825/lib/LWP/MediaTypes.pm libwww-perl-5.825/lib/LWP/MemberMixin.pm libwww-perl-5.825/lib/LWP/Protocol/ libwww-perl-5.825/lib/LWP/Protocol/cpan.pm 
libwww-perl-5.825/lib/LWP/Protocol/data.pm libwww-perl-5.825/lib/LWP/Protocol/file.pm libwww-perl-5.825/lib/LWP/Protocol/ftp.pm libwww-perl-5.825/lib/LWP/Protocol/GHTTP.pm libwww-perl-5.825/lib/LWP/Protocol/gopher.pm libwww-perl-5.825/lib/LWP/Protocol/http.pm libwww-perl-5.825/lib/LWP/Protocol/http10.pm libwww-perl-5.825/lib/LWP/Protocol/https.pm libwww-perl-5.825/lib/LWP/Protocol/https10.pm libwww-perl-5.825/lib/LWP/Protocol/loopback.pm libwww-perl-5.825/lib/LWP/Protocol/mailto.pm libwww-perl-5.825/lib/LWP/Protocol/nntp.pm libwww-perl-5.825/lib/LWP/Protocol/nogo.pm libwww-perl-5.825/lib/LWP/Protocol.pm libwww-perl-5.825/lib/LWP/RobotUA.pm libwww-perl-5.825/lib/LWP/Simple.pm libwww-perl-5.825/lib/LWP/UserAgent.pm libwww-perl-5.825/lib/LWP.pm libwww-perl-5.825/lib/Net/ libwww-perl-5.825/lib/Net/HTTP/ libwww-perl-5.825/lib/Net/HTTP/Methods.pm libwww-perl-5.825/lib/Net/HTTP/NB.pm libwww-perl-5.825/lib/Net/HTTP.pm libwww-perl-5.825/lib/Net/HTTPS.pm libwww-perl-5.825/lib/WWW/ libwww-perl-5.825/lib/WWW/RobotRules/ libwww-perl-5.825/lib/WWW/RobotRules/AnyDBM_File.pm libwww-perl-5.825/lib/WWW/RobotRules.pm libwww-perl-5.825/lwpcook.pod libwww-perl-5.825/lwptut.pod libwww-perl-5.825/Makefile.PL libwww-perl-5.825/MANIFEST libwww-perl-5.825/META.yml libwww-perl-5.825/README libwww-perl-5.825/README.SSL libwww-perl-5.825/t/ libwww-perl-5.825/t/base/ libwww-perl-5.825/t/base/common-req.t libwww-perl-5.825/t/base/cookies.t libwww-perl-5.825/t/base/date.t libwww-perl-5.825/t/base/headers-auth.t libwww-perl-5.825/t/base/headers-etag.t libwww-perl-5.825/t/base/headers-util.t libwww-perl-5.825/t/base/headers.t libwww-perl-5.825/t/base/http-config.t libwww-perl-5.825/t/base/http.t libwww-perl-5.825/t/base/listing.t libwww-perl-5.825/t/base/mediatypes.t libwww-perl-5.825/t/base/message-old.t libwww-perl-5.825/t/base/message-parts.t libwww-perl-5.825/t/base/message.t libwww-perl-5.825/t/base/negotiate.t libwww-perl-5.825/t/base/protocols.t libwww-perl-5.825/t/base/request.t 
libwww-perl-5.825/t/base/response.t libwww-perl-5.825/t/base/status-old.t libwww-perl-5.825/t/base/status.t libwww-perl-5.825/t/base/ua.t libwww-perl-5.825/t/html/ libwww-perl-5.825/t/html/form-maxlength.t libwww-perl-5.825/t/html/form-multi-select.t libwww-perl-5.825/t/html/form-param.t libwww-perl-5.825/t/html/form.t libwww-perl-5.825/t/live/ libwww-perl-5.825/t/live/apache-listing.t libwww-perl-5.825/t/live/apache.t libwww-perl-5.825/t/live/https.t libwww-perl-5.825/t/live/jigsaw-auth-b.t libwww-perl-5.825/t/live/jigsaw-auth-d.t libwww-perl-5.825/t/live/jigsaw-chunk.t libwww-perl-5.825/t/live/jigsaw-md5-get.t libwww-perl-5.825/t/live/jigsaw-md5.t libwww-perl-5.825/t/live/jigsaw-neg-get.t libwww-perl-5.825/t/live/jigsaw-neg.t libwww-perl-5.825/t/live/jigsaw-te.t libwww-perl-5.825/t/local/ libwww-perl-5.825/t/local/autoload-get.t libwww-perl-5.825/t/local/autoload.t libwww-perl-5.825/t/local/chunked.t libwww-perl-5.825/t/local/get.t libwww-perl-5.825/t/local/http.t libwww-perl-5.825/t/local/protosub.t libwww-perl-5.825/t/net/ libwww-perl-5.825/t/net/cgi-bin/ libwww-perl-5.825/t/net/cgi-bin/moved libwww-perl-5.825/t/net/cgi-bin/nph-slowdata libwww-perl-5.825/t/net/cgi-bin/slowread libwww-perl-5.825/t/net/cgi-bin/test libwww-perl-5.825/t/net/cgi-bin/timeout libwww-perl-5.825/t/net/config.pl.dist libwww-perl-5.825/t/net/http-get.t libwww-perl-5.825/t/net/http-post.t libwww-perl-5.825/t/net/http-timeout.t libwww-perl-5.825/t/net/mirror.t libwww-perl-5.825/t/net/moved.t libwww-perl-5.825/t/net/proxy.t libwww-perl-5.825/t/README libwww-perl-5.825/t/robot/ libwww-perl-5.825/t/robot/rules-dbm.t libwww-perl-5.825/t/robot/rules.t libwww-perl-5.825/t/robot/ua-get.t libwww-perl-5.825/t/robot/ua.t libwww-perl-5.825/t/TEST libwww-perl-5.825/talk-to-ourself Removing previously used /Users/iphan/.cpan/build/libwww-perl-5.825 CPAN.pm: Going to build G/GA/GAAS/libwww-perl-5.825.tar.gz WARNING: LICENSE is not a known parameter. Checking if your kit is complete... 
Looks good
Warning: prerequisite Digest::MD5 0 not found.
Warning: prerequisite HTML::Tagset 0 not found.
'LICENSE' is not a known MakeMaker parameter name.
Writing Makefile for LWP
---- Unsatisfied dependencies detected during [G/GA/GAAS/libwww-perl-5.825.tar.gz] -----
    HTML::Tagset
    Digest::MD5
Shall I follow them and prepend them to the queue
of modules we are processing right now? [yes]

If I hit return, I end up in an endless loop, where it keeps asking whether I want to install HTML::Tagset and Digest::MD5.

I then updated CPAN with 'install Bundle::CPAN', and tried to build bioperl again, but that made no difference.

From jgrg at sanger.ac.uk Thu Mar 19 07:04:18 2009
From: jgrg at sanger.ac.uk (James Gilbert)
Date: Thu, 19 Mar 2009 11:04:18 +0000
Subject: [Bioperl-l] Perl developer vacancy at WTSI
Message-ID: <2E3CC952-B5E3-4E86-9ECF-E702F70A0CAB@sanger.ac.uk>

Apologies for the job spam. I have a Perl developer position open in my group here at the Sanger:

https://jobs.sanger.ac.uk/wd/plsql/wd_portal.show_job?p_web_site_id=1764&p_web_page_id=69092

James

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From hartzell at alerce.com Thu Mar 19 13:54:12 2009
From: hartzell at alerce.com (George Hartzell)
Date: Thu, 19 Mar 2009 10:54:12 -0700
Subject: [Bioperl-l] Bio::Location::{Simple,Fuzzy} and "IN-BETWEEN"
In-Reply-To:
References: <18871.1653.500578.292183@almost.alerce.com> <3C511B14-4B6D-4BBF-9562-F1378075D10E@illinois.edu>
Message-ID: <18882.34500.635043.979409@already.dhcp.gene.com>

Heikki Lehvaslaiho writes:
> George,
>
> Chris is right.
>
> You are not supposed to use fuzzy ever! It was introduced only because
> in the olden times sequencing was difficult and you knew that your
> sequence feature starts before your actual sequence.
The early > EMBL/GenBank design decision was to mark that with like "CDS <1..2344" > when you knew that your sequence did not start from the start of the > coding region. > > You annotate something always in relation to the reference sequence. > If there is something, like an insertion in Chris' example, you use > IN-BETWEEN notation where the start and end have to be adjacent > residues. There is nothing fuzzy in that location, so do not try to > add it. Thanks for the feedback. Sorry for the delay in following up, I was (yay) skiing for the weekend and then working on a presentation and digging myself back out (of work, not a snowbank). I'm kind of stuck with some of this, I need to work with the way my community thinks about it (e.g. annotating changes 6 bases into an intron by putting features on a cDNA with positions like 33+6). I'm trying to make it more logical going forward, but.... The larger thing that I'm trying to do is track observed changes in sample sequences relative to some reference (e.g. at position 77 the C changed to a T, or between position 99 and 100 there was an insert of TTTTATT, or deletion of the region between 223 and 256). These reference sequences are then aligned to a reference genome, frequently with gaps. There is also a set of transcripts/genes aligned to the genome, frequently with gaps. If everything aligned perfectly, then something that was at 8 in the target might be at 80 in the genome and then at 18 in the transcript. Ditto for 8^9 in the target, 80^81 in the genome, and 18^19 in the transcript. If there's an insertion of bases 5-10 in the target relative to the genome, then with Chris's "don't do that" solution (intentionally overstated, sorry...) none of these features could be attached/localized to the transcript. 
If I mark them as "after 5" and "before 6" with some indication that they're not really well located and then map that data up to the transcript, then I can still do things like "Tell me all of the insertions that were observed in exon 3". It's more problematic than usual to do things like applying the mutations to the transcript sequence and assuming that the resulting protein is "correct" or "real", but that's another story.

If I just use 5^6 then I have a hard time differentiating that from something that was 5^6 in the target.

One way or the other it seems like I have to carry something around out of band, designating when something's location is uncertain (I know that it occurred at a position in a related/aligned sequence such that it's after 5 and before 6 in this sequence) and keeping that separated from the concept of "IN-BETWEEN".

Is Fuzzy deprecated? It seems like it's useful for things like being before the M, or after the end of this exon, or.....

g.

From pmiguel at purdue.edu Thu Mar 19 15:11:26 2009
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu, 19 Mar 2009 15:11:26 -0400
Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output...
In-Reply-To: <2c8757af0903181530l1c3c9f3ct4dc9f8852f627dd@mail.gmail.com>
References: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com> <49C155A3.204@purdue.edu> <2c8757af0903181530l1c3c9f3ct4dc9f8852f627dd@mail.gmail.com>
Message-ID: <49C298DE.2000403@purdue.edu>

Dan Bolser wrote:
> 2009/3/18 Phillip San Miguel
>
>> Dan Bolser wrote:
>>
>>> Can someone clarify the definition of the 'gaps' column in the blast -m8
>>> output format for me?
>>>
>>> I thought that the column 'gaps' was basically the number of columns in
>>> the HSP that contains a gap character.
>>>
>> Hi Dan,
>> "gaps", to me, denotes the number of gaps. Not the total length of all the
>> gaps.
>> Just my interpretation, but given your results my guess is that whomever >> wrote blastall was thinking the way I do. >> > > > Yeah, I'll have to go look at the HSPs to confirm this... I'm just surprised > that there are not more gaps of length >1. i.e. my data (given your > interpretation) suggests that 90% of the HSPs have no gaps > length 1. > Sounds about right. Depends on how you have gap opening vs gap lengthening parameters set. -- Phillip From albezg at gmail.com Thu Mar 19 17:30:35 2009 From: albezg at gmail.com (albezg) Date: Thu, 19 Mar 2009 17:30:35 -0400 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function Message-ID: <49C2B97B.7070304@gmail.com> Hi all, I'm using bioperl-1.6. Here are some things that bother me about Bio::SimpleAlign add_seq function 1) Sequence order is counted from 0 not 1, unlike in other functions, such as get_seq_by_pos and select. 2) If position $order is already occupied then the old sequence is removed. Is there a way to insert a new sequence in the middle of an alignment? Alexandr From maj at fortinbras.us Thu Mar 19 18:00:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Mar 2009 18:00:01 -0400 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function In-Reply-To: <49C2B97B.7070304@gmail.com> References: <49C2B97B.7070304@gmail.com> Message-ID: Alexandr- I did write a patch for issue 2); I can make it available to you. It does a proper insertion of a new seq at the specified $order, pushing the rest down to make room. Not sure if the Core would like this committed? cheers Mark ----- Original Message ----- From: "albezg" To: Sent: Thursday, March 19, 2009 5:30 PM Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function > Hi all, > I'm using bioperl-1.6. Here are some things that bother me about > Bio::SimpleAlign add_seq function > > 1) Sequence order is counted from 0 not 1, unlike in other functions, > such as get_seq_by_pos and select. 
> 2) If position $order is already occupied then the old sequence is > removed. Is there a way to insert a new sequence in the middle of an > alignment? > > Alexandr > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Thu Mar 19 18:08:00 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Mar 2009 18:08:00 -0400 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function In-Reply-To: <49C2B97B.7070304@gmail.com> References: <49C2B97B.7070304@gmail.com> Message-ID: Hang on, looks like that change made it into the trunk: issue 2 is resolved in the HEAD revision of Bio::SimpleAlign (http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/Bio/SimpleAlign.pm). (revision 15533/cjfields)... ----- Original Message ----- From: "albezg" To: Sent: Thursday, March 19, 2009 5:30 PM Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function > Hi all, > I'm using bioperl-1.6. Here are some things that bother me about > Bio::SimpleAlign add_seq function > > 1) Sequence order is counted from 0 not 1, unlike in other functions, > such as get_seq_by_pos and select. > 2) If position $order is already occupied then the old sequence is > removed. Is there a way to insert a new sequence in the middle of an > alignment? 
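For anyone following point 2, here is a minimal plain-Perl sketch of the two behaviours being discussed: overwriting whatever sits at a position versus inserting there and pushing the rest down. It is only an illustration and not the Bio::SimpleAlign code; the "alignment" is just an array of names, positions are assumed to be 1-based (as in get_seq_by_pos), and the helper names put_replacing and put_inserting are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-in for an alignment: an ordered list of sequence names.
# $order is 1-based, so it is converted to a 0-based array index.

# Overwrite: whatever was stored at $order is lost.
sub put_replacing {
    my ($seqs, $name, $order) = @_;
    my @copy = @$seqs;               # leave the caller's list untouched
    $copy[$order - 1] = $name;
    return \@copy;
}

# Insert: everything from $order onward moves down one slot.
sub put_inserting {
    my ($seqs, $name, $order) = @_;
    my @copy = @$seqs;
    splice(@copy, $order - 1, 0, $name);   # length 0 means nothing is removed
    return \@copy;
}

my @aln = qw(seqA seqB seqC);
my $replaced = put_replacing(\@aln, 'seqX', 2);
my $inserted = put_inserting(\@aln, 'seqX', 2);

print "replaced: @$replaced\n";   # seqA seqX seqC
print "inserted: @$inserted\n";   # seqA seqX seqB seqC
```

The only difference between the two helpers is plain assignment versus splice with a length of 0, which is the distinction between "the old sequence is removed" and "pushing the rest down to make room".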
>
> Alexandr

From hlapp at gmx.net Thu Mar 19 18:50:58 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 19 Mar 2009 18:50:58 -0400
Subject: [Bioperl-l] Summer of Code 2009
Message-ID:

PHYLOINFORMATICS SUMMER OF CODE 2009
http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

The Phyloinformatics Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for evolutionary informatics under the mentorship of experienced developers from around the world.

The program is the participation of the US National Evolutionary Synthesis Center (NESCent) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/). Students in the program will receive a stipend from Google (and possibly more importantly, a T-shirt solely available to successful participants), and may work from their home, or home institution, for the duration of the 3-month program. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project.

NESCent is particularly targeting students interested in both evolutionary biology and software development.

Initial project ideas are listed on the website. These range from hardware acceleration for phylogenetic inference, to tree visualization within a wiki, to alignment of next-gen sequencing data, to development of a reusable ontology term markup module for biocuration. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. We also welcome novel project ideas that dovetail with student interests.
TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/ ), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students opens on Monday March 23rd and runs through Friday, April 3rd, 2009. INQUIRIES: phylosoc {at} nescent {dot} org. We strongly encourage all interested students to get in touch with us with their ideas as early on as possible. 2009 NESCent Phyloinformatics Summer of Code: http://hackathon.nescent.net/Phyloinformatics_Summer_of_Code_2009 Google Summer of Code FAQ: http://socghop.appspot.com/document/show/program/google/gsoc2009/faqs Cyberinfrastructure Traineeships (managed separately from GSoC; postdocs also eligible): http://hackathon.nescent.org/Cyberinfrastructure_Summer_Traineeships_2009 To sign up for quarterly NESCent newsletters: http://www.nescent.org/about/contact.php --------- Todd Vision and Hilmar Lapp National Evolutionary Synthesis Center http://nescent.org From maj at fortinbras.us Thu Mar 19 19:10:09 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Mar 2009 19:10:09 -0400 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function In-Reply-To: References: <49C2B97B.7070304@gmail.com> Message-ID: This is in the head revision of the trunk; happened post 1.6. I think the issue scrolled off my screen before I confirmed all tests. Looks like it was a cjfields commit; did tests go through for you? ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "albezg" ; Sent: Thursday, March 19, 2009 7:04 PM Subject: Re: [Bioperl-l] issues with Bio::SimpleAlign add_seq function >I forget, but was this committed to svn? Did it pass tests? > > I don't think it was incorporated into 1.6 ... > > chris > > On Mar 19, 2009, at 5:00 PM, Mark A. Jensen wrote: > >> Alexandr- I did write a patch for issue 2); I can make it available to you. 
>> It does >> a proper insertion of a new seq at the specified $order, pushing the rest >> down >> to make room. Not sure if the Core would like this committed? >> cheers Mark >> ----- Original Message ----- From: "albezg" >> To: >> Sent: Thursday, March 19, 2009 5:30 PM >> Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function >> >> >>> Hi all, >>> I'm using bioperl-1.6. Here are some things that bother me about >>> Bio::SimpleAlign add_seq function >>> >>> 1) Sequence order is counted from 0 not 1, unlike in other functions, >>> such as get_seq_by_pos and select. >>> 2) If position $order is already occupied then the old sequence is >>> removed. Is there a way to insert a new sequence in the middle of an >>> alignment? >>> >>> Alexandr >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at illinois.edu Thu Mar 19 19:04:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Mar 2009 18:04:24 -0500 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function In-Reply-To: References: <49C2B97B.7070304@gmail.com> Message-ID: I forget, but was this committed to svn? Did it pass tests? I don't think it was incorporated into 1.6 ... chris On Mar 19, 2009, at 5:00 PM, Mark A. Jensen wrote: > Alexandr- I did write a patch for issue 2); I can make it available > to you. It does > a proper insertion of a new seq at the specified $order, pushing the > rest down > to make room. Not sure if the Core would like this committed? 
> cheers Mark > ----- Original Message ----- From: "albezg" > To: > Sent: Thursday, March 19, 2009 5:30 PM > Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function > > >> Hi all, >> I'm using bioperl-1.6. Here are some things that bother me about >> Bio::SimpleAlign add_seq function >> >> 1) Sequence order is counted from 0 not 1, unlike in other functions, >> such as get_seq_by_pos and select. >> 2) If position $order is already occupied then the old sequence is >> removed. Is there a way to insert a new sequence in the middle of an >> alignment? >> >> Alexandr >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Mar 19 19:18:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Mar 2009 18:18:09 -0500 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function In-Reply-To: References: <49C2B97B.7070304@gmail.com> Message-ID: <7AB88593-F6E6-4D56-B7FC-0C5DEA0EB8BC@illinois.edu> ok, that's what I thought. As for (1), if there is a discrepancy btwn how methods are using indices then this is a bug (odd that it didn't pop up before). Could someone file this in bugzilla? chris On Mar 19, 2009, at 5:08 PM, Mark A. Jensen wrote: > Hang on, looks like that change made it into the trunk: issue 2 is > resolved in the HEAD revision > of Bio::SimpleAlign (http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/Bio/SimpleAlign.pm > ). > (revision 15533/cjfields)... > ----- Original Message ----- From: "albezg" > To: > Sent: Thursday, March 19, 2009 5:30 PM > Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function > > >> Hi all, >> I'm using bioperl-1.6. 
Here are some things that bother me about >> Bio::SimpleAlign add_seq function >> >> 1) Sequence order is counted from 0 not 1, unlike in other functions, >> such as get_seq_by_pos and select. >> 2) If position $order is already occupied then the old sequence is >> removed. Is there a way to insert a new sequence in the middle of an >> alignment? >> >> Alexandr >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Mar 19 19:48:15 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Mar 2009 19:48:15 -0400 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function In-Reply-To: <7AB88593-F6E6-4D56-B7FC-0C5DEA0EB8BC@illinois.edu> References: <49C2B97B.7070304@gmail.com> <7AB88593-F6E6-4D56-B7FC-0C5DEA0EB8BC@illinois.edu> Message-ID: <091BE43838B241D18635F5ABDE85EC02@NewLife> Done (#2793), and accepted by moi- MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "albezg" ; Sent: Thursday, March 19, 2009 7:18 PM Subject: Re: [Bioperl-l] issues with Bio::SimpleAlign add_seq function > ok, that's what I thought. > > As for (1), if there is a discrepancy btwn how methods are using indices then > this is a bug (odd that it didn't pop up before). Could someone file this in > bugzilla? > > chris > > On Mar 19, 2009, at 5:08 PM, Mark A. Jensen wrote: > >> Hang on, looks like that change made it into the trunk: issue 2 is resolved >> in the HEAD revision >> of Bio::SimpleAlign >> (http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/Bio/SimpleAlign.pm >> ). >> (revision 15533/cjfields)... 
>> ----- Original Message ----- From: "albezg" >> To: >> Sent: Thursday, March 19, 2009 5:30 PM >> Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function >> >> >>> Hi all, >>> I'm using bioperl-1.6. Here are some things that bother me about >>> Bio::SimpleAlign add_seq function >>> >>> 1) Sequence order is counted from 0 not 1, unlike in other functions, >>> such as get_seq_by_pos and select. >>> 2) If position $order is already occupied then the old sequence is >>> removed. Is there a way to insert a new sequence in the middle of an >>> alignment? >>> >>> Alexandr >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From maj at fortinbras.us Fri Mar 20 10:22:03 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Mar 2009 10:22:03 -0400 Subject: [Bioperl-l] bl2seq In-Reply-To: <16b96b950903200452j6fa7f6f4lf36f6461978fc43e@mail.gmail.com> References: <16b96b950903170401o56c7126bg535db7ab8088097a@mail.gmail.com> <78ACE59DC1ED489998C4E72E8ED118CA@NewLife> <16b96b950903200452j6fa7f6f4lf36f6461978fc43e@mail.gmail.com> Message-ID: Shweta- I'm not sure where the methods you are using to access the report and its hits came from (commented section below), but the replacement code below works for me. Note that the blast report is structured as follows: $report (is-a 'Bio::SearchIO::blast'), which contains $result (is-a 'Bio::Search::Result::BlastResult'), which contains $hit (is-a 'Bio::Search::Hit::BlastHit'), which contains $hsp (is-a 'Bio::Search::HSP::GenericHSP') So as a professor (David Wollkind) of mine once said, you have to peel away the players until you find the one with the ball. 
Note there are many accessors on the Bio::Search::HSP::GenericHSP (B:S:H:G) object, where the interesting stuff resides. I'll put this up as a Scrapbook entry soon.
cheers,
Mark

# Original loop, commented out:
# while(my $hsp = $report->next_feature) {
#    print "homology seq :\n", $hsp->homologySeq, "\n";
#    print "sbjctSeq :\n", $hsp->sbjctSeq, "\n";}

# Replacement: step down from result to hit to HSP
while (my $result = $report->next_result) {
    print "Query: ".$result->query_name."\n";
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print $hsp->algorithm, ": identity ", 100*$hsp->frac_identical, "\%, rank ", $hsp->rank, " (E:", $hsp->evalue, ")\n";
            # query line, match line, hit (subject) line
            printf("%7s: %s\n", "query", $hsp->query_string);
            printf("%7s: %s\n", "", $hsp->homology_string);
            printf("%7s: %s\n", "hit", $hsp->hit_string);
            print "\n";
        }
        print "\n";
    }
}

----- Original Message -----
From: shweta kagliwal
To: Mark A. Jensen
Sent: Friday, March 20, 2009 7:52 AM
Subject: Re: [Bioperl-l] bl2seq

Hello Sir,
That comment was because I inserted an extra line which is not in the original code. The problem is not with that line. I can't figure out the problem. Somewhere I read that Bio::SearchIO::blast is not supported now.
Thanks for the response.
Shweta

On Wed, Mar 18, 2009 at 9:04 AM, Mark A. Jensen wrote:
Looks like the "use Bio::SearchIO::blast;" statement is commented out. Put it back in and give it a try.
MAJ

----- Original Message -----
From: "shweta kagliwal"
To:
Sent: Tuesday, March 17, 2009 7:01 AM
Subject: [Bioperl-l] bl2seq

I want to carry out pairwise blast using bl2seq program in bioperl. I have installed bioperl-1.5.9. I have also installed standalone blast from ncbi ftp in my perl/bin folder. But when I run the attached script I get the following error-

ref: cant locate method "next feature" via package "Bio:SearchIO:blast" at bl2seq1.pl line 20, line 1.
Error removing C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\qPx195u5TQ at c:/perl/site/lib/File/Temp.pm line 890.

I can't understand the error. Please help me.
-------------------------------------------------------------------------------- _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.bolser at gmail.com Fri Mar 20 13:13:59 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 20 Mar 2009 17:13:59 +0000 Subject: [Bioperl-l] Fwd: [blast-help] Fwd: Question about the definition of 'gaps' in blast -m8 output... In-Reply-To: <3FFFC788-E95D-475E-B53D-8C86F2FA8F12@ncbi.nlm.nih.gov> References: <20090319142301.46C44188039@mail2.ncbi.nlm.nih.gov> <8087412E-ABB0-455C-8B2B-2119488B1950@ncbi.nlm.nih.gov> <3FFFC788-E95D-475E-B53D-8C86F2FA8F12@ncbi.nlm.nih.gov> Message-ID: <2c8757af0903201013t559b25e6iafb972d9e91f77b1@mail.gmail.com> Here is what the man from the NCBI said: ---------- Forwarded message ---------- From: Peter Cooper Date: 2009/3/20 Subject: Re: [blast-help] Fwd: Question about the definition of 'gaps' in blast -m8 output... To: dan.bolser at gmail.com Cc: blast-help at ncbi.nlm.nih.gov Hello, The number reported tin the -m 8 output is the number of gap openings. This will only equal the number of gap characters if the length of each gap is 1. Peter ------------------------------- Peter S. Cooper, Ph.D. Public Services The National Center for Biotechnology Information 301-435-5951 On Mar 19, 2009, at 12:04 PM, romiti wrote: > > > Begin forwarded message: > >> From: User Services Service Account >> Date: March 19, 2009 10:23:01 AM EDT >> To: romiti at ncbi.nlm.nih.gov >> Subject: Question about the definition of 'gaps' in blast -m8 output... >> Reply-To: User Services Service Account >> >> >> ------------- Begin Forwarded Message ------------- >> >> Date: Wed, 18 Mar 2009 19:31:01 +0000 >> Subject: Question about the definition of 'gaps' in blast -m8 output... 
>> From: Dan Bolser >> To: bbb at bioinformatics.org, bioperl-l at lists.open-bio.org, info at ncbi.nlm.nih.gov >> >> >> Hi, >> >> I'm sure this question comes up again and again, but searching the BioPerl >> mailing list didn't turn up any answers (to the second question). Basically >> I want to manually merge HSP's into 'contigious hits', and I want to look at >> the effect of various parameters on an algorithm to do this. This task >> prompted me to run a 'sanity check' on the blast data that I had, and I >> found that this check fails to fulfil my expectation of the data. This means >> that either I don't understand the data or the results are buggy. >> >> Can someone clarify the definition of the 'gaps' column in the blast -m8 >> output format for me? >> >> I thought that the column 'gaps' was basically the number of columns in the >> HSP that contains a gap character. To test this on my data, I checked the >> following equality: >> >> GAPS + 2 = >> LENGTH - abs(QUERY_END - QUERY_START) + LENGTH - abs(HIT_END - HIT_START) >> >> >> This says that the number of GAPS should be equal to the difference between >> the LENGTH of the alignment minus the distance between the START and END >> point on either the QUERY or the HIT (+2 for the 'off by one' error >> introduced by the two END-START calculations). >> >> e.g. >> >> 10-> MMMMMMMM**MMMM*M <-22 >> |||| || | | | >> 20-> MMMM**MMMMM*M*MM <-31 >> >> >> where MISMATCHES = 7, LENGTH = 16, QUERY_END - QUERY_START = 12, and HIT_END >> - HIT_START = 11. The formula gives: >> >> 7+2(9) = 16-12(4) + 16-11(5) >> >> >> The formula is correct for 11,282 out of 12,745 HSPs in my dataset (89%), >> however it fails for 1,463 cases (11%). Each of these cases has a value of >> MISMATCHES smaller than calculated by the formula. The difference is usually >> 1 or 2, but is seen to go as high as 96, and scales roughly linearly with >> the size of GAPS. >> >> >> Did I misunderstand what the value of GAPS is supposed to mean? 
How come it >> does apparently mean what I thought for so much of the data? >> >> >> Thanks very much for any help on the above. >> >> Dan. >> >> ------------- End Forwarded Message ------------- From dan.bolser at gmail.com Fri Mar 20 13:23:00 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 20 Mar 2009 17:23:00 +0000 Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output... In-Reply-To: <49C298DE.2000403@purdue.edu> References: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com> <49C155A3.204@purdue.edu> <2c8757af0903181530l1c3c9f3ct4dc9f8852f627dd@mail.gmail.com> <49C298DE.2000403@purdue.edu> Message-ID: <2c8757af0903201023x5ec6fb21ue3cd0c090e604ef9@mail.gmail.com> 2009/3/19 Phillip San Miguel : > Dan Bolser wrote: >> >> 2009/3/18 Phillip San Miguel >> >> >>> >>> Dan Bolser wrote: >>> >>> >>>> >>>> Can someone clarify the definition of the 'gaps' column in the blast -m8 >>>> output format for me? >>>> >>>> I thought that the column 'gaps' was basically the number of columns in >>>> the >>>> HSP that contains a gap character. >>>> >>>> >>> >>> Hi Dan, >>> "gaps", to me, denotes the number of gaps. Not the total length of all >>> the >>> gaps. >>> Just my interpretation, but given your results my guess is that whomever >>> wrote blastall was thinking the way I do. >>> >> >> >> Yeah, I'll have to go look at the HSPs to confirm this... I'm just >> surprised >> that there are not more gaps of length >1. i.e. my data (given your >> interpretation) suggests that 90% of the HSPs have no gaps > length 1. >> > > Sounds about right. Depends on how you have gap opening vs gap lengthening > parameters set. I see. I thought that by default extension was less than opening, so I had expected there to be more gaps of length >1 ... anyway... where can I read more about selecting parameters for certain tasks? 
Currently I'm blasting tomato against potato sequence, and the two organisms are known to be 'highly syntenic' - I'm just not sure how that translates into how I should set the parameters. I'm after large alignments of large regions of the chromosome. My thinking is to just run through the list of HSPs and merge based on gap / window size (dynamic programming style) - that way I can play with the set of HSPs that I have, and look at the effect of different settings, then I can just globally align the matching regions using SW (if I need to). Does that sound reasonable, or is using the default settings just dumb? Cheers, Dan. > > -- > Phillip > From David.Messina at sbc.su.se Fri Mar 20 14:14:00 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Mar 2009 19:14:00 +0100 Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output... In-Reply-To: <2c8757af0903201023x5ec6fb21ue3cd0c090e604ef9@mail.gmail.com> References: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com> <49C155A3.204@purdue.edu> <2c8757af0903181530l1c3c9f3ct4dc9f8852f627dd@mail.gmail.com> <49C298DE.2000403@purdue.edu> <2c8757af0903201023x5ec6fb21ue3cd0c090e604ef9@mail.gmail.com> Message-ID: <628aabb70903201114n1648ec4fp55fef18143758316@mail.gmail.com> Hey Dan, > where can I read more about selecting parameters for certain tasks? > I like the Korf, Yandell, and Bedell BLAST bookfrom O'Reilly. For your tomato/potato, I would look at some of the specialized genomic aligners, for example LAGAN or AVID . Dave From cjfields at illinois.edu Fri Mar 20 15:06:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Mar 2009 14:06:05 -0500 Subject: [Bioperl-l] Fwd: [blast-help] Fwd: Question about the definition of 'gaps' in blast -m8 output... 
In-Reply-To: <2c8757af0903201013t559b25e6iafb972d9e91f77b1@mail.gmail.com>
References: <20090319142301.46C44188039@mail2.ncbi.nlm.nih.gov> <8087412E-ABB0-455C-8B2B-2119488B1950@ncbi.nlm.nih.gov> <3FFFC788-E95D-475E-B53D-8C86F2FA8F12@ncbi.nlm.nih.gov> <2c8757af0903201013t559b25e6iafb972d9e91f77b1@mail.gmail.com>
Message-ID: <18EF821F-77A7-4E4F-AD2F-4B581D3B09DE@illinois.edu>

We have a way to check for both within HSP objects, I believe.

1) gaps() is documented to return the number of gap characters within the query/hit/total

2) seq_inds('query', 'gap') (or 'hit' in place of 'query') returns the gap positions, with the position repeated for every gap character I believe, so the number of positions returned should be similar to gaps(). However, to reduce that down to just the gap positions (no repeats) use the collapse flag:

seq_inds('query', 'gap', 1)

chris

On Mar 20, 2009, at 12:13 PM, Dan Bolser wrote:

> Here is what the man from the NCBI said:
>
>
> ---------- Forwarded message ----------
> From: Peter Cooper
> Date: 2009/3/20
> Subject: Re: [blast-help] Fwd: Question about the definition of 'gaps'
> in blast -m8 output...
> To: dan.bolser at gmail.com
> Cc: blast-help at ncbi.nlm.nih.gov
>
>
> Hello,
>
> The number reported tin the -m 8 output is the number of gap
> openings. This will only equal the number of gap characters if the
> length of each gap is 1.
>
>
> Peter
> -------------------------------
> Peter S. Cooper, Ph.D.
> Public Services
> The National Center for Biotechnology Information
> 301-435-5951
>
>
> On Mar 19, 2009, at 12:04 PM, romiti wrote:
>
>>
>>
>> Begin forwarded message:
>>
>>> From: User Services Service Account
>>> Date: March 19, 2009 10:23:01 AM EDT
>>> To: romiti at ncbi.nlm.nih.gov
>>> Subject: Question about the definition of 'gaps' in blast -m8
>>> output...
>>> Reply-To: User Services Service Account >>> >>> >>> ------------- Begin Forwarded Message ------------- >>> >>> Date: Wed, 18 Mar 2009 19:31:01 +0000 >>> Subject: Question about the definition of 'gaps' in blast -m8 >>> output... >>> From: Dan Bolser >>> To: bbb at bioinformatics.org, bioperl-l at lists.open-bio.org, info at ncbi.nlm.nih.gov >>> >>> >>> Hi, >>> >>> I'm sure this question comes up again and again, but searching the >>> BioPerl >>> mailing list didn't turn up any answers (to the second question). >>> Basically >>> I want to manually merge HSP's into 'contigious hits', and I want >>> to look at >>> the effect of various parameters on an algorithm to do this. This >>> task >>> prompted me to run a 'sanity check' on the blast data that I had, >>> and I >>> found that this check fails to fulfil my expectation of the data. >>> This means >>> that either I don't understand the data or the results are buggy. >>> >>> Can someone clarify the definition of the 'gaps' column in the >>> blast -m8 >>> output format for me? >>> >>> I thought that the column 'gaps' was basically the number of >>> columns in the >>> HSP that contains a gap character. To test this on my data, I >>> checked the >>> following equality: >>> >>> GAPS + 2 = >>> LENGTH - abs(QUERY_END - QUERY_START) + LENGTH - abs(HIT_END - >>> HIT_START) >>> >>> >>> This says that the number of GAPS should be equal to the >>> difference between >>> the LENGTH of the alignment minus the distance between the START >>> and END >>> point on either the QUERY or the HIT (+2 for the 'off by one' error >>> introduced by the two END-START calculations). >>> >>> e.g. >>> >>> 10-> MMMMMMMM**MMMM*M <-22 >>> |||| || | | | >>> 20-> MMMM**MMMMM*M*MM <-31 >>> >>> >>> where MISMATCHES = 7, LENGTH = 16, QUERY_END - QUERY_START = 12, >>> and HIT_END >>> - HIT_START = 11. 
The formula gives: >>> >>> 7+2(9) = 16-12(4) + 16-11(5) >>> >>> >>> The formula is correct for 11,282 out of 12,745 HSPs in my dataset >>> (89%), >>> however it fails for 1,463 cases (11%). Each of these cases has a >>> value of >>> MISMATCHES smaller than calculated by the formula. The difference >>> is usually >>> 1 or 2, but is seen to go as high as 96, and scales roughly >>> linearly with >>> the size of GAPS. >>> >>> >>> Did I misunderstand what the value of GAPS is supposed to mean? >>> How come it >>> does apparently mean what I thought for so much of the data? >>> >>> >>> Thanks very much for any help on the above. >>> >>> Dan. >>> >>> ------------- End Forwarded Message ------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Mar 20 15:10:06 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Mar 2009 14:10:06 -0500 Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output... In-Reply-To: <2c8757af0903201023x5ec6fb21ue3cd0c090e604ef9@mail.gmail.com> References: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com> <49C155A3.204@purdue.edu> <2c8757af0903181530l1c3c9f3ct4dc9f8852f627dd@mail.gmail.com> <49C298DE.2000403@purdue.edu> <2c8757af0903201023x5ec6fb21ue3cd0c090e604ef9@mail.gmail.com> Message-ID: <74867D37-49BD-4346-83ED-5190B645763C@illinois.edu> On Mar 20, 2009, at 12:23 PM, Dan Bolser wrote: > 2009/3/19 Phillip San Miguel : >> Dan Bolser wrote: >>> >>> 2009/3/18 Phillip San Miguel >>> >>> >>>> >>>> Dan Bolser wrote: >>>> >>>> >>>>> >>>>> Can someone clarify the definition of the 'gaps' column in the >>>>> blast -m8 >>>>> output format for me? >>>>> >>>>> I thought that the column 'gaps' was basically the number of >>>>> columns in >>>>> the >>>>> HSP that contains a gap character. 
>>>>> >>>>> >>>> >>>> Hi Dan, >>>> "gaps", to me, denotes the number of gaps. Not the total length >>>> of all >>>> the >>>> gaps. >>>> Just my interpretation, but given your results my guess is that >>>> whomever >>>> wrote blastall was thinking the way I do. >>>> >>> >>> >>> Yeah, I'll have to go look at the HSPs to confirm this... I'm just >>> surprised >>> that there are not more gaps of length >1. i.e. my data (given your >>> interpretation) suggests that 90% of the HSPs have no gaps > >>> length 1. >>> >> >> Sounds about right. Depends on how you have gap opening vs gap >> lengthening >> parameters set. > > I see. I thought that by default extension was less than opening, so I > had expected there to be more gaps of length >1 ... anyway... where > can I read more about selecting parameters for certain tasks? > Currently I'm blasting tomato against potato sequence, and the two > organisms are known to be 'highly syntenic' - I'm just not sure how > that translates into how I should set the parameters. I'm after large > alignments of large regions of the chromosome. My thinking is to just > run through the list of HSPs and merge based on gap / window size > (dynamic programming style) - that way I can play with the set of HSPs > that I have, and look at the effect of different settings, then I can > just globally align the matching regions using SW (if I need to). Does > that sound reasonable, or is using the default settings just dumb? > > Cheers, > Dan. The zebrafinch group here is using BLAT for some of their work, though I would suggest AVID, LAGAN, or maybe even MUMmer for this purpose (no sure how the latter performs compared to the others, we have used it for archaeal whole-genome alignments but nothing larger). 
chris From jay at jays.net Fri Mar 20 16:49:49 2009 From: jay at jays.net (Jay Hannah) Date: Fri, 20 Mar 2009 15:49:49 -0500 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function In-Reply-To: <091BE43838B241D18635F5ABDE85EC02@NewLife> References: <49C2B97B.7070304@gmail.com> <7AB88593-F6E6-4D56-B7FC-0C5DEA0EB8BC@illinois.edu> <091BE43838B241D18635F5ABDE85EC02@NewLife> Message-ID: <49C4016D.8030807@jays.net> Mark A. Jensen wrote: > Done (#2793), and accepted by moi- > MAJ balin and I hacked out a patch in IRC today, and attached it to the ticket: http://bugzilla.open-bio.org/show_bug.cgi?id=2793 Cheers, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From albezg at gmail.com Fri Mar 20 17:09:04 2009 From: albezg at gmail.com (albezg) Date: Fri, 20 Mar 2009 17:09:04 -0400 Subject: [Bioperl-l] Problems after changing display_id of a sequence in SimpleAlign In-Reply-To: References: <49C2B97B.7070304@gmail.com> Message-ID: <49C405F0.5050100@gmail.com> Hi all, I'm trying to change FASTA header(display_id) for a sequence in an alignment(SimpleAlign). There are no issues when I print it, however when I use AlignIO to write the alignment to a FASTA file, it does not work. Is this behavior intended? Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug The error: ------------- EXCEPTION ------------- MSG: No sequence with name [1/1-11] STACK Bio::SimpleAlign::displayname /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 STACK Bio::AlignIO::fasta::write_aln /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 STACK toplevel ./demo.pl:14 ------------------------------------- Alexandr From ahabnar at yahoo.fr Fri Mar 20 18:16:26 2009 From: ahabnar at yahoo.fr (nary raveloson) Date: Fri, 20 Mar 2009 22:16:26 +0000 (GMT) Subject: [Bioperl-l] problem with bioperl-ext-1.4 Message-ID: <161899.73648.qm@web26706.mail.ukl.yahoo.com> Hi, I'm trying to install bioperl-ext-1.4 on windows vista. 
It's seem's good on begining with "perl makefile" (lookks good) but on the end i have an error like "BEGIN failed--compilation aborted at ./Makefile.pl line 1.". When i edit the file, at line 1 I have this: use ExtUtils ::MakeMaker; I need your help. I don't know what to do and I do not have It on Active Perl package? when I search. Thanks a lot Nary R From Kevin.M.Brown at asu.edu Fri Mar 20 19:15:12 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 20 Mar 2009 16:15:12 -0700 Subject: [Bioperl-l] problem with bioperl-ext-1.4 In-Reply-To: <161899.73648.qm@web26706.mail.ukl.yahoo.com> References: <161899.73648.qm@web26706.mail.ukl.yahoo.com> Message-ID: <1A4207F8295607498283FE9E93B775B405DC7997@EX02.asurite.ad.asu.edu> 1.4 is several years old and no longer fully functional (websites/programs have changed). Try installing 1.6 http://www.bioperl.org/wiki/Installing_BioPerl > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > nary raveloson > Sent: Friday, March 20, 2009 3:16 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] problem with bioperl-ext-1.4 > > Hi, > > I'm trying to install bioperl-ext-1.4 on windows vista. It's > seem's good on begining with "perl makefile" (lookks good) > but on the end i have an error like "BEGIN > failed--compilation aborted at ./Makefile.pl line 1.". When i > edit the file, at line 1 I have this: use ExtUtils ::MakeMaker; > > I need your help. I don't know what to do and I do not have > It on Active Perl package? when I search. > > Thanks a lot > > Nary R > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lgwilson at pennantsystems.com Fri Mar 20 21:32:24 2009 From: lgwilson at pennantsystems.com (Lori G. 
Wilson) Date: Fri, 20 Mar 2009 18:32:24 -0700 Subject: [Bioperl-l] problem with bioperl-ext-1.4 In-Reply-To: <161899.73648.qm@web26706.mail.ukl.yahoo.com> References: <161899.73648.qm@web26706.mail.ukl.yahoo.com> Message-ID: <49C443A8.4010600@pennantsystems.com> Hi, Nary, Did you try running cpan at the command line and then install ExtUtils::MakeMaker? Lori nary raveloson wrote: >Hi, > >I'm trying to install bioperl-ext-1.4 on windows vista. It's seem's good on begining with "perl makefile" (lookks good) but on the end i have an error like "BEGIN failed--compilation aborted at ./Makefile.pl line 1.". When i edit the file, at line 1 I have this: use ExtUtils ::MakeMaker; > >I need your help. I don't know what to do and I do not have It on Active Perl package when I search. > >Thanks a lot > >Nary R > > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at illinois.edu Sat Mar 21 12:43:28 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Mar 2009 11:43:28 -0500 Subject: [Bioperl-l] problem with bioperl-ext-1.4 In-Reply-To: <49C443A8.4010600@pennantsystems.com> References: <161899.73648.qm@web26706.mail.ukl.yahoo.com> <49C443A8.4010600@pennantsystems.com> Message-ID: bioperl-ext was not released with BioPerl 1.6 (only core, run, db, and network were released). It still works but much of it's functionality will be replaced with BioLib-related functionality: http://biolib.open-bio.org/wiki/Main_Page I don't think bioperl-ext ever worked under Windows except under Cygwin. chris On Mar 20, 2009, at 8:32 PM, Lori G. Wilson wrote: > Hi, Nary, > > Did you try running cpan at the command line and then install > ExtUtils::MakeMaker? > > Lori > > nary raveloson wrote: > >> Hi, >> >> I'm trying to install bioperl-ext-1.4 on windows vista. 
It's seem's >> good on begining with "perl makefile" (lookks good) but on the end >> i have an error like "BEGIN failed--compilation aborted at ./ >> Makefile.pl line 1.". When i edit the file, at line 1 I have this: >> use ExtUtils ::MakeMaker; >> >> I need your help. I don't know what to do and I do not have It on >> Active Perl package when I search. >> >> Thanks a lot >> >> Nary R >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sanjay.harke at gmail.com Sun Mar 22 05:09:29 2009 From: sanjay.harke at gmail.com (Sanjay Harke) Date: Sun, 22 Mar 2009 14:39:29 +0530 Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 11 In-Reply-To: References: Message-ID: <31bb4380903220209q4d66bb1cw1d1c835fbfe200e1@mail.gmail.com> Dear friends, How Bioperl work with Mysql local database. Is their any tutorials available for it. Actually i want to connect my database through Bioperl which in Mysql. So,kindly help me out for this. sanjay From sanjay.harke at gmail.com Sun Mar 22 05:10:54 2009 From: sanjay.harke at gmail.com (Sanjay Harke) Date: Sun, 22 Mar 2009 14:40:54 +0530 Subject: [Bioperl-l] Regarding help for Bioperl with Local MYsql database Message-ID: <31bb4380903220210o378fceaax64d0ad4da6581cfa@mail.gmail.com> Dear friends, How Bioperl work with Mysql local database. Is their any tutorials available for it. Actually i want to connect my database through Bioperl which in Mysql. So,kindly help me out for this. 
sanjay From hlapp at gmx.net Sun Mar 22 11:37:38 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 22 Mar 2009 11:37:38 -0400 Subject: [Bioperl-l] Regarding help for Bioperl with Local MYsql database In-Reply-To: <31bb4380903220210o378fceaax64d0ad4da6581cfa@mail.gmail.com> References: <31bb4380903220210o378fceaax64d0ad4da6581cfa@mail.gmail.com> Message-ID: <5FA5574B-236C-4D7D-9150-6BC275E00FCE@gmx.net> BioPerl has tight bindings for sequences, features, annotation, and ontology terms to BioSQL. BioSQL does support MySQL. There is a presentation given to BOSC 6 years ago, but the content is still pretty much valid (http://dx.doi.org/10.1038/npre.2007.1233.1). There is additional documentation in BioSQL (look at the doc/ directory). However, there isn't a tutorial for BioPerl (yet). -hilmar On Mar 22, 2009, at 5:10 AM, Sanjay Harke wrote: > Dear friends, > > How Bioperl work with Mysql local database. > Is their any tutorials available for it. > > Actually i want to connect my database through Bioperl which in Mysql. > So,kindly help me out for this. > > sanjay > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jovel_juan at hotmail.com Sun Mar 22 21:38:52 2009 From: jovel_juan at hotmail.com (Juan Jovel) Date: Mon, 23 Mar 2009 01:38:52 +0000 Subject: [Bioperl-l] Regarding help for Bioperl with Local MYsql database In-Reply-To: <5FA5574B-236C-4D7D-9150-6BC275E00FCE@gmx.net> References: <31bb4380903220210o378fceaax64d0ad4da6581cfa@mail.gmail.com> <5FA5574B-236C-4D7D-9150-6BC275E00FCE@gmx.net> Message-ID: Hello Everybody! My name is JUAN JOVEL, a molecular biologist and cell biologist at The Scripps Research Institute (TSRI), in La Jolla, California. 
I have used PERL to characterize small RNAs derived from virus genomes. With the advent of pyrosequencing, more and more siRNA databases are being generated. I am planning to publish a paper explaining some methodologies to analyze those siRNAs by using PERL and GBROWSE. While I understand PERL programming, and can solve my own needs, I am far from being an expert. On the other hand, I have a very solid knowledge of siRNA biology in plants and animals. Thus, I am looking for one or a couple of partners to write the paper with me (I will lead this and be the corresponding author), but if somebody is interested in helping me analyze a database that we would use as a case study, I would be happy to put your name in the publication. The original idea is to publish the paper in the journal RNA (impact factor ~5.5).

If interested, and you have good skills in Graphic Programming with PERL (and/or GBROWSE) and database (NCBI) searching, please contact me and we can start planning the paper together.

Best regards,
JUAN

From jovel_juan at hotmail.com Sun Mar 22 22:02:56 2009
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Mon, 23 Mar 2009 02:02:56 +0000
Subject: [Bioperl-l] Invitation/Proposal
In-Reply-To: <31bb4380903220210o378fceaax64d0ad4da6581cfa@mail.gmail.com>
References: <31bb4380903220210o378fceaax64d0ad4da6581cfa@mail.gmail.com>
Message-ID:

Hello everybody!

My name is JUAN JOVEL, a molecular biologist and cell biologist at The Scripps Research Institute (TSRI), in La Jolla, California. I have used PERL to characterize small RNAs derived from viruses. With the advent of pyrosequencing, more and more viral siRNAs are being generated. I am planning to write a paper describing some methodologies to analyze such databases with PERL, BLAST and GBROWSE.
While I understand PERL programming and can solve my own needs, I am far from being an expert. On the other hand, I have a very solid understanding of siRNA biology in plants and animals. I am looking for one or two partners to write the paper with me (I will be first author and corresponding author). If you have good skills in Graphic Programming with PERL, BLAST and GBROWSE, please contact me and we can start planning this effort together, and of course, you will be a coauthor of the paper. The original idea is to submit the paper to the Journal RNA, which has an impact factor of ~5.5.

If you are interested, please let me know about your skills and area in which you are working now, to be able to select the most suitable candidates.

Many thanks in advance and best wishes,
JUAN

From jason at bioperl.org Mon Mar 23 00:47:21 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Mar 2009 21:47:21 -0700
Subject: [Bioperl-l] Problems after changing display_id of a sequence in SimpleAlign
In-Reply-To: <49C405F0.5050100@gmail.com>
References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com>
Message-ID: <27E83B72-3491-4131-AA51-8862F0324061@bioperl.org>

It has to do with the way the names are kept to handle the idea that there can be multiple sequences with same id (the ID is stored as ID/START-END). So changing the ID means it cannot be found without updating the hash with the names. Currently it is generally easier to remove the sequence and re-add it. We can add a rename function I suppose -- there should be a way to do it via the $aln object but I don't think that API is currently written.
$aln->remove_seq($seq); $seq->display_id("1"); $aln->add_seq($seq); On Mar 20, 2009, at 2:09 PM, albezg wrote: > Hi all, > I'm trying to change FASTA header(display_id) for a sequence in an > alignment(SimpleAlign). > > There are no issues when I print it, however when I use AlignIO to > write > the alignment to a FASTA file, it does not work. Is this behavior > intended? > > Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug > > The error: > ------------- EXCEPTION ------------- > MSG: No sequence with name [1/1-11] > STACK Bio::SimpleAlign::displayname > /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 > STACK Bio::AlignIO::fasta::write_aln > /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 > STACK toplevel ./demo.pl:14 > ------------------------------------- > > Alexandr > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From maj at fortinbras.us Mon Mar 23 01:51:18 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Mar 2009 01:51:18 -0400 Subject: [Bioperl-l] issues with Bio::SimpleAlign add_seq function In-Reply-To: <49C4016D.8030807@jays.net> References: <49C2B97B.7070304@gmail.com> <7AB88593-F6E6-4D56-B7FC-0C5DEA0EB8BC@illinois.edu> <091BE43838B241D18635F5ABDE85EC02@NewLife> <49C4016D.8030807@jays.net> Message-ID: <567CB5B9D31A4542B0E7C6046145D051@NewLife> All hacked in. Thanks! MAJ ----- Original Message ----- From: "Jay Hannah" To: "Mark A. Jensen" Cc: "Chris Fields" ; "albezg" ; Sent: Friday, March 20, 2009 4:49 PM Subject: Re: [Bioperl-l] issues with Bio::SimpleAlign add_seq function > Mark A. 
Jensen wrote: >> Done (#2793), and accepted by moi- >> MAJ > > balin and I hacked out a patch in IRC today, and attached it to the ticket: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2793 > > Cheers, > > j > http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah > > > > > From shwetakagliwal at gmail.com Mon Mar 23 05:56:14 2009 From: shwetakagliwal at gmail.com (Shweta Kagliwal) Date: Mon, 23 Mar 2009 05:56:14 -0400 Subject: [Bioperl-l] Shweta Kagliwal sent you a Friend Request on Yaari Message-ID: Shweta Kagliwal wants you to join Yaari! Is Shweta your friend? Yes, Shweta is my friend! No, Shweta isn't my friend. Please respond or Shweta may think you said no :( Thanks, The Yaari Team ____ If you prefer not to receive this email tell us here. If you have any concerns regarding the content of this message, please email abuse at yaari.com. Yaari LLC, 358 Angier Ave, Atlanta, GA 30312 YaariGEO817QKI466UKS514PJM315 From maj at fortinbras.us Mon Mar 23 09:56:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Mar 2009 09:56:40 -0400 Subject: [Bioperl-l] bl2seq In-Reply-To: <16b96b950903170401o56c7126bg535db7ab8088097a@mail.gmail.com> References: <16b96b950903170401o56c7126bg535db7ab8088097a@mail.gmail.com> Message-ID: <8236A69BEFF2476693D5D795A046621C@NewLife> See the new scrap at http://www.bioperl.org/wiki/Parsing_BLAST_HSPs cheers, MAJ ----- Original Message ----- From: "shweta kagliwal" To: Sent: Tuesday, March 17, 2009 7:01 AM Subject: [Bioperl-l] bl2seq >I want to carry out pairwise blast using bl2seq program in bioperl. I have > installed bioperl-1.5.9. > I have also installed standalone blast from ncbi ftp in my perl/bin folder. > But when I run the attached script I get the following error- > > ref: > cant locate method "next feature" via package "Bio:SearchIO:blast" at > bl2seq1.pl line 20, line 1. > Error removing C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\qPx195u5TQ at > c:/perl/site/lib/File/Temp.pm line 890. > > > I cant get the error. 
Please help me. > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Mar 23 09:45:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Mar 2009 09:45:12 -0400 Subject: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent Message-ID: <89B32791A8144B78832125F64530F9D5@NewLife> Hi all-- With apologies to Jason, I took the liberty of throwing out an idea or two re the BioPerl coalescent implementation as a NESCent Summer of Code project. The underlying motivation is to make the module more immediately useful to infectious disease evolutionists, and also to lay a foundation for a coalescent API (and who couldn't use another coalescent API?). The main conceptual addition would be writing routines to implement the so-called serial coalescent, which is a natural modification of Hudson's algorithm that allows for specification of the time of the sample, as well as the size and mutation rate. Rather than reproducing the entire screed, I direct interested folks to the following https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009#Building_out_BioPerl_PopGen::Simulation_modules_for_infectious_disease If this is interesting to you (as a student or as a co-mentor), please reply here, respond in the phylosoc at nescent.org list, or contact me directly. 
cheers all- Mark From jason at bioperl.org Mon Mar 23 12:54:34 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 23 Mar 2009 09:54:34 -0700 Subject: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent In-Reply-To: <89B32791A8144B78832125F64530F9D5@NewLife> References: <89B32791A8144B78832125F64530F9D5@NewLife> Message-ID: <4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> No apologies necessary, this is open source so I am delighted to have others work on this. You might want to recognize that the Perl implementation is slow relative to the C code so at some point for practical utility we may want to also explore an Inline::C component. -jason On Mar 23, 2009, at 6:45 AM, Mark A. Jensen wrote: > Hi all-- > > With apologies to Jason, I took the liberty of throwing out an > idea or two re the BioPerl coalescent implementation > as a NESCent Summer of Code project. The underlying > motivation is to make the module more immediately useful > to infectious disease evolutionists, and also to lay a foundation > for a coalescent API (and who couldn't use another coalescent > API?). The main conceptual addition would > be writing routines to implement the so-called serial coalescent, > which is a natural modification of Hudson's algorithm that > allows for specification of the time of the sample, as well > as the size and mutation rate. > > Rather than reproducing the entire screed, I direct interested > folks to the following > > https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 > #Building_out_BioPerl_PopGen > ::Simulation_modules_for_infectious_disease > > If this is interesting to you (as a student or as a co-mentor), > please reply here, > respond in the phylosoc at nescent.org list, or contact me directly. 
> > cheers all- > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From hlapp at gmx.net Mon Mar 23 13:18:38 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 23 Mar 2009 13:18:38 -0400 Subject: [Bioperl-l] sanjay Harke sent you a Friend Request on Yaari In-Reply-To: <9178a7ff1235e5c29b13664b6cf40024@localhost.localdomain> References: <9178a7ff1235e5c29b13664b6cf40024@localhost.localdomain> Message-ID: Just to make it clear for everyone - these messages are spam and are not tolerated. Sanjay and Shweta - I have removed you from this mailing list. You can re-subscribe, but if you continue to show that you don't understand that content other than bioperl-related is inappropriate to send to more than 1,500 people who haven't asked for this, we will ban you from this list. -hilmar On Mar 23, 2009, at 12:01 AM, sanjay Harke wrote: > sanjay Harke wants you to join Yaari! > > Is sanjay your friend? > > Yes, sanjay is my friend! No, sanjay isn't my friend. > > Please respond or sanjay may think you said no :( > > Thanks, > The Yaari Team > ____ > If you prefer not to receive this email tell us here. If you have any concerns > regarding the content of this message, please email abuse at yaari.com. 
> Yaari LLC, 358 Angier Ave, Atlanta, GA 30312 > > > YaariOEI268XYI236PWI249DSQ546 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From abhishek.vit at gmail.com Mon Mar 23 13:30:53 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 23 Mar 2009 13:30:53 -0400 Subject: [Bioperl-l] sanjay Harke sent you a Friend Request on Yaari In-Reply-To: References: <9178a7ff1235e5c29b13664b6cf40024@localhost.localdomain> Message-ID: Thanks Hilmar. I was too annoyed to see people abusing such mailing list. Best, -Abhi On Mon, Mar 23, 2009 at 1:18 PM, Hilmar Lapp wrote: > Just to make it clear for everyone - these messages are spam and are not > tolerated. > > Sanjay and Shweta - I have removed you from this mailing list. You can > re-subscribe, but if you continue to show that you don't understand that > content other than bioperl-related is inappropriate to send to more than > 1,500 people who haven't asked for this, we will ban you from this list. > > -hilmar > > On Mar 23, 2009, at 12:01 AM, sanjay Harke wrote: > > sanjay Harke wants you to join Yaari! >> >> Is sanjay your friend? >> >> Yes, >> sanjay is my friend! No, >> sanjay isn't my friend. >> >> Please respond or sanjay may think you said no :( >> >> Thanks, >> The Yaari Team >> ____ >> If you prefer not to receive this email tell us here. >> If you have any concerns >> regarding the content of this message, please email abuse at yaari.com. 
>> Yaari LLC, 358 Angier Ave, Atlanta, GA 30312 >> >> >> YaariOEI268XYI236PWI249DSQ546 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Mon Mar 23 14:27:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Mar 2009 14:27:12 -0400 Subject: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent In-Reply-To: <4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> References: <89B32791A8144B78832125F64530F9D5@NewLife> <4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> Message-ID: Absolutely-- an XS implementation of the guts (at least) is one of the overall goals. I'm still new to the C <->Perl world, so links to info in that direction would be very much appreciated- cheers MAJ ----- Original Message ----- From: "Jason Stajich" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Monday, March 23, 2009 12:54 PM Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent > No apologies necessary, this is open source so I am delighted to have others > work on this. You might want to recognize that the Perl implementation is > slow relative to the C code so at some point for practical utility we may > want to also explore an Inline::C component. > > -jason > On Mar 23, 2009, at 6:45 AM, Mark A. Jensen wrote: > >> Hi all-- >> >> With apologies to Jason, I took the liberty of throwing out an >> idea or two re the BioPerl coalescent implementation >> as a NESCent Summer of Code project. 
The underlying >> motivation is to make the module more immediately useful >> to infectious disease evolutionists, and also to lay a foundation >> for a coalescent API (and who couldn't use another coalescent >> API?). The main conceptual addition would >> be writing routines to implement the so-called serial coalescent, >> which is a natural modification of Hudson's algorithm that >> allows for specification of the time of the sample, as well >> as the size and mutation rate. >> >> Rather than reproducing the entire screed, I direct interested >> folks to the following >> >> https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 >> #Building_out_BioPerl_PopGen ::Simulation_modules_for_infectious_disease >> >> If this is interesting to you (as a student or as a co-mentor), please reply >> here, >> respond in the phylosoc at nescent.org list, or contact me directly. >> >> cheers all- >> Mark >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > > From cjfields at illinois.edu Mon Mar 23 14:49:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 23 Mar 2009 13:49:46 -0500 Subject: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent In-Reply-To: References: <89B32791A8144B78832125F64530F9D5@NewLife> <4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> Message-ID: <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> It might be worth coordinating some of this with BioLib if there is a C- or C++-based library around one can link into (in this case via swig, not XS). libsequence is supposed to be capable of coalescence simulation and has some C code: http://molpopgen.org/software/libsequence/doc/html/index.html chris On Mar 23, 2009, at 1:27 PM, Mark A. 
Jensen wrote: > Absolutely-- an XS implementation of the guts (at least) is one of > the overall goals. I'm still new to the C <->Perl world, so links to > info in that direction would be very much appreciated- > cheers MAJ > ----- Original Message ----- From: "Jason Stajich" > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Monday, March 23, 2009 12:54 PM > Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of > PopGen::Simulation::Coalescent > > >> No apologies necessary, this is open source so I am delighted to >> have others work on this. You might want to recognize that the >> Perl implementation is slow relative to the C code so at some >> point for practical utility we may want to also explore an >> Inline::C component. >> >> -jason >> On Mar 23, 2009, at 6:45 AM, Mark A. Jensen wrote: >> >>> Hi all-- >>> >>> With apologies to Jason, I took the liberty of throwing out an >>> idea or two re the BioPerl coalescent implementation >>> as a NESCent Summer of Code project. The underlying >>> motivation is to make the module more immediately useful >>> to infectious disease evolutionists, and also to lay a foundation >>> for a coalescent API (and who couldn't use another coalescent >>> API?). The main conceptual addition would >>> be writing routines to implement the so-called serial coalescent, >>> which is a natural modification of Hudson's algorithm that >>> allows for specification of the time of the sample, as well >>> as the size and mutation rate. >>> >>> Rather than reproducing the entire screed, I direct interested >>> folks to the following >>> >>> https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 >>> >>> #Building_out_BioPerl_PopGen >>> ::Simulation_modules_for_infectious_disease >>> >>> If this is interesting to you (as a student or as a co-mentor), >>> please reply here, >>> respond in the phylosoc at nescent.org list, or contact me directly. 
>>> >>> cheers all- >>> Mark >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Jason Stajich >> jason at bioperl.org >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Mar 23 15:09:49 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Mar 2009 15:09:49 -0400 Subject: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent In-Reply-To: <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> References: <89B32791A8144B78832125F64530F9D5@NewLife> <4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> Message-ID: <5672818F52494FDBAE77D8C4757E5BC9@NewLife> This is excellent-- I trust all those brains implicitly (looks like my entire grad school cohort wrote code for it). cheers ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Jason Stajich" ; "BioPerl List" Sent: Monday, March 23, 2009 2:49 PM Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent > It might be worth coordinating some of this with BioLib if there is a C- or > C++-based library around one can link into (in this case via swig, not XS). > libsequence is supposed to be capable of coalescence simulation and has some > C code: > > http://molpopgen.org/software/libsequence/doc/html/index.html > > chris > > On Mar 23, 2009, at 1:27 PM, Mark A. Jensen wrote: > >> Absolutely-- an XS implementation of the guts (at least) is one of >> the overall goals. I'm still new to the C <->Perl world, so links to >> info in that direction would be very much appreciated- >> cheers MAJ >> ----- Original Message ----- From: "Jason Stajich" >> To: "Mark A. 
Jensen" >> Cc: "BioPerl List" >> Sent: Monday, March 23, 2009 12:54 PM >> Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of >> PopGen::Simulation::Coalescent >> >> >>> No apologies necessary, this is open source so I am delighted to have >>> others work on this. You might want to recognize that the Perl >>> implementation is slow relative to the C code so at some point for >>> practical utility we may want to also explore an Inline::C component. >>> >>> -jason >>> On Mar 23, 2009, at 6:45 AM, Mark A. Jensen wrote: >>> >>>> Hi all-- >>>> >>>> With apologies to Jason, I took the liberty of throwing out an >>>> idea or two re the BioPerl coalescent implementation >>>> as a NESCent Summer of Code project. The underlying >>>> motivation is to make the module more immediately useful >>>> to infectious disease evolutionists, and also to lay a foundation >>>> for a coalescent API (and who couldn't use another coalescent >>>> API?). The main conceptual addition would >>>> be writing routines to implement the so-called serial coalescent, >>>> which is a natural modification of Hudson's algorithm that >>>> allows for specification of the time of the sample, as well >>>> as the size and mutation rate. >>>> >>>> Rather than reproducing the entire screed, I direct interested >>>> folks to the following >>>> >>>> https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 >>>> #Building_out_BioPerl_PopGen ::Simulation_modules_for_infectious_disease >>>> >>>> If this is interesting to you (as a student or as a co-mentor), please >>>> reply here, >>>> respond in the phylosoc at nescent.org list, or contact me directly. 
>>>> >>>> cheers all- >>>> Mark >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Jason Stajich >>> jason at bioperl.org >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From jason at bioperl.org Mon Mar 23 15:31:26 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 23 Mar 2009 12:31:26 -0700 Subject: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent In-Reply-To: <5672818F52494FDBAE77D8C4757E5BC9@NewLife> References: <89B32791A8144B78832125F64530F9D5@NewLife> <4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> <5672818F52494FDBAE77D8C4757E5BC9@NewLife> Message-ID: yeah - I've talked to Kevin about some wrappers for the libsequence code at one point so it seems like just the thing a SoC could work towards. To some extent there is a reasonable matching of the popgen objects in bioperl to the libsequence code, but it may require some effort to match well enough. I think it would make for a great project to tie these our object set more closely to it -- If that means the bioperl objects have to change, I have no problem with that. Right now we are quite inefficient when storing all the markers so it can be too slow to read in a whole genome's worth of marker data, but with some simpler approaches we can get there. Will try and weigh in more on the proposals but I think this would be a really great project at any level it is addressed. -jason On Mar 23, 2009, at 12:09 PM, Mark A. Jensen wrote: > This is excellent-- I trust all those brains implicitly (looks like > my entire grad school cohort wrote code for it). > cheers > ----- Original Message ----- From: "Chris Fields" > > To: "Mark A. 
Jensen" > Cc: "Jason Stajich" ; "BioPerl List" > > Sent: Monday, March 23, 2009 2:49 PM > Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of > PopGen::Simulation::Coalescent > > >> It might be worth coordinating some of this with BioLib if there is >> a C- or C++-based library around one can link into (in this case >> via swig, not XS). libsequence is supposed to be capable of >> coalescence simulation and has some C code: >> >> http://molpopgen.org/software/libsequence/doc/html/index.html >> >> chris >> >> On Mar 23, 2009, at 1:27 PM, Mark A. Jensen wrote: >> >>> Absolutely-- an XS implementation of the guts (at least) is one of >>> the overall goals. I'm still new to the C <->Perl world, so links to >>> info in that direction would be very much appreciated- >>> cheers MAJ >>> ----- Original Message ----- From: "Jason Stajich" >> > >>> To: "Mark A. Jensen" >>> Cc: "BioPerl List" >>> Sent: Monday, March 23, 2009 12:54 PM >>> Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of >>> PopGen::Simulation::Coalescent >>> >>> >>>> No apologies necessary, this is open source so I am delighted to >>>> have others work on this. You might want to recognize that the >>>> Perl implementation is slow relative to the C code so at some >>>> point for practical utility we may want to also explore an >>>> Inline::C component. >>>> >>>> -jason >>>> On Mar 23, 2009, at 6:45 AM, Mark A. Jensen wrote: >>>> >>>>> Hi all-- >>>>> >>>>> With apologies to Jason, I took the liberty of throwing out an >>>>> idea or two re the BioPerl coalescent implementation >>>>> as a NESCent Summer of Code project. The underlying >>>>> motivation is to make the module more immediately useful >>>>> to infectious disease evolutionists, and also to lay a foundation >>>>> for a coalescent API (and who couldn't use another coalescent >>>>> API?). 
The main conceptual addition would >>>>> be writing routines to implement the so-called serial coalescent, >>>>> which is a natural modification of Hudson's algorithm that >>>>> allows for specification of the time of the sample, as well >>>>> as the size and mutation rate. >>>>> >>>>> Rather than reproducing the entire screed, I direct interested >>>>> folks to the following >>>>> >>>>> https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 >>>>> >>>>> #Building_out_BioPerl_PopGen >>>>> ::Simulation_modules_for_infectious_disease >>>>> >>>>> If this is interesting to you (as a student or as a co- >>>>> mentor), please reply here, >>>>> respond in the phylosoc at nescent.org list, or contact me directly. >>>>> >>>>> cheers all- >>>>> Mark >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> Jason Stajich >>>> jason at bioperl.org >>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > Jason Stajich jason at bioperl.org From maj at fortinbras.us Mon Mar 23 15:45:11 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Mar 2009 15:45:11 -0400 Subject: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent In-Reply-To: References: <89B32791A8144B78832125F64530F9D5@NewLife> <4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> <5672818F52494FDBAE77D8C4757E5BC9@NewLife> Message-ID: Great, this is starting to sound useful-- I'll dig in to the libsequence api a bit. Are you into being a co-mentor for this, or maybe even Kev? ----- Original Message ----- From: "Jason Stajich" To: "Mark A. 
Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Monday, March 23, 2009 3:31 PM Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent > yeah - I've talked to Kevin about some wrappers for the libsequence code at > one point so it seems like just the thing a SoC could work towards. To some > extent there is a reasonable matching of the popgen objects in bioperl to the > libsequence code, but it may require some effort to match well enough. I > think it would make for a great project to tie these our object set more > closely to it -- If that means the bioperl objects have to change, I have no > problem with that. Right now we are quite inefficient when storing all the > markers so it can be too slow to read in a whole genome's worth of marker > data, but with some simpler approaches we can get there. > > Will try and weigh in more on the proposals but I think this would be a > really great project at any level it is addressed. > > -jason > > On Mar 23, 2009, at 12:09 PM, Mark A. Jensen wrote: > >> This is excellent-- I trust all those brains implicitly (looks like >> my entire grad school cohort wrote code for it). >> cheers >> ----- Original Message ----- From: "Chris Fields" > > >> To: "Mark A. Jensen" >> Cc: "Jason Stajich" ; "BioPerl List" >> > > >> Sent: Monday, March 23, 2009 2:49 PM >> Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of >> PopGen::Simulation::Coalescent >> >> >>> It might be worth coordinating some of this with BioLib if there is a C- >>> or C++-based library around one can link into (in this case via swig, not >>> XS). libsequence is supposed to be capable of coalescence simulation and >>> has some C code: >>> >>> http://molpopgen.org/software/libsequence/doc/html/index.html >>> >>> chris >>> >>> On Mar 23, 2009, at 1:27 PM, Mark A. Jensen wrote: >>> >>>> Absolutely-- an XS implementation of the guts (at least) is one of >>>> the overall goals. 
I'm still new to the C <->Perl world, so links to >>>> info in that direction would be very much appreciated- >>>> cheers MAJ >>>> ----- Original Message ----- From: "Jason Stajich" >>> > >>>> To: "Mark A. Jensen" >>>> Cc: "BioPerl List" >>>> Sent: Monday, March 23, 2009 12:54 PM >>>> Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of >>>> PopGen::Simulation::Coalescent >>>> >>>> >>>>> No apologies necessary, this is open source so I am delighted to have >>>>> others work on this. You might want to recognize that the Perl >>>>> implementation is slow relative to the C code so at some point for >>>>> practical utility we may want to also explore an Inline::C component. >>>>> >>>>> -jason >>>>> On Mar 23, 2009, at 6:45 AM, Mark A. Jensen wrote: >>>>> >>>>>> Hi all-- >>>>>> >>>>>> With apologies to Jason, I took the liberty of throwing out an >>>>>> idea or two re the BioPerl coalescent implementation >>>>>> as a NESCent Summer of Code project. The underlying >>>>>> motivation is to make the module more immediately useful >>>>>> to infectious disease evolutionists, and also to lay a foundation >>>>>> for a coalescent API (and who couldn't use another coalescent >>>>>> API?). The main conceptual addition would >>>>>> be writing routines to implement the so-called serial coalescent, >>>>>> which is a natural modification of Hudson's algorithm that >>>>>> allows for specification of the time of the sample, as well >>>>>> as the size and mutation rate. >>>>>> >>>>>> Rather than reproducing the entire screed, I direct interested >>>>>> folks to the following >>>>>> >>>>>> https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 >>>>>> #Building_out_BioPerl_PopGen ::Simulation_modules_for_infectious_disease >>>>>> >>>>>> If this is interesting to you (as a student or as a co- mentor), please >>>>>> reply here, >>>>>> respond in the phylosoc at nescent.org list, or contact me directly. 
>>>>>> >>>>>> cheers all- >>>>>> Mark >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> Jason Stajich >>>>> jason at bioperl.org >>>>> >>>>> >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > > Jason Stajich > jason at bioperl.org > > > > > From jonathan at leto.net Mon Mar 23 15:59:55 2009 From: jonathan at leto.net (Jonathan Leto) Date: Mon, 23 Mar 2009 12:59:55 -0700 Subject: [Bioperl-l] Summer of Code project idea: build-out of PopGen::Simulation::Coalescent In-Reply-To: <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> References: <89B32791A8144B78832125F64530F9D5@NewLife> <4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> Message-ID: <9aaadf9c0903231259g41fc3f38p7d353f7eaf6e3024@mail.gmail.com> Howdy, > It might be worth coordinating some of this with BioLib if there is a C- or > C++-based library around one can link into (in this case via swig, not XS). I would suggest looking at the Math::GSL [1] CPAN module for how to have a CPAN module use SWIG to interface to an existing C/C++ library. To my knowledge it is the only CPAN module which allows you to recompile the SWIG bindings via ./Build test (great for development) but also puts the generated SWIG wrappers in the CPAN distribution so that end-users do not require SWIG. I wrote it and spent many months making it build nice on many platforms, I promise you that you don't want to reinvent that wheel :) I can also give inspirational advice for Perl+SWIG/XS shenanigans. 
Basically SWIG will generate a (mostly) 1-to-1 API with the underlying C/C++ library and then you write another (hopefully OO) interface on top of the SWIG-generated API (which uses very non-Perlish call conventions because C/C++ can only return a single thing from a function call). Cheers, [1] http://search.cpan.org/dist/Math-GSL/ -- [---------------------] Jonathan Leto jaleto at gmail.com From lmanchon at univ-montp2.fr Mon Mar 23 17:07:54 2009 From: lmanchon at univ-montp2.fr (Laurent Manchon) Date: Mon, 23 Mar 2009 22:07:54 +0100 Subject: [Bioperl-l] fit genomic coordinates Message-ID: <49C7FA2A.7090305@univ-montp2.fr> Hi, how is it possible to fit ranges of genomic coordinates? The first file (file1.txt) is my annotation file, with this format:

regulatory_region 3455 3463
regulatory_region 3535 3544
regulatory_region 3601 3608
transcriptional_cis_regulatory_region 3622 3630
five_prime_UTR 3631 3759
CDS 3760 3913
exon 3631 3913
CDS 3996 4276
exon 3996 4276
CDS 4486 4605
exon 4486 4605
CDS 4706 5095
exon 4706 5095
CDS 5174 5326
exon 5174 5326
....

The second file (file2.txt) is my experimental file, with this format:

acc_2765773 3222 3239 -
acc_2842543 3222 3239 -
acc_2842544 3222 3239 -
acc_442945 3222 3239 -
acc_442946 3222 3239 -
acc_4873 3222 3239 -
acc_53956 3222 3239 -
acc_562588 3222 3239 -
acc_807114 3222 3239 -
acc_84146 3222 3239 -
acc_2419732 3268 3285 +
acc_3041065 3565 3583 +
acc_362358 3640 3656 -
acc_3279485 3793 3813 +
acc_3091017 3794 3811 -
acc_2807380 3832 3848 +
acc_3105138 3832 3848 +
acc_3105139 3832 3848 +
acc_3105140 3832 3848 +
acc_3116450 3832 3848 +
acc_86708 3832 3848 +
acc_1987802 3922 3938 -
acc_1679660 4113 4129 +
acc_891489 4113 4129 +
acc_2829973 4299 4318 +
....

Number of lines in file1.txt: ~200000. Number of lines in file2.txt: ~800000.

So, how can I annotate file2 with file1? Do I need to compare each pair of coordinates in file2 with each pair of coordinates in file1 (800000x200000 combinations)?
i'm looking for a fast method to do that. many thanks. Laurent From pmiguel at purdue.edu Tue Mar 24 07:53:10 2009 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Tue, 24 Mar 2009 07:53:10 -0400 Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output... In-Reply-To: <2c8757af0903201023x5ec6fb21ue3cd0c090e604ef9@mail.gmail.com> References: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com> <49C155A3.204@purdue.edu> <2c8757af0903181530l1c3c9f3ct4dc9f8852f627dd@mail.gmail.com> <49C298DE.2000403@purdue.edu> <2c8757af0903201023x5ec6fb21ue3cd0c090e604ef9@mail.gmail.com> Message-ID: <49C8C9A6.3050203@purdue.edu> Dan Bolser wrote: > 2009/3/19 Phillip San Miguel : > >> Dan Bolser wrote: >> >>> 2009/3/18 Phillip San Miguel >>> >>> >>> >>>> Dan Bolser wrote: >>>> >>>> >>>> >>>>> Can someone clarify the definition of the 'gaps' column in the blast -m8 >>>>> output format for me? >>>>> >>>>> I thought that the column 'gaps' was basically the number of columns in >>>>> the >>>>> HSP that contains a gap character. >>>>> >>>>> >>>>> >>>> Hi Dan, >>>> "gaps", to me, denotes the number of gaps. Not the total length of all >>>> the >>>> gaps. >>>> Just my interpretation, but given your results my guess is that whomever >>>> wrote blastall was thinking the way I do. >>>> >>>> >>> Yeah, I'll have to go look at the HSPs to confirm this... I'm just >>> surprised >>> that there are not more gaps of length >1. i.e. my data (given your >>> interpretation) suggests that 90% of the HSPs have no gaps > length 1. >>> >>> >> Sounds about right. Depends on how you have gap opening vs gap lengthening >> parameters set. >> > > I see. I thought that by default extension was less than opening, so I > had expected there to be more gaps of length >1 ... anyway... where > can I read more about selecting parameters for certain tasks? 
> Currently I'm blasting tomato against potato sequence, and the two > organisms are known to be 'highly syntenic' - I'm just not sure how > that translates into how I should set the parameters. I'm after large > alignments of large regions of the chromosome. My thinking is to just > run through the list of HSPs and merge based on gap / window size > (dynamic programming style) - that way I can play with the set of HSPs > that I have, and look at the effect of different settings, then I can > just globally align the matching regions using SW (if I need to). Does > that sound reasonable, or is using the default settings just dumb? > > Cheers, > Dan. > > Hi Dan, Sorry, I didn't mean to suggest that the only reason you were seeing a preponderance of single base indels was due to your settings. I do expect single base indels to outnumber longer indels. Nevertheless, I always thought standard alignment tools should not use a linear gap extension penalty. That past some point, extending a gap should be further "discounted". Maybe the gap extension penalty should be the log of the number of bases extended? BTW, I just noticed that the blastall '-m 9' parameter, includes the column headers. They are: # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score So, the column in question is "gap openings". Phillip From maj at fortinbras.us Tue Mar 24 12:23:00 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 24 Mar 2009 12:23:00 -0400 Subject: [Bioperl-l] Summer of Code project idea: build-out ofPopGen::Simulation::Coalescent In-Reply-To: <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> References: <89B32791A8144B78832125F64530F9D5@NewLife><4C41114E-2C13-44D9-ACB4-788294A4DD0D@bioperl.org> <121D2531-50C2-42EE-944B-C42FC551232B@illinois.edu> Message-ID: <742BD83F95854BA08C3F682EB43044CD@NewLife> Hey guys, I took a bunch of your suggestions and refactored the SoC idea. 
I split out the C binding issue from the PopGen::Simulation API. The bindings now come under the BioLib umbrella with SWIG. The "pure-Perl" (sort of) implementation of the serial coalescent now shouts out to Math::GSL. The B:P:S api project is split into two more or less independent subprojects. Thanks for your input, all, and if you have time, please take a look and let me know if I put my foot in it anywhere: https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009#Building_out_BioPerl_PopGen::Simulation_modules_for_infectious_disease https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009#A__BioLib_mapping_for_the_libsequence_population_genetics_libraries I appreciate it, as always- Mark ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Monday, March 23, 2009 2:49 PM Subject: Re: [Bioperl-l] Summer of Code project idea: build-out ofPopGen::Simulation::Coalescent > It might be worth coordinating some of this with BioLib if there is a C- or > C++-based library around one can link into (in this case via swig, not XS). > libsequence is supposed to be capable of coalescence simulation and has some > C code: > > http://molpopgen.org/software/libsequence/doc/html/index.html > > chris > > On Mar 23, 2009, at 1:27 PM, Mark A. Jensen wrote: > >> Absolutely-- an XS implementation of the guts (at least) is one of >> the overall goals. I'm still new to the C <->Perl world, so links to >> info in that direction would be very much appreciated- >> cheers MAJ >> ----- Original Message ----- From: "Jason Stajich" >> To: "Mark A. Jensen" >> Cc: "BioPerl List" >> Sent: Monday, March 23, 2009 12:54 PM >> Subject: Re: [Bioperl-l] Summer of Code project idea: build-out of >> PopGen::Simulation::Coalescent >> >> >>> No apologies necessary, this is open source so I am delighted to have >>> others work on this. 
You might want to recognize that the Perl >>> implementation is slow relative to the C code so at some point for >>> practical utility we may want to also explore an Inline::C component. >>> >>> -jason >>> On Mar 23, 2009, at 6:45 AM, Mark A. Jensen wrote: >>> >>>> Hi all-- >>>> >>>> With apologies to Jason, I took the liberty of throwing out an >>>> idea or two re the BioPerl coalescent implementation >>>> as a NESCent Summer of Code project. The underlying >>>> motivation is to make the module more immediately useful >>>> to infectious disease evolutionists, and also to lay a foundation >>>> for a coalescent API (and who couldn't use another coalescent >>>> API?). The main conceptual addition would >>>> be writing routines to implement the so-called serial coalescent, >>>> which is a natural modification of Hudson's algorithm that >>>> allows for specification of the time of the sample, as well >>>> as the size and mutation rate. >>>> >>>> Rather than reproducing the entire screed, I direct interested >>>> folks to the following >>>> >>>> https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 >>>> #Building_out_BioPerl_PopGen ::Simulation_modules_for_infectious_disease >>>> >>>> If this is interesting to you (as a student or as a co-mentor), please >>>> reply here, >>>> respond in the phylosoc at nescent.org list, or contact me directly. 
>>>> >>>> cheers all- >>>> Mark >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Jason Stajich >>> jason at bioperl.org >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From scott at scottcain.net Tue Mar 24 15:02:03 2009 From: scott at scottcain.net (Scott Cain) Date: Tue, 24 Mar 2009 15:02:03 -0400 Subject: [Bioperl-l] bioperl.org (and open-bio.org) websites down? Message-ID: <4536f7700903241202t7da4194bjfe2d7a2eb24a847e@mail.gmail.com> Hi all, Does anyone happen to know the status of the open-bio servers? They are either down or really not feeling well. Of course, given that that is the case, this email may go nowhere :-/ Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From dan.bolser at gmail.com Wed Mar 25 10:07:12 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 25 Mar 2009 14:07:12 +0000 Subject: [Bioperl-l] Question about the definition of 'gaps' in blast -m8 output... 
In-Reply-To: <49C8C9A6.3050203@purdue.edu> References: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com> <49C155A3.204@purdue.edu> <2c8757af0903181530l1c3c9f3ct4dc9f8852f627dd@mail.gmail.com> <49C298DE.2000403@purdue.edu> <2c8757af0903201023x5ec6fb21ue3cd0c090e604ef9@mail.gmail.com> <49C8C9A6.3050203@purdue.edu> Message-ID: <2c8757af0903250707h162c13b0sfd4802d59ae1b771@mail.gmail.com> 2009/3/24 Phillip San Miguel : > Dan Bolser wrote: >> >> 2009/3/19 Phillip San Miguel : >> >>> >>> Dan Bolser wrote: >>> >>>> >>>> 2009/3/18 Phillip San Miguel >>>> >>>> >>>> >>>>> >>>>> Dan Bolser wrote: >>>>> >>>>> >>>>> >>>>>> >>>>>> Can someone clarify the definition of the 'gaps' column in the blast >>>>>> -m8 >>>>>> output format for me? >>>>>> >>>>>> I thought that the column 'gaps' was basically the number of columns >>>>>> in >>>>>> the >>>>>> HSP that contains a gap character. >>>>>> >>>>>> >>>>>> >>>>> >>>>> Hi Dan, >>>>> "gaps", to me, denotes the number of gaps. Not the total length of all >>>>> the >>>>> gaps. >>>>> Just my interpretation, but given your results my guess is that >>>>> whomever >>>>> wrote blastall was thinking the way I do. >>>>> >>>>> >>>> >>>> Yeah, I'll have to go look at the HSPs to confirm this... I'm just >>>> surprised >>>> that there are not more gaps of length >1. i.e. my data (given your >>>> interpretation) suggests that 90% of the HSPs have no gaps > length 1. >>>> >>>> >>> >>> Sounds about right. Depends on how you have gap opening vs gap >>> lengthening >>> parameters set. >>> >> >> I see. I thought that by default extension was less than opening, so I >> had expected there to be more gaps of length >1 ... anyway... where >> can I read more about selecting parameters for certain tasks? >> Currently I'm blasting tomato against potato sequence, and the two >> organisms are known to be 'highly syntenic' - I'm just not sure how >> that translates into how I should set the parameters. 
I'm after large
>> alignments of large regions of the chromosome. My thinking is to just
>> run through the list of HSPs and merge based on gap / window size
>> (dynamic programming style) - that way I can play with the set of HSPs
>> that I have, and look at the effect of different settings, then I can
>> just globally align the matching regions using SW (if I need to). Does
>> that sound reasonable, or is using the default settings just dumb?
>>
>> Cheers,
>> Dan.
>>
>
> Hi Dan,
> Sorry, I didn't mean to suggest that the only reason you were seeing a
> preponderance of single base indels was due to your settings. I do expect
> single base indels to outnumber longer indels.

OK. I didn't expect that, but it's good to know.

> Nevertheless, I always thought standard alignment tools should not use a
> linear gap extension penalty. That past some point, extending a gap should
> be further "discounted". Maybe the gap extension penalty should be the log
> of the number of bases extended?
> BTW, I just noticed that the blastall '-m 9' parameter includes the
> column headers. They are:
>
> # Fields: Query id, Subject id, % identity, alignment length, mismatches,
> gap openings, q. start, q. end, s. start, s. end, e-value, bit score

Good find! It would be great to update some of the 'downstream' docs with that column name :D

Thanks again,
Dan.

> So, the column in question is "gap openings".
> Phillip
>

From lmanchon at univ-montp2.fr Wed Mar 25 10:56:39 2009
From: lmanchon at univ-montp2.fr (Laurent MANCHON)
Date: Wed, 25 Mar 2009 15:56:39 +0100
Subject: [Bioperl-l] problem to fit genomic coordinates
Message-ID: <49CA4627.7080804@univ-montp2.fr>

This is my problem: how is it possible to match ranges of genomic coordinates stored in two distinct files?
The first file (file1.txt) is my annotation file, with this format:

regulatory_region 3455 3463
regulatory_region 3535 3544
regulatory_region 3601 3608
transcriptional_cis_regulatory_region 3622 3630
five_prime_UTR 3631 3759
CDS 3760 3913
exon 3631 3913
CDS 3996 4276
exon 3996 4276
CDS 4486 4605
exon 4486 4605
CDS 4706 5095
exon 4706 5095
CDS 5174 5326
exon 5174 5326
....
....

The second file (file2.txt) is my experimental file, with this format:

acc_2765773 3222 3239 -
acc_2842543 3222 3239 -
acc_2842544 3222 3239 -
acc_442945 3222 3239 -
acc_442946 3222 3239 -
acc_4873 3222 3239 -
acc_53956 3222 3239 -
acc_562588 3222 3239 -
acc_807114 3222 3239 -
acc_84146 3222 3239 -
acc_2419732 3268 3285 +
acc_3041065 3565 3583 +
acc_362358 3640 3656 -
acc_3279485 3793 3813 +
acc_3091017 3794 3811 -
acc_2807380 3832 3848 +
acc_3105138 3832 3848 +
acc_3105139 3832 3848 +
acc_3105140 3832 3848 +
acc_3116450 3832 3848 +
acc_86708 3832 3848 +
acc_1987802 3922 3938 -
acc_1679660 4113 4129 +
acc_891489 4113 4129 +
acc_2829973 4299 4318 +
....
....

Number of lines in file1.txt: ~150000
Number of lines in file2.txt: ~800000

So, how do I annotate file2 using the genomic coordinates stored in file1? Do I need to compare each pair of coordinates in file2 with each pair of coordinates in file1: 800000 x 150000 combinations (a quadratic analysis)? I'm looking for a fast method to do that, something closer to linear time.

Thank you so much if you have any ideas to help me.
Laurent -- From Kevin.M.Brown at asu.edu Wed Mar 25 11:23:24 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 25 Mar 2009 08:23:24 -0700 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <49CA4627.7080804@univ-montp2.fr> References: <49CA4627.7080804@univ-montp2.fr> Message-ID: <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> Read in first file and create a Bio::SimpleAlign object Then use the slice method to find the features that are between the start/end values of your second file =head2 slice Title : slice Usage : $aln2 = $aln->slice(20,30) Function : Creates a slice from the alignment inclusive of start and end columns, and the first column in the alignment is denoted 1. Sequences with no residues in the slice are excluded from the new alignment and a warning is printed. Slice beyond the length of the sequence does not do padding. Returns : A Bio::SimpleAlign object Args : Positive integer for start column, positive integer for end column, optional boolean which if true will keep gap-only columns in the newly created slice. Example: $aln2 = $aln->slice(20,30,1) =cut > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Laurent MANCHON > Sent: Wednesday, March 25, 2009 7:57 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] problem to fit genomic coordinates > > this is my problem: > how is it possible to fit range of genomic coordinates stored in two > distinct files ? > > first file (file1.txt) is my annotation file with format as: > > regulatory_region 3455 3463 > regulatory_region 3535 3544 > regulatory_region 3601 3608 > transcriptional_cis_regulatory_region 3622 3630 > five_prime_UTR 3631 3759 > CDS 3760 3913 > exon 3631 3913 > CDS 3996 4276 > exon 3996 4276 > CDS 4486 4605 > exon 4486 4605 > CDS 4706 5095 > exon 4706 5095 > CDS 5174 5326 > exon 5174 5326 > .... > .... 
> > second file (file2.txt) is my experimental file with format as: > > acc_2765773 3222 3239 - > acc_2842543 3222 3239 - > acc_2842544 3222 3239 - > acc_442945 3222 3239 - > acc_442946 3222 3239 - > acc_4873 3222 3239 - > acc_53956 3222 3239 - > acc_562588 3222 3239 - > acc_807114 3222 3239 - > acc_84146 3222 3239 - > acc_2419732 3268 3285 + > acc_3041065 3565 3583 + > acc_362358 3640 3656 - > acc_3279485 3793 3813 + > acc_3091017 3794 3811 - > acc_2807380 3832 3848 + > acc_3105138 3832 3848 + > acc_3105139 3832 3848 + > acc_3105140 3832 3848 + > acc_3116450 3832 3848 + > acc_86708 3832 3848 + > acc_1987802 3922 3938 - > acc_1679660 4113 4129 + > acc_891489 4113 4129 + > acc_2829973 4299 4318 + > .... > .... > > > number of lines in file1.txt ~ 150000 > number of lines in file2.txt ~ 800000 > > so, how to annotate my file2 using the genomic coordinates stored in > file1. I need to compare each couple of range of my file2 with each > couple of range of my file1: 800000x150000 combinaisons (quadratic > analysis) ? > i'm looking for a fast method to do that, something like linear > progression in the analysis > > thank you so much if you have ideas for help me. > > Laurent -- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From nir at rosettadesigngroup.com Wed Mar 25 12:18:24 2009 From: nir at rosettadesigngroup.com (Nir London) Date: Wed, 25 Mar 2009 18:18:24 +0200 Subject: [Bioperl-l] Rosetta Academic Training Webinar Message-ID: <88F0F36A-FC4D-4A9C-AC31-5B883C3F92CB@rosettadesigngroup.com> The Rosetta Design Group is proud to present the first webinar in the Rosetta Academic Workshop Series. For the first webinar, we have selected to focus on Protein-Protein Docking based on the answers to the interest poll. We hope this will be the first in a line of helpful and inspiring webinars to kick-off our Rosetta Academic Workshop Series. 
What: Protein-Protein Docking
When: May 4th 2009, 8:00-10:00 AM EST
Where: Your office!

More details and registration: http://rosettadesigngroup.com/RDGLS/index.php?sid=54479&lang=en

Please note: This is not a promotional webinar. Rosetta is open-source and freeware for academic and non-profit organizations and can be downloaded from University of Washington's TechTransfer Digital Ventures. The majority of the webinar is concerned with Rosetta 2.3.0. Rosetta 3.0 is still a beta version.

Hope to see you there,
Nir London.
Rosetta Design Group | http://rosettadesigngroup.com/

From Kevin.M.Brown at asu.edu Wed Mar 25 13:30:12 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 25 Mar 2009 10:30:12 -0700
Subject: [Bioperl-l] problem to fit genomic coordinates
In-Reply-To: <49CA5F5F.309@univ-montp2.fr>
References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu>

Please keep all replies on list.

Doing it with SimpleAlign gets rid of the incrementing problem and reduces the number of loop iterations you'll have to do. Based on your sample data, you have a lot of IDs that share the same location, and you also have overlapping features in the first file. So you'll still need to decide which item is the one you really want (e.g. CDS vs. exon).
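
The two-pointer idea raised in this thread (walk both coordinate lists in order instead of comparing every pair) can be sketched in plain Perl, without BioPerl. This is only a minimal sketch under stated assumptions: coordinates are inclusive, everything is on one sequence, strand is ignored, and the helper name annotate_reads is made up for illustration. It sorts both lists itself, because the sample annotation rows are not strictly ordered by start.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): for each read, list the names of
# all features whose inclusive [start, end] range overlaps the read's range.
# $features and $reads are array refs of [name, start, end, ...] records.
sub annotate_reads {
    my ($features, $reads) = @_;

    # Sort features by start, breaking ties by end and name so that the
    # output order is deterministic.
    my @feat = sort {
        $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] || $a->[0] cmp $b->[0]
    } @$features;

    # Visit reads in order of start, but remember their original positions.
    my @order = sort { $reads->[$a][1] <=> $reads->[$b][1] } 0 .. $#$reads;

    my @hits = map { [] } @$reads;
    my @active;      # features that may overlap the current or a later read
    my $next = 0;    # index of the first feature not yet pulled into @active

    for my $i (@order) {
        my ($start, $end) = @{ $reads->[$i] }[1, 2];

        # Pull in every feature that starts at or before this read's end.
        push @active, $feat[$next++]
            while $next < @feat && $feat[$next][1] <= $end;

        # Features ending before this read's start cannot overlap this read
        # or any later one, because later reads start at or after $start.
        @active = grep { $_->[2] >= $start } @active;

        # A feature pulled in for an earlier, longer read may start after
        # this read ends, so check the start again before reporting.
        $hits[$i] = [ map { $_->[0] } grep { $_->[1] <= $end } @active ];
    }
    return \@hits;
}

# A few rows copied from the sample data posted in this thread.
my @features = (
    [ 'regulatory_region',                     3455, 3463 ],
    [ 'regulatory_region',                     3535, 3544 ],
    [ 'regulatory_region',                     3601, 3608 ],
    [ 'transcriptional_cis_regulatory_region', 3622, 3630 ],
    [ 'five_prime_UTR',                        3631, 3759 ],
    [ 'CDS',                                   3760, 3913 ],
    [ 'exon',                                  3631, 3913 ],
    [ 'CDS',                                   3996, 4276 ],
    [ 'exon',                                  3996, 4276 ],
);
my @reads = (
    [ 'acc_2419732', 3268, 3285, '+' ],
    [ 'acc_3041065', 3565, 3583, '+' ],
    [ 'acc_362358',  3640, 3656, '-' ],
    [ 'acc_3279485', 3793, 3813, '+' ],
    [ 'acc_1987802', 3922, 3938, '-' ],
    [ 'acc_1679660', 4113, 4129, '+' ],
);

my $hits = annotate_reads(\@features, \@reads);

# Print each read followed by the features it overlaps (or "none").
for my $i (0 .. $#reads) {
    my $labels = @{ $hits->[$i] } ? join(',', @{ $hits->[$i] }) : 'none';
    print join("\t", @{ $reads[$i] }, $labels), "\n";
}
```

With these rows, acc_362358 (3640-3656) is reported against five_prime_UTR and exon, while acc_1987802 (3922-3938) falls between features and gets "none". After the two sorts, each feature enters and leaves the active list once, so the work per read depends only on how many features are open at that position rather than on the size of file1. Choosing between overlapping labels such as CDS and exon is still left to the caller.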
________________________________

From: Laurent MANCHON [mailto:lmanchon at univ-montp2.fr]
Sent: Wednesday, March 25, 2009 9:44 AM
To: Kevin Brown
Subject: Re: [Bioperl-l] problem to fit genomic coordinates

Okay, but I think it's not easy with this method. The files are already sorted on the coordinate columns, so maybe another logical method exists that does not use BioPerl libraries, for example a while loop, something like:

$i = $j = 1;
$idx = number of lines in file1
$cpt = number of lines in file2
while ($i <= $idx && $j <= $cpt) {
    # compare current elements
    # increment either $i or $j depending on which segment comes before the other
}

The difficulty is deciding when to increment $i or $j inside the loop.

Laurent --

Kevin Brown wrote:

Read in first file and create a Bio::SimpleAlign object

Then use the slice method to find the features that are between the start/end values of your second file

=head2 slice

 Title    : slice
 Usage    : $aln2 = $aln->slice(20,30)
 Function : Creates a slice from the alignment inclusive of start and
            end columns, and the first column in the alignment is denoted 1.
            Sequences with no residues in the slice are excluded from the
            new alignment and a warning is printed. Slice beyond the length of
            the sequence does not do padding.
 Returns  : A Bio::SimpleAlign object
 Args     : Positive integer for start column, positive integer for end column,
            optional boolean which if true will keep gap-only columns in the newly
            created slice. Example:

            $aln2 = $aln->slice(20,30,1)

=cut

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Laurent MANCHON
Sent: Wednesday, March 25, 2009 7:57 AM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] problem to fit genomic coordinates

this is my problem:
how is it possible to fit range of genomic coordinates stored in two distinct files ?
first file (file1.txt) is my annotation file with format as: regulatory_region 3455 3463 regulatory_region 3535 3544 regulatory_region 3601 3608 transcriptional_cis_regulatory_region 3622 3630 five_prime_UTR 3631 3759 CDS 3760 3913 exon 3631 3913 CDS 3996 4276 exon 3996 4276 CDS 4486 4605 exon 4486 4605 CDS 4706 5095 exon 4706 5095 CDS 5174 5326 exon 5174 5326 .... .... second file (file2.txt) is my experimental file with format as: acc_2765773 3222 3239 - acc_2842543 3222 3239 - acc_2842544 3222 3239 - acc_442945 3222 3239 - acc_442946 3222 3239 - acc_4873 3222 3239 - acc_53956 3222 3239 - acc_562588 3222 3239 - acc_807114 3222 3239 - acc_84146 3222 3239 - acc_2419732 3268 3285 + acc_3041065 3565 3583 + acc_362358 3640 3656 - acc_3279485 3793 3813 + acc_3091017 3794 3811 - acc_2807380 3832 3848 + acc_3105138 3832 3848 + acc_3105139 3832 3848 + acc_3105140 3832 3848 + acc_3116450 3832 3848 + acc_86708 3832 3848 + acc_1987802 3922 3938 - acc_1679660 4113 4129 + acc_891489 4113 4129 + acc_2829973 4299 4318 + .... .... number of lines in file1.txt ~ 150000 number of lines in file2.txt ~ 800000 so, how to annotate my file2 using the genomic coordinates stored in file1. I need to compare each couple of range of my file2 with each couple of range of my file1: 800000x150000 combinaisons (quadratic analysis) ? i'm looking for a fast method to do that, something like linear progression in the analysis thank you so much if you have ideas for help me. 
Laurent -- _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Wed Mar 25 15:05:21 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 25 Mar 2009 15:05:21 -0400 Subject: [Bioperl-l] Parsing Blast Report Message-ID: <9fcc48c70903251205s6d6fa5cdoe9b6773ef56eee51@mail.gmail.com> Hi All, Is there any way i can get the hit sequence from the blast report? I am using SearchIO module to parse the blast report. Thanks Shalabh From SMarkel at accelrys.com Wed Mar 25 15:20:12 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Wed, 25 Mar 2009 15:20:12 -0400 Subject: [Bioperl-l] Parsing Blast Report In-Reply-To: <9fcc48c70903251205s6d6fa5cdoe9b6773ef56eee51@mail.gmail.com> References: <9fcc48c70903251205s6d6fa5cdoe9b6773ef56eee51@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74729A92033@exch1-hi.accelrys.net> Shalabh, No. The full-length hit sequences are not in the BLAST report. You need to use either NCBI's fastacmd (in same set of executables that has formatdb and blastall) or some database look-up. Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > Sent: Wednesday, 25 March 2009 12:05 PM > To: bioperl-l > Subject: [Bioperl-l] Parsing Blast Report > > Hi All, > Is there any way i can get the hit sequence from the blast report? > I am using SearchIO module to parse the blast report. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Mar 25 15:16:13 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 25 Mar 2009 15:16:13 -0400 Subject: [Bioperl-l] Parsing Blast Report In-Reply-To: <9fcc48c70903251205s6d6fa5cdoe9b6773ef56eee51@mail.gmail.com> References: <9fcc48c70903251205s6d6fa5cdoe9b6773ef56eee51@mail.gmail.com> Message-ID: Shalabh- check out the following: http://www.bioperl.org/wiki/Parsing_BLAST_HSPs should answer many of your questions- cheers, Mark ----- Original Message ----- From: "shalabh sharma" To: "bioperl-l" Sent: Wednesday, March 25, 2009 3:05 PM Subject: [Bioperl-l] Parsing Blast Report > Hi All, > Is there any way i can get the hit sequence from the blast report? > I am using SearchIO module to parse the blast report. 
> > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From lmanchon at univ-montp2.fr Wed Mar 25 16:09:21 2009 From: lmanchon at univ-montp2.fr (Laurent Manchon) Date: Wed, 25 Mar 2009 21:09:21 +0100 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> Message-ID: <49CA8F71.2040909@univ-montp2.fr> -- yes perhaps, but i don't know how to use Bio::SimpleAlign object to resolve my problem, what a pity for me, so i'm going on to search using in another way procedural programmation. thank you -- Kevin Brown a ?crit : > Please keep all replies on list. > > Doing it with the SimpleAlign gets rid of the problem of incrementing and reduces the complexity of the number of loop iterations you'll have to do. Based on your sample data you have a lot of IDs that actually have the same location information that they are needing, you also have overlapping information from the first file. So you'll still need to make decisions as to which item is what you really want (e.g. CDS vs Exon). 
> > > ________________________________ > > From: Laurent MANCHON [mailto:lmanchon at univ-montp2.fr] > Sent: Wednesday, March 25, 2009 9:44 AM > To: Kevin Brown > Subject: Re: [Bioperl-l] problem to fit genomic coordinates > > > Okay but i think it's not an easy way with this method, > the files are already sorted on colum numbers, so maybe another logical method > without using Bioperl libraries exist, for example using a while loop, > > something like: > > $i = $j = 1; > $idx = number of lines in file1 > $cpt = number of lines in file2 > while ($i <= $idx && $j <= $cpt) { > #compare current elements > #increment either $i or $j depending which segment comes before the other > } > the difficulty is when to decide to incremente $i or $j inside the loop > > Laurent -- > > Kevin Brown a ?crit : > > Read in first file and create a Bio::SimpleAlign object > > Then use the slice method to find the features that are between the > start/end values of your second file > > =head2 slice > > Title : slice > Usage : $aln2 = $aln->slice(20,30) > Function : Creates a slice from the alignment inclusive of start and > end columns, and the first column in the alignment is > denoted 1. > Sequences with no residues in the slice are excluded from > the > new alignment and a warning is printed. Slice beyond the > length of > the sequence does not do padding. > Returns : A Bio::SimpleAlign object > Args : Positive integer for start column, positive integer for end > column, > optional boolean which if true will keep gap-only columns > in the newly > created slice. 
Example: > > $aln2 = $aln->slice(20,30,1) > > =cut > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Laurent MANCHON > Sent: Wednesday, March 25, 2009 7:57 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] problem to fit genomic coordinates > > this is my problem: > how is it possible to fit range of genomic coordinates stored in two > distinct files ? > > first file (file1.txt) is my annotation file with format as: > > regulatory_region 3455 3463 > regulatory_region 3535 3544 > regulatory_region 3601 3608 > transcriptional_cis_regulatory_region 3622 3630 > five_prime_UTR 3631 3759 > CDS 3760 3913 > exon 3631 3913 > CDS 3996 4276 > exon 3996 4276 > CDS 4486 4605 > exon 4486 4605 > CDS 4706 5095 > exon 4706 5095 > CDS 5174 5326 > exon 5174 5326 > .... > .... > > second file (file2.txt) is my experimental file with format as: > > acc_2765773 3222 3239 - > acc_2842543 3222 3239 - > acc_2842544 3222 3239 - > acc_442945 3222 3239 - > acc_442946 3222 3239 - > acc_4873 3222 3239 - > acc_53956 3222 3239 - > acc_562588 3222 3239 - > acc_807114 3222 3239 - > acc_84146 3222 3239 - > acc_2419732 3268 3285 + > acc_3041065 3565 3583 + > acc_362358 3640 3656 - > acc_3279485 3793 3813 + > acc_3091017 3794 3811 - > acc_2807380 3832 3848 + > acc_3105138 3832 3848 + > acc_3105139 3832 3848 + > acc_3105140 3832 3848 + > acc_3116450 3832 3848 + > acc_86708 3832 3848 + > acc_1987802 3922 3938 - > acc_1679660 4113 4129 + > acc_891489 4113 4129 + > acc_2829973 4299 4318 + > .... > .... > > > number of lines in file1.txt ~ 150000 > number of lines in file2.txt ~ 800000 > > so, how to annotate my file2 using the genomic coordinates stored in > file1. I need to compare each couple of range of my file2 with each > couple of range of my file1: 800000x150000 combinaisons (quadratic > analysis) ? 
> i'm looking for a fast method to do that, something like linear > progression in the analysis > > thank you so much if you have ideas for help me. > > Laurent -- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shalabh.sharma7 at gmail.com Wed Mar 25 16:33:34 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 25 Mar 2009 16:33:34 -0400 Subject: [Bioperl-l] Parsing Blast Report In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C74729A92033@exch1-hi.accelrys.net> References: <9fcc48c70903251205s6d6fa5cdoe9b6773ef56eee51@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74729A92033@exch1-hi.accelrys.net> Message-ID: <9fcc48c70903251333w5755ec99l2781320d44e059f7@mail.gmail.com> Thanks Mark and Scott, fastacmd solved my purpose. -Shalabh On Wed, Mar 25, 2009 at 3:20 PM, Scott Markel wrote: > Shalabh, > > No. The full-length hit sequences are not in the BLAST report. > You need to use either NCBI's fastacmd (in same set of executables > that has formatdb and blastall) or some database look-up. > > Scott > > Scott Markel, Ph.D. 
> Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Co-chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > > Sent: Wednesday, 25 March 2009 12:05 PM > > To: bioperl-l > > Subject: [Bioperl-l] Parsing Blast Report > > > > Hi All, > > Is there any way i can get the hit sequence from the blast > report? > > I am using SearchIO module to parse the blast report. > > > > Thanks > > Shalabh > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed Mar 25 18:06:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 25 Mar 2009 17:06:41 -0500 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <49CA8F71.2040909@univ-montp2.fr> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> Message-ID: <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> Laurent, All BioPerl modules, including Bio::SimpleAlign, have documentation via 'perldoc', you should have a look at that for specific examples. Myself, I recommend using Bio::DB::SeqFeature::Store (or another Bio::SeqFeature::CollectionI) for this. 
chris

On Mar 25, 2009, at 3:09 PM, Laurent Manchon wrote:

> -- yes perhaps,
> but i don't know how to use Bio::SimpleAlign object to resolve my
> problem, what a pity for me,
> so i'm going on to search using in another way procedural
> programmation.
>
> thank you --
>
> Kevin Brown wrote:
>> Please keep all replies on list.
>> Doing it with the SimpleAlign gets rid of the problem of
>> incrementing and reduces the complexity of the number of loop
>> iterations you'll have to do. Based on your sample data you have a
>> lot of IDs that actually have the same location information that
>> they are needing, you also have overlapping information from the
>> first file. So you'll still need to make decisions as to which item
>> is what you really want (e.g. CDS vs Exon).
>>
>> ________________________________
>>
>> From: Laurent MANCHON [mailto:lmanchon at univ-montp2.fr]
>> Sent: Wednesday, March 25, 2009 9:44 AM
>> To: Kevin Brown
>> Subject: Re: [Bioperl-l] problem to fit genomic coordinates
>>
>> Okay but i think it's not an easy way with this method,
>> the files are already sorted on colum numbers, so maybe another
>> logical method without using Bioperl libraries exist, for example
>> using a while loop,
>>
>> something like:
>>
>> $i = $j = 1;
>> $idx = number of lines in file1
>> $cpt = number of lines in file2
>> while ($i <= $idx && $j <= $cpt) {
>>     #compare current elements
>>     #increment either $i or $j depending which segment comes before the other
>> }
>> the difficulty is when to decide to incremente $i or $j inside the loop
>>
>> Laurent --
>>
>> Kevin Brown wrote:
>> Read in first file and create a Bio::SimpleAlign object
>>
>> Then use the slice method to find the features that are between the
>> start/end values of your second file

From lmanchon at univ-montp2.fr Thu Mar 26 04:31:40 2009
From: lmanchon at univ-montp2.fr (Laurent MANCHON)
Date: Thu, 26 Mar 2009 09:31:40 +0100
Subject: [Bioperl-l] problem to fit genomic coordinates
In-Reply-To: <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu>
References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu>
Message-ID: <49CB3D6C.2000202@univ-montp2.fr>

yes but this is a school problem that my teacher ask us to resolve without using Bioperl modules !
i have written a piece of code in awk but it takes too much time to perform the task:

#!/usr/bin/awk -f
#usage: myprog.awk file1.txt
#   file2.txt                        file1.txt
#   CDS  3760 3913 + AT1G01010       acc_1762592 24 89 112 -
#   exon 3631 3913 + AT1G01010       acc_2739797 24 304 327 -
#   CDS  3996 4276 + AT1G01010       acc_1955650 18 308 325 -
BEGIN {
    while((getline < "file2.txt") > 0){
        cpt++
        descr[cpt]=$1
        start[cpt]=$2
        end[cpt]=$3
        strand[cpt]=$4
        tair[cpt]=$5
    }
    close("file2.txt")
}
{
    j=1
    while(start[j]<=$3 && j<=cpt){
        if(end[j]>=$4){print "from="$3,"to="$4,"start="start[j],"end="end[j],"j="j;j++}
        else{j++}
    }
}

Chris Fields wrote:
> Laurent,
>
> All BioPerl modules, including Bio::SimpleAlign, have documentation
> via 'perldoc', you should have a look at that for specific examples.
> Myself, I recommend using Bio::DB::SeqFeature::Store (or another
> Bio::SeqFeature::CollectionI) for this.
>
> chris

From cjfields at illinois.edu Thu Mar 26 08:38:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Mar 2009 07:38:37 -0500
Subject: [Bioperl-l] problem to fit genomic coordinates
In-Reply-To: <49CB3D6C.2000202@univ-montp2.fr>
References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr>
Message-ID: <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu>

On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote:

> yes but this is a school problem that my teacher ask us to resolve
> without using Bioperl modules !

I didn't bother reading beyond that sentence.
Not to state the absolute obvious here, but:

1) you are posting to the bioperl list for a non-bioperl-related question, and
2) you are committing one of the biggest no-no's for a list, asking us to help you with your homework.

Don't be surprised if you get a few nasty responses and no help.

chris

From maj at fortinbras.us Thu Mar 26 09:02:33 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 26 Mar 2009 09:02:33 -0400
Subject: [Bioperl-l] thanks wiki-slayers
Message-ID:

(at great risk of browning my nose...)
Thanks to jason, chris d., cjf, mauricio, and all who fought
and subdued the uppity wiki yesterday- I for one appreciate
it-
cheers,
MAJ

From cjfields at illinois.edu Thu Mar 26 09:09:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Mar 2009 08:09:52 -0500
Subject: [Bioperl-l] thanks wiki-slayers
In-Reply-To:
References:
Message-ID:

That would be mainly jason, chris d., and mauricio. hilmar and I cheered from the sidelines...

chris

On Mar 26, 2009, at 8:02 AM, Mark A. Jensen wrote:

> (at great risk of browning my nose...)
> Thanks to jason, chris d., cjf, mauricio, and all who fought > and subdued the uppity wiki yesterday- I for one appreciate > it- > cheers, > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lmanchon at univ-montp2.fr Thu Mar 26 09:36:29 2009 From: lmanchon at univ-montp2.fr (Laurent MANCHON) Date: Thu, 26 Mar 2009 14:36:29 +0100 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> Message-ID: <49CB84DD.3000401@univ-montp2.fr> Chris Fields a ?crit : > > On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote: > >> yes but this is a school problem that my teacher ask us to resolve >> without using Bioperl modules ! > > I didn't bother reading beyond that sentence. Not to state the > absolute obvious here, but: > > 1) you are posting to the bioperl list for a non-bioperl-related > question, and genomic coordinates are not questions about biology ? i'm speaking about GENOME, and not GEOGRAPHY > 2) you are committing one of the biggest no-no's for a list, asking us > to help you with your homework. in bioperl you have BIO, okay but too you have PERL ! > > Don't be surprised if you get a few nasty responses and no help. 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Mar 26 09:58:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Mar 2009 08:58:32 -0500 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <49CB84DD.3000401@univ-montp2.fr> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> <49CB84DD.3000401@univ-montp2.fr> Message-ID: <494AEC91-4845-47C8-80D6-B417AA48504F@illinois.edu> On Mar 26, 2009, at 8:36 AM, Laurent MANCHON wrote: > Chris Fields a ?crit : >> >> On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote: >> >>> yes but this is a school problem that my teacher ask us to resolve >>> without using Bioperl modules ! >> >> I didn't bother reading beyond that sentence. Not to state the >> absolute obvious here, but: >> >> 1) you are posting to the bioperl list for a non-bioperl-related >> question, and > genomic coordinates are not questions about biology ? > i'm speaking about GENOME, and not GEOGRAPHY And this is a mail list for BioPerl (the toolkit), not perl and biology. We will sometimes answer questions along these lines if they are relevant, but apparently our answers (all notably BioPerl-related, mind you) were tossed to the side and you asked for more. I suppose you at least showed some honesty and revealed exactly why you needed this answered, but again, don't be surprised if you get a nasty response and no answers. You won't get any from me. >> 2) you are committing one of the biggest no-no's for a list, asking >> us to help you with your homework. 
> in bioperl you have BIO, okay but too you have PERL ! Interesting how you skirted that last question. We won't do your homework for you. Sorry. chris From philsf79 at gmail.com Thu Mar 26 10:00:39 2009 From: philsf79 at gmail.com (Felipe Figueiredo) Date: Thu, 26 Mar 2009 11:00:39 -0300 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <49CB84DD.3000401@univ-montp2.fr> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> <49CB84DD.3000401@univ-montp2.fr> Message-ID: <49CB8A87.6000807@gmail.com> Laurent MANCHON escreveu: > Chris Fields a ?crit : >> >> On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote: >> >>> yes but this is a school problem that my teacher ask us to resolve >>> without using Bioperl modules ! >> >> I didn't bother reading beyond that sentence. Not to state the >> absolute obvious here, but: >> >> 1) you are posting to the bioperl list for a non-bioperl-related >> question, and > genomic coordinates are not questions about biology ? > i'm speaking about GENOME, and not GEOGRAPHY And yet, you stated you need help with something that must not use bio-perl. >> 2) you are committing one of the biggest no-no's for a list, asking >> us to help you with your homework. > in bioperl you have BIO, okay but too you have PERL ! You do understand that if you separate Bio from Perl, your subject does not belong to this list, don't you? Do you go a computer kiosk in a mall to ask help configuring your email client? 
FF From cjfields at illinois.edu Thu Mar 26 11:18:34 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Mar 2009 10:18:34 -0500 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <49CB997F.7050805@univ-montp2.fr> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> <49CB84DD.3000401@univ-montp2.fr> <494AEC91-4845-47C8-80D6-B417AA48504F@illinois.edu> <49CB997F.7050805@univ-montp2.fr> Message-ID: <4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu> (edited for those with sensitive eyes) Laurent, Please keep all responses, no matter how puerile, on the mail list ;> We're trying to point out the blatantly obvious: this isn't the place for your question. Sorry if that irritates you. And, to reiterate, don't be surprised if you get some nasty responses. chris (hoping this isn't one of the GSoC students, as he's introducing Laurent to his spam filter) On Mar 26, 2009, at 10:04 AM, Laurent MANCHON wrote: > p*** off > > Chris Fields a ?crit : >> >> On Mar 26, 2009, at 8:36 AM, Laurent MANCHON wrote: >> >>> Chris Fields a ?crit : >>>> >>>> On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote: >>>> >>>>> yes but this is a school problem that my teacher ask us to >>>>> resolve without using Bioperl modules ! >>>> >>>> I didn't bother reading beyond that sentence. Not to state the >>>> absolute obvious here, but: >>>> >>>> 1) you are posting to the bioperl list for a non-bioperl-related >>>> question, and >>> genomic coordinates are not questions about biology ? >>> i'm speaking about GENOME, and not GEOGRAPHY >> >> And this is a mail list for BioPerl (the toolkit), not perl and >> biology. 
We will sometimes answer questions along these lines if
>> they are relevant, but apparently our answers (all notably BioPerl-
>> related, mind you) were tossed to the side and you asked for more.
>>
>> I suppose you at least showed some honesty and revealed exactly why
>> you needed this answered, but again, don't be surprised if you get
>> a nasty response and no answers. You won't get any from me.
>>
>>>> 2) you are committing one of the biggest no-no's for a list,
>>>> asking us to help you with your homework.
>>> in bioperl you have BIO, okay but too you have PERL !
>>
>> Interesting how you skirted that last question. We won't do your
>> homework for you. Sorry.
>>
>> chris

From scott at scottcain.net Thu Mar 26 11:26:59 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 26 Mar 2009 11:26:59 -0400
Subject: [Bioperl-l] problem to fit genomic coordinates
In-Reply-To: <4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu>
References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> <49CB84DD.3000401@univ-montp2.fr> <494AEC91-4845-47C8-80D6-B417AA48504F@illinois.edu> <49CB997F.7050805@univ-montp2.fr> <4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu>
Message-ID: <536f21b00903260826i7698bda5x4b994b433ffd569a@mail.gmail.com>

I have to wonder what the likelihood is of Laurent's instructor reading this mailing list?

On Thu, Mar 26, 2009 at 11:18 AM, Chris Fields wrote:
> (edited for those with sensitive eyes)
>
> Laurent,
>
> Please keep all responses, no matter how puerile, on the mail list ;>
>
> We're trying to point out the blatantly obvious: this isn't the place for
> your question. Sorry if that irritates you. And, to reiterate, don't be
> surprised if you get some nasty responses.
>
> chris
>
> (hoping this isn't one of the GSoC students, as he's introducing Laurent to
> his spam filter)

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From lmanchon at univ-montp2.fr Thu Mar 26 11:30:14 2009 From: lmanchon at univ-montp2.fr (Laurent MANCHON) Date: Thu, 26 Mar 2009 16:30:14 +0100 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> <49CB84DD.3000401@univ-montp2.fr> <494AEC91-4845-47C8-80D6-B417AA48504F@illinois.edu> <49CB997F.7050805@univ-montp2.fr> <4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu> Message-ID: <49CB9F86.6000706@univ-montp2.fr> okay, you are right, but i think in my opinion that my question is a good question about parsing enormous range of intervals. The problem is not perl, bioperl, or other language, it's just an algorithmic question. I'm not a professionnal in Bioperl and i don't know what is possible to do with all the Bioperl modules. So if you think it's possible to resolve my question with Bioperl maybe you are right, but in my position i stay in the same point. If you want i send you the two files needed in my question. And if you are agree try to use Bioperl to resolve it. Maybe it's not possible because files are big, i don't know. Chris Fields a ?crit : > (edited for those with sensitive eyes) > > Laurent, > > Please keep all responses, no matter how puerile, on the mail list ;> > > We're trying to point out the blatantly obvious: this isn't the place > for your question. Sorry if that irritates you. And, to reiterate, > don't be surprised if you get some nasty responses. 
> > chris > > (hoping this isn't one of the GSoC students, as he's introducing > Laurent to his spam filter) > > On Mar 26, 2009, at 10:04 AM, Laurent MANCHON wrote: > >> p*** off >> >> Chris Fields a ?crit : >>> >>> On Mar 26, 2009, at 8:36 AM, Laurent MANCHON wrote: >>> >>>> Chris Fields a ?crit : >>>>> >>>>> On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote: >>>>> >>>>>> yes but this is a school problem that my teacher ask us to >>>>>> resolve without using Bioperl modules ! >>>>> >>>>> I didn't bother reading beyond that sentence. Not to state the >>>>> absolute obvious here, but: >>>>> >>>>> 1) you are posting to the bioperl list for a non-bioperl-related >>>>> question, and >>>> genomic coordinates are not questions about biology ? >>>> i'm speaking about GENOME, and not GEOGRAPHY >>> >>> And this is a mail list for BioPerl (the toolkit), not perl and >>> biology. We will sometimes answer questions along these lines if >>> they are relevant, but apparently our answers (all notably >>> BioPerl-related, mind you) were tossed to the side and you asked for >>> more. >>> >>> I suppose you at least showed some honesty and revealed exactly why >>> you needed this answered, but again, don't be surprised if you get a >>> nasty response and no answers. You won't get any from me. >>> >>>>> 2) you are committing one of the biggest no-no's for a list, >>>>> asking us to help you with your homework. >>>> in bioperl you have BIO, okay but too you have PERL ! >>> >>> Interesting how you skirted that last question. We won't do your >>> homework for you. Sorry. 
>>> >>> chris >>> >>> >> >> > > From sdavis2 at mail.nih.gov Thu Mar 26 12:25:36 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 26 Mar 2009 12:25:36 -0400 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <49CB9F86.6000706@univ-montp2.fr> References: <49CA4627.7080804@univ-montp2.fr> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> <49CB84DD.3000401@univ-montp2.fr> <494AEC91-4845-47C8-80D6-B417AA48504F@illinois.edu> <49CB997F.7050805@univ-montp2.fr> <4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu> <49CB9F86.6000706@univ-montp2.fr> Message-ID: <264855a00903260925q5e07d62fm17e195e8ab6e0198@mail.gmail.com> On Thu, Mar 26, 2009 at 11:30 AM, Laurent MANCHON wrote: > okay, you are right, > but i think in my opinion that my question is a good question about parsing > enormous > range of intervals. > The problem is not perl, bioperl, or other language, it's just an > algorithmic question. > I'm not a professionnal in Bioperl and i don't know what is possible to do > with all the Bioperl modules. > So if you think it's possible to resolve my question with Bioperl maybe you > are right, but in my position i stay in the same point. > If you want i send you the two files needed in my question. And if you are > agree try to use Bioperl to resolve it. Maybe it's not possible because > files are big, i don't know. > To answer the question a bit more directly, you might consider using an R-tree indexing scheme or something akin to the binning scheme that UCSC uses for range queries. Algorithms like those allow very fast range operations. If you want a sense of how fast, try using the galaxy server at Penn State. 
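As an illustration of the binning idea mentioned above (this is not Sean's code, and it is much simpler than the hierarchical scheme UCSC uses), a minimal single-level sketch might look like the following. Python is used only for brevity, since the question is algorithmic rather than tied to a language. The names BIN_SIZE, build_bin_index and overlapping are hypothetical, the bin width is an arbitrary assumption, and the sample intervals are copied from the two files quoted earlier in the thread.

```python
from collections import defaultdict

BIN_SIZE = 1000  # assumed bin width; tune it to the typical feature length


def build_bin_index(features, bin_size=BIN_SIZE):
    """Map each bin number to the (name, start, end) features touching it."""
    index = defaultdict(list)
    for feature in features:
        _name, start, end = feature
        for b in range(start // bin_size, end // bin_size + 1):
            index[b].append(feature)
    return index


def overlapping(index, start, end, bin_size=BIN_SIZE):
    """Return indexed features that overlap the closed interval [start, end]."""
    hits = []
    seen = set()
    # Only the bins the query touches are inspected, not the whole file.
    for b in range(start // bin_size, end // bin_size + 1):
        for feature in index.get(b, ()):
            _name, f_start, f_end = feature
            if f_start <= end and start <= f_end and feature not in seen:
                seen.add(feature)
                hits.append(feature)
    return hits


# A few rows from the annotation file (file1.txt in the original post).
annotations = [
    ("five_prime_UTR", 3631, 3759),
    ("CDS", 3760, 3913),
    ("exon", 3631, 3913),
    ("CDS", 3996, 4276),
    ("exon", 3996, 4276),
]

# A few rows from the experimental file (file2.txt in the original post).
reads = [
    ("acc_2419732", 3268, 3285, "+"),
    ("acc_362358", 3640, 3656, "-"),
    ("acc_3279485", 3793, 3813, "+"),
    ("acc_1679660", 4113, 4129, "+"),
]

index = build_bin_index(annotations)
for name, start, end, strand in reads:
    names = [f[0] for f in overlapping(index, start, end)]
    print(name, start, end, strand, ",".join(names) or "no_annotation")
```

Each read is compared only with the annotations sharing one of its bins, so the work per read depends on local feature density rather than on the 150000 lines of the annotation file. Since both files are already sorted by start position, a two-pointer sweep over the sorted lists would be another option.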
Sean > > > > Chris Fields wrote: > >> (edited for those with sensitive eyes) >> >> Laurent, >> >> Please keep all responses, no matter how puerile, on the mail list ;> >> >> We're trying to point out the blatantly obvious: this isn't the place for >> your question. Sorry if that irritates you. And, to reiterate, don't be >> surprised if you get some nasty responses. >> >> chris >> >> (hoping this isn't one of the GSoC students, as he's introducing Laurent >> to his spam filter) >> >> On Mar 26, 2009, at 10:04 AM, Laurent MANCHON wrote: >> >> p*** off >>> >>> Chris Fields wrote: >>> >>>> >>>> On Mar 26, 2009, at 8:36 AM, Laurent MANCHON wrote: >>>> >>>> Chris Fields wrote: >>>>> >>>>>> >>>>>> On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote: >>>>>> >>>>>> yes but this is a school problem that my teacher ask us to resolve >>>>>>> without using Bioperl modules ! >>>>>>> >>>>>> >>>>>> I didn't bother reading beyond that sentence. Not to state the >>>>>> absolute obvious here, but: >>>>>> >>>>>> 1) you are posting to the bioperl list for a non-bioperl-related >>>>>> question, and >>>>>> >>>>> genomic coordinates are not questions about biology ? >>>>> i'm speaking about GENOME, and not GEOGRAPHY >>>>> >>>> >>>> And this is a mail list for BioPerl (the toolkit), not perl and biology. >>>> We will sometimes answer questions along these lines if they are relevant, >>>> but apparently our answers (all notably BioPerl-related, mind you) were >>>> tossed to the side and you asked for more. >>>> >>>> I suppose you at least showed some honesty and revealed exactly why you >>>> needed this answered, but again, don't be surprised if you get a nasty >>>> response and no answers. You won't get any from me. >>>> >>>> 2) you are committing one of the biggest no-no's for a list, asking us >>>>>> to help you with your homework. >>>>>> >>>>> in bioperl you have BIO, okay but too you have PERL ! >>>>> >>>> >>>> Interesting how you skirted that last question.
We won't do your >>>> homework for you. Sorry. >>>> >>>> chris >>>> >>>> >>>> >>> >>> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Mar 26 14:07:50 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Mar 2009 13:07:50 -0500 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <264855a00903260925q5e07d62fm17e195e8ab6e0198@mail.gmail.com> References: <49CA4627.7080804@univ-montp2.fr> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> <49CB84DD.3000401@univ-montp2.fr> <494AEC91-4845-47C8-80D6-B417AA48504F@illinois.edu> <49CB997F.7050805@univ-montp2.fr> <4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu> <49CB9F86.6000706@univ-montp2.fr> <264855a00903260925q5e07d62fm17e195e8ab6e0198@mail.gmail.com> Message-ID: <0AF37E15-D112-4E4E-BAF2-5D99243C0CED@illinois.edu> On Mar 26, 2009, at 11:25 AM, Sean Davis wrote: > On Thu, Mar 26, 2009 at 11:30 AM, Laurent MANCHON > wrote: > >> okay, you are right, >> but i think in my opinion that my question is a good question about >> parsing >> enormous >> range of intervals. >> The problem is not perl, bioperl, or other language, it's just an >> algorithmic question. This problem has already been solved to a great degree within BioPerl, and to a great many users satisfaction (see the Gbrowse list for a larger group of users). No need to reinvent the wheel, just optimize it. As you've indicated, if you need a non-toolkit, from-scratch solution you should follow Sean's suggestion (binning and R-tree). >> I'm not a professionnal in Bioperl and i don't know what is >> possible to do >> with all the Bioperl modules. 
>> So if you think it's possible to resolve my question with Bioperl >> maybe you >> are right, but in my position i stay in the same point. >> If you want i send you the two files needed in my question. And if >> you are >> agree try to use Bioperl to resolve it. Maybe it's not possible >> because >> files are big, i don't know. *sigh* Laurent, that's not it. Obviously this isn't quite sinking in, so I'll give it one last shot then I'm done. It isn't our job to do your work for you, homework or otherwise. We made our suggestions, and (given the situation) we even sometimes write up some demo code, but it's up to you to do the work. Particularly seeing as it's a homework problem. Let's say Scott's right and your instructor is on this list. Judging by one of your previous responses ('not to use BioPerl'), your instructor may very well hang out here. Also, remember this is a public mail list, archived and searchable via any web engine: http://bioperl.org/pipermail/bioperl-l/2009-March/029626.html Good luck with that. > To answer the question a bit more directly, you might consider using > an > R-tree indexing scheme or something akin to the binning scheme that > UCSC > uses for range queries. Binning is what Bio::SeqFeature::Collection, Bio::DB::SeqFeature::Store and the like do (hence my suggestion). It's quite fast. I'm not sure if we have an R-tree implementation or not, might be worth looking into. > Algorithms like those allow very fast range > operations. If you want a sense of how fast, try using the galaxy > server at > Penn State. > > Sean Another good option. chris From shalabh.sharma7 at gmail.com Thu Mar 26 14:26:38 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 26 Mar 2009 14:26:38 -0400 Subject: [Bioperl-l] Organism Classification Message-ID: <9fcc48c70903261126y6c083438gc77504c0c9ae55aa@mail.gmail.com> Hi All, I am writing a script and in one of its part i have to classify organism into eukaryotic or prokaryotic. 
I already have a list of organism, is there any indexed file or table in ncbi which i can use? or is there any other way? I would really appreciate if anyone can help me out. Thanks Shalabh From michael.watson at bbsrc.ac.uk Thu Mar 26 14:43:39 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 26 Mar 2009 18:43:39 -0000 Subject: [Bioperl-l] Organism Classification References: <9fcc48c70903261126y6c083438gc77504c0c9ae55aa@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504AA2C15@iahce2ksrv1.iah.bbsrc.ac.uk> You need the NCBI taxonomy database: http://www.ncbi.nlm.nih.gov/Taxonomy/ http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org on behalf of shalabh sharma Sent: Thu 26/03/2009 6:26 PM To: bioperl-l Subject: [Bioperl-l] Organism Classification Hi All, I am writing a script and in one of its part i have to classify organism into eukaryotic or prokaryotic. I already have a list of organism, is there any indexed file or table in ncbi which i can use? or is there any other way? I would really appreciate if anyone can help me out. Thanks Shalabh _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Mar 26 15:17:22 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 26 Mar 2009 15:17:22 -0400 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <49CB9F86.6000706@univ-montp2.fr> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr><9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu><49CB84DD.3000401@univ-montp2.fr><494AEC91-4845-47C8-80D6-B417AA48504F@illinois.edu><49CB997F.7050805@univ-montp2.fr><4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu> <49CB9F86.6000706@univ-montp2.fr> Message-ID: <786F570F297844438EEAEFB645586475@NewLife> Laurent, I know that enough has been said on this issue, but I want to address not only you but all the student lurkers on the list. bioperl-l is mainly a place for professional questions and answers, but there is a very strong educational component here as well. That comes directly from the attitude of the BioPerl leadership, who are mainly academic in origin, who love to program and to teach programming. The ethic "we don't do homework for you" is really part of that educational mission that is subconsciously built in to Bioperl. It is a really, really serious mistake to treat global experts in a field you're interested in entering as if they owed you something. On the contrary, every help response to this list is time, worth hundreds of euros an hour, given away for free. The motivations for giving it vary, but the fact that it deserves gratitude and attentiveness does not. You will be very hard-pressed to find a listserv as helpful as this, not to mention as forgiving. Other places would have banned you instantly; instead, at the end of your tirade you received several patient and helpful responses. 
I have thought for a long time (and I've been in the biz a long time) that students need to attempt to forget that they are students, and start imagining themselves as scientists right now where they are. Leave behind the mindset that you're just trying to make it through another course, and adopt the mindset that you are preparing yourself to pursue something you love to do. Until this happens, you are not a professional, you don't think like a professional, and you're liable to make serious mistakes.

Scientists are very social, and have very long memories. Those two qualities make them good scientists. However, those qualities can also make it very difficult to reinvent yourself in their minds, if you screw up. You have to work very, very hard to overcome it.

So, in the future, think of yourself as a scientist, in a community of scientists, each of whom could be helpful and useful to you. They will then think of you that way too, and it will help you do what you want to do.

Good luck-
Mark

----- Original Message ----- From: "Laurent MANCHON" To: "Chris Fields" ; Sent: Thursday, March 26, 2009 11:30 AM Subject: Re: [Bioperl-l] problem to fit genomic coordinates

okay, you are right, but i think in my opinion that my question is a good question about parsing enormous range of intervals. The problem is not perl, bioperl, or other language, it's just an algorithmic question. I'm not a professionnal in Bioperl and i don't know what is possible to do with all the Bioperl modules. So if you think it's possible to resolve my question with Bioperl maybe you are right, but in my position i stay in the same point. If you want i send you the two files needed in my question. And if you are agree try to use Bioperl to resolve it. Maybe it's not possible because files are big, i don't know.
Chris Fields wrote: > (edited for those with sensitive eyes) > > Laurent, > > Please keep all responses, no matter how puerile, on the mail list ;> > > We're trying to point out the blatantly obvious: this isn't the place for your > question. Sorry if that irritates you. And, to reiterate, don't be surprised > if you get some nasty responses. > > chris > > (hoping this isn't one of the GSoC students, as he's introducing Laurent to > his spam filter) > > On Mar 26, 2009, at 10:04 AM, Laurent MANCHON wrote: > >> p*** off >> >> Chris Fields wrote: >>> >>> On Mar 26, 2009, at 8:36 AM, Laurent MANCHON wrote: >>> >>>> Chris Fields wrote: >>>>> >>>>> On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote: >>>>> >>>>>> yes but this is a school problem that my teacher ask us to resolve >>>>>> without using Bioperl modules ! >>>>> >>>>> I didn't bother reading beyond that sentence. Not to state the absolute >>>>> obvious here, but: >>>>> >>>>> 1) you are posting to the bioperl list for a non-bioperl-related question, >>>>> and >>>> genomic coordinates are not questions about biology ? >>>> i'm speaking about GENOME, and not GEOGRAPHY >>> >>> And this is a mail list for BioPerl (the toolkit), not perl and biology. We >>> will sometimes answer questions along these lines if they are relevant, but >>> apparently our answers (all notably BioPerl-related, mind you) were tossed >>> to the side and you asked for more. >>> >>> I suppose you at least showed some honesty and revealed exactly why you >>> needed this answered, but again, don't be surprised if you get a nasty >>> response and no answers. You won't get any from me. >>> >>>>> 2) you are committing one of the biggest no-no's for a list, asking us to >>>>> help you with your homework. >>>> in bioperl you have BIO, okay but too you have PERL ! >>> >>> Interesting how you skirted that last question. We won't do your homework >>> for you. Sorry.
>>> >>> chris >>> >>> >> >> > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Thu Mar 26 16:28:36 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 27 Mar 2009 09:28:36 +1300 Subject: [Bioperl-l] problem to fit genomic coordinates In-Reply-To: <49CB9F86.6000706@univ-montp2.fr> References: <49CA4627.7080804@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7E74@EX02.asurite.ad.asu.edu> <49CA5F5F.309@univ-montp2.fr> <1A4207F8295607498283FE9E93B775B405DC7F24@EX02.asurite.ad.asu.edu> <49CA8F71.2040909@univ-montp2.fr> <93B4D76F-DF75-42FE-B7A3-63F905873A54@illinois.edu> <49CB3D6C.2000202@univ-montp2.fr> <9553BAF9-3663-4E06-B7AC-4FA901FE749B@illinois.edu> <49CB84DD.3000401@univ-montp2.fr> <494AEC91-4845-47C8-80D6-B417AA48504F@illinois.edu> <49CB997F.7050805@univ-montp2.fr> <4874C6A0-E755-4327-B2D5-0B107FD17F78@illinois.edu> <49CB9F86.6000706@univ-montp2.fr> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493505B07@exchsth.agresearch.co.nz> I'm _not_ doing your homework, but I did something similar with hashes. You'd probably look at the loops in the code and expect it to be slow but it was faster and simpler than any other solution I could come up with. I was indexing the position of 4.7 million repeats in a genome then finding (about 1 million) SSRs within 50bp of them. I was reading from gff files but you could split your own data to do get the required fields. Instead of using $OFFSET when searching, use your start and end coords. (this is not all the code and it probably won't run as I've chopped bits out but you get the general idea...) 
## load the repeats from a gff file into a hash/array
## please excuse my poorly named variables :-)
my %rmarray_starts;   # chr -> start position -> list of gff lines
my $OFFSET = 50;      # search window in bp (the 50bp mentioned above)

open(RM,"repeats.gff") or die $!;
while(<RM>){
    chomp;
    my($chr_rm,undef,undef,$start_rm,$end_rm) = split(/\t/,$_);
    # MULTIPLE REPEATS CAN HAVE THE SAME CHR AND START OR END POSITION
    # SO INDEXING LIKE THIS: {BTA1}{12345}(repeat1,repeat2,repeat3)
    push(@{$rmarray_starts{$chr_rm}{$start_rm}},$_);
}
close RM;

# read the SSRs and find their positions in the repeat hash
open(SSR,"ssrs.gff") or die $!;
while(<SSR>){
    chomp;
    my($chr_ssr,undef,undef,$start_ssr,$end_ssr) = split(/\t/,$_);
    # check every position in the window upstream of the SSR start
    for(my $i = $start_ssr - $OFFSET; $i < $start_ssr; $i++){
        if(defined $rmarray_starts{$chr_ssr}{$i}){
            foreach my $s (@{$rmarray_starts{$chr_ssr}{$i}}){
                # do something with the hit
                print "$s\n";
            }
        }
    }
}
close SSR;

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809  F +64 3 489 9174  www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Laurent MANCHON
> Sent: Friday, 27 March 2009 4:30 a.m.
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] problem to fit genomic coordinates
>
> okay, you are right,
> but i think in my opinion that my question is a good question about
> parsing enormous
> range of intervals.
> The problem is not perl, bioperl, or other language, it's just an
> algorithmic question.
> I'm not a professionnal in Bioperl and i don't know what is possible to
> do with all the Bioperl modules.
> So if you think it's possible to resolve my question with Bioperl maybe
> you are right, but in my position i stay in the same point.
> If you want i send you the two files needed in my question. And if you
> are agree try to use Bioperl to resolve it. Maybe it's not possible because
> files are big, i don't know.
> > > > Chris Fields wrote: > > (edited for those with sensitive eyes) > > > > Laurent, > > > > Please keep all responses, no matter how puerile, on the mail list ;> > > > > We're trying to point out the blatantly obvious: this isn't the place > > for your question. Sorry if that irritates you. And, to reiterate, > > don't be surprised if you get some nasty responses. > > > > chris > > > > (hoping this isn't one of the GSoC students, as he's introducing > > Laurent to his spam filter) > > > > On Mar 26, 2009, at 10:04 AM, Laurent MANCHON wrote: > > > >> p*** off > >> > >> Chris Fields wrote: > >>> > >>> On Mar 26, 2009, at 8:36 AM, Laurent MANCHON wrote: > >>> > >>>> Chris Fields wrote: > >>>>> > >>>>> On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote: > >>>>> > >>>>>> yes but this is a school problem that my teacher ask us to > >>>>>> resolve without using Bioperl modules ! > >>>>> > >>>>> I didn't bother reading beyond that sentence. Not to state the > >>>>> absolute obvious here, but: > >>>>> > >>>>> 1) you are posting to the bioperl list for a non-bioperl-related > >>>>> question, and > >>>> genomic coordinates are not questions about biology ? > >>>> i'm speaking about GENOME, and not GEOGRAPHY > >>> > >>> And this is a mail list for BioPerl (the toolkit), not perl and > >>> biology. We will sometimes answer questions along these lines if > >>> they are relevant, but apparently our answers (all notably > >>> BioPerl-related, mind you) were tossed to the side and you asked for > >>> more. > >>> > >>> I suppose you at least showed some honesty and revealed exactly why > >>> you needed this answered, but again, don't be surprised if you get a > >>> nasty response and no answers. You won't get any from me. > >>> > >>>>> 2) you are committing one of the biggest no-no's for a list, > >>>>> asking us to help you with your homework. > >>>> in bioperl you have BIO, okay but too you have PERL ! > >>> > >>> Interesting how you skirted that last question.
We won't do your > >>> homework for you. Sorry. > >>> > >>> chris > >>> > >>> > >> > >> > > > > > > From jason at bioperl.org Thu Mar 26 23:49:01 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 26 Mar 2009 20:49:01 -0700 Subject: [Bioperl-l] thanks wiki-slayers In-Reply-To: References: Message-ID: <2CF42339-6F1F-4EB4-8147-DAEE34DCBFF2@bioperl.org> Thanks Mark. Glad we got it back up before too long - some silly upstream router problems - hopefully we don't hit that again... Thanks to Chris for restarting the right upstream box and mauricio in helping w/ debugging. -jason On Mar 26, 2009, at 6:09 AM, Chris Fields wrote: > That would be mainly jason, chris d., and mauricio. hilmar and I > cheered from the sidelines... > > chris > > On Mar 26, 2009, at 8:02 AM, Mark A. Jensen wrote: > >> (at great risk of browning my nose...)
>> Thanks to jason, chris d., cjf, mauricio, and all who fought >> and subdued the uppity wiki yesterday- I for one appreciate >> it- >> cheers, >> MAJ

Jason Stajich jason at bioperl.org

From paolo.pavan at gmail.com Fri Mar 27 09:09:32 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 27 Mar 2009 14:09:32 +0100 Subject: [Bioperl-l] Question about parsing a gb file Message-ID: <56be91b60903270609l11bd0edan75ab8d3f0e552d56@mail.gmail.com>

Hi everybody, I have a small problem/question about handling a GenBank file. I have a Bio::Seq object $s to which I've added some Bio::SeqFeature::Generic features. Everything seems to be OK at this point, since I can see all the properties of $s set correctly in my visual debugger; for instance, I can find the display_name property of the SeqFeature in the $s object. Then I call print Bio::SeqIO->new(-format => 'genbank')->write_seq($s) to write out the GenBank file, but in the output I can no longer find some of the properties, like the "display_name". Why does this happen?
Below a snip of code, thank you in advance,
Paolo

my $s = $str->next_seq();
my $f = Bio::SeqFeature::Generic->new(
    -start        => 10,
    -end          => 100,
    -strand       => -1,
    -primary      => 'CDS', # -primary_tag is a synonym
    -source_tag   => 'repeatmasker',
    -display_name => 'alu family'
);
$s->add_SeqFeature($f);
print Bio::SeqIO->new(-format => 'genbank')->write_seq($s)

From govind.chandra at bbsrc.ac.uk Fri Mar 27 11:26:02 2009
From: govind.chandra at bbsrc.ac.uk (Govind Chandra)
Date: Fri, 27 Mar 2009 15:26:02 +0000
Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15
In-Reply-To: References: Message-ID: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk>

Hi,

The code below

====== code begins =======
#use strict;
use Bio::SeqIO;

$infile='NC_000913.gbk';
my $seqio=Bio::SeqIO->new(-file => $infile);
my $seqobj=$seqio->next_seq();
my @features=$seqobj->all_SeqFeatures();
my $count=0;
foreach my $feature (@features) {
  unless($feature->primary_tag() eq 'CDS') {next;}
  print($feature->start()," ", $feature->end(), " ",$feature->strand(),"\n");
  $ac=$feature->annotation();
  $temp1=$ac->get_Annotations("locus_tag");
  @temp2=$ac->get_Annotations();
  print("$temp1 $temp2[0] @temp2\n");
  if($count++ > 5) {last;}
}

print(ref($ac),"\n");
exit;

======= code ends ========

produces the output

========== output begins ========

190 255 1
0
337 2799 1
0
2801 3733 1
0
3734 5020 1
0
5234 5530 1
0
5683 6459 -1
0
6529 7959 -1
0
Bio::Annotation::Collection

=========== output ends ==========

$ac is-a Bio::Annotation::Collection but does not actually contain any annotation from the feature. Is this how it should be? I cannot figure out what is wrong with the script. Earlier I used to use has_tag(), get_tag_values() etc. but the documentation says these are deprecated.

Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname -a is

Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Thanks in advance for any help.
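
One way to get at a qualifier such as locus_tag, assuming the GenBank parser stores qualifiers as feature tags rather than in the annotation collection, is to go through has_tag() and get_tag_values() on the feature itself. A minimal sketch follows; it uses a hand-built Bio::SeqFeature::Generic so that no input file is needed, and the locus_tag value is only illustrative.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Hand-built feature standing in for one CDS read from the GenBank file;
# the locus_tag value 'b0001' is illustrative only.
my $feature = Bio::SeqFeature::Generic->new(
    -start       => 190,
    -end         => 255,
    -strand      => 1,
    -primary_tag => 'CDS',
    -tag         => { locus_tag => 'b0001' },
);

# get_tag_values() throws if the tag is absent, so check with has_tag() first.
# It returns a list, so assign in list context to get the value itself.
my ($locus_tag) = $feature->has_tag('locus_tag')
    ? $feature->get_tag_values('locus_tag')
    : ('unknown');

print join(" ", $feature->start, $feature->end, $feature->strand, $locus_tag), "\n";
```

In the loop over CDS features, the same two calls would replace the annotation() and get_Annotations() lines.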
Govind From maj at fortinbras.us Fri Mar 27 12:17:29 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 27 Mar 2009 12:17:29 -0400 Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 In-Reply-To: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> Message-ID: <9ED3500FE5524639887C2E762053628B@NewLife> Hi Govind- As near as I can tell, the *_tags methods are deprecated for Bio::AnnotatableI objects, but these methods are available off the SeqFeatureI objects themselves: i.e., rather than > $ac=$feature->annotation(); > $temp1=$ac->get_Annotations("locus_tag"); do $temp1 = $feature->get_tag_values("locus_tag"); directly. hope it helps - Mark ----- Original Message ----- From: "Govind Chandra" To: Sent: Friday, March 27, 2009 11:26 AM Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 > Hi, > > The code below > > > ====== code begins ======= > #use strict; > use Bio::SeqIO; > > $infile='NC_000913.gbk'; > my $seqio=Bio::SeqIO->new(-file => $infile); > my $seqobj=$seqio->next_seq(); > my @features=$seqobj->all_SeqFeatures(); > my $count=0; > foreach my $feature (@features) { > unless($feature->primary_tag() eq 'CDS') {next;} > print($feature->start()," ", $feature->end(), " > ",$feature->strand(),"\n"); > $ac=$feature->annotation(); > $temp1=$ac->get_Annotations("locus_tag"); > @temp2=$ac->get_Annotations(); > print("$temp1 $temp2[0] @temp2\n"); > if($count++ > 5) {last;} > } > > print(ref($ac),"\n"); > exit; > > ======= code ends ======== > > produces the output > > ========== output begins ======== > > 190 255 1 > 0 > 337 2799 1 > 0 > 2801 3733 1 > 0 > 3734 5020 1 > 0 > 5234 5530 1 > 0 > 5683 6459 -1 > 0 > 6529 7959 -1 > 0 > Bio::Annotation::Collection > > =========== output ends ========== > > $ac is-a Bio::Annotation::Collection but does not actually contain any > annotation from the feature. Is this how it should be? I cannot figure > out what is wrong with the script. 
Earlier I used to use has_tag(), > get_tag_values() etc. but the documentation says these are deprecated. > > Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname > -a is > > Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 > x86_64 x86_64 x86_64 GNU/Linux > > Thanks in advance for any help. > > Govind > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From govind.chandra at bbsrc.ac.uk Fri Mar 27 13:09:29 2009 From: govind.chandra at bbsrc.ac.uk (Govind Chandra) Date: Fri, 27 Mar 2009 17:09:29 +0000 Subject: [Bioperl-l] Bio::AnnotatableI function annotation() In-Reply-To: <9ED3500FE5524639887C2E762053628B@NewLife> References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> Message-ID: <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> Thanks Mark, Sorry for not putting a proper subject in the last post. What you suggest is what I have been doing for a long time. I am just trying to alter my code to conform to the latest bioperl version and ran into this issue. I could be wrong (I am more a user rather than writer of modules) but since $feature->annotation() does not result in an error I think $feature is-a Bio::AnnotatableI as well. Cheers Govind On Fri, 2009-03-27 at 12:17 -0400, Mark A. Jensen wrote: > Hi Govind- > > As near as I can tell, the *_tags methods are deprecated for > Bio::AnnotatableI objects, but these methods are available > off the SeqFeatureI objects themselves: i.e., rather than > > > $ac=$feature->annotation(); > > $temp1=$ac->get_Annotations("locus_tag"); > > do > > $temp1 = $feature->get_tag_values("locus_tag"); > > directly. 
> > hope it helps - > Mark > > ----- Original Message ----- > From: "Govind Chandra" > To: > Sent: Friday, March 27, 2009 11:26 AM > Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 > > > > Hi, > > > > The code below > > > > > > ====== code begins ======= > > #use strict; > > use Bio::SeqIO; > > > > $infile='NC_000913.gbk'; > > my $seqio=Bio::SeqIO->new(-file => $infile); > > my $seqobj=$seqio->next_seq(); > > my @features=$seqobj->all_SeqFeatures(); > > my $count=0; > > foreach my $feature (@features) { > > unless($feature->primary_tag() eq 'CDS') {next;} > > print($feature->start()," ", $feature->end(), " > > ",$feature->strand(),"\n"); > > $ac=$feature->annotation(); > > $temp1=$ac->get_Annotations("locus_tag"); > > @temp2=$ac->get_Annotations(); > > print("$temp1 $temp2[0] @temp2\n"); > > if($count++ > 5) {last;} > > } > > > > print(ref($ac),"\n"); > > exit; > > > > ======= code ends ======== > > > > produces the output > > > > ========== output begins ======== > > > > 190 255 1 > > 0 > > 337 2799 1 > > 0 > > 2801 3733 1 > > 0 > > 3734 5020 1 > > 0 > > 5234 5530 1 > > 0 > > 5683 6459 -1 > > 0 > > 6529 7959 -1 > > 0 > > Bio::Annotation::Collection > > > > =========== output ends ========== > > > > $ac is-a Bio::Annotation::Collection but does not actually contain any > > annotation from the feature. Is this how it should be? I cannot figure > > out what is wrong with the script. Earlier I used to use has_tag(), > > get_tag_values() etc. but the documentation says these are deprecated. > > > > Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname > > -a is > > > > Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 > > x86_64 x86_64 x86_64 GNU/Linux > > > > Thanks in advance for any help. 
> > > > Govind > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > From maj at fortinbras.us Fri Mar 27 13:30:17 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 27 Mar 2009 13:30:17 -0400 Subject: [Bioperl-l] Bio::AnnotatableI function annotation() In-Reply-To: <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk><9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> Message-ID: Hey Govind-- You're right-- SeqFeature::Generic object inherits from AnnotatableI-- but the *_tags_* methods are now SeqFeature::Generic methods--ie, you can use these on features, and they are no longer hitting AnnotableI. It appears that the feature's AnnotationCollection doesn't even get loaded now. [developer out there like to chime in?] cheers, Mark ----- Original Message ----- From: "Govind Chandra" To: "Mark A. Jensen" Cc: Sent: Friday, March 27, 2009 1:09 PM Subject: [Bioperl-l] Bio::AnnotatableI function annotation() > Thanks Mark, > > Sorry for not putting a proper subject in the last post. > > What you suggest is what I have been doing for a long time. I am just > trying to alter my code to conform to the latest bioperl version and ran > into this issue. I could be wrong (I am more a user rather than writer > of modules) but since $feature->annotation() does not result in an error > I think $feature is-a Bio::AnnotatableI as well. > > Cheers > > Govind > > > > On Fri, 2009-03-27 at 12:17 -0400, Mark A. 
Jensen wrote: >> Hi Govind- >> >> As near as I can tell, the *_tags methods are deprecated for >> Bio::AnnotatableI objects, but these methods are available >> off the SeqFeatureI objects themselves: i.e., rather than >> >> > $ac=$feature->annotation(); >> > $temp1=$ac->get_Annotations("locus_tag"); >> >> do >> >> $temp1 = $feature->get_tag_values("locus_tag"); >> >> directly. >> >> hope it helps - >> Mark >> >> ----- Original Message ----- >> From: "Govind Chandra" >> To: >> Sent: Friday, March 27, 2009 11:26 AM >> Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 >> >> >> > Hi, >> > >> > The code below >> > >> > >> > ====== code begins ======= >> > #use strict; >> > use Bio::SeqIO; >> > >> > $infile='NC_000913.gbk'; >> > my $seqio=Bio::SeqIO->new(-file => $infile); >> > my $seqobj=$seqio->next_seq(); >> > my @features=$seqobj->all_SeqFeatures(); >> > my $count=0; >> > foreach my $feature (@features) { >> > unless($feature->primary_tag() eq 'CDS') {next;} >> > print($feature->start()," ", $feature->end(), " >> > ",$feature->strand(),"\n"); >> > $ac=$feature->annotation(); >> > $temp1=$ac->get_Annotations("locus_tag"); >> > @temp2=$ac->get_Annotations(); >> > print("$temp1 $temp2[0] @temp2\n"); >> > if($count++ > 5) {last;} >> > } >> > >> > print(ref($ac),"\n"); >> > exit; >> > >> > ======= code ends ======== >> > >> > produces the output >> > >> > ========== output begins ======== >> > >> > 190 255 1 >> > 0 >> > 337 2799 1 >> > 0 >> > 2801 3733 1 >> > 0 >> > 3734 5020 1 >> > 0 >> > 5234 5530 1 >> > 0 >> > 5683 6459 -1 >> > 0 >> > 6529 7959 -1 >> > 0 >> > Bio::Annotation::Collection >> > >> > =========== output ends ========== >> > >> > $ac is-a Bio::Annotation::Collection but does not actually contain any >> > annotation from the feature. Is this how it should be? I cannot figure >> > out what is wrong with the script. Earlier I used to use has_tag(), >> > get_tag_values() etc. but the documentation says these are deprecated. >> > >> > Perl is 5.8.8. 
>> > BioPerl version is 1.6 (installed today). Output of uname -a is
>> >
>> > Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008
>> > x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > Thanks in advance for any help.
>> >
>> > Govind
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From govind.chandra at bbsrc.ac.uk Fri Mar 27 13:44:31 2009
From: govind.chandra at bbsrc.ac.uk (Govind Chandra)
Date: Fri, 27 Mar 2009 17:44:31 +0000
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To:
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk>
Message-ID: <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>

Hi Mark,

Would it be unfair to say that the documentation as well as the
implementation are confusing? SeqFeature::Generic should cause an error
when annotation() is called on it if it cannot do the right thing. For
the time being I will stick with the old ways (has_tag etc.). Good to
know they are not deprecated in the way I intend to use them (via
SeqFeature::Generic).

Cheers

Govind

On Fri, 2009-03-27 at 13:30 -0400, Mark A. Jensen wrote:
> Hey Govind--
> You're right-- a SeqFeature::Generic object inherits from
> AnnotatableI-- but the *_tags_* methods are now
> SeqFeature::Generic methods, i.e., you can use these
> on features, and they no longer go through AnnotatableI.
> It appears that the feature's Annotation::Collection doesn't
> even get loaded now.
> [Would a developer out there like to chime in?]
> cheers,
> Mark

From maj at fortinbras.us Fri Mar 27 14:10:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 27 Mar 2009 14:10:25 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
Message-ID: <7AC2E54DCD3D4DCBAF57832DC764D050@NewLife>

[missed the list, evidently]

> Hey Govind-
> You are right about that-- if you have time, will you make
> a bug report (http://bugzilla.bioperl.org) stating just that?
> Shouldn't be a problem to fix. (I have a feeling that this
> refactorization was designed precisely to be transparent
> to users.)
> thanks a lot-
> Mark
> ----- Original Message -----
> From: "Govind Chandra"
> To: "Mark A. Jensen"
> Cc:
> Sent: Friday, March 27, 2009 1:44 PM
> Subject: Re: [Bioperl-l] Bio::AnnotatableI function annotation()
>
>> Hi Mark,
>> Would it be unfair to say that the documentation as well as the
>> implementation are confusing? SeqFeature::Generic should cause an error
>> when annotation() is called on it if it cannot do the right thing. For
>> the time being I will stick with the old ways (has_tag etc.). Good to
>> know they are not deprecated in the way I intend to use them (via
>> SeqFeature::Generic).
>> Cheers
>> Govind

From hlapp at gmx.net Fri Mar 27 14:14:58 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 27 Mar 2009 14:14:58 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>
Message-ID: <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>

$feature->annotation() is a legitimate method call (it implements
AnnotatableI).

SeqFeature::Generic does indeed have two mechanisms for storing
annotation, the tag system and the annotation collection. This is
because it inherits from SeqFeatureI (which brings in the tag/value
annotation) and from AnnotatableI (which brings in annotation()).

I agree this can be confusing from a user's perspective. As a rule of
thumb, SeqIO parsers will almost universally populate only the
tag/value system, because typically they will (or should) assume no
more than that the feature object they are dealing with is a
SeqFeatureI.

Once you have the feature objects in your hands, you can add to either
tag/values or annotation() to your heart's content.
Just be aware that nearly all SeqIO writers won't use the annotation()
collection when you pass the sequence back to them, since typically they
won't really know what to do with feature annotation that isn't
tag/value (unlike for sequence annotation).

If in your code you want to treat tag/value annotation in the same way
as (i.e., as if it were part of) the annotation that's in the
annotation collection, then use SeqFeature::AnnotationAdaptor. That's
in fact what Bioperl-db does to ensure that all annotation gets
serialized to the database no matter where it is.

Hth,

-hilmar

On Mar 27, 2009, at 1:44 PM, Govind Chandra wrote:

> Hi Mark,
> Would it be unfair to say that the documentation as well as the
> implementation are confusing? SeqFeature::Generic should cause an error
> when annotation() is called on it if it cannot do the right thing. For
> the time being I will stick with the old ways (has_tag etc.). Good to
> know they are not deprecated in the way I intend to use them (via
> SeqFeature::Generic).
> Cheers
> Govind

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Fri Mar 27 14:29:17 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 27 Mar 2009 14:29:17 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk> <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>
Message-ID: <7631BB4864FF4EFB9DB56357AC5C06F6@NewLife>

Thanks Hilmar-- so there isn't really a bug, but would it be
useful if the object warned a user who attempts to access
an empty $feature->annotation with a hint encapsulating
your discussion below?
MAJ

----- Original Message -----
From: "Hilmar Lapp"
To: "Govind Chandra"
Cc: "Mark A. Jensen" ;
Sent: Friday, March 27, 2009 2:14 PM
Subject: Re: [Bioperl-l] Bio::AnnotatableI function annotation()

> $feature->annotation() is a legitimate method call (it implements
> AnnotatableI).
>
> SeqFeature::Generic does indeed have two mechanisms for storing
> annotation, the tag system and the annotation collection. This is
> because it inherits from SeqFeatureI (which brings in the tag/value
> annotation) and from AnnotatableI (which brings in annotation()).
>
> I agree this can be confusing from a user's perspective. As a rule of
> thumb, SeqIO parsers will almost universally populate only the
> tag/value system, because typically they will (or should) assume no
> more than that the feature object they are dealing with is a
> SeqFeatureI.
>
> Once you have the feature objects in your hands, you can add to either
> tag/values or annotation() to your heart's content. Just be aware that
> nearly all SeqIO writers won't use the annotation() collection when
> you pass the sequence back to them, since typically they won't really
> know what to do with feature annotation that isn't tag/value (unlike
> for sequence annotation).
>
> If in your code you want to treat tag/value annotation in the same way
> as (i.e., as if it were part of) the annotation that's in the
> annotation collection, then use SeqFeature::AnnotationAdaptor. That's
> in fact what Bioperl-db does to ensure that all annotation gets
> serialized to the database no matter where it is.
>
> Hth,
>
> -hilmar

From cjfields at illinois.edu Fri Mar 27 14:55:13 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Mar 2009 13:55:13 -0500
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>
Message-ID: <98A94F75-17B4-41D2-95D6-E499A950F7D1@illinois.edu>

(I think I'm generally right on this, but this may be where someone
needs to step in to
correct me)

Going over why things were set up this way (and then reverted) is a bit
of a history lesson. I believe prior to 1.5.0, Bio::SeqFeature::Generic
stored most second-class data (dbxrefs, simple secondary tags, etc.) as
simple untyped text via tags but also allowed a
Bio::Annotation::Collection. Therefore one effectively gets a mixed bag
of first-class untyped data like 'display_id' and 'primary_tag', untyped
tagged text, and 'typed' Bio::AnnotationI objects.

Some of the changes were an attempt to 'correct' this for those who
wanted a cohesive collection of typed data out of the box. Essentially,
everything becomes a Bio::AnnotationI. I believe
Bio::SeqFeature::Annotated went a step further and made almost
everything Bio::AnnotationI (including score, primary_tag, etc.) and
type-checked tag data against SOFA.

As there were collisions between SeqFeature-like 'tag' methods and
CollectionI-like methods, the design thought was to store all tag data
as Bio::Annotation in a Bio::Annotation::Collection, then eventually
deprecate the tag methods in favor of those available via the
CollectionI. These deprecations were placed in Bio::AnnotatableI, so any
future Bio::SeqFeatureI implementations would also get the deprecation.
As noted, Bio::SeqFeature::Generic implements these methods, so it isn't
affected.

Now, layer in the fact that many of these (very dramatic) code changes
were literally introduced just prior to the 1.5.0 release, AFAIK w/o
much code review, and contained additional unwanted changes such as
operator overloading and so on. Very little discussion about this
occurred on list until after the changes were introduced (a good
argument for small commits). Some very good arguments against this were
made, including other lightweight implementations. Lots of angry devs!
Though the intentions were noble we ended up with a mess.
I yanked these out a couple of years ago, frankly out of frustration
with the overloading issues:

http://www.bioperl.org/wiki/Feature_Annotation_rollback

You may be seeing some relics that haven't been removed yet.

(just noticed Hilmar's post, which is more succinct, d'oh)

chris

On Mar 27, 2009, at 12:44 PM, Govind Chandra wrote:

> Hi Mark,
> Would it be unfair to say that the documentation as well as the
> implementation are confusing? SeqFeature::Generic should cause an error
> when annotation() is called on it if it cannot do the right thing. For
> the time being I will stick with the old ways (has_tag etc.). Good to
> know they are not deprecated in the way I intend to use them (via
> SeqFeature::Generic).
> Cheers
> Govind

From cjfields at illinois.edu Fri Mar 27 15:02:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Mar 2009 14:02:14 -0500
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk> <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>
Message-ID: <8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu>

Right now SeqFeatureI doesn't inherit AnnotatableI, even though the note
about the deprecated *_tag_* methods implies that it does. We should
probably add back the abstract (unimplemented) tag methods to
Bio::SeqFeatureI and either remove them from Bio::AnnotatableI or
activate the deprecation (they have no place with annotations). This
shouldn't hurt things (FLW).
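
For readers unfamiliar with the idea of abstract (unimplemented) methods
on an interface, here is a minimal sketch in plain Perl. It is not
BioPerl code: the packages My::FeatureI and My::Feature and the helper
_not_implemented are hypothetical stand-ins, and the tag storage is
assumed to be a simple hash of array refs. It only shows the pattern of
an interface that declares tag methods and fails loudly when an
implementation has not overridden them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical interface: declares the tag methods but does not implement them.
package My::FeatureI;

sub _not_implemented {
    my ($self) = @_;
    my $method = (caller(1))[3];    # fully qualified name of the abstract method
    die "Abstract method $method is not implemented by " . ref($self) . "\n";
}

sub has_tag        { $_[0]->_not_implemented }
sub get_tag_values { $_[0]->_not_implemented }
sub get_all_tags   { $_[0]->_not_implemented }

# Hypothetical concrete class: keeps tags in a hash of array refs.
package My::Feature;
our @ISA = ('My::FeatureI');

sub new {
    my ($class, %tags) = @_;
    return bless { tags => {%tags} }, $class;
}

sub has_tag {
    my ($self, $tag) = @_;
    return exists $self->{tags}{$tag} ? 1 : 0;
}

sub get_tag_values {
    my ($self, $tag) = @_;
    die "no such tag '$tag'\n" unless $self->has_tag($tag);
    return @{ $self->{tags}{$tag} };    # a list: call it in list context
}

# get_all_tags is deliberately not overridden, to show the failure mode.

package main;

my $feat = My::Feature->new(locus_tag => ['b0001']);

# List context gives the values; scalar context would give their count.
my ($locus) = $feat->get_tag_values('locus_tag');
print "locus_tag: $locus\n";

# Calling a method the class never implemented dies with a clear message.
my $ok = eval { $feat->get_all_tags; 1 };
print "get_all_tags failed: $@" unless $ok;
```

Whether the interface should die, warn, or silently return nothing for an
unimplemented method is a design choice; the sketch picks die so that the
gap is visible at the call site.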

Not sure if we really need to add AnnotatableI to SeqFeatureI; I think
that would be up to the SeqFeatureI implementation, and it doesn't
hurt leaving it out.

chris

On Mar 27, 2009, at 1:14 PM, Hilmar Lapp wrote:

> $feature->annotation() is a legitimate method call (it implements
> AnnotatableI).
>
> SeqFeature::Generic indeed has two mechanisms to store annotation,
> the tag system and the annotation collection. This is because it
> inherits from SeqFeatureI (which brings in the tag/value annotation)
> and from AnnotatableI (which brings in annotation()).
>
> I agree this can be confusing from a user's perspective. As a rule
> of thumb, SeqIO parsers will almost universally populate only the
> tag/value system, because typically they will (or should) assume no
> more than that the feature object they are dealing with is a
> SeqFeatureI.
>
> Once you have the feature objects in your hands, you can add to
> either tag/values or annotation() to your heart's content. Just be
> aware that nearly all SeqIO writers won't use the annotation()
> collection when you pass the sequence back to them, since typically
> they won't really know what to do with feature annotation that isn't
> tag/value (unlike sequence annotation).
>
> If in your code you want to treat tag/value annotation in the same
> way as (i.e., as if it were part of) the annotation that's in the
> annotation collection, then use SeqFeature::AnnotationAdaptor. That's
> in fact what Bioperl-db does to ensure that all annotation gets
> serialized to the database no matter where it is.
>
> Hth,
>
> -hilmar

From cjfields at illinois.edu Fri Mar 27 15:15:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Mar 2009 14:15:48 -0500
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk>
	<9ED3500FE5524639887C2E762053628B@NewLife>
	<1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk>
	<1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>
	<6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>
	<8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu>
Message-ID:

On Mar 27, 2009, at 2:02 PM, Chris Fields wrote:

> Right now SeqFeatureI doesn't inherit AnnotatableI, even though the
> note about the deprecated *_tag_* methods implies that it does.
> We should probably add back the abstract (unimplemented) tag methods
> to Bio::SeqFeatureI and either remove them from Bio::AnnotatableI or
> activate the deprecation (they have no place with annotations).
> This shouldn't hurt things (FLW).

...and just noticed they've been changed already in 1.6 (insert foot
into mouth). I'll remove the extraneous deprecation notices at the
bottom of SeqFeatureI; they're no longer relevant.

-c

From maj at fortinbras.us Fri Mar 27 15:14:06 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 27 Mar 2009 15:14:06 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk>
	<9ED3500FE5524639887C2E762053628B@NewLife>
	<1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk>
	<1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>
	<6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>
	<8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu>
Message-ID: <2EFAC0364FC54158AA8002F785393370@NewLife>

So, is it right that the inheritance of AnnotatableI by
SeqFeature::Generic was essentially for backward-compatibility?
MAJ

----- Original Message -----
From: "Chris Fields"
To: "Hilmar Lapp"
Cc: "Govind Chandra"; "Mark A. Jensen"
Sent: Friday, March 27, 2009 3:02 PM
Subject: Re: [Bioperl-l] Bio::AnnotatableI function annotation()

> Right now SeqFeatureI doesn't inherit AnnotatableI, even though the
> note about the deprecated *_tag_* methods implies that it does. We
> should probably add back the abstract (unimplemented) tag methods to
> Bio::SeqFeatureI and either remove them from Bio::AnnotatableI or
> activate the deprecation (they have no place with annotations). This
> shouldn't hurt things (FLW).
>
> Not sure if we really need to add AnnotatableI to SeqFeatureI; I
> think that would be up to the SeqFeatureI implementation, and it
> doesn't hurt leaving it out.
>
> chris

From cjfields at illinois.edu Fri Mar 27 15:34:41 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Mar 2009 14:34:41 -0500
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <2EFAC0364FC54158AA8002F785393370@NewLife>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk>
	<9ED3500FE5524639887C2E762053628B@NewLife>
	<1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk>
	<1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>
	<6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>
	<8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu>
	<2EFAC0364FC54158AA8002F785393370@NewLife>
Message-ID: <7AAA2905-E74C-46FA-8C3C-310A22215E4D@illinois.edu>

No, it's been that way for quite a while now.
From the 1.2 branch (6 yrs old) for Bio::SeqFeature::Generic:

    @ISA = qw(Bio::Root::Root Bio::SeqFeatureI
              Bio::AnnotatableI Bio::FeatureHolderI);

I think there is still some contention about what the difference is
between a Feature and an Annotation. Our basic assumption is a Feature
describes a specific section of the sequence while Annotation
describes the full sequence. However, we do have Annotation
implementations that can describe parts of the sequence
(Bio::Annotation::Reference, Bio::Annotation::Target) and Features
that describe the entire sequence (the GenBank/EMBL 'source' feature).
This is probably a good compromise.

Personally, I think all sequences should be RangeI (implement
start/end/strand) and have SeqFeature::Collections, that SeqFeatureI
should be genericized to FeatureI to allow descriptives of any range
(sequence, alignment, array, whatever), and that most object creation
should be on the fly vs. up-front. Oh if I only had the tuits...

chris

On Mar 27, 2009, at 2:14 PM, Mark A. Jensen wrote:

> So, is it right that the inheritance of AnnotatableI by
> SeqFeature::Generic was essentially for backward-compatibility?
> MAJ

From cjfields at illinois.edu Fri Mar 27 15:49:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Mar 2009 14:49:26 -0500
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <7631BB4864FF4EFB9DB56357AC5C06F6@NewLife>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk>
	<9ED3500FE5524639887C2E762053628B@NewLife>
	<1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk>
	<1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk>
	<6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net>
	<7631BB4864FF4EFB9DB56357AC5C06F6@NewLife>
Message-ID:

What exactly would the hint be?
I'm not sure, but I don't think there is any documentation indicating that one should use anything other than tag methods to retrieve generic data. We *could* possibly add in code to check the feature's Bio::Annotation::Collection for the same tag name, then add any returned AnnotationI::display_text to the array of returned values. This all depends on: 1) whether you want to mix your 'peanut butter' with your 'chocolate' or keep them separate (amazingly, some people don't like Reese's peanut butter cups), and 2) whether you want to automatically check for the empty collection each time (the Collection is lazily created on the fly if one isn't supplied, so it may slow things down by creating the instance for every has_tag check). chris On Mar 27, 2009, at 1:29 PM, Mark A. Jensen wrote: > Thanks Hilmar-- so there isn't really a bug, but would it > be useful if the object warned a user who attempts to access an > empty $feature->annotation with a hint encapsulating > your discussion below? > MAJ > ----- Original Message ----- From: "Hilmar Lapp" > To: "Govind Chandra" > Cc: "Mark A. Jensen" ; > > Sent: Friday, March 27, 2009 2:14 PM > Subject: Re: [Bioperl-l] Bio::AnnotatableI function annotation() > > >> $feature->annotation() is a legitimate method call (it implements >> AnnotatableI). >> SeqFeature::Generic has indeed two mechanism to store annotation, >> the tag system and the annotation collection. This is because it >> inherits from SeqFeatureI (which brings in the tag/value >> annotation) and from AnnotatableI (which brings in annotation()). >> I agree this can be confusing from a user's perspective. As a rule >> of thumb, SeqIO parsers will almost universally populate only the >> tag/ value system, because typically they will (or should) assume >> not more than that the feature object they are dealing with is a >> SeqFeatureI. >> Once you have the feature objects in your hands, you can add to >> either tag/values or annotation() to your heart's content. 
Just be >> aware that nearly all SeqIO writers won't use the annotation() >> collection when you pass the sequence back to them since typically >> they won't really know what to do with feature annotation that >> isn't tag/value (unlike as for sequence annotation). >> If in your code you want to treat tag/value annotation in the same >> way as (i.e., as if it were part of) the annotation that's in the >> annotation collection then use SeqFeature::AnnotationAdaptor. >> That's in fact what Bioperl-db does to ensure that all annotation >> gets serialized to the database no matter where it is. >> Hth, >> -hilmar >> On Mar 27, 2009, at 1:44 PM, Govind Chandra wrote: >>> Hi Mark, >>> Will it be unfair to say that the documentation as well as the >>> implementation are confusing. SeqFeature::Generic should cause an >>> error >>> when annotation() is called on it if it cannot do the right thing. >>> For >>> the time being I will stick with the old ways (has_tag etc.). Good >>> to >>> know they are not deprecated in the way I intend to use them (via >>> SeqFeature::Generic). >>> Cheers >>> Govind >>> >>> >>> >>> On Fri, 2009-03-27 at 13:30 -0400, Mark A. Jensen wrote: >>>> Hey Govind-- >>>> You're right-- SeqFeature::Generic object inherits from >>>> AnnotatableI-- but the *_tags_* methods are now >>>> SeqFeature::Generic methods--ie, you can use these >>>> on features, and they are no longer hitting AnnotableI. >>>> It appears that the feature's AnnotationCollection doesn't >>>> even get loaded now. >>>> [developer out there like to chime in?] >>>> cheers, >>>> Mark >>>> ----- Original Message ----- >>>> From: "Govind Chandra" >>>> To: "Mark A. Jensen" >>>> Cc: >>>> Sent: Friday, March 27, 2009 1:09 PM >>>> Subject: [Bioperl-l] Bio::AnnotatableI function annotation() >>>> >>>> >>>>> Thanks Mark, >>>>> >>>>> Sorry for not putting a proper subject in the last post. >>>>> >>>>> What you suggest is what I have been doing for a long time. 
I >>>>> am just >>>>> trying to alter my code to conform to the latest bioperl >>>>> version and ran >>>>> into this issue. I could be wrong (I am more a user rather than >>>>> writer >>>>> of modules) but since $feature->annotation() does not result in >>>>> an error >>>>> I think $feature is-a Bio::AnnotatableI as well. >>>>> >>>>> Cheers >>>>> >>>>> Govind >>>>> >>>>> >>>>> >>>>> On Fri, 2009-03-27 at 12:17 -0400, Mark A. Jensen wrote: >>>>>> Hi Govind- >>>>>> >>>>>> As near as I can tell, the *_tags methods are deprecated for >>>>>> Bio::AnnotatableI objects, but these methods are available >>>>>> off the SeqFeatureI objects themselves: i.e., rather than >>>>>> >>>>>>> $ac=$feature->annotation(); >>>>>>> $temp1=$ac->get_Annotations("locus_tag"); >>>>>> >>>>>> do >>>>>> >>>>>> $temp1 = $feature->get_tag_values("locus_tag"); >>>>>> >>>>>> directly. >>>>>> >>>>>> hope it helps - >>>>>> Mark >>>>>> >>>>>> ----- Original Message ----- >>>>>> From: "Govind Chandra" >>>>>> To: >>>>>> Sent: Friday, March 27, 2009 11:26 AM >>>>>> Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 >>>>>> >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> The code below >>>>>>> >>>>>>> >>>>>>> ====== code begins ======= >>>>>>> #use strict; >>>>>>> use Bio::SeqIO; >>>>>>> >>>>>>> $infile='NC_000913.gbk'; >>>>>>> my $seqio=Bio::SeqIO->new(-file => $infile); >>>>>>> my $seqobj=$seqio->next_seq(); >>>>>>> my @features=$seqobj->all_SeqFeatures(); >>>>>>> my $count=0; >>>>>>> foreach my $feature (@features) { >>>>>>> unless($feature->primary_tag() eq 'CDS') {next;} >>>>>>> print($feature->start()," ", $feature->end(), " >>>>>>> ",$feature->strand(),"\n"); >>>>>>> $ac=$feature->annotation(); >>>>>>> $temp1=$ac->get_Annotations("locus_tag"); >>>>>>> @temp2=$ac->get_Annotations(); >>>>>>> print("$temp1 $temp2[0] @temp2\n"); >>>>>>> if($count++ > 5) {last;} >>>>>>> } >>>>>>> >>>>>>> print(ref($ac),"\n"); >>>>>>> exit; >>>>>>> >>>>>>> ======= code ends ======== >>>>>>> >>>>>>> produces the output >>>>>>> 
>>>>>>> ========== output begins ======== >>>>>>> >>>>>>> 190 255 1 >>>>>>> 0 >>>>>>> 337 2799 1 >>>>>>> 0 >>>>>>> 2801 3733 1 >>>>>>> 0 >>>>>>> 3734 5020 1 >>>>>>> 0 >>>>>>> 5234 5530 1 >>>>>>> 0 >>>>>>> 5683 6459 -1 >>>>>>> 0 >>>>>>> 6529 7959 -1 >>>>>>> 0 >>>>>>> Bio::Annotation::Collection >>>>>>> >>>>>>> =========== output ends ========== >>>>>>> >>>>>>> $ac is-a Bio::Annotation::Collection but does not actually >>>>>>> contain any >>>>>>> annotation from the feature. Is this how it should be? I >>>>>>> cannot figure >>>>>>> out what is wrong with the script. Earlier I used to use >>>>>>> has_tag(), >>>>>>> get_tag_values() etc. but the documentation says these are >>>>>>> deprecated. >>>>>>> >>>>>>> Perl is 5.8.8. BioPerl version is 1.6 (installed today). >>>>>>> Output of uname >>>>>>> -a is >>>>>>> >>>>>>> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT >>>>>>> 2008 >>>>>>> x86_64 x86_64 x86_64 GNU/Linux >>>>>>> >>>>>>> Thanks in advance for any help. >>>>>>> >>>>>>> Govind >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at 
illinois.edu Fri Mar 27 16:25:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Mar 2009 15:25:10 -0500
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To:
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk> <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net> <7631BB4864FF4EFB9DB56357AC5C06F6@NewLife>
Message-ID:

On Mar 27, 2009, at 3:09 PM, Mark A. Jensen wrote:

> Probably folks just starting to use 1.6 will be shunted in the
> right direction by the docs, but those making the switch
> might reasonably get confused (resulting in this thread, e.g.).
> If someone tries to do $feature->annotation->getAnnotations('locus_tag')
> when $feature->annotation is undef, then probably that user is
> not hip to $feature->get_tag_values('locus_tag'), and would appreciate
> something like
> "Annotation property undefined. Did you mean $feature->get_tag_values?"
> (not nec. that specific, but you get my drift?)
> MAJ

Yes, but the result of $feature->annotation is never undef (the collection is created on the fly):

perl -MBio::SeqFeature::Generic -e 'my $sf = Bio::SeqFeature::Generic->new(-start => 1, -end => 100); print ref($sf->annotation)."\n"'
Bio::Annotation::Collection

We can't add a warning for lazy instantiation of the Collection w/o running into significant issues, as it'll pop up with code like this:

# don't create the annotation or the collection unless needed
if (defined $val) {
    $seq->annotation->add_Annotation(
        Bio::Annotation::SimpleValue->new(-tagname => 'foo', -value => $val)
    );
}

-c

From maj at fortinbras.us Fri Mar 27 16:33:19 2009
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Fri, 27 Mar 2009 16:33:19 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To:
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk> <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net> <7631BB4864FF4EFB9DB56357AC5C06F6@NewLife>
Message-ID:

Ahh....thanks. Hmm.

From maj at fortinbras.us Fri Mar 27 16:09:22 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 27 Mar 2009 16:09:22 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To:
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk> <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net> <7631BB4864FF4EFB9DB56357AC5C06F6@NewLife>
Message-ID:

Probably folks just starting to use 1.6 will be shunted in the right direction by the docs, but those making the switch might reasonably get confused (resulting in this thread, e.g.). If someone tries to do $feature->annotation->getAnnotations('locus_tag') when $feature->annotation is undef, then probably that user is not hip to $feature->get_tag_values('locus_tag'), and would appreciate something like "Annotation property undefined. Did you mean $feature->get_tag_values?" (not nec. that specific, but you get my drift?)
MAJ

----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc: "Hilmar Lapp" ; "Govind Chandra" ;
Sent: Friday, March 27, 2009 3:49 PM
Subject: Re: [Bioperl-l] Bio::AnnotatableI function annotation()

> What exactly would the hint be?
I'm not sure, but I don't think there is any > documentation indicating that one should use anything other than tag methods > to retrieve generic data. We *could* possibly add in code to check the > feature's Bio::Annotation::Collection for the same tag name, then add any > returned AnnotationI::display_text to the array of returned values. > > This all depends on: > > 1) whether you want to mix your 'peanut butter' with your 'chocolate' or keep > them separate (amazingly, some people don't like Reese's peanut butter cups), > and > 2) whether you want to automatically check for the empty collection each time > (the Collection is lazily created on the fly if one isn't supplied, so it may > slow things down by creating the instance for every has_tag check). > > chris > > On Mar 27, 2009, at 1:29 PM, Mark A. Jensen wrote: > >> Thanks Hilmar-- so there isn't really a bug, but would it >> be useful if the object warned a user who attempts to access an empty >> $feature->annotation with a hint encapsulating >> your discussion below? >> MAJ >> ----- Original Message ----- From: "Hilmar Lapp" >> To: "Govind Chandra" >> Cc: "Mark A. Jensen" ; > > >> Sent: Friday, March 27, 2009 2:14 PM >> Subject: Re: [Bioperl-l] Bio::AnnotatableI function annotation() >> >> >>> $feature->annotation() is a legitimate method call (it implements >>> AnnotatableI). >>> SeqFeature::Generic has indeed two mechanism to store annotation, the tag >>> system and the annotation collection. This is because it inherits from >>> SeqFeatureI (which brings in the tag/value annotation) and from >>> AnnotatableI (which brings in annotation()). >>> I agree this can be confusing from a user's perspective. As a rule of >>> thumb, SeqIO parsers will almost universally populate only the tag/ value >>> system, because typically they will (or should) assume not more than that >>> the feature object they are dealing with is a SeqFeatureI. 
>>> Once you have the feature objects in your hands, you can add to either >>> tag/values or annotation() to your heart's content. Just be aware that >>> nearly all SeqIO writers won't use the annotation() collection when you >>> pass the sequence back to them since typically they won't really know what >>> to do with feature annotation that isn't tag/value (unlike as for sequence >>> annotation). >>> If in your code you want to treat tag/value annotation in the same way as >>> (i.e., as if it were part of) the annotation that's in the annotation >>> collection then use SeqFeature::AnnotationAdaptor. That's in fact what >>> Bioperl-db does to ensure that all annotation gets serialized to the >>> database no matter where it is. >>> Hth, >>> -hilmar >>> On Mar 27, 2009, at 1:44 PM, Govind Chandra wrote: >>>> Hi Mark, >>>> Will it be unfair to say that the documentation as well as the >>>> implementation are confusing. SeqFeature::Generic should cause an error >>>> when annotation() is called on it if it cannot do the right thing. For >>>> the time being I will stick with the old ways (has_tag etc.). Good to >>>> know they are not deprecated in the way I intend to use them (via >>>> SeqFeature::Generic). >>>> Cheers >>>> Govind >>>> >>>> >>>> >>>> On Fri, 2009-03-27 at 13:30 -0400, Mark A. Jensen wrote: >>>>> Hey Govind-- >>>>> You're right-- SeqFeature::Generic object inherits from >>>>> AnnotatableI-- but the *_tags_* methods are now >>>>> SeqFeature::Generic methods--ie, you can use these >>>>> on features, and they are no longer hitting AnnotableI. >>>>> It appears that the feature's AnnotationCollection doesn't >>>>> even get loaded now. >>>>> [developer out there like to chime in?] >>>>> cheers, >>>>> Mark >>>>> ----- Original Message ----- >>>>> From: "Govind Chandra" >>>>> To: "Mark A. 
Jensen" >>>>> Cc: >>>>> Sent: Friday, March 27, 2009 1:09 PM >>>>> Subject: [Bioperl-l] Bio::AnnotatableI function annotation() >>>>> >>>>> >>>>>> Thanks Mark, >>>>>> >>>>>> Sorry for not putting a proper subject in the last post. >>>>>> >>>>>> What you suggest is what I have been doing for a long time. I am just >>>>>> trying to alter my code to conform to the latest bioperl version and >>>>>> ran >>>>>> into this issue. I could be wrong (I am more a user rather than writer >>>>>> of modules) but since $feature->annotation() does not result in an >>>>>> error >>>>>> I think $feature is-a Bio::AnnotatableI as well. >>>>>> >>>>>> Cheers >>>>>> >>>>>> Govind >>>>>> >>>>>> >>>>>> >>>>>> On Fri, 2009-03-27 at 12:17 -0400, Mark A. Jensen wrote: >>>>>>> Hi Govind- >>>>>>> >>>>>>> As near as I can tell, the *_tags methods are deprecated for >>>>>>> Bio::AnnotatableI objects, but these methods are available >>>>>>> off the SeqFeatureI objects themselves: i.e., rather than >>>>>>> >>>>>>>> $ac=$feature->annotation(); >>>>>>>> $temp1=$ac->get_Annotations("locus_tag"); >>>>>>> >>>>>>> do >>>>>>> >>>>>>> $temp1 = $feature->get_tag_values("locus_tag"); >>>>>>> >>>>>>> directly. 
>>>>>>> >>>>>>> hope it helps - >>>>>>> Mark >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>> From: "Govind Chandra" >>>>>>> To: >>>>>>> Sent: Friday, March 27, 2009 11:26 AM >>>>>>> Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 >>>>>>> >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> The code below >>>>>>>> >>>>>>>> >>>>>>>> ====== code begins ======= >>>>>>>> #use strict; >>>>>>>> use Bio::SeqIO; >>>>>>>> >>>>>>>> $infile='NC_000913.gbk'; >>>>>>>> my $seqio=Bio::SeqIO->new(-file => $infile); >>>>>>>> my $seqobj=$seqio->next_seq(); >>>>>>>> my @features=$seqobj->all_SeqFeatures(); >>>>>>>> my $count=0; >>>>>>>> foreach my $feature (@features) { >>>>>>>> unless($feature->primary_tag() eq 'CDS') {next;} >>>>>>>> print($feature->start()," ", $feature->end(), " >>>>>>>> ",$feature->strand(),"\n"); >>>>>>>> $ac=$feature->annotation(); >>>>>>>> $temp1=$ac->get_Annotations("locus_tag"); >>>>>>>> @temp2=$ac->get_Annotations(); >>>>>>>> print("$temp1 $temp2[0] @temp2\n"); >>>>>>>> if($count++ > 5) {last;} >>>>>>>> } >>>>>>>> >>>>>>>> print(ref($ac),"\n"); >>>>>>>> exit; >>>>>>>> >>>>>>>> ======= code ends ======== >>>>>>>> >>>>>>>> produces the output >>>>>>>> >>>>>>>> ========== output begins ======== >>>>>>>> >>>>>>>> 190 255 1 >>>>>>>> 0 >>>>>>>> 337 2799 1 >>>>>>>> 0 >>>>>>>> 2801 3733 1 >>>>>>>> 0 >>>>>>>> 3734 5020 1 >>>>>>>> 0 >>>>>>>> 5234 5530 1 >>>>>>>> 0 >>>>>>>> 5683 6459 -1 >>>>>>>> 0 >>>>>>>> 6529 7959 -1 >>>>>>>> 0 >>>>>>>> Bio::Annotation::Collection >>>>>>>> >>>>>>>> =========== output ends ========== >>>>>>>> >>>>>>>> $ac is-a Bio::Annotation::Collection but does not actually contain >>>>>>>> any >>>>>>>> annotation from the feature. Is this how it should be? I cannot >>>>>>>> figure >>>>>>>> out what is wrong with the script. Earlier I used to use has_tag(), >>>>>>>> get_tag_values() etc. but the documentation says these are >>>>>>>> deprecated. >>>>>>>> >>>>>>>> Perl is 5.8.8. BioPerl version is 1.6 (installed today). 
Output of
>>>>>>>> uname -a is
>>>>>>>>
>>>>>>>> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008
>>>>>>>> x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>>
>>>>>>>> Thanks in advance for any help.
>>>>>>>>
>>>>>>>> Govind
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>> ===========================================================

From maj at fortinbras.us Sat Mar 28 01:08:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 28 Mar 2009 01:08:25 -0400
Subject: [Bioperl-l] bioperl-dev goes live!
Message-ID: <6B21218784804D1CAF7B25DB6C43A9C5@NewLife>

Hi All-

I am pleased to announce the maiden voyage of bioperl-dev. I have put up a stubby distribution skeleton under bioperl-dev/trunk in the Subversion repository. I will let you visit it for the details, but--

Some highlights:
- the HEAD revision of Bio/Root/* is present in full, as is
- the HEAD revision of t/lib/*, and
- the README that I reproduce below

The idea behind bioperl-dev, as I understand from Chris, is to provide a sort of sandbox for experimental code. Adventuresome users should feel free to play with the code there, but not expect much in the way of support, bug fixes, and the like. There be dragons there. When a bioperl-dev module graduates to the core, then the usual support mechanisms kick in.

Devs please make yourselves comfy there, and modify the structure to suit. I believe it will be most useful (and easiest to integrate installations into working copies of the trunk) if the Bio/ subtree mimics the trunk namespace (with respect to existing modules) as much as possible. And if I'm really off-base someplace, please fix it and/or let me know.

Cheers,
Mark

README:

$Id: README 15616 2009-03-28 04:49:43Z maj $

o Version

  This is bioperl-dev version 1.6.9, a developer release.

o Description

  bioperl-dev contains experimental modules intended to expand the
  Bioperl envelope. New ideas for future point and stable releases
  are being explored here. Interested users are encouraged to
  give these a try, keeping in mind the following points:

  o the modules here will likely depend on the current HEAD
    revision of Bioperl (bioperl-live/trunk); a release version
    may not suffice;

  o documentation is likely to be spotty at best;

  o the code should be considered unsupported, though a polite email
    to the dev is likely to elicit a positive response;

  o the code should not be considered "production quality"; when this
    level is reached, bioperl-dev modules will graduate to the
    core or the appropriate specialty package.

  See the Changes file for more information about what is contained in
  here.

o Installation

  See the accompanying INSTALL file for details on installing
  bioperl-dev

o Feedback

  Write down any problems or praise and send them to
  bioperl-l at bioperl.org ;-)

From cjfields at illinois.edu Sat Mar 28 13:54:04 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 28 Mar 2009 12:54:04 -0500
Subject: [Bioperl-l] bioperl-dev goes live!
In-Reply-To: <6B21218784804D1CAF7B25DB6C43A9C5@NewLife>
References: <6B21218784804D1CAF7B25DB6C43A9C5@NewLife>
Message-ID:

We should be requiring a live core installation for dev, so we probably don't need to have the Bio::Root stuff unless we are reconfiguring those files.

chris

On Mar 28, 2009, at 12:08 AM, Mark A. Jensen wrote:

> Some highlights:
> - the HEAD revision of Bio/Root/* is present in full, as is
> - the HEAD revision of t/lib/*, and
> - the README that I reproduce below

From hlapp at gmx.net Sat Mar 28 15:19:01 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 28 Mar 2009 15:19:01 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <7AAA2905-E74C-46FA-8C3C-310A22215E4D@illinois.edu>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk> <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net> <8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu> <2EFAC0364FC54158AA8002F785393370@NewLife> <7AAA2905-E74C-46FA-8C3C-310A22215E4D@illinois.edu>
Message-ID: <27DB2E5A-2C36-4776-B6BB-CA86678AA7D8@gmx.net>

On Mar 27, 2009, at 3:34 PM, Chris Fields wrote:

> I think there is still some contention about what the difference is
> between a Feature and an Annotation.
A feature is (and in the BioPerl-way of looking at the world, should be) locatable, whereas annotation is not.

> Our basic assumption is a Feature describes a specific section of
> the sequence while Annotation describes the full sequence.

It depends on whether a piece of annotation is associated with the full sequence (i.e., the sequence object itself, or a feature object that happens to extend over the full length of the sequence), or with a feature at a specific location.

In principle, an annotation object is something that you would like to attach to an annotatable object (such as a feature). The reason you want to attach it (and any semantics implied by that reason) is in the tag. (Which is why there are in principle good reasons to have that tag come from an ontology.)

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Mar 28 15:24:29 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 28 Mar 2009 15:24:29 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk> <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net> <8F6A9BF2-555F-4435-B471-7173FB0124AB@illinois.edu>
Message-ID:

On Mar 27, 2009, at 3:02 PM, Chris Fields wrote:

> Not sure if we really need to add AnnotatableI to SeqFeatureI; I
> think that would be up to the SeqFeatureI implementation, and it
> doesn't hurt leaving it out.

I still think we should leave it out. It should be left to implementors to decide whether they need both or not, and the SeqFeatureI contract shouldn't dictate it.
Or quoting from another of your emails:

> 1) whether you want to mix your 'peanut butter' with your
> 'chocolate' or keep them separate (amazingly, some people don't like
> Reese's peanut butter cups),

I like them, but they are so heavy that I can't eat a lot of them. There should be a possibility to have SeqFeatureI's that have chocolate but not peanut butter so you can have more of them.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Mar 28 15:35:07 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 28 Mar 2009 15:35:07 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To:
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <1238175871.20064.41.camel@jic51958.jic.bbsrc.ac.uk> <6B96D0E7-248A-45F5-87D3-7585F27A62D3@gmx.net> <7631BB4864FF4EFB9DB56357AC5C06F6@NewLife>
Message-ID: <7B726727-EBE8-483A-9337-79AEE2378289@gmx.net>

As I said earlier, it's worth keeping in mind that, unless a parser's documentation expressly says otherwise, a sequence you get back from one of the SeqIO parsers (which is where most people will get them from) will *not* have $feature->annotation() populated.

And as I said, if you don't want to care where to look for the annotation, use SeqFeature::AnnotationAdaptor:

# one adaptor, re-pointed at each feature in turn
my $anncoll = Bio::SeqFeature::AnnotationAdaptor->new();
foreach my $feat ($seq->get_all_SeqFeatures) {
    $anncoll->feature($feat);
    my @vals = $anncoll->get_Annotations('locus_tag');
    # do something with @vals
}

-hilmar

On Mar 27, 2009, at 4:09 PM, Mark A. Jensen wrote:

> Probably folks just starting to use 1.6 will be shunted in the
> right direction by the docs, but those making the switch
> might reasonably get confused (resulting in this thread, e.g.).
> If someone tries to do $feature->annotation->getAnnotations('locus_tag')
> when $feature->annotation is undef, then probably that user is
> not hip to $feature->get_tag_values('locus_tag'), and would appreciate
> something like
> "Annotation property undefined. Did you mean $feature->get_tag_values?"
> (not nec. that specific, but you get my drift?)
> MAJ

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Sat Mar 28 23:24:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 28 Mar 2009 23:24:10 -0400
Subject: [Bioperl-l] bioperl-dev goes live!
In-Reply-To:
References: <6B21218784804D1CAF7B25DB6C43A9C5@NewLife>
Message-ID:

Certainly Build and Test should be there, right?
Yes, we could, probably should leave out the rest until they are being actively worked on, since any work should start from the latest live version, I suppose. dev would be a good place to play with bequeath/bequest; perhaps it should stay there, since I'd like to start exploring that. re: building bioperl-dev-- since people will essentially be working on "private" extensions or enhancements, the build process shouldn't install everything by default, but should be driven more by user choices than, say, the core installation is. Is this a fair assessment? ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: Sent: Saturday, March 28, 2009 1:54 PM Subject: Re: [Bioperl-l] bioperl-dev goes live! > We should be requiring a live core installation for dev, so we > probably don't need to have the Bio::Root stuff unless we are > reconfiguring those files. > > chris > > On Mar 28, 2009, at 12:08 AM, Mark A. Jensen wrote: > >> Hi All- >> >> I pleased to announce the maiden voyage of bioperl-dev. I have >> put up a stubby distribution skeleton under bioperl-dev/trunk in >> the Subversion repository. I will let you visit it for the details, >> but-- >> >> Some highlights: >> - the HEAD revision of Bio/Root/* is present in full, as is >> - the HEAD revision of t/lib/*, and >> - the README that I reproduce below >> >> The idea behind bioperl-dev, as I understand from >> Chris, is to provide a sort of sandbox for experimental >> code. Adventuresome users should feel free to play with >> the code there, but not expect much in the way of support, >> bug fixes, and the like. There be dragons there. When a >> bioperl-dev module graduates to the core, then the usual >> support mechanisms kick in. >> >> Devs please make yourselves comfy there, and modify >> the structure to suit. 
I believe it will be most useful (and >> easiest to integrate installations into working copies of >> the trunk) if the Bio/ subtree mimics the trunk namespace >> (with respect to existing modules) as much as possible. >> And if I'm really off-base someplace, please fix it and/or >> let me know. >> >> Cheers, >> Mark >> >> README: >> >> $Id: README 15616 2009-03-28 04:49:43Z maj $ >> >> o Version >> >> This is bioperl-dev version 1.6.9, a developer release. >> >> o Description >> >> bioperl-dev contains experimental modules intended to expand the >> Bioperl envelope. New ideas for future point and stable releases >> are being explored here. Interested users are encouraged to >> give these a try, keeping in mind the following points: >> >> o the modules here will likely depend on the current HEAD >> revision of Bioperl (bioperl-live/trunk); a release version >> may not suffice; >> >> o documentation is likely to be spotty at best; >> >> o the code should be considered unsupported, though a polite email >> to the dev it likely to elicit a positive response; >> >> o the code should not be considered "production quality"; when this >> level is reached, bioperl-dev modules will graduate to the >> core or the appropriate specialty package. >> >> See the Changes file for more information about what is contained in >> here. 
>> >> o Installation >> >> See the accompanying INSTALL file for details on installing >> bioperl-dev >> >> o Feedback >> >> Write down any problems or praise and send them to >> bioperl-l at bioperl.org ;-) >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From pjotr.public21 at thebird.nl Sun Mar 29 06:32:58 2009 From: pjotr.public21 at thebird.nl (Pjotr Prins) Date: Sun, 29 Mar 2009 12:32:58 +0200 Subject: [Bioperl-l] Proposal for documentation generation Message-ID: <20090329103258.GA22114@thebird.nl> As you probably know I have been working on mapping microarray and sequencer IO libraries to Perl - see http://biolib.open-bio.org/. The idea is to write a mapping once and run on Perl, Python, Ruby, R, JAVA etc. I am turning to this list because I need to create API documentation for Perl (and the others) from the C/C++ code base. Maybe I am missing something, I would like to have your opinion. = API Documentation = Generating good API documentation for multiple languages is a problem. Ideally we would use the C/C++ code base to expose the interface for all mapped languages - if possible including generated example code for every language (!). SWIG has little support for that (there have been attempts in the past, but apparently dropped). Unfortunately the code SWIG generates does not lend itself to the native scripting language documentation generators. What SWIG can do is generate XML. The C function: int my_mod(int x, int y); gets output like the (simplified): Which allows transforming the function definitions into some other format. Likewise there are facilities for structs, classes etc. Still, this does not solve sharing function descriptions and/or examples. It would also be nice to have languages use the native documentation generators - as these are what users are comfortable with and allows Biolib inclusion into CPAN etc. 
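As a rough illustration of the "transform the function definition into some other format" step, here is a minimal Perl sketch. It is not SWIG or BioLib code: the helper name prototype_to_pod and the regex-based parsing of the C prototype are assumptions made purely for illustration, and only simple prototypes such as the my_mod example are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of SWIG or BioLib): turn a simple C
# prototype such as "int my_mod(int x, int y);" into a POD stub.
# Only plain "type name" parameters are handled; pointers, structs and
# other complex types are deliberately out of scope.
sub prototype_to_pod {
    my ($proto) = @_;

    # Capture return type, function name and the raw parameter list.
    my ($ret, $name, $args) =
        $proto =~ /^\s*(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;?\s*$/
        or die "Cannot parse prototype: $proto\n";

    # Split "int x, int y" into [type, name] pairs.
    my @params;
    for my $arg (split /\s*,\s*/, $args) {
        next if $arg eq '' or $arg eq 'void';
        my ($ptype, $pname) = $arg =~ /^\s*(\w+)\s+(\w+)\s*$/
            or die "Cannot parse parameter: $arg\n";
        push @params, [$ptype, $pname];
    }

    # Assemble the POD text: one Usage line, one Args line per parameter.
    my $usage = join ', ', map { '$' . $_->[1] } @params;
    my $pod   = "=head2 $name\n\n";
    $pod .= " Usage   : \$result = $name($usage);\n";
    $pod .= " Args    : $_->[1] ($_->[0])\n" for @params;
    $pod .= " Returns : $ret\n\n=cut\n";
    return $pod;
}

print prototype_to_pod('int my_mod(int x, int y);');
```

A real generator would more likely read the SWIG XML than re-parse C with a regex; the point here is only that the per-language output (POD in this case) can be produced from one shared description of the function.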
BioLib opts for a new generator, 'docigen' (the 'i' is for interpreted
languages), which can create Perl POD files, Python Pydoc, Ruby rdoc
etc., plus examples using doctests (for the untyped languages). The
Doxygen C/C++ documentation is linked in too (as this may include extra
information on a method or structure). This does away with handling
complex data types; Doxygen does a good job there.

The XML above can be a starting point for creating a list of methods
with parameters for every function. The input for docigen is a list of
methods and parameters, as well as descriptions, used variables, return
values and examples. A YAML-like interface could be:

  module: example
  # int my_mod(int x, int y);
  method:
    name: my_mod
    type: int
    parameters:
      - name: x
        type: int
      - name: y
        type: int
    description: "Calculate the modulo of x and y"
    examples:
      - line: "my_mod(8,7)"
        expects: 1

where examples may be a bit tricky for different languages. In case no
automatic translation is possible, those can be made explicit:

  examples:
    perl:
      - line: $result = test('filename')
      - line: $result->num
      - expects: 1

though this particular example is probably easy to generalize across
languages.

What do you think, is there an easier way to do this?

Pjotr

From hlapp at gmx.net Sun Mar 29 14:41:12 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 29 Mar 2009 14:41:12 -0400
Subject: [Bioperl-l] Reminder: Student application deadline for Summer of Code 2009
Message-ID:

*** Please disseminate widely to students at your institution.
*** *** Note: Among the ideas and mentors are BioPerl, BioRuby, and BioLib *** PHYLOINFORMATICS SUMMER OF CODE 2009 - STUDENT APPLICATION DEADLINE IS APRIL 3 http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 The Phyloinformatics Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for evolutionary informatics under the mentorship of experienced developers from around the world. The program is the participation of the US National Evolutionary Synthesis Center (NESCent) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/ ). Students in the program will receive a stipend from Google (and a T- shirt solely available to successful participants), and may work from their home, or home institution, for the duration of the 3 month program. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project. NESCent is particularly targeting students interested in both evolutionary biology and software development. Project ideas are listed on the website and range from hardware acceleration for phylogenetic inference, to support for phyloinformatics standards within the BioPerl and BioRuby toolkits, to alignment of next-gen sequencing data, to ontology term markup for biocuration, to semantic interoperability of web-services, to 3D-printing of phylogenies. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. We also welcome novel project ideas that dovetail with student interests. TO APPLY: Instructions are at the website (see "When you apply"). You can find GSoC program rules and eligibility requirements at http://socghop.appspot.com . ***The 12-day application period for students ends on Friday, April 3rd, 2009, at 19:00 UTC (3pm EDT, 12pm PDT).*** INQUIRIES: phylosoc {at} nescent {dot} org. 
We strongly encourage all interested students to get in touch with us
with their ideas as early as possible.

2009 NESCent Phyloinformatics Summer of Code:
http://hackathon.nescent.net/Phyloinformatics_Summer_of_Code_2009

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2009/faqs

Cyberinfrastructure Traineeships (managed separately from GSoC;
postdocs also eligible):
http://hackathon.nescent.org/Cyberinfrastructure_Summer_Traineeships_2009

To sign up for quarterly NESCent newsletters:
http://www.nescent.org/about/contact.php

---------
Todd Vision and Hilmar Lapp
National Evolutionary Synthesis Center
http://nescent.org

From torsten.seemann at infotech.monash.edu.au Sun Mar 29 20:25:48 2009
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 30 Mar 2009 11:25:48 +1100
Subject: [Bioperl-l] Question about parsing a gb file
In-Reply-To: <56be91b60903270609l11bd0edan75ab8d3f0e552d56@mail.gmail.com>
References: <56be91b60903270609l11bd0edan75ab8d3f0e552d56@mail.gmail.com>
Message-ID:

> Hi everybody,I have a little problem/question in parsing a genbank file.
> I've got a $s = Bio::Seq object to which I've added
> some Bio::SeqFeature::Generic, everything here seem to be ok since I can
> find all the properties of the $s setted correctly in my visual debugger;
> for instance, I can find the display_name properties of the SeqFeature in
> the $s object.
> Than I perform a print Bio::SeqIO->new(-format => 'genbank')->write_seq($s)
> to write down the genbank file but there I can't get any more some
> properties of the sequence, like the "display_name".
> What does it happens?

> my $s = $str->next_seq();
> my $f = Bio::SeqFeature::Generic->new(
>            -start        => 10,
>            -end          => 100,
>            -strand       => -1,
>            -primary      => 'CDS', # -primary_tag is a synonym
>            -source_tag   => 'repeatmasker',
>            -display_name => 'alu family'
>            
);
> $s->add_SeqFeature($f);
> print Bio::SeqIO->new(-format => 'genbank')->write_seq($s)

The logical conclusion is that the 'genbank' output format does not
store the -display_name attribute of a SeqFeature. If you look at the
output of your script you will see only this:

     CDS             complement(10..100)

You will have to add appropriate -tag => { name=>value, .... } to your
SeqFeature from the Genbank/EMBL feature table
http://www.ncbi.nlm.nih.gov/collab/FT/

In particular I think you want to do the following:

my $f = Bio::SeqFeature::Generic->new(
    -start   => 10,
    -end     => 100,
    -strand  => -1,
    -primary => 'CDS', # -primary_tag is a synonym
    -tag     => {
        product   => 'alu family',
        note      => 'repeatmasker',
        locus_tag => 'GENE00432',
        # etc
    }
);

Hope this helps,

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA

From maj at fortinbras.us Sun Mar 29 20:54:43 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 29 Mar 2009 20:54:43 -0400
Subject: [Bioperl-l] bioperl-dev goes live!
In-Reply-To:
References: <6B21218784804D1CAF7B25DB6C43A9C5@NewLife>
Message-ID: <56641591100446D99E08341CE37FB32F@NewLife>

[sorry, by "it should stay there" I meant "RootI should stay in bioperl-dev"]
-MAJ
----- Original Message -----
From: "Mark A. Jensen"
To: "Chris Fields"
Cc:
Sent: Saturday, March 28, 2009 11:24 PM
Subject: Re: [Bioperl-l] bioperl-dev goes live!

> Certainly Build and Test should be there, right?
> Yes, we could, probably should leave out the
> rest until they are being actively worked on, since
> any work should start from the latest live version,
> I suppose. dev would be a good place to play with
> bequeath/bequest; perhaps it should stay there,
> since I'd like to start exploring that.
> > re: building bioperl-dev-- since people will > essentially be working on "private" extensions > or enhancements, the build process shouldn't > install everything by default, but should be > driven more by user choices than, say, the > core installation is. Is this a fair assessment? > > ----- Original Message ----- > From: "Chris Fields" > To: "Mark A. Jensen" > Cc: > Sent: Saturday, March 28, 2009 1:54 PM > Subject: Re: [Bioperl-l] bioperl-dev goes live! > > >> We should be requiring a live core installation for dev, so we >> probably don't need to have the Bio::Root stuff unless we are >> reconfiguring those files. >> >> chris >> >> On Mar 28, 2009, at 12:08 AM, Mark A. Jensen wrote: >> >>> Hi All- >>> >>> I pleased to announce the maiden voyage of bioperl-dev. I have >>> put up a stubby distribution skeleton under bioperl-dev/trunk in >>> the Subversion repository. I will let you visit it for the details, >>> but-- >>> >>> Some highlights: >>> - the HEAD revision of Bio/Root/* is present in full, as is >>> - the HEAD revision of t/lib/*, and >>> - the README that I reproduce below >>> >>> The idea behind bioperl-dev, as I understand from >>> Chris, is to provide a sort of sandbox for experimental >>> code. Adventuresome users should feel free to play with >>> the code there, but not expect much in the way of support, >>> bug fixes, and the like. There be dragons there. When a >>> bioperl-dev module graduates to the core, then the usual >>> support mechanisms kick in. >>> >>> Devs please make yourselves comfy there, and modify >>> the structure to suit. I believe it will be most useful (and >>> easiest to integrate installations into working copies of >>> the trunk) if the Bio/ subtree mimics the trunk namespace >>> (with respect to existing modules) as much as possible. >>> And if I'm really off-base someplace, please fix it and/or >>> let me know. 
>>> >>> Cheers, >>> Mark >>> >>> README: >>> >>> $Id: README 15616 2009-03-28 04:49:43Z maj $ >>> >>> o Version >>> >>> This is bioperl-dev version 1.6.9, a developer release. >>> >>> o Description >>> >>> bioperl-dev contains experimental modules intended to expand the >>> Bioperl envelope. New ideas for future point and stable releases >>> are being explored here. Interested users are encouraged to >>> give these a try, keeping in mind the following points: >>> >>> o the modules here will likely depend on the current HEAD >>> revision of Bioperl (bioperl-live/trunk); a release version >>> may not suffice; >>> >>> o documentation is likely to be spotty at best; >>> >>> o the code should be considered unsupported, though a polite email >>> to the dev it likely to elicit a positive response; >>> >>> o the code should not be considered "production quality"; when this >>> level is reached, bioperl-dev modules will graduate to the >>> core or the appropriate specialty package. >>> >>> See the Changes file for more information about what is contained in >>> here. >>> >>> o Installation >>> >>> See the accompanying INSTALL file for details on installing >>> bioperl-dev >>> >>> o Feedback >>> >>> Write down any problems or praise and send them to >>> bioperl-l at bioperl.org ;-) >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Sun Mar 29 20:42:28 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Sun, 29 Mar 2009 20:42:28 -0400
Subject: [Bioperl-l] Question about parsing a gb file
In-Reply-To:
References: <56be91b60903270609l11bd0edan75ab8d3f0e552d56@mail.gmail.com>
Message-ID:

Paolo-
You also may get some insight by looking through the thread started
by Govind Chandra subsequent to this one, and see Chris and Hilmar's
informative comments there regarding SeqFeature and Annotation.
cheers
Mark
----- Original Message -----
From: "Torsten Seemann"
To: "Paolo Pavan"
Cc:
Sent: Sunday, March 29, 2009 8:25 PM
Subject: Re: [Bioperl-l] Question about parsing a gb file

> Hi everybody,I have a little problem/question in parsing a genbank file.
> I've got a $s = Bio::Seq object to which I've added
> some Bio::SeqFeature::Generic, everything here seem to be ok since I can
> find all the properties of the $s setted correctly in my visual debugger;
> for instance, I can find the display_name properties of the SeqFeature in
> the $s object.
> Than I perform a print Bio::SeqIO->new(-format => 'genbank')->write_seq($s)
> to write down the genbank file but there I can't get any more some
> properties of the sequence, like the "display_name".
> What does it happens?

> my $s = $str->next_seq();
> my $f = Bio::SeqFeature::Generic->new(
>            -start        => 10,
>            -end          => 100,
>            -strand       => -1,
>            -primary      => 'CDS', # -primary_tag is a synonym
>            -source_tag   => 'repeatmasker',
>            -display_name => 'alu family'
>            );
> $s->add_SeqFeature($f);
> print Bio::SeqIO->new(-format => 'genbank')->write_seq($s)

The logical conclusion is that the 'genbank' output format does not
store the -display_name attribute of a SeqFeature. If you look at the
output of your script you will see only this:

     CDS             complement(10..100)

You will have to add appropriate -tag => { name=>value, ....
} to your SeqFeature from the Genbank/EMBL feature table
http://www.ncbi.nlm.nih.gov/collab/FT/

In particular I think you want to do the following:

my $f = Bio::SeqFeature::Generic->new(
    -start   => 10,
    -end     => 100,
    -strand  => -1,
    -primary => 'CDS', # -primary_tag is a synonym
    -tag     => {
        product   => 'alu family',
        note      => 'repeatmasker',
        locus_tag => 'GENE00432',
        # etc
    }
);

Hope this helps,

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Sun Mar 29 23:11:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 29 Mar 2009 23:11:05 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk><9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk>
Message-ID: <9EA1352F984441D89D877131EA47E6AA@NewLife>

Hi all-

On the wiki, I've attempted to codify and add exegesis to the
ideas discussed in this thread. Have a look at

http://www.bioperl.org/wiki/Features_vs._Annotations

in the new 'Metadata' subcat of the Scrapbook
(http://www.bioperl.org/wiki/Category:Scrapbook)

(Thanks Govind for stimulating this discussion!)

cheers,
Mark

----- Original Message -----
From: "Govind Chandra"
To: "Mark A. Jensen"
Cc:
Sent: Friday, March 27, 2009 1:09 PM
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()

> Thanks Mark,
>
> Sorry for not putting a proper subject in the last post.
>
> What you suggest is what I have been doing for a long time. I am just
> trying to alter my code to conform to the latest bioperl version and ran
> into this issue.
I could be wrong (I am more a user rather than writer > of modules) but since $feature->annotation() does not result in an error > I think $feature is-a Bio::AnnotatableI as well. > > Cheers > > Govind > > > > On Fri, 2009-03-27 at 12:17 -0400, Mark A. Jensen wrote: >> Hi Govind- >> >> As near as I can tell, the *_tags methods are deprecated for >> Bio::AnnotatableI objects, but these methods are available >> off the SeqFeatureI objects themselves: i.e., rather than >> >> > $ac=$feature->annotation(); >> > $temp1=$ac->get_Annotations("locus_tag"); >> >> do >> >> $temp1 = $feature->get_tag_values("locus_tag"); >> >> directly. >> >> hope it helps - >> Mark >> >> ----- Original Message ----- >> From: "Govind Chandra" >> To: >> Sent: Friday, March 27, 2009 11:26 AM >> Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 >> >> >> > Hi, >> > >> > The code below >> > >> > >> > ====== code begins ======= >> > #use strict; >> > use Bio::SeqIO; >> > >> > $infile='NC_000913.gbk'; >> > my $seqio=Bio::SeqIO->new(-file => $infile); >> > my $seqobj=$seqio->next_seq(); >> > my @features=$seqobj->all_SeqFeatures(); >> > my $count=0; >> > foreach my $feature (@features) { >> > unless($feature->primary_tag() eq 'CDS') {next;} >> > print($feature->start()," ", $feature->end(), " >> > ",$feature->strand(),"\n"); >> > $ac=$feature->annotation(); >> > $temp1=$ac->get_Annotations("locus_tag"); >> > @temp2=$ac->get_Annotations(); >> > print("$temp1 $temp2[0] @temp2\n"); >> > if($count++ > 5) {last;} >> > } >> > >> > print(ref($ac),"\n"); >> > exit; >> > >> > ======= code ends ======== >> > >> > produces the output >> > >> > ========== output begins ======== >> > >> > 190 255 1 >> > 0 >> > 337 2799 1 >> > 0 >> > 2801 3733 1 >> > 0 >> > 3734 5020 1 >> > 0 >> > 5234 5530 1 >> > 0 >> > 5683 6459 -1 >> > 0 >> > 6529 7959 -1 >> > 0 >> > Bio::Annotation::Collection >> > >> > =========== output ends ========== >> > >> > $ac is-a Bio::Annotation::Collection but does not actually contain any 
>> > annotation from the feature. Is this how it should be? I cannot figure
>> > out what is wrong with the script. Earlier I used to use has_tag(),
>> > get_tag_values() etc. but the documentation says these are deprecated.
>> >
>> > Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname
>> > -a is
>> >
>> > Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008
>> > x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > Thanks in advance for any help.
>> >
>> > Govind
>> >
>> >
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>> >
>> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From clements at nescent.org Mon Mar 30 00:35:32 2009
From: clements at nescent.org (Dave Clements)
Date: Sun, 29 Mar 2009 21:35:32 -0700
Subject: [Bioperl-l] 2009 GMOD Summer Schools - Americas & Europe
In-Reply-To:
References:
Message-ID:

Hello all,

***The application deadline for both GMOD summer schools is April 6,
one week from now.***

GMOD Summer School - Americas will be held 16-19 July at the National
Evolutionary Synthesis Center (NESCent), in Durham, NC, USA. Student
tuition is free. See
http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas

GMOD Summer School - Europe will be held 3-6 August at the University
of Oxford, in Oxford, UK. This is a part of GMOD Europe 2009, which
includes the next GMOD Meeting. Student tuition is £95. See
http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe

Please contact the GMOD Help Desk (help at gmod.org) if you have
questions.

We hope to see you in Durham or Oxford,

Dave C.
On Mon, Mar 16, 2009 at 1:53 PM, Dave Clements wrote:
> We are now accepting applications for the 2009 GMOD Summer Schools:
>
> Americas, 16-19 July
>  - National Evolutionary Synthesis Center (NESCent), Durham, NC, USA
>  - Student tuition is free, thanks to NIH grant 1R01HG004483-01.
>  - http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas
>
> Europe, 3-6 August
>  - University of Oxford, Oxford, United Kingdom
>  - Part of GMOD Europe 2009, which includes the next GMOD Meeting
>  - Student tuition is £95
>  - http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe
>
> GMOD (http://gmod.org/) is a collection of interoperable open source
> software components for managing, visualizing, annotating and
> integrating biological, mostly genomic, data.  GMOD is also a
> community of developers and users dealing with similar challenges.
> GMOD is used in diverse contexts, with both emerging and established
> model organisms.
>
> GMOD Summer Schools (http://gmod.org/wiki/GMOD_Summer_School)
> introduce new GMOD users to the GMOD project and feature several days
> of hands-on training on how to install, configure and administer GMOD
> tools.
>
> The courses include training on several GMOD components:
>  * GBrowse - the widely used Generic Genome Browser
>  * Chado - a modular and extensible database schema for biological data
>  * Apollo - genome annotation editor
>  * BioMart - biological data warehouse system
>  * GBrowse_syn - a GBrowse based synteny viewer
>  * JBrowse - a brand new Web 2.0 genome browser
>  * Artemis-Chado Integration (Europe only)
>  * MAKER - Genome annotation pipeline (Americas only)
>  * Tripal - Web front end for Chado (Americas only)
>
> ***Please submit an application by the end of 6 April 2009, if you are
> interested in attending. ***
>
> Enrollment is limited to 25 students in each course.  If applications
> exceed capacity (and we expect they will) then applicants will be
> picked based on the strength of their application.
Applicants will be
> notified of their admission status in mid-April.
>
> Thanks,
>
> Dave Clements
> GMOD Help Desk
> help at gmod.org
>
> http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas
> http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe
> http://gmod.org/wiki/GMOD_Europe_2009

From govind.chandra at bbsrc.ac.uk Mon Mar 30 05:33:49 2009
From: govind.chandra at bbsrc.ac.uk (Govind Chandra)
Date: Mon, 30 Mar 2009 10:33:49 +0100
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <9EA1352F984441D89D877131EA47E6AA@NewLife>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <9EA1352F984441D89D877131EA47E6AA@NewLife>
Message-ID: <1238405629.6274.11.camel@jic51958.jic.bbsrc.ac.uk>

Thanks to Mark for pursuing the discussion arising out of my initial
post so far. It soon got too technical for me to participate in actively
but I have followed it as well as I could. And thanks to everybody who
participated / commented.

I have been using BioPerl every working day for the last 8 years and I
will probably use it for the rest of my working life. It is a fantastic
resource thanks to all those who help maintain and develop it.

Cheers

Govind

On Sun, 2009-03-29 at 23:11 -0400, Mark A. Jensen wrote:
> Hi all-
>
> On the wiki, I've attempted to codify and add exegesis to the
> ideas discussed in this thread. Have a look at
>
> http://www.bioperl.org/wiki/Features_vs._Annotations
>
> in the new 'Metadata' subcat of the Scrapbook
> (http://www.bioperl.org/wiki/Category:Scrapbook)
>
> (Thanks Govind for stimulating this discussion!)
>
> cheers,
> Mark
>
> ----- Original Message -----
> From: "Govind Chandra"
> To: "Mark A. Jensen"
> Cc:
> Sent: Friday, March 27, 2009 1:09 PM
> Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
>
>
> > Thanks Mark,
> >
> > Sorry for not putting a proper subject in the last post.
> > > > What you suggest is what I have been doing for a long time. I am just > > trying to alter my code to conform to the latest bioperl version and ran > > into this issue. I could be wrong (I am more a user rather than writer > > of modules) but since $feature->annotation() does not result in an error > > I think $feature is-a Bio::AnnotatableI as well. > > > > Cheers > > > > Govind > > > > > > > > On Fri, 2009-03-27 at 12:17 -0400, Mark A. Jensen wrote: > >> Hi Govind- > >> > >> As near as I can tell, the *_tags methods are deprecated for > >> Bio::AnnotatableI objects, but these methods are available > >> off the SeqFeatureI objects themselves: i.e., rather than > >> > >> > $ac=$feature->annotation(); > >> > $temp1=$ac->get_Annotations("locus_tag"); > >> > >> do > >> > >> $temp1 = $feature->get_tag_values("locus_tag"); > >> > >> directly. > >> > >> hope it helps - > >> Mark > >> > >> ----- Original Message ----- > >> From: "Govind Chandra" > >> To: > >> Sent: Friday, March 27, 2009 11:26 AM > >> Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 > >> > >> > >> > Hi, > >> > > >> > The code below > >> > > >> > > >> > ====== code begins ======= > >> > #use strict; > >> > use Bio::SeqIO; > >> > > >> > $infile='NC_000913.gbk'; > >> > my $seqio=Bio::SeqIO->new(-file => $infile); > >> > my $seqobj=$seqio->next_seq(); > >> > my @features=$seqobj->all_SeqFeatures(); > >> > my $count=0; > >> > foreach my $feature (@features) { > >> > unless($feature->primary_tag() eq 'CDS') {next;} > >> > print($feature->start()," ", $feature->end(), " > >> > ",$feature->strand(),"\n"); > >> > $ac=$feature->annotation(); > >> > $temp1=$ac->get_Annotations("locus_tag"); > >> > @temp2=$ac->get_Annotations(); > >> > print("$temp1 $temp2[0] @temp2\n"); > >> > if($count++ > 5) {last;} > >> > } > >> > > >> > print(ref($ac),"\n"); > >> > exit; > >> > > >> > ======= code ends ======== > >> > > >> > produces the output > >> > > >> > ========== output begins ======== > >> > > >> > 190 
255 1 > >> > 0 > >> > 337 2799 1 > >> > 0 > >> > 2801 3733 1 > >> > 0 > >> > 3734 5020 1 > >> > 0 > >> > 5234 5530 1 > >> > 0 > >> > 5683 6459 -1 > >> > 0 > >> > 6529 7959 -1 > >> > 0 > >> > Bio::Annotation::Collection > >> > > >> > =========== output ends ========== > >> > > >> > $ac is-a Bio::Annotation::Collection but does not actually contain any > >> > annotation from the feature. Is this how it should be? I cannot figure > >> > out what is wrong with the script. Earlier I used to use has_tag(), > >> > get_tag_values() etc. but the documentation says these are deprecated. > >> > > >> > Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of uname > >> > -a is > >> > > >> > Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 > >> > x86_64 x86_64 x86_64 GNU/Linux > >> > > >> > Thanks in advance for any help. > >> > > >> > Govind > >> > > >> > > >> > > >> > _______________________________________________ > >> > Bioperl-l mailing list > >> > Bioperl-l at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > >> > > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From maj at fortinbras.us Mon Mar 30 08:10:20 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 30 Mar 2009 08:10:20 -0400 Subject: [Bioperl-l] Bio::AnnotatableI function annotation() In-Reply-To: <1238405629.6274.11.camel@jic51958.jic.bbsrc.ac.uk> References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <9EA1352F984441D89D877131EA47E6AA@NewLife> <1238405629.6274.11.camel@jic51958.jic.bbsrc.ac.uk> Message-ID: <09F960B41BFA48348EFAAAB8F5E112B4@NewLife> No problem-- One of the really useful things (IMHO) about the list is how user issues can poke the developers into emitting short bursts of wisdom, so that the scribes (like me) can capture it for the faithful. cheers -MAJ ----- Original Message ----- From: "Govind Chandra" To: "Mark A. Jensen" Cc: Sent: Monday, March 30, 2009 5:33 AM Subject: Re: [Bioperl-l] Bio::AnnotatableI function annotation() > Thanks to Mark for pursuing the discussion arising out of my initial > post so far. It soon got too technical for me to participate in actively > but I have followed it as well as I could. And thanks to everybody who > participated / commented. > > I have been using BioPerl every working day for the last 8 years and I > will probably use it for the rest of my working life. It is a fantastic > resource thanks to all those who help maintain and develop it. > > Cheers > > Govind > > > > On Sun, 2009-03-29 at 23:11 -0400, Mark A. Jensen wrote: >> Hi all- >> >> On the wiki, I've attempted to codify and add exegesis to the >> ideas discussed in this thread. Have a look at >> >> http://www.bioperl.org/wiki/Features_vs._Annotations >> >> in the new 'Metadata' subcat of the Scrapbook >> (http://www.bioperl.org/wiki/Category:Scrapbook) >> >> (Thanks Govind for stimulating this discussion!) >> >> cheers, >> Mark >> >> ----- Original Message ----- >> From: "Govind Chandra" >> To: "Mark A. 
Jensen" >> Cc: >> Sent: Friday, March 27, 2009 1:09 PM >> Subject: [Bioperl-l] Bio::AnnotatableI function annotation() >> >> >> > Thanks Mark, >> > >> > Sorry for not putting a proper subject in the last post. >> > >> > What you suggest is what I have been doing for a long time. I am just >> > trying to alter my code to conform to the latest bioperl version and ran >> > into this issue. I could be wrong (I am more a user rather than writer >> > of modules) but since $feature->annotation() does not result in an error >> > I think $feature is-a Bio::AnnotatableI as well. >> > >> > Cheers >> > >> > Govind >> > >> > >> > >> > On Fri, 2009-03-27 at 12:17 -0400, Mark A. Jensen wrote: >> >> Hi Govind- >> >> >> >> As near as I can tell, the *_tags methods are deprecated for >> >> Bio::AnnotatableI objects, but these methods are available >> >> off the SeqFeatureI objects themselves: i.e., rather than >> >> >> >> > $ac=$feature->annotation(); >> >> > $temp1=$ac->get_Annotations("locus_tag"); >> >> >> >> do >> >> >> >> $temp1 = $feature->get_tag_values("locus_tag"); >> >> >> >> directly. 
>> >> >> >> hope it helps - >> >> Mark >> >> >> >> ----- Original Message ----- >> >> From: "Govind Chandra" >> >> To: >> >> Sent: Friday, March 27, 2009 11:26 AM >> >> Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 >> >> >> >> >> >> > Hi, >> >> > >> >> > The code below >> >> > >> >> > >> >> > ====== code begins ======= >> >> > #use strict; >> >> > use Bio::SeqIO; >> >> > >> >> > $infile='NC_000913.gbk'; >> >> > my $seqio=Bio::SeqIO->new(-file => $infile); >> >> > my $seqobj=$seqio->next_seq(); >> >> > my @features=$seqobj->all_SeqFeatures(); >> >> > my $count=0; >> >> > foreach my $feature (@features) { >> >> > unless($feature->primary_tag() eq 'CDS') {next;} >> >> > print($feature->start()," ", $feature->end(), " >> >> > ",$feature->strand(),"\n"); >> >> > $ac=$feature->annotation(); >> >> > $temp1=$ac->get_Annotations("locus_tag"); >> >> > @temp2=$ac->get_Annotations(); >> >> > print("$temp1 $temp2[0] @temp2\n"); >> >> > if($count++ > 5) {last;} >> >> > } >> >> > >> >> > print(ref($ac),"\n"); >> >> > exit; >> >> > >> >> > ======= code ends ======== >> >> > >> >> > produces the output >> >> > >> >> > ========== output begins ======== >> >> > >> >> > 190 255 1 >> >> > 0 >> >> > 337 2799 1 >> >> > 0 >> >> > 2801 3733 1 >> >> > 0 >> >> > 3734 5020 1 >> >> > 0 >> >> > 5234 5530 1 >> >> > 0 >> >> > 5683 6459 -1 >> >> > 0 >> >> > 6529 7959 -1 >> >> > 0 >> >> > Bio::Annotation::Collection >> >> > >> >> > =========== output ends ========== >> >> > >> >> > $ac is-a Bio::Annotation::Collection but does not actually contain any >> >> > annotation from the feature. Is this how it should be? I cannot figure >> >> > out what is wrong with the script. Earlier I used to use has_tag(), >> >> > get_tag_values() etc. but the documentation says these are deprecated. >> >> > >> >> > Perl is 5.8.8. BioPerl version is 1.6 (installed today). 
Output of uname >> >> > -a is >> >> > >> >> > Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 >> >> > x86_64 x86_64 x86_64 GNU/Linux >> >> > >> >> > Thanks in advance for any help. >> >> > >> >> > Govind >> >> > >> >> > >> >> > >> >> > _______________________________________________ >> >> > Bioperl-l mailing list >> >> > Bioperl-l at lists.open-bio.org >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > >> >> > >> >> >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > > > > From pmenzel at googlemail.com Mon Mar 30 09:56:02 2009 From: pmenzel at googlemail.com (Peter Menzel) Date: Mon, 30 Mar 2009 15:56:02 +0200 Subject: [Bioperl-l] reading and writing tree Message-ID: <310417680903300656s559e18cep8a65617b18c8d180@mail.gmail.com> Hi, using the TreeIO class, I try to read a tree from a newick file, delete some nodes, and write the tree using write_tree(). Besides that I cannot write to files, that don't exist already, it's also not possible to write to existing files. The following error message is written: Filehandle GEN1 opened only for input at /usr/share/perl5/Bio/Root/IO.pm line 421. So apparently somehow a file handle is associated with the tree, since different TreeIO objects are used. Is there a workaround for this problem? 
The actual code I run:

#!/usr/bin/perl -w

use strict;
use Bio::TreeIO;

my $filename = shift @ARGV;

# parse in newick/new hampshire format
my $input = new Bio::TreeIO(-file => $filename,
                            -format => "newick");
my $tree = $input->next_tree;

foreach my $nodename (@ARGV) {
    my @nodes = $tree->find_node(-id => $nodename);
    if(@nodes > 0) {
        foreach my $n (@nodes) {
            $tree->remove_Node($n);
        }
    }
}
$input->close();

#write tree to new file
my $output = new Bio::TreeIO(-file => $filename.".new", -format => "newick");
$output->write_tree($tree);


kind regards, Peter

From bosborne11 at verizon.net Mon Mar 30 09:03:02 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 30 Mar 2009 09:03:02 -0400
Subject: [Bioperl-l] Bio::AnnotatableI function annotation()
In-Reply-To: <09F960B41BFA48348EFAAAB8F5E112B4@NewLife>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk> <9ED3500FE5524639887C2E762053628B@NewLife> <1238173769.20064.32.camel@jic51958.jic.bbsrc.ac.uk> <9EA1352F984441D89D877131EA47E6AA@NewLife> <1238405629.6274.11.camel@jic51958.jic.bbsrc.ac.uk> <09F960B41BFA48348EFAAAB8F5E112B4@NewLife>
Message-ID: <4ED87AA5-3B98-416B-A112-9F18C7EA489E@verizon.net>

bioperl-l,

From the new article:

"a feature is metadata attached to a particular section or fragment of a sequence, while an annotation is metadata attached to the sequence object itself, and so describes something about the entire sequence."

This point is also made in the HOWTO, I'll make it into its own section so that it's more obvious.

Brian O.

On Mar 30, 2009, at 8:10 AM, Mark A. Jensen wrote:

> No problem-- One of the really useful things (IMHO) about
> the list is how user issues can poke the developers into emitting
> short bursts of wisdom, so that the scribes (like me) can capture it
> for the faithful.
> cheers -MAJ
> ----- Original Message ----- From: "Govind Chandra"
> >
> To: "Mark A.
Jensen" > Cc: > Sent: Monday, March 30, 2009 5:33 AM > Subject: Re: [Bioperl-l] Bio::AnnotatableI function annotation() > > >> Thanks to Mark for pursuing the discussion arising out of my initial >> post so far. It soon got too technical for me to participate in >> actively >> but I have followed it as well as I could. And thanks to everybody >> who >> participated / commented. >> >> I have been using BioPerl every working day for the last 8 years >> and I >> will probably use it for the rest of my working life. It is a >> fantastic >> resource thanks to all those who help maintain and develop it. >> >> Cheers >> >> Govind >> >> >> >> On Sun, 2009-03-29 at 23:11 -0400, Mark A. Jensen wrote: >>> Hi all- >>> >>> On the wiki, I've attempted to codify and add exegesis to the >>> ideas discussed in this thread. Have a look at >>> >>> http://www.bioperl.org/wiki/Features_vs._Annotations >>> >>> in the new 'Metadata' subcat of the Scrapbook >>> (http://www.bioperl.org/wiki/Category:Scrapbook) >>> >>> (Thanks Govind for stimulating this discussion!) >>> >>> cheers, >>> Mark >>> >>> ----- Original Message ----- From: "Govind Chandra" >> > >>> To: "Mark A. Jensen" >>> Cc: >>> Sent: Friday, March 27, 2009 1:09 PM >>> Subject: [Bioperl-l] Bio::AnnotatableI function annotation() >>> >>> >>> > Thanks Mark, >>> > >>> > Sorry for not putting a proper subject in the last post. >>> > >>> > What you suggest is what I have been doing for a long time. I am >>> just >>> > trying to alter my code to conform to the latest bioperl version >>> and ran >>> > into this issue. I could be wrong (I am more a user rather than >>> writer >>> > of modules) but since $feature->annotation() does not result in >>> an error >>> > I think $feature is-a Bio::AnnotatableI as well. >>> > >>> > Cheers >>> > >>> > Govind >>> > >>> > >>> > >>> > On Fri, 2009-03-27 at 12:17 -0400, Mark A. 
Jensen wrote: >>> >> Hi Govind- >>> >> >>> >> As near as I can tell, the *_tags methods are deprecated for >>> >> Bio::AnnotatableI objects, but these methods are available >>> >> off the SeqFeatureI objects themselves: i.e., rather than >>> >> >>> >> > $ac=$feature->annotation(); >>> >> > $temp1=$ac->get_Annotations("locus_tag"); >>> >> >>> >> do >>> >> >>> >> $temp1 = $feature->get_tag_values("locus_tag"); >>> >> >>> >> directly. >>> >> >>> >> hope it helps - >>> >> Mark >>> >> >>> >> ----- Original Message ----- >> From: "Govind Chandra" >> > >>> >> To: >>> >> Sent: Friday, March 27, 2009 11:26 AM >>> >> Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15 >>> >> >>> >> >>> >> > Hi, >>> >> > >>> >> > The code below >>> >> > >>> >> > >>> >> > ====== code begins ======= >>> >> > #use strict; >>> >> > use Bio::SeqIO; >>> >> > >>> >> > $infile='NC_000913.gbk'; >>> >> > my $seqio=Bio::SeqIO->new(-file => $infile); >>> >> > my $seqobj=$seqio->next_seq(); >>> >> > my @features=$seqobj->all_SeqFeatures(); >>> >> > my $count=0; >>> >> > foreach my $feature (@features) { >>> >> > unless($feature->primary_tag() eq 'CDS') {next;} >>> >> > print($feature->start()," ", $feature->end(), " >>> >> > ",$feature->strand(),"\n"); >>> >> > $ac=$feature->annotation(); >>> >> > $temp1=$ac->get_Annotations("locus_tag"); >>> >> > @temp2=$ac->get_Annotations(); >>> >> > print("$temp1 $temp2[0] @temp2\n"); >>> >> > if($count++ > 5) {last;} >>> >> > } >>> >> > >>> >> > print(ref($ac),"\n"); >>> >> > exit; >>> >> > >>> >> > ======= code ends ======== >>> >> > >>> >> > produces the output >>> >> > >>> >> > ========== output begins ======== >>> >> > >>> >> > 190 255 1 >>> >> > 0 >>> >> > 337 2799 1 >>> >> > 0 >>> >> > 2801 3733 1 >>> >> > 0 >>> >> > 3734 5020 1 >>> >> > 0 >>> >> > 5234 5530 1 >>> >> > 0 >>> >> > 5683 6459 -1 >>> >> > 0 >>> >> > 6529 7959 -1 >>> >> > 0 >>> >> > Bio::Annotation::Collection >>> >> > >>> >> > =========== output ends ========== >>> >> > >>> >> > $ac is-a 
Bio::Annotation::Collection but does not actually >>> contain any >>> >> > annotation from the feature. Is this how it should be? I >>> cannot figure >>> >> > out what is wrong with the script. Earlier I used to use >>> has_tag(), >>> >> > get_tag_values() etc. but the documentation says these are >>> deprecated. >>> >> > >>> >> > Perl is 5.8.8. BioPerl version is 1.6 (installed today). >>> Output of uname >>> >> > -a is >>> >> > >>> >> > Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT >>> 2008 >>> >> > x86_64 x86_64 x86_64 GNU/Linux >>> >> > >>> >> > Thanks in advance for any help. >>> >> > >>> >> > Govind >>> >> > >>> >> > >>> >> > >>> >> > _______________________________________________ >>> >> > Bioperl-l mailing list >>> >> > Bioperl-l at lists.open-bio.org >>> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > >>> >> > >>> >> >>> > >>> > _______________________________________________ >>> > Bioperl-l mailing list >>> > Bioperl-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> > >>> > >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Mar 30 12:16:22 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Mon, 30 Mar 2009 12:16:22 -0400
Subject: [Bioperl-l] reading and writing tree
In-Reply-To: <310417680903300656s559e18cep8a65617b18c8d180@mail.gmail.com>
References: <310417680903300656s559e18cep8a65617b18c8d180@mail.gmail.com>
Message-ID: <4244C66D037A479393CB5598D2C9319D@NewLife>

Hi Peter--
do

my $output = new Bio::TreeIO(-file => ">${filename}.new", -format => "newick");

(just like writing to files using open(): the leading ">" opens the file
for writing; without it the file is opened for reading only, which is why
write_tree() reports "opened only for input")
Should fix it-
cheers,
MAJ

----- Original Message -----
From: "Peter Menzel"
To:
Sent: Monday, March 30, 2009 9:56 AM
Subject: [Bioperl-l] reading and writing tree

> Hi,
>
> using the TreeIO class, I try to read a tree from a newick file,
> delete some nodes, and write the tree using write_tree().
> Besides that I cannot write to files, that don't exist already, it's
> also not possible to write to existing files.
> The following error message is written:
>
> Filehandle GEN1 opened only for input at
> /usr/share/perl5/Bio/Root/IO.pm line 421.
>
> So apparently somehow a file handle is associated with the tree, since
> different TreeIO objects are used.
> Is there a workaround for this problem?
>
> The actual code I run:
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::TreeIO;
>
> my $filename = shift @ARGV;
>
> # parse in newick/new hampshire format
> my $input = new Bio::TreeIO(-file => $filename,
> -format => "newick");
> my $tree = $input->next_tree;
>
> foreach my $nodename (@ARGV) {
> my @nodes = $tree->find_node(-id => $nodename);
> if(@nodes > 0) {
> foreach my $n (@nodes) {
> $tree->remove_Node($n);
> }
> }
> }
> $input->close();
>
> #write tree to new file
> my $output = new Bio::TreeIO(-file => $filename.".new", -format => "newick");
> $output->write_tree($tree);
>
>
> kind regards, Peter
>
>

From maj at fortinbras.us Mon Mar 30 12:40:12 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 30 Mar 2009 12:40:12 -0400
Subject: [Bioperl-l] reading and writing tree
In-Reply-To: <310417680903300936o515e6f15wb627220957d66ec9@mail.gmail.com>
References: <310417680903300656s559e18cep8a65617b18c8d180@mail.gmail.com> <4244C66D037A479393CB5598D2C9319D@NewLife> <310417680903300936o515e6f15wb627220957d66ec9@mail.gmail.com>
Message-ID:

Sure, you can do:

my $output = new Bio::TreeIO(-fh => \*STDOUT, -format => "newick");

On the wiki, have a look at
http://www.bioperl.org/wiki/HOWTO:Trees#Reading_and_Writing_Trees
cheers,
Mark

----- Original Message -----
From: "Peter Menzel"
To: "Mark A. Jensen"
Sent: Monday, March 30, 2009 12:36 PM
Subject: Re: [Bioperl-l] reading and writing tree

> On Mon, Mar 30, 2009 at 6:16 PM, Mark A. Jensen wrote:
>> my $output = new Bio::TreeIO(-file => ">${filename}.new", -format =>
>> "newick");
>>
>> (just like writing to files using open() )
>
> awesome, this works fine. Should be added to the wiki.. Is there a way
> to write to STDOUT also?
>
> best, Peter
>
>

From shalabh.sharma7 at gmail.com Tue Mar 31 14:42:51 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 31 Mar 2009 14:42:51 -0400
Subject: [Bioperl-l] taxonomy ID
Message-ID: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com>

Hi All,
I am writing a script, and for one part of it I have to parse a BLAST report (RefSeq BLAST) and check how many organisms are eukaryotes and how many are prokaryotes.
I am using the Bio::DB::Taxonomy module:
http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy

But for this I need a taxonomy ID (like '33090'), as given in the example.
So is it possible to get a taxonomy ID from a RefSeq BLAST report?
If not, how can I deal with this problem?

I would really appreciate it if anyone can help me out.

Thanks
Shalabh

From Russell.Smithies at agresearch.co.nz Tue Mar 31 16:06:35 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 1 Apr 2009 09:06:35 +1300
Subject: [Bioperl-l] taxonomy ID
In-Reply-To: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com>
References: <9fcc48c70903311142u16a70d0ao13536fc7f91b7d8e@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF324939F51B3@exchsth.agresearch.co.nz>

The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database.
The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
If you load that file into a hash, parse the accessions out of the blast hits, then look up the tax_id from that hash, I think it should be fairly fast.

Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem :-)
If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc.) so you can probably work it out from that.
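
The hash lookup described above could be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes the dump is a two-column, tab-separated file (GI number, then tax_id) and that hit names carry a GI in the usual "gi|NNN|..." form. The sub names load_gi_taxid and gi_from_hit_name, and the sample data, are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load a two-column "gi<TAB>tax_id" dump into a hash reference.
# Takes an open filehandle, so it works on a real file or an in-memory string.
# (Assumption: one record per line, tab-separated, as described in the lead-in.)
sub load_gi_taxid {
    my ($fh) = @_;
    my %taxid_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ( $gi, $taxid ) = split /\t/, $line;
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Pull the GI number out of a hit name such as "gi|111|ref|NP_000001.1|".
# Returns undef when the name carries no GI.
sub gi_from_hit_name {
    my ($name) = @_;
    return $name =~ /gi\|(\d+)/ ? $1 : undef;
}

# Demo with made-up data; a real run would open the unpacked dump file instead.
my $dump = "111\t562\n222\t9606\n";
open my $fh, '<', \$dump or die "cannot open in-memory dump: $!";
my $taxid_for = load_gi_taxid($fh);
close $fh;

for my $hit_name ( 'gi|111|ref|NP_000001.1|',
                   'gi|222|ref|NP_000002.1|',
                   'gi|333|ref|NP_000003.1|' ) {
    my $gi    = gi_from_hit_name($hit_name);
    my $taxid = defined $gi ? $taxid_for->{$gi} : undef;
    # Prints the hit name followed by 562, 9606, then "unknown" for the last hit.
    printf "%s\t%s\n", $hit_name, defined $taxid ? $taxid : 'unknown';
}
```

In a real script the hit names would come from $hit->name while looping over the parsed report. Since the full dump is very large, it may be worth collecting the GIs from the report first and storing only those in the hash.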
It's not a very BioPerly solution but sometimes just looking up the answer from a file/table/hash is the simplest way.

Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shalabh sharma
> Sent: Wednesday, 1 April 2009 7:43 a.m.
> To: bioperl-l
> Subject: [Bioperl-l] taxonomy ID
>
> Hi All,
> I am writing a script, and for one part of it I have to parse a BLAST
> report (RefSeq BLAST) and check how many organisms are eukaryotes and how
> many are prokaryotes.
> I am using the Bio::DB::Taxonomy module:
> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>
> But for this I need a taxonomy ID (like '33090'), as given in the example.
> So is it possible to get a taxonomy ID from a RefSeq BLAST report?
> If not, how can I deal with this problem?
>
> I would really appreciate it if anyone can help me out.
>
> Thanks
> Shalabh

From sanjay.harke at gmail.com Sat Mar 28 08:41:02 2009
From: sanjay.harke at gmail.com (Sanjay Harke)
Date: Sat, 28 Mar 2009 18:11:02 +0530
Subject: [Bioperl-l] Query about Bioperl and Mysql
Message-ID: <31bb4380903280541r232ebbe4kbb0ccd84f996da1f@mail.gmail.com>

Dear friends,

does anybody know about the following problem?

1) I want to use my own MySQL database with BioPerl. Kindly guide me.

sanjay

From dereje1227 at yahoo.com Tue Mar 31 18:59:45 2009
From: dereje1227 at yahoo.com (demis001)
Date: Tue, 31 Mar 2009 15:59:45 -0700 (PDT)
Subject: [Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15
In-Reply-To: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk>
References: <1238167562.20064.17.camel@jic51958.jic.bbsrc.ac.uk>
Message-ID: <22816585.post@talk.nabble.com>

Hi,

I am new to BioPerl and to this forum, and do not even know how to start a new post. I have one question for you guys.
Is there any BioPerl module that allows me to retrieve a sequence based on chromosome name, seqStart and seqEnd, given a formatted human genome database downloaded to my Linux desktop? I used to do this using a Perl $URI object, and it is really slow because the process depends on the network. To be more specific, I took chrName, seqStart and seqEnd and went to the Ensembl database to get the sequences one by one using a Perl $URI object. I thought it might be easier to process locally against an indexed database using a BioPerl module, if there is one designed for this purpose.

Input: a tab-delimited (CSV) file with millions of rows containing chrName, seqStart and seqEnd, plus a locally formatted/indexed human genome.
Output: FASTA records containing each sequence, with a header containing the parsed chromosome name and location.

Sorry if I posted in the wrong section of the forum; I am happy to get any recommendation.
Thanks