From David.Messina at sbc.su.se Fri Apr 1 04:07:06 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 1 Apr 2011 10:07:06 +0200
Subject: [Bioperl-l] StandAloneBlastPlus Question
In-Reply-To:
References:
Message-ID:

Hi Veronica,

I took a look and couldn't figure out how to do an update to an existing
database, either. But I also couldn't find a clear example of how to do this
with just the blast+ command-line tools (outside of Perl).

Well, it looks like you can create a new, virtual database using
blastdb_aliastool. Is that what you mean? If not, could you provide an
example command line of how to update an existing blast database using the
blast+ command-line tools?

Also, CCing Mark Jensen, who can probably answer this straight away.

Dave

On Fri, Apr 1, 2011 at 02:45, Veronica A. wrote:

> Hello,
>
> I'm trying to create a Perl script using StandAloneBlastPlus that can
> either create a new database or update it with new FASTA sequences.
> However, I can only find information on the creation of a new database.
> I've checked the information on doc.bioperl.org but a lot of the Methods
> don't have descriptions yet.
>
> Is it possible to use StandAloneBlastPlus or is there another/a better way?
>
> Thank you,
>
> Veronica
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From armendarez77 at hotmail.com Fri Apr 1 10:17:09 2011
From: armendarez77 at hotmail.com (Veronica A.)
Date: Fri, 1 Apr 2011 07:17:09 -0700
Subject: [Bioperl-l] StandAloneBlastPlus Question
Message-ID:

Phew....I'm glad I'm not the only one :)

I'll look into the blastdb_aliastool method and see if that will help in the meantime.

Thank you.

Veronica

From cjfields at illinois.edu Fri Apr 1 10:59:03 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 1 Apr 2011 09:59:03 -0500
Subject: [Bioperl-l] StandAloneBlastPlus Question
In-Reply-To:
References:
Message-ID: <87E8483B-4998-4627-94C5-9B7070BC7C80@illinois.edu>

Using BLAST+ and makeblastdb, it seems as if the input files can be either FASTA or other BLAST databases, and it takes standard input. You could try multiple -in parameters to see if it works (according to the BLAST docs this is supported).

From 'makeblastdb':

 *** Input options
 -in
   Input file/database name; the data type is automatically detected, it may
   be any of the following:
        FASTA file(s) and/or
        BLAST database(s)
   Default = `-'
 -dbtype
   Molecule type of input
   Default = `prot'

chris

From lincoln.stein at gmail.com Fri Apr 1 11:53:41 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 1 Apr 2011 11:53:41 -0400
Subject: [Bioperl-l] type=MyISAM
In-Reply-To:
References:
Message-ID:

With respect to Bio::DB::GFF being broken on new mysql installations, I could replace TYPE with ENGINE and this would fix newer MySQLs while breaking older ones. I will simply remove the declaration and accept the default.

Lincoln

On Thu, Mar 31, 2011 at 6:59 PM, Scott Cain wrote:

> Hi Lincoln,
>
> It appears that having "type=MyISAM" in the table definitions for
> newer MySQL instances is a problem, as I get this when I try to create
> a Bio::DB::GFF database:
>
> You have an error in your SQL syntax; check the manual that
> corresponds to your MySQL server version for the right syntax to use
> near 'type=MyISAM' at line 6 at
> /Library/Perl/5.8.8/Bio/DB/GFF/Adaptor/dbi.pm line 1049.
>
> Should we change that, or is there a way to get newer mysql servers to
> accept it as is?
>
> Thanks,
> Scott
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                          scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)         216-392-3087
> Ontario Institute for Cancer Research

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From jonathan at leto.net Fri Apr 1 12:17:27 2011
From: jonathan at leto.net (Jonathan "Duke" Leto)
Date: Fri, 1 Apr 2011 09:17:27 -0700
Subject: [Bioperl-l] git help
In-Reply-To:
References:
Message-ID:

Howdy,

Here are some PROTIPs that I think will be of use to many people:

1) Beware of "git commit -a". You can use "git show" to look at your most recent
commit before you push it. Always make sure it contains what you think it should.

2) As a general rule, you shouldn't do a "git pull" when you have uncommitted
changes. You shouldn't merge branches when you have uncommitted changes,
either. Another way to get around this is to use the git "stash". Think of it
as a cubby hole to put stuff to the side until you need it again.

You could have typed

    git stash
    git pull --rebase
    git stash pop

As you can see, the stash is a stack that you can "push" and "pop".

3) Fetching new commits from the remote and updating the actual files in your
directory (called the working copy) can be done independently.

The command:

    git pull --rebase

is equivalent to

    git fetch                  # download new commits from the remote (updates origin/master)
    git rebase origin/master   # replay local commits on top of origin/master

The second way, which splits it up, is useful when you want to do something
in between, such as:

    git fetch
    git diff master...origin/master # what am I going to pull in?
    git rebase origin/master

I hope this arms BioPerl hackers with a better understanding of Git. If you
have more Git questions, feel free to ask.

Duke

On Thu, Mar 31, 2011 at 4:13 PM, Scott Cain wrote:

> Hi all,
>
> I just want to know if it is safe for me to commit. Here is what git
> said to me:
>
> scott-cains-macbook-pro:bioperl-live cain$ git commit
> # On branch master
> # Your branch is behind 'origin/master' by 47 commits, and can be
> fast-forwarded.
> #
> # Changed but not updated:
> #   (use "git add ..." to update what will be committed)
> #   (use "git checkout -- ..." to discard changes in working directory)
> #
> #       modified:   Bio/DB/GFF.pm
> #       modified:   Bio/FeatureIO/gff.pm
> #       modified:   scripts/Bio-DB-GFF/bulk_load_gff.PLS
> #       modified:   scripts/Bio-DB-GFF/fast_load_gff.PLS
> #       modified:   t/SeqFeature/FeatureIO.t
> #       modified:   t/data/knownGene.gff3
> #
> # Untracked files:
> #   (use "git add ..." to include in what will be committed)
> #
> #       scripts/Bio-DB-GFF/genbank2gff3.diff
> no changes added to commit (use "git add" and/or "git commit -a")
> ----------------------
>
> so should I do a git commit -a?
>
> Also, when I try a git pull, this is what it tells me:
>
> scott-cains-macbook-pro:bioperl-live cain$ git pull
> Updating 1c2cf24..d5de022
> error: Your local changes to 'Bio/FeatureIO/gff.pm' would be
> overwritten by merge. Aborting.
> Please, commit your changes or stash them before you can merge.
>
> Thanks,
> Scott
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.
> scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)         216-392-3087
> Ontario Institute for Cancer Research

--
Jonathan "Duke" Leto
jonathan at leto.net
http://leto.net

From lincoln.stein at gmail.com Fri Apr 1 12:57:19 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 1 Apr 2011 12:57:19 -0400
Subject: [Bioperl-l] type=MyISAM & regression in Bio::DB::GFF
Message-ID:

Hi Scott,

I've fixed bioperl-live's Bio::DB::GFF type=MyISAM problems, so it will now work correctly on recent MySQLs. During testing of this, I found a regression that you had apparently introduced when you added a warning message about loading GFF3 files:

sub print_gff3_warning {
  my $self = shift;
  print STDERR <<END

You are loading a Bio::DB::GFF database with GFF3 formatted data.
While this will likely work fine, the Bio::DB::GFF schema does not
always faithfully capture the complexity represented in GFF3 files.
Unless you have a specific reason for using Bio::DB::GFF, we suggest
that you use a Bio::DB::SeqFeature::Store database and its corresponding
loader, bp_seqfeature_load.pl.

END
;

  my $h = <>;

  return;
}

The <> was gobbling up the very next line in the GFF3 file, regardless of what its contents were. Was there some reason for you to want this behavior, or were you trying to discard the ##gff-version line? In any case, this broke regression tests, so I've removed it. Let me know what the intent was, if I'm missing something.

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From scott at scottcain.net Fri Apr 1 17:03:31 2011
From: scott at scottcain.net (Scott Cain)
Date: Fri, 1 Apr 2011 15:03:31 -0600
Subject: [Bioperl-l] type=MyISAM & regression in Bio::DB::GFF
In-Reply-To:
References:
Message-ID: <2D4366D2-832A-4B05-AE18-4E3F4DD4782F@scottcain.net>

Hi Lincoln,

Sorry about the sloppy <> operator. I was thinking of using it to "hit return to continue" but I didn't think too carefully about the context. In retrospect, I don't think it should do that anyway, since if anybody is using one of the loaders in a pipeline, it would break it. Thanks for fixing it.

Scott

Sent from my iPad

On 2011-04-01, at 10:57 AM, Lincoln Stein wrote:

> Hi Scott,
>
> I've fixed bioperl live's Bio::DB::GFF type=MyISAM problems, so it will now
> work correctly on recent Mysqls. During testing of this, I found a regression
> that you had apparently introduced when you added a warning message about
> loading GFF3 files:
>
> sub print_gff3_warning {
>   my $self = shift;
>   print STDERR <<END
>
> You are loading a Bio::DB::GFF database with GFF3 formatted data.
> While this will likely work fine, the Bio::DB::GFF schema does not
> always faithfully capture the complexity represented in GFF3 files.
> Unless you have a specific reason for using Bio::DB::GFF, we suggest
> that you use a Bio::DB::SeqFeature::Store database and its corresponding
> loader, bp_seqfeature_load.pl.
>
> END
> ;
>
>   my $h = <>;
>
>   return;
> }
>
> The <> was gobbling up the very next line in the GFF3 file, regardless of
> what its contents were. Was there some reason for you to want this behavior,
> or were you trying to discard the ##gff-version line?
> In any case, this broke regression tests, so I've removed it. Let me know
> what was the intent, if I'm missing something.
>
> Lincoln

From jovel_juan at hotmail.com Fri Apr 1 19:50:30 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Fri, 1 Apr 2011 23:50:30 +0000
Subject: [Bioperl-l] How to remove rRNAs from libraries
In-Reply-To: <1301604855l.1593384l.0l@psu.edu>
References: <1301604855l.1593384l.0l@psu.edu>
Message-ID:

Hello!

I am working with Drosophila small RNA libraries. I want to know if there is any good program (or pipeline) out there to remove rRNAs from those libraries. I have done that with in-house Perl scripts in the past, but it gets too slow when dealing with large libraries.

Thanks a lot in advance,

JUAN

From gylz.mail at gmail.com Fri Apr 1 23:21:32 2011
From: gylz.mail at gmail.com (Guillermo Fernández)
Date: Sat, 2 Apr 2011 05:21:32 +0200
Subject: [Bioperl-l] Failure when parsing a massive Entrez (GenBank) query.
Message-ID:

Hello,

I am trying to extract the CDS sequences for a list of GenBank DNA files. I
have used Bio::SeqIO, Bio::DB::Query::GenBank and Bio::DB::GenBank for it.
The Entrez query is as follows:

'(acetolactate synthase[All Fields] AND "green plants"[porgn]) AND
"flowering plants"[porgn] AND "complete cds"[All Fields]';

The Perl code can be seen at https://gist.github.com/898456 (extractFiles.pl)

It runs smoothly until an error is raised (the output is included next to the
source; I recorded the output both with the line "$stream->verbose(2);"
commented out and with it uncommented). After the error, the program dies
before all the sequences have been parsed. I do not know how to recover from
this kind of error and resume processing the sequences that remain after that
point. In
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO.html#POD4
it is said that resuming parsing after catching an exception thrown by
"next_seq" cannot be assumed.

As a consequence, I'm looking for alternatives. I downloaded the list of gi
numbers from the NCBI for that query (sequence.gi.txt) and piped it to the
script shown in https://gist.github.com/899123 : "cat sequence.gi.txt |
xargs ./extractFileByGI.pl". It works and gets past badly formatted sequences
without compromising the remaining sequences, but it is really slow compared
with the first script.

Could you suggest an efficient solution?

Thank you in advance.

Guillermo.

*P.S.* Using extractFiles.pl for a small number of sequences, starting some
sequences before the one that seems to fail and ending some sequences after
it, results in a surprisingly correct run. (Replace the original NCBI query
-$queryString- with the sublist of gi numbers separated by white spaces:
"115446302 297179983 30693053 223945818 188529638 188529636 188529634
297600179 167118 188529632") Is it a bug? Should it have failed again (as I
expected)?

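The per-ID approach described in the message above trades speed for isolation: each gi number is handled on its own, so one bad record cannot end the whole run. The following is only a minimal shell sketch of that pattern, not the poster's script. It assumes the ID list has one gi number per line, and `fetch_one` is a hypothetical stand-in for ./extractFileByGI.pl (whose contents are not shown here). The ID treated as "bad" is picked arbitrarily for illustration; it is not known to be the record that actually failed.

```shell
#!/bin/sh
# Hypothetical stand-in for ./extractFileByGI.pl: succeeds for every ID
# except one that we pretend is a malformed record.
fetch_one() {
    if [ "$1" = "297179983" ]; then
        echo "parse error for GI $1" >&2
        return 1
    fi
    echo "CDS extracted for GI $1"
}

# Sample input, assumed to use one ID per line like sequence.gi.txt.
gi_list=$(mktemp)
printf '%s\n' 115446302 297179983 30693053 > "$gi_list"

ok=0
failed=0
# Handle each GI separately so that a failure is recorded and skipped
# instead of aborting the remaining IDs.
while read -r gi; do
    if fetch_one "$gi" 2>/dev/null; then
        ok=$((ok + 1))
    else
        failed=$((failed + 1))
        echo "skipped GI $gi"
    fi
done < "$gi_list"
rm -f "$gi_list"

echo "ok=$ok failed=$failed"
```

Recording the skipped IDs this way leaves a list that could be retried or inspected by hand later, which may help when deciding whether a record is worth reporting as a parser bug.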
From cjfields at illinois.edu Fri Apr 1 23:44:57 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 1 Apr 2011 22:44:57 -0500
Subject: [Bioperl-l] Failure when parsing a massive Entrez (GenBank) query.
In-Reply-To:
References:
Message-ID: <9ADC8E2F-37BF-472A-8410-9CBCDF71B474@illinois.edu>

One alternative is to download the raw GenBank files and parse them; it's very possible one of them is breaking the parser (if so, please report it). One way to do this is by using Bio::DB::EUtilities; the cookbook has a few examples:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

chris

From David.Messina at sbc.su.se Sat Apr 2 14:42:21 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 2 Apr 2011 20:42:21 +0200
Subject: [Bioperl-l] git help
In-Reply-To:
References:
Message-ID:

Hey thanks so much, Duke! Very helpful!

Dave

From cjfields at illinois.edu Mon Apr 4 15:05:15 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Apr 2011 14:05:15 -0500
Subject: [Bioperl-l] Bio::DB::Fasta bug
Message-ID:

All,

I just added a BioPerl bug fix for this issue:

https://redmine.open-bio.org/issues/3172

which catches cases where no match occurs in a sequence header for the regex '>(\S+)' and throws a proper exception (previously it treated the line as part of the sequence). My concern is performance issues with the fix, particularly with regards to Bio::DB::Sam and other code using Bio::DB::Fasta. If there is a significant performance issue let me know and I can revert those.

chris

From rachitasharma at gmail.com Mon Apr 4 15:24:33 2011
From: rachitasharma at gmail.com (Rachita Sharma)
Date: Mon, 4 Apr 2011 13:24:33 -0600
Subject: [Bioperl-l] Genbank to GFF conversion
Message-ID:

I would like to report a bug: the Genbank to GFF converter does not calculate phases, especially for CDS features, where the phase is required for GFF validation.

Regards,
Rachita

From jason at bioperl.org Mon Apr 4 20:48:08 2011
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 04 Apr 2011 17:48:08 -0700
Subject: [Bioperl-l] Genbank to GFF conversion
In-Reply-To:
References:
Message-ID: <4D9A66C8.1000303@bioperl.org>

This is now an issue in redmine where we track bugs.
https://redmine.open-bio.org/issues/3195

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki

From christopher.moy at gsk.com Tue Apr 5 09:06:13 2011
From: christopher.moy at gsk.com (Chris_Moy)
Date: Tue, 5 Apr 2011 06:06:13 -0700 (PDT)
Subject: [Bioperl-l] Pull Codon/Amino Acid Change from Genome Reference Data
Message-ID: <31323999.post@talk.nabble.com>

Hi,

I am working with high throughput sequencing data and I have the following pieces of data:

Reference Genome (36.1)
HUGO Gene Name (e.g. 'BRCA1')
The chromosome and genomic coordinate (e.g. 'chr7', '15037494')
The nature of the change (missense, silent, etc)
The base pair change (e.g. from 'A' to 'G')

However, I do not have the specific amino acid change at that location for the gene. It's been a while since I've used BioPerl but I am certainly willing to shake off some rust. If I could get some suggestions on how to best approach this, that would be a great help.

Thanks.

Chris
--
View this message in context: http://old.nabble.com/Pull-Codon-Amino-Acid-Change-from-Genome-Reference-Data-tp31323999p31323999.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From Cynthia.Page at Colorado.EDU Tue Apr 5 13:50:28 2011
From: Cynthia.Page at Colorado.EDU (pageski)
Date: Tue, 5 Apr 2011 10:50:28 -0700 (PDT)
Subject: [Bioperl-l] not recognizing hsp in a hit
Message-ID: <31326619.post@talk.nabble.com>

Hi, I am new to this forum, so please forgive me if I make a faux pas!

I am parsing a BLAST report using BioPerl. For some reason, one of my hits, which lists three HSPs, does not have them recognized as HSPs, and they are skipped during parsing. Below is screen output for two genes: the first is the one I am referring to, and the second is one that works, for comparison. I will also paste part of the BLAST report for those two entries.

I will not paste the code at this time, but will gladly do so if it is needed. I am thinking that perhaps I am misunderstanding what an HSP is, which is why I am first showing you these two pieces of information.

Thanks for any help!
##############################################################
I have pasted screen output from the script I use to troubleshoot, plus part of a BLAT output in BLAST format. From the BLAST output you can see I have three hits on chr3. It seems to me that this should be three HSPs; however, these are not considered HSPs for some reason, yet those for YCL068 are. I pasted part of that report below YCL067.

############screen output##########################################
YCL067C
C
3
chr3
This is the number of hsps: 0 for YCL067C
Number of hsp == :0 for YCL067C
This is the hitscore: 1222 for name: YCL067C
This is the highesthit score: 1222 for Name: YCL067C
YCL067C


YCL068C
C
3
chr3
This is the number of hsps: 3 for YCL068C
This is the number of hsps:3 for name: YCL068C
I am in the if hsps >0 and evaluating YCL068C
This is the number of hits:3 for name: YCL068C
This is the hitscore: 1508 for name: YCL068C
This is the highesthit score: 1508 for Name: YCL068C
This is the percent of hsp percentid:99.6173469387755
######################################################

###################Part of Blast report for YCL067C################

Query= YCL067C
         (633 letters)

Database: /data/genomes/Sigma1278b/sigmav7.fa
           16 sequences; 11,945,947 total letters

Searching.done
                                                     Score    E
Sequences producing significant alignments:          (bits)   Value

chr3                                                  1222    0.0
chr3                                                  1222    0.0
chr3                                                  1078    0.0

>chr3
          Length = 319572

 Score = 1222 bits (3153), Expect = 0.0
 Identities = 633/633 (100%)
 Strand = Minus / Plus
############################################
Query= YCL068C
         (783 letters)

Database: /data/genomes/Sigma1278b/sigmav7.fa
           16 sequences; 11,945,947 total letters

Searching.done
                                                     Score    E
Sequences producing significant alignments:          (bits)   Value

chr3                                                  1508    0.0
chr3                                                  1505    0.0
chr3                                                    91    1e-18

>chr3
          Length = 319572

 Score = 1508 bits (3891), Expect = 0.0
 Identities = 781/784 (100%)
 Strand = Minus / Plus

From David.Messina at sbc.su.se Wed Apr 6 09:57:32 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 6 Apr 2011 15:57:32 +0200
Subject: [Bioperl-l] not recognizing hsp in a hit
In-Reply-To: <31326619.post@talk.nabble.com>
References: <31326619.post@talk.nabble.com>
Message-ID:

Hi Cynthia,

Hmm, curious. Thanks for including your script's output and parts of the blast reports. I would like to see more if I could, though. Would you please go ahead and include your script and the complete blast reports for those two genes?

If the files are too big to attach to email, then you can create a bug report and attach the files to that. Instructions are here: http://www.bioperl.org/wiki/Bugs

Dave

significant alignments: (bits) > Value > > chr3 1508 > 0.0 > chr3 1505 > 0.0 > chr3 91 > 1e-18 > > > > >chr3 > Length = 319572 > > Score = 1508 bits (3891), Expect = 0.0 > Identities = 781/784 (100%) > Strand = Minus / Plus > > -- > View this message in context: > http://old.nabble.com/not-recognizing-hsp-in-a-hit-tp31326619p31326619.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sydghyyh14 at yahoo.com.cn Wed Apr 6 11:52:21 2011 From: sydghyyh14 at yahoo.com.cn (sydghyyh14) Date: Wed, 6 Apr 2011 23:52:21 +0800 Subject: [Bioperl-l] How to install Bio::Tools::HMM ? Message-ID: <201104062352212038464@yahoo.com.cn> I download it from https://github.com/bioperl/Bio-Tools-HMM, After I run the command : perl Build.PL I got message like this: ! Bio::Root::Version (1.006001) is installed, but we need version >= 1.006009 I have install the latest version of bioperl, how can I update Bio::Root to 1.006009 ? 2011-04-06 sydghyyh14 From cjfields at illinois.edu Wed Apr 6 12:06:25 2011 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 6 Apr 2011 11:06:25 -0500 Subject: [Bioperl-l] How to install Bio::Tools::HMM ? In-Reply-To: <201104062352212038464@yahoo.com.cn> References: <201104062352212038464@yahoo.com.cn> Message-ID: <9FE6092E-F7C7-4173-B8E1-17D601C5744F@illinois.edu> You need the latest bioperl-live on github: https://github.com/bioperl/bioperl-live chris On Apr 6, 2011, at 10:52 AM, sydghyyh14 wrote: > > I download it from https://github.com/bioperl/Bio-Tools-HMM, After I run the command : perl Build.PL > I got message like this: > ! Bio::Root::Version (1.006001) is installed, but we need version >= 1.006009 > I have install the latest version of bioperl, how can I update Bio::Root to 1.006009 ? 
> 2011-04-06
>
> sydghyyh14

From cjfields at illinois.edu Wed Apr 6 12:10:40 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 6 Apr 2011 11:10:40 -0500
Subject: [Bioperl-l] Bio::DB::Fasta bug
In-Reply-To:
References:
Message-ID: <582E8280-8014-4AC8-859F-B3FFFFD55371@illinois.edu>

All,

Okay, assuming this is kosher to be included in the next CPAN release. I'll close the bug report out.

chris

On Apr 4, 2011, at 2:05 PM, Chris Fields wrote:

> All,
>
> I just added a BioPerl bug fix for this issue:
>
> https://redmine.open-bio.org/issues/3172
>
> which catches cases where no match occurs in a sequence header for the regex '>(\S+)' and throws a proper exception (previously it treated the line as part of the sequence). My concern is performance issues with the fix, particularly with regards to Bio::DB::Sam and other code using Bio::DB::Fasta. If there is a significant performance issue let me know and I can revert those.
>
> chris

From hufeiyc at gmail.com Wed Apr 6 14:06:17 2011
From: hufeiyc at gmail.com (Fei Hu)
Date: Wed, 6 Apr 2011 14:06:17 -0400
Subject: [Bioperl-l] Summer of Code Proposal
Message-ID:

Hi all,

Below is my GoC 2011 proposal that describes my plan and thoughts.
As time is really tight now, I need your advice to make it more realistic and reasonable.
Appreciate your time for reviewing.
Also I am looking for a mentor who is interested in this project and willing to guide me through the summer.

Best
Fei

PS: Thanks Chris Fields for your valuable suggestion.

Name Fei HU
Address Rm.
3D-11, Swearingen Engineering Building, University of South Carolina
Email hufeiyc at gmail.com

Why you are interested in the project you are proposing and are well-suited to undertake it.
I like to use Perl to organize and automate the pipeline, starting from extracting data, run various packages and analysis results. And I would like more people to know its virtue and make use of it. Bio-Perl provides us a perfect platform.
My current research is about gene order phylogeny reconstruction following maximum likelihood criteria (others includes MP and NJ based). My phylogeny inference pipeline involves using RAxML to build a ML tree and estimating the internal (ancestral) sequence using PAML. While baseml of PAML is well-supported, RAxML is not yet available. Although I wrote my own wrap for RAxML, it's even better for Bio-Perl to wrap RAxML so that everyone can use easily.
I extensively used and also modified the source to fit RAxML to analysis gene order data. With a good understanding of Perl and RAxML, what's more, the willing to make Bio-perl better, I am prepared to undertake it.

Programs or projects you have previously authored or contributed to
I implemented the algorithm using Perl[1] (open source). And I also use and learn Perl in daily bases.

A project plan for the project you are proposing
The wrap should be consistent with the other existing packages supported by Tools::Run in style and api. I plan to it to full-fill most popular functionality RAxML currently provide.
1. Binary Sequence analysis (0-1, binary characters) and Multi-state Sequence analysis (0-9A-V, 32 characters, available models are: ORDERED, MK, GTR), this is useful for morphological data.
2. DNA analysis and Amino Acid analysis, given custom transition matrix (AA only), rate heterogeneity.
4. Conduct standard bootstrapping and rapid bootstrapping as well as the final through inference[2] as well as the relative new bootstopping.
5. Given user starting tree or incomplete constrain tree.
6. Specify a column weight file name to assign individual weights to each column of the alignment.
7. Specify an exclude file name, that contains a specification of alignment positions you wish to exclude.
8. Automatically generate random seed for the program.
9. And more to be added.

Others plan that may benefit user.
1. Call Bio::SeqIO to parse and reconstruct interleave or sequential phylip format so that RAxML can read.
2. Design a set of more understandable commands, such as
   use "--model" instead of "-P" to specify a custom model file.
   use "--workingdir" instead of "-w" to specify the working directory.
   But still one can use the old style according to their own preference.
3. Implement more sophisticated exception handler and running mode summary. There is huge combination of arguments that can cause error. For example, to enable a rapid bootstrapping plus a thorough inference, one needs to give "-f a" "-x {random seed}" together with the number of replicates "-# {number}", if anyone is missing, RAxML won't tell at once that these three are all necessary, instead RAxML usually can only tell the "nearest" error it can spot. In my plan if one wants to conduct a RBS plus inference, the wrap is able to inform user that all those three are necessary and then guides to correct it. In sum, I plan to dig the errors from source code and group them in accordance to their functionality. So each error message will no longer be independent.
   Another "trivial" thoughts is when the running-id already exists, RAxML will exit directly without choice, this would be disturbing if overwrite is fine, I suggest to use a switch to define the behavior (overwrite, add a post-fix to name, exit, skip this run).
4. Preliminary post-processing can be conducted and afterward returned as a value or list. Output the maximum likelihood scores for each bootstrapped tree. Enumerate branches that have confidence value larger than a threshold.
Return a hash table containing branch lengths and running time, final ML score. More analysis could be done by other package anyway.

Any obligations, vacations, or plans for the summer that may require scheduling during the GSoC work period.
No special obligations and vacations.

[1] Hu, F., Gao, N. and Tang, J., "Maximum Likelihood Phylogenetic Reconstruction Using Gene Order Encodings", CIBCB 2011, accepted.
[2] Stamatakis A, Hoover P, Rougemont J: A rapid bootstrap algorithm for the RAxML web-servers. Syst. Biol. 2008, 75:758-771.

From lcpaulet at googlemail.com Wed Apr 6 14:31:06 2011
From: lcpaulet at googlemail.com (Lorenzo Carretero)
Date: Wed, 6 Apr 2011 20:31:06 +0200
Subject: [Bioperl-l] Parsing BLAST -m 9 reports
Message-ID:

Dear all,
I'm trying to parse several blast reports in tabular format -m 9. when looping through the hsps I always get the following message:

--------------------- WARNING ---------------------
MSG: Did not define the number of conserved matches in the HSP; assuming conserved == identical (312)
---------------------------------------------------

Thanks for your help,
Lorenzo

From David.Messina at sbc.su.se Thu Apr 7 04:51:48 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 7 Apr 2011 10:51:48 +0200
Subject: [Bioperl-l] Summer of Code Proposal
In-Reply-To:
References:
Message-ID: <731EE04A-8A33-43E6-A2E5-CDA644111227@sbc.su.se>

Hi,

Looking pretty good, particularly the project plan section.

You might also add some text in your introduction which shows the importance of RaxML. Say that it's widely used and demonstrate that with number of citations, number of downloads, or similar data.

Also, there are some small English mistakes (for example wrap instead of wrapper, provide instead of provides), so ask a native English speaker to do some editing.

Good luck! I'd love to see this happen.

Dave

From David.Messina at sbc.su.se Thu Apr 7 05:02:26 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 7 Apr 2011 11:02:26 +0200
Subject: [Bioperl-l] not recognizing hsp in a hit
In-Reply-To: <08FD8735-E577-4925-BD5D-DD5D1A9426AE@colorado.edu>
References: <31326619.post@talk.nabble.com> <08FD8735-E577-4925-BD5D-DD5D1A9426AE@colorado.edu>
Message-ID:

Hi Cynthia,

Please remember to reply all so this discussion stays on the list.

It may very well be a problem with the blat parser so please submit:

- your blat output
- your script
- the output from your script

to our bug tracker.

The reason we need your script is that it will allow us to confirm what you're seeing.
More importantly, in order to fix problems like this, we need a test case, and your code will form the basis for that. So it would be a big help for us.

Thanks!

Dave

On Apr 6, 2011, at 18:36, Cynthia Lee Page wrote:

> HI Dave -
>
> I was talking to my boss about this and a couple things she pointed out to me: first this is a blast report but I actually ran blat on two closely related organisms and used the option blat -out = blast. So in reality this is not an actual blast report and it is my understanding that due to the fact reporting of hsp can be effected as follows, when hsp's are on the same chromosome it will report them as separate hits.
>
> This is what happen in both cases I presented. In YCL068C, 3 hits, 3 hsp and all are on Chr3. In the case of YCL067C 3 hits were reported and no hsp which is the curious part. In both cases the two highest hsp were roughly the same distance apart on the chromosome the highest around 200,000 and the second around 15,000.
>
> Perhaps this is sufficiently far apart on the chromosome for situation the my boss was referring to. I still have the problem however of not having the hsp's recognized. So I will proceed by just blatting that gene against the query and see if the report changes. If not, hmm.
>
> I am a newbie with perl so it certainly could be a problem with the script, but since other genes are parsing as expected it is more likely that the problem is due to the nature of the blat report. In the next few days I will redo the blat on that gene alone and let you know what happen, if you would like.
>
> Any comments you may have are welcome
>
>
> Thanks a lot,
>
> Cynthia

From David.Messina at sbc.su.se Thu Apr 7 05:28:47 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 7 Apr 2011 11:28:47 +0200
Subject: [Bioperl-l] not recognizing hsp in a hit
In-Reply-To:
References: <31326619.post@talk.nabble.com> <08FD8735-E577-4925-BD5D-DD5D1A9426AE@colorado.edu>
Message-ID:

I should also say:

I wouldn't be surprised if the blast-style output that Blat produces is slightly nonstandard and causes our parser to make mistakes.

By the way, which version of Blat are you using? It looks like Jim Kent has taken Blat at least semi-commercial:

http://www.kentinformatics.com/products.html

If there's a new version of blat, I don't think BioPerl has seen it yet. Can anyone confirm or deny recent Blat output format changes?

Dave

From hufeiyc at gmail.com Thu Apr 7 09:08:46 2011
From: hufeiyc at gmail.com (Fei Hu)
Date: Thu, 7 Apr 2011 09:08:46 -0400
Subject: [Bioperl-l] Summer of Code Proposal
In-Reply-To: <731EE04A-8A33-43E6-A2E5-CDA644111227@sbc.su.se>
References: <731EE04A-8A33-43E6-A2E5-CDA644111227@sbc.su.se>
Message-ID:

Messina :

I corrected some written mistakes.
Also I added a new whole section talking about the RAxML and comparing it to others.
Thank you so much.

Best
Fei

On Thu, Apr 7, 2011 at 4:51 AM, Dave Messina wrote:

> Hi,
>
> Looking pretty good, particularly the project plan section.
>
> You might also add some text in your introduction which shows the
> importance of RaxML. Say that it's widely used and demonstrate that with
> number of citations, number of downloads, or similar data.
>
> Also, there are some small English mistakes (for example wrap instead of
> wrapper, provide instead of provides), so ask a native English speaker to do
> some editing.
>
> Good luck! I'd love to see this happen.
>
> Dave

--
*Fei Hu
Bioinformatics Lab
3D-11 Swearingen Building
U of South Carolina
Tel: 803-397-5240*

From cjfields at illinois.edu Thu Apr 7 10:28:02 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Apr 2011 09:28:02 -0500
Subject: [Bioperl-l] Summer of Code Proposal
In-Reply-To:
References: <731EE04A-8A33-43E6-A2E5-CDA644111227@sbc.su.se>
Message-ID: <9D0D3468-EDAF-49B8-9239-FA33A3005A28@illinois.edu>

Fei,

A few things. Most important:

1) You should have a rough timeline (with actual dates) for the summer project, based on Google's events calendar (http://www.google-melange.com/gsoc/events/google/gsoc2011). This should include start of coding as well as some general timeline of how you plan on implementing your wrappers and other related code.

2) 'Deliverables' are needed. How does BioPerl benefit from this? What do we get as a result of this endeavor?

A few comments on the proposal:

1) Wrappers, by themselves, aren't necessarily difficult to write up. The tough part is getting Bio::* objects to work with the wrapped executable and parsing output, all the while ensuring the current classes within BioPerl can deal with the data in a meaningful way. I haven't seen that described.

2) How would you want to deal with very large data sets using SeqIO? Or would it be better to use something like an indexed flatfile, or seqs stored in a database?

3) How do you plan on dealing with multi-state or binary state data? I don't think there are classes that handle this data (yet), or handle it well w/o significant hackery. hint: maybe that can be rectified...

chris

On Apr 7, 2011, at 8:08 AM, Fei Hu wrote:

> Messina :
>
> I corrected some written mistakes.
> Also I added a new whole section talking about the RAxML and comparing it to
> others.
> Thank you so much.
> > Best > Fei > > On Thu, Apr 7, 2011 at 4:51 AM, Dave Messina wrote: > >> Hi, >> >> Looking pretty good, particularly the project plan section. >> >> You might also add some text in your introduction which shows the >> importance of RaxML. Say that it's widely used and demonstrate that with >> number of citations, number of downloads, or similar data. >> >> Also, there are some small English mistakes (for example wrap instead of >> wrapper, provide instead of provides), so ask a native English speaker to do >> some editing. >> >> Good luck! I'd love to see this happen. >> >> Dave >> >> >> On Apr 6, 2011, at 20:06, Fei Hu wrote: >> >>> Hi all, >>> >>> Below is my GoC 2011 proposal that describes my plan and thoughts. >>> As time is really tight now, I need your advice to make it more realistic >>> and reasonable. >>> Appreciate your time for reviewing. >>> Also I am looking for a mentor who is interested in this project and >> willing >>> to guide me through the summer. >>> >>> Best >>> Fei >>> >>> PS: Thanks Chris Fields for your valuable suggestion. >>> >>> >>> Name Fei HU >>> Address Rm. 3D-11, Swearingen Engineering Building, University of South >>> Carolina >>> Email hufeiyc at gmail.com >>> >>> Why you are interested in the project you are proposing and are >> well-suited >>> to undertake it. >>> I like to use Perl to organize and automate the pipeline, starting from >>> extracting data, run various packages and analysis results. And I would >> like >>> more people to know its virtue and make use of it. Bio-Perl provides us a >>> perfect platform. >>> My current research is about gene order phylogeny reconstruction >> following >>> maximum likelihood criteria(others includes MP and NJ based). My >> phylogeny >>> inference pipeline involves using RAxML to build a ML tree and estimating >>> the internal(ancestral) sequence using PAML. While baseml of PAML is >>> well-supported, RAxML is not yet available. 
>>> Although I wrote my own wrap for
>>> RAxML, it's even better for Bio-Perl to wrap RAxML so that everyone can use
>>> easily.
>>> I extensively used and also modified the source to fit RAxML to analysis
>>> gene order data. With a good understanding of Perl and RAxML, what's more,
>>> the willing to make Bio-perl better, I am prepared to undertake it.
>>>
>>> Programs or projects you have previously authored or contributed to
>>> I implemented the algorithm using Perl[1] (open source). And I also use and
>>> learn Perl in daily bases.
>>>
>>> A project plan for the project you are proposing
>>> The wrap should be consistent with the other existing packages supported by
>>> Tools::Run in style and api. I plan for it to fulfill most popular
>>> functionality RAxML currently provide.
>>> 1. Binary Sequence analysis (0-1, binary characters) and Multi-state
>>> Sequence analysis (0-9A-V, 32 characters, available models are: ORDERED, MK,
>>> GTR), this is useful for morphological data.
>>> 2. DNA analysis and Amino Acid analysis, given custom transition matrix (AA
>>> only), rate heterogeneity.
>>> 4. Conduct standard bootstrapping and rapid bootstrapping as well as the
>>> final thorough inference[2] as well as the relative new bootstopping.
>>> 5. Given user starting tree or incomplete constrain tree.
>>> 6. Specify a column weight file name to assign individual weights to each
>>> column of the alignment.
>>> 7. Specify an exclude file name, that contains a specification of alignment
>>> positions you wish to exclude.
>>> 8. Automatically generate random seed for the program.
>>> 9. And more to be added.
>>> Others plan that may benefit user.
>>> 1. Call Bio::SeqIO to parse and reconstruct interleave or sequential phylip
>>> format so that RAxML can read.
>>> 2. Design a set of more understandable commands, such as
>>> use '--model' instead of '-P' to specify a custom model file.
>>> use '--workingdir' instead of '-w' to specify the working directory.
>>> But still one can use the old style according to their own preference.
>>> 3. Implement more sophisticated exception handler and running mode summary.
>>> There is huge combination of arguments that can cause error. For example, to
>>> enable a rapid bootstrapping plus a thorough inference, one needs to give
>>> '-f a' '-x {random seed}' together with the number of replicates '-#
>>> {number}', if anyone is missing, RAxML won't tell at once that these three
>>> are all necessary, instead RAxML usually can only tell the 'nearest' error
>>> it can spot. In my plan if one wants to conduct a RBS plus inference, the
>>> wrap is able to inform user that all those three are necessary and then
>>> guides to correct it. In sum, I plan to dig the errors from source code and
>>> group them in accordance to their functionality. So each error message will
>>> no longer be independent.
>>> Another 'trivial' thoughts is when the running-id already exists, RAxML will
>>> exit directly without choice, this would be disturbing if overwrite is fine,
>>> I suggest to use a switch to define the behavior (overwrite, add a post-fix
>>> to name, exit, skip this run).
>>> 4. Preliminary post-processing can be conducted and afterward returned as a
>>> value or list. Output the maximum likelihood scores for each bootstrapped
>>> tree. Enumerate branches that have confidence value larger than a threshold.
>>> Return a hash table containing branch lengths and running time, final ML
>>> score. More analysis could be done by other package anyway.
>>>
>>> Any obligations, vacations, or plans for the summer that may require
>>> scheduling during the GSoC work period.
>>> No special obligations and vacations.
>>>
>>>
>>> [1] Hu, F., Gao, N. and Tang, J., "Maximum Likelihood Phylogenetic
>>> Reconstruction Using Gene Order Encodings", CIBCB 2011, accepted.
>>> [2] Stamatakis A, Hoover P, Rougemont J: A rapid bootstrap algorithm for the
>>> RAxML web-servers.
>>> Syst. Biol. 2008, 57:758-771.

From lcpaulet at googlemail.com Thu Apr 7 11:37:24 2011
From: lcpaulet at googlemail.com (Lorenzo Carretero)
Date: Thu, 7 Apr 2011 17:37:24 +0200
Subject: [Bioperl-l] Sorting BLAST hits by score
Message-ID:

Hi all,

I'm writing a subroutine to parse BLAST reports. The script:

    my $in = Bio::SearchIO->new(-file => $filename, -format => $format)
        or die "No $filename BLAST file with $format found";
    while ( my $result = $in->next_result ) {
        $result->sort_hits();
        my $query          = $result->query_name();
        my $number_of_hits = $result->num_hits();
        print "Query Name : ", $query, "\n";
        print "Number of Hits : ", $number_of_hits, "\n";
        my $pointer = 0;
        # $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hit = $result->next_hit() ) {
            my $hitname = $hit->name();
            my $evalue  = $hit->significance();
            # $hsp is a Bio::Search::HSP::HSPI compliant object
            while ( my $hsp = $hit->next_hsp ) {
                my $querylen = $hsp->length('query');
                my $hitlen   = $hsp->length('hit');
                my $alnlen   = $hsp->length('total');
                my $identity = $hsp->percent_identity();
                my $value =
                    "$query\t$hitname\t$identity\t$evalue\t$querylen\t$hitlen\t$alnlen\t$number_of_hits\n";
                print "Value Content : $value\n";
            }
        }
        #if($pointer > 0){last;}
    }

is within a sub to get the best non-redundant hit from -m8/9 NCBI format
BLAST reports.

According to the documentation for Bio::Search::Result::ResultI objects, the
sort_hits method "sorts the available hit objects by a user-supplied function.
*Defaults to sort by descending score.*"

However, I get the following error message:

Use of uninitialized value in numeric comparison (<=>) at
/Library/Perl//5.10.0/Bio/Search/Result/ResultI.pm line 202, line 2773.

Furthermore, every iteration through the HSPs returns the warning message:

--------------------- WARNING ---------------------
MSG: Did not define the number of conserved matches in the HSP; assuming
conserved == identical (152)

I can't see where the problem is!

Thanks a lot for your help,
Lorenzo

From hufeiyc at gmail.com Thu Apr 7 12:23:46 2011
From: hufeiyc at gmail.com (Fei Hu)
Date: Thu, 7 Apr 2011 12:23:46 -0400
Subject: [Bioperl-l] Summer of Code Proposal
In-Reply-To: <9D0D3468-EDAF-49B8-9239-FA33A3005A28@illinois.edu>
References: <731EE04A-8A33-43E6-A2E5-CDA644111227@sbc.su.se> <9D0D3468-EDAF-49B8-9239-FA33A3005A28@illinois.edu>
Message-ID:

Hi Chris

I added a lot to the proposal according to your valuable comments, including
a timeline, benefits, the Bio::Seq and SeqIO concerns, and how it can work
with other objects.

Thank you!

Best
Fei

From David.Messina at sbc.su.se Thu Apr 7 12:34:38 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 7 Apr 2011 18:34:38 +0200
Subject: [Bioperl-l] Sorting BLAST hits by score
In-Reply-To:
References:
Message-ID:

Hi Lorenzo,

> However, I get the following error message:
> Use of uninitialized value in numeric comparison (<=>) at
> /Library/Perl//5.10.0/Bio/Search/Result/ResultI.pm line 202, line
> 2773.

Hmm, I'm not seeing this error message. Could you provide us with your blast
-m8 output so I can test with the same data?

> Furthermore, every iteration through the hsps returns the warning message:
>
> --------------------- WARNING ---------------------
> MSG: Did not define the number of conserved matches in the HSP; assuming
> conserved == identical (152)

This is due to the fact that the blast -m8 tabular output doesn't report
conserved matches.

You should be able to suppress these warnings by adding

-verbose => -1

to your Bio::SearchIO->new parameters, but it isn't working for me for some
reason. Hmm. Anyway, as it's just a warning, it won't affect your results.

Dave

From lcpaulet at googlemail.com Thu Apr 7 13:00:33 2011
From: lcpaulet at googlemail.com (Lorenzo Carretero)
Date: Thu, 7 Apr 2011 19:00:33 +0200
Subject: [Bioperl-l] Sorting BLAST hits by score
In-Reply-To:
References:
Message-ID:

Hi,

My blast -m 8 output looks like this:

# BLASTP 2.2.18 [Mar-02-2008]
# Query: AcoGoldSmith_v1.022378m|PACid:18139816
# Database: Acoerulea_151_cds.pep.blastable
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
AcoGoldSmith_v1.022378m|PACid:18139816  AcoGoldSmith_v1.022378m|PACid:18139816  100.00  772  0    0   1   772  1    772  0.0     1542
AcoGoldSmith_v1.022378m|PACid:18139816  AcoGoldSmith_v1.024532m|PACid:18155258  35.15   717  422  9   46  761  8    682  9e-116  414
AcoGoldSmith_v1.022378m|PACid:18139816  AcoGoldSmith_v1.024532m|PACid:18155258  25.34   667  420  15  6   647  74   687  7e-53   205
AcoGoldSmith_v1.022378m|PACid:18139816  AcoGoldSmith_v1.024532m|PACid:18155258  29.63   378  251  8   5   382  174  536  2e-41   167
AcoGoldSmith_v1.022378m|PACid:18139816  AcoGoldSmith_v1.024532m|PACid:18155258  33.70   276  172  6   4   279  274  538  2e-35   147

Thanks for the reply,
Lorenzo

From cjfields at illinois.edu Thu Apr 7 13:16:29 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Apr 2011 12:16:29 -0500
Subject: [Bioperl-l] Sorting BLAST hits by score
In-Reply-To:
References:
Message-ID:

Lorenzo,

Mind filing this as a bug and attaching the script and BLAST report to that?
Sometimes diagnosing this is easier that way.

http://bioperl.org/wiki/Bugs

chris

From fpf at biochem.mpg.de Thu Apr 7 11:35:38 2011
From: fpf at biochem.mpg.de (fpf at biochem.mpg.de)
Date: Thu, 07 Apr 2011 17:35:38 +0200
Subject: [Bioperl-l] blast results: how to get forward/reverse strand info?
Message-ID: <20110407173538.yw3lb69c0wcowgo4@webmail.biochem.mpg.de>

Dear BioPerl specialists,

At the end is part of a blast result which I want to parse:

Blast version: BLASTN 2.2.24 [Aug-08-2010]

BioPerl call:

    use Bio::SearchIO ;
    ...
    my $stringfh = new IO::String ($blaststring);
    my $in = new Bio::SearchIO (-format => 'blast', -fh => $stringfh)
        or die "parsing blast output string failed";

I need to know if my hit is forward (Strand = Plus / Plus) or reverse
(Strand = Plus / Minus).

As I did not see how this information is returned, I checked if hit_end is >
or < hit_start. However, the hit_start and hit_end coordinates are given "(in
original hit sequence coords)" as stated in e.g.
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Search/HSP/GenericHSP.pm

In the example below,
hit_start is 24885
hit_end is 25902

How to find out if the hit is to the forward or reverse strand?
I tried "->strand" but without success (always returns 1)

Thanks for your help

Friedhelm Pfeiffer
fpf at biochem.mpg.de

here is blaststring (central alignment section removed)

> Hfvol_pHV3 haloVolc1_dna pHV3
          Length = 437906

 Score = 1905 bits (961), Expect = 0.0
 Identities = 1009/1021 (98%), Gaps = 3/1021 (0%)
 Strand = Plus / Minus

Query: 27    cggcgatgccgaggatttcggaccgccggacgcgcagggaaacgccgtcgacggcgcgga 86
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 25902 cggcgatgccgaggatttcggaccgccggacgcgcagggaaacgccgtcgacggcgcgga 25843

...

Query: 986   tcaggtcccgggccctggaattcgatgctggccggattcaacggaacctttgctggggag 1045
             ||||||||| ||||||||||||||||||| ||||||||| ||||| || |||||||||||
Sbjct: 24942 tcaggtccc-ggccctggaattcgatgct-gccggattcgacggagccgttgctggggag 24885

From torsten.seemann at monash.edu Thu Apr 7 21:21:51 2011
From: torsten.seemann at monash.edu (Torsten Seemann)
Date: Fri, 8 Apr 2011 11:21:51 +1000
Subject: [Bioperl-l] Who is going to BOSC 2011 / Codefest 2011 ?
Message-ID:

Hi all,

I've been a bit quiet on the Bioperl front the last couple of years, but I'm
still a regular user, and am getting into Gbrowse2 too now.

I was thinking of coming to ISMB / BOSC / Codefest in Vienna this year
(assuming I can get funded) and was wondering if anyone on this list was
coming?

I only see 4 people on the Wiki attendee list for Codefest:
http://www.open-bio.org/wiki/Codefest_2011

I would love to get back into Bioperl but will probably need a bit of
guidance since the git/svn migration, and the amount of restructuring that
has gone on under Chris Fields' (excellent) guidance.

--
--Dr Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
--http://www.bioinformatics.net.au/

From hlapp at drycafe.net Fri Apr 8 01:41:50 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 8 Apr 2011 01:41:50 -0400
Subject: [Bioperl-l] Who is going to BOSC 2011 / Codefest 2011 ?
In-Reply-To:
References:
Message-ID:

On Apr 7, 2011, at 9:21 PM, Torsten Seemann wrote:

> I only see 4 people on the Wiki attendee list for Codefest:

5 now.

> I would love to get back into Bioperl but will probably need a bit
> of guidance since the git/svn migration, and the amount of
> restructuring that has gone on under Chris Fields' (excellent)
> guidance.

There should be enough to provide that.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From cjfields at illinois.edu Fri Apr 8 07:24:53 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 8 Apr 2011 06:24:53 -0500
Subject: [Bioperl-l] Who is going to BOSC 2011 / Codefest 2011 ?
In-Reply-To:
References:
Message-ID: <93B22CD3-8716-4BD0-ADA9-A5EEF9A1305C@illinois.edu>

Nice to hear from you Torsten! Yes, lots to catch up on, even more to do...

I likely won't make this year's BOSC, but I would guess Heikki and Hilmar
might. Dave Messina is also in Europe and might make it. Not sure about
others.

chris

From scott at scottcain.net Fri Apr 8 11:41:39 2011
From: scott at scottcain.net (Scott Cain)
Date: Fri, 8 Apr 2011 11:41:39 -0400
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
Message-ID:

Hi Lincoln,

I've been looking into some test failures with the postgres adaptor
for Bio::DB::GFF and I wanted to check with you that I'm interpreting
this correctly. In t/LocalDB/BioDBGFF.t there are these lines:

  @features = sort {$a->start<=>$b->start} @features;

  is($features[0]->type,'Component:reference');
  is($features[-1]->type,'exon:confirmed');

So the features in the data set are sorted by their start values
and the beginning and end of the list are checked. The test refers to
the test.gff data file, which contains among others these lines:

Contig1  confirmed  transcript  30001  31000  .  -  .  Transcript trans-2; Gene "xyz-2"; Note "Terribly interesting"
Contig1  confirmed  exon        30001  30100  .  -  .  Transcript trans-2; Gene "abc-1"; Note "function unknown"
Contig1  confirmed  exon        30701  30800  .  -  .  Transcript trans-2
Contig1  confirmed  exon        30801  31000  .  -  .  Transcript trans-2

Since this transcript and its exons are on the minus strand, the
values that the start and stop methods return will be reversed, so that
start for the transcript will be 31000 and stop will be 30001. The
problem with this test is that since the last exon and the transcript share
a start value (31000), you can't really be sure which one will be at
the bottom of the list after sorting, right? In the case of the
postgres adaptor, it fails this test on my machine because the
transcript is at the bottom of the list.
The test for the beginning
of the list similarly could fail though it didn't in my case, as other
features that have 1 as a start are of type "Component:clone".

So, my question is this: am I missing something, and the postgres
adaptor is not behaving as expected, or are these tests ambiguous?

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From lincoln.stein at gmail.com Fri Apr 8 13:10:31 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 8 Apr 2011 13:10:31 -0400
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
In-Reply-To:
References:
Message-ID:

Do start() and end() flip values for minus strand features? This isn't
supposed to happen.

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From scott at scottcain.net Fri Apr 8 13:18:33 2011
From: scott at scottcain.net (Scott Cain)
Date: Fri, 8 Apr 2011 13:18:33 -0400
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
In-Reply-To:
References:
Message-ID:

Hi Lincoln,

Yes, apparently, it does. It does this for both the memory and the
postgres adaptors. I looked at how the data was stored in the feature
object with Data::Dumper and that is how it is represented in the hash
too. Perhaps this test should be calling the "absolute" method first?

Scott

From lincoln.stein at gmail.com Fri Apr 8 13:51:55 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 8 Apr 2011 13:51:55 -0400
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
In-Reply-To:
References:
Message-ID:

Oh right. The Bio::DB::GFF adaptor has that broken behavior and it is too
late to change it now. (Bio::DB::SeqFeature::Store had better not!) Best to
remove the test altogether.

Lincoln

On Fri, Apr 8, 2011 at 1:18 PM, Scott Cain wrote:

> Hi Lincoln,
>
> Yes, apparently, it does. It does this for both the memory and the
> postgres adaptors. I looked at how the data was stored in the feature
> object with Data::Dumper and that is how it is represented in the hash
> too. Perhaps this test should be calling the "absolute" method first?
>
> Scott
>
>
> On Fri, Apr 8, 2011 at 1:10 PM, Lincoln Stein wrote:
> > Do start() and end() flip values for minus strand features? This isn't
> > supposed to happen.
> > Lincoln > > > > On Fri, Apr 8, 2011 at 11:41 AM, Scott Cain wrote: > >> > >> Hi Lincoln, > >> > >> I've been looking into some test failures with the postgres adaptor > >> for Bio::DB::GFF and I wanted to check with you that I'm interpreting > >> this correctly. In t/LocalDB/BioDBGFF.t there are these lines: > >> > >> @features = sort {$a->start<=>$b->start} @features; > >> > >> is($features[0]->type,'Component:reference'); > >> is($features[-1]->type,'exon:confirmed'); > >> > >> So that the features in the data set are sorted by their start values > >> and the beginning and end of the list are checked. The test refers to > >> the test.gff data file, that contains among others these lines: > >> > >> Contig1 confirmed transcript 30001 31000 . - . > >> Transcript trans-2; Gene "xyz-2"; Note "Terribly interesting" > >> Contig1 confirmed exon 30001 30100 . - . Transcript > >> trans-2; Gene "abc-1"; Note "function unknown" > >> Contig1 confirmed exon 30701 30800 . - . Transcript > trans-2 > >> Contig1 confirmed exon 30801 31000 . - . Transcript > trans-2 > >> > >> Since this transcript and its exons are on the minus strand, the > >> values that the start and stop method return will be reversed, so that > >> start for the transcript will be 31000 and stop will be 30001. The > >> problem with this test is since the last exon and the transcript share > >> a start value (31000), you can't really be sure which one will be at > >> the bottom of the list after sorting, right? In the case of the > >> postgres adaptor, it fails this test on my machine because the > >> transcript is at the bottom of the list. The test for the beginning > >> of the list similarly could fail though it didn't in my case, as other > >> features that have 1 as a start are of type "Component:clone". > >> > >> So, my question is this: am I missing something, and the postgres > >> adaptor is not behaving as expected, or are these tests ambiguous? 
> >> > >> Thanks, > >> Scott > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. scott at scottcain > >> dot net > >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> Ontario Institute for Cancer Research > > > > > > > > -- > > Lincoln D. Stein > > Director, Informatics and Biocomputing Platform > > Ontario Institute for Cancer Research > > 101 College St., Suite 800 > > Toronto, ON, Canada M5G0A3 > > 416 673-8514 > > Assistant: Renata Musa > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From scott at scottcain.net Fri Apr 8 13:58:29 2011 From: scott at scottcain.net (Scott Cain) Date: Fri, 8 Apr 2011 13:58:29 -0400 Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures In-Reply-To: References: Message-ID: OK, I'll take it out and move on to the next problem. Thanks, Scott On Fri, Apr 8, 2011 at 1:51 PM, Lincoln Stein wrote: > Oh right. The Bio::DB::GFF adaptor has that broken behavior and it is too > late to change it now. (Bio::DB::SeqFeature::Store had better not!) Best to > remove the test altogether. > Lincoln > > On Fri, Apr 8, 2011 at 1:18 PM, Scott Cain wrote: >> >> Hi Lincoln, >> >> Yes, apparently, it does. ?It does this for both the memory and the >> postgres adaptors. ?I looked at how the data was stored in the feature >> object with Data::Dumper and that is how it is represented in the hash >> too. ?Perhaps this test should be calling the "absolute" method first? >> >> Scott >> >> >> On Fri, Apr 8, 2011 at 1:10 PM, Lincoln Stein >> wrote: >> > Do start() and end() flip values for minus strand features? 
This isn't >> > supposed to happen. >> > Lincoln >> > >> > On Fri, Apr 8, 2011 at 11:41 AM, Scott Cain wrote: >> >> >> >> Hi Lincoln, >> >> >> >> I've been looking into some test failures with the postgres adaptor >> >> for Bio::DB::GFF and I wanted to check with you that I'm interpreting >> >> this correctly. ?In t/LocalDB/BioDBGFF.t there are these lines: >> >> >> >> ?@features = sort {$a->start<=>$b->start} @features; >> >> >> >> ?is($features[0]->type,'Component:reference'); >> >> ?is($features[-1]->type,'exon:confirmed'); >> >> >> >> So that the features in the data set are sorted by their start values >> >> and the beginning and end of the list are checked. ?The test refers to >> >> the test.gff data file, that contains among others these lines: >> >> >> >> Contig1 confirmed ? transcript ? ? ?30001 ? 31000 ? . ? - ? . >> >> Transcript trans-2; Gene "xyz-2"; Note "Terribly interesting" >> >> Contig1 confirmed ? exon ? ?30001 ? 30100 ? . ? - ? . ? Transcript >> >> trans-2; Gene "abc-1"; Note "function unknown" >> >> Contig1 confirmed ? exon ? ?30701 ? 30800 ? . ? - ? . ? Transcript >> >> trans-2 >> >> Contig1 confirmed ? exon ? ?30801 ? 31000 ? . ? - ? . ? Transcript >> >> trans-2 >> >> >> >> Since this transcript and its exons are on the minus strand, the >> >> values that the start and stop method return will be reversed, so that >> >> start for the transcript will be 31000 and stop will be 30001. ?The >> >> problem with this test is since the last exon and the transcript share >> >> a start value (31000), you can't really be sure which one will be at >> >> the bottom of the list after sorting, right? ?In the case of the >> >> postgres adaptor, it fails this test on my machine because the >> >> transcript is at the bottom of the list. ?The test for the beginning >> >> of the list similarly could fail though it didn't in my case, as other >> >> features that have 1 as a start are of type "Component:clone". 
>> >> >> >> So, my question is this: am I missing something, and the postgres >> >> adaptor is not behaving as expected, or are these tests ambiguous? >> >> >> >> Thanks, >> >> Scott >> >> >> >> >> >> -- >> >> >> >> ------------------------------------------------------------------------ >> >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >> >> dot net >> >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >> >> Ontario Institute for Cancer Research >> > >> > >> > >> > -- >> > Lincoln D. Stein >> > Director, Informatics and Biocomputing Platform >> > Ontario Institute for Cancer Research >> > 101 College St., Suite 800 >> > Toronto, ON, Canada M5G0A3 >> > 416 673-8514 >> > Assistant: Renata Musa >> > >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >> Ontario Institute for Cancer Research > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 Ontario Institute for Cancer Research From David.Messina at sbc.su.se Fri Apr 8 14:02:16 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 8 Apr 2011 20:02:16 +0200 Subject: [Bioperl-l] blast results: how to get forward/reverse strand info? In-Reply-To: <20110407173538.yw3lb69c0wcowgo4@webmail.biochem.mpg.de> References: <20110407173538.yw3lb69c0wcowgo4@webmail.biochem.mpg.de> Message-ID: Hi Friedhelm, Ah, yes, I've been bitten by that, too. 
You need to specify whether you want the strand for query or hit, a la:

$hsp->strand('hit');

Here are the docs for that method, from Bio::Search::HSP::HSPI:

 Title   : strand
 Usage   : $hsp->strand('query')
 Function: Retrieves the strand for the HSP component requested
 Returns : +1 or -1 (0 if unknown)
 Args    : 'hit' or 'subject' or 'sbjct' to retrieve the strand of the subject
           'query' to retrieve the query strand (default)
           'list' or 'array' to retrieve both query and hit together

Dave

On Thu, Apr 7, 2011 at 17:35, wrote:
> Dear BioPerl specialists,
>
> At the end is part of a blast result which I want to parse:
>
> Blast version: BLASTN 2.2.24 [Aug-08-2010]
>
> BioPerl call:
> use Bio::SearchIO ;
> ...
> my $stringfh = new IO::String ($blaststring);
> my $in = new Bio::SearchIO (-format => 'blast', -fh => $stringfh)
> or die "parsing blast output string failed";
>
> I need to know if my hit is forward (Strand = Plus / Plus)
> or reverse ( Strand = Plus / Minus)
>
> As I did not see how this information is returned, I checked if
> hit_end is > or < hit_start. However, the hit_start and hit_end coordinates
> are given "(in original hit sequence coords)" as stated in e.g.
> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Search/HSP/GenericHSP.pm
>
> In the example below,
> hit_start is 24885
> hit_end is 25902
>
> How to find out if the hit is to the forward or reverse strand?
> I tried "->strand" but without success (always returns 1)
>
> Thanks for your help
>
> Friedhelm Pfeiffer
> fpf at biochem.mpg.de
>
>
> here is blaststring (central alignment section removed)
>
> Hfvol_pHV3 haloVolc1_dna pHV3 >>
> Length = 437906
>
> Score = 1905 bits (961), Expect = 0.0
> Identities = 1009/1021 (98%), Gaps = 3/1021 (0%)
> Strand = Plus / Minus
>
> Query: 27    cggcgatgccgaggatttcggaccgccggacgcgcagggaaacgccgtcgacggcgcgga 86
>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 25902 cggcgatgccgaggatttcggaccgccggacgcgcagggaaacgccgtcgacggcgcgga 25843
>
> ...
>
> Query: 986   tcaggtcccgggccctggaattcgatgctggccggattcaacggaacctttgctggggag 1045
>              ||||||||| ||||||||||||||||||| ||||||||| ||||| || |||||||||||
> Sbjct: 24942 tcaggtccc-ggccctggaattcgatgct-gccggattcgacggagccgttgctggggag 24885

From cjfields at illinois.edu Fri Apr 8 14:56:40 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 8 Apr 2011 13:56:40 -0500
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
In-Reply-To:
References:
Message-ID:

Scott,

I'll try documenting the Pg error for SF::Store in the next hour. Had my
hands full with the GSoC onslaught of emails and local $job stuff. Would
like to get it fixed for the CPAN release.

chris

On Apr 8, 2011, at 12:58 PM, Scott Cain wrote:

> OK, I'll take it out and move on to the next problem.
>
> Thanks,
> Scott

From scott at scottcain.net Fri Apr 8 17:32:50 2011
From: scott at scottcain.net (Scott Cain)
Date: Fri, 8 Apr 2011 17:32:50 -0400
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
In-Reply-To:
References:
Message-ID:

Argh! MySQL is not an RDBMS! Anyone who tells you otherwise is lying!
The first test failure for the Pg SFS adaptor occurs because it is
trying to execute this query (which it inherited from the mysql
adaptor, where it works just fine):

select id,object FROM bioperl_seqfeature_t_test_schema_feature where
id='doesnotexit';

Of course, the id column is defined as an integer column. MySQL must
be silently casting this string to an integer value (I guess,
anyway; who knows). Anyway, PostgreSQL does the right thing and
throws an error with this query. I don't see how I can make the
Postgres adaptor pass this test as written, as it is nonsensical.

Scott

On Fri, Apr 8, 2011 at 2:56 PM, Chris Fields wrote:
> Scott,
>
> I'll try documenting the Pg error for SF::Store in the next hour. Had my
> hands full with the GSoC onslaught of emails and local $job stuff. Would
> like to get it fixed for the CPAN release.
>
> chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                         scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)        216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Fri Apr 8 20:53:59 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 8 Apr 2011 19:53:59 -0500
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
In-Reply-To:
References:
Message-ID:

That was introduced here by Florent:

https://github.com/bioperl/bioperl-live/commit/6f65223ef5aabc3ceaa815d3cb71982f81ae6b30#t/LocalDB/SeqFeature.t

So, essentially the MySQL adaptor is getting this wrong. Any way we can
somehow enable strict mode?

http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html

chris

On Apr 8, 2011, at 4:32 PM, Scott Cain wrote:

> Argh! MySQL is not an RDBMS! Anyone who tells you otherwise is lying!
>
> The first test failure for the Pg SFS adaptor is failing because it is
> trying to execute this query (which it inherited from the mysql
> adaptor, where it works just fine):
>
> select id,object FROM bioperl_seqfeature_t_test_schema_feature where
> id='doesnotexit';
>
> Of course, the id column is defined as an integer column. MySQL must
> be silently casting this string to an integer value (I guess,
> anyway; who knows). Anyway, PostgreSQL does the right thing and
> throws an error with this query. I don't see how I can make the
> Postgres adaptor pass this test as written, as it is nonsensical.
>
> Scott

From wadim_kapulkin at yahoo.co.uk Fri Apr 8 23:32:04 2011
From: wadim_kapulkin at yahoo.co.uk (wadim kapulkin)
Date: Sat, 9 Apr 2011 04:32:04 +0100 (BST)
Subject: [Bioperl-l] Q: batched extraction of sub-sequences and their reverse-complements ?
Message-ID: <710602.90088.qm@web28506.mail.ukl.yahoo.com>

Hi There,

Let me ask the following: I would like to extract a batch of subsequences
(as FASTA), based on a list of coordinates (i.e. 1-1000, 1001-2000,
2001-3000, etc.), from a given 'large sequence' (i.e. chromosome-sized,
>10MB). Then, ideally, I would like to know how to extract the converse
set, i.e. the 'same' (I mean corresponding) batch of sequences, based on a
list of converse coordinates, from the reverse-complement of the given
'large sequence'.

Would there be an 'easy' way of doing that?
Thanks in advance & apologies for asking trivia -

Wadim

From David.Messina at sbc.su.se Sat Apr 9 04:47:34 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Apr 2011 10:47:34 +0200
Subject: [Bioperl-l] Q: batched extraction of sub-sequences and their reverse-complements ?
In-Reply-To: <710602.90088.qm@web28506.mail.ukl.yahoo.com>
References: <710602.90088.qm@web28506.mail.ukl.yahoo.com>
Message-ID:

Hi Wadim,

> I would like to extract the batch of subsequences (as fastas), based on
> list of coordinates : i.e. 1-1000, 1001-2000 , 2001-3000 etc) from given
> 'large sequence' (i.e. chromosome sized >10MB)

Take a look at Bio::DB::Fasta.

> and then, ideally , I would be keen to know how to
> extract the converse set - [i.e.: extract 'same' ( I mean corresponding)
> batch of sequences, based on list of converse coordinates from
> reverse-complement of given 'large sequence'].

I don't totally understand this part of your question, but this may help:

http://www.bioperl.org/wiki/BioPerl_Tutorial#Converting_coordinate_systems_.28Coordinate::Pair.2C_RelSegment.29

Dave

From lincoln.stein at gmail.com Sun Apr 10 17:10:24 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Sun, 10 Apr 2011 17:10:24 -0400
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
In-Reply-To:
References:
Message-ID:

Hi Folks,

Is this what's blocking the next bioperl release, or are there other
blockers? I just released GBrowse 2.27, which requires bioperl 1.0069 or
higher.

Lincoln

On Fri, Apr 8, 2011 at 8:53 PM, Chris Fields wrote:
> That was introduced here by Florent:
>
>
> https://github.com/bioperl/bioperl-live/commit/6f65223ef5aabc3ceaa815d3cb71982f81ae6b30#t/LocalDB/SeqFeature.t
>
> So, essentially the MySQL adaptor is getting this wrong. Any way we can
> somehow enable strict mode?
>
> http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html
>
> chris
>
> On Apr 8, 2011, at 4:32 PM, Scott Cain wrote:
>
> > Argh! MySQL is not a RDMS!
> > Anyone who tells you otherwise is lying!
> >
> > The first test failure for the Pg SFS adaptor is failing because it is
> > trying to execute this query (which it inherited from the mysql
> > adaptor, where it works just fine):
> >
> > select id,object FROM bioperl_seqfeature_t_test_schema_feature where
> > id='doesnotexit';
> >
> > Of course, the id column is defined as an integer column. MySQL must
> > be silently casting this string to an integer value (? I guess
> > anyway--who knows). Anyway, PostgreSQL does the right thing and
> > throws an error with this query. I don't see how I can make the
> > Postgres adaptor pass this test as written, as it is nonsensical.
> >
> > Scott
> >
> > On Fri, Apr 8, 2011 at 2:56 PM, Chris Fields wrote:
> >> Scott,
> >>
> >> I'll try documenting the Pg error for SF::Store in the next hour. Had
> >> my hands full with the GSoC onslaught of emails and local $job stuff.
> >> Would like to get it fixed for the CPAN release.
> >>
> >> chris
> >>
> >> On Apr 8, 2011, at 12:58 PM, Scott Cain wrote:
> >>
> >>> OK, I'll take it out and move on to the next problem.
> >>>
> >>> Thanks,
> >>> Scott
> >>>
> >>> On Fri, Apr 8, 2011 at 1:51 PM, Lincoln Stein wrote:
> >>>> Oh right. The Bio::DB::GFF adaptor has that broken behavior and it is
> >>>> too late to change it now. (Bio::DB::SeqFeature::Store had better
> >>>> not!) Best to remove the test altogether.
> >>>> Lincoln
> >>>>
> >>>> On Fri, Apr 8, 2011 at 1:18 PM, Scott Cain wrote:
> >>>>>
> >>>>> Hi Lincoln,
> >>>>>
> >>>>> Yes, apparently, it does. It does this for both the memory and the
> >>>>> postgres adaptors. I looked at how the data was stored in the
> >>>>> feature object with Data::Dumper and that is how it is represented
> >>>>> in the hash too. Perhaps this test should be calling the "absolute"
> >>>>> method first?
> >>>>>
> >>>>> Scott
> >>>>>
> >>>>> On Fri, Apr 8, 2011 at 1:10 PM, Lincoln Stein
> >>>>> <lincoln.stein at gmail.com> wrote:
> >>>>>> Do start() and end() flip values for minus strand features? This
> >>>>>> isn't supposed to happen.
> >>>>>> Lincoln
> >>>>>>
> >>>>>> On Fri, Apr 8, 2011 at 11:41 AM, Scott Cain wrote:
> >>>>>>>
> >>>>>>> Hi Lincoln,
> >>>>>>>
> >>>>>>> I've been looking into some test failures with the postgres adaptor
> >>>>>>> for Bio::DB::GFF and I wanted to check with you that I'm
> >>>>>>> interpreting this correctly. In t/LocalDB/BioDBGFF.t there are
> >>>>>>> these lines:
> >>>>>>>
> >>>>>>> @features = sort {$a->start<=>$b->start} @features;
> >>>>>>>
> >>>>>>> is($features[0]->type,'Component:reference');
> >>>>>>> is($features[-1]->type,'exon:confirmed');
> >>>>>>>
> >>>>>>> So that the features in the data set are sorted by their start
> >>>>>>> values and the beginning and end of the list are checked. The test
> >>>>>>> refers to the test.gff data file, that contains among others these
> >>>>>>> lines:
> >>>>>>>
> >>>>>>> Contig1 confirmed transcript 30001 31000 . - . Transcript trans-2; Gene "xyz-2"; Note "Terribly interesting"
> >>>>>>> Contig1 confirmed exon 30001 30100 . - . Transcript trans-2; Gene "abc-1"; Note "function unknown"
> >>>>>>> Contig1 confirmed exon 30701 30800 . - . Transcript trans-2
> >>>>>>> Contig1 confirmed exon 30801 31000 . - . Transcript trans-2
> >>>>>>>
> >>>>>>> Since this transcript and its exons are on the minus strand, the
> >>>>>>> values that the start and stop method return will be reversed, so
> >>>>>>> that start for the transcript will be 31000 and stop will be 30001.
> >>>>>>> The problem with this test is since the last exon and the
> >>>>>>> transcript share a start value (31000), you can't really be sure
> >>>>>>> which one will be at the bottom of the list after sorting, right?
> >>>>>>> In the case of the postgres adaptor, it fails this test on my
> >>>>>>> machine because the transcript is at the bottom of the list. The
> >>>>>>> test for the beginning of the list similarly could fail though it
> >>>>>>> didn't in my case, as other features that have 1 as a start are of
> >>>>>>> type "Component:clone".
> >>>>>>>
> >>>>>>> So, my question is this: am I missing something, and the postgres
> >>>>>>> adaptor is not behaving as expected, or are these tests ambiguous?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Scott
> >>>>>>>
> >>>>>>> --
> >>>>>>> ------------------------------------------------------------------------
> >>>>>>> Scott Cain, Ph. D. scott at scottcain dot net
> >>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
> >>>>>>> Ontario Institute for Cancer Research
> >>>>>>
> >>>>>> --
> >>>>>> Lincoln D. Stein
> >>>>>> Director, Informatics and Biocomputing Platform
> >>>>>> Ontario Institute for Cancer Research
> >>>>>> 101 College St., Suite 800
> >>>>>> Toronto, ON, Canada M5G0A3
> >>>>>> 416 673-8514
> >>>>>> Assistant: Renata Musa
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> ------------------------------------------------------------------------
> >>>>> Scott Cain, Ph. D. scott at scottcain dot net
> >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
> >>>>> Ontario Institute for Cancer Research
> >>>>
> >>>> --
> >>>> Lincoln D. Stein
> >>>> Director, Informatics and Biocomputing Platform
> >>>> Ontario Institute for Cancer Research
> >>>> 101 College St., Suite 800
> >>>> Toronto, ON, Canada M5G0A3
> >>>> 416 673-8514
> >>>> Assistant: Renata Musa
> >>>>
> >>>
> >>> --
> >>> ------------------------------------------------------------------------
> >>> Scott Cain, Ph. D. scott at scottcain dot net
> >>> GMOD Coordinator (http://gmod.org/) 216-392-3087
> >>> Ontario Institute for Cancer Research
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D. scott at scottcain dot net
> > GMOD Coordinator (http://gmod.org/) 216-392-3087
> > Ontario Institute for Cancer Research
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From avilella at gmail.com Mon Apr 11 05:53:00 2011
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 11 Apr 2011 10:53:00 +0100
Subject: [Bioperl-l] bioperl vs other bio*
Message-ID:

Hi,

This may have been asked before but, what is the current set of
features in bioperl compared to biopython, biojava and other bio*
projects?

Has anyone listed the features present in bioperl not present in the
other bio* projects? And what I specially would like to know, what are
the missing features in bioperl that other bio* projects have been
quicker at implementing?

Cheers,

Albert.

From miguel.pignatelli at uv.es Mon Apr 11 06:30:29 2011
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Mon, 11 Apr 2011 11:30:29 +0100
Subject: [Bioperl-l] bioperl vs other bio*
In-Reply-To:
References:
Message-ID: <4DA2D845.4060606@uv.es>

Hi Albert,

I don't know if there is a comprehensive list of "missing" features in
bioperl, but I have come across one of these some months ago.
It seems that bioperl lacks IO support for SFF format files while, at
least, biopython has it (Bio.SeqIO.SffIO). I started to write a module
for this, but I haven't found much time to finish it...

Cheers,

M;

On 11/04/11 10:53, Albert Vilella wrote:
> Hi,
>
> This may have been asked before but, what is the current set of
> features in bioperl compared
> to biopython, biojava and other bio* projects?
>
> Has anyone listed the features present in bioperl not present in the
> other bio* projects?
> And what I specially would like to know, what are the missing features
> in bioperl that other
> bio* projects have been quicker at implementing?
>
> Cheers,
>
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From David.Messina at sbc.su.se Mon Apr 11 08:51:22 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 11 Apr 2011 14:51:22 +0200
Subject: [Bioperl-l] Fwd: blast results: how to get forward/reverse strand info?
In-Reply-To: <20110411141511.ws09sqh9sscc0gok@webmail.biochem.mpg.de>
References: <20110407173538.yw3lb69c0wcowgo4@webmail.biochem.mpg.de> <20110411141511.ws09sqh9sscc0gok@webmail.biochem.mpg.de>
Message-ID:

---------- Forwarded message ----------
From:
Date: Mon, Apr 11, 2011 at 14:15
Subject: Re: [Bioperl-l] blast results: how to get forward/reverse strand info?
To: Dave Messina

Dear Dave,

Thanks for your help. This solved the problem.

There is one additional piece of information: $hsp->strand() will return
the value for the query (which, with blastN, is always +1). This is what
I had tried without success.

But as you point out: $hsp->strand('hit') provides the required strand
information.

Thanks again.

Friedhelm

Quoting Dave Messina:

> Hi Friedhelm,
>
> Ah, yes, I've been bitten by that, too.
> You need to specify whether you want the strand for query or hit, a la:
>
> $hsp->strand('hit');
>
> Here are the docs for that method, from Bio::Search::HSP::HSPI
>
>  Title   : strand
>  Usage   : $hsp->strand('query')
>  Function: Retrieves the strand for the HSP component requested
>  Returns : +1 or -1 (0 if unknown)
>  Args    : 'hit' or 'subject' or 'sbjct' to retrieve the strand of the subject
>            'query' to retrieve the query strand (default)
>            'list' or 'array' to retrieve both query and hit together
>
> Dave
>
> On Thu, Apr 7, 2011 at 17:35, wrote:
>
>> Dear BioPerl specialists,
>>
>> At the end is part of a blast result which I want to parse:
>>
>> Blast version: BLASTN 2.2.24 [Aug-08-2010]
>>
>> BioPerl call:
>> use Bio::SearchIO ;
>> ...
>> my $stringfh = new IO::String ($blaststring);
>> my $in = new Bio::SearchIO (-format => 'blast', -fh => $stringfh)
>>   or die "parsing blast output string failed";
>>
>> I need to know if my hit is forward (Strand = Plus / Plus)
>> or reverse (Strand = Plus / Minus)
>>
>> As I did not see how this information is returned, I checked if
>> hit_end is > or < hit_start. However, the hit_start and hit_end
>> coordinates are given "(in original hit sequence coords)" as stated
>> in e.g.
>>
>> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Search/HSP/GenericHSP.pm
>>
>> In the example below,
>> hit_start is 24885
>> hit_end is 25902
>>
>> How to find out if the hit is to the forward or reverse strand?
>> I tried "->strand" but without success (always returns 1)
>>
>> Thanks for your help
>>
>> Friedhelm Pfeiffer
>> fpf at biochem.mpg.de
>>
>> here is blaststring (central alignment section removed)
>>
>> >Hfvol_pHV3 haloVolc1_dna pHV3
>>           Length = 437906
>>
>> Score = 1905 bits (961), Expect = 0.0
>> Identities = 1009/1021 (98%), Gaps = 3/1021 (0%)
>> Strand = Plus / Minus
>>
>> Query: 27    cggcgatgccgaggatttcggaccgccggacgcgcagggaaacgccgtcgacggcgcgga 86
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct: 25902 cggcgatgccgaggatttcggaccgccggacgcgcagggaaacgccgtcgacggcgcgga 25843
>>
>> ...
>>
>> Query: 986   tcaggtcccgggccctggaattcgatgctggccggattcaacggaacctttgctggggag 1045
>>              ||||||||| ||||||||||||||||||| ||||||||| ||||| || |||||||||||
>> Sbjct: 24942 tcaggtccc-ggccctggaattcgatgct-gccggattcgacggagccgttgctggggag 24885
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

From cjfields at illinois.edu Mon Apr 11 09:50:58 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Apr 2011 08:50:58 -0500
Subject: [Bioperl-l] bioperl vs other bio*
In-Reply-To: <4DA2D845.4060606@uv.es>
References: <4DA2D845.4060606@uv.es>
Message-ID:

IIRC there was a push to add SFF parsing via io_lib binding at one
point, but that seems to have fallen apart. Wouldn't be terribly hard to
do; we have some XS code that could be cleaned up to support this (or we
can use BioLib bindings, which did work to some degree).

chris

On Apr 11, 2011, at 5:30 AM, Miguel Pignatelli wrote:

> Hi Albert,
>
> I don't know if there is a comprehensive list of "missing" features in
> bioperl, but I have come across one of these some months ago. It seems
> that bioperl lacks IO support for SFF format files while, at least,
> biopython has it (Bio.SeqIO.SffIO).
> I started to write a module for this, but I haven't found much time to
> finish it...
>
> Cheers,
>
> M;
>
> On 11/04/11 10:53, Albert Vilella wrote:
>> Hi,
>>
>> This may have been asked before but, what is the current set of
>> features in bioperl compared
>> to biopython, biojava and other bio* projects?
>>
>> Has anyone listed the features present in bioperl not present in the
>> other bio* projects?
>> And what I specially would like to know, what are the missing features
>> in bioperl that other
>> bio* projects have been quicker at implementing?
>>
>> Cheers,
>>
>> Albert.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From tiago.hori at gmail.com Sun Apr 10 18:56:43 2011
From: tiago.hori at gmail.com (T.Hori)
Date: Sun, 10 Apr 2011 15:56:43 -0700 (PDT)
Subject: [Bioperl-l] Output of a BLAST parse to text file
Message-ID: <906a78cf-59eb-40d5-9c5e-b277c49dd0d4@w21g2000yqm.googlegroups.com>

Hi Guys,

I am really new to BioPerl, so this may be a stupid question. Let's
say I use the example from BioPerl.org:

use strict;
use Bio::SearchIO;
my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => 'report.bls');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      if( $hsp->length('total') > 50 ) {
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=", $result->query_name,
            " Hit=", $hit->name,
            " Length=", $hsp->length('total'),
            " Percent_id=", $hsp->percent_identity, "\n";
        }
      }
    }
  }
}

That gives me several hits as results on the standard output.
I am starting to learn BioPerl for what I think is a daunting task. I
have a collection of 20K ESTs and I have to find the best human hit for
every one of those sequences. So I am starting by learning how to parse
BLAST results. I was wondering how I would go about having the output
of the parse go to a tab-delimited text file instead of the standard
output.

Any help would be greatly appreciated.

Thanks,

Tiago

From bubli_thakur at rediffmail.com Mon Apr 11 03:16:10 2011
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: 11 Apr 2011 07:16:10 -0000
Subject: [Bioperl-l] Accessing nucleotide sequence
Message-ID: <20110411071610.58798.qmail@f4mail-235-230.rediffmail.com>

Hi everyone,

I am quite new to BioPerl. I am using the code below to access the CDS
from GenBank for a particular GI number:

package Bio::Perl;
use Bio::Perl;
use Bio::Factory::FTLocationFactory;
use Bio::DB::GenPept;
use Bio::DB::GenBank;

my $gp = Bio::DB::GenPept->new();
my $gb = Bio::DB::GenBank->new();

# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;

my $prot_obj = $gp->get_Seq_by_id('111219521');
foreach my $feat ( $prot_obj->top_SeqFeatures ) {
  if ( $feat->primary_tag eq 'CDS' ) {
    # example: 'coded_by="U05729.1:1..122"'
    my @coded_by = $feat->each_tag_value('coded_by');
    my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
    my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
    # create Bio::Location object from a string
    my $loc_object = $loc_factory->from_string($loc_str);
    # create a Feature object by using a Location
    my $feat_obj = Bio::SeqFeature::Generic->new(-location =>$loc_object);
    # associate the Feature object with the nucleotide Seq object
    $nuc_obj->add_SeqFeature($feat_obj);
    my $cds_obj = $feat_obj->spliced_seq;
    my $seq1 = $cds_obj->seq;
    #print "CDS sequence is ",$cds_obj->seq,"\n";
    $seq2 = new_sequence("$seq1","111219521");
    write_sequence (">hum1",'fasta',$seq2);
  }
}

----------------------------------------

I have tried to modify the code to get a number of CDS according to a
list of NCBI GI numbers provided to it. I have used the
get_Stream_by_id method but it does not work. Can anybody please help me
to modify the code, or if anybody has code to get multiple CDS according
to a list of GI numbers, please provide me the code.

Regards,
Subarna Thakur

From cjfields at illinois.edu Mon Apr 11 10:15:32 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Apr 2011 09:15:32 -0500
Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures
In-Reply-To:
References:
Message-ID:

Lincoln,

It's Bio::Assembly-specific incorrect semantics that are triggering
this; I don't think anything in the SF::Store/GBrowse set triggers the
problem. So, to me this isn't a blocker for anything unless someone
specifies that Bio::Assembly must use the Pg adaptor (I think it uses
the memory one by default, not sure if it's hard-coded that way).

I do agree with Scott, that MySQL and other adaptors are silently
dealing with this data w/o dying, so this should be filed for tracking,
as I'm sure it will pop up at some point again. The relevant tests
should be TODO'd to catch this.

Since this will likely be the last release for 1.6.x, I suppose we can
go ahead and leave the version number as 1.0069.

chris

On Apr 10, 2011, at 4:10 PM, Lincoln Stein wrote:

> Hi Folks,
>
> Is this what's blocking the next bioperl release, or are there other
> blockers?
>
> I just released GBrowse 2.27, which requires bioperl 1.0069 or higher.
>
> Lincoln
>
> On Fri, Apr 8, 2011 at 8:53 PM, Chris Fields wrote:
>
>> That was introduced here by Florent:
>>
>> https://github.com/bioperl/bioperl-live/commit/6f65223ef5aabc3ceaa815d3cb71982f81ae6b30#t/LocalDB/SeqFeature.t
>>
>> So, essentially the MySQL adaptor is getting this wrong. Any way we can
>> somehow enable strict mode?
>>
>> http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html
>>
>> chris
>>
>> On Apr 8, 2011, at 4:32 PM, Scott Cain wrote:
>>
>>> Argh!
>>> MySQL is not an RDBMS! Anyone who tells you otherwise is lying!
>>>
>>> The first test failure for the Pg SFS adaptor is failing because it is
>>> trying to execute this query (which it inherited from the mysql
>>> adaptor, where it works just fine):
>>>
>>> select id,object FROM bioperl_seqfeature_t_test_schema_feature where
>>> id='doesnotexit';
>>>
>>> Of course, the id column is defined as an integer column. MySQL must
>>> be silently casting this string to an integer value (? I guess
>>> anyway--who knows). Anyway, PostgreSQL does the right thing and
>>> throws an error with this query. I don't see how I can make the
>>> Postgres adaptor pass this test as written, as it is nonsensical.
>>>
>>> Scott
>>>
>>> On Fri, Apr 8, 2011 at 2:56 PM, Chris Fields wrote:
>>>> Scott,
>>>>
>>>> I'll try documenting the Pg error for SF::Store in the next hour. Had
>>>> my hands full with the GSoC onslaught of emails and local $job stuff.
>>>> Would like to get it fixed for the CPAN release.
>>>>
>>>> chris
>>>>
>>>> On Apr 8, 2011, at 12:58 PM, Scott Cain wrote:
>>>>
>>>>> OK, I'll take it out and move on to the next problem.
>>>>>
>>>>> Thanks,
>>>>> Scott
>>>>>
>>>>> On Fri, Apr 8, 2011 at 1:51 PM, Lincoln Stein wrote:
>>>>>> Oh right. The Bio::DB::GFF adaptor has that broken behavior and it is
>>>>>> too late to change it now. (Bio::DB::SeqFeature::Store had better
>>>>>> not!) Best to remove the test altogether.
>>>>>> Lincoln
>>>>>>
>>>>>> On Fri, Apr 8, 2011 at 1:18 PM, Scott Cain wrote:
>>>>>>>
>>>>>>> Hi Lincoln,
>>>>>>>
>>>>>>> Yes, apparently, it does. It does this for both the memory and the
>>>>>>> postgres adaptors. I looked at how the data was stored in the
>>>>>>> feature object with Data::Dumper and that is how it is represented
>>>>>>> in the hash too. Perhaps this test should be calling the "absolute"
>>>>>>> method first?
>>>>>>>
>>>>>>> Scott
>>>>>>>
>>>>>>> On Fri, Apr 8, 2011 at 1:10 PM, Lincoln Stein
>>>>>>> <lincoln.stein at gmail.com> wrote:
>>>>>>>> Do start() and end() flip values for minus strand features? This
>>>>>>>> isn't supposed to happen.
>>>>>>>> Lincoln
>>>>>>>>
>>>>>>>> On Fri, Apr 8, 2011 at 11:41 AM, Scott Cain wrote:
>>>>>>>>>
>>>>>>>>> Hi Lincoln,
>>>>>>>>>
>>>>>>>>> I've been looking into some test failures with the postgres adaptor
>>>>>>>>> for Bio::DB::GFF and I wanted to check with you that I'm
>>>>>>>>> interpreting this correctly. In t/LocalDB/BioDBGFF.t there are
>>>>>>>>> these lines:
>>>>>>>>>
>>>>>>>>> @features = sort {$a->start<=>$b->start} @features;
>>>>>>>>>
>>>>>>>>> is($features[0]->type,'Component:reference');
>>>>>>>>> is($features[-1]->type,'exon:confirmed');
>>>>>>>>>
>>>>>>>>> So that the features in the data set are sorted by their start
>>>>>>>>> values and the beginning and end of the list are checked. The test
>>>>>>>>> refers to the test.gff data file, that contains among others these
>>>>>>>>> lines:
>>>>>>>>>
>>>>>>>>> Contig1 confirmed transcript 30001 31000 . - . Transcript trans-2; Gene "xyz-2"; Note "Terribly interesting"
>>>>>>>>> Contig1 confirmed exon 30001 30100 . - . Transcript trans-2; Gene "abc-1"; Note "function unknown"
>>>>>>>>> Contig1 confirmed exon 30701 30800 . - . Transcript trans-2
>>>>>>>>> Contig1 confirmed exon 30801 31000 . - . Transcript trans-2
>>>>>>>>>
>>>>>>>>> Since this transcript and its exons are on the minus strand, the
>>>>>>>>> values that the start and stop method return will be reversed, so
>>>>>>>>> that start for the transcript will be 31000 and stop will be 30001.
>>>>>>>>> The problem with this test is since the last exon and the
>>>>>>>>> transcript share a start value (31000), you can't really be sure
>>>>>>>>> which one will be at the bottom of the list after sorting, right?
>>>>>>>>> In the case of the postgres adaptor, it fails this test on my
>>>>>>>>> machine because the transcript is at the bottom of the list. The
>>>>>>>>> test for the beginning of the list similarly could fail though it
>>>>>>>>> didn't in my case, as other features that have 1 as a start are of
>>>>>>>>> type "Component:clone".
>>>>>>>>>
>>>>>>>>> So, my question is this: am I missing something, and the postgres
>>>>>>>>> adaptor is not behaving as expected, or are these tests ambiguous?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Scott
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> Scott Cain, Ph. D. scott at scottcain dot net
>>>>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>>>>>>>> Ontario Institute for Cancer Research
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lincoln D. Stein
>>>>>>>> Director, Informatics and Biocomputing Platform
>>>>>>>> Ontario Institute for Cancer Research
>>>>>>>> 101 College St., Suite 800
>>>>>>>> Toronto, ON, Canada M5G0A3
>>>>>>>> 416 673-8514
>>>>>>>> Assistant: Renata Musa
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ------------------------------------------------------------------------
>>>>>>> Scott Cain, Ph. D. scott at scottcain dot net
>>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>>>>>> Ontario Institute for Cancer Research
>>>>>>
>>>>>> --
>>>>>> Lincoln D. Stein
>>>>>> Director, Informatics and Biocomputing Platform
>>>>>> Ontario Institute for Cancer Research
>>>>>> 101 College St., Suite 800
>>>>>> Toronto, ON, Canada M5G0A3
>>>>>> 416 673-8514
>>>>>> Assistant: Renata Musa
>>>>>>
>>>>>
>>>>> --
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D. scott at scottcain dot net
>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>>>> Ontario Institute for Cancer Research
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D. scott at scottcain dot net
>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>> Ontario Institute for Cancer Research
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From fs5 at sanger.ac.uk Mon Apr 11 10:31:27 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 11 Apr 2011 15:31:27 +0100
Subject: [Bioperl-l] Output of a BLAST parse to text file
In-Reply-To: <906a78cf-59eb-40d5-9c5e-b277c49dd0d4@w21g2000yqm.googlegroups.com>
References: <906a78cf-59eb-40d5-9c5e-b277c49dd0d4@w21g2000yqm.googlegroups.com>
Message-ID: <1302532287.15828.60.camel@deskpro15336.internal.sanger.ac.uk>

Hi Tiago,

In the terminal where you run your script (let's say it's called
my_blast_parse.pl), you can run it like so:

my_blast_parse.pl [PARAMETERS] > my_outfile.txt

The ">" will redirect the output to a file called my_outfile.txt.
To make the output tab-delimited, you will want to play with the print
statement in the script.
Try this for a start (the "\t" inserts tabs):

print $result->query_name."\t".$hit->name."\t".$hsp->length('total')."\t".
$hsp->percent_identity, "\n";

Good luck!

Frank

On Sun, 2011-04-10 at 15:56 -0700, T.Hori wrote:
> Hi Guys,
>
> I am really new to BioPerl, so this may be a stupid question. Let's
> say I use the example from BioPerl.org:
>
> use strict;
> use Bio::SearchIO;
> my $in = new Bio::SearchIO(-format => 'blast',
>                            -file   => 'report.bls');
> while( my $result = $in->next_result ) {
>   ## $result is a Bio::Search::Result::ResultI compliant object
>   while( my $hit = $result->next_hit ) {
>     ## $hit is a Bio::Search::Hit::HitI compliant object
>     while( my $hsp = $hit->next_hsp ) {
>       ## $hsp is a Bio::Search::HSP::HSPI compliant object
>       if( $hsp->length('total') > 50 ) {
>         if ( $hsp->percent_identity >= 75 ) {
>           print "Query=", $result->query_name,
>             " Hit=", $hit->name,
>             " Length=", $hsp->length('total'),
>             " Percent_id=", $hsp->percent_identity, "\n";
>         }
>       }
>     }
>   }
> }
>
> That gives me several hits as results on the standard output. I am
> starting to learn BioPerl for what I think is a daunting task. I have a
> collection of 20K ESTs and I have to find the best human hit for
> every one of those sequences. So I am starting by learning how to parse
> BLAST results. I was wondering how I would go about having the output
> of the parse go to a tab-delimited text file instead of the standard
> output.
>
> Any help would be greatly appreciated.
>
> Thanks,
>
> Tiago
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
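
As an alternative to shell redirection, the script itself can open the
output file and write tab-separated rows. What follows is only a minimal
sketch under stated assumptions, not code from the thread: it reuses the
Bio::SearchIO calls and the same length/identity filters from the example
being discussed, but the helper names tab_row and blast_to_tsv, the header
row, and the file names are hypothetical and chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join the given fields with tabs, end with a newline.
sub tab_row {
    my @fields = @_;
    return join("\t", @fields) . "\n";
}

# Hypothetical helper: parse a BLAST report and write the HSPs that pass
# the thread's filters (length > 50, identity >= 75%) to a tab-delimited
# file. Both arguments are file paths supplied by the caller.
sub blast_to_tsv {
    my ($report, $outfile) = @_;

    # Loaded here rather than at the top, so tab_row() is usable
    # even on a machine without BioPerl installed.
    require Bio::SearchIO;

    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report);
    open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";

    # Header row; the column names are an assumption for illustration.
    print {$out} tab_row(qw(query hit length percent_id));

    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                next unless $hsp->length('total') > 50;
                next unless $hsp->percent_identity >= 75;
                print {$out} tab_row(
                    $result->query_name,
                    $hit->name,
                    $hsp->length('total'),
                    $hsp->percent_identity,
                );
            }
        }
    }
    close($out) or die "Cannot close $outfile: $!";
    return;
}

# Only runs when invoked directly with exactly two arguments.
blast_to_tsv(@ARGV) if !caller() && @ARGV == 2;
```

Saved under a name such as blast2tsv.pl (a placeholder), it would be run
as "perl blast2tsv.pl report.bls hits.tsv". Writing through a lexical
filehandle keeps any warnings or progress messages on the terminal
separate from the data, which shell redirection of standard output does
not guarantee.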
From cjfields at illinois.edu Mon Apr 11 11:49:09 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Apr 2011 10:49:09 -0500
Subject: [Bioperl-l] [Announcement] Next BioPerl release
Message-ID:

All,

Lincoln needs a new BioPerl release to CPAN ASAP to coordinate with the
latest GBrowse2 release. The latest GBrowse2 requires BioPerl v1.0069,
which is what we're using on the master branch to distinguish from the
1.6 branch version (the master branch version meant to designate 'just
prior to 1.7'). However, we were planning on designating the next
release as 1.6.2 (or, 1.006002), which will not match with GBrowse's
requirements.

Instead of asking Lincoln to change GBrowse to reflect that, and b/c
this release may be the last one for the 1.6.x series prior to some
major work on core this summer, I suggest we go ahead and release the
next version as soon as possible and keep the version as 1.0069. There
is very little difference between what is in the 1.6.2 branch and master
anyway, and (if things go well) we're planning on a major restructuring
of BioPerl over the summer that could effectively be the next major
version of BioPerl.

A few things:

1) I am requesting that any work beyond simple doc fixes or small code
changes be carried out on a branch of master. Most of the tests are now
passing; please don't break them with a last-minute rush to get code in.
I reserve the right to revert changes if necessary.

2) I will create a new branch, just prior to the release (1.6.9). The
old branch that was previously meant to be 1.6.2 will be removed; it was
essentially a very recent branch of master anyway, just with a different
version.

3) The only significant blocker to the release at the moment is being
worked out on a branch (https://redmine.open-bio.org/issues/3196). This
should be fixed very soon.

4) If needed the 1.6.9 branch can be used for any future bug fixes; I
anticipate this may happen if we need to get a new release out that
GBrowse requires.
The version will be bumped to 1.006901, 1.006902, etc. accordingly. Any questions/feedback on the above? I will be on #bioperl on IRC (freenode) as well as here. chris From scott at scottcain.net Mon Apr 11 11:53:53 2011 From: scott at scottcain.net (Scott Cain) Date: Mon, 11 Apr 2011 11:53:53 -0400 Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures In-Reply-To: References: Message-ID: The test I was complaining about last week is clearly flawed. I would suggest changing the "doesnotexit" to "-1" in this test: is( $db->fetch('doesnotexit'), undef); where the point of the test is to search for something by ID that doesn't exist. Since a primary key of "-1" is unlikely to be used with an autogenerated primary key, that should safely fail to find something, thus passing the test. There is, however, a second failure with the Pg test that I'm working on today, where for some reason, the same method in the mysql adaptor (inherited by the Pg adaptor) is generating different queries when run against the different databases. Once I sort out why that is happening, the Pg adaptor should be passing tests again. Scott On Mon, Apr 11, 2011 at 10:15 AM, Chris Fields wrote: > Lincoln, > > It's Bio::Aseembly-specific incorrect semantics that are triggering this, I don't think anything in the SF::Store/GBrowse set triggers the problem. ?So, to me this isn't a blocker for anything unless someone specifies that Bio::Assembly must use the Pg adaptor (I think it uses the memory one by default, not sure if it's hard-coded that way). > > I do agree with Scott, that MySQL and other adaptors are silently dealing with this data w/o dying, so this should be filed for tracking, as I'm sure it will pop up at some point again. ?The relevant tests should be TODO'd to catch this. > > Since this will likely be the last release for 1.6.x, I suppose we can go ahead and leave the version number as 1.0069. 
> > chris > > On Apr 10, 2011, at 4:10 PM, Lincoln Stein wrote: > >> Hi Folks, >> >> Is this what's blocking the next bioperl release, or are there other >> blockers? >> >> I just released GBrowse 2.27, which requires bioperl 1.0069 or higher. >> >> Lincoln >> >> On Fri, Apr 8, 2011 at 8:53 PM, Chris Fields wrote: >> >>> That was introduced here by Florent: >>> >>> >>> https://github.com/bioperl/bioperl-live/commit/6f65223ef5aabc3ceaa815d3cb71982f81ae6b30#t/LocalDB/SeqFeature.t >>> >>> So, essentially the MySQL adaptor is getting this wrong. Any way we can >>> somehow enable strict mode? >>> >>> http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html >>> >>> chris >>> >>> On Apr 8, 2011, at 4:32 PM, Scott Cain wrote: >>> >>>> Argh! MySQL is not an RDBMS! Anyone who tells you otherwise is lying! >>>> >>>> The first test failure for the Pg SFS adaptor is failing because it is >>>> trying to execute this query (which it inherited from the mysql >>>> adaptor, where it works just fine): >>>> >>>> select id,object FROM bioperl_seqfeature_t_test_schema_feature where >>>> id='doesnotexit'; >>>> >>>> Of course, the id column is defined as an integer column. MySQL must >>>> be silently casting this string to an integer value (? I guess >>>> anyway--who knows). Anyway, PostgreSQL does the right thing and >>>> throws an error with this query. I don't see how I can make the >>>> Postgres adaptor pass this test as written, as it is nonsensical. >>>> >>>> Scott >>>> >>>> >>>> >>>> On Fri, Apr 8, 2011 at 2:56 PM, Chris Fields wrote: >>>>> Scott, >>>>> >>>>> I'll try documenting the Pg error for SF::Store in the next hour. Had >>>>> my hands full with the GSoC onslaught of emails and local $job stuff. Would >>>>> like to get it fixed for the CPAN release. >>>>> >>>>> chris >>>>> >>>>> On Apr 8, 2011, at 12:58 PM, Scott Cain wrote: >>>>> >>>>>> OK, I'll take it out and move on to the next problem.
>>>>>> >>>>>> Thanks, >>>>>> Scott >>>>>> >>>>>> >>>>>> On Fri, Apr 8, 2011 at 1:51 PM, Lincoln Stein wrote: >>>>>>> Oh right. The Bio::DB::GFF adaptor has that broken behavior and it is too >>>>>>> late to change it now. (Bio::DB::SeqFeature::Store had better not!) Best to >>>>>>> remove the test altogether. >>>>>>> Lincoln >>>>>>> >>>>>>> On Fri, Apr 8, 2011 at 1:18 PM, Scott Cain wrote: >>>>>>>> >>>>>>>> Hi Lincoln, >>>>>>>> >>>>>>>> Yes, apparently, it does. It does this for both the memory and the >>>>>>>> postgres adaptors. I looked at how the data was stored in the feature >>>>>>>> object with Data::Dumper and that is how it is represented in the hash >>>>>>>> too. Perhaps this test should be calling the "absolute" method first? >>>>>>>> >>>>>>>> Scott >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 8, 2011 at 1:10 PM, Lincoln Stein <lincoln.stein at gmail.com> >>>>>>>> wrote: >>>>>>>>> Do start() and end() flip values for minus strand features? This isn't >>>>>>>>> supposed to happen. >>>>>>>>> Lincoln >>>>>>>>> >>>>>>>>> On Fri, Apr 8, 2011 at 11:41 AM, Scott Cain wrote: >>>>>>>>>> >>>>>>>>>> Hi Lincoln, >>>>>>>>>> >>>>>>>>>> I've been looking into some test failures with the postgres adaptor >>>>>>>>>> for Bio::DB::GFF and I wanted to check with you that I'm interpreting >>>>>>>>>> this correctly. In t/LocalDB/BioDBGFF.t there are these lines: >>>>>>>>>> >>>>>>>>>> @features = sort {$a->start<=>$b->start} @features; >>>>>>>>>> >>>>>>>>>> is($features[0]->type,'Component:reference'); >>>>>>>>>> is($features[-1]->type,'exon:confirmed'); >>>>>>>>>> >>>>>>>>>> So that the features in the data set are sorted by their start values >>>>>>>>>> and the beginning and end of the list are checked. The test refers to >>>>>>>>>> the test.gff data file, that contains among others these lines: >>>>>>>>>> >>>>>>>>>> Contig1 confirmed transcript 30001 31000 . - .
>>>>>>>>>> Transcript trans-2; Gene "xyz-2"; Note "Terribly interesting" >>>>>>>>>> Contig1 confirmed exon 30001 30100 . - . Transcript >>>>>>>>>> trans-2; Gene "abc-1"; Note "function unknown" >>>>>>>>>> Contig1 confirmed exon 30701 30800 . - . Transcript >>>>>>>>>> trans-2 >>>>>>>>>> Contig1 confirmed exon 30801 31000 . - . Transcript >>>>>>>>>> trans-2 >>>>>>>>>> >>>>>>>>>> Since this transcript and its exons are on the minus strand, the >>>>>>>>>> values that the start and stop method return will be reversed, so that >>>>>>>>>> start for the transcript will be 31000 and stop will be 30001. The >>>>>>>>>> problem with this test is since the last exon and the transcript share >>>>>>>>>> a start value (31000), you can't really be sure which one will be at >>>>>>>>>> the bottom of the list after sorting, right? In the case of the >>>>>>>>>> postgres adaptor, it fails this test on my machine because the >>>>>>>>>> transcript is at the bottom of the list. The test for the beginning >>>>>>>>>> of the list similarly could fail though it didn't in my case, as other >>>>>>>>>> features that have 1 as a start are of type "Component:clone". >>>>>>>>>> >>>>>>>>>> So, my question is this: am I missing something, and the postgres >>>>>>>>>> adaptor is not behaving as expected, or are these tests ambiguous? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Scott >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>>>>> Ontario Institute for Cancer Research >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Lincoln D.
Stein >>>>>>>>> Director, Informatics and Biocomputing Platform >>>>>>>>> Ontario Institute for Cancer Research >>>>>>>>> 101 College St., Suite 800 >>>>>>>>> Toronto, ON, Canada M5G0A3 >>>>>>>>> 416 673-8514 >>>>>>>>> Assistant: Renata Musa >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>>> Ontario Institute for Cancer Research >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Lincoln D. Stein >>>>>>> Director, Informatics and Biocomputing Platform >>>>>>> Ontario Institute for Cancer Research >>>>>>> 101 College St., Suite 800 >>>>>>> Toronto, ON, Canada M5G0A3 >>>>>>> 416 673-8514 >>>>>>> Assistant: Renata Musa >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ------------------------------------------------------------------------ >>>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>> Ontario Institute for Cancer Research >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. scott at scottcain dot net >>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>> Ontario Institute for Cancer Research >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> -- >> Lincoln D.
Stein >> Director, Informatics and Biocomputing Platform >> Ontario Institute for Cancer Research >> 101 College St., Suite 800 >> Toronto, ON, Canada M5G0A3 >> 416 673-8514 >> Assistant: Renata Musa >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From adam at retchless.us Mon Apr 11 11:56:56 2011 From: adam at retchless.us (Adam Retchless) Date: Mon, 11 Apr 2011 11:56:56 -0400 Subject: [Bioperl-l] Bug? Phyml parameters Message-ID: <4DA324C8.40000@retchless.us> Dear BioPerl crew, I noticed some strange behavior in how the Phyml wrapper treats parameters, and am wondering if there is a reason for this, or if it is a bug. Documentation is here: http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/Phylo/Phyml.html Being new to BioPerl, I'm not exactly sure what is the right way to address this. Let me know if this should just be reported as a bug rather than writing to the list... 1) The major issue is that the wrapper seems to disable the option to calculate confidence values for the trees (e.g. bootstrap, aLRT). In the documentation, I see no option to set this parameter, and the "_setparams" subroutine has a comment line explicitly stating "no bootstrap sets" (http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/Phylo/Phyml.html#CODE26). A web search revealed no discussion of this point. In fact, there were several references to the bootstrap values arising from Phyml, making me think that it used to be enabled (or there is some other way to enable it). 2) While digging into the above issue, I looked at the "new" subroutine and it looks like the "freq" parameter is mis-assigned.
The value of "freq" is given to the variable $kappa rather than $freq. (Code here: http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/Phylo/Phyml.html#CODE1) Any information on this (particularly #1) would be appreciated. Thanks, Adam -- Adam Retchless Center for Genomic Sciences Allegheny-Singer Research Institute From lincoln.stein at gmail.com Mon Apr 11 12:50:47 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 11 Apr 2011 12:50:47 -0400 Subject: [Bioperl-l] [Announcement] Next BioPerl release In-Reply-To: References: Message-ID: I can re-release GBrowse to use whatever bioperl numbering system is needed. Sorry if I did not understand the branch structure! This will be very easy for me. Lincoln On Mon, Apr 11, 2011 at 11:49 AM, Chris Fields wrote: > All, > > Lincoln needs a new BioPerl release to CPAN ASAP to coordinate with the > latest GBrowse2 release. The latest GBrowse2 requires v BioPerl 1.0069, > which is what we're using on the master branch to distinguish from the 1.6 > branch version (the master branch version meant to designate 'just prior to > 1.7'). However, we were planning on designating the next release as 1.6.2 > (or, 1.006002), which will not match with GBrowse's requirements. > > Instead of asking Lincoln to change GBrowse to reflect that, and b/c this > release may be the last one for the 1.6.x series prior to some major work on > core this summer, I suggest we go ahead and release the next version as soon > as possible and keep the version as 1.0069. There is very little difference > between what is in the 1.6.2 branch and master anyway, and (if things go > well) we're planning on a major restructuring of BioPerl over the summer > that could effectively be the next major version of BioPerl. > > A few things: > > 1) I am requesting that any work beyond simple doc fixes or small code > changes be carried out on a branch of master. 
Most of the tests are now > passing; please don't break them with a last-minute rush to get code in. I > reserve the right to revert changes if necessary. > > 2) I will create a new branch, just prior to the release (1.6.9). The old > branch that was previously meant to be 1.6.2 will be removed; it was > essentially a very recent branch of master anyway, just with a different > version. > > 3) The only significant blocker to the release at the moment is being > worked out on a branch (https://redmine.open-bio.org/issues/3196). This > should be fixed very soon. > > 4) If needed the 1.6.9 branch can be used for any future bug fixes; I > anticipate this may happen if we need to get a new release out that GBrowse > requires. The version will be bumped to 1.006901, 1.006902, etc. > accordingly. > > Any questions/feedback on the above? I will be on #bioperl on IRC > (freenode) as well as here. > > chris -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From huangyifeicmb at gmail.com Mon Apr 11 13:06:36 2011 From: huangyifeicmb at gmail.com (Yifei Huang) Date: Mon, 11 Apr 2011 13:06:36 -0400 Subject: [Bioperl-l] bioperl vs other bio* In-Reply-To: References: Message-ID: Hi Albert, I used Bio++, a set of C++ libraries for phylogenetics and population genetics, in my recent project. I found it's a very useful tool to implement your own phylogenetic models, because it has a series of classes for manipulating biological data, calculating tree likelihood, and performing numerical optimization. I don't find these features in bioperl. Probably perl is not an efficient language to implement algorithms. Yifei On Mon, Apr 11, 2011 at 5:53 AM, Albert Vilella wrote: > Hi, > > This may have been asked before but, what is the current set of > features in bioperl compared > to biopython, biojava and other bio* projects? 
> > Has anyone listed the features present in bioperl not present in the > other bio* projects? > And what I specially would like to know, what are the missing features > in bioperl that other > bio* projects have been quicker at implementing? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From scott at scottcain.net Mon Apr 11 13:27:34 2011 From: scott at scottcain.net (Scott Cain) Date: Mon, 11 Apr 2011 13:27:34 -0400 Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures In-Reply-To: References: Message-ID: OK, I fixed the second bug I referred to below (and committed it on master :-) If nobody complains, I also change the test I referred to first below, and then my issues with the next release should be resolved. Thanks, Scott On Mon, Apr 11, 2011 at 11:53 AM, Scott Cain wrote: > The test I was complaining about last week is clearly flawed. I would > suggest changing the "doesnotexit" to "-1" in this test: > > is( $db->fetch('doesnotexit'), undef); > > where the point of the test is to search for something by ID that > doesn't exist. Since a primary key of "-1" is unlikely to be used > with an autogenerated primary key, that should safely fail to find > something, thus passing the test. > > There is, however, a second failure with the Pg test that I'm working > on today, where for some reason, the same method in the mysql adaptor > (inherited by the Pg adaptor) is generating different queries when run > against the different databases. Once I sort out why that is > happening, the Pg adaptor should be passing tests again. > > Scott > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/)
216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Mon Apr 11 13:35:58 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Apr 2011 12:35:58 -0500 Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures In-Reply-To: References: Message-ID: <9B8FB5BF-0924-4DF2-A12F-E1E075807DF0@illinois.edu> Scott, Go ahead and change the test if you haven't done it already. The only blocker would be the Module::Build work I alluded to earlier, which I'm working on now. Lincoln, probably not necessary to worry about changing the version number for the GBrowse release, we can deal with bug fixes with minor version increments on 1.6.9. Most end-users won't care about versioning anyway as long as the dependency install path works fine. chris On Apr 11, 2011, at 12:27 PM, Scott Cain wrote: > OK, I fixed the second bug I referred to below (and committed it on > master :-) If nobody complains, I also change the test I referred to > first below, and then my issues with the next release should be > resolved. > > Thanks, > Scott > > > On Mon, Apr 11, 2011 at 11:53 AM, Scott Cain wrote: >> The test I was complaining about last week is clearly flawed. I would >> suggest changing the "doesnotexit" to "-1" in this test: >> >> is( $db->fetch('doesnotexit'), undef); >> >> where the point of the test is to search for something by ID that >> doesn't exist. Since a primary key of "-1" is unlikely to be used >> with an autogenerated primary key, that should safely fail to find >> something, thus passing the test. >> >> There is, however, a second failure with the Pg test that I'm working >> on today, where for some reason, the same method in the mysql adaptor >> (inherited by the Pg adaptor) is generating different queries when run >> against the different databases. Once I sort out why that is >> happening, the Pg adaptor should be passing tests again. 
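A minimal sketch of the test change discussed in this thread, under stated assumptions: MockStore is a hypothetical in-memory stand-in with integer primary keys, not Bio::DB::SeqFeature::Store or any real adaptor, and its error message is invented for illustration. It only mimics the reported behaviour, where a strictly typed integer key rejects the string 'doesnotexit' while a lookup of -1 quietly finds nothing.

```perl
use strict;
use warnings;

# Hypothetical mock of a feature store keyed by integer ids; it is NOT
# Bio::DB::SeqFeature::Store, only an illustration of the behaviour
# described in the thread.
package MockStore;

sub new {
    my ($class) = @_;
    return bless { rows => { 1 => 'feature one', 2 => 'feature two' } }, $class;
}

# Mimics a strictly typed integer column: a non-integer id is an error,
# while a well-formed but unused id simply returns undef.
sub fetch {
    my ($self, $id) = @_;
    die "non-integer id '$id' rejected\n"
        unless defined $id && $id =~ /\A-?\d+\z/;
    return $self->{rows}{$id};
}

package main;

my $db = MockStore->new;

# A string id raises an error, so the lookup never gets to return undef.
my $from_string = eval { $db->fetch('doesnotexit') };
print "string id: ", ($@ ? "error: $@" : "no error\n");

# -1 is a valid integer that no autogenerated key is expected to use.
my $from_minus_one = $db->fetch(-1);
print "id -1: ", (defined $from_minus_one ? 'found' : 'undef'), "\n";
```

With a store that behaves like this mock, a check written as is( $db->fetch(-1), undef ) can pass on a strict backend, whereas the 'doesnotexit' form dies before the comparison is made.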
>> >> Scott >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D.
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Mon Apr 11 14:37:45 2011 From: scott at scottcain.net (Scott Cain) Date: Mon, 11 Apr 2011 14:37:45 -0400 Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures Message-ID: Hi Lincoln, Last week you fixed a problem with the way Bio::DB::GFF created MySQL tables, removing the "type=MYISAM" from the declaration. The problem now is that when I try to create a new Bio::DB::GFF database, I get this error: The used table type doesn't support FULLTEXT indexes when trying to create the fattribute_to_feature table. It looks like the default engine is InnoDB, which doesn't support full text searching. Of course, if I add "ENGINE MYISAM" to the end of the query it works. This is with mysql 5.5.9. When the table creation fails, obviously most of the tests fail when testing against MySQL. Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From lincoln.stein at gmail.com Mon Apr 11 14:55:23 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 11 Apr 2011 14:55:23 -0400 Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures In-Reply-To: References: Message-ID: Oh gee, what can I do? I will have to hard-code testing for the particular version of mysql in order to fix this.
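A rough sketch of the kind of version test described above, in Perl. Everything in it is illustrative rather than taken from Bio::DB::GFF: `engine_clause` and the `demo` table are hypothetical names, and the 4.0.18 cutoff for the ENGINE keyword is an assumption that should be checked against the MySQL manual for whichever servers need to be supported.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::GFF): pick the storage-engine
# clause for CREATE TABLE from a server version string such as "5.5.9-log".
# With DBI the string could be fetched with something like
#   my ($version) = $dbh->selectrow_array('SELECT VERSION()');
# ASSUMPTION: ENGINE= is accepted from 4.0.18 onwards; older servers only
# understand TYPE=. Verify the cutoff before relying on it.
sub engine_clause {
    my ($version) = @_;
    my ($major, $minor, $patch) = $version =~ /^(\d+)\.(\d+)(?:\.(\d+))?/
        or die "cannot parse MySQL version '$version'";
    $patch = 0 unless defined $patch;

    # Fold the three parts into one number so they compare numerically.
    my $n = $major * 10_000 + $minor * 100 + $patch;
    return $n >= 40_018 ? 'ENGINE=MyISAM' : 'TYPE=MyISAM';
}

# Hypothetical table definition, for illustration only.
my $ddl = 'CREATE TABLE demo (id INT, note TEXT, FULLTEXT(note)) ';
print $ddl . engine_clause('5.5.9') . "\n";
```

Naming the engine explicitly on every table means the FULLTEXT index no longer depends on the server's default engine, which is what broke when the default became InnoDB.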
Lincoln On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain wrote: > Hi Lincoln, > > Last week you fixed a problem with the way Bio::DB::GFF created MySQL > tables, removing the "type=MYISAM" from the declaration. The problem > now is that when I try to create a new Bio::DB::GFF database, I get > this error: > > The used table type doesn't support FULLTEXT indexes > > when trying to create the fattribute_to_feature table. It looks like > the default engine is InnoDB, which doesn't support full text > searching. Of course, if I add "ENGINE MYISAM" to the end of the > query it works. This is with mysql 5.5.9. When the table creation > fails, obviously most of the tests fail when testing against MySQL. > > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From lincoln.stein at gmail.com Mon Apr 11 14:57:44 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 11 Apr 2011 14:57:44 -0400 Subject: [Bioperl-l] Bio::DB::GFF/Postgres test failures In-Reply-To: <9B8FB5BF-0924-4DF2-A12F-E1E075807DF0@illinois.edu> References: <9B8FB5BF-0924-4DF2-A12F-E1E075807DF0@illinois.edu> Message-ID: It now looks like Scott and I need to fix the MySQL table definition mechanism in Bio::DB::GFF and Bio::DB::SeqFeature::Store. Some versions of MySQL are not accepting the type=MyISAM declaration. Lincoln On Mon, Apr 11, 2011 at 1:35 PM, Chris Fields wrote: > Scott, > > Go ahead and change the test if you haven't done it already. The only > blocker would be the Module::Build work I alluded to earlier, which I'm > working on now.
> > Lincoln, probably not necessary to worry about changing the version number > for the GBrowse release, we can deal with bug fixes with minor version > increments on 1.6.9. Most end-users won't care about versioning anyway as > long as the dependency install path works fine. > > chris > > On Apr 11, 2011, at 12:27 PM, Scott Cain wrote: > > > OK, I fixed the second bug I referred to below (and committed it on > > master :-) If nobody complains, I'll also change the test I referred to > > first below, and then my issues with the next release should be > > resolved. > > > > Thanks, > > Scott > > > > > > On Mon, Apr 11, 2011 at 11:53 AM, Scott Cain > wrote: > >> The test I was complaining about last week is clearly flawed. I would > >> suggest changing the "doesnotexit" to "-1" in this test: > >> > >> is( $db->fetch('doesnotexit'), undef); > >> > >> where the point of the test is to search for something by ID that > >> doesn't exist. Since a primary key of "-1" is unlikely to be used > >> with an autogenerated primary key, that should safely fail to find > >> something, thus passing the test. > >> > >> There is, however, a second failure with the Pg test that I'm working > >> on today, where for some reason, the same method in the mysql adaptor > >> (inherited by the Pg adaptor) is generating different queries when run > >> against the different databases. Once I sort out why that is > >> happening, the Pg adaptor should be passing tests again. > >> > >> Scott > >> > >> > >> On Mon, Apr 11, 2011 at 10:15 AM, Chris Fields > wrote: > >>> Lincoln, > >>> > >>> It's Bio::Assembly-specific incorrect semantics that are triggering > this, I don't think anything in the SF::Store/GBrowse set triggers the > problem. So, to me this isn't a blocker for anything unless someone > specifies that Bio::Assembly must use the Pg adaptor (I think it uses the > memory one by default, not sure if it's hard-coded that way).
> >>> > >>> I do agree with Scott, that MySQL and other adaptors are silently > dealing with this data w/o dying, so this should be filed for tracking, as > I'm sure it will pop up at some point again. The relevant tests should be > TODO'd to catch this. > >>> > >>> Since this will likely be the last release for 1.6.x, I suppose we can > go ahead and leave the version number as 1.0069. > >>> > >>> chris > >>> > >>> On Apr 10, 2011, at 4:10 PM, Lincoln Stein wrote: > >>> > >>>> Hi Folks, > >>>> > >>>> Is this what's blocking the next bioperl release, or are there other > >>>> blockers? > >>>> > >>>> I just released GBrowse 2.27, which requires bioperl 1.0069 or higher. > >>>> > >>>> Lincoln > >>>> > >>>> On Fri, Apr 8, 2011 at 8:53 PM, Chris Fields > wrote: > >>>> > >>>>> That was introduced here by Florent: > >>>>> > >>>>> > >>>>> > https://github.com/bioperl/bioperl-live/commit/6f65223ef5aabc3ceaa815d3cb71982f81ae6b30#t/LocalDB/SeqFeature.t > >>>>> > >>>>> So, essentially the MySQL adaptor is getting this wrong. Any way we > can > >>>>> somehow enable strict mode? > >>>>> > >>>>> http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html > >>>>> > >>>>> chris > >>>>> > >>>>> On Apr 8, 2011, at 4:32 PM, Scott Cain wrote: > >>>>> > >>>>>> Argh! MySQL is not an RDBMS! Anyone who tells you otherwise is > lying! > >>>>>> > >>>>>> The first test failure for the Pg SFS adaptor is failing because it > is > >>>>>> trying to execute this query (which it inherited from the mysql > >>>>>> adaptor, where it works just fine): > >>>>>> > >>>>>> select id,object FROM bioperl_seqfeature_t_test_schema_feature where > >>>>>> id='doesnotexit'; > >>>>>> > >>>>>> Of course, the id column is defined as an integer column. MySQL > must > >>>>>> be silently casting this string to an integer value (? I guess > >>>>>> anyway--who knows). Anyway, PostgreSQL does the right thing and > >>>>>> throws an error with this query.
I don't see how I can make the > >>>>>> Postgres adaptor pass this test as written, as it is nonsensical. > >>>>>> > >>>>>> Scott > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Fri, Apr 8, 2011 at 2:56 PM, Chris Fields > > >>>>> wrote: > >>>>>>> Scott, > >>>>>>> > >>>>>>> I'll try documenting the Pg error for SF::Store in the next hour. > Had > >>>>> my hands full with the GSoC onslaught of emails and local $job stuff. > Would > >>>>> like to get it fixed for the CPAN release. > >>>>>>> > >>>>>>> chris > >>>>>>> > >>>>>>> On Apr 8, 2011, at 12:58 PM, Scott Cain wrote: > >>>>>>> > >>>>>>>> OK, I'll take it out and move on to the next problem. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Scott > >>>>>>>> > >>>>>>>> > >>>>>>>> On Fri, Apr 8, 2011 at 1:51 PM, Lincoln Stein < > lincoln.stein at gmail.com> > >>>>> wrote: > >>>>>>>>> Oh right. The Bio::DB::GFF adaptor has that broken behavior and > it is > >>>>> too > >>>>>>>>> late to change it now. (Bio::DB::SeqFeature::Store had better > not!) > >>>>> Best to > >>>>>>>>> remove the test altogether. > >>>>>>>>> Lincoln > >>>>>>>>> > >>>>>>>>> On Fri, Apr 8, 2011 at 1:18 PM, Scott Cain > >>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi Lincoln, > >>>>>>>>>> > >>>>>>>>>> Yes, apparently, it does. It does this for both the memory and > the > >>>>>>>>>> postgres adaptors. I looked at how the data was stored in the > >>>>> feature > >>>>>>>>>> object with Data::Dumper and that is how it is represented in > the > >>>>> hash > >>>>>>>>>> too. Perhaps this test should be calling the "absolute" method > >>>>> first? > >>>>>>>>>> > >>>>>>>>>> Scott > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Fri, Apr 8, 2011 at 1:10 PM, Lincoln Stein < > >>>>> lincoln.stein at gmail.com> > >>>>>>>>>> wrote: > >>>>>>>>>>> Do start() and end() flip values for minus strand features? > This > >>>>> isn't > >>>>>>>>>>> supposed to happen. 
> >>>>>>>>>>> Lincoln > >>>>>>>>>>> > >>>>>>>>>>> On Fri, Apr 8, 2011 at 11:41 AM, Scott Cain < > scott at scottcain.net> > >>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Hi Lincoln, > >>>>>>>>>>>> > >>>>>>>>>>>> I've been looking into some test failures with the postgres > adaptor > >>>>>>>>>>>> for Bio::DB::GFF and I wanted to check with you that I'm > >>>>> interpreting > >>>>>>>>>>>> this correctly. In t/LocalDB/BioDBGFF.t there are these > lines: > >>>>>>>>>>>> > >>>>>>>>>>>> @features = sort {$a->start<=>$b->start} @features; > >>>>>>>>>>>> > >>>>>>>>>>>> is($features[0]->type,'Component:reference'); > >>>>>>>>>>>> is($features[-1]->type,'exon:confirmed'); > >>>>>>>>>>>> > >>>>>>>>>>>> So that the features in the data set are sorted by their start > >>>>> values > >>>>>>>>>>>> and the beginning and end of the list are checked. The test > refers > >>>>> to > >>>>>>>>>>>> the test.gff data file, that contains among others these > lines: > >>>>>>>>>>>> > >>>>>>>>>>>> Contig1 confirmed transcript 30001 31000 . - . > >>>>>>>>>>>> Transcript trans-2; Gene "xyz-2"; Note "Terribly interesting" > >>>>>>>>>>>> Contig1 confirmed exon 30001 30100 . - . > Transcript > >>>>>>>>>>>> trans-2; Gene "abc-1"; Note "function unknown" > >>>>>>>>>>>> Contig1 confirmed exon 30701 30800 . - . > Transcript > >>>>>>>>>>>> trans-2 > >>>>>>>>>>>> Contig1 confirmed exon 30801 31000 . - . > Transcript > >>>>>>>>>>>> trans-2 > >>>>>>>>>>>> > >>>>>>>>>>>> Since this transcript and its exons are on the minus strand, > the > >>>>>>>>>>>> values that the start and stop method return will be reversed, > so > >>>>> that > >>>>>>>>>>>> start for the transcript will be 31000 and stop will be 30001. > The > >>>>>>>>>>>> problem with this test is since the last exon and the > transcript > >>>>> share > >>>>>>>>>>>> a start value (31000), you can't really be sure which one will > be > >>>>> at > >>>>>>>>>>>> the bottom of the list after sorting, right? 
In the case of > the > >>>>>>>>>>>> postgres adaptor, it fails this test on my machine because the > >>>>>>>>>>>> transcript is at the bottom of the list. The test for the > >>>>> beginning > >>>>>>>>>>>> of the list similarly could fail though it didn't in my case, > as > >>>>> other > >>>>>>>>>>>> features that have 1 as a start are of type "Component:clone". > >>>>>>>>>>>> > >>>>>>>>>>>> So, my question is this: am I missing something, and the > postgres > >>>>>>>>>>>> adaptor is not behaving as expected, or are these tests > ambiguous? > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Scott > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> -- > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>> > ------------------------------------------------------------------------ > >>>>>>>>>>>> Scott Cain, Ph. D. scott at > >>>>> scottcain > >>>>>>>>>>>> dot net > >>>>>>>>>>>> GMOD Coordinator (http://gmod.org/) > >>>>> 216-392-3087 > >>>>>>>>>>>> Ontario Institute for Cancer Research > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -- > >>>>>>>>>>> Lincoln D. Stein > >>>>>>>>>>> Director, Informatics and Biocomputing Platform > >>>>>>>>>>> Ontario Institute for Cancer Research > >>>>>>>>>>> 101 College St., Suite 800 > >>>>>>>>>>> Toronto, ON, Canada M5G0A3 > >>>>>>>>>>> 416 673-8514 > >>>>>>>>>>> Assistant: Renata Musa > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -- > >>>>>>>>>> > >>>>> > ------------------------------------------------------------------------ > >>>>>>>>>> Scott Cain, Ph. D. scott at > >>>>> scottcain > >>>>>>>>>> dot net > >>>>>>>>>> GMOD Coordinator (http://gmod.org/) > 216-392-3087 > >>>>>>>>>> Ontario Institute for Cancer Research > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Lincoln D. 
Stein > >>>>>>>>> Director, Informatics and Biocomputing Platform > >>>>>>>>> Ontario Institute for Cancer Research > >>>>>>>>> 101 College St., Suite 800 > >>>>>>>>> Toronto, ON, Canada M5G0A3 > >>>>>>>>> 416 673-8514 > >>>>>>>>> Assistant: Renata Musa > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> -- > >>>>>>>> > >>>>> > ------------------------------------------------------------------------ > >>>>>>>> Scott Cain, Ph. D. scott at > scottcain > >>>>> dot net > >>>>>>>> GMOD Coordinator (http://gmod.org/) > 216-392-3087 > >>>>>>>> Ontario Institute for Cancer Research > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Bioperl-l mailing list > >>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> > ------------------------------------------------------------------------ > >>>>>> Scott Cain, Ph. D. scott at > scottcain > >>>>> dot net > >>>>>> GMOD Coordinator (http://gmod.org/) > 216-392-3087 > >>>>>> Ontario Institute for Cancer Research > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >>>>> > >>>> > >>>> > >>>> -- > >>>> Lincoln D. Stein > >>>> Director, Informatics and Biocomputing Platform > >>>> Ontario Institute for Cancer Research > >>>> 101 College St., Suite 800 > >>>> Toronto, ON, Canada M5G0A3 > >>>> 416 673-8514 > >>>> Assistant: Renata Musa > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. 
scott at scottcain > dot net > >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> Ontario Institute for Cancer Research > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From scott at scottcain.net Mon Apr 11 15:00:37 2011 From: scott at scottcain.net (Scott Cain) Date: Mon, 11 Apr 2011 15:00:37 -0400 Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures In-Reply-To: References: Message-ID: I know. I guess the question is, how far back is the "ENGINE" syntax supported. Could we just use that syntax and tell people using older (I think probably much older) MySQL servers to upgrade? Scott On Mon, Apr 11, 2011 at 2:55 PM, Lincoln Stein wrote: > Oh gee, what can I do? I will have to hard-code testing for the particular > version of mysql in order to fix this. > > Lincoln > > On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain wrote: > >> Hi Lincoln, >> >> Last week you fixed a problem with the way Bio::DB::GFF created MySQL >> tables, removing the "type=MYISAM" from the declaration. The problem >> now is that when I try to create a new Bio::DB::GFF database, I get >> this error: >> >> The used table type doesn't support FULLTEXT indexes >> >> when trying to create the fattribute_to_feature table. It looks like >> the default engine is InnoDB, which doesn't support full text >> searching. Of course, if I add "ENGINE MYISAM" to the end of the >> query it works. This is with mysql 5.5.9.
When the table creation >> fails, obviously most of the tests fail when testing against MySQL. >> >> Scott >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain dot >> net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Mon Apr 11 15:02:23 2011 From: scott at scottcain.net (Scott Cain) Date: Mon, 11 Apr 2011 15:02:23 -0400 Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures In-Reply-To: References: Message-ID: Or, much better, perhaps there is a way to change the default engine via a query, so it could be set to myisam before the table creation starts. Scott On Mon, Apr 11, 2011 at 3:00 PM, Scott Cain wrote: > I know. I guess the question is, how far back is the "ENGINE" syntax > supported. Could we just use that syntax and tell people using older > (I think probably much older) MySQL servers to upgrade? > > Scott > > > On Mon, Apr 11, 2011 at 2:55 PM, Lincoln Stein wrote: >> Oh gee, what can I do? I will have to hard-code testing for the particular >> version of mysql in order to fix this.
>> >> Lincoln >> >> On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain wrote: >> >>> Hi Lincoln, >>> >>> Last week you fixed a problem with the way Bio::DB::GFF created MySQL >>> tables, removing the "type=MYISAM" from the declaration. The problem >>> now is that when I try to create a new Bio::DB::GFF database, I get >>> this error: >>> >>> The used table type doesn't support FULLTEXT indexes >>> >>> when trying to create the fattribute_to_feature table. It looks like >>> the default engine is InnoDB, which doesn't support full text >>> searching. Of course, if I add "ENGINE MYISAM" to the end of the >>> query it works. This is with mysql 5.5.9. When the table creation >>> fails, obviously most of the tests fail when testing against MySQL. >>> >>> Scott >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain dot >>> net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> >> >> >> >> -- >> Lincoln D. Stein >> Director, Informatics and Biocomputing Platform >> Ontario Institute for Cancer Research >> 101 College St., Suite 800 >> Toronto, ON, Canada M5G0A3 >> 416 673-8514 >> Assistant: Renata Musa >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/)
216-392-3087 Ontario Institute for Cancer Research From lincoln.stein at gmail.com Mon Apr 11 15:06:01 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 11 Apr 2011 15:06:01 -0400 Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures In-Reply-To: References: Message-ID: Hi Scott, I'm fixing it now. What branch of bioperl is leading up to the release? Lincoln On Mon, Apr 11, 2011 at 3:02 PM, Scott Cain wrote: > Or, much better, perhaps there is a way to change the default engine > via a query, so it could be set to myisam before the table creation > starts. > > Scott > > > On Mon, Apr 11, 2011 at 3:00 PM, Scott Cain wrote: > > I know. I guess the question is, how far back is the "ENGINE" syntax > > supported. Could we just use that syntax and tell people using older > > (I think probably much older) MySQL servers to upgrade? > > > > Scott > > > > > > On Mon, Apr 11, 2011 at 2:55 PM, Lincoln Stein > wrote: > >> Oh gee, what can I do? I will have to hard-code testing for the > particular > >> version of mysql in order to fix this. > >> > >> Lincoln > >> > >> On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain > wrote: > >> > >>> Hi Lincoln, > >>> > >>> Last week you fixed a problem with the way Bio::DB::GFF created MySQL > >>> tables, removing the "type=MYISAM" from the declaration. The problem > >>> now is that when I try to create a new Bio::DB::GFF database, I get > >>> this error: > >>> > >>> The used table type doesn't support FULLTEXT indexes > >>> > >>> when trying to create the fattribute_to_feature table. It looks like > >>> the default engine is InnoDB, which doesn't support full text > >>> searching. Of course, if I add "ENGINE MYISAM" to the end of the > >>> query it works. This is with mysql 5.5.9. When the table creation > >>> fails, obviously most of the tests fail when testing against MySQL. 
> >>> > >>> Scott > >>> > >>> > >>> -- > >>> > ------------------------------------------------------------------------ > >>> Scott Cain, Ph. D. scott at scottcain > dot > >>> net > >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >>> Ontario Institute for Cancer Research > >>> > >> > >> > >> > >> -- > >> Lincoln D. Stein > >> Director, Informatics and Biocomputing Platform > >> Ontario Institute for Cancer Research > >> 101 College St., Suite 800 > >> Toronto, ON, Canada M5G0A3 > >> 416 673-8514 > >> Assistant: Renata Musa > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From scott at scottcain.net Mon Apr 11 15:07:38 2011 From: scott at scottcain.net (Scott Cain) Date: Mon, 11 Apr 2011 15:07:38 -0400 Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures In-Reply-To: References: Message-ID: Hi Lincoln, I'm reasonably sure Chris said he is going to release off master. Scott On Mon, Apr 11, 2011 at 3:06 PM, Lincoln Stein wrote: > Hi Scott, > I'm fixing it now. What branch of bioperl is leading up to the release? 
> Lincoln > > On Mon, Apr 11, 2011 at 3:02 PM, Scott Cain wrote: >> >> Or, much better, perhaps there is a way to change the default engine >> via a query, so it could be set to myisam before the table creation >> starts. >> >> Scott >> >> >> On Mon, Apr 11, 2011 at 3:00 PM, Scott Cain wrote: >> > I know. I guess the question is, how far back is the "ENGINE" syntax >> > supported. Could we just use that syntax and tell people using older >> > (I think probably much older) MySQL servers to upgrade? >> > >> > Scott >> > >> > >> > On Mon, Apr 11, 2011 at 2:55 PM, Lincoln Stein >> > wrote: >> >> Oh gee, what can I do? I will have to hard-code testing for the >> >> particular >> >> version of mysql in order to fix this. >> >> >> >> Lincoln >> >> >> >> On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain >> >> wrote: >> >> >> >>> Hi Lincoln, >> >>> >> >>> Last week you fixed a problem with the way Bio::DB::GFF created MySQL >> >>> tables, removing the "type=MYISAM" from the declaration. The problem >> >>> now is that when I try to create a new Bio::DB::GFF database, I get >> >>> this error: >> >>> >> >>> The used table type doesn't support FULLTEXT indexes >> >>> >> >>> when trying to create the fattribute_to_feature table. It looks like >> >>> the default engine is InnoDB, which doesn't support full text >> >>> searching. Of course, if I add "ENGINE MYISAM" to the end of the >> >>> query it works. This is with mysql 5.5.9. When the table creation >> >>> fails, obviously most of the tests fail when testing against MySQL. >> >>> >> >>> Scott >> >>> >> >>> >> >>> -- >> >>> >> >>> ------------------------------------------------------------------------ >> >>> Scott Cain, Ph. D. scott at >> >>> scottcain dot >> >>> net >> >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> >>> Ontario Institute for Cancer Research >> >>> >> >> >> >> >> >> >> >> -- >> >> Lincoln D.
Stein >> >> Director, Informatics and Biocomputing Platform >> >> Ontario Institute for Cancer Research >> >> 101 College St., Suite 800 >> >> Toronto, ON, Canada M5G0A3 >> >> 416 673-8514 >> >> Assistant: Renata Musa >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > >> > >> > >> > -- >> > ------------------------------------------------------------------------ >> > Scott Cain, Ph. D. scott at scottcain >> > dot net >> > GMOD Coordinator (http://gmod.org/) 216-392-3087 >> > Ontario Institute for Cancer Research >> > >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Mon Apr 11 15:12:49 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Apr 2011 14:12:49 -0500 Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures In-Reply-To: References: Message-ID: <98C110D9-45E2-41C3-94B8-D8D32653137C@illinois.edu> Yes, go off master. chris On Apr 11, 2011, at 2:07 PM, Scott Cain wrote: > Hi Lincoln, > > I'm reasonably sure Chris said he is going to release off master.
> > Scott > > > On Mon, Apr 11, 2011 at 3:06 PM, Lincoln Stein wrote: >> Hi Scott, >> I'm fixing it now. What branch of bioperl is leading up to the release? >> Lincoln >> >> On Mon, Apr 11, 2011 at 3:02 PM, Scott Cain wrote: >>> >>> Or, much better, perhaps there is a way to change the default engine >>> via a query, so it could be set to myisam before the table creation >>> starts. >>> >>> Scott >>> >>> >>> On Mon, Apr 11, 2011 at 3:00 PM, Scott Cain wrote: >>>> I know. I guess the question is, how far back is the "ENGINE" syntax >>>> supported. Could we just use that syntax and tell people using older >>>> (I think probably much older) MySQL servers to upgrade? >>>> >>>> Scott >>>> >>>> >>>> On Mon, Apr 11, 2011 at 2:55 PM, Lincoln Stein >>>> wrote: >>>>> Oh gee, what can I do? I will have to hard-code testing for the >>>>> particular >>>>> version of mysql in order to fix this. >>>>> >>>>> Lincoln >>>>> >>>>> On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain >>>>> wrote: >>>>> >>>>>> Hi Lincoln, >>>>>> >>>>>> Last week you fixed a problem with the way Bio::DB::GFF created MySQL >>>>>> tables, removing the "type=MYISAM" from the declaration. The problem >>>>>> now is that when I try to create a new Bio::DB::GFF database, I get >>>>>> this error: >>>>>> >>>>>> The used table type doesn't support FULLTEXT indexes >>>>>> >>>>>> when trying to create the fattribute_to_feature table. It looks like >>>>>> the default engine is InnoDB, which doesn't support full text >>>>>> searching. Of course, if I add "ENGINE MYISAM" to the end of the >>>>>> query it works. This is with mysql 5.5.9. When the table creation >>>>>> fails, obviously most of the tests fail when testing against MySQL. >>>>>> >>>>>> Scott >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> ------------------------------------------------------------------------ >>>>>> Scott Cain, Ph. D. 
scott at >>>>>> scottcain dot >>>>>> net >>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>> Ontario Institute for Cancer Research >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Lincoln D. Stein >>>>> Director, Informatics and Biocomputing Platform >>>>> Ontario Institute for Cancer Research >>>>> 101 College St., Suite 800 >>>>> Toronto, ON, Canada M5G0A3 >>>>> 416 673-8514 >>>>> Assistant: Renata Musa >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> >>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. scott at scottcain >>>> dot net >>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>> Ontario Institute for Cancer Research >>>> >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain >>> dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >> >> >> >> -- >> Lincoln D. Stein >> Director, Informatics and Biocomputing Platform >> Ontario Institute for Cancer Research >> 101 College St., Suite 800 >> Toronto, ON, Canada M5G0A3 >> 416 673-8514 >> Assistant: Renata Musa >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon Apr 11 15:16:23 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Apr 2011 14:16:23 -0500
Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures
In-Reply-To:
References:
Message-ID: <8112935C-4962-4BB6-BC66-B1E40BB235E0@illinois.edu>

'show engines' gives a list with the default noted.

http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html

Not sure if this is available for older versions, but it looks as if it goes back to at least v 4.1 :)

chris

On Apr 11, 2011, at 2:02 PM, Scott Cain wrote:

> Or, much better, perhaps there is a way to change the default engine
> via a query, so it could be set to myisam before the table creation
> starts.
>
> Scott
>
>
> On Mon, Apr 11, 2011 at 3:00 PM, Scott Cain wrote:
>> I know. I guess the question is, how far back is the "ENGINE" syntax
>> supported. Could we just use that syntax and tell people using older
>> (I think probably much older) MySQL servers to upgrade?
>>
>> Scott
>>
>>
>> On Mon, Apr 11, 2011 at 2:55 PM, Lincoln Stein wrote:
>>> Oh gee, what can I do? I will have to hard-code testing for the particular
>>> version of mysql in order to fix this.
>>>
>>> Lincoln
>>>
>>> On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain wrote:
>>>
>>>> Hi Lincoln,
>>>>
>>>> Last week you fixed a problem with the way Bio::DB::GFF created MySQL
>>>> tables, removing the "type=MYISAM" from the declaration. The problem
>>>> now is that when I try to create a new Bio::DB::GFF database, I get
>>>> this error:
>>>>
>>>> The used table type doesn't support FULLTEXT indexes
>>>>
>>>> when trying to create the fattribute_to_feature table.
It looks like >>>> the default engine is InnoDB, which doesn't support full text >>>> searching. Of course, if I add "ENGINE MYISAM" to the end of the >>>> query it works. This is with mysql 5.5.9. When the table creation >>>> fails, obviously most of the tests fail when testing against MySQL. >>>> >>>> Scott >>>> >>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. scott at scottcain dot >>>> net >>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>> Ontario Institute for Cancer Research >>>> >>> >>> >>> >>> -- >>> Lincoln D. Stein >>> Director, Informatics and Biocomputing Platform >>> Ontario Institute for Cancer Research >>> 101 College St., Suite 800 >>> Toronto, ON, Canada M5G0A3 >>> 416 673-8514 >>> Assistant: Renata Musa >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From lincoln.stein at gmail.com Mon Apr 11 15:20:13 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 11 Apr 2011 15:20:13 -0400
Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures
In-Reply-To: <8112935C-4962-4BB6-BC66-B1E40BB235E0@illinois.edu>
References: <8112935C-4962-4BB6-BC66-B1E40BB235E0@illinois.edu>
Message-ID:

The deprecation of TYPE= happened at mysql version 4.1, and so I am checking using a select version() followed by either TYPE=MYISAM or ENGINE=MYISAM. These changes are now committed.

Lincoln

On Mon, Apr 11, 2011 at 3:16 PM, Chris Fields wrote:

> 'show engines' gives a list with the default noted.
>
> http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html
>
> Not sure if this is available for older versions, but it looks as if it goes
> back to at least v 4.1 :)
>
> chris
>
> On Apr 11, 2011, at 2:02 PM, Scott Cain wrote:
>
> > Or, much better, perhaps there is a way to change the default engine
> > via a query, so it could be set to myisam before the table creation
> > starts.
> >
> > Scott
> >
> >
> > On Mon, Apr 11, 2011 at 3:00 PM, Scott Cain wrote:
> >> I know. I guess the question is, how far back is the "ENGINE" syntax
> >> supported. Could we just use that syntax and tell people using older
> >> (I think probably much older) MySQL servers to upgrade?
> >>
> >> Scott
> >>
> >>
> >> On Mon, Apr 11, 2011 at 2:55 PM, Lincoln Stein wrote:
> >>> Oh gee, what can I do? I will have to hard-code testing for the particular
> >>> version of mysql in order to fix this.
> >>> > >>> Lincoln > >>> > >>> On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain > wrote: > >>> > >>>> Hi Lincoln, > >>>> > >>>> Last week you fixed a problem with the way Bio::DB::GFF created MySQL > >>>> tables, removing the "type=MYISAM" from the declaration. The problem > >>>> now is that when I try to create a new Bio::DB::GFF database, I get > >>>> this error: > >>>> > >>>> The used table type doesn't support FULLTEXT indexes > >>>> > >>>> when trying to create the fattribute_to_feature table. It looks like > >>>> the default engine is InnoDB, which doesn't support full text > >>>> searching. Of course, if I add "ENGINE MYISAM" to the end of the > >>>> query it works. This is with mysql 5.5.9. When the table creation > >>>> fails, obviously most of the tests fail when testing against MySQL. > >>>> > >>>> Scott > >>>> > >>>> > >>>> -- > >>>> > ------------------------------------------------------------------------ > >>>> Scott Cain, Ph. D. scott at > scottcain dot > >>>> net > >>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >>>> Ontario Institute for Cancer Research > >>>> > >>> > >>> > >>> > >>> -- > >>> Lincoln D. Stein > >>> Director, Informatics and Biocomputing Platform > >>> Ontario Institute for Cancer Research > >>> 101 College St., Suite 800 > >>> Toronto, ON, Canada M5G0A3 > >>> 416 673-8514 > >>> Assistant: Renata Musa > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. scott at scottcain > dot net > >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> Ontario Institute for Cancer Research > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. 
scott at scottcain dot net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From jason at bioperl.org Mon Apr 11 21:15:33 2011
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 11 Apr 2011 18:15:33 -0700
Subject: [Bioperl-l] SeqIO STDOUT changed?
Message-ID: <4DA3A7B5.8070108@bioperl.org>

I noticed I am now getting errors with this code:

  my $out = Bio::SeqIO->new(-format => 'fasta');

that previously would have defaulted to STDOUT when write_seq was used. This causes errors in existing scripts (e.g. bp_sreformat and others).

$ bp_sreformat.pl -if fasta -of genbank -i multifa.seq
Unknown sequence format to bioperl genbank:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: No file, fh, or string argument provided
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/Bio/Root/Root.pm:472
STACK: Bio::SeqIO::new /usr/local/lib/perl5/Bio/SeqIO.pm:369
STACK: /usr/local/bin/bp_sreformat.pl:124
-----------------------------------------------------------

Was this an intentional API change or something else gone wrong?
This would be my proposed fix back --

core (master)]$ git diff
diff --git a/Bio/SeqIO.pm b/Bio/SeqIO.pm
index 83bab58..9b2ba02 100644
--- a/Bio/SeqIO.pm
+++ b/Bio/SeqIO.pm
@@ -363,11 +363,14 @@ sub new {
     my %param = @args;
     @param{ map { lc $_ } keys %param } = values %param; # lowercase keys


-    $class->throw("file argument provided, but with an undefined value") if exists($param{'-file'});
-    $class->throw("fh argument provided, but with an undefined value") if (exists($param{'-fh'}));
-    $class->throw("No file, fh, or string argument provided"); # neither defined
-  }
+    $class->throw("file argument provided, but with an undefined value") if exists($param{'-file'});
+    $class->throw("fh argument provided, but with an undefined value") if (exists($param{'-fh'}));
+    $class->throw("string argument provided, but with an undefined value") if (exists($param{'-string'}));
+    # $class->throw("No file, fh, or string argument provided"); # neither defined
+  }

     my $format = $param{'-format'} ||
         $class->_guess_format( $param{-file} || $ARGV[0] );

-jason
--
Jason Stajich
jason at bioperl.org
http://bioperl.org

From cjfields at illinois.edu Mon Apr 11 22:12:07 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Apr 2011 21:12:07 -0500
Subject: [Bioperl-l] SeqIO STDOUT changed?
In-Reply-To: <4DA3A7B5.8070108@bioperl.org>
References: <4DA3A7B5.8070108@bioperl.org>
Message-ID:

Jason,

That would be unintentional. I'm wondering (from a consistency standpoint) whether we should be doing this check within Bio::SeqIO or in Bio::Root::IO, though; the latter would at least be consistent between all the various *IO implementations, just not sure how easy it would be to implement.

chris

On Apr 11, 2011, at 8:15 PM, Jason Stajich wrote:

> I noticed I am now getting errors with this code:
> my $out = Bio::SeqIO->new(-format => 'fasta');
> that previously would have defaulted to STDOUT when write_seq was used. This causes errors in existing scripts (e.g. bp_sreformat and others).
> > $ bp_sreformat.pl -if fasta -of genbank -i multifa.seq > Unknown sequence format to bioperl genbank: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: No file, fh, or string argument provided > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/lib/perl5/Bio/Root/Root.pm:472 > STACK: Bio::SeqIO::new /usr/local/lib/perl5/Bio/SeqIO.pm:369 > STACK: /usr/local/bin/bp_sreformat.pl:124 > ----------------------------------------------------------- > > Was this an intentional API change or something else gone wrong? This would be my proposed fix back -- > core (master)]$ git diff > diff --git a/Bio/SeqIO.pm b/Bio/SeqIO.pm > index 83bab58..9b2ba02 100644 > --- a/Bio/SeqIO.pm > +++ b/Bio/SeqIO.pm > @@ -363,11 +363,14 @@ sub new { > my %param = @args; > @param{ map { lc $_ } keys %param } = values %param; # lowercase keys > > > - $class->throw("file argument provided, but with an undefined value") if exists($param{'-file'}); > - $class->throw("fh argument provided, but with an undefined value") if (exists($param{'-fh'})); > - $class->throw("No file, fh, or string argument provided"); # neither defined > - } > + $class->throw("file argument provided, but with an undefined value") if exists($param{'-file'}); > + $class->throw("fh argument provided, but with an undefined value") if (exists($param{'-fh'})); > + $class->throw("string argument provided, but with an undefined value") if (exists($param{'-string'})); > + # $class->throw("No file, fh, or string argument provided"); # neither defined > + } > > my $format = $param{'-format'} || > $class->_guess_format( $param{-file} || $ARGV[0] ); > > -jason > -- > Jason Stajich > jason at bioperl.org > http://bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Apr 12 00:19:14 2011 From: scott at scottcain.net (Scott Cain) Date: Tue, 12 Apr 2011 
00:19:14 -0400
Subject: [Bioperl-l] Bio::DB::GFF table creation problems/test failures
In-Reply-To:
References: <8112935C-4962-4BB6-BC66-B1E40BB235E0@illinois.edu>
Message-ID:

Hi Lincoln,

I made a minor modification to this so that it wouldn't complain about version numbers like 5.5.9:

https://github.com/bioperl/bioperl-live/commit/5ff758a1a8efc5f43521e6ade3bf62e724b98dcb

Scott

On Monday, April 11, 2011, Lincoln Stein wrote:
> The deprecation of TYPE= happened at mysql version 4.1, and so I am checking using a select version() followed by either TYPE=MYISAM or ENGINE=MYISAM. These changes are now committed.
> Lincoln
>
> On Mon, Apr 11, 2011 at 3:16 PM, Chris Fields wrote:
>
> 'show engines' gives a list with the default noted.
>
> http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html
>
> Not sure if this is available for older versions, but it looks as if it goes back to at least v 4.1 :)
>
> chris
>
> On Apr 11, 2011, at 2:02 PM, Scott Cain wrote:
>
>> Or, much better, perhaps there is a way to change the default engine
>> via a query, so it could be set to myisam before the table creation
>> starts.
>>
>> Scott
>>
>>
>> On Mon, Apr 11, 2011 at 3:00 PM, Scott Cain wrote:
>>> I know.  I guess the question is, how far back is the "ENGINE" syntax
>>> supported.  Could we just use that syntax and tell people using older
>>> (I think probably much older) MySQL servers to upgrade?
>>>
>>> Scott
>>>
>>>
>>> On Mon, Apr 11, 2011 at 2:55 PM, Lincoln Stein wrote:
>>>> Oh gee, what can I do? I will have to hard-code testing for the particular
>>>> version of mysql in order to fix this.
>>>>
>>>> Lincoln
>>>>
>>>> On Mon, Apr 11, 2011 at 2:37 PM, Scott Cain wrote:
>>>>
>>>>> Hi Lincoln,
>>>>>
>>>>> Last week you fixed a problem with the way Bio::DB::GFF created MySQL
>>>>> tables, removing the "type=MYISAM" from the declaration.  The problem
>>>>> now is that when I try to create a new Bio::DB::GFF database, I get
>>>>> this error:
>>>>>
>>>>>  The used table type doesn't support FULLTEXT indexes
>>>>>
>>>>> when trying to create the fattribute_to_feature table.  It looks like
>>>>> the default engine is InnoDB, which doesn't support full text
>>>>> searching.  Of course, if I add "ENGINE MYISAM" to the end of the
>>>>> query it works.  This is with mysql 5.5.9.  When the table creation
>>>>> fails, obviously most of the tests fail when testing against MySQL.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> --
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D.                          scott at scottcain dot net
>>>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>>>> Ontario Institute for Cancer Research
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Lincoln D. Stein
>>>> Director, Informatics and Biocomputing Platform
>>>> Ontario Institute for Cancer Research
>>>> 101 College St., Suite 800
>>>> Toronto, ON, Canada M5G0A3
>>>> 416 673-8514
>>>> Assistant: Renata Musa
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                          scott at scottcain dot net
>>> GMOD Coordinator (http://gmod.org/)
--
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa
>
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From qian.977228 at gmail.com Mon Apr 11 23:07:27 2011
From: qian.977228 at gmail.com (Qian Zhao)
Date: Tue, 12 Apr 2011 11:07:27 +0800
Subject: [Bioperl-l] Some questions about the Bio::PopGen
Message-ID:

Hi,

Recently I have been learning how to calculate pi, Fst, and Tajima's D using Bio::PopGen. I am not familiar with Perl, and I am confused by the following problems.

(1) I use Bio::PopGen::Statistics to calculate pi. The data I used are:

__DATA__
01 A01 A
01 A02 A
01 A03 A
01 A04 A
01 A05 A
02 A01 A
02 A02 T
02 A03 T
02 A04 T
02 A05 T
03 A01 G
03 A02 G
03 A03 G
03 A04 G
03 A05 G
04 A01 G
04 A02 G
04 A03 C
04 A04 C
04 A05 G
05 A01 T
05 A02 C
05 A03 T
05 A04 T
05 A05 T

I am not sure whether the sequences below correctly represent the prettybase data above:

>A01
AAGGT
>A02
ATGGC
>A03
ATGCT
>A04
ATGCT
>A05
ATGGT

Bio::PopGen::Statistics gives pi = 1.4, but DnaSP gives pi = 0.28. I noticed that 1.4/5 = 0.28, so dividing the value from Bio::PopGen::Statistics by the number of individuals gives exactly the DnaSP result. Is there something wrong in my Perl script?
The code I used is below:

#!/usr/bin/perl -w
use warnings;
use strict;
use Bio::PopGen::Genotype;
my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
                                          -individual_id => '001',
                                          -alleles       => ['1','5'] );
use Bio::PopGen::Individual;
my $ind = Bio::PopGen::Individual->new(-unique_id => '001',
                                       -genotypes => [$genotype] );
$ind->add_Genotype( Bio::PopGen::Genotype->new(-alleles => ['1', '5'], -marker_name => 'gene_1') );
$ind->add_Genotype( Bio::PopGen::Genotype->new(-alleles => ['1', '5'], -marker_name => 'gene_1') );
$ind->add_Genotype( Bio::PopGen::Genotype->new(-alleles => ['1', '5'], -marker_name => 'gene_1') );
$ind->add_Genotype( Bio::PopGen::Genotype->new(-alleles => ['1', '5'], -marker_name => 'gene_1') );
use Bio::PopGen::Population;
my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
                                       -description => 'description',
                                       -individuals => [$ind] );
use Bio::PopGen::IO;
use Bio::PopGen::Statistics;
my $nummarkers = $pop->get_marker_names;
my $stats = Bio::PopGen::Statistics->new();
my $io = Bio::PopGen::IO->new (-format => 'prettybase',
                               -file   => '1.txt');
if( my $pop = $io->next_population ) {
    my $pi = $stats->pi($pop, $nummarkers);
    print "pi is $pi\n";
    my @inds;
    for my $ind ( $pop->get_Individuals ) {
        if( $ind->unique_id =~ /A0[1-3]/ ) {
            push @inds, $ind;
        }
    }
    print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
}

(2) I want to use Bio::PopGen::Utilities to convert an alignment file into a population file. However, I cannot find any result file after the program runs. I use the following code:

use Bio::PopGen::Utilities;
use Bio::AlignIO;
my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
                           -format => 'clustalw');
my $aln = $in->next_aln;
my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => 'cod',
                                                       -alignment  => $aln);

I am not sure where I should add the name of my result file in the code.
(3) My file contains many individual sequences, with one genotype per individual. How can I use Bio::PopGen::Individual, Bio::PopGen::Population and Bio::PopGen::Genotype to create a file that can be used by Bio::PopGen::Statistics?

I would greatly appreciate any answers.

Thanks for your time, and best wishes!

Qian

From bubli_thakur at rediffmail.com Tue Apr 12 08:07:12 2011
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: 12 Apr 2011 12:07:12 -0000
Subject: [Bioperl-l] Problem with ka/ks ratio
Message-ID: <20110412120712.58895.qmail@f4mail207.rediffmail.com>

Hi all,

I am running the following Perl script to generate the Ka/Ks ratio using PAML:

----------------------------------------------------------------
#!perl -w
use strict;
BEGIN { $ENV{CLUSTALDIR} = '/usr/local/bin' }
BEGIN { $ENV{PAMLDIR} = '/root/Desktop/paml44/bin' }
# $Id: pairwise_kaks.PLS 15088 2008-12-04 02:49:09Z bosborne $
# Author Jason Stajich <jason-at-bioperl-dot-org>

=head1 NAME

pairwise_kaks - script to calculate pairwise Ka,Ks for a set of sequences

=head1 SYNOPSIS

pairwise_kaks.PLS -i t/data/worm_fam_2785.cdna [-f fasta/genbank/embl...] [-msa tcoffee/clustal] [-kaks yn00/codeml]

=head1 DESCRIPTION

This script will take as input a dataset of cDNA sequences, verify
that they contain no stop codons, align them in protein space,
project the alignment back into cDNA and estimate the Ka
(non-synonymous) and Ks (synonymous) substitutions based on the ML
method of Yang with the PAML package.

Requires:
 * bioperl-run package
 * PAML program codeml or yn00
 * Multiple sequence alignment programs Clustalw OR T-Coffee

Often there are specific parameters you want to run when you are
computing Ka/Ks ratios, so consider this script a starting point and
do not rely on it for every situation.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l at bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR

Jason Stajich jason-at-bioperl-dot-org

=cut

eval {
    # Ka/Ks estimators
    require Bio::Tools::Run::Phylo::PAML::Codeml;
    require Bio::Tools::Run::Phylo::PAML::Yn00;

    # Multiple Sequence Alignment programs
    require Bio::Tools::Run::Alignment::Clustalw;
    require Bio::Tools::Run::Alignment::TCoffee;
};
if( $@ ) {
    die("Must have bioperl-run pkg installed to run this script");
}

# for projecting alignments from protein to R/DNA space
use Bio::Align::Utilities qw(aa_to_dna_aln);
# for input of the sequence data
use Bio::SeqIO;
use Bio::AlignIO;
# for the command line argument parsing
use Getopt::Long;

my ($aln_prog, $kaks_prog,$format, $output, $cdna,$verbose,$help) = qw(clustalw codeml fasta);

GetOptions(
    'i|input:s'  => \$cdna,
    'o|output:s' => \$output,
    'f|format:s' => \$format,
    'msa:s'      => \$aln_prog,
    'kaks:s'     => \$kaks_prog,
    'v|verbose'  => \$verbose,
    'h|help'     => \$help,
);

if( $help ) {
    exec('perldoc',$0);
    exit(0);
}
$verbose = -1 unless $verbose;

my ($aln_factory,$kaks_factory);
if( $aln_prog =~ /clus/i ) {
    $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(-verbose => $verbose);
} elsif( $aln_prog =~ /t\_?cof/i ) {
    $aln_factory = Bio::Tools::Run::Alignment::TCoffee->new(-verbose => $verbose);
} else {
    warn("Did not provide either 'clustalw' or 'tcoffee' as alignment program names");
    exit(0);
}
unless( $aln_factory->executable ) {
    warn("Could not find the executable for $aln_prog, make sure you have installed it and have either set ".uc($aln_prog)."DIR or it is in your PATH");
    exit(0);
}

if( $kaks_prog =~ /yn00/i ) {
    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Yn00->new(-verbose => $verbose);
} elsif( $kaks_prog =~ /codeml/i ) {
    # change the parameters here if you want to tweak your Codeml running!
    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params  => { 'runmode' => -2,
                       'seqtype' => 1,
                     }
        );
}
unless ( $kaks_factory->executable ) {
    warn("Could not find the executable for $kaks_prog, make sure you have installed it and you have defined PAMLDIR or it is in your PATH");
    exit(0);
}

unless ( $cdna && -f $cdna && -r $cdna && ! -z $cdna ) {
    warn("Did not specify a valid cDNA sequence file as input");
    exit(0);
}

my $seqin = Bio::SeqIO->new(-file   => $cdna,
                            -format => $format);

my %seqs;
my @prots;
while( my $seq = $seqin->next_seq ) {
    $seqs{$seq->display_id} = $seq;
    my $protein = $seq->translate();
    my $pseq = $protein->seq();

    $pseq =~ s/\*$//;
    if( $pseq =~ /\*/ ) {
        warn("provided a cDNA (".$seq->display_id.") sequence with a stop codon, PAML will choke!");
        exit(0);
    }
    # Tcoffee can't handle '*'
    $pseq =~ s/\*//g;
    $protein->seq($pseq);
    push @prots, $protein;
}
if( @prots < 2 ) {
    warn("Need at least 2 cDNA sequences to proceed");
    exit(0);
}

local * OUT;
if( $output ) {
    open(OUT, ">$output") || die("cannot open output $output for writing");
} else {
    *OUT = *STDOUT;
}

my $aa_aln = $aln_factory->align(\@prots);
my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
my @each = $dna_aln->each_seq();

$kaks_factory->alignment($dna_aln);

my ($rc,$parser) = $kaks_factory->run();
if( $rc <= 0 ) {
    warn($kaks_factory->error_string,"\n");
    exit;
}
my $result = $parser->next_result;

if ($result->version =~ m/3\.15/) {
    warn("This script does not work with v3.15 of PAML! Please use 3.14 instead.");
    exit(0);
}

my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();

my @pos = map {
    my $c = 1;
    foreach my $s ( @each ) {
        last if( $s->display_id eq $_->display_id );
        $c++;
    }
    $c;
} @otus;

print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID CDNA_PERCENTID)), "\n";
for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
    for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
        my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
        my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
        print OUT join("\t",
                       $otus[$i]->display_id,
                       $otus[$j]->display_id,
                       $MLmatrix->[$i]->[$j]->{'dN'},
                       $MLmatrix->[$i]->[$j]->{'dS'},
                       $MLmatrix->[$i]->[$j]->{'omega'},
                       sprintf("%.2f",$sub_aa_aln->percentage_identity),
                       sprintf("%.2f",$sub_dna_aln->percentage_identity),
                      ), "\n";
    }
}
-----------------------------------------------------------------

After running the code on the following FASTA file:

>seq1
TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTA
AGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGAT
GATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
>seq2
TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTC
AGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGAT
GACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
---------------------------------------------

I get only the header line as the result:

SEQ1  SEQ2  Ka  Ks  Ka/Ks  PROT_PERCENTID  CDNA_PERCENTID

No values are printed. Can anybody help me figure out the problem?

Thanks,
Subarna

From chapmanb at 50mail.com Tue Apr 12 08:35:31 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 12 Apr 2011 08:35:31 -0400
Subject: [Bioperl-l] Bioinformatics Open Source Conference (BOSC 2011)--Abstracts due April 18th!
Message-ID: <20110412123531.GD2105@kunkel>

Only one week left to submit an abstract to BOSC 2011!
We have two great keynote speakers lined up (Lawrence Hunter and Matt Wood) and session topics that include parallel and cloud-based approaches to bioinformatics, genome content management, and tools for next-generation sequencing. We'd love to hear about your Open Source bioinformatics project! The 12th Annual Bioinformatics Open Source Conference (BOSC 2011) An ISMB 2011 Special Interest Group (SIG) July 15-16, 2011, in Vienna, Austria http://www.open-bio.org/wiki/BOSC_2011 Important Dates: April 18, 2011: Deadline for submitting abstracts to BOSC 2011 May 9, 2011: Notifications of accepted abstracts emailed to corresponding authors July 13-14, 2011: Codefest 2011 programming session (see http://www.open-bio.org/wiki/Codefest_2011 for details) July 15-16, 2011: BOSC 2011 July 17-19, 2011: ISMB 2011 The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. To be considered for acceptance, software systems representing the central topic in a presentation submitted to BOSC must be licensed with a recognized Open Source License, and be freely available for download in source code form. We invite you to submit abstracts for talks and posters. Sessions include: - Approaches to parallel processing - Cloud-based approaches to improving software and data accessibility - The Semantic Web in open source bioinformatics - Data visualization - Tools for next-generation sequencing - Other Open Source software In addition to the above sessions, there will be a panel discussion about "Meeting the challenges of inter-institutional collaboration". We are also working to arrange a joint session with one of the other ISMB SIGs. Thanks to generous sponsorship from Eagle Genomics and an anonymous donor, we are pleased to announce a competition for three Student Travel Awards for BOSC 2011. 
Each winner will be awarded $250 to defray the costs of travel to BOSC 2011. All students whose abstracts are accepted for talks will be considered for this award. For instructions on submitting your abstract, please visit http://www.open-bio.org/wiki/BOSC_2011#Abstract_Submission_Information BOSC 2011 Organizing Committee: Nomi Harris and Peter Rice (co-chairs); Brad Chapman, Peter Cock, Erwin Frise, Darin London, Ron Taylor From richard.harrison at edinburgh.ac.uk Tue Apr 12 08:15:52 2011 From: richard.harrison at edinburgh.ac.uk (Richard Harrison) Date: Tue, 12 Apr 2011 13:15:52 +0100 Subject: [Bioperl-l] Bug in a simple bioperl graphics program Message-ID: <7AB294C5-42DE-4C68-8FEC-A0FB3ED97641@edinburgh.ac.uk> Hi, I'm using bioperl-live (1.0069) and Bio-Graphics-2.20 and get this error: Can't locate object method "attributes" via package "Bio::SeqFeature::Generic" at /usr/local/share/perl/5.10.1/Bio/Graphics/Glyph.pm line 703, line 7. when running a very simple script to print out a simple feature display (where some contigs are positioned on a larger reference sequence). Using bioperl 1.006 and some unknown version of bio-graphics (I can't remember installing this separately) it works fine. Please let me know if you have any suggestions! All the best, Richard Harrison From scott at scottcain.net Tue Apr 12 09:38:11 2011 From: scott at scottcain.net (Scott Cain) Date: Tue, 12 Apr 2011 09:38:11 -0400 Subject: [Bioperl-l] Clean handling of SQLite choice when running ./Build test Message-ID: Hi Lincoln and Chris, I'm wondering what we should do about the SQLite option for testing when running ./Build test.
Currently (for me at least :-) when I select SQLite as the testing database when running perl Build.PL, the Bio::DB::GFF tests get "dubious" (which is fairly amazing, since there isn't a SQLite adaptor for Bio::DB::GFF), but it runs a different number of tests than planned, thus the dubious. My inclination would be to not include SQLite as a testing option to avoid this problem; otherwise I suppose the Bio::DB::GFF test could be modified to use the memory adaptor when SQLite is the chosen testing database. Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Apr 12 10:09:30 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Apr 2011 09:09:30 -0500 Subject: [Bioperl-l] Clean handling of SQLite choice when running ./Build test In-Reply-To: References: Message-ID: <5051973F-2C39-4048-868B-36BE6B1FD806@illinois.edu> On Apr 12, 2011, at 8:38 AM, Scott Cain wrote: > Hi Lincoln and Chris, > > I'm wondering what we should do about the SQLite option for testing > when running ./Build test. Currently (for me at least :-) when I > select SQLite as the testing database when running perl Build.PL, the > Bio::DB::GFF tests get "dubious" (which is fairly amazing, since there > isn't a SQLite adaptor for Bio::DB::GFF), but it runs a different > number of tests than planned, thus the dubious. My inclination would > be to not include SQLite as a testing option to avoid this problem, > otherwise I suppose the Bio::DB::GFF test could be modified to use the > memory adaptor when SQLite is the chosen testing database. > > Scott I think the latter option (use memory adaptor if SQLite is used) is best; it at least runs the Bio::DB::GFF tests.
This is also uncovering a general problem we should address down the line: when running tests, if a resource isn't available, there should at least be a visible note about the point of failure instead of the tests silently passing. This set of tests was silently passing for me with SQLite selected, and I've been bitten by this with network tests quietly passing when the remote resource is no longer accessible. chris From lincoln.stein at gmail.com Tue Apr 12 10:18:54 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 12 Apr 2011 10:18:54 -0400 Subject: [Bioperl-l] Bug in a simple bioperl graphics program In-Reply-To: <7AB294C5-42DE-4C68-8FEC-A0FB3ED97641@edinburgh.ac.uk> References: <7AB294C5-42DE-4C68-8FEC-A0FB3ED97641@edinburgh.ac.uk> Message-ID: Sorry about the bug. It is fixed in Bio::Graphics version 2.21, which should be appearing on CPAN within the day. Lincoln On Tue, Apr 12, 2011 at 8:15 AM, Richard Harrison <richard.harrison at edinburgh.ac.uk> wrote: > Hi, > I'm using bioperl-live (1.0069) and Bio-Graphics-2.20 and get this error: > > Can't locate object method "attributes" via package > "Bio::SeqFeature::Generic" at > /usr/local/share/perl/5.10.1/Bio/Graphics/Glyph.pm line 703, line 7. > > > when running a very simple script to print out a simple feature display > (where some contigs are positioned on a larger reference sequence). > > Using bioperl 1.006 and some unknown version of bio-graphics (I can't > remember installing this separately) it works fine. > > Please let me know if you have any suggestions! > All the best, > Richard Harrison > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D.
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From lincoln.stein at gmail.com Tue Apr 12 10:21:19 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 12 Apr 2011 10:21:19 -0400 Subject: [Bioperl-l] Clean handling of SQLite choice when running ./Build test In-Reply-To: References: Message-ID: I'd like to maintain the SQLite tests, but Bio::DB::GFF tests should default to "memory" (with a visible warning) whenever an unrecognized adaptor is passed to it. Lincoln On Tue, Apr 12, 2011 at 9:38 AM, Scott Cain wrote: > Hi Lincoln and Chris, > > I'm wondering what we should do about the SQLite option for testing > when running ./Build test. Currently (for me at least :-) when I > select SQLite as the testing database when running perl Build.PL, the > Bio::DB::GFF tests get "dubious" (which is fairly amazing, since there > isn't a SQLite adaptor for Bio::DB::GFF), but it runs a different > number of tests that planned, thus the dubious. My inclination would > be to not include SQLite as a testing option to avoid this problem, > otherwise I suppose the Bio::DB::GFF test could be modified to use the > memory adaptor when SQLite is the chosen testing database. > > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Tue Apr 12 11:13:38 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Apr 2011 10:13:38 -0500 Subject: [Bioperl-l] Clean handling of SQLite choice when running ./Build test In-Reply-To: References: Message-ID: I added some simple code to the tests to catch this (on the master branch). Seems to work; Scott, can you confirm that? chris On Apr 12, 2011, at 8:38 AM, Scott Cain wrote: > Hi Lincoln and Chris, > > I'm wondering what we should do about the SQLite option for testing > when running ./Build test. Currently (for me at least :-) when I > select SQLite as the testing database when running perl Build.PL, the > Bio::DB::GFF tests get "dubious" (which is fairly amazing, since there > isn't a SQLite adaptor for Bio::DB::GFF), but it runs a different > number of tests that planned, thus the dubious. My inclination would > be to not include SQLite as a testing option to avoid this problem, > otherwise I suppose the Bio::DB::GFF test could be modified to use the > memory adaptor when SQLite is the chosen testing database. > > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Apr 12 11:23:39 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 12 Apr 2011 17:23:39 +0200 Subject: [Bioperl-l] Fwd: Q: batched extraction of sub-sequences and their reverse-complements ? 
In-Reply-To: <641125.85561.qm@web28506.mail.ukl.yahoo.com> References: <710602.90088.qm@web28506.mail.ukl.yahoo.com> <641125.85561.qm@web28506.mail.ukl.yahoo.com> Message-ID: ---------- Forwarded message ---------- From: wadim kapulkin Date: Tue, Apr 12, 2011 at 17:13 Subject: Re: [Bioperl-l] Q: batched extraction of sub-sequences and their reverse-complements ? To: Dave Messina Hello Dave, Thank you very much for your response. Indeed, my question might be split as you did :) So first: your suggestion below to use Bio::DB::Fasta should do the trick. Thanks very much! As for the second part: I probably did not explain properly what I had in mind. However, the link you included below seems to address this matter, quoting the excerpted phrase: 'Although coordinate conversion sounds pretty trivial it can get fairly tricky when one includes the possibilities of switching to coordinates on negative (i.e. Crick) strands and/or having a coordinate system terminate because you have reached the end of a clone or contig.'. The issue is indeed in the coordinate conversion. In the specific example I have been concerned with, I used the Cbriggsae chromosomal set to run an external program and found that the output sometimes depends on strand polarity... (this gets even more complicated when using other assemblies/db freezes offering sequences that differ in length). I will need a bit more time to describe this specific example. Thanks very much again. Wadim ------------------------------ *From:* Dave Messina *To:* wadim kapulkin *Cc:* bioperl-l at lists.open-bio.org *Sent:* Sat, 9 April, 2011 4:47:34 *Subject:* Re: [Bioperl-l] Q: batched extraction of sub-sequences and their reverse-complements ? Hi Wadim, I would like to extract the batch of subsequences (as fastas), based on > list of > coordinates : i.e. 1-1000, 1001-2000 , 2001-3000 etc) from given 'large > sequence' > (i.e. chromosome sized >10MB) Take a look at Bio::DB::Fasta.
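To make the coordinate conversion discussed in this thread concrete, here is a minimal pure-Perl sketch, not taken from anyone's actual script. It assumes 1-based, inclusive coordinates, and the helper names (revcomp, to_reverse_coords, subseq) and the toy sequence are hypothetical, chosen only for illustration. As far as I recall from its documentation, Bio::DB::Fasta's seq() method returns the reverse complement when start is greater than stop, which may make the manual conversion unnecessary in practice; the sketch below only shows the arithmetic.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (pure Perl, no BioPerl needed).
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

# Map a 1-based, inclusive forward-strand interval onto the coordinates
# of the reverse-complemented sequence of total length $len.
sub to_reverse_coords {
    my ($start, $end, $len) = @_;
    return ($len - $end + 1, $len - $start + 1);
}

# Extract a 1-based, inclusive subsequence from a string.
sub subseq {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

my $chrom = 'AACCGGTTAC';             # hypothetical toy "chromosome"
my $len   = length $chrom;
my $rc    = revcomp($chrom);

for my $window ([1, 4], [5, 8]) {     # hypothetical coordinate list
    my ($s, $e)   = @$window;
    my ($rs, $re) = to_reverse_coords($s, $e, $len);
    my $fwd = subseq($chrom, $s, $e);
    my $rev = subseq($rc, $rs, $re);
    # $rev is the reverse complement of $fwd: same bases, opposite strand.
    print "$s-$e fwd=$fwd | $rs-$re rev=$rev\n";
}
```

The point of the sketch is that a forward window [s, e] on a sequence of length L corresponds to [L-e+1, L-s+1] on the reverse complement, so the "converse" batch can be derived from the same coordinate list rather than maintained separately. It does not address assemblies that differ in length, where L itself changes.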
> and then, ideally , I would be keen to know how to > extract the converse set - [i.e.: extract 'same' ( I mean corresponding) > batch > of sequences, based on list of converse coordinates from > reverse-complement of > given 'large sequence']. > I don't totally understand this part of your question, but this may help: http://www.bioperl.org/wiki/BioPerl_Tutorial#Converting_coordinate_systems_.28Coordinate::Pair.2C_RelSegment.29 Dave _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Apr 12 11:43:43 2011 From: scott at scottcain.net (Scott Cain) Date: Tue, 12 Apr 2011 11:43:43 -0400 Subject: [Bioperl-l] Clean handling of SQLite choice when running ./Build test In-Reply-To: References: Message-ID: Hi Chris, I was about to do this myself when I saw your email; thanks! It passes all tests now. The only tests that fail for me right now are the Align::Graphics tests. Also, there is a strange warning for Bio::DB::Fasta, but the tests still pass. The message is this: ok 16 indexing was interrupted, so unlinking /var/folders/b2/b2QPieqCF08SwC33tp0hmU+++TI/-Tmp-/CXuyf4rpl7/bad_dbfa/directory.index at Bio/DB/Fasta.pm line 1061. ok 17 - threw Regexp ((?-xism:FASTA header doesn't match)) Scott On Tue, Apr 12, 2011 at 11:13 AM, Chris Fields wrote: > I added some simple code to the tests to catch this (on the master branch). Seems to work; Scott, can you confirm that? > > chris > > On Apr 12, 2011, at 8:38 AM, Scott Cain wrote: > >> Hi Lincoln and Chris, >> >> I'm wondering what we should do about the SQLite option for testing >> when running ./Build test.
Currently (for me at least :-) when I >> select SQLite as the testing database when running perl Build.PL, the >> Bio::DB::GFF tests get "dubious" (which is fairly amazing, since there >> isn't a SQLite adaptor for Bio::DB::GFF), but it runs a different >> number of tests than planned, thus the dubious. My inclination would >> be to not include SQLite as a testing option to avoid this problem, >> otherwise I suppose the Bio::DB::GFF test could be modified to use the >> memory adaptor when SQLite is the chosen testing database. >> >> Scott >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Apr 12 11:59:18 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Apr 2011 10:59:18 -0500 Subject: [Bioperl-l] Clean handling of SQLite choice when running ./Build test In-Reply-To: References: Message-ID: That warning is due to an error check in the test suite, where Bio::DB::Fasta bails if the FASTA file is not indexed correctly (bad data): $test_dbdir = setup_temp_dir('bad_dbfa'); throws_ok {$db = Bio::DB::Fasta->new($test_dbdir, -reindex => 1)} qr/FASTA header doesn't match/; I'm guessing when Bio::DB::Fasta bails with an error it gets rid of the index file in DESTROY, hence the warning.
It's harmless, but we can probably catch that warning and squelch it; let me see what I can come up with. chris On Apr 12, 2011, at 10:43 AM, Scott Cain wrote: > Hi Chris, > > I was about to do this myself when I saw your email; thanks! It > passes all tests now. The only tests that fail for me right now are > the Align::Graphics tests. Also, there is a strange warning for > Bio::DB::Fasta, but the tests still pass. The message is this: > > ok 16 > indexing was interrupted, so unlinking > /var/folders/b2/b2QPieqCF08SwC33tp0hmU+++TI/-Tmp-/CXuyf4rpl7/bad_dbfa/directory.index > at Bio/DB/Fasta.pm line 1061. > ok 17 - threw Regexp ((?-xism:FASTA header doesn't match)) > > > Scott > > > On Tue, Apr 12, 2011 at 11:13 AM, Chris Fields wrote: >> I added some simple code to the tests to catch this (on the master branch). Seems to work; Scott, can you confirm that? >> >> chris >> >> On Apr 12, 2011, at 8:38 AM, Scott Cain wrote: >> >>> Hi Lincoln and Chris, >>> >>> I'm wondering what we should do about the SQLite option for testing >>> when running ./Build test. Currently (for me at least :-) when I >>> select SQLite as the testing database when running perl Build.PL, the >>> Bio::DB::GFF tests get "dubious" (which is fairly amazing, since there >>> isn't a SQLite adaptor for Bio::DB::GFF), but it runs a different >>> number of tests that planned, thus the dubious. My inclination would >>> be to not include SQLite as a testing option to avoid this problem, >>> otherwise I suppose the Bio::DB::GFF test could be modified to use the >>> memory adaptor when SQLite is the chosen testing database. >>> >>> Scott >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. 
scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Apr 12 12:11:04 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Apr 2011 11:11:04 -0500 Subject: [Bioperl-l] Clean handling of SQLite choice when running ./Build test In-Reply-To: References: Message-ID: Warnings for DB::Fasta now fixed (just caught the sig locally and noop'd it). What are the failures for Align::Graphics? chris On Apr 12, 2011, at 10:59 AM, Chris Fields wrote: > That warning is due to an error check in the test suite, where Bio::DB::Fasta bails if the FASTA file is not indexed correctly (bad data): > > $test_dbdir = setup_temp_dir('bad_dbfa'); > throws_ok {$db = Bio::DB::Fasta->new($test_dbdir, -reindex => 1)} qr/FASTA header doesn't match/; > > I'm guessing when Bio::DB::Fasta bails with an error it gets rid of the index file in DESTROY, hence the warning. It's harmless, but we can probably catch that warning and squelch it; let me see what I can come up with. > > chris > > On Apr 12, 2011, at 10:43 AM, Scott Cain wrote: > >> Hi Chris, >> >> I was about to do this myself when I saw your email; thanks! It >> passes all tests now. The only tests that fail for me right now are >> the Align::Graphics tests. Also, there is a strange warning for >> Bio::DB::Fasta, but the tests still pass. The message is this: >> >> ok 16 >> indexing was interrupted, so unlinking >> /var/folders/b2/b2QPieqCF08SwC33tp0hmU+++TI/-Tmp-/CXuyf4rpl7/bad_dbfa/directory.index >> at Bio/DB/Fasta.pm line 1061. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Apr 12 12:45:26 2011 From: scott at scottcain.net (Scott Cain) Date: Tue, 12 Apr 2011 12:45:26 -0400 Subject: [Bioperl-l] GD version required/Align::Graphics (was: Re: Clean handling of SQLite choice when running ./Build test Message-ID: Hi Chris, I hadn't looked at the Align::Graphics failures until you asked (I didn't think of it as my problem :-) It turned out the failures were happening because I didn't have GD::Group. So I used cpan to install GD::Group and it turns out it is distributed with GD now, but presumably not with the version I had (and now, of course, I don't know what version I had). So, I guess the question is, what version started including GD::Group and do we need to add it as a recommends and make it skip when it's not present? Scott On Tue, Apr 12, 2011 at 12:11 PM, Chris Fields wrote: > Warnings for DB::Fasta now fixed (just caught the sig locally and noop'd it). > > What are the failures for Align::Graphics? > > chris > > > On Apr 12, 2011, at 10:59 AM, Chris Fields wrote: > >> That warning is due to an error check in the test suite, where Bio::DB::Fasta bails if the FASTA file is not indexed correctly (bad data): >> >> $test_dbdir = setup_temp_dir('bad_dbfa'); >> throws_ok {$db = Bio::DB::Fasta->new($test_dbdir, -reindex => 1)} qr/FASTA header doesn't match/; >> >> I'm guessing when Bio::DB::Fasta bails with an error it gets rid of the index file in DESTROY, hence the warning. It's harmless, but we can probably catch that warning and squelch it; let me see what I can come up with.
216-392-3087 >>>>> Ontario Institute for Cancer Research >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Apr 12 13:35:12 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Apr 2011 12:35:12 -0500 Subject: [Bioperl-l] GD version required/Align::Graphics (was: Re: Clean handling of SQLite choice when running ./Build test In-Reply-To: References: Message-ID: <26CB1B8B-937F-480C-AD4C-E4CAC1DA8C74@illinois.edu> Looks like GD v2.41. We can either require a specific version of GD or list GD::Group separately (this should probably tie in with what Bio::Graphics requires). chris On Apr 12, 2011, at 11:45 AM, Scott Cain wrote: > Hi Chris, > > I hadn't looked at the Align::Graphics failures until you asked (I > didn't think of it as my problem :-) It turned out the failures were > happening because I didn't have GD::Group. So I used cpan to install > GD::Group and it turns out it is distributed with GD now, but > presumably not with the version I had (and now, of course, I don't > know what version I had).
So, I guess the question is, what version > started including GD::Group and do we need to add it as a recommends > and make it skip when it's not present? > > Scott > > > On Tue, Apr 12, 2011 at 12:11 PM, Chris Fields wrote: >> Warnings for DB::Fasta now fixed (just caught the sig locally and noop'd it). >> >> What are the failures for Align::Graphics? >> >> chris >> >> >> On Apr 12, 2011, at 10:59 AM, Chris Fields wrote: >> >>> That warning is due to an error check in the test suite, where Bio::DB::Fasta bails if the FASTA file is not indexed correctly (bad data): >>> >>> $test_dbdir = setup_temp_dir('bad_dbfa'); >>> throws_ok {$db = Bio::DB::Fasta->new($test_dbdir, -reindex => 1)} qr/FASTA header doesn't match/; >>> >>> I'm guessing when Bio::DB::Fasta bails with an error it gets rid of the index file in DESTROY, hence the warning. It's harmless, but we can probably catch that warning and squelch it; let me see what I can come up with. >>> >>> chris >>> >>> On Apr 12, 2011, at 10:43 AM, Scott Cain wrote: >>> >>>> Hi Chris, >>>> >>>> I was about to do this myself when I saw your email; thanks! It >>>> passes all tests now. The only tests that fail for me right now are >>>> the Align::Graphics tests. Also, there is a strange warning for >>>> Bio::DB::Fasta, but the tests still pass. The message is this: >>>> >>>> ok 16 >>>> indexing was interrupted, so unlinking >>>> /var/folders/b2/b2QPieqCF08SwC33tp0hmU+++TI/-Tmp-/CXuyf4rpl7/bad_dbfa/directory.index >>>> at Bio/DB/Fasta.pm line 1061. >>>> ok 17 - threw Regexp ((?-xism:FASTA header doesn't match)) >>>> >>>> >>>> Scott >>>> >>>> >>>> On Tue, Apr 12, 2011 at 11:13 AM, Chris Fields wrote: >>>>> I added some simple code to the tests to catch this (on the master branch). Seems to work; Scott, can you confirm that? 
scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Apr 12 13:39:03 2011 From: scott at scottcain.net (Scott Cain) Date: Tue, 12 Apr 2011 13:39:03 -0400 Subject: [Bioperl-l] GD version required/Align::Graphics (was: Re: Clean handling of SQLite choice when running ./Build test In-Reply-To: <26CB1B8B-937F-480C-AD4C-E4CAC1DA8C74@illinois.edu> References: <26CB1B8B-937F-480C-AD4C-E4CAC1DA8C74@illinois.edu> Message-ID: Hi Chris, Bio::Graphics only requires GD 2.30 at the moment. Perhaps putting in GD::Group separately would make the most sense. Scott On Tue, Apr 12, 2011 at 1:35 PM, Chris Fields wrote: > Looks like GD v2.41. We can either require a specific version of GD or list GD::Group separately (this should probably tie in with what Bio::Graphics requires). > > chris > > On Apr 12, 2011, at 11:45 AM, Scott Cain wrote: > >> Hi Chris, >> >> I hadn't looked at the Align::Graphics failures until you asked (I >> didn't think of it as my problem :-) It turned out the failures were >> happening because I didn't have GD::Group. So I used cpan to install >> GD::Group and it turns out it is distributed with GD now, but >> presumably not with the version I had (and now, of course, I don't >> know what version I had). So, I guess the question is, what version >> started including GD::Group and do we need to add it as a recommends >> and make it skip when it's not present? >> >> Scott >> >> >> On Tue, Apr 12, 2011 at 12:11 PM, Chris Fields wrote: >>> Warnings for DB::Fasta now fixed (just caught the sig locally and noop'd it). >>> >>> What are the failures for Align::Graphics?
Currently (for me at least :-) when I >>>>>>> select SQLite as the testing database when running perl Build.PL, the >>>>>>> Bio::DB::GFF tests get "dubious" (which is fairly amazing, since there >>>>>>> isn't a SQLite adaptor for Bio::DB::GFF), but it runs a different >>>>>>> number of tests than planned, thus the dubious. My inclination would >>>>>>> be to not include SQLite as a testing option to avoid this problem, >>>>>>> otherwise I suppose the Bio::DB::GFF test could be modified to use the >>>>>>> memory adaptor when SQLite is the chosen testing database. >>>>>>> >>>>>>> Scott >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------------------------------------------------ >>>>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>> Ontario Institute for Cancer Research >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------------ >>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>> Ontario Institute for Cancer Research >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 
216-392-3087 >> Ontario Institute for Cancer Research >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From lincoln.stein at gmail.com Tue Apr 12 14:56:39 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 12 Apr 2011 14:56:39 -0400 Subject: [Bioperl-l] GD version required/Align::Graphics (was: Re: Clean handling of SQLite choice when running ./Build test In-Reply-To: References: <26CB1B8B-937F-480C-AD4C-E4CAC1DA8C74@illinois.edu> Message-ID: Bio::Graphics tests whether GD supports startGroup() and endGroup() before making either call. Lincoln On Tue, Apr 12, 2011 at 1:39 PM, Scott Cain wrote: > Hi Chris, > > Bio::Graphics only requires 2.30 at the moment. Perhaps putting in > GD::Group separately would make the most sense. > > Scott > > > On Tue, Apr 12, 2011 at 1:35 PM, Chris Fields > wrote: > > Looks like GD v2.41. We can either require a specific version of GD or > list GD::Group separately (this should probably tie in with what > Bio::Graphics requires). > > > > chris > > > > On Apr 12, 2011, at 11:45 AM, Scott Cain wrote: > > > >> Hi Chris, > >> > >> I hadn't looked at the Align::Graphics failures until you asked (I > >> didn't think of it as my problem :-) It turned out the failures were > >> happening because I didn't have GD::Group. So I used cpan to install > >> GD::Group and it turns out it is distributed with GD now, but > >> presumably not with the version I had (and now, of course, I don't > >> know what version I had). 
So, I guess the question is, what version > >> started including GD::Group and do we need to add it as a recommends > >> and make it skip when it's not present? > >> > >> Scott > >> > >> > >> On Tue, Apr 12, 2011 at 12:11 PM, Chris Fields > wrote: > >>> Warnings for DB::Fasta now fixed (just caught the sig locally and > noop'd it). > >>> > >>> What are the failures for Align::Graphics? > >>> > >>> chris > >>> > >>> > >>> On Apr 12, 2011, at 10:59 AM, Chris Fields wrote: > >>> > >>>> That warning is due to an error check in the test suite, where > Bio::DB::Fasta bails if the FASTA file is not indexed correctly (bad data): > >>>> > >>>> $test_dbdir = setup_temp_dir('bad_dbfa'); > >>>> throws_ok {$db = Bio::DB::Fasta->new($test_dbdir, -reindex => 1)} > qr/FASTA header doesn't match/; > >>>> > >>>> I'm guessing when Bio::DB::Fasta bails with an error it gets rid of > the index file in DESTROY, hence the warning. It's harmless, but we can > probably catch that warning and squelch it; let me see what I can come up > with. > >>>> > >>>> chris > >>>> > >>>> On Apr 12, 2011, at 10:43 AM, Scott Cain wrote: > >>>> > >>>>> Hi Chris, > >>>>> > >>>>> I was about to do this myself when I saw your email; thanks! It > >>>>> passes all tests now. The only tests that fail for me right now are > >>>>> the Align::Graphics tests. Also, there is a strange warning for > >>>>> Bio::DB::Fasta, but the tests still pass. The message is this: > >>>>> > >>>>> ok 16 > >>>>> indexing was interrupted, so unlinking > >>>>> > /var/folders/b2/b2QPieqCF08SwC33tp0hmU+++TI/-Tmp-/CXuyf4rpl7/bad_dbfa/directory.index > >>>>> at Bio/DB/Fasta.pm line 1061. > >>>>> ok 17 - threw Regexp ((?-xism:FASTA header doesn't match)) > >>>>> > >>>>> > >>>>> Scott > >>>>> > >>>>> > >>>>> On Tue, Apr 12, 2011 at 11:13 AM, Chris Fields < > cjfields at illinois.edu> wrote: > >>>>>> I added some simple code to the tests to catch this (on the master > branch). Seems to work; Scott, can you confirm that? 
> >>>>>> > >>>>>> chris > >>>>>> > >>>>>> On Apr 12, 2011, at 8:38 AM, Scott Cain wrote: > >>>>>> > >>>>>>> Hi Lincoln and Chris, > >>>>>>> > >>>>>>> I'm wondering what we should do about the SQLite option for testing > >>>>>>> when running ./Build test. Currently (for me at least :-) when I > >>>>>>> select SQLite as the testing database when running perl Build.PL, > the > >>>>>>> Bio::DB::GFF tests get "dubious" (which is fairly amazing, since > there > >>>>>>> isn't a SQLite adaptor for Bio::DB::GFF), but it runs a different > >>>>>>> number of tests that planned, thus the dubious. My inclination > would > >>>>>>> be to not include SQLite as a testing option to avoid this problem, > >>>>>>> otherwise I suppose the Bio::DB::GFF test could be modified to use > the > >>>>>>> memory adaptor when SQLite is the chosen testing database. > >>>>>>> > >>>>>>> Scott > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> > ------------------------------------------------------------------------ > >>>>>>> Scott Cain, Ph. D. scott at > scottcain dot net > >>>>>>> GMOD Coordinator (http://gmod.org/) > 216-392-3087 > >>>>>>> Ontario Institute for Cancer Research > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> > ------------------------------------------------------------------------ > >>>>> Scott Cain, Ph. D. scott at > scottcain dot net > >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >>>>> Ontario Institute for Cancer Research > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. 
scott at scottcain > dot net > >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> Ontario Institute for Cancer Research > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From jason at bioperl.org Tue Apr 12 17:00:01 2011 From: jason at bioperl.org (Jason Stajich) Date: Tue, 12 Apr 2011 14:00:01 -0700 Subject: [Bioperl-l] Bug? Phyml parameters In-Reply-To: <4DA324C8.40000@retchless.us> References: <4DA324C8.40000@retchless.us> Message-ID: <4DA4BD51.30300@bioperl.org> Probably best to submit it as a bug to redmine (redmine.open-bio.org) especially as for your first request is more of a feature request -- if you are comfortable with coding you can make changes and submit a pull request via git when you've made changes. As for the second one, I can fix that right away, but it is helpful to have a bug report so we can track things that are in progress so things don't slip through the cracks. Generally speaking I use RAXML or GARLI for my ML trees but hopefully there are users with PHYML interests too. Adam Retchless wrote: > Dear BioPerl crew, > > I noticed some strange behavior in how the Phyml wrapper treats > parameters, and am wondering if there is a reason for this, or if it > is a bug. > > Documentation is here: > http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/Phylo/Phyml.html > > Being new to BioPerl, I'm not exactly sure what is the right way to > address this. 
Let me know if this should just be reported as a bug > rather than writing to the list... > > 1) The major issue is that the wrapper seems to disable the option to > calculate confidence values for the trees (e.g. bootstrap, aLRT). In > the documentation, I see no option to set this parameter, and the > "_setparams" subroutine has a comment line explicitly stating "no > bootstrap sets" > (http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/Phylo/Phyml.html#CODE26). > > A web search revealed no discussion of this point. In fact, there were > several references to the bootstrap values arising from Phyml, making > me think that it used to be enabled (or there is some other way to > enable it). > > 2) While digging into the above issue, I looked at the "new" > subroutine and it looks like the "freq" parameter is mis-assigned. The > value of "freq" is given to the variable $kappa rather than $freq. > (Code here: > http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/Phylo/Phyml.html#CODE1) > > Any information on this (particularly #1) would be appreciated. > > Thanks, > Adam > > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki From jason at bioperl.org Tue Apr 12 17:02:02 2011 From: jason at bioperl.org (Jason Stajich) Date: Tue, 12 Apr 2011 14:02:02 -0700 Subject: [Bioperl-l] SeqIO STDOUT changed? In-Reply-To: References: <4DA3A7B5.8070108@bioperl.org> Message-ID: <4DA4BDCA.1090108@bioperl.org> Ok I'll check in this fix and we can see if we can bump this up to Root::IO where I imagine it would be better to have this generic type of code. I think the only issue in SeqIO is the GuessFormat aspects but I think that is separate from opening the IO enough that it shouldn't be a problem. jason Chris Fields wrote: > Jason, > > That would be unintentional. 
I'm wondering (from a consistency standpoint) whether we should be doing this check within Bio::SeqIO or in Bio::Root::IO, though; the latter would at least be consistent between all the various *IO implementations, just not sure how easy it would be to implement. > > chris > > On Apr 11, 2011, at 8:15 PM, Jason Stajich wrote: > >> I noticed I am now getting errors with this code: >> my $out = Bio::SeqIO->new(-format => 'fasta'); >> that previously would have defaulted to STDOUT when write_seq was used. This causes errors in existing scripts (e.g. bp_sreformat and others). >> >> $ bp_sreformat.pl -if fasta -of genbank -i multifa.seq >> Unknown sequence format to bioperl genbank: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: No file, fh, or string argument provided >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/local/lib/perl5/Bio/Root/Root.pm:472 >> STACK: Bio::SeqIO::new /usr/local/lib/perl5/Bio/SeqIO.pm:369 >> STACK: /usr/local/bin/bp_sreformat.pl:124 >> ----------------------------------------------------------- >> >> Was this an intentional API change or something else gone wrong? 
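A minimal sketch of the argument handling under discussion here, written without BioPerl so it can be run on its own. choose_stream is a hypothetical stand-in, not the real Bio::SeqIO::new; it only models the decision "throw on an explicitly undefined argument, otherwise fall back to the standard streams".

```perl
use strict;
use warnings;

# Hypothetical stand-in for the constructor's argument check;
# it returns a label instead of opening anything.
sub choose_stream {
    my %param = @_;
    @param{ map { lc $_ } keys %param } = values %param;    # lowercase keys

    # An argument that was passed explicitly but is undefined is still an error.
    for my $key (qw(-file -fh -string)) {
        die "$key argument provided, but with an undefined value\n"
            if exists $param{$key} && !defined $param{$key};
    }

    # No stream argument at all: fall back to STDIN/STDOUT instead of throwing.
    my $have_stream = grep { defined $param{$_} } qw(-file -fh -string);
    return $have_stream ? 'explicit' : 'stdio';
}

print choose_stream(-format => 'fasta'), "\n";                      # stdio
print choose_stream(-format => 'fasta', -file => '>out.fa'), "\n";  # explicit
```

Whether such a check belongs in Bio::SeqIO or in Bio::Root::IO is the open question in the thread; the sketch takes no position on that.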
This would be my proposed fix back -- >> core (master)]$ git diff >> diff --git a/Bio/SeqIO.pm b/Bio/SeqIO.pm >> index 83bab58..9b2ba02 100644 >> --- a/Bio/SeqIO.pm >> +++ b/Bio/SeqIO.pm >> @@ -363,11 +363,14 @@ sub new { >> my %param = @args; >> @param{ map { lc $_ } keys %param } = values %param; # lowercase keys >> >> >> - $class->throw("file argument provided, but with an undefined value") if exists($param{'-file'}); >> - $class->throw("fh argument provided, but with an undefined value") if (exists($param{'-fh'})); >> - $class->throw("No file, fh, or string argument provided"); # neither defined >> - } >> + $class->throw("file argument provided, but with an undefined value") if exists($param{'-file'}); >> + $class->throw("fh argument provided, but with an undefined value") if (exists($param{'-fh'})); >> + $class->throw("string argument provided, but with an undefined value") if (exists($param{'-string'})); >> + # $class->throw("No file, fh, or string argument provided"); # neither defined >> + } >> >> my $format = $param{'-format'} || >> $class->_guess_format( $param{-file} || $ARGV[0] ); >> >> -jason >> -- >> Jason Stajich >> jason at bioperl.org >> http://bioperl.org >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki From rmb32 at cornell.edu Wed Apr 13 11:30:59 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 13 Apr 2011 08:30:59 -0700 Subject: [Bioperl-l] last call for Google Summer of Code mentors Message-ID: <4DA5C1B3.4010504@cornell.edu> Hi all, This is the last call for mentors for Google Summer of Code. We have a good crop of student proposals this year for doing work on OBF projects, and money from Google to fund them, but we need experienced Bio* developers to mentor them. 
If you'd like to see the student proposals, participate in their scoring, and possibly volunteer to mentor them (remotely of course) over the summer, do two things: 1.) Create an account on http://google-melange.com and send a request to be an admin from the OBF page on there, http://www.google-melange.com/gsoc/org/google/gsoc2011/obf 2.) Join the OBF GSoC mentors mailing list at http://lists.open-bio.org/mailman/listinfo/gsoc-mentors Even if you just want to see the student applications and help with scoring, but don't necessarily have time to mentor a student, your input in the scoring process is appreciated. :-) Rob ---- Robert Buels OBF GSoC 2011 Administrator From dalloliogm at gmail.com Thu Apr 14 11:14:30 2011 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 14 Apr 2011 17:14:30 +0200 Subject: [Bioperl-l] provide examples of good and bad ML questions for a candidate 'Ten Simple Rules' article Message-ID: Hello everybody, I would like to invite you to an initiative that our group launched a few weeks earlier this month. We are writing a paper in the style of PLoS CompBiol 'Ten Simple Rules' series, about 'How to get Help from Mailing Lists and Online Scientific Communities'. - http://www.wikigenes.org/e/pub/e/137.html Mailing lists and forums/online communities can be an important resource for researchers. The OpenBio.* mailing lists are an example of this, as they are the medium where all the bio.* projects are coordinated and where new users meet experts. However, using mailing lists correctly is not easy, and there are some rules that not everybody is aware of, but that must be respected in order to obtain good answers. Taking inspiration from this last point, we decided to launch the initiative of a candidate 'Ten Simple Rules' article. The article is open to contributions, which means that everybody is free to edit the manuscript and that the authors of the most important contributions will be invited to sign the paper. 
More precisely, at this point in the writing, the main body of the manuscript is almost complete. However, we need help completing a table with examples of good and bad mailing list questions. I bet that the most experienced followers of this mailing list can easily provide many examples of badly posed questions they have seen (and hopefully some good ones); so, if you have the time to make your contribution, please join the wiki and the mailing list and help us make this manuscript more complete. Please feel free to forward this message to anyone you believe would be interested. -- Giovanni Dall'Olio, phd student Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain) My blog on bioinformatics: http://bioinfoblog.it From cjfields at illinois.edu Thu Apr 14 15:57:37 2011 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 14 Apr 2011 14:57:37 -0500 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released Message-ID: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> All, BioPerl 1.6.9 is now available in CPAN. In this release: * Refactored Bio::Species/Bio::Tree * New SeqIO modules (gbxml, msout, mbsout) * Updates for perl 5.12 * Bio::Assembly support for SAM/BAM, Newbler, ace output * Bio::DB::SeqFeature updates * PAML updated to work with v. 4.4d * lots of various bug fixes, around 50 Just to note, this is the first release after I reworked the Build.PL system, so we will probably hit a few speed bumps along the way. This is in effort to simplify the process for further work this summer on modularizing BioPerl, but it also makes new releases much easier to make. In particular, this has only been tested on Ubuntu Linux and Mac OS X (no Windows testing has occurred yet). Please post if there are any problems. Enjoy! 
chris From rmb32 at cornell.edu Thu Apr 14 16:14:26 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 14 Apr 2011 13:14:26 -0700 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released In-Reply-To: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> References: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> Message-ID: <4DA755A2.4010405@cornell.edu> Hurray! Chris, you are tremendous. Seriously. Rob On 04/14/2011 12:57 PM, Chris Fields wrote: > All, > > BioPerl 1.6.9 is now available in CPAN. In this release: > > * Refactored Bio::Species/Bio::Tree > * New SeqIO modules (gbxml, msout, mbsout) > * Updates for perl 5.12 > * Bio::Assembly support for SAM/BAM, Newbler, ace output > * Bio::DB::SeqFeature updates > * PAML updated to work with v. 4.4d > * lots of various bug fixes, around 50 > > Just to note, this is the first release after I reworked the Build.PL system, so we will probably hit a few speed bumps along the way. This is in effort to simplify the process for further work this summer on modularizing BioPerl, but it also makes new releases much easier to make. In particular, this has only been tested on Ubuntu Linux and Mac OS X (no Windows testing has occurred yet). Please post if there are any problems. > > Enjoy! > > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Thu Apr 14 16:56:59 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 14 Apr 2011 22:56:59 +0200 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released In-Reply-To: <4DA755A2.4010405@cornell.edu> References: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> <4DA755A2.4010405@cornell.edu> Message-ID: Thanks, Chris! This is a PumpKing Extraordinaire, everybody. Give it up. Dave On Thu, Apr 14, 2011 at 22:14, Robert Buels wrote: > Hurray! Chris, you are tremendous. Seriously. 
> > Rob > > > > On 04/14/2011 12:57 PM, Chris Fields wrote: > >> All, >> >> BioPerl 1.6.9 is now available in CPAN. In this release: >> >> * Refactored Bio::Species/Bio::Tree >> * New SeqIO modules (gbxml, msout, mbsout) >> * Updates for perl 5.12 >> * Bio::Assembly support for SAM/BAM, Newbler, ace output >> * Bio::DB::SeqFeature updates >> * PAML updated to work with v. 4.4d >> * lots of various bug fixes, around 50 >> >> Just to note, this is the first release after I reworked the Build.PL >> system, so we will probably hit a few speed bumps along the way. This is in >> effort to simplify the process for further work this summer on modularizing >> BioPerl, but it also makes new releases much easier to make. In particular, >> this has only been tested on Ubuntu Linux and Mac OS X (no Windows testing >> has occurred yet). Please post if there are any problems. >> >> Enjoy! >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at gmail.com Thu Apr 14 17:57:27 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 14 Apr 2011 14:57:27 -0700 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released In-Reply-To: References: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> <4DA755A2.4010405@cornell.edu> Message-ID: <9DF9A44B-A9E3-4E25-BA70-96C1C3FB98B7@gmail.com> Excellent news Chris - thanks for the hard work! -jason On Apr 14, 2011, at 1:56 PM, Dave Messina wrote: > Thanks, Chris! This is a PumpKing Extraordinaire, everybody. Give it up. > > > Dave > > > > > > On Thu, Apr 14, 2011 at 22:14, Robert Buels wrote: > >> Hurray! Chris, you are tremendous. Seriously. 
>> >> Rob >> >> >> >> On 04/14/2011 12:57 PM, Chris Fields wrote: >> >>> All, >>> >>> BioPerl 1.6.9 is now available in CPAN. In this release: >>> >>> * Refactored Bio::Species/Bio::Tree >>> * New SeqIO modules (gbxml, msout, mbsout) >>> * Updates for perl 5.12 >>> * Bio::Assembly support for SAM/BAM, Newbler, ace output >>> * Bio::DB::SeqFeature updates >>> * PAML updated to work with v. 4.4d >>> * lots of various bug fixes, around 50 >>> >>> Just to note, this is the first release after I reworked the Build.PL >>> system, so we will probably hit a few speed bumps along the way. This is in >>> effort to simplify the process for further work this summer on modularizing >>> BioPerl, but it also makes new releases much easier to make. In particular, >>> this has only been tested on Ubuntu Linux and Mac OS X (no Windows testing >>> has occurred yet). Please post if there are any problems. >>> >>> Enjoy! >>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Thu Apr 14 17:06:40 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 14 Apr 2011 17:06:40 -0400 Subject: [Bioperl-l] Bioperl installation doubt In-Reply-To: <1569920B-D132-487F-BBC6-9A81A430E6E0@illinois.edu> References: <6CE8AACF-0D78-4BBA-8094-CC189B0FF872@illinois.edu> <63E35F39-D70C-4D6F-A792-D2738B850567@verizon.net> <1569920B-D132-487F-BBC6-9A81A430E6E0@illinois.edu> Message-ID: Chris, I've done a bit of work on http://www.bioperl.org/wiki/Bptutorial.pl, I'd say it's now about one 
quarter of its original size. Most of the text has gone into new or existing HOWTOs. BIO On Mar 28, 2011, at 6:40 PM, Chris Fields wrote: > Brian, > > I think this was started: > > http://www.bioperl.org/wiki/Bptutorial.pl > > It certainly could be cleaned up, organized, and updated (that and the FAQ). Makes sense to have it as a HOWTO or maybe split it into several HOWTOs. Maybe even combine it with the beginner's HOWTO into various sections? > > chris > > On Mar 28, 2011, at 3:43 PM, Brian Osborne wrote: > >> Chris, >> >> I'll get started on disassembling bptutorial. There's certainly useful text in there but there's also duplicated or outdated material. Looks like there are 3 options for any given section: >> >> - put it into an existing HOWTO >> - make it into a new HOWTO >> - delete it >> >> BIO >> >> On Mar 28, 2011, at 9:27 AM, Chris Fields wrote: >> >>> Dave, >>> >>> +1 on removing old docs to prevent confusion. Or, alternatively, +1 to syncing those to current docs (though I think decreasing the replication effort in keeping docs up-to-date is probably the best tack). >>> >>> chris >>> >>> On Mar 28, 2011, at 6:51 AM, Dave Messina wrote: >>> >>>>> >>>>> Thank you very much. It is working. I got the program code from the >>>>> following link. >>>>> >>>>> http://www.bioperl.org/Core/Latest/bptutorial.html >>>> >>>> >>>> Aha, okay. You got there from Google, I guess? That is *way* out of date. >>>> >>>> To the other core devs, in order to prevent this confusion in the future, >>>> I'd like to delete the Core/ directory from our website since it's been >>>> superseded at this point by other docs and is not current. I intend to put >>>> up a ticket at Redmine, but I will wait a bit before doing so to allow time >>>> for people to see this and comment; please do speak up if there's good >>>> reason to keep it. >>>> >>>> >>>> Could you please give me the link to join this forum to see other >>>>> discussions, which would be more helpful to me? 
>>>>> >>>> >>>> Sure, you can sign up for the mailing list here: >>>> >>>> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> And the archives are also searchable: >>>> >>>> http://news.gmane.org/gmane.comp.lang.perl.bio.general >>>> >>>> >>>> Please let me know if you have any suggestion for me to keep learning the >>>>> bioperl. >>>> >>>> >>>> I would also suggest reading the (current) tutorial and HOWTOs at >>>> www.bioperl.org . Lots of good links on the main page there, particularly >>>> under the Documentation heading. >>>> >>>> >>>> Dave >>>> >>>> >>>> >>>> >>>> >>>>> With regards, >>>>> Ravi. >>>>> >>>>> >>>>> 2011/3/28 Dave Messina >>>>> >>>>>> Hi Ravi, >>>>>> >>>>>> Please make sure to "Reply All" so that everyone on the mailing list can >>>>>> follow (and add to) the discussion. >>>>>> >>>>>> If you read the first line of the exception, you'll see it states what the >>>>>> error is: >>>>>> "WebDBSeqI Error ? check query sequences!" >>>>>> >>>>>> You'd have no way of knowing this, but that ID and database combination is >>>>>> not functioning anymore, so that's why in this case you're getting an error. >>>>>> Please try using the example in the tutorial here: >>>>>> >>>>>> >>>>>> http://www.bioperl.org/wiki/BioPerl_Tutorial#Quick_getting_started_scripts >>>>>> >>>>>> which has been updated to a different ID which should work. >>>>>> >>>>>> Sorry for the confusion! So that we can prevent other people from having >>>>>> the same issue, could you tell me where you got that example code? >>>>>> >>>>>> Dave >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 2011/3/28 ?????????????????? >>>>>> >>>>>>> Hi Dave, >>>>>>> >>>>>>> Thanks a lot for your reply. It is really helpful. Please find the >>>>>>> screenshot after making the change you pointed out. But I am getting >>>>>>> "Exception: Bio::Root::Exception" error. I think I have to set the >>>>>>> environment variables but I am not sure how to do that. 
Could you please >>>>>>> guide me on this too. >>>>>>> >>>>>>> I can go to the "Environment Variable" window on my PC. But I don't know >>>>>>> what to enter once I click "New" on that window. >>>>>>> >>>>>>> Thanks in advance. >>>>>>> >>>>>>> With regards, >>>>>>> Ravi. >>>>>>> >>>>>>> >>>>>>> 2011/3/27 Dave Messina >>>>>>> >>>>>>>> Hi Ravi, >>>>>>>> >>>>>>>> The get_sequence and write_sequence methods are in the Bio::Perl module, >>>>>>>> not Bio::Seq. So your first line >>>>>>>> >>>>>>>> use Bio::Seq; >>>>>>>> >>>>>>>> should be replaced with >>>>>>>> >>>>>>>> use Bio::Perl; >>>>>>>> >>>>>>>> >>>>>>>> More examples in the BioPerl Tutorial here: >>>>>>>> http://www.bioperl.org/wiki/BioPerl_Tutorial >>>>>>>> >>>>>>>> >>>>>>>> Dave >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2011/3/24 ?????????????????? >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Could you please help me install bioperl-db, bioperl-run & other >>>>>>>>> packages >>>>>>>>> using ppm on Windows 7? Please find the attachment for the error >>>>>>>>> message I >>>>>>>>> get. I would really appreciate it if you could help me fix this issue. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Ravi. 
>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Apr 14 21:22:04 2011 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 14 Apr 2011 21:22:04 -0400 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released In-Reply-To: <9DF9A44B-A9E3-4E25-BA70-96C1C3FB98B7@gmail.com> References: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> <4DA755A2.4010405@cornell.edu> <9DF9A44B-A9E3-4E25-BA70-96C1C3FB98B7@gmail.com> Message-ID: Yeah - awesome, congrats, that was fast! -hilmar On Apr 14, 2011, at 5:57 PM, Jason Stajich wrote: > Excellent news Chris - thanks for the hard work! > > -jason > On Apr 14, 2011, at 1:56 PM, Dave Messina wrote: > >> Thanks, Chris! This is a PumpKing Extraordinaire, everybody. Give >> it up. >> >> >> Dave >> >> >> >> >> >> On Thu, Apr 14, 2011 at 22:14, Robert Buels >> wrote: >> >>> Hurray! Chris, you are tremendous. Seriously. >>> >>> Rob >>> >>> >>> >>> On 04/14/2011 12:57 PM, Chris Fields wrote: >>> >>>> All, >>>> >>>> BioPerl 1.6.9 is now available in CPAN. 
In this release: >>>> >>>> * Refactored Bio::Species/Bio::Tree >>>> * New SeqIO modules (gbxml, msout, mbsout) >>>> * Updates for perl 5.12 >>>> * Bio::Assembly support for SAM/BAM, Newbler, ace output >>>> * Bio::DB::SeqFeature updates >>>> * PAML updated to work with v. 4.4d >>>> * lots of various bug fixes, around 50 >>>> >>>> Just to note, this is the first release after I reworked the >>>> Build.PL >>>> system, so we will probably hit a few speed bumps along the way. >>>> This is in >>>> effort to simplify the process for further work this summer on >>>> modularizing >>>> BioPerl, but it also makes new releases much easier to make. In >>>> particular, >>>> this has only been tested on Ubuntu Linux and Mac OS X (no >>>> Windows testing >>>> has occurred yet). Please post if there are any problems. >>>> >>>> Enjoy! >>>> >>>> chris >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From sac at bioperl.org Fri Apr 15 02:38:30 2011 From: sac at bioperl.org (Steve Chervitz) Date: Thu, 14 Apr 2011 23:38:30 -0700 Subject: [Bioperl-l] not recognizing hsp in a hit In-Reply-To: References: <31326619.post@talk.nabble.com> 
<08FD8735-E577-4925-BD5D-DD5D1A9426AE@colorado.edu> Message-ID: On Thu, Apr 7, 2011 at 2:28 AM, Dave Messina wrote: > I should also say: I wouldn't be surprised if the blast-style output that > Blat produces is slightly nonstandard and causes our parser to make > mistakes. By the way, which version of Blat are you using? > > It looks like Jim Kent has taken Blat at least semi-commercial: > http://www.kentinformatics.com/products.html > If there's a new version of blat, I don't think BioPerl has seen it yet. > > Can anyone confirm or deny recent Blat output format changes? > It's not recent, but this might possibly be the culprit: blat quirk in -blast output mode: https://lists.soe.ucsc.edu/pipermail/genome/2008-April/016211.html Steve From dan.bolser at gmail.com Fri Apr 15 07:05:27 2011 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 15 Apr 2011 12:05:27 +0100 Subject: [Bioperl-l] Use of uninitialized value in length at Bio/DB/SeqFeature/Store/DBI/mysql.pm line 1062 In-Reply-To: References: <4D8F73A4.8050408@bioperl.org> <4960971A-8C61-49F2-B43A-E32BAFC4C59E@illinois.edu> Message-ID: Now logged here: https://redmine.open-bio.org/issues/3206 And commented on here: https://github.com/bioperl/bioperl-live/pull/7 Not sure how to provide pointers from those sources to this thread? Cheers, Dan. On 28 March 2011 21:08, Dan Bolser wrote: > On 27 March 2011 21:13, Chris Fields wrote: >> On Mar 27, 2011, at 12:55 PM, Dan Bolser wrote: >> >>> On 27 March 2011 18:28, Jason Stajich wrote: >>>> Dan - not sure why you would need to do this as length on an undef should >>>> still return false (an undef). >>>> >>>> $ perl -e '$g=""; if( length($g)) { print "yes" } else { print "no"} print >>>> "\n"' >>>> no >>>> $ perl -e '$g=undef; if( length($g)) { print "yes" } else { print "no"} >>>> print "\n"' >>>> no >>> >>> Doesn't the latter spew a warning? (The output before / after my 'bug >>> fix' is the same, I just don't see 100s of warnings about undefined >>> values). 
>> >> Only with later versions of perl (I think perl 5.12). >> >>>> Also, having no 'source' is probably not proper GFF3. >>> >>> I'm quite sure it is, but my GFF does have a source. I'm just calling >>> 'features' with only a feature type and not a feature type and a >>> source (because I only care about source). My call is pretty similar >>> to the example here: >>> >>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/DB/SeqFeature/Store.pm#features >>> >>> @features = $db->features(-seqid=>'Chr1',-start=>5000,-end=>6000,-types=>'mRNA'); >> >> I think Jason is inferring that the GFF3 is invalid from your statement below re: $source_tag not being defined, which to me means the source attribute for the feature is not defined. >> >> Note: b/c something works with SF::Store does NOT mean the source is proper GFF3; it is quite possible to have invalid GFF3 loaded into the database w/o a hiccup. I think the loader assumes the data loaded has already been validated; IIRC there is very little validation done on GFF3 loaded into SF::Store, particularly the 'type'. > > As I said, my GFF3 does have a source. I'll try to put together a test > script, as it seems like the value should not be undef (but thanks for > the tip Roy). > > I had a feeling that this relatively trivial fix was a sticking > plaster on a larger problem. > > >> chris > Dan > From p.j.a.cock at googlemail.com Fri Apr 15 07:33:02 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 15 Apr 2011 12:33:02 +0100 Subject: [Bioperl-l] Use of uninitialized value in length at Bio/DB/SeqFeature/Store/DBI/mysql.pm line 1062 In-Reply-To: References: <4D8F73A4.8050408@bioperl.org> <4960971A-8C61-49F2-B43A-E32BAFC4C59E@illinois.edu> Message-ID: On Fri, Apr 15, 2011 at 12:05 PM, Dan Bolser wrote: > Now logged here: > https://redmine.open-bio.org/issues/3206 > > And commented on here: > https://github.com/bioperl/bioperl-live/pull/7 > > Not sure how to provide pointers from those sources to this thread? 
Link to the mailing list archive perhaps? http://lists.open-bio.org/pipermail/bioperl-l/2011-March/034727.html ... http://lists.open-bio.org/pipermail/bioperl-l/2011-April/034899.html Peter From xinli.sun at sdstate.edu Thu Apr 14 19:52:16 2011 From: xinli.sun at sdstate.edu (Sun, Xinli) Date: Thu, 14 Apr 2011 18:52:16 -0500 Subject: [Bioperl-l] FW: Pal2Nal problem In-Reply-To: References: Message-ID: Dear Sir or Madam, I am a Perl beginner, and am using the Bio::Tools::Run::Alignment::Pal2Nal module to get the nucleotide sequences based on an alignment of their translations for PAML-yn00. However, I have run into some problems, so I am sending my code and error report in the hope of getting your help. The errors usually occur at the input of the DNA sequences and the output of the result, and I am not sure what causes them. Do you have any examples of sequence input and output for the Pal2Nal module? If so, I would like to model my code on them. Thanks a lot, Xinli Sun. Ph. D. Plant Science Department South Dakota State University SNP 211, Box 2140C Brookings, SD 57007 605-688-4984 (lab) -------------- next part -------------- A non-text attachment was scrubbed... Name: pal2nalw.pl Type: application/x-perl Size: 1329 bytes Desc: pal2nalw.pl URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Error.txt URL: From roy.chaudhuri at gmail.com Fri Apr 15 10:01:25 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 15 Apr 2011 15:01:25 +0100 Subject: [Bioperl-l] FW: Pal2Nal problem In-Reply-To: References: Message-ID: <4DA84FB5.5090805@gmail.com> Hi Xinli, You don't need to use Bio::AlignIO to read $alnDP, since it is already a Bio::SimpleAlign object. To write out the Phylip file you just need this:

# output the result in phylip format
my $out = Bio::AlignIO->new(-file => ">out.phy", -format => 'phylip');
$out->write_aln($alnDP);

Hope this helps, Roy. 
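For anyone who wants to try Roy's suggestion outside the original script, here is a minimal sketch. It assumes BioPerl is installed. The two short sequences and their ids are made up purely for illustration; they stand in for the $alnDP alignment that the Pal2Nal wrapper would return in the real script.

```perl
use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;
use Bio::AlignIO;

# Hypothetical stand-in for $alnDP: a tiny, hand-built DNA alignment.
# In the real script this object would come from the Pal2Nal run.
my $aln = Bio::SimpleAlign->new();
for my $rec ( [ 'seq1', 'ATGGCTAAA' ], [ 'seq2', 'ATGGCAAAA' ] ) {
    my ( $id, $dna ) = @$rec;
    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $dna,
            -start => 1,
            -end   => length $dna,
        )
    );
}

# Write the alignment object straight out; there is no need to read it
# back in with Bio::AlignIO first, because it is already a Bio::SimpleAlign.
my $out = Bio::AlignIO->new( -file => '>out.phy', -format => 'phylip' );
$out->write_aln($aln);
```

If this runs cleanly, out.phy should appear in the current directory and can then be handed to PAML.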
On 15/04/2011 00:52, Sun, Xinli wrote: > > Dear Sir or Madam, > > I am a Perl beginner, and am using the > Bio::Tools::Run::Alignment::Pal2Nal module to get the nucleotide > sequences based on an alignment of their translations for PAML-yn00. > However, I have run into some problems, so I am sending my code and > error report in the hope of getting your help. The errors usually > occur at the input of the DNA sequences and the output of the result, > and I am not sure what causes them. > > Do you have any examples of sequence input and output for the > Pal2Nal module? If so, I would like to model my code on them. > > Thanks a lot, > > Xinli Sun. Ph. D. > > Plant Science Department South Dakota State University SNP 211, Box > 2140C Brookings, SD 57007 605-688-4984 (lab) From aradwen at gmail.com Fri Apr 15 11:02:26 2011 From: aradwen at gmail.com (Radhouane Aniba) Date: Fri, 15 Apr 2011 11:02:26 -0400 Subject: [Bioperl-l] Species tree and partition Message-ID: Hello everyone, I have a yeast species tree containing 23 yeast species, and would like to partition the tree using bioperl. By partitioning I mean cutting internal edges and splitting the tree into two subtrees. I have to do that for all possible internal branches (all combinations). Is there a way to do that with bioperl? Thank you Radhouane -- From andrei.tudor at rocketmail.com Fri Apr 15 11:10:54 2011 From: andrei.tudor at rocketmail.com (Andrei Tudor) Date: Fri, 15 Apr 2011 08:10:54 -0700 (PDT) Subject: [Bioperl-l] 454 raw Quality Message-ID: <859969.23813.qm@web111010.mail.gq1.yahoo.com> Hello, Is there a way to get a quality score for a 454 sequencing? I have the qual and fasta file of the reads, but is there a script that calculates the overall score? 
Thanks, Andrei From andrei.tudor at rocketmail.com Fri Apr 15 10:51:45 2011 From: andrei.tudor at rocketmail.com (Andrei Tudor) Date: Fri, 15 Apr 2011 07:51:45 -0700 (PDT) Subject: [Bioperl-l] 454 raw Quality Message-ID: <899341.94126.qm@web111005.mail.gq1.yahoo.com> Hello, Is there a way to get a quality score for a 454 sequencing? I have the qual and fasta file of the reads, but is there a script that calculates the overall score? Thanks, Andrei From Cynthia.Page at Colorado.EDU Fri Apr 15 11:18:14 2011 From: Cynthia.Page at Colorado.EDU (pageski) Date: Fri, 15 Apr 2011 08:18:14 -0700 (PDT) Subject: [Bioperl-l] not recognizing hsp in a hit In-Reply-To: References: <31326619.post@talk.nabble.com> Message-ID: <31406791.post@talk.nabble.com> Thanks for your responses, I am using blat version 34 and used -blast output mode. Steve Chervitz-2 wrote: > > On Thu, Apr 7, 2011 at 2:28 AM, Dave Messina > wrote: > >> I should also say: I wouldn't be surprised if the blast-style output that >> Blat produces is slightly nonstandard and causes our parser to make >> mistakes. By the way, which version of Blat are you using? >> >> It looks like Jim Kent has taken Blat at least semi-commercial: >> http://www.kentinformatics.com/products.html >> If there's a new version of blat, I don't think BioPerl has seen it yet. >> >> Can anyone confirm or deny recent Blat output format changes? >> > > It's not recent, but this might possibly be the culprit: > > blat quirk in -blast output mode: > https://lists.soe.ucsc.edu/pipermail/genome/2008-April/016211.html > > Steve > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/not-recognizing-hsp-in-a-hit-tp31326619p31406791.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From dvadell at clustering.com.ar Fri Apr 15 11:50:56 2011 From: dvadell at clustering.com.ar (Diego M. Vadell) Date: Fri, 15 Apr 2011 12:50:56 -0300 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released In-Reply-To: References: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> <9DF9A44B-A9E3-4E25-BA70-96C1C3FB98B7@gmail.com> Message-ID: <201104151250.57126.dvadell@clustering.com.ar> Thank you very much!! -- Diego On Thursday, April 14, 2011 10:22:04 pm Hilmar Lapp wrote: > Yeah - awesome, congrats, that was fast! > > -hilmar > > On Apr 14, 2011, at 5:57 PM, Jason Stajich wrote: > > Excellent news Chris - thanks for the hard work! > > > > -jason > > > > On Apr 14, 2011, at 1:56 PM, Dave Messina wrote: > >> Thanks, Chris! This is a PumpKing Extraordinaire, everybody. Give > >> it up. > >> > >> > >> Dave > >> > >> > >> > >> > >> > >> On Thu, Apr 14, 2011 at 22:14, Robert Buels > >> > >> wrote: > >>> Hurray! Chris, you are tremendous. Seriously. > >>> > >>> Rob > >>> > >>> On 04/14/2011 12:57 PM, Chris Fields wrote: > >>>> All, > >>>> > >>>> BioPerl 1.6.9 is now available in CPAN. In this release: > >>>> > >>>> * Refactored Bio::Species/Bio::Tree > >>>> * New SeqIO modules (gbxml, msout, mbsout) > >>>> * Updates for perl 5.12 > >>>> * Bio::Assembly support for SAM/BAM, Newbler, ace output > >>>> * Bio::DB::SeqFeature updates > >>>> * PAML updated to work with v. 4.4d > >>>> * lots of various bug fixes, around 50 > >>>> > >>>> Just to note, this is the first release after I reworked the > >>>> Build.PL > >>>> system, so we will probably hit a few speed bumps along the way. > >>>> This is in > >>>> effort to simplify the process for further work this summer on > >>>> modularizing > >>>> BioPerl, but it also makes new releases much easier to make. In > >>>> particular, > >>>> this has only been tested on Ubuntu Linux and Mac OS X (no > >>>> Windows testing > >>>> has occurred yet). Please post if there are any problems. > >>>> > >>>> Enjoy! 
> >>>> > >>>> chris > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sac at bioperl.org Fri Apr 15 12:55:37 2011 From: sac at bioperl.org (Steve Chervitz) Date: Fri, 15 Apr 2011 09:55:37 -0700 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released In-Reply-To: References: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> <4DA755A2.4010405@cornell.edu> <9DF9A44B-A9E3-4E25-BA70-96C1C3FB98B7@gmail.com> Message-ID: perl -e 'print map chr, qw(67 104 101 101 114 115 32 116 111 32 99 106 102 105 101 108 100 115 33 32 58 41 10);' perl -e 'print map chr, qw(83 116 101 118 101 10);' On Thu, Apr 14, 2011 at 6:22 PM, Hilmar Lapp wrote: > Yeah - awesome, congrats, that was fast! > > -hilmar > > > On Apr 14, 2011, at 5:57 PM, Jason Stajich wrote: > > Excellent news Chris - thanks for the hard work! >> >> -jason >> On Apr 14, 2011, at 1:56 PM, Dave Messina wrote: >> >> Thanks, Chris! This is a PumpKing Extraordinaire, everybody. Give it up. >>> >>> >>> Dave >>> >>> >>> >>> >>> >>> On Thu, Apr 14, 2011 at 22:14, Robert Buels wrote: >>> >>> Hurray! Chris, you are tremendous. Seriously. >>>> >>>> Rob >>>> >>>> >>>> >>>> On 04/14/2011 12:57 PM, Chris Fields wrote: >>>> >>>> All, >>>>> >>>>> BioPerl 1.6.9 is now available in CPAN. 
In this release: >>>>> >>>>> * Refactored Bio::Species/Bio::Tree >>>>> * New SeqIO modules (gbxml, msout, mbsout) >>>>> * Updates for perl 5.12 >>>>> * Bio::Assembly support for SAM/BAM, Newbler, ace output >>>>> * Bio::DB::SeqFeature updates >>>>> * PAML updated to work with v. 4.4d >>>>> * lots of various bug fixes, around 50 >>>>> >>>>> Just to note, this is the first release after I reworked the Build.PL >>>>> system, so we will probably hit a few speed bumps along the way. This >>>>> is in >>>>> effort to simplify the process for further work this summer on >>>>> modularizing >>>>> BioPerl, but it also makes new releases much easier to make. In >>>>> particular, >>>>> this has only been tested on Ubuntu Linux and Mac OS X (no Windows >>>>> testing >>>>> has occurred yet). Please post if there are any problems. >>>>> >>>>> Enjoy! >>>>> >>>>> chris >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From abhishek.vit at 
gmail.com Fri Apr 15 16:16:26 2011 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Fri, 15 Apr 2011 13:16:26 -0700 Subject: [Bioperl-l] From Blast hits to Taxanomy lineage for Short DNA Sequences (reads) In-Reply-To: References: <4D77466F.4040604@uv.es> Message-ID: Hi Guys I have one more related question. This time I have a list of NCBI locus names and not GI numbers. What I need to do is to obtain the lineage for each locus name. Is this functionality built in? Eg: I want to search NCBI for Locus name "CP000490" and get the organism lineage: Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; Paracoccus. This info is present in the GenBank record but I am not sure what's the best way to fetch it specifically. http://www.ncbi.nlm.nih.gov/nuccore/CP000490 Thanks for your help! -Abhi On Wed, Mar 9, 2011 at 7:25 PM, Abhishek Pratap wrote: > Thanks guys. I could not try either method today but will get back to > you if I face problem. > > Best, > -Abhi > > On Wed, Mar 9, 2011 at 9:34 AM, shalabh sharma > wrote: > > Hey Abhishek: > > The other way to deal with this is that you can download > > the gi_taxaid file from ncbi. > > Convert all your GI's to taxaid and use Bio::DB::Taxonomy. > > http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Taxon.html > > I think there are a lot of other options too, if you are interested you can > > search for the thread which i started a long time back. > > Hope this helps. > > -Shalabh Sharma > > ----------------------------------------------- > > Shalabh Sharma > > Scientific Computing Professional Associate (Bioinformatics Specialist) > > Department of Marine Sciences > > University of Georgia > > Athens, GA 30602-3636 > > > > On Wed, Mar 9, 2011 at 4:20 AM, Miguel Pignatelli < > miguel.pignatelli at uv.es> > > wrote: > >> > >> Hi Abhishek, > >> > >> For a non bioperl related solution, take a look at Bio::LITE::Taxonomy. 
> >> It has been design to deal with great number of sequences (it is fast > and > >> efficient). > >> > >> You may also find interesting the Blast2lca tool, > >> > >> https://github.com/emepyc/Blast2lca > >> > >> It currently works with the best hits for each query (calculates the > lower > >> common ancestor), but if you want to use only the best hit, please drop > me a > >> line. > >> > >> Please, let me know if you need further help with any of these, > >> > >> Cheers, > >> > >> M; > >> > >> > >> > >> On 08/03/11 22:42, Abhishek Pratap wrote: > >>> > >>> Hi All > >>> > >>> I have results from different megablast of short reads(DNA sequences) > >>> and after extracting the tophit for each read I want to bin them by > >>> their lineage creating a tree. > >>> > >>> For example. > >>> > >>> If blast query hits the reference -> > >>> > >>> > gi|196110604|gb|CP001103.1|__Alteromonas_macleodii_'Deep_ecotype',_complete_genome > >>> > >>> I want to get the lineage for this specie. > >>> > >>> > >>> > Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Alteromonadaceae;Alteromonas;Alteromona > >>> > >>> The final goal is to do the above mapping as efficiently as possible. > >>> Any pointers will be appreciated. > >>> > >>> > >>> Thanks! > >>> -Abhi > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > From abhishek.vit at gmail.com Fri Apr 15 18:39:38 2011 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Fri, 15 Apr 2011 15:39:38 -0700 Subject: [Bioperl-l] [Quick help needed] Getting Organism info using NCBI Accession numbers : sample code included Message-ID: Hi Guys Sorry I am posting the same question again from an old thread. 
I hope this time the subject line is more relevant to the question. I have a list of NCBI accessions/locus names and not GI numbers. What I need to do is to obtain the lineage for each NCBI accession. Is this functionality built in directly? I am using efetch to get the GenBank record but am not sure how to specifically pull out the organism lineage. Also I would want this to be fast as I will have thousands of such accessions to query. Eg: I want to search NCBI for Locus name "CP000490" and get the organism lineage: Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; Paracoccus. This info is present in the GenBank record but I am not sure what's the best way to fetch it specifically. http://www.ncbi.nlm.nih.gov/nuccore/CP000490 Sample code:

use Bio::DB::EUtilities;   # provides the efetch factory used below
use Bio::SeqIO;            # reads the downloaded GenBank file

my @ids = qw( NW_001884661 EZ361133 CP000490 );
my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                       -email => 'apratap at lbl.gov',
                                       -db    => 'nucleotide',
                                       -id    => \@ids,
                                       );
my $file = 'temp.gb';
$factory->get_Response(-file => $file);
my $seqin = Bio::SeqIO->new(-file => $file, -format => 'genbank');

Thanks for your help! -Abhi From timmcilveen at talktalk.net Fri Apr 15 19:35:21 2011 From: timmcilveen at talktalk.net (Tim) Date: Sat, 16 Apr 2011 00:35:21 +0100 Subject: [Bioperl-l] installation error In-Reply-To: References: Message-ID: <4DA8D639.7050403@talktalk.net> Hi everyone, I'm having a problem installing Bioperl with my new install of Fedora 14. I have installed bioperl using GitHub. 
$ git clone git://github.com/bioperl/bioperl-live.git

I told Perl where to find it:

export PERL5LIB="$HOME/tm2383/bioperl-live:$PERL5LIB"

Then I tested the install:

$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

I got this error:

Can't locate Bio/Perl.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5 /usr/share/perl5 /usr/lib64/perl5 /usr/share/perl5 /usr/local/lib64/perl5/site_perl/5.10.0/x86_64-linux-thread-multi /usr/local/lib/perl5/site_perl/5.10.0 /usr/lib64/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl /usr/lib/perl5/site_perl .).

I have checked my Perl installation:

find / -name 'perl'
/usr/bin/perl
/usr/src/kernels/2.6.35.11-83.fc14.x86_64/tools/perf/scripts/perl
/usr/src/kernels/2.6.35.6-45.fc14.x86_64/tools/perf/scripts/perl
/usr/libexec/perf-core/scripts/perl

My Perl is /usr/bin/perl, but for some reason /usr/local/lib64/perl5 is being mentioned in the error message. I'm not an expert at Linux, so I have reported all I can about the problem, using the information that I knew how to retrieve. I hope this helps! Any advice would be great Tim From florent.angly at gmail.com Sat Apr 16 00:17:28 2011 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 16 Apr 2011 14:17:28 +1000 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released In-Reply-To: <201104151250.57126.dvadell@clustering.com.ar> References: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> <9DF9A44B-A9E3-4E25-BA70-96C1C3FB98B7@gmail.com> <201104151250.57126.dvadell@clustering.com.ar> Message-ID: <4DA91858.8060208@gmail.com> Awesome! Thanks to everyone for the efforts they put in the new version. Florent On 16/04/11 01:50, Diego M. Vadell wrote: > Thank you very much!! > > -- Diego > > On Thursday, April 14, 2011 10:22:04 pm Hilmar Lapp wrote: >> Yeah - awesome, congrats, that was fast! >> >> -hilmar >> >> On Apr 14, 2011, at 5:57 PM, Jason Stajich wrote: >>> Excellent news Chris - thanks for the hard work! 
>>> >>> -jason >>> >>> On Apr 14, 2011, at 1:56 PM, Dave Messina wrote: >>>> Thanks, Chris! This is a PumpKing Extraordinaire, everybody. Give >>>> it up. >>>> >>>> >>>> Dave >>>> >>>> >>>> >>>> >>>> >>>> On Thu, Apr 14, 2011 at 22:14, Robert Buels >>>> >>>> wrote: >>>>> Hurray! Chris, you are tremendous. Seriously. >>>>> >>>>> Rob >>>>> >>>>> On 04/14/2011 12:57 PM, Chris Fields wrote: >>>>>> All, >>>>>> >>>>>> BioPerl 1.6.9 is now available in CPAN. In this release: >>>>>> >>>>>> * Refactored Bio::Species/Bio::Tree >>>>>> * New SeqIO modules (gbxml, msout, mbsout) >>>>>> * Updates for perl 5.12 >>>>>> * Bio::Assembly support for SAM/BAM, Newbler, ace output >>>>>> * Bio::DB::SeqFeature updates >>>>>> * PAML updated to work with v. 4.4d >>>>>> * lots of various bug fixes, around 50 >>>>>> >>>>>> Just to note, this is the first release after I reworked the >>>>>> Build.PL >>>>>> system, so we will probably hit a few speed bumps along the way. >>>>>> This is in >>>>>> effort to simplify the process for further work this summer on >>>>>> modularizing >>>>>> BioPerl, but it also makes new releases much easier to make. In >>>>>> particular, >>>>>> this has only been tested on Ubuntu Linux and Mac OS X (no >>>>>> Windows testing >>>>>> has occurred yet). Please post if there are any problems. >>>>>> >>>>>> Enjoy! 
>>>>>> >>>>>> chris From drummike at gmail.com Sat Apr 16 01:30:51 2011 From: drummike at gmail.com (Mike Williams) Date: Sat, 16 Apr 2011 01:30:51 -0400 Subject: [Bioperl-l] installation error In-Reply-To: <4DA8D639.7050403@talktalk.net> References: <4DA8D639.7050403@talktalk.net> Message-ID: Hi there. A couple caveats first. I'm using fedora 13 and installed Bio::Perl from CPAN. I have never installed Bio::Perl using files from the git repository. Also, I compile perl from the source and install perl in a non-standard place to avoid conflicts between the fedora package manager (yum) and CPAN. So, my paths are different from yours. Lastly, I am far from an expert at Bio::Perl, although I have installed it a few times on multiple different machines. I've made a few comments interspersed with your message. On Fri, Apr 15, 2011 at 7:35 PM, Tim wrote: > I'm having a problem installing Bioperl with my new install of Fedora 14. > I have installed bioperl using GitHub. > > > 
$ git clone git://github.com/bioperl/bioperl-live.git

You did not mention anything between fetching the code with git and testing. Did you run Build.PL? I am pretty sure you cannot just fetch the code from git and run it, you have to do a couple other things first. There is a file called INSTALL in the bioperl tree that explains this in more detail, but you have to do a minimum of:

perl Build.PL
./Build test
# then if the tests pass it is safe to proceed with the install
./Build install

Note that if you want to install Bio::Perl under your home directory you will need to specify that in the first command listed above, something like:

perl Build.PL --prefix /home/tim/tm2383
OR
perl Build.PL --prefix $HOME/tm2383

The prefix argument will modify the file Build.PL produces, so the other two commands stay the same.

> I told Perl where to find it:
>
> export PERL5LIB="$HOME/tm2383/bioperl-live:$PERL5LIB"

The above line should work if you did the install with --prefix as listed above.

> Then I tested the install:
> $ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
>
> I got this error:
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /usr/local/lib64/perl5 /usr/local/share/perl5
> /usr/lib64/perl5 /usr/share/perl5 /usr/lib64/perl5
> find / -name 'perl'

Instead of find please post the output of:

which perl

which will locate the first version of perl found in the list of directories in your $PATH environment variable.

> /usr/bin/perl

This is probably the perl that is getting run, you may well have a 64 bit perl if you are running a 64 bit version of fedora. To answer that question do the following:

uname -a

> /usr/src/kernels/2.6.35.11-83.fc14.x86_64/tools/perf/scripts/perl
> /usr/src/kernels/2.6.35.6-45.fc14.x86_64/tools/perf/scripts/perl
>
/usr/libexec/perf-core/scripts/perl
>
> My Perl is /usr/bin/perl, but for some reason /usr/local/lib64/perl5 is being mentioned in the error message.

The directories that start with /usr/... listed in the error message are where perl searches for its libraries.

> I'm not an expert at Linux, so I have reported all I can about the problem,
> using the information that I knew how to retrieve. I hope this helps!

The info you posted helped quite a bit. A little more info would be helpful. If the two of us cannot resolve this, then a little more info should make it easier for someone else to chime in with some help. Please post output from the following:

###################################################
which perl
uname -a
find ~ -name Perl.pm -print
perldoc -l Bio::Perl
echo $HOME
echo $PERL5LIB
# please place the output from the next command at the end because it will be kind of long and the helpful bits for this problem are probably at the beginning.
perl -V
###################################################

Here is the output of the first few of those commands from one of my fedora 13 boxes:

$ which perl
/opt/bin/perl
$ uname -a
Linux watson.localdomain 2.6.34.8-68.fc13.i686.PAE #1 SMP Thu Feb 17 14:54:10 UTC 2011 i686 i686 i386 GNU/Linux
$ perldoc -l Bio::Perl
/opt/lib/perl5/site_perl/5.12.1/Bio/Perl.pm
$ find /opt -name Perl.pm -print
/opt/lib/perl5/site_perl/5.12.1/Bio/Perl.pm
$ perl -e 'print "@INC\n";'
/opt/lib/perl5/site_perl/5.12.1/i686-linux /opt/lib/perl5/site_perl/5.12.1 /opt/lib/perl5/5.12.1/i686-linux /opt/lib/perl5/5.12.1 .

As I mentioned at the beginning of my response, my paths will not match yours. Just included the info to point out a couple things. Note that the paths in @INC are similar to the ones in your error message (without the _64), and that those paths are different from the executable /opt/bin/perl. 
If your system is running a 64 bit version of fedora that will be reflected in the output of uname -a and in the output of perl -V, but the name of perl itself will not change. Anyhow, post the stuff I asked about and we'll go from there. Best, Mike From chiragmatkarbioinfo at gmail.com Sat Apr 16 08:27:43 2011 From: chiragmatkarbioinfo at gmail.com (Chirag Matkar) Date: Sat, 16 Apr 2011 19:27:43 +0700 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl 1.6.9 released In-Reply-To: <4DA91858.8060208@gmail.com> References: <4066E3ED-891B-4674-B6AE-57253F2AA825@illinois.edu> <9DF9A44B-A9E3-4E25-BA70-96C1C3FB98B7@gmail.com> <201104151250.57126.dvadell@clustering.com.ar> <4DA91858.8060208@gmail.com> Message-ID: Great Effort!! On Sat, Apr 16, 2011 at 11:17 AM, Florent Angly wrote: > Awesome! Thanks to everyone for the efforts they put in the new version. > Florent > > > On 16/04/11 01:50, Diego M. Vadell wrote: > >> Thank you very much!! >> >> -- Diego >> >> On Thursday, April 14, 2011 10:22:04 pm Hilmar Lapp wrote: >> >>> Yeah - awesome, congrats, that was fast! >>> >>> -hilmar >>> >>> On Apr 14, 2011, at 5:57 PM, Jason Stajich wrote: >>> >>>> Excellent news Chris - thanks for the hard work! >>>> >>>> -jason >>>> >>>> On Apr 14, 2011, at 1:56 PM, Dave Messina wrote: >>>> >>>>> Thanks, Chris! This is a PumpKing Extraordinaire, everybody. Give >>>>> it up. >>>>> >>>>> >>>>> Dave >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Apr 14, 2011 at 22:14, Robert Buels >>>>> >>>>> wrote: >>>>> >>>>>> Hurray! Chris, you are tremendous. Seriously. >>>>>> >>>>>> Rob >>>>>> >>>>>> On 04/14/2011 12:57 PM, Chris Fields wrote: >>>>>> >>>>>>> All, >>>>>>> >>>>>>> BioPerl 1.6.9 is now available in CPAN. In this release: >>>>>>> >>>>>>> * Refactored Bio::Species/Bio::Tree >>>>>>> * New SeqIO modules (gbxml, msout, mbsout) >>>>>>> * Updates for perl 5.12 >>>>>>> * Bio::Assembly support for SAM/BAM, Newbler, ace output >>>>>>> * Bio::DB::SeqFeature updates >>>>>>> * PAML updated to work with v. 
4.4d >>>>>>> * lots of various bug fixes, around 50 >>>>>>> >>>>>>> Just to note, this is the first release after I reworked the >>>>>>> Build.PL >>>>>>> system, so we will probably hit a few speed bumps along the way. >>>>>>> This is in >>>>>>> effort to simplify the process for further work this summer on >>>>>>> modularizing >>>>>>> BioPerl, but it also makes new releases much easier to make. In >>>>>>> particular, >>>>>>> this has only been tested on Ubuntu Linux and Mac OS X (no >>>>>>> Windows testing >>>>>>> has occurred yet). Please post if there are any problems. >>>>>>> >>>>>>> Enjoy! >>>>>>> >>>>>>> chris >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Regards, Chirag Matkar BTech Bioinformatics, Red Hat Certified Engineer, Software Engineer ,Cenveo From jaudall at gmail.com Fri Apr 15 13:52:33 2011 From: jaudall at gmail.com (Joshua Udall) Date: Fri, 15 Apr 2011 11:52:33 -0600 Subject: [Bioperl-l] 454 
raw Quality In-Reply-To: <859969.23813.qm@web111010.mail.gq1.yahoo.com> References: <859969.23813.qm@web111010.mail.gq1.yahoo.com> Message-ID: Are you asking for a way to generate a single score for an entire read? Simply loop individually over the qual values in a qual object with a $counter and $total. On Fri, Apr 15, 2011 at 9:10 AM, Andrei Tudor wrote: > Hello, > > > > Is there a way to get a quality score for a 454 sequencing? > I have the qual and fasta file of the reads, but is there a script that calculates the overall score? > > Thanks, > Andrei > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Joshua Udall Assistant Professor 295 WIDB Plant and Wildlife Science Dept. Brigham Young University Provo, UT 84602 801-422-9307 Fax: 801-422-0008 USA From abhishek.vit at gmail.com Mon Apr 18 13:13:17 2011 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 18 Apr 2011 10:13:17 -0700 Subject: [Bioperl-l] [Quick help needed] Getting Organism info using NCBI Accession numbers : sample code included In-Reply-To: References: Message-ID: Just wanted to push this once again if in case this message was missed over the weekend. -Abhi On Fri, Apr 15, 2011 at 3:39 PM, Abhishek Pratap wrote: > Hi Guys > > Sorry I am posting the same question again from an old thread. I hope this > time the subject line is more relevant to the question. > > I have list of NCBI Accession/locus name and not GI numbers. What I need > to do is to obtain lineage for each NCBI accession. > > Is this functionality built in directly ? I am using eftech to get the > genbank record but not sure how to specifically pull out the organism > lineage. Also I would want this to be fast as I will have thousands of such > accessions to query. > > Eg: > > I want to seach NCBI for Locus name "CP000490" and get the organism lineage > ? 
> > > Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; > Rhodobacteraceae; Paracoccus. > > > This info is present in the gen bank record but I am not sure whats the > best way to fetch it specifically. > http://www.ncbi.nlm.nih.gov/nuccore/CP000490 > > Sample code : > > my @ids = qw( NW_001884661 EZ361133 CP000490 ) ; > > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -email => 'apratap at lbl.gov', > -db => 'nucleotide', > -id => \@ids, > > > > ); > > my $file = 'temp.gb'; > > $factory->get_Response(-file => $file); > > my $seqin = Bio::SeqIO->new(-file => $file, > -format => 'genbank'); > > > > Thanks for your help! > -Abhi > From lskatz at gmail.com Mon Apr 18 11:25:11 2011 From: lskatz at gmail.com (Lee Katz) Date: Mon, 18 Apr 2011 11:25:11 -0400 Subject: [Bioperl-l] Retaining seqnos from clustal output in AlignIO Message-ID: Hi, I have a large Clustal alignment but I want to sort the sequences and retain the coordinate numbers on the sides. I've also hacked in strand information into the sequence name, but that's another story. How can I use an "in" alignio and an "out" alignio and still retain the coordinate numbers? I can't find an option to keep them. Thanks. SeqYYY.fna/1/232361-241520 TTCGGCTTAAACCTTATCCATATCCAAACGCATAACCGTAACCCATTCAC 50 SeqZZZ.fna/1/233928-243087 TTCGGCTTAAACCTTATCCATATCCAAACGCATAACCGTAACCCATTCAC 50 AnotherSeq.fna/1/1-10089 TTCGGCTTAAACCTTATCCATATCCAAACGCATAACCGTAACCCATTCAC 50 -- Lee Katz From genehack at genehack.org Mon Apr 18 13:30:18 2011 From: genehack at genehack.org (John SJ Anderson) Date: Mon, 18 Apr 2011 13:30:18 -0400 Subject: [Bioperl-l] [Quick help needed] Getting Organism info using NCBI Accession numbers : sample code included In-Reply-To: References: Message-ID: On Fri, Apr 15, 2011 at 18:39, Abhishek Pratap wrote: > I want to seach NCBI for Locus name "CP000490" and get the organism lineage Maybe this will at least get you started: #! 
/opt/perl/bin/perl use strict; use warnings; use 5.010; use Bio::DB::GenBank; my @ids = qw( NW_001884661 EZ361133 CP000490 ); my $gbh = Bio::DB::GenBank->new(); foreach my $id( @ids ) { say "* ID: $id"; my $seq = $gbh->get_Seq_by_acc( $id ); my $org = $seq->species; say join ' ' , $org->classification , "\n"; } j. From cjfields at illinois.edu Mon Apr 18 19:42:49 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Apr 2011 18:42:49 -0500 Subject: [Bioperl-l] Fwd: Q: batched extraction of sub-sequences and their reverse-complements ? In-Reply-To: References: <710602.90088.qm@web28506.mail.ukl.yahoo.com> <641125.85561.qm@web28506.mail.ukl.yahoo.com> Message-ID: <332EB128-54C2-4259-9BF6-A7BBC6A7F1A1@illinois.edu> Have a look at Bio::Coordinate for various coordinate conversions (I think the specific module to use in this case is probably Bio::Coordinate::GeneMapper). chris On Apr 12, 2011, at 10:23 AM, Dave Messina wrote: > ---------- Forwarded message ---------- > From: wadim kapulkin > Date: Tue, Apr 12, 2011 at 17:13 > Subject: Re: [Bioperl-l] Q: batched extraction of sub-sequences and their > reverse-complements ? > To: Dave Messina > > > Hello Dave > > Thank you very much for yours response. Indeed my question might be split as > you did :) > > So first: > Yours suggestion below as to use Bio::DB::Fasta shall make trick. Thanks > very much ! > > As per second part : I probably did not explained properly what I had in > mind. However the link you included below seems to address this matter: > quoting exerted phrase 'Although coordinate conversion sounds pretty trivial > it can get fairly tricky when one includes the possibilities of switching to > coordinates on negative (i.e. Crick) strands and/or having a coordinate > system terminate because you have reached the end of a clone or contig.'. The > issue is indeed in the coordinate conversion. 
In the specific example, I > have been concerned with: I used Cbriggsae chromosomal set to run external > program and find out the output depends sometimes on strand polarity... > (this is getting even more complicated when used other assemblies/ db > freezes offering the sequences differing in lenght). I will need bit more > time to describe this specific example. > > Thanks very much again. > > Wadim > > ------------------------------ > *From:* Dave Messina > *To:* wadim kapulkin > *Cc:* bioperl-l at lists.open-bio.org > *Sent:* Sat, 9 April, 2011 4:47:34 > *Subject:* Re: [Bioperl-l] Q: batched extraction of sub-sequences and their > reverse-complements ? > > Hi Wadim, > > I would like to extract the batch of subsequences (as fastas), based on >> list of >> coordinates : i.e. 1-1000, 1001-2000 , 2001-3000 etc) from given 'large >> seqence' >> (i.e. chromosome sized >10MB) > > > Take a look at Bio::DB::Fasta. > > > > >> and then, ideally , I would be keen to know how to >> extract the converse set - [i.e.: extract 'same' ( I mean corresponding) >> batch >> of sequences, based on list of converse coordinates from >> reverse-complement of >> given 'large sequence']. 
>> > > I don't totally understand this part of your question, but this may help: > > http://www.bioperl.org/wiki/BioPerl_Tutorial#Converting_coordinate_systems_.28Coordinate::Pair.2C_RelSegment.29 > > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Apr 19 12:30:48 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 19 Apr 2011 11:30:48 -0500 Subject: [Bioperl-l] Bioperl installation doubt In-Reply-To: References: <6CE8AACF-0D78-4BBA-8094-CC189B0FF872@illinois.edu> <63E35F39-D70C-4D6F-A792-D2738B850567@verizon.net> <1569920B-D132-487F-BBC6-9A81A430E6E0@illinois.edu> Message-ID: <29C6D2F4-D011-4E29-BE3A-C82F4F323AC5@illinois.edu> Brian, Thanks for this! Saw that you have been working on the wiki to get this done. chris PS - just getting caught up on email, apologies for not responding sooner On Apr 14, 2011, at 4:06 PM, Brian Osborne wrote: > Chris, > > I've done a bit of work on http://www.bioperl.org/wiki/Bptutorial.pl, I'd say it's now about one quarter of its original size. Most of the text has gone into new or existing HOWTOs. > > BIO > > > > On Mar 28, 2011, at 6:40 PM, Chris Fields wrote: > >> Brian, >> >> I think this was started: >> >> http://www.bioperl.org/wiki/Bptutorial.pl >> >> It certainly could be cleaned up, organized, and updated (that and the FAQ). Makes sense to have it as a HOWTO or maybe split it into several HOWTOs. Maybe even combine it with the beginner's HOWTO into various sections? >> >> chris >> >> On Mar 28, 2011, at 3:43 PM, Brian Osborne wrote: >> >>> Chris, >>> >>> I'll get started on dissembling bptutorial. There's certainly useful text in there but there's also duplicated or outdated material. 
Looks like there are 3 options for any given section: >>> >>> - put it into an existing HOWTO >>> - make it into a new HOWTO >>> - delete it >>> >>> BIO >>> >>> On Mar 28, 2011, at 9:27 AM, Chris Fields wrote: >>> >>>> Dave, >>>> >>>> +1 on removing old docs to prevent confusion. Or, alternatively, +1 to syncing those to current docs (though I think decreasing the replication effort in keeping docs up-to-date is probably the best tact). >>>> >>>> chris >>>> >>>> On Mar 28, 2011, at 6:51 AM, Dave Messina wrote: >>>> >>>>>> >>>>>> Thank you very much. It is working. I got the program code from the >>>>>> following link. >>>>>> >>>>>> http://www.bioperl.org/Core/Latest/bptutorial.html >>>>> >>>>> >>>>> Aha, okay. You got there from Google, I guess? That is *way* out of date. >>>>> >>>>> To the other core devs, in order to prevent this confusion in the future, >>>>> I'd like to delete the Core/ directory from our website since it's been >>>>> superseded at this point by other docs and is not current. I intend to put >>>>> up a ticket at Redmine, but I will wait a bit before doing so to allow time >>>>> for people to see this and comment ? please do speak up if there's good >>>>> reason to keep it. >>>>> >>>>> >>>>> Could you please give me the link to join this forum to see other >>>>>> discussions, which would be more helpful to me? >>>>>> >>>>> >>>>> Sure, you can sign up for the mailing list here: >>>>> >>>>> >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> And the archives are also searchable: >>>>> >>>>> http://news.gmane.org/gmane.comp.lang.perl.bio.general >>>>> >>>>> >>>>> Please let me know if you have any suggestion for me to keep learning the >>>>>> bioperl. >>>>> >>>>> >>>>> I would also suggest reading the (current) tutorial and HOWTOs at >>>>> www.bioperl.org . Lots of good links on the main page there, particularly >>>>> under the Documentation heading. 
>>>>> >>>>> >>>>> Dave >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> With regards, >>>>>> Ravi. >>>>>> >>>>>> >>>>>> 2011/3/28 Dave Messina >>>>>> >>>>>>> Hi Ravi, >>>>>>> >>>>>>> Please make sure to "Reply All" so that everyone on the mailing list can >>>>>>> follow (and add to) the discussion. >>>>>>> >>>>>>> If you read the first line of the exception, you'll see it states what the >>>>>>> error is: >>>>>>> "WebDBSeqI Error ? check query sequences!" >>>>>>> >>>>>>> You'd have no way of knowing this, but that ID and database combination is >>>>>>> not functioning anymore, so that's why in this case you're getting an error. >>>>>>> Please try using the example in the tutorial here: >>>>>>> >>>>>>> >>>>>>> http://www.bioperl.org/wiki/BioPerl_Tutorial#Quick_getting_started_scripts >>>>>>> >>>>>>> which has been updated to a different ID which should work. >>>>>>> >>>>>>> Sorry for the confusion! So that we can prevent other people from having >>>>>>> the same issue, could you tell me where you got that example code? >>>>>>> >>>>>>> Dave >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2011/3/28 ?????????????????? >>>>>>> >>>>>>>> Hi Dave, >>>>>>>> >>>>>>>> Thanks a lot for your reply. It is really helpful. Please find the >>>>>>>> screenshot after making the change you pointed out. But I am getting >>>>>>>> "Exception: Bio::Root::Exception" error. I think I have to set the >>>>>>>> environment variables but I am not sure how to do that. Could you please >>>>>>>> guide me on this too. >>>>>>>> >>>>>>>> I can go to the "Environment Variable" Window in my pc. But I dont know >>>>>>>> what to enter once I click "New" on that window. >>>>>>>> >>>>>>>> Thanks in advance. >>>>>>>> >>>>>>>> With regards, >>>>>>>> Ravi. >>>>>>>> >>>>>>>> >>>>>>>> 2011/3/27 Dave Messina >>>>>>>> >>>>>>>>> Hi Ravi, >>>>>>>>> >>>>>>>>> The get_sequence and write_sequence methods are in the Bio::Perl module, >>>>>>>>> not Bio::Seq. 
So your first line >>>>>>>>> >>>>>>>>> use Bio::Seq; >>>>>>>>> >>>>>>>>> should be replaced with >>>>>>>>> >>>>>>>>> use Bio::Perl; >>>>>>>>> >>>>>>>>> >>>>>>>>> More examples in the BioPerl Tutorial here: >>>>>>>>> http://www.bioperl.org/wiki/BioPerl_Tutorial >>>>>>>>> >>>>>>>>> >>>>>>>>> Dve >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 2011/3/24 ?????????????????? >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Could you please help me installing bioperl-db, bioperl-run & other >>>>>>>>>> packages >>>>>>>>>> using ppm on windows 7? Please find the attachment for the error >>>>>>>>>> message I >>>>>>>>>> get. I would really appreciate if you help me fix this issue. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> With regards, >>>>>>>>>> Ravi. >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Tue Apr 19 13:27:55 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 19 Apr 2011 13:27:55 -0400 Subject: [Bioperl-l] nucleotide sequences from peptide id Message-ID: Hi, i have thousands of peptide ids and i 
want their corresponding nucleotide sequences. Is there any way i can convert those peptide id to nucleotide ids , because then i can use Bio::DB::Genbank to get the sequences , or is their any other direct way to achieve this. I would really appreciate if anyone can help me out. Thanks Shalabh -- From David.Messina at sbc.su.se Tue Apr 19 13:20:36 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 19 Apr 2011 19:20:36 +0200 Subject: [Bioperl-l] Retaining seqnos from clustal output in AlignIO In-Reply-To: References: Message-ID: Hi Lee, I thought there was a way to do this, but looks like we have an option to AlignIO->new() to do the opposite: -displayname_flat => 1 [optional] to force the displayname to not show start/end information Since the syntax id/start-end is special, and I believe actually rewritten depending on how the sequence or alignment is manipulated, you might try keeping your strand and coordinate metadata separate from the id ? for example, in the description field (hacky though it may be). Sequences like these are LocatableSeqs in BioPerl. Here are details: http://doc.bioperl.org/bioperl-live/Bio/LocatableSeq.html Dave On Mon, Apr 18, 2011 at 17:25, Lee Katz wrote: > Hi, I have a large Clustal alignment but I want to sort the sequences and > retain the coordinate numbers on the sides. I've also hacked in strand > information into the sequence name, but that's another story. > > How can I use an "in" alignio and an "out" alignio and still retain the > coordinate numbers? I can't find an option to keep them. Thanks. 
> > SeqYYY.fna/1/232361-241520 > TTCGGCTTAAACCTTATCCATATCCAAACGCATAACCGTAACCCATTCAC 50 > SeqZZZ.fna/1/233928-243087 > TTCGGCTTAAACCTTATCCATATCCAAACGCATAACCGTAACCCATTCAC 50 > AnotherSeq.fna/1/1-10089 > TTCGGCTTAAACCTTATCCATATCCAAACGCATAACCGTAACCCATTCAC 50 > > -- > Lee Katz > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Tue Apr 19 14:23:59 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 19 Apr 2011 20:23:59 +0200 Subject: [Bioperl-l] nucleotide sequences from peptide id In-Reply-To: References: Message-ID: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#I_want_a_list_of_database_.27x.27_UIDs_that_are_linked_from_a_list_of_database_.27y.27_UIDs . There usually isn't a single nucleotide ID that corresponds to a peptide ID, although the RefSeq database tries to do this (and Ensembl, sorta). Easiest might be to download the peptide records and pull out whatever nuc id crossreferences you prefer. Dave On Tue, Apr 19, 2011 at 19:27, shalabh sharma wrote: > Hi, > i have thousands of peptide ids and i want their corresponding > nucleotide sequences. > Is there any way i can convert those peptide id to nucleotide ids , because > then i can use Bio::DB::Genbank to get the sequences , or is their any > other > direct way to achieve this. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh > -- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From kellert at ohsu.edu Tue Apr 19 18:21:42 2011 From: kellert at ohsu.edu (Tom Keller) Date: Tue, 19 Apr 2011 15:21:42 -0700 Subject: [Bioperl-l] Bio::Biblio::MedlineArticle.pm Message-ID: Greetings, I'm trying to parse a file of medline records (and out of practice). 
The code following gives the error: $ bp_medline.pls ref01.txt Odd number of elements in hash assignment at /Users/kellert/perl5/bioperl/Bio/Biblio/BiblioBase.pm line 154. $MeshHeadings = undef; ### code snippet bp_medline.pls ### use Bio::Biblio::MedlineArticle; my $obj = Bio::Biblio::MedlineArticle->new(-mesh_headings => ); print Data::Dumper->Dump( [$obj->mesh_headings], ['MeshHeadings'] ); ######## I'm missing something, in creating my object, but I can't tell from the pod what. Can someone point out my error? thanks, Tom MMI DNA Services Core Facility 503-494-2442 kellert at ohsu.edu Office: 6588 RJH (CROET/BasicScience) From Kevin.M.Brown at asu.edu Tue Apr 19 18:41:10 2011 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 19 Apr 2011 15:41:10 -0700 Subject: [Bioperl-l] Bio::Biblio::MedlineArticle.pm In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B407938EDF@EX02.asurite.ad.asu.edu> my $obj = Bio::Biblio::MedlineArticle->new(-mesh_headings => ); Note the lack of anything after the "=>" You either need to remove that flag or finish filling it out with a variable or string or hash or whatever it is supposed to be. Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Tom Keller > Sent: Tuesday, April 19, 2011 3:22 PM > To: BioPerl-List > Subject: [Bioperl-l] Bio::Biblio::MedlineArticle.pm > > Greetings, > I'm trying to parse a file of medline records (and out of practice). > The code following gives the error: > $ bp_medline.pls ref01.txt > Odd number of elements in hash assignment at > /Users/kellert/perl5/bioperl/Bio/Biblio/BiblioBase.pm line 154. 
> $MeshHeadings = undef; > > > ### code snippet bp_medline.pls ### > use Bio::Biblio::MedlineArticle; > > my $obj = Bio::Biblio::MedlineArticle->new(-mesh_headings => ); > > print Data::Dumper->Dump( [$obj->mesh_headings], ['MeshHeadings'] ); > > ######## > > I'm missing something, in creating my object, but I can't tell from the > pod what. Can someone point out my error? > > thanks, > > Tom > MMI DNA Services Core > Facility > 503-494-2442 > kellert at ohsu.edu > Office: 6588 RJH (CROET/BasicScience) > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lberna at adinet.com.uy Wed Apr 20 05:46:17 2011 From: lberna at adinet.com.uy (lberna at adinet.com.uy) Date: Wed, 20 Apr 2011 06:46:17 -0300 (UYT) Subject: [Bioperl-l] bootstrap replicates Message-ID: <20904900.1303292777834.JavaMail.tomcat@fe-ps05> Hi, I am new user of bioperl, I am trying to perform a phylogeny with replicate, but I do not understand how to use the bootsrap_replicate module. this is my code, with out the boostrap, and works #!/usr/bin/perl -w use Bio::AlignIO; use Bio::Align::DNAStatistics; use Bio::Tree::DistanceFactory; use Bio::TreeIO; use Bio::Align::Utilities qw(:all); # for a dna alignment my $alnio = Bio::AlignIO->new(-file => $ARGV[0], -format=>'clustalw'); my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ'); my $stats = Bio::Align::DNAStatistics->new; my $treeout = Bio::TreeIO->new(-format => 'newick'); while(my $aln = $alnio->next_aln ) { my $mat = $stats->distance(-method => 'Kimura', -align => $aln); my $tree = $dfactory->make_tree($mat); $treeout->write_tree($tree); } print $treeout; -------------------------------------------------------------------- but I want to incorporate the bootstrap, all form that I tried doesn't work... #use Bio::Align::Utilities qw(:all); #my $replicates = bootstrap_replicates($aln,100); Can somebody help me?? 
thanks a lot! Luisa From abhishek.vit at gmail.com Wed Apr 20 13:10:48 2011 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 20 Apr 2011 10:10:48 -0700 Subject: [Bioperl-l] From Blast hits to Taxanomy lineage for Short DNA Sequences (reads) In-Reply-To: <4D77466F.4040604@uv.es> References: <4D77466F.4040604@uv.es> Message-ID: Hi Miguel I have started using Bio::LITE::Taxonomy for fetching lineage and for a batch process this method is darn fast which is great. Just wondering can we extract classification from NCBI accession also. In some cases I dont get a GI to query and only have NCBI Accession to get the classification. Thanks! -Abhi On Wed, Mar 9, 2011 at 1:20 AM, Miguel Pignatelli wrote: > Hi Abhishek, > > For a non bioperl related solution, take a look at Bio::LITE::Taxonomy. > It has been design to deal with great number of sequences (it is fast and > efficient). > > You may also find interesting the Blast2lca tool, > > https://github.com/emepyc/Blast2lca > > It currently works with the best hits for each query (calculates the lower > common ancestor), but if you want to use only the best hit, please drop me a > line. > > Please, let me know if you need further help with any of these, > > Cheers, > > M; > > > > > On 08/03/11 22:42, Abhishek Pratap wrote: > >> Hi All >> >> I have results from different megablast of short reads(DNA sequences) >> and after extracting the tophit for each read I want to bin them by >> their lineage creating a tree. >> >> For example. >> >> If blast query hits the reference -> >> >> gi|196110604|gb|CP001103.1|__Alteromonas_macleodii_'Deep_ecotype',_complete_genome >> >> I want to get the lineage for this specie. >> >> >> Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Alteromonadaceae;Alteromonas;Alteromona >> >> The final goal is to do the above mapping as efficiently as possible. >> Any pointers will be appreciated. >> >> >> Thanks! 
>> -Abhi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From abhishek.vit at gmail.com Wed Apr 20 13:14:04 2011 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 20 Apr 2011 10:14:04 -0700 Subject: [Bioperl-l] [Quick help needed] Getting Organism info using NCBI Accession numbers : sample code included In-Reply-To: References: Message-ID: Hi John Thanks for your reply. I think your code does exactly what I needed. In batch cases it runs a bit slow but still able to fetch lineage using NCBI Accession. FYI : I love the speed of Bio::LITE::Taxonomy but it only works on NCBI GI. So I am wondering if I can convert Accession -> GI quickly then I can get my results far more faster. -Abhi On Mon, Apr 18, 2011 at 10:30 AM, John SJ Anderson wrote: > On Fri, Apr 15, 2011 at 18:39, Abhishek Pratap > wrote: > > > I want to seach NCBI for Locus name "CP000490" and get the organism > lineage > > Maybe this will at least get you started: > > #! /opt/perl/bin/perl > > use strict; > use warnings; > use 5.010; > > use Bio::DB::GenBank; > > my @ids = qw( NW_001884661 EZ361133 CP000490 ); > my $gbh = Bio::DB::GenBank->new(); > > foreach my $id( @ids ) { > say "* ID: $id"; > my $seq = $gbh->get_Seq_by_acc( $id ); > my $org = $seq->species; > say join ' ' , $org->classification , "\n"; > } > > > j. 
> From shalabh.sharma7 at gmail.com Wed Apr 20 13:21:03 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 20 Apr 2011 13:21:03 -0400 Subject: [Bioperl-l] [Quick help needed] Getting Organism info using NCBI Accession numbers : sample code included In-Reply-To: References: Message-ID: Hey Abhishek, I think you can use this: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#I_want_a_list_of_database_.27x.27_UIDs_that_are_linked_from_a_list_of_database_.27y.27_UIDs Also there are few files maintained by refseq that can convert accession to GIs. -Shalabh On Wed, Apr 20, 2011 at 1:14 PM, Abhishek Pratap wrote: > Hi John > > Thanks for your reply. I think your code does exactly what I needed. In > batch cases it runs a bit slow but still able to fetch lineage using NCBI > Accession. > > FYI : I love the speed of Bio::LITE::Taxonomy but it only works on NCBI GI. > > So I am wondering if I can convert Accession -> GI quickly then I can get > my > results far more faster. > > -Abhi > > On Mon, Apr 18, 2011 at 10:30 AM, John SJ Anderson >wrote: > > > On Fri, Apr 15, 2011 at 18:39, Abhishek Pratap > > wrote: > > > > > I want to seach NCBI for Locus name "CP000490" and get the organism > > lineage > > > > Maybe this will at least get you started: > > > > #! /opt/perl/bin/perl > > > > use strict; > > use warnings; > > use 5.010; > > > > use Bio::DB::GenBank; > > > > my @ids = qw( NW_001884661 EZ361133 CP000490 ); > > my $gbh = Bio::DB::GenBank->new(); > > > > foreach my $id( @ids ) { > > say "* ID: $id"; > > my $seq = $gbh->get_Seq_by_acc( $id ); > > my $org = $seq->species; > > say join ' ' , $org->classification , "\n"; > > } > > > > > > j. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From j_martin at lbl.gov Wed Apr 20 16:15:36 2011 From: j_martin at lbl.gov (Joel Martin) Date: Wed, 20 Apr 2011 13:15:36 -0700 Subject: [Bioperl-l] [Quick help needed] Getting Organism info using NCBI Accession numbers : sample code included In-Reply-To: References: Message-ID: ftp://ftp.ncbi.nih.gov/genbank/README.genbank look in there for Genbank Livelists, for doing many lookups it will be much faster and kinder to ncbi if you use those. Joel On Wed, Apr 20, 2011 at 10:21 AM, shalabh sharma wrote: > Hey Abhishek, > ? ? ? ? ? ? ? ?I think you can use this: > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#I_want_a_list_of_database_.27x.27_UIDs_that_are_linked_from_a_list_of_database_.27y.27_UIDs > > Also there are few files maintained by refseq that can convert accession to > GIs. > > -Shalabh > > > On Wed, Apr 20, 2011 at 1:14 PM, Abhishek Pratap wrote: > >> Hi John >> >> Thanks for your reply. I think your code does exactly what I needed. In >> batch cases it runs a bit slow but still able to fetch lineage using NCBI >> Accession. >> >> FYI : I love the speed of Bio::LITE::Taxonomy but it only works on NCBI GI. >> >> So I am wondering if I can convert Accession -> GI quickly then I can get >> my >> results far more faster. >> >> -Abhi >> >> On Mon, Apr 18, 2011 at 10:30 AM, John SJ Anderson > >wrote: >> >> > On Fri, Apr 15, 2011 at 18:39, Abhishek Pratap >> > wrote: >> > >> > > I want to seach NCBI for Locus name "CP000490" and get the organism >> > lineage >> > >> > Maybe this will at least get you started: >> > >> > #! 
/opt/perl/bin/perl >> > >> > use strict; >> > use warnings; >> > use 5.010; >> > >> > use Bio::DB::GenBank; >> > >> > my @ids = qw( NW_001884661 EZ361133 CP000490 ); >> > my $gbh = Bio::DB::GenBank->new(); >> > >> > foreach my $id( @ids ) { >> >   say "* ID: $id"; >> >   my $seq = $gbh->get_Seq_by_acc( $id ); >> >   my $org = $seq->species; >> >   say join ' ' , $org->classification , "\n"; >> > } >> > >> > >> > j. >> > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Apr 21 16:07:39 2011 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 21 Apr 2011 15:07:39 -0500 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl-DB, Run, Network 1.6.900 released Message-ID: <123E2170-7E62-475E-B8CA-371B0E7C91CA@illinois.edu> All, The latest BioPerl-DB, BioPerl-Run, and BioPerl-Network code has been released to CPAN: http://search.cpan.org/dist/BioPerl-Run/ http://search.cpan.org/dist/BioPerl-DB/ http://search.cpan.org/dist/BioPerl-Network/ Please report any bugs to our Redmine server: https://redmine.open-bio.org/ Enjoy! chris From lmrodriguezr at gmail.com Sat Apr 23 10:39:24 2011 From: lmrodriguezr at gmail.com (Luis-Miguel Rodríguez Rojas) Date: Sat, 23 Apr 2011 16:39:24 +0200 Subject: [Bioperl-l] ESummary document Message-ID: Hello, Does anyone know how I can initialize a Bio::Tools::EUtilities::Summary::DocSum
Say, a script contains the following code: my $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', ...); $factory->get_Response(-file=>$file); Which effectively saves an eSummaryResult XML file in $file. If I want to retrieve the result from within the script, it is easy: my $docsum = $factory->next_DocSum; But, what about reading the file from another script? I did not find related documentation. Thanks, and congrats for the recent release. LRR -- Luis M. Rodriguez-R [ http://thebio.me/lrr ] --------------------------------- UMR Résistance des Plantes aux Bioagresseurs - Group effecteur/cible Institut de Recherche pour le Développement, Montpellier, France [ http://bioinfo-prod.mpl.ird.fr/xantho | Luismiguel.Rodriguez at ird.fr ] +33 (0) 6.29.74.55.93 Unidad de Bioinformática del Laboratorio de Micología y Fitopatología Universidad de Los Andes, Bogotá, Colombia [ http://lamfu.uniandes.edu.co | luisrodr at uniandes.edu.co ] +57 (1) 3.39.49.49 ext 2777 From chet.seligman at gmail.com Sat Apr 23 10:53:34 2011 From: chet.seligman at gmail.com (Chet Seligman) Date: Sat, 23 Apr 2011 07:53:34 -0700 Subject: [Bioperl-l] Leave-one-out cross-correlation Message-ID: Any recommendations on which modules to use for this? I want to correlate a selected gene based on GO categories. From cjfields at illinois.edu Sat Apr 23 14:09:54 2011 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 23 Apr 2011 13:09:54 -0500 Subject: [Bioperl-l] ESummary document In-Reply-To: References: Message-ID: <73AA4C7F-D8A3-4236-870A-FCB4F96D76E8@illinois.edu> Luis, You should be able to use the Bio::Tools::EUtilities parsers directly: # from Bio::DB::EUtilities $factory->get_Response(-file=>$file); my $parser = Bio::Tools::EUtilities->new( -eutil => 'esummary', -file => $file); while (my $docsum = $parser->next_DocSum) { ... 
} There are some small caveats to this, but basically 95% or more of Bio::DB::EUtilities is just glue for submitting queries and passing the response output to the proper Bio::Tools::EUtilities parsers. chris On Apr 23, 2011, at 9:39 AM, Luis-Miguel Rodríguez Rojas wrote: > Hello, > > Does anyone know how I can initialize a > Bio::Tools::EUtilities::Summary::DocSum object from a DocSum file? > > Say, a script contains the following code: > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', ...); > $factory->get_Response(-file=>$file); > > Which effectively saves an eSummaryResult XML file in $file. If I want to > retrieve the result from within the script, it is easy: > > my $docsum = $factory->next_DocSum; > > But, what about reading the file from another script? I did not find > related documentation. > > Thanks, and congrats for the recent release. > LRR > > -- > Luis M. Rodriguez-R > [ http://thebio.me/lrr ] > --------------------------------- > UMR Résistance des Plantes aux Bioagresseurs - Group effecteur/cible > Institut de Recherche pour le Développement, Montpellier, France > [ http://bioinfo-prod.mpl.ird.fr/xantho | Luismiguel.Rodriguez at ird.fr ] > +33 (0) 6.29.74.55.93 > > Unidad de Bioinformática del Laboratorio de Micología y Fitopatología > Universidad de Los Andes, Bogotá, Colombia > [ http://lamfu.uniandes.edu.co | luisrodr at uniandes.edu.co ] > +57 (1) 3.39.49.49 ext 2777 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lmrodriguezr at gmail.com Sat Apr 23 21:11:25 2011 From: lmrodriguezr at gmail.com (Luis-Miguel Rodríguez Rojas) Date: Sun, 24 Apr 2011 03:11:25 +0200 Subject: [Bioperl-l] ESummary document In-Reply-To: <73AA4C7F-D8A3-4236-870A-FCB4F96D76E8@illinois.edu> References: <73AA4C7F-D8A3-4236-870A-FCB4F96D76E8@illinois.edu> Message-ID: It works perfectly, Thanks! -- Luis M. 
Rodriguez-R [ http://thebio.me/lrr ] --------------------------------- UMR Résistance des Plantes aux Bioagresseurs - Group effecteur/cible Institut de Recherche pour le Développement, Montpellier, France [ http://bioinfo-prod.mpl.ird.fr/xantho | Luismiguel.Rodriguez at ird.fr ] +33 (0) 6.29.74.55.93 Unidad de Bioinformática del Laboratorio de Micología y Fitopatología Universidad de Los Andes, Bogotá, Colombia [ http://lamfu.uniandes.edu.co | luisrodr at uniandes.edu.co ] +57 (1) 3.39.49.49 ext 2777 2011/4/23 Chris Fields > Luis, > > You should be able to use the Bio::Tools::EUtilities parsers directly: > > # from Bio::DB::EUtilities > $factory->get_Response(-file=>$file); > > my $parser = Bio::Tools::EUtilities->new( > -eutil => 'esummary', > -file => $file); > > while (my $docsum = $parser->next_DocSum) { ... } > > There are some small caveats to this, but basically 95% or more of > Bio::DB::EUtilities is just glue for submitting queries and passing the > response output to the proper Bio::Tools::EUtilities parsers. > > chris > > On Apr 23, 2011, at 9:39 AM, Luis-Miguel Rodríguez Rojas wrote: > > > Hello, > > > > Does anyone know how I can initialize a > > Bio::Tools::EUtilities::Summary::DocSum object from a DocSum file? > > > > Say, a script contains the following code: > > > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', ...); > > $factory->get_Response(-file=>$file); > > > > Which effectively saves an eSummaryResult XML file in $file. If I want > to > > retrieve the result from within the script, it is easy: > > > > my $docsum = $factory->next_DocSum; > > > > But, what about reading the file from another script? I did not find > > related documentation. > > > > Thanks, and congrats for the recent release. > > LRR > > > > -- > > Luis M. 
Rodriguez-R > > [ http://thebio.me/lrr ] > > --------------------------------- > > UMR Résistance des Plantes aux Bioagresseurs - Group effecteur/cible > > Institut de Recherche pour le Développement, Montpellier, France > > [ http://bioinfo-prod.mpl.ird.fr/xantho | Luismiguel.Rodriguez at ird.fr ] > > +33 (0) 6.29.74.55.93 > > > > Unidad de Bioinformática del Laboratorio de Micología y Fitopatología > > Universidad de Los Andes, Bogotá, Colombia > > [ http://lamfu.uniandes.edu.co | luisrodr at uniandes.edu.co ] > > +57 (1) 3.39.49.49 ext 2777 > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From mbelgin at gmail.com Mon Apr 25 14:02:41 2011 From: mbelgin at gmail.com (Mehmet) Date: Mon, 25 Apr 2011 14:02:41 -0400 Subject: [Bioperl-l] Metadata file problems installing BioPerl Message-ID: Hi Everyone, I am trying to install BioPerl following the instructions on the wiki page. I tried both options, and both failed. I would really appreciate your suggestions. We are using a Linux cluster with RHEL5. I upgraded Perl to 5.12.3 and I have sudo access (my problems are not related to access permissions). * With the Build.PL method, I am getting: Could not get valid metadata. Error is: Invalid metadata structure. Errors: 'Perl_5' for 'license' does not have a URL scheme (resources -> license) [Validation: 1.2], Expected a map structure from string or file. (optional_features -> Bio::FeatureIO::gff -> requires) [Validation: 1.2], Expected a map structure from string or file. (optional_features -> Bio::WebAgent -> ......... ..... ..... Followed by tons of similar "Expected a map structure from ....." messages. 
* With direct cpan install, I get: CPAN.pm: Going to build C/CJ/CJFIELDS/BioPerl-run-1.6.1.tar.gz BioPerl minimal core version 1.006000 is required for BioPerl-run Warning: No success on command[/panfs/ iw-scratch.pace.gatech.edu/iw.usr.local/packages/perl-5.12.3/bin/perlBuild.PL ] CJFIELDS/BioPerl-run-1.6.1.tar.gz /panfs/ iw-scratch.pace.gatech.edu/iw.usr.local/packages/perl-5.12.3/bin/perlBuild.PL -- NOT OK Running Build test Make had some problems, won't test Running Build install Make had some problems, won't install Could not read metadata file. Falling back to other methods to determine prerequisites Failed during this command: CJFIELDS/BioPerl-run-1.6.1.tar.gz : writemakefile NO '/panfs/ iw-scratch.pace.gatech.edu/iw.usr.local/packages/perl-5.12.3/bin/perlBuild.PL ' returned status 512 Any thoughts? What am I doing wrong here? Thanks a lot in advance! -Memo -- ========================================= Mehmet Belgin, Ph.D. (mehmet.belgin at oit.gatech.edu) Scientific Computing Consultant | OIT - Academic and Research Technologies Georgia Institute of Technology 258 Fourth Street, Rich Building, Room 326 Atlanta, GA 30332-0700 Office: (404) 385-2676 From rmb32 at cornell.edu Mon Apr 25 17:42:48 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 25 Apr 2011 14:42:48 -0700 Subject: [Bioperl-l] Announcing OBF Google Summer of Code Accepted Students Message-ID: <4DB5EAD8.1020905@cornell.edu> Hello all, I'm very pleased and excited to announce that the Open Bioinformatics Foundation has selected 6 very capable students to work on OBF projects this summer as part of the Google Summer of Code program. The accepted students, their projects, and their mentors (in alphabetical order): Justinas Vygintas Daugmaudis Michele dos Santos da Silva (2 students!) 
Mocapy++Biopython: from data to probabilistic models of biomolecules mentored by Thomas Hamelryck and Eric Talevich Chuan Hock Koh BioJava - Amino acids physico-chemical properties calculation mentored by Peter Troshin, Andreas Prlic, and Jay Vyas Michał Koziarski Representing bio-objects and related information with images (BioRuby) mentored by Raoul J.P. Bonnal and Francesco Strozzi Sheena Scroggins Major BioPerl Reorganization mentored by Robert Buels and Chris Fields Mikael Eric Trellet Interface analysis module for BioPython mentored by João Rodrigues and Eric Talevich Once again this year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked. Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF's open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received. For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it's time to "put your money where your mouth is", as the saying goes. Let's get out there and write some great code this summer! Best regards, Rob ---- Robert Buels OBF GSoC 2011 Administrator From sheena.scroggins at gmail.com Wed Apr 27 01:17:26 2011 From: sheena.scroggins at gmail.com (Sheena Scroggins) Date: Tue, 26 Apr 2011 22:17:26 -0700 Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project Message-ID: Hey everyone, I wanted to take a minute to introduce myself as one of the Google Summer of Code interns. I was the lucky one chosen to work on the BioPerl Reorganization (*crowd cheers*). 
I am a grad student in bioinformatics, and somewhat new to this level of programming so bear with me as I learn the technical jargon. Luckily I have both Rob and Chris to mentor me this summer! Reading through the mailing list archives, I see there have been many discussion and differing opinions about tackling this project. Given the time frame for GSoC and my limited experience, there is no way I will complete this project on my own but I will at least be able to start it, which will hopefully motivate others to pitch in. So far, the plan for the GSoC project is to start by breaking out Bio::Root, followed by a couple other modules based on their dependencies and the time allowed. Each will be published to CPAN independently. You can follow the project (once it starts) on github at https://github.com/sheenams. I look forward to collaborating with many of you on the reorganization (hint hint)! Sheena From Graham.Hamilton at glasgow.ac.uk Wed Apr 27 06:43:42 2011 From: Graham.Hamilton at glasgow.ac.uk (Graham Hamilton) Date: Wed, 27 Apr 2011 11:43:42 +0100 Subject: [Bioperl-l] Bio::Tools::Run::RepeatMasker issues Message-ID: <6E04EFD5-9AF9-435B-8A9D-A0AA091B11C8@glasgow.ac.uk> Dear All I am writing a script to run RepeatMasker on several hundred thousand sequences of ~100 bp. I have installed RepeatMasker and have a script that will read in a single fasta sequence file and print out the masked sequence. use Bio::Tools::Run::RepeatMasker; use Bio::SeqIO; my $factory = Bio::Tools::Run::RepeatMasker->new(-species => 'human'); my $in = Bio::SeqIO->new(-file => "test.fa", -format => 'fasta'); my $seq = $in->next_seq(); $factory->run($seq); my $masked_seq = $factory->masked_seq; print $masked_seq->seq; The script works for sequences that contain repeats. Unfortunately, when I use a sequence without repeats I get the following error. 
------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Cannot open RepeatMasker outfile for parsing STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.12.3/Bio/Root/Root.pm:472 STACK: Bio::Tools::Run::RepeatMasker::_run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:308 STACK: Bio::Tools::Run::RepeatMasker::run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:260 STACK: RepeatMaskerTest.pl:22 ----------------------------------------------------------- As I want to screen a file of many short sequences, most will contain repeats but not all. I want to keep the sequences that do not contain repeats for further investigation. This is a problem for me as the Exception exits the script. I assume that I am doing something wrong, can anyone give me some hints as to how I can get this to work? Regards Graham Dr Graham Hamilton The Sir Henry Wellcome Functional Genomics Facility Room B3-28 Joseph Black Building Research Institute of Molecular, Cell & Systems Biology College of Medical, Veterinary & Life Sciences University of Glasgow Glasgow Scotland G12 8QQ T: +44 141 330 6212 From Graham.Hamilton at glasgow.ac.uk Wed Apr 27 06:55:23 2011 From: Graham.Hamilton at glasgow.ac.uk (Graham Hamilton) Date: Wed, 27 Apr 2011 11:55:23 +0100 Subject: [Bioperl-l] [BioPerl]RepeatMasker issues Message-ID: <68110FE7-622B-43C7-9583-331EEB07BF07@glasgow.ac.uk> I am writing a script to run RepeatMasker on several hundred thousand sequences of ~100 bp. I have installed RepeatMasker and have a script that will read in a single fasta sequence file and print out the masked sequence. 
use Bio::Tools::Run::RepeatMasker; use Bio::SeqIO; my $factory = Bio::Tools::Run::RepeatMasker->new(-species => 'human'); my $in = Bio::SeqIO->new(-file => "test.fa", -format => 'fasta'); my $seq = $in->next_seq(); $factory->run($seq); my $masked_seq = $factory->masked_seq; print $masked_seq->seq; The script works for sequences that contain repeats. Unfortunately, when I use a sequence without repeats I get the following error. ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Cannot open RepeatMasker outfile for parsing STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.12.3/Bio/Root/Root.pm:472 STACK: Bio::Tools::Run::RepeatMasker::_run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:308 STACK: Bio::Tools::Run::RepeatMasker::run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:260 STACK: RepeatMaskerTest.pl:22 ----------------------------------------------------------- As I want to screen a file of many short sequences, most will contain repeats but not all. I want to keep the sequences that do not contain repeats for further investigation. This is a problem for me as the Exception exits the script. I assume that I am doing something wrong, can anyone give me some hints as to how I can get this to work? 
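For screening many sequences when one of them can make the wrapper throw, a common Perl pattern is to trap the exception per item so the loop carries on. What follows is only a minimal sketch of that pattern, not anything taken from the RepeatMasker wrapper itself: `run_each` and `$fake_run` are hypothetical names, and the stand-in task simply dies with the same message as the exception above so the sketch runs without BioPerl or RepeatMasker installed.

```perl
use strict;
use warnings;

# Run $task on every item. An exception from one item is trapped so the
# loop continues. Returns two array refs: items that ran cleanly, and
# [item, error message] pairs for the items that threw.
sub run_each {
    my ($task, @items) = @_;
    my (@ok, @failed);
    for my $item (@items) {
        # The eval block yields 1 only if $task finished without dying.
        if (eval { $task->($item); 1 }) {
            push @ok, $item;
        }
        else {
            my $err = $@ || 'unknown error';
            push @failed, [ $item, $err ];
        }
    }
    return (\@ok, \@failed);
}

# Hypothetical stand-in for the real wrapper call: it dies for any item
# whose name starts with "clean", mimicking a sequence with no repeats.
my $fake_run = sub {
    my ($name) = @_;
    die "Cannot open RepeatMasker outfile for parsing\n" if $name =~ /^clean/;
    return;
};

my ($ok, $failed) = run_each($fake_run, qw(repeat1 clean1 repeat2));
printf "ran cleanly: %s\n", join ',', @$ok;
printf "threw: %s\n", join ',', map { $_->[0] } @$failed;
```

In the real script the task would presumably be something like `sub { $factory->run($_[0]) }`, with the items being the sequence objects read from Bio::SeqIO. Note that an eval traps every failure, not only the no-repeats case, so it is worth checking the stored error message before treating a sequence as repeat-free.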
Regards Graham Dr Graham Hamilton The Sir Henry Wellcome Functional Genomics Facility Room B3-28 Joseph Black Building Research Institute of Molecular, Cell & Systems Biology College of Medical, Veterinary & Life Sciences University of Glasgow Glasgow Scotland G12 8QQ T: +44 141 330 6212 From stephensr at mail.nih.gov Wed Apr 27 07:13:04 2011 From: stephensr at mail.nih.gov (Stephens, Robert (NIH/NCI) [C]) Date: Wed, 27 Apr 2011 07:13:04 -0400 Subject: [Bioperl-l] Adding mouseover functionality to bioperl graphics images Message-ID: Hello, I would like to draw models of some genomic fragments and include some mouseover and click functionality similar to gbrowse balloon popups without building an individual gbrowse config file each time - is there a way to do this with bioperl graphics module or alternatively avoid or construct a generic config file for gbrowse ? Thanks, Bob From sdavis2 at mail.nih.gov Wed Apr 27 07:39:26 2011 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 27 Apr 2011 07:39:26 -0400 Subject: [Bioperl-l] Adding mouseover functionality to bioperl graphics images In-Reply-To: References: Message-ID: On Wed, Apr 27, 2011 at 7:13 AM, Stephens, Robert (NIH/NCI) [C] wrote: > Hello, > > I would like to draw models of some genomic fragments and include some > mouseover and click functionality similar to gbrowse balloon popups without > building an individual gbrowse config file each time - is there a way to do > this with bioperl graphics module or alternatively avoid or construct a > generic config file for gbrowse ? Hi, Bob. 
Have a look here: http://search.cpan.org/~lds/Bio-Graphics-2.21/lib/Bio/Graphics/Panel.pm#Creating_Imagemaps Sean From roy.chaudhuri at gmail.com Wed Apr 27 07:45:17 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 27 Apr 2011 12:45:17 +0100 Subject: [Bioperl-l] Bio::Tools::Run::RepeatMasker issues In-Reply-To: <6E04EFD5-9AF9-435B-8A9D-A0AA091B11C8@glasgow.ac.uk> References: <6E04EFD5-9AF9-435B-8A9D-A0AA091B11C8@glasgow.ac.uk> Message-ID: <4DB801CD.1080208@gmail.com> Hi Graham, Sorry, I don't know much about RepeatMasker, but maybe you could try wrapping the run in an eval?: eval {$factory->run($seq)}; if ($@) {warn $@} else {print $factory->masked_seq->seq} Cheers, Roy. On 27/04/2011 11:43, Graham Hamilton wrote: > Dear All > > I am writing a script to run RepeatMasker on several hundred thousand > sequences of ~100 bp. I have installed RepeatMasker and have a script > that will read in a single fasta sequence file and print out the > masked sequence. > > use Bio::Tools::Run::RepeatMasker; use Bio::SeqIO; > > my $factory = Bio::Tools::Run::RepeatMasker->new(-species => > 'human'); my $in = Bio::SeqIO->new(-file => "test.fa", -format => > 'fasta'); my $seq = $in->next_seq(); $factory->run($seq); my > $masked_seq = $factory->masked_seq; print $masked_seq->seq; > > The script works for sequences that contain repeats. Unfortunately, > when I use a sequence without repeats I get the following error. 
> > ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: > Cannot open RepeatMasker outfile for parsing STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/lib/perl5/site_perl/5.12.3/Bio/Root/Root.pm:472 STACK: > Bio::Tools::Run::RepeatMasker::_run > /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:308 > > STACK: Bio::Tools::Run::RepeatMasker::run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:260 > STACK: RepeatMaskerTest.pl:22 > ----------------------------------------------------------- > > As I want to screen a file of many short sequences, most will contain > repeats but not all. I want to keep the sequences that do not contain > repeats for further investigation. This is a problem for me as the > Exception exits the script. > > I assume that I am doing something wrong, can anyone give me some > hints as to how I can get this to work? > > Regards > > Graham > > > Dr Graham Hamilton The Sir Henry Wellcome Functional Genomics > Facility Room B3-28 Joseph Black Building Research Institute of > Molecular, Cell& Systems Biology College of Medical, Veterinary& > Life Sciences University of Glasgow Glasgow Scotland G12 8QQ > > T: +44 141 330 6212 > > > _______________________________________________ Bioperl-l mailing > list Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Wed Apr 27 11:31:26 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Wed, 27 Apr 2011 08:31:26 -0700 Subject: [Bioperl-l] [BioPerl]RepeatMasker issues In-Reply-To: <68110FE7-622B-43C7-9583-331EEB07BF07@glasgow.ac.uk> References: <68110FE7-622B-43C7-9583-331EEB07BF07@glasgow.ac.uk> Message-ID: I guess it fails to produce a .out file not an empty .out file if there are no repeats, not sure if this is intended or a bug in RM. 
BTW - your approach to use this module will likely be pretty inefficient as you will initiate 100,000 runs while you could just run repeatmasker on a single fasta file of your 100bp seqs as a single instance (or some smaller chunks that you farm out to the cluster). Then you just need to parse one repeatmasker output file (or concatenate all the ones from the chunks you sent off to the cluster). This seems much more sane than this one at a time approach. However, if you want to still do it this way, you need to fix the module so that it doesn't throw on empty file - at around line 308 of RepeatMasker.pm (the _run frame in your stack trace) you can change that throw to a warn and return from the function. e.g. in RepeatMasker.pm you could change this: unless (open (RM, $outfile)) { $self->throw("Cannot open RepeatMasker outfile for parsing"); } to something like this unless (open (RM, $outfile)) { $self->warn("Cannot open RepeatMasker outfile for parsing"); return; } You should also be checking to see that the run succeeded in your code my @rpt_features = $factory->run($seq); if( @rpt_features ) { # test that there are > 0 elements, then we have repeats to report print $factory->masked_seq->seq, "\n"; } -- or just run RepeatMasker on the cmdline On Apr 27, 2011, at 3:55 AM, Graham Hamilton wrote: > I am writing a script to run RepeatMasker on several hundred thousand sequences of ~100 bp. I have installed RepeatMasker and have a script that will read in a single fasta sequence file and print out the masked sequence. > > use Bio::Tools::Run::RepeatMasker; > use Bio::SeqIO; > > my $factory = Bio::Tools::Run::RepeatMasker->new(-species => 'human'); > my $in = Bio::SeqIO->new(-file => "test.fa", > -format => 'fasta'); > my $seq = $in->next_seq(); > $factory->run($seq); > my $masked_seq = $factory->masked_seq; > print $masked_seq->seq; > > The script works for sequences that contain repeats. Unfortunately, when I use a sequence without repeats I get the following error. 
> > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Cannot open RepeatMasker outfile for parsing > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.12.3/Bio/Root/Root.pm:472 > STACK: Bio::Tools::Run::RepeatMasker::_run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:308 > STACK: Bio::Tools::Run::RepeatMasker::run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:260 > STACK: RepeatMaskerTest.pl:22 > ----------------------------------------------------------- > > As I want to screen a file of many short sequences, most will contain repeats but not all. I want to keep the sequences that do not contain repeats for further investigation. This is a problem for me as the Exception exits the script. > > I assume that I am doing something wrong, can anyone give me some hints as to how I can get this to work? > > Regards > > Graham > > > Dr Graham Hamilton > The Sir Henry Wellcome Functional Genomics Facility > Room B3-28 > Joseph Black Building > Research Institute of Molecular, Cell & Systems Biology > College of Medical, Veterinary & Life Sciences > University of Glasgow > Glasgow > Scotland > G12 8QQ > > T: +44 141 330 6212 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abhishek.vit at gmail.com Wed Apr 27 14:32:58 2011 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 27 Apr 2011 11:32:58 -0700 Subject: [Bioperl-l] Handling hierarchical phylogeny based data in bio-/perl Message-ID: Hi Guys I have lineage for many contigs blasted to nt dbase. The goal is to arrange them in a hierarchical data structure something like hash of hash and also store some other ancillary data like contig names for each bin and coverage etc. 
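A nested structure like the one described here does not have to be written out level by level: walking each lineage from the root and creating nodes on demand handles lineages of any depth. The following is a minimal sketch in plain Perl, with no BioPerl tree classes involved. The names `add_lineage` and `format_tree`, the node layout (`count`, `coverage`, `contigs`, `children`), the sample rows, and the choice to sum coverage per bin are all assumptions made for illustration.

```perl
use strict;
use warnings;
use 5.010;

# Add one contig to a nested hash keyed by taxon name. Every node on the
# path from the root gets its count incremented, the contig's coverage
# added, and the contig name recorded. Child taxa live under 'children'.
sub add_lineage {
    my ($tree, $lineage, $contig, $coverage) = @_;
    my $level = $tree;
    for my $taxon (@$lineage) {
        my $node = $level->{$taxon}
            //= { count => 0, coverage => 0, contigs => [], children => {} };
        $node->{count}++;
        $node->{coverage} += $coverage;    # assumption: coverage is summed
        push @{ $node->{contigs} }, $contig;
        $level = $node->{children};
    }
    return $tree;
}

# Return the tree as indented text lines, one per bin, sorted by name.
sub format_tree {
    my ($level, $depth) = @_;
    $depth //= 0;
    my @lines;
    for my $taxon (sort keys %$level) {
        my $node = $level->{$taxon};
        push @lines, sprintf '%s%s count=%d coverage=%.1f',
            '  ' x $depth, $taxon, $node->{count}, $node->{coverage};
        push @lines, format_tree($node->{children}, $depth + 1);
    }
    return @lines;
}

# Hypothetical rows: space-separated lineage, contig name, coverage.
my @rows = (
    [ 'Eukaryota Viridiplantae Streptophyta Embryophyta',                  'contig1', 2.0 ],
    [ 'Eukaryota Viridiplantae Streptophyta Streptophytina Charophyceae',  'contig2', 1.5 ],
    [ 'Eukaryota Viridiplantae Streptophyta Streptophytina Embryophyta',   'contig3', 0.5 ],
);

my %tree;
for my $row (@rows) {
    my ($lineage, $contig, $coverage) = @$row;
    add_lineage(\%tree, [ split ' ', $lineage ], $contig, $coverage);
}
say for format_tree(\%tree);
```

Reading a real TSV would only change how the rows are produced, for example splitting each line on tabs and then splitting the lineage column on whatever separator the file uses.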
For example if my input is from a tsv file with lineage as one column and others like contig name, coverage etc Eukaryota Viridiplantae Streptophyta Embryophyta Tracheophyta Spermatophyta Magnoliophyta eudicotyledons core_eudicotyledons Eukaryota Viridiplantae Streptophyta Streptophytina Charophyceae Charales Characeae Chara Eukaryota Viridiplantae Streptophyta Streptophytina Embryophyta then I would like to store data as follows Eukaryota -> count = 3 Eukaryota -> coverage = 6.3 Eukaryota->Viridplantae->count=3 Eukaryota->Viridplantae->coverage=4.3 Eukaryota->Viridplantae->Streptophyta->count=3 Eukaryota->Viridplantae->Streptophyta->coverage2=2.3 -------etc I could create such hash explicitly but it is a tiring process as num of words on each line(lineage) increases I have to keep on increasing my data structure manually. Also all lines(lineage) wont have same number of words. Also I would like to print such a tree with count/coverage information associated for each bin. Wondering if I can use some Tree based built in capability of perl/bio-perl to do this. I did have a look at http://bioperl.org/wiki/HOWTO:Trees but I dont think I could find example to read from tsv file and create a data structure where I am also storing count/coverage for each bin. Any pointers will help. Best, -Abhi From cjfields at illinois.edu Wed Apr 27 15:35:43 2011 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 27 Apr 2011 14:35:43 -0500 Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project In-Reply-To: References: Message-ID: Sheena, Congrats on being accepted! We've talked about doing this over the years, but it's not an easy task and it needs a dedicated project to get the ball rolling, so to speak. Hopefully this isn't tl;dr. 
I'll start off with a few of my questions/thoughts (Rob could probably chime in as well, but I think his general thoughts on the project parallel mine): 1) The current BioPerl CPAN could just be a simple install script, acting like a 'Task' or 'Bundle' module, installing the actual Bio-specific distributions. Doing it this way would allow you to iteratively split off additional code but retain the original Task/Bundle-based approach to installation. For instance, the first pass could split out Root, then have a dependency-light and 'extras' distribution, 2nd round split further based on function, and so on: 1st round (v 1.9) : BioPerl (just an installer) -> installs root, min-deps, extra-deps 2nd round (v 1.901) : BioPerl (just an installer) -> root, seq/feature, other-min-deps, extra-deps ... Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, seq, tree, align, coord, map, everything-else ... Also, one could potentially install modules in various ways: interactively, in predetermined groups, using a user-defined list, etc (one could effectively create custom BioPerl installs for GBrowse or other tools for instance). Of course I would only pick the easiest route to start, but maybe that gives some ideas. Regardless, if the dependency tree is set up correctly any reliance on other Bio* modules would be defined in the various Build.PL/Makefile.PL and then installed via CPAN (as is any dependency). 2) The Bio::Root modules are probably the true core modules and are the most stable with regards to changes, so those could be moved to something like BioPerl-Core. Beyond that, what are the proposed splits? (we've discussed this on-list before, but it's appropriate to bring this up again) 3) How do we want to handle versioning? We can't (and probably shouldn't) release everything on a synchronized versioning scheme (via Bio::Root::Version, for instance), that'll quickly fall apart. 
Personally I can foresee each split-off dist having it's own version, with the BioPerl network of modules being in effect it's own mini-CPAN. 5) Related to versioning, in my opinion we should maybe aim on eventually calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme. Lincoln has already done something like this with Bio::Graphics, which was originally part of BioPerl but split off prior to v 1.6.0. 6) In some cases I can see particularly thorny problems, such as circular dependencies. I can think of a few ways to address that (creating a simple lightweight Bio::Species class as a fallback if Bio::Tree code isn't present, for instance), but any additional thoughts on this would be helpful. 7) Do we want to set up something like 'git submodule' for the devs to pull down all BioPerl-relevant code? Other thoughts? chris On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote: > Hey everyone, > > I wanted to take a minute to introduce myself as one of the Google Summer of > Code interns. I was the lucky one chosen to work on the BioPerl > Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and > somewhat new to this level of programming so bear with me as I learn the > technical jargon. Luckily I have both Rob and Chris to mentor me this > summer! > > Reading through the mailing list archives, I see there have been many > discussion and differing opinions about tackling this project. Given the > time frame for GSoC and my limited experience, there is no way I will > complete this project on my own but I will at least be able to start it, > which will hopefully motivate others to pitch in. So far, the plan for the > GSoC project is to start by breaking out Bio::Root, followed by a couple > other modules based on their dependencies and the time allowed. Each will be > published to CPAN independently. You can follow the project (once it starts) > on github at https://github.com/sheenams. 
> > I look forward to collaborating with many of you on the reorganization (hint > hint)! > > Sheena > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bioinfo.khush at gmail.com Thu Apr 28 01:22:46 2011 From: bioinfo.khush at gmail.com (khush ........) Date: Thu, 28 Apr 2011 10:52:46 +0530 Subject: [Bioperl-l] Standalone blast Message-ID: Hi, I have some sequences ~250 and wanted to use BLASTX to blast against nr database of NCBI, as this is time consuming using web based search. Can some one please tell me how to start BIOPERL with scuh problems. I know that this is possible with bioperl, but do not know how. Any suggestion will be appreciable. Thanks in advance Kamal From David.Messina at sbc.su.se Thu Apr 28 03:29:11 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 28 Apr 2011 09:29:11 +0200 Subject: [Bioperl-l] Standalone blast In-Reply-To: References: Message-ID: Hi Kamal, This is covered in the beginners' HOWTO: http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST Dave On Thu, Apr 28, 2011 at 07:22, khush ........ wrote: > Hi, > > I have some sequences ~250 and wanted to use BLASTX to blast against nr > database of NCBI, as this is time consuming using web based search. Can > some > one please tell me how to start BIOPERL with scuh problems. I know that > this > is possible with bioperl, but do not know how. > > Any suggestion will be appreciable. > > Thanks in advance > Kamal > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Thu Apr 28 04:03:56 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 28 Apr 2011 10:03:56 +0200 Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project In-Reply-To: References: Message-ID: First of all, welcome Sheena! 
Awesome that you'll be working on this. 2) The Bio::Root modules are probably the true core modules and are the most > stable with regards to changes, so those could be moved to something like > BioPerl-Core. Beyond that, what are the proposed splits? (we've discussed > this on-list before, but it's appropriate to bring this up again) > I believe the last version of the proposed splits is here: http://www.bioperl.org/wiki/Proposed_BioPerl_changes Looking at it now, it seems like the "pieces" are a little too big. For example, I think maybe Bio::Tools::* should be separate from core. > 3) How do we want to handle versioning? We can't (and probably shouldn't) > release everything on a synchronized versioning scheme (via > Bio::Root::Version, for instance), that'll quickly fall apart. Personally I > can foresee each split-off dist having it's own version, with the BioPerl > network of modules being in effect it's own mini-CPAN. > I think each sub-dist should have its own version. > 5) Related to versioning, in my opinion we should maybe aim on eventually > calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme. > Lincoln has already done something like this with Bio::Graphics, which was > originally part of BioPerl but split off prior to v 1.6.0. > Yep. > 6) In some cases I can see particularly thorny problems, such as circular > dependencies. I can think of a few ways to address that (creating a simple > lightweight Bio::Species class as a fallback if Bio::Tree code isn't > present, for instance), but any additional thoughts on this would be > helpful. > One way to look into this would be to use Class::Inspector to check dependencies. Or see Rob's find_mod_deps.pl in /maintenance. But more to the point, yes, once the circularities are identified, there should be graceful, well-defined behavior that breaks the circularity requirement. > 7) Do we want to set up something like 'git submodule' for the devs to pull > down all BioPerl-relevant code? 
I don't know enough about the implications of this to comment.

> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
> > I look forward to collaborating with many of you on the reorganization (hint
> > hint)!

Yes! Definitely don't hesitate to call for help on the list. Once you get going and have some specific tasks laid out, that'll make it easier for people to jump in on them.

Dave

From sheena.scroggins at gmail.com Thu Apr 28 15:53:49 2011
From: sheena.scroggins at gmail.com (Sheena Scroggins)
Date: Thu, 28 Apr 2011 12:53:49 -0700
Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project
In-Reply-To:
References:
Message-ID:

Chris,

We haven't talked much about the versioning yet, but it will be on the list to figure out asap.

So far, the plan is to split out Bio::Root first, followed by a couple modules that depend only on Bio::Root. The plan I proposed was Bio::Das, Bio::Event, then Bio::Location. Depending on how much time is remaining for the GSoC project, the next to split out would be Bio::Factory and Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan to still help with the reorganization after the internship is over, but I obviously have to have a stopping point for the GSoC project.

Rob provided me with a really nice script to list dependencies of the modules, so I plan to make a roadmap towards the end of the summer that will help guide the rest of the reorganization. At that point, we'll have to deal with the circular dependencies carefully.

This is a huge project, much bigger than I can do in one summer. But I plan to get it started in a way that makes it easy for others to contribute.

Sheena

On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields wrote:

> Sheena,
>
> Congrats on being accepted! We've talked about doing this over the years,
> but it's not an easy task and it needs a dedicated project to get the ball
> rolling, so to speak. Hopefully this isn't tl;dr.
I'll start off with a > few of my questions/thoughts (Rob could probably chime in as well, but I > think his general thoughts on the project parallel mine): > > 1) The current BioPerl CPAN could just be a simple install script, acting > like a 'Task' or 'Bundle' module, installing the actual Bio-specific > distributions. Doing it this way would allow you to iteratively split off > additional code but retain the original Task/Bundle-based approach to > installation. For instance, the first pass could split out Root, then have > a dependency-light and 'extras' distribution, 2nd round split further based > on function, and so on: > > 1st round (v 1.9) : BioPerl (just an installer) -> installs root, > min-deps, extra-deps > 2nd round (v 1.901) : BioPerl (just an installer) -> root, seq/feature, > other-min-deps, extra-deps > ... > Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, seq, > tree, align, coord, map, everything-else > ... > > Also, one could potentially install modules in various ways: interactively, > in predetermined groups, using a user-defined list, etc (one could > effectively create custom BioPerl installs for GBrowse or other tools for > instance). Of course I would only pick the easiest route to start, but > maybe that gives some ideas. Regardless, if the dependency tree is set up > correctly any reliance on other Bio* modules would be defined in the various > Build.PL/Makefile.PL and then installed via CPAN (as is any dependency). > > 2) The Bio::Root modules are probably the true core modules and are the > most stable with regards to changes, so those could be moved to something > like BioPerl-Core. Beyond that, what are the proposed splits? (we've > discussed this on-list before, but it's appropriate to bring this up again) > > 3) How do we want to handle versioning? We can't (and probably shouldn't) > release everything on a synchronized versioning scheme (via > Bio::Root::Version, for instance), that'll quickly fall apart. 
Personally I > can foresee each split-off dist having it's own version, with the BioPerl > network of modules being in effect it's own mini-CPAN. > > 5) Related to versioning, in my opinion we should maybe aim on eventually > calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme. > Lincoln has already done something like this with Bio::Graphics, which was > originally part of BioPerl but split off prior to v 1.6.0. > > 6) In some cases I can see particularly thorny problems, such as circular > dependencies. I can think of a few ways to address that (creating a simple > lightweight Bio::Species class as a fallback if Bio::Tree code isn't > present, for instance), but any additional thoughts on this would be > helpful. > > 7) Do we want to set up something like 'git submodule' for the devs to pull > down all BioPerl-relevant code? > > Other thoughts? > > chris > > On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote: > > > Hey everyone, > > > > I wanted to take a minute to introduce myself as one of the Google Summer > of > > Code interns. I was the lucky one chosen to work on the BioPerl > > Reorganization (*crowd cheers*). I am a grad student in bioinformatics, > and > > somewhat new to this level of programming so bear with me as I learn the > > technical jargon. Luckily I have both Rob and Chris to mentor me this > > summer! > > > > Reading through the mailing list archives, I see there have been many > > discussion and differing opinions about tackling this project. Given the > > time frame for GSoC and my limited experience, there is no way I will > > complete this project on my own but I will at least be able to start it, > > which will hopefully motivate others to pitch in. So far, the plan for > the > > GSoC project is to start by breaking out Bio::Root, followed by a couple > > other modules based on their dependencies and the time allowed. Each will > be > > published to CPAN independently. 
You can follow the project (once it > starts) > > on github at https://github.com/sheenams. > > > > I look forward to collaborating with many of you on the reorganization > (hint > > hint)! > > > > Sheena > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Apr 28 17:04:51 2011 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 28 Apr 2011 16:04:51 -0500 Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project In-Reply-To: References: Message-ID: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5@illinois.edu> Sounds fine; I think (as you indicate) we can deal with issues along the way. Rob, anything to add? chris On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote: > Chris, > > We haven't talked much about the versioning yet, but it will be on the list to figure out asap. > > So far, the plan is to split out Bio::Root first, followed by a couple modules that depend only on Bio::Root. The plan I proposed was Bio::Das, Bio::Event then Bio::Location. Depending on how much time is remaining for the GSoC project, the next to split out would be Bio::Factory and Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan to still help with the reorganization after the internship is over, but I obviously have to have a stopping point for the GSoC project. > > Rob provide me with a really nice scrip to list dependencies of the modules, so I plan to make a roadmap towards to end of the summer that will help guide the rest of the reorganization. At that point, we'll have to deal with the circular dependencies carefully. > > This is a huge project, much bigger than I can do in one summer. But I plan to get it started in a way that makes it easy for others to contribute. > > Sheena > > > On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields wrote: > Sheena, > > Congrats on being accepted! 
We've talked about doing this over the years, but it's not an easy task and it needs a dedicated project to get the ball rolling, so to speak. Hopefully this isn't tl;dr. I'll start off with a few of my questions/thoughts (Rob could probably chime in as well, but I think his general thoughts on the project parallel mine): > > 1) The current BioPerl CPAN could just be a simple install script, acting like a 'Task' or 'Bundle' module, installing the actual Bio-specific distributions. Doing it this way would allow you to iteratively split off additional code but retain the original Task/Bundle-based approach to installation. For instance, the first pass could split out Root, then have a dependency-light and 'extras' distribution, 2nd round split further based on function, and so on: > > 1st round (v 1.9) : BioPerl (just an installer) -> installs root, min-deps, extra-deps > 2nd round (v 1.901) : BioPerl (just an installer) -> root, seq/feature, other-min-deps, extra-deps > ... > Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, seq, tree, align, coord, map, everything-else > ... > > Also, one could potentially install modules in various ways: interactively, in predetermined groups, using a user-defined list, etc (one could effectively create custom BioPerl installs for GBrowse or other tools for instance). Of course I would only pick the easiest route to start, but maybe that gives some ideas. Regardless, if the dependency tree is set up correctly any reliance on other Bio* modules would be defined in the various Build.PL/Makefile.PL and then installed via CPAN (as is any dependency). > > 2) The Bio::Root modules are probably the true core modules and are the most stable with regards to changes, so those could be moved to something like BioPerl-Core. Beyond that, what are the proposed splits? (we've discussed this on-list before, but it's appropriate to bring this up again) > > 3) How do we want to handle versioning? 
We can't (and probably shouldn't) release everything on a synchronized versioning scheme (via Bio::Root::Version, for instance), that'll quickly fall apart. Personally I can foresee each split-off dist having it's own version, with the BioPerl network of modules being in effect it's own mini-CPAN. > > 5) Related to versioning, in my opinion we should maybe aim on eventually calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme. Lincoln has already done something like this with Bio::Graphics, which was originally part of BioPerl but split off prior to v 1.6.0. > > 6) In some cases I can see particularly thorny problems, such as circular dependencies. I can think of a few ways to address that (creating a simple lightweight Bio::Species class as a fallback if Bio::Tree code isn't present, for instance), but any additional thoughts on this would be helpful. > > 7) Do we want to set up something like 'git submodule' for the devs to pull down all BioPerl-relevant code? > > Other thoughts? > > chris > > On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote: > > > Hey everyone, > > > > I wanted to take a minute to introduce myself as one of the Google Summer of > > Code interns. I was the lucky one chosen to work on the BioPerl > > Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and > > somewhat new to this level of programming so bear with me as I learn the > > technical jargon. Luckily I have both Rob and Chris to mentor me this > > summer! > > > > Reading through the mailing list archives, I see there have been many > > discussion and differing opinions about tackling this project. Given the > > time frame for GSoC and my limited experience, there is no way I will > > complete this project on my own but I will at least be able to start it, > > which will hopefully motivate others to pitch in. 
So far, the plan for the
> > GSoC project is to start by breaking out Bio::Root, followed by a couple
> > other modules based on their dependencies and the time allowed. Each will be
> > published to CPAN independently. You can follow the project (once it starts)
> > on github at https://github.com/sheenams.
> >
> > I look forward to collaborating with many of you on the reorganization (hint
> > hint)!
> >
> > Sheena
>
>

From rmb32 at cornell.edu Thu Apr 28 19:19:51 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 28 Apr 2011 16:19:51 -0700
Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project
In-Reply-To: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5@illinois.edu>
References: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5@illinois.edu>
Message-ID: <4DB9F617.6070705@cornell.edu>

I think you guys are on the right track, here are some slightly more detailed plans. I'll use Chris's subject numbering.

1,2,3,5.) I envision the splitting algorithm going like this:

    no strict; # this is pseudocode!

    my $split_count = 0;
    for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) {

       - take $subsystem modules and tests out of bioperl-live

       (my $new_dist_name = $subsystem) =~ s/::/-/g;
       - extract $subsystem modules into new dist called
         $new_dist_name.  Make sure all its tests pass, and write
         some more tests if necessary.

       - add dep on $subsystem to bioperl-live/Build.PL

       - push $new_dist_name and bioperl-live to CPAN.
         $new_dist_name has version '2.000', and bioperl-live has
         version "1.7.$split_count".
    }

    and then, at the end of this loop, bioperl-live will be
    nothing but a Build.PL and a couple of other things
    for backcompat, like Bio::Root::Version, Bio::Perl, etc.

Important things to notice about this algorithm are that, at each step in the loop:

  a.) For users that install bioperl with CPAN, doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will get you the same set of modules as before the split started, with the split-off modules at 2.000 versions, and the non-split-off ones at 1.7.x versions.

  b.) For users (not developers) that are git cloning bioperl-live, even though they are naughty (wink), they can do 'perl Build.PL; ./Build installdeps' to get the split-off modules, downloaded like any other CPAN dependency. There may be some lag before the split-off thing is downloadable from CPAN.

  c.) For BioPerl developers, unless they are working on a certain module, they should install the split-off modules from CPAN like everybody else, and git clone only the piece they are working on.

  d.) The version of bioperl-live keeps increasing by 0.001 with each split. The systems that are split off have a 2.x version number, each slightly different depending on when it was split off. After this point, their release schedules and version numbers are independent of each other and of bioperl-live. For Bio::Perl and Bio::Root::Version, the things that stay in bioperl-live, installing the latest version will get you all the split-off modules.

6.) (thorny circular dependencies and stuff) Those will become quickly apparent as this process proceeds. They'll take some finesse and/or ruthlessness and/or hacking to get around. We'll burn those bridges as we come to them.

7.) (git submodules) Git submodules probably won't be necessary, since at each step in the process BioPerl devs can use ./Build installdeps or cpanm --installdeps . to install whatever the dependencies are for the piece they are working on, whether it's bioperl-live (in the case of a module that has not yet been split off), or one of the distributions that has already been split off (in which case their improvements will probably be releasable to CPAN immediately!).

Lots of detail there. I tried to make it structured and easy to skim though. Thoughts?
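
[Editorial sketch, not part of the original message.] The naming and versioning bookkeeping in the splitting loop can be tried out in plain Perl. This is only a minimal illustration under stated assumptions: the helper name plan_splits is hypothetical, the subsystem list is cut down to the three names given in the pseudocode, and the counter is incremented once per split (the pseudocode never shows where $split_count changes). It does none of the actual extraction, testing, or CPAN work.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: given subsystems in split order, return one record
# per split with the new distribution name, its starting version, and the
# bioperl-live version released alongside it.
sub plan_splits {
    my @names = @_;
    my @steps;
    my $split_count = 0;
    for my $subsystem (@names) {
        $split_count++;    # assumption: one increment per split, starting at 1
        # Same substitution as in the pseudocode: Bio::Root becomes Bio-Root.
        (my $new_dist_name = $subsystem) =~ s/::/-/g;
        push @steps, {
            dist         => $new_dist_name,
            dist_version => '2.000',
            live_version => "1.7.$split_count",
        };
    }
    return @steps;
}

# Only these three names appear in the pseudocode; the rest is elided there.
for my $step (plan_splits(qw( Bio::Root Bio::Das Bio::Event ))) {
    printf "%-10s %s  (bioperl-live %s)\n",
        $step->{dist}, $step->{dist_version}, $step->{live_version};
}
```

Printing the plan before touching any repository makes it easy to check that every distribution name and bioperl-live version is what the release manager expects.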
Rob On 04/28/2011 02:04 PM, Chris Fields wrote: > Sounds fine; I think (as you indicate) we can deal with issues along the way. Rob, anything to add? > > chris > > On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote: > >> Chris, >> >> We haven't talked much about the versioning yet, but it will be on the list to figure out asap. >> >> So far, the plan is to split out Bio::Root first, followed by a couple modules that depend only on Bio::Root. The plan I proposed was Bio::Das, Bio::Event then Bio::Location. Depending on how much time is remaining for the GSoC project, the next to split out would be Bio::Factory and Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan to still help with the reorganization after the internship is over, but I obviously have to have a stopping point for the GSoC project. >> >> Rob provide me with a really nice scrip to list dependencies of the modules, so I plan to make a roadmap towards to end of the summer that will help guide the rest of the reorganization. At that point, we'll have to deal with the circular dependencies carefully. >> >> This is a huge project, much bigger than I can do in one summer. But I plan to get it started in a way that makes it easy for others to contribute. >> >> Sheena >> >> >> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields wrote: >> Sheena, >> >> Congrats on being accepted! We've talked about doing this over the years, but it's not an easy task and it needs a dedicated project to get the ball rolling, so to speak. Hopefully this isn't tl;dr. I'll start off with a few of my questions/thoughts (Rob could probably chime in as well, but I think his general thoughts on the project parallel mine): >> >> 1) The current BioPerl CPAN could just be a simple install script, acting like a 'Task' or 'Bundle' module, installing the actual Bio-specific distributions. 
Doing it this way would allow you to iteratively split off additional code but retain the original Task/Bundle-based approach to installation. For instance, the first pass could split out Root, then have a dependency-light and 'extras' distribution, 2nd round split further based on function, and so on: >> >> 1st round (v 1.9) : BioPerl (just an installer) -> installs root, min-deps, extra-deps >> 2nd round (v 1.901) : BioPerl (just an installer) -> root, seq/feature, other-min-deps, extra-deps >> ... >> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, seq, tree, align, coord, map, everything-else >> ... >> >> Also, one could potentially install modules in various ways: interactively, in predetermined groups, using a user-defined list, etc (one could effectively create custom BioPerl installs for GBrowse or other tools for instance). Of course I would only pick the easiest route to start, but maybe that gives some ideas. Regardless, if the dependency tree is set up correctly any reliance on other Bio* modules would be defined in the various Build.PL/Makefile.PL and then installed via CPAN (as is any dependency). >> >> 2) The Bio::Root modules are probably the true core modules and are the most stable with regards to changes, so those could be moved to something like BioPerl-Core. Beyond that, what are the proposed splits? (we've discussed this on-list before, but it's appropriate to bring this up again) >> >> 3) How do we want to handle versioning? We can't (and probably shouldn't) release everything on a synchronized versioning scheme (via Bio::Root::Version, for instance), that'll quickly fall apart. Personally I can foresee each split-off dist having it's own version, with the BioPerl network of modules being in effect it's own mini-CPAN. >> >> 5) Related to versioning, in my opinion we should maybe aim on eventually calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme. 
Lincoln has already done something like this with Bio::Graphics, which was originally part of BioPerl but split off prior to v 1.6.0. >> >> 6) In some cases I can see particularly thorny problems, such as circular dependencies. I can think of a few ways to address that (creating a simple lightweight Bio::Species class as a fallback if Bio::Tree code isn't present, for instance), but any additional thoughts on this would be helpful. >> >> 7) Do we want to set up something like 'git submodule' for the devs to pull down all BioPerl-relevant code? >> >> Other thoughts? >> >> chris >> >> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote: >> >>> Hey everyone, >>> >>> I wanted to take a minute to introduce myself as one of the Google Summer of >>> Code interns. I was the lucky one chosen to work on the BioPerl >>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and >>> somewhat new to this level of programming so bear with me as I learn the >>> technical jargon. Luckily I have both Rob and Chris to mentor me this >>> summer! >>> >>> Reading through the mailing list archives, I see there have been many >>> discussion and differing opinions about tackling this project. Given the >>> time frame for GSoC and my limited experience, there is no way I will >>> complete this project on my own but I will at least be able to start it, >>> which will hopefully motivate others to pitch in. So far, the plan for the >>> GSoC project is to start by breaking out Bio::Root, followed by a couple >>> other modules based on their dependencies and the time allowed. Each will be >>> published to CPAN independently. You can follow the project (once it starts) >>> on github at https://github.com/sheenams. >>> >>> I look forward to collaborating with many of you on the reorganization (hint >>> hint)! 
>>>
>>> Sheena
>>
>>
>
>

From sidd.basu at gmail.com Thu Apr 28 22:15:01 2011
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 28 Apr 2011 21:15:01 -0500
Subject: [Bioperl-l] Re: GSoC/BioPerl Reorganization Project
In-Reply-To: <4DB9F617.6070705@cornell.edu>
References: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5@illinois.edu> <4DB9F617.6070705@cornell.edu>
Message-ID: <20110429021457.GA351@Macintosh-235.local>

Hi Robert,

At what point in the flow will the dependencies between the split modules be added? Is there any particular order in which the split modules will be created? And how will those split-off modules be released on CPAN: one by one as they are generated, or all of them in a batch, after which they will follow their own release schedules?

-siddhartha

On Thu, 28 Apr 2011, Robert Buels wrote:

> I think you guys are on the right track, here are some slightly more
> detailed plans. I'll use Chris's subject numbering.
>
> 1,2,3,5.) I envision the splitting algorithm going like this:
>
>     no strict; # this is pseudocode!
>
>     my $split_count = 0;
>     for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) {
>
>        - take $subsystem modules and tests out of bioperl-live
>
>        (my $new_dist_name = $subsystem) =~ s/::/-/g;
>        - extract $subsystem modules into new dist called
>          $new_dist_name. Make sure all its tests pass, and write
>          some more tests if necessary.
>
>        - add dep on $subsystem to bioperl-live/Build.PL
>
>        - push $new_dist_name and bioperl-live to CPAN.
>          $new_dist_name has version '2.000', and bioperl-live has
>          version "1.7.$split_count".
>     }
>
>     and then, at the end of this loop, bioperl-live will be
>     nothing but a Build.PL and a couple of other things
>     for backcompat, like Bio::Root::Version, Bio::Perl, etc.
> > Important things to notice about this algorithm are that, at each > step in the loop: > > a.) For users that install bioperl with CPAN, > doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will > get you the same set of modules as before the split > started, with the split-off modules at 2.000 versions, and > the non-split-off ones at 1.7.x versions. > > b.) For users (not developers) that are git cloning > bioperl-live, even though they are naughty (wink), they > can do 'perl Build.PL; ./Build installdeps' to get the > split-off modules, downloaded like any other CPAN > dependency. There may be some lag before the split-off > thing is downloadable from CPAN, > > c.) For BioPerl developers, unless they are working on a > certain module, they should install the split-off modules > from CPAN like everybody else, and git clone only the piece > they are working on. > > d.) The version of bioperl-live keeps increasing by 0.001 with > each split. The systems that are split off have a 2.x > version number, each slightly different depending on when it > was split off. After this point, their release schedules > and version numbers are independent of eachother and of > bioperl-live. For Bio::Perl and Bio::Root::Version, the > things that stay in bioperl-live, installing the latest > version will get you all the split-off modules. > > > 6.) (thorny circular dependencies and stuff) Those will become quickly > apparent as this process proceeds. They'll take some finesse and/or > ruthlessness and/or hacking to get around. We'll burn those bridges as we > come to them. > > 7.) (git submodules) Git submodules probably won't be necessary, since at > each step in the process BioPerl devs can use ./Build installdeps or cpanm > --installdeps . 
to install whatever the dependencies are for the piece > they are working on, whether it's bioperl-live (in the case of a module > that has not yet been split off), or one of the distributions that has > already been split off (in which case their improvements will probably be > releasable to CPAN immediately!). > > Lots of detail there. I tried to make it structured and easy to skim > though. Thoughts? > > Rob > > > > On 04/28/2011 02:04 PM, Chris Fields wrote: > > Sounds fine; I think (as you indicate) we can deal with issues along the > > way. Rob, anything to add? > > > > chris > > > > On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote: > > > >> Chris, > >> > >> We haven't talked much about the versioning yet, but it will be on the > >> list to figure out asap. > >> > >> So far, the plan is to split out Bio::Root first, followed by a couple > >> modules that depend only on Bio::Root. The plan I proposed was Bio::Das, > >> Bio::Event then Bio::Location. Depending on how much time is remaining > >> for the GSoC project, the next to split out would be Bio::Factory and > >> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I > >> plan to still help with the reorganization after the internship is over, > >> but I obviously have to have a stopping point for the GSoC project. > >> > >> Rob provide me with a really nice scrip to list dependencies of the > >> modules, so I plan to make a roadmap towards to end of the summer that > >> will help guide the rest of the reorganization. At that point, we'll have > >> to deal with the circular dependencies carefully. > >> > >> This is a huge project, much bigger than I can do in one summer. But I > >> plan to get it started in a way that makes it easy for others to > >> contribute. > >> > >> Sheena > >> > >> > >> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields > >> wrote: > >> Sheena, > >> > >> Congrats on being accepted! 
We've talked about doing this over the years, > >> but it's not an easy task and it needs a dedicated project to get the > >> ball rolling, so to speak. Hopefully this isn't tl;dr. I'll start off > >> with a few of my questions/thoughts (Rob could probably chime in as well, > >> but I think his general thoughts on the project parallel mine): > >> > >> 1) The current BioPerl CPAN could just be a simple install script, acting > >> like a 'Task' or 'Bundle' module, installing the actual Bio-specific > >> distributions. Doing it this way would allow you to iteratively split > >> off additional code but retain the original Task/Bundle-based approach to > >> installation. For instance, the first pass could split out Root, then > >> have a dependency-light and 'extras' distribution, 2nd round split > >> further based on function, and so on: > >> > >> 1st round (v 1.9) : BioPerl (just an installer) -> installs root, > >> min-deps, extra-deps > >> 2nd round (v 1.901) : BioPerl (just an installer) -> root, > >> seq/feature, other-min-deps, extra-deps > >> ... > >> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, > >> seq, tree, align, coord, map, everything-else > >> ... > >> > >> Also, one could potentially install modules in various ways: > >> interactively, in predetermined groups, using a user-defined list, etc > >> (one could effectively create custom BioPerl installs for GBrowse or > >> other tools for instance). Of course I would only pick the easiest route > >> to start, but maybe that gives some ideas. Regardless, if the dependency > >> tree is set up correctly any reliance on other Bio* modules would be > >> defined in the various Build.PL/Makefile.PL and then installed via CPAN > >> (as is any dependency). > >> > >> 2) The Bio::Root modules are probably the true core modules and are the > >> most stable with regards to changes, so those could be moved to something > >> like BioPerl-Core. Beyond that, what are the proposed splits? 
(we've > >> discussed this on-list before, but it's appropriate to bring this up > >> again) > >> > >> 3) How do we want to handle versioning? We can't (and probably > >> shouldn't) release everything on a synchronized versioning scheme (via > >> Bio::Root::Version, for instance), that'll quickly fall apart. > >> Personally I can foresee each split-off dist having it's own version, > >> with the BioPerl network of modules being in effect it's own mini-CPAN. > >> > >> 5) Related to versioning, in my opinion we should maybe aim on eventually > >> calling this BioPerl v2.0 and starting with a simpler X.Y versioning > >> scheme. Lincoln has already done something like this with Bio::Graphics, > >> which was originally part of BioPerl but split off prior to v 1.6.0. > >> > >> 6) In some cases I can see particularly thorny problems, such as circular > >> dependencies. I can think of a few ways to address that (creating a > >> simple lightweight Bio::Species class as a fallback if Bio::Tree code > >> isn't present, for instance), but any additional thoughts on this would > >> be helpful. > >> > >> 7) Do we want to set up something like 'git submodule' for the devs to > >> pull down all BioPerl-relevant code? > >> > >> Other thoughts? > >> > >> chris > >> > >> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote: > >> > >>> Hey everyone, > >>> > >>> I wanted to take a minute to introduce myself as one of the Google > >>> Summer of > >>> Code interns. I was the lucky one chosen to work on the BioPerl > >>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics, > >>> and > >>> somewhat new to this level of programming so bear with me as I learn the > >>> technical jargon. Luckily I have both Rob and Chris to mentor me this > >>> summer! > >>> > >>> Reading through the mailing list archives, I see there have been many > >>> discussion and differing opinions about tackling this project. 
Given the > >>> time frame for GSoC and my limited experience, there is no way I will > >>> complete this project on my own but I will at least be able to start it, > >>> which will hopefully motivate others to pitch in. So far, the plan for > >>> the > >>> GSoC project is to start by breaking out Bio::Root, followed by a couple > >>> other modules based on their dependencies and the time allowed. Each > >>> will be > >>> published to CPAN independently. You can follow the project (once it > >>> starts) > >>> on github at https://github.com/sheenams. > >>> > >>> I look forward to collaborating with many of you on the reorganization > >>> (hint > >>> hint)! > >>> > >>> Sheena > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bioinfo.khush at gmail.com Fri Apr 29 00:53:50 2011 From: bioinfo.khush at gmail.com (khush ........) Date: Fri, 29 Apr 2011 10:23:50 +0530 Subject: [Bioperl-l] Standalone blast In-Reply-To: References: Message-ID: Dear Dave, Thank you for your support. If need to change the following lines like $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program => 'blastx', -database => 'nr.fa')); $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>"file.fa"); I have a simple and basic query for you, as I am beginners in bioperl, that if I need to download the whole nr database from NCBI to run the code or It will directly fetch information from the NCBI website. I do not understand it, because downloading the whole nr d/b itself takes long time for me. How could I read whole file instead of simple string "TTTATAGATAGAGACAG" in -seq (a fasta file). Is there a simple way to do the exercise according to my conditions. 
Thank you Kamal On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina wrote: > Hi Kamal, > > This is covered in the beginners' HOWTO: > http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST > > > Dave > > > On Thu, Apr 28, 2011 at 07:22, khush ........ wrote: > >> Hi, >> >> I have some sequences ~250 and wanted to use BLASTX to blast against nr >> database of NCBI, as this is time consuming using web based search. Can >> some >> one please tell me how to start BIOPERL with scuh problems. I know that >> this >> is possible with bioperl, but do not know how. >> >> Any suggestion will be appreciable. >> >> Thanks in advance >> Kamal >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From rmb32 at cornell.edu Fri Apr 29 01:15:01 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 28 Apr 2011 22:15:01 -0700 Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project In-Reply-To: <20110429021457.GA351@Macintosh-235.local> References: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5@illinois.edu> <4DB9F617.6070705@cornell.edu> <20110429021457.GA351@Macintosh-235.local> Message-ID: <4DBA4955.2030003@cornell.edu> On 04/28/2011 07:15 PM, Siddhartha Basu wrote: > At what point in flow the dependencies between the split modules will be > added. Is there any particular order the split modules would be created. Dependencies are added and characterized at the time each distribution is created. That's why the splitting order starts at Bio::Root, so that you can proceed up the hierarchy of dependencies without having to modify the dependency lists of the distributions that have already been extracted. > And how those split off modules will be released in CPAN, one by one as > they being generated or all of them in a batch after which they will > follow their release schedule. One by one, as they are generated. 
I think it would be a good idea to re-release bioperl-live with each split as well. This will probably lead to bioperl-live being released nearly every week as the split is ongoing. As a consequence, the master branch of bioperl-live will need to be kept in very good shape. This is easy if you just follow good practice: develop in branches, run *all* the tests before committing, go on IRC and send pull requests for code review, etc.

Rob

From florent.angly at gmail.com Fri Apr 29 01:24:45 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 29 Apr 2011 15:24:45 +1000
Subject: [Bioperl-l] Standalone blast
In-Reply-To:
References:
Message-ID: <4DBA4B9D.1010400@gmail.com>

Hi Kamal,

To run BLAST the way Dave described, you need to have BLAST installed on your computer, and you need to download BLAST databases to your computer (or make them yourself with the formatdb command). There are plenty of databases available on the NCBI FTP website: ftp://ftp.ncbi.nih.gov/. And yes, some of these databases are very large and will take a long time to download. By the way, the BLAST may also take a very long time to execute if you use large databases, so, you'd better run the analysis on a powerful computer or a server.

Also read this documentation:
http://search.cpan.org/~cjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm

It stipulates that you can BLAST an entire FASTA file (not just a sequence object):

  $inputfilename = 't/testquery.fa';
  $blast_report = $factory->blastall($inputfilename);

Regards,

Florent

On 29/04/11 14:53, khush ........ wrote:
> Dear Dave,
>
> Thank you for your support.
> > If need to change the following lines like > > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program => 'blastx', > -database => 'nr.fa')); > > $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>"file.fa"); > > I have a simple and basic query for you, as I am beginners in bioperl, that > if I need to download the whole nr database from NCBI to run the code or It > will directly fetch information from the NCBI website. I do not understand > it, because downloading the whole nr d/b itself takes long time for me. > > How could I read whole file instead of simple string "TTTATAGATAGAGACAG" in > -seq (a fasta file). Is there a simple way to do the exercise according to > my conditions. > > Thank you > Kamal > > > On Thu, Apr 28, 2011 at 12:59 PM, Dave Messinawrote: > >> Hi Kamal, >> >> This is covered in the beginners' HOWTO: >> http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST >> >> >> Dave >> >> >> On Thu, Apr 28, 2011 at 07:22, khush ........wrote: >> >>> Hi, >>> >>> I have some sequences ~250 and wanted to use BLASTX to blast against nr >>> database of NCBI, as this is time consuming using web based search. Can >>> some >>> one please tell me how to start BIOPERL with scuh problems. I know that >>> this >>> is possible with bioperl, but do not know how. >>> >>> Any suggestion will be appreciable. >>> >>> Thanks in advance >>> Kamal >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bioinfo.khush at gmail.com Fri Apr 29 01:46:38 2011 From: bioinfo.khush at gmail.com (khush ........) 
Date: Fri, 29 Apr 2011 11:16:38 +0530 Subject: [Bioperl-l] Standalone blast In-Reply-To: <4DBA4B9D.1010400@gmail.com> References: <4DBA4B9D.1010400@gmail.com> Message-ID: Dear Florent, Thank you very much for your kind reply and let me clear the concept of running the blast. I am working with simple machine so I need to take permission from my administrator to work on some good server to have whole nr database from NCBI and run the blastx. Thank you Kamal Bioperl is great. On Fri, Apr 29, 2011 at 10:54 AM, Florent Angly wrote: > Hi Kamal, > > To run BLAST the way Dave described, you need to have BLAST installed on > your computer, and you need to download BLAST databases to your computer (or > make them yourself with the formatdb command). There are plenty of databases > available on the NCBI FTP website: ftp://ftp.ncbi.nih.gov/. And yes, some > of these databases are very large and will take a long time to download. By > the way, the BLAST may also take a very long time to execute if you use > large databases, so, you'd better run the analysis on a powerful computer or > a server. > > Also read this documentation: > http://search.cpan.org/~cjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm< > http://search.cpan.org/%7Ecjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm > > > It stipulates that you can BLAST an entire FASTA file (not just a sequence > object): > > $inputfilename = 't/testquery.fa'; > $blast_report = $factory->blastall($inputfilename); > > > Regards, > > Florent > > > > > > On 29/04/11 14:53, khush ........ wrote: > >> Dear Dave, >> >> Thank you for your support. 
>> >> If need to change the following lines like >> >> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program => 'blastx', >> -database => 'nr.fa')); >> >> $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>"file.fa"); >> >> I have a simple and basic query for you, as I am beginners in bioperl, >> that >> if I need to download the whole nr database from NCBI to run the code or >> It >> will directly fetch information from the NCBI website. I do not understand >> it, because downloading the whole nr d/b itself takes long time for me. >> >> How could I read whole file instead of simple string "TTTATAGATAGAGACAG" >> in >> -seq (a fasta file). Is there a simple way to do the exercise according to >> my conditions. >> >> Thank you >> Kamal >> >> >> On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina> >wrote: >> >> Hi Kamal, >>> >>> This is covered in the beginners' HOWTO: >>> http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST >>> >>> >>> Dave >>> >>> >>> On Thu, Apr 28, 2011 at 07:22, khush ........>> >wrote: >>> >>> Hi, >>>> >>>> I have some sequences ~250 and wanted to use BLASTX to blast against nr >>>> database of NCBI, as this is time consuming using web based search. Can >>>> some >>>> one please tell me how to start BIOPERL with scuh problems. I know that >>>> this >>>> is possible with bioperl, but do not know how. >>>> >>>> Any suggestion will be appreciable. 
>>>> >>>> Thanks in advance >>>> Kamal >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From cjfields at illinois.edu Fri Apr 29 01:49:31 2011 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 29 Apr 2011 00:49:31 -0500 Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project In-Reply-To: <4DBA4955.2030003@cornell.edu> References: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5@illinois.edu> <4DB9F617.6070705@cornell.edu> <20110429021457.GA351@Macintosh-235.local> <4DBA4955.2030003@cornell.edu> Message-ID: On Apr 29, 2011, at 12:15 AM, Robert Buels wrote: > On 04/28/2011 07:15 PM, Siddhartha Basu wrote: >> At what point in flow the dependencies between the split modules will be >> added. Is there any particular order the split modules would be created. > > Dependencies are added and characterized at the time each distribution is created. That's why the splitting order starts at Bio::Root, so that you can proceed up the hierarchy of dependencies without having to modify the dependency lists of the distributions that have already been extracted. Yes (or, +1). Those particular split-off bits can probably be called v. 2.0 for all intents and purposes, with whatever is left iteratively converging on 2.0. >> And how those split off modules will be released in CPAN, one by one as >> they being generated or all of them in a batch after which they will >> follow their release schedule. > > One by one, as they are generated. I think it would be a good idea to re-release bioperl-live with each split as well. This will probably lead to bioperl-live being released nearly every week as the split is ongoing. 
As a consequence, the master branch of bioperl-live will need to be kept in very good shape. This is easy if you just follow good practice: develop in branches, run *all* the tests before committing, go on IRC and send pull requests for code review, etc. > > Rob As code will be pulled out along the way, might be best to ensure branches are up-to-date prior to a merge. One additional question: how are we dealing with commit history? I don't think there is an easy way of carrying that over to a brand-new repo... Not that it's a problem, but something to think about. chris From David.Messina at sbc.su.se Fri Apr 29 03:16:22 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 29 Apr 2011 09:16:22 +0200 Subject: [Bioperl-l] GSoC/BioPerl Reorganization Project In-Reply-To: References: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5@illinois.edu> <4DB9F617.6070705@cornell.edu> <20110429021457.GA351@Macintosh-235.local> <4DBA4955.2030003@cornell.edu> Message-ID: > > One additional question: how are we dealing with commit history? I don't > think there is an easy way of carrying that over to a brand-new repo... > > Not that it's a problem, but something to think about. I believe git filter-branch can be used: "filter-branch is commonly used on a clone of the repo to split a too-large repo into smaller ones." https://github.com/matthewmccullough/git-workshop/raw/master/workbook/htmls/27-Filter-Branch.html Dave From dan.bolser at gmail.com Fri Apr 29 05:53:33 2011 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 29 Apr 2011 10:53:33 +0100 Subject: [Bioperl-l] Question about Bio::Coordinate::Pair In-Reply-To: References: Message-ID: This issue (see below) got raised again recently and, with retrospect, I think I understand better now. In the example case here: http://www.bioperl.org/wiki/Module_Discussion:Bio::Coordinate::Pair what we see is that, that part of the feature to 'map' that extends beyond the range of the '-in' sequence is (arbitrarily?) 
mapped to a 'gap' (a Bio::Coordinate::Result::Gap) on *that* sequence in the result. This gap has the seq_id and strand of *that* sequence, and that's that. i.e. The Bio::Coordinate::Pair defines the position of (part of) a contig in a scaffold, and we wish to 'map' some features that are in 'contig coordinates' onto the scaffold to get the same features in 'scaffold coordinates'. Any feature falling within the contig (or within that part of the contig that we specify when creating the Bio::Coordinate::Pair) is 'mapped' as a single 'Match' (Bio::Coordinate::Result::Match) within the result. If, however, feature (in 'contig coordinates') lies outside or across the part of the contig specified when creating the Bio::Coordinate::Pair, that (part of) the feature is 'mapped' as a single 'Gap' (Bio::Coordinate::Result::Gap) which is given in terms of the contig (has the contig seq_id and the contig strand). I'm really not sure of the semantics of this... It makes sense that, if we put half of one contig into one scaffold, that we don't want features from the other half to be mapped onto the scaffold, but I don't know why we would want those features to turn up as gaps, but there you have it. The changes I made to Bio::Coordinate::Pair are now defunct, but I'll still request that my test script gets pulled into master [1], because more tests can't hurt right? Cheers, Dan. [1] https://github.com/bioperl/bioperl-live/pull/14 On 8 October 2010 16:57, Dan Bolser wrote: > Actually, I changed the Strand of the gap in a few more places [1], > and now there are a total of 4 failed tests, however, they all appear > to be of the same form as the one reported below: > > > not ok 84 > # ? Failed test at t/Coordinate/CoordinateMapper.t line 229. > # ? ? ? ? ?got: '1' > # ? ? expected: '-1' > > not ok 93 > # ? Failed test at t/Coordinate/CoordinateMapper.t line 246. > # ? ? ? ? ?got: '1' > # ? ? expected: '-1' > > not ok 99 > # ? Failed test at t/Coordinate/CoordinateMapper.t line 262. 
> # ? ? ? ? ?got: '1' > # ? ? expected: '-1' > > not ok 102 > # ? Failed test at t/Coordinate/CoordinateMapper.t line 265. > # ? ? ? ? ?got: '1' > # ? ? expected: '-1' > > > So now you know. > > > Dan. > > [1] http://github.com/dbolser/bioperl-live/tree/dbolser_bio_coordinate_pair_tests > > On 8 October 2010 16:04, Dan Bolser wrote: >> Well... I tried to toggle the strand of the gap sublocation to match >> that of the match sublocation, and overall I ended up failing one test >> within t/Coordinate/CoordinateMapper.t, test number 55 (line 119)... >> >> This test actually tests for the strandedness of the gap sublocation >> that I'm specifically changing because it leads to the 'unexpected' >> (or at least, 'inconsistent') behaviour that I'm calling a bug (test >> 55 is at the end, with enough preceding context to reproduce it): >> >> # propepide >> my $match1 = Bio::Location::Simple->new >> ? ?(-seq_id => 'propeptide', -start => 21, -end => 40, -strand=>1 ); >> # peptide >> my $match2 = Bio::Location::Simple->new >> ? ?(-seq_id => 'peptide', -start => 1, -end => 20, -strand=>1 ); >> >> ok my $pair = Bio::Coordinate::Pair->new(-in => $match1, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -out => $match2, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -negative => 0, # false, default >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?); >> >> ... >> >> # >> # partial match = gap & match >> # >> $pos2 = Bio::Location::Simple->new >> ? ?(-start => 20, -end => 22, -strand=> -1 ); >> >> ok $res = $pair->map($pos2); >> >> ... >> >> is $res->gap->strand, -1; # TEST 55. ?Fails when I 'fix' Bio::Coordinate::Pair >> >> >> >> In the absence of any other information, can I take this to mean that >> the strand of the gap sublocations are not used for anything >> significant? >> >> >> Cheers, >> Dan. 
>> >> >> >> On 5 October 2010 11:36, Dan Bolser wrote: >>> Hi, >>> >>> Can someone describe in a bit more detail the purpose of the Gap >>> sublocations that are sometimes returned by Bio::Coordinate::Pair [1]? >>> >>> I found that, according to Bio::Location::Split, if the Match and Gap >>> sublocations have a different strand, the strand method (called via >>> Bio::Coordinate::Result) returns undef. This is inconsistent with the >>> way Bio::Coordinate::Result tends to behave. See the test script and >>> results below, also pasted here [2]. >>> >>> The question is, can I just toggle the strand of the Gap sublocation >>> to match that of the Match sublocation? Or does the strand of the Gap >>> sublocation encode some important but as yet undocumented information? >>> If the strand of the Gap and Match sublocations are made to match >>> (within Bio::Coordinate::Pair) this will simplify code that uses >>> Bio::Coordinate::Pair, making it more consistent, and perhaps help >>> with some other bugs [3]. >>> >>> >>> Cheers, >>> Dan. >>> >>> [1] http://www.bioperl.org/wiki/Module:Bio::Coordinate::Pair >>> [2] http://www.bioperl.org/wiki/Module_Discussion:Bio::Coordinate::Pair >>> [3] http://tinyurl.com/36na2cp >>> >>> >>> #!/usr/bin/perl -w >>> >>> ## Stress test Bio::Coordinate::Pair >>> >>> use strict; >>> use Data::Dumper; >>> >>> use Bio::Location::Simple; >>> use Bio::Coordinate::Pair; >>> >>> ## A contig >>> my $ctg = Bio::Location::Simple-> >>> ?new( -seq_id => 'ctg', >>> ? ? ? -start ?=> ? ?1, >>> ? ? ? -end ? ?=> 1001, >>> ? ? ? -strand => ? +1, >>> ? ? ); >>> >>> ## The contigs position on a chromosome (forward) >>> my $ctg_on_chr_f = Bio::Location::Simple-> >>> ?new( -seq_id => 'ctg on chr r', >>> ? ? ? -start ?=> ? ? ? ? ? 5001, >>> ? ? ? -end ? ?=> ? ? ? ? ? 6001, >>> ? ? ? -strand => ? ? ? ? ? ? +1, >>> ? ? ); >>> >>> ## The contigs position on a chromosome (reverse) >>> my $ctg_on_chr_r = Bio::Location::Simple-> >>> ?new( -seq_id => 'ctg on chr r', >>> ? ? 
? -start ?=> ? ? ? ? ? 5001, >>> ? ? ? -end ? ?=> ? ? ? ? ? 6001, >>> ? ? ? -strand => ? ? ? ? ? ? -1, >>> ? ? ); >>> >>> ## Coordinate mapping (forward) >>> my $agp_f = Bio::Coordinate::Pair-> >>> ?new( -in ?=> $ctg, >>> ? ? ? -out => $ctg_on_chr_f >>> ? ? ); >>> >>> ## Coordinate mapping (reverse) >>> my $agp_r = Bio::Coordinate::Pair-> >>> ?new( -in ?=> $ctg, >>> ? ? ? -out => $ctg_on_chr_r >>> ? ? ); >>> >>> >>> >>> ## A match, in contig coordinates... >>> my $match_on_ctg_4 = Bio::Location::Simple-> >>> ?new( -seq_id => 'hit 4', >>> ? ? ? -start ?=> ? ? 925, >>> ? ? ? -end ? ?=> ? ?1125, >>> ? ? ? -strand => ? ? ?-1, >>> ? ? ); >>> >>> ## Map it into chromosome coordinates (forward) >>> my $match_on_chr_4_f = >>> ?$agp_f->map( $match_on_ctg_4 ); >>> >>> print Dumper $match_on_chr_4_f, "\n"; >>> >>> ## Map it into chromosome coordinates (reverse) >>> my $match_on_chr_4_r = >>> ?$agp_r->map( $match_on_ctg_4 ); >>> >>> print Dumper $match_on_chr_4_r, "\n"; >>> >>> __END__ >>> >>> $VAR1 = bless( { >>> ? ? ? ? ? ? ? ? '_sublocations' => [ >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?bless( { >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_strand' => -1, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_seqid' => 'ctg on chr r', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_start' => 5925, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_location_type' => 'EXACT', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_end' => 6001 >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? }, >>> 'Bio::Coordinate::Result::Match' ), >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?bless( { >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_strand' => -1, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_seqid' => 'ctg', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_location_type' => 'EXACT', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_start' => 1002, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_end' => 1125 >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
}, 'Bio::Coordinate::Result::Gap' ) >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?], >>> ? ? ? ? ? ? ? ? '_gap' => $VAR1->{'_sublocations'}[1], >>> ? ? ? ? ? ? ? ? 'strand' => -1, >>> ? ? ? ? ? ? ? ? '_match' => $VAR1->{'_sublocations'}[0], >>> ? ? ? ? ? ? ? ? '_splittype' => 'JOIN' >>> ? ? ? ? ? ? ? }, 'Bio::Coordinate::Result' ); >>> $VAR2 = ' >>> '; >>> $VAR1 = bless( { >>> ? ? ? ? ? ? ? ? '_sublocations' => [ >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?bless( { >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_strand' => 1, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_seqid' => 'ctg on chr r', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_start' => 5001, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_location_type' => 'EXACT', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_end' => 5077 >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? }, >>> 'Bio::Coordinate::Result::Match' ), >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?bless( { >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_strand' => -1, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_seqid' => 'ctg', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_location_type' => 'EXACT', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_start' => 1002, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '_end' => 1125 >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? }, 'Bio::Coordinate::Result::Gap' ) >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?], >>> ? ? ? ? ? ? ? ? '_gap' => $VAR1->{'_sublocations'}[1], >>> ? ? ? ? ? ? ? ? 'strand' => 1, >>> ? ? ? ? ? ? ? ? '_match' => $VAR1->{'_sublocations'}[0], >>> ? ? ? ? ? ? ? ? '_splittype' => 'JOIN' >>> ? ? ? ? ? ? ? }, 'Bio::Coordinate::Result' ); >>> $VAR2 = ' >>> '; >>> >> > From dcampo at usc.edu Tue Apr 19 21:02:20 2011 From: dcampo at usc.edu (Daniel Campo) Date: Tue, 19 Apr 2011 18:02:20 -0700 Subject: [Bioperl-l] AlignIO Message-ID: <97343FFF-E3BA-4169-8578-97D55591862E@usc.edu> Hi, I am trying to install BioPerl in my MacOSX using CPAN. 
After running the tests I got the following:

Failed Test                          Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/AlignIO/AlignIO.t                   255 65280    28   42  8-28
t/AlignIO/arp.t                       255 65280    48   92  3-48
t/Annotation/Annotation.t             255 65280   158   83  9 116 118-158
t/ClusterIO/SequenceFamily.t          255 65280    19   34  3-19
t/LocalDB/Flat.t                      255 65280    24   20  15-24
t/LocalDB/Index.t                     255 65280    64   66  32-64
t/SeqIO/Handler.t                     255 65280   561 1120  2-561
t/SeqIO/chaos.t                         1   256     8    1  1
t/SeqIO/swiss.t                       255 65280   240  479  1-240
t/SeqTools/GuessSeqFormat.t             1   256    49    2  25 50
t/Tools/Analysis/Protein/Scansite.t   255 65280    14   20  5-14
63 tests and 305 subtests skipped.
Failed 11/329 test scripts. 981/17708 subtests failed.
Files=329, Tests=17708, 97 wallclock secs (81.97 cusr + 10.56 csys = 92.53 CPU)
Failed 11/329 test programs. 981/17708 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Warning (usually harmless): 'YAML' not installed, will not store persistent state
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 RKOBES/ExtUtils-Manifest-1.58.tar.gz         : install NO
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO

And my concern is that I need to run a script that needs the module AlignIO. But that module seems to not have passed the test.

Could you please help me on this?

Thank you very much in advance.

Daniel.

---
Daniel Campo Falgueras
Postdoctoral Research Associate
Molecular and Computational Biology
University of Southern California
1050 Childs Way, RRI.
324C
Los Angeles, CA, 90089-2910
(+1) 213-821-3976
dcampo at usc.edu
http://college.usc.edu/cf/faculty-and-staff/staff.cfm?pid=1027679

From bubli_thakur at rediffmail.com Wed Apr 20 04:38:07 2011
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: 20 Apr 2011 08:38:07 -0000
Subject: [Bioperl-l] Problem with paml+codeml
Message-ID: <20110420083807.14444.qmail@f4mail-235-129.rediffmail.com>

Hi,

I am using the BioPerl script to calculate the Ka/Ks ratio for a number of sequences using PAML. As the script runs, a tmp file is generated, and an mlc file is also generated in the tmp folder. I have checked the mlc file and it seems to be OK. But the align_output file comes out as:

SEQ1  SEQ2  Ka  Ks  Ka/Ks  PROT_PERCENTID

The output file is blank, without any Ka/Ks values. The script I am using is as follows:

-------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::Tools::Run::Alignment::Clustalw;

# for projecting alignments from protein to R/DNA space
use Bio::Align::Utilities qw(aa_to_dna_aln);

# for input of the sequence data
use Bio::SeqIO;
use Bio::AlignIO;

BEGIN { $ENV{CLUSTALDIR} = '/usr/local/bin' }
BEGIN { $ENV{PAMLDIR} = '/root/Desktop/paml44/bin' }
BEGIN { $ENV{TMPDIR} = '/tmp' }

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new;
my $seqdata = shift || 'nt.fasta';

my $seqio = new Bio::SeqIO(-file => $seqdata,
                           -format => 'fasta');
my %seqs;
my @prots;
# process each sequence
while ( my $seq = $seqio->next_seq ) {
    $seqs{$seq->display_id} = $seq;
    # translate them into protein
    my $protein = $seq->translate();
    my $pseq = $protein->seq();
    if( $pseq =~ /\*/ &&
        $pseq !~ /\*$/ ) {
        warn("provided a CDS sequence with a stop codon, PAML will choke!");
        exit(0);
    }
    # Tcoffee can't handle '*' even if it is trailing
    $pseq =~ s/\*//g;
    $protein->seq($pseq);
    push @prots, $protein;
}

if( @prots < 2 ) {
    warn("Need at least 2 CDS sequences to proceed");
    exit(0);
}

open(OUT, ">align_output.txt") ||
    die("cannot open output align_output for writing");
# Align the sequences with clustalw
my $aa_aln = $aln_factory->align(\@prots);
# project the protein alignment back to CDS coordinates
my $dna_aln = aa_to_dna_aln($aa_aln, \%seqs);

my @each = $dna_aln->each_seq();

my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1);
( -params => { 'runmode' => -2,
               'seqtype' => 1,
             }
);

# set the alignment object
$kaks_factory->alignment($dna_aln);

my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result();
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();

my @pos = map {
    my $c= 1;
    foreach my $s ( @each ) {
        last if( $s->display_id eq $_->display_id );
        $c++;
    }
    $c;
} @otus;

print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID CDNA_PERCENTID)), "\n";
for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
    for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
        my $sub_aa_aln = $aa_aln->select_noncont($pos[$i],$pos[$j]);
        my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
        print OUT join("\t",
                       $otus[$i]->display_id,
                       $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
                       $MLmatrix->[$i]->[$j]->{'dS'},
                       $MLmatrix->[$i]->[$j]->{'omega'},
                       sprintf("%.2f",$sub_aa_aln->percentage_identity),
                       sprintf("%.2f",$sub_dna_aln->percentage_identity),
                       ), "\n";
    }
}
-----------------------------------------

Can anybody please suggest what is wrong with the script? I think something is wrong in the end section, particularly the counter.

Regards,
Subarna

From xinlisun at hotmail.com Thu Apr 21 01:28:12 2011
From: xinlisun at hotmail.com (SunXinli)
Date: Thu, 21 Apr 2011 13:28:12 +0800
Subject: [Bioperl-l] PAML::Codeml
Message-ID:

Dear Sir or Madam,

I ran into a problem when running Bio::Tools::Run::Phylo::PAML::Codeml to calculate the Ka/Ks ratio of pairwise sequences. However, the result is fine when I use the same DNA sequence file to run Bio::Tools::Run::Phylo::PAML::Yn00. The BioPerl version is 1.61-1, and the PAML version is 4.4.
#!/usr/bin/perl -w
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::AlignIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'out');

while ( my $aln = $alignio->next_aln){
    my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new( -params => { 'runmode' => -2, 'seqtype' => 1, } );
    $codeml->alignment($aln);
    my ($rc,$parser) = $codeml->run();
    while ( my $result = $parser->next_result ){
        my $MLmatrix = $result->get_MLmatrix();
        my $dN = $MLmatrix->[0]->[1]->{dN};
        my $dS = $MLmatrix->[0]->[1]->{dS};
        my $kaks =$MLmatrix->[0]->[1]->{omega};
        print "Ka = $dN Ks = $dS Ka/Ks = $kaks\n";
    }
}

Error message:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output did not see seqtype
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
STACK: Bio::Tools::Phylo::PAML::_parse_summary /usr/share/perl5/Bio/Tools/Phylo/PAML.pm:461
STACK: Bio::Tools::Phylo::PAML::next_result /usr/share/perl5/Bio/Tools/Phylo/PAML.pm:270
STACK: Codeml.pl:18
----------------------------------------------------------------

Thanks a lot,
Xinli

From perlbio123 at gmail.com Wed Apr 27 20:26:49 2011
From: perlbio123 at gmail.com (perlbio007)
Date: Wed, 27 Apr 2011 17:26:49 -0700 (PDT)
Subject: [Bioperl-l] Convert fastq to fasta
Message-ID: <31492543.post@talk.nabble.com>

I am new to BioPerl. Please help. I have a zip folder of sequences which are in fastq format. I need to convert them to fasta format. How do I do that using BioPerl? What module do I need?

--
View this message in context: http://old.nabble.com/Convert-fastq-to-fasta-tp31492543p31492543.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
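
The conversion asked about above can probably be handled by Bio::SeqIO alone. What follows is only a minimal sketch, not a reply from the thread: it assumes the zip archive has already been unpacked, that the reads use Sanger-style quality encoding, and that a BioPerl release with FASTQ support is installed. The helper name fastq_to_fasta and the file names in the usage comment are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (not part of BioPerl): copy every record from a
# FASTQ file into a FASTA file and return the number of records written.
sub fastq_to_fasta {
    my ($fastq_file, $fasta_file) = @_;

    # Reader for the FASTQ input. Other quality encodings can be selected
    # with the -variant option (for example -variant => 'illumina').
    my $in = Bio::SeqIO->new(-file => $fastq_file, -format => 'fastq');

    # Writer for the FASTA output; the leading '>' opens the file for writing.
    my $out = Bio::SeqIO->new(-file => ">$fasta_file", -format => 'fasta');

    my $count = 0;
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);    # FASTA has no quality line, so scores are dropped
        $count++;
    }
    $out->close;
    return $count;
}

# Usage (file names are placeholders): perl fastq2fasta.pl reads.fastq reads.fasta
if (@ARGV == 2) {
    my $n = fastq_to_fasta(@ARGV);
    print "Converted $n sequences\n";
}
```

If the archive holds several FASTQ files, the same helper could be called once per file after unzipping.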
From xiaoheyiyh at yahoo.com Thu Apr 28 16:46:32 2011
From: xiaoheyiyh at yahoo.com (heyi xiao)
Date: Thu, 28 Apr 2011 13:46:32 -0700 (PDT)
Subject: [Bioperl-l] installation problem with Bio::Tools::Run::StandAloneBlastPlus module
Message-ID: <977343.78661.qm@web125416.mail.ne1.yahoo.com>

Hi all,

I am trying to install the BioPerl module Bio::Tools::Run::StandAloneBlastPlus through CPAN. I updated CPAN and Module::Build before installation, but I got the following error message:

Building and testing BioPerl-1.006900 ... FAIL
! Installing Bio::Root::Version failed. See /home/xiao/.cpanm/build.log for details.
! Bailing out the installation for BioPerl-Run-1.006900. Retry with --prompt or --force.

I used the --force option to install it anyway, but the module can't be loaded or used properly; it fails with the following error:

Can't locate IPC/Run.pm in @INC..

Obviously there is some problem with BioPerl-Run, but I am not sure what it is or how to solve it. Any help would be appreciated.

Thanks!
Heyi

From bioinfo.khush at gmail.com Fri Apr 29 02:34:29 2011
From: bioinfo.khush at gmail.com (khush ........)
Date: Fri, 29 Apr 2011 12:04:29 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 96, Issue 28
In-Reply-To:
References:
Message-ID:

Dear,

I am trying to calculate the Ka/Ks ratio of sequences that I aligned with ClustalX, and for this I am using the script given at
https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/pairwise_kaks.PLS

When I try to run it, it alerts me at the line

"warn("Could not find the executable for $aln_prog, make sure you have installed it and have either set ".uc($aln_prog)."DIR or it is in your PATH");"

with the message

"Could not find the executable for clustaw, make sure you have installed it and have either set CLUSTAWDIR or it is in your PATH at kaks.pl line 52."

I have clustalw2 and clustalx installed on my system. How and where do I set the path for them, and how do I calculate the Ka/Ks ratio for my sequences?
Thank you
Kamal

On Fri, Apr 29, 2011 at 11:16 AM, wrote:

> Send Bioperl-l mailing list submissions to
> bioperl-l at lists.open-bio.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> or, via email, send a message with subject or body 'help' to
> bioperl-l-request at lists.open-bio.org
>
> You can reach the person managing the list at
> bioperl-l-owner at lists.open-bio.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bioperl-l digest..."
>
>
> Today's Topics:
>
> 1. Re: GSoC/BioPerl Reorganization Project (Sheena Scroggins)
> 2. Re: GSoC/BioPerl Reorganization Project (Chris Fields)
> 3. Re: GSoC/BioPerl Reorganization Project (Robert Buels)
> 4. Re: GSoC/BioPerl Reorganization Project (Siddhartha Basu)
> 5. Re: Standalone blast (khush ........)
> 6. Re: GSoC/BioPerl Reorganization Project (Robert Buels)
> 7. Re: Standalone blast (Florent Angly)
> 8. Re: Standalone blast (khush ........)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 28 Apr 2011 12:53:49 -0700
> From: Sheena Scroggins
> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Message-ID:
> Content-Type: text/plain; charset=ISO-8859-1
>
> Chris,
>
> We haven't talked much about the versioning yet, but it will be on the list
> to figure out asap.
>
> So far, the plan is to split out Bio::Root first, followed by a couple
> modules that depend only on Bio::Root. The plan I proposed was Bio::Das,
> Bio::Event then Bio::Location. Depending on how much time is remaining for
> the GSoC project, the next to split out would be Bio::Factory and
> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan
> to still help with the reorganization after the internship is over, but I
> obviously have to have a stopping point for the GSoC project.
>
> Rob provided me with a really nice script to list dependencies of the
> modules, so I plan to make a roadmap towards the end of the summer that will
> help guide the rest of the reorganization. At that point, we'll have to deal
> with the circular dependencies carefully.
>
> This is a huge project, much bigger than I can do in one summer. But I plan
> to get it started in a way that makes it easy for others to contribute.
>
> Sheena
>
>
> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields wrote:
>
> > Sheena,
> >
> > Congrats on being accepted! We've talked about doing this over the years,
> > but it's not an easy task and it needs a dedicated project to get the ball
> > rolling, so to speak. Hopefully this isn't tl;dr. I'll start off with a
> > few of my questions/thoughts (Rob could probably chime in as well, but I
> > think his general thoughts on the project parallel mine):
> >
> > 1) The current BioPerl CPAN could just be a simple install script, acting
> > like a 'Task' or 'Bundle' module, installing the actual Bio-specific
> > distributions. Doing it this way would allow you to iteratively split off
> > additional code but retain the original Task/Bundle-based approach to
> > installation. For instance, the first pass could split out Root, then have
> > a dependency-light and 'extras' distribution, 2nd round split further based
> > on function, and so on:
> >
> > 1st round (v 1.9)   : BioPerl (just an installer) -> installs root,
> >                       min-deps, extra-deps
> > 2nd round (v 1.901) : BioPerl (just an installer) -> root, seq/feature,
> >                       other-min-deps, extra-deps
> > ...
> > Xth round (v 1.99)  : BioPerl (just an installer) -> root, tools, seq,
> >                       tree, align, coord, map, everything-else
> > ...
> >
> > Also, one could potentially install modules in various ways: interactively,
> > in predetermined groups, using a user-defined list, etc (one could
> > effectively create custom BioPerl installs for GBrowse or other tools for
> > instance). Of course I would only pick the easiest route to start, but
> > maybe that gives some ideas. Regardless, if the dependency tree is set up
> > correctly any reliance on other Bio* modules would be defined in the various
> > Build.PL/Makefile.PL and then installed via CPAN (as is any dependency).
> >
> > 2) The Bio::Root modules are probably the true core modules and are the
> > most stable with regards to changes, so those could be moved to something
> > like BioPerl-Core. Beyond that, what are the proposed splits? (we've
> > discussed this on-list before, but it's appropriate to bring this up again)
> >
> > 3) How do we want to handle versioning? We can't (and probably shouldn't)
> > release everything on a synchronized versioning scheme (via
> > Bio::Root::Version, for instance), that'll quickly fall apart. Personally I
> > can foresee each split-off dist having its own version, with the BioPerl
> > network of modules being in effect its own mini-CPAN.
> >
> > 5) Related to versioning, in my opinion we should maybe aim on eventually
> > calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme.
> > Lincoln has already done something like this with Bio::Graphics, which was
> > originally part of BioPerl but split off prior to v 1.6.0.
> >
> > 6) In some cases I can see particularly thorny problems, such as circular
> > dependencies. I can think of a few ways to address that (creating a simple
> > lightweight Bio::Species class as a fallback if Bio::Tree code isn't
> > present, for instance), but any additional thoughts on this would be
> > helpful.
> >
> > 7) Do we want to set up something like 'git submodule' for the devs to pull
> > down all BioPerl-relevant code?
> >
> > Other thoughts?
> >
> > chris
> >
> > On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
> >
> > > Hey everyone,
> > >
> > > I wanted to take a minute to introduce myself as one of the Google Summer
> > > of Code interns. I was the lucky one chosen to work on the BioPerl
> > > Reorganization (*crowd cheers*). I am a grad student in bioinformatics,
> > > and somewhat new to this level of programming so bear with me as I learn
> > > the technical jargon. Luckily I have both Rob and Chris to mentor me this
> > > summer!
> > >
> > > Reading through the mailing list archives, I see there have been many
> > > discussions and differing opinions about tackling this project. Given the
> > > time frame for GSoC and my limited experience, there is no way I will
> > > complete this project on my own but I will at least be able to start it,
> > > which will hopefully motivate others to pitch in. So far, the plan for
> > > the GSoC project is to start by breaking out Bio::Root, followed by a
> > > couple other modules based on their dependencies and the time allowed.
> > > Each will be published to CPAN independently. You can follow the project
> > > (once it starts) on github at https://github.com/sheenams.
> > >
> > > I look forward to collaborating with many of you on the reorganization
> > > (hint hint)!
> > >
> > > Sheena
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> ------------------------------
>
> Message: 2
> Date: Thu, 28 Apr 2011 16:04:51 -0500
> From: Chris Fields
> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
> To: Sheena Scroggins
> Cc: BioPerl List, Robert Buels
> Message-ID: <1FF62DC3-941A-4DCB-8464-89D220E4A9C5 at illinois.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> Sounds fine; I think (as you indicate) we can deal with issues along the
> way. Rob, anything to add?
>
> chris
>
> On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote:
>
> > Chris,
> >
> > We haven't talked much about the versioning yet, but it will be on the
> > list to figure out asap.
>
> ------------------------------
>
> Message: 3
> Date: Thu, 28 Apr 2011 16:19:51 -0700
> From: Robert Buels
> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
> To: Chris Fields
> Cc: Sheena Scroggins, BioPerl List
> Message-ID: <4DB9F617.6070705 at cornell.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I think you guys are on the right track, here are some slightly more
> detailed plans. I'll use Chris's subject numbering.
>
> 1,2,3,5.) I envision the splitting algorithm going like this:
>
>    no strict; # this is pseudocode!
>
>    my $split_count = 0;
>    for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) {
>
>       - take $subsystem modules and tests out of bioperl-live
>
>       (my $new_dist_name = $subsystem) =~ s/::/-/g;
>       - extract $subsystem modules into new dist called
>         $new_dist_name. Make sure all its tests pass, and write
>         some more tests if necessary.
>
>       - add dep on $subsystem to bioperl-live/Build.PL
>
>       - push $new_dist_name and bioperl-live to CPAN.
>         $new_dist_name has version '2.000', and bioperl-live has
>         version "1.7.$split_count".
>    }
>
>    and then, at the end of this loop, bioperl-live will be
>    nothing but a Build.PL and a couple of other things
>    for backcompat, like Bio::Root::Version, Bio::Perl, etc.
>
> Important things to notice about this algorithm are that, at each
> step in the loop:
>
>    a.) For users that install bioperl with CPAN,
>        doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will
>        get you the same set of modules as before the split
>        started, with the split-off modules at 2.000 versions, and
>        the non-split-off ones at 1.7.x versions.
>
>    b.) For users (not developers) that are git cloning
>        bioperl-live, even though they are naughty (wink), they
>        can do 'perl Build.PL; ./Build installdeps' to get the
>        split-off modules, downloaded like any other CPAN
>        dependency. There may be some lag before the split-off
>        thing is downloadable from CPAN.
>
>    c.) For BioPerl developers, unless they are working on a
>        certain module, they should install the split-off modules
>        from CPAN like everybody else, and git clone only the piece
>        they are working on.
>
>    d.) The version of bioperl-live keeps increasing by 0.001 with
>        each split. The systems that are split off have a 2.x
>        version number, each slightly different depending on when it
>        was split off. After this point, their release schedules
>        and version numbers are independent of each other and of
>        bioperl-live. For Bio::Perl and Bio::Root::Version, the
>        things that stay in bioperl-live, installing the latest
>        version will get you all the split-off modules.
>
>
> 6.) (thorny circular dependencies and stuff) Those will become quickly
> apparent as this process proceeds. They'll take some finesse and/or
> ruthlessness and/or hacking to get around. We'll burn those bridges as
> we come to them.
>
> 7.) (git submodules) Git submodules probably won't be necessary, since
> at each step in the process BioPerl devs can use ./Build installdeps or
> cpanm --installdeps . to install whatever the dependencies are for the
> piece they are working on, whether it's bioperl-live (in the case of a
> module that has not yet been split off), or one of the distributions
> that has already been split off (in which case their improvements will
> probably be releasable to CPAN immediately!).
>
> Lots of detail there. I tried to make it structured and easy to skim
> though. Thoughts?
>
> Rob
>
>
>
> On 04/28/2011 02:04 PM, Chris Fields wrote:
> > Sounds fine; I think (as you indicate) we can deal with issues along the
> > way. Rob, anything to add?
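
The naming and versioning part of Rob's splitting loop in Message 3 can be made concrete with a short runnable sketch. This is only an illustration under stated assumptions: the sub name split_plan and the hash keys are hypothetical, and incrementing the counter once per split is an assumption, since the pseudocode never shows where $split_count advances. Nothing here touches a repository or CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each subsystem, work out the new distribution
# name and the bioperl-live version that would be released alongside it.
sub split_plan {
    my @subsystems = @_;
    my @plan;
    my $split_count = 0;
    for my $subsystem (@subsystems) {
        $split_count++;    # assumption: one increment per split-off dist
        (my $new_dist_name = $subsystem) =~ s/::/-/g;
        push @plan, {
            subsystem    => $subsystem,
            dist         => $new_dist_name,
            dist_version => '2.000',               # starting version from the plan
            live_version => "1.7.$split_count",    # bioperl-live after this split
        };
    }
    return @plan;
}

# Print the plan for the first three subsystems named in the thread.
for my $step ( split_plan(qw( Bio::Root Bio::Das Bio::Event )) ) {
    printf "%-10s -> %-9s %s (bioperl-live %s)\n",
        @{$step}{qw(subsystem dist dist_version live_version)};
}
```

Keeping the plan as plain data makes it easy to check the release order before anything is actually pushed.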
>
> ------------------------------
>
> Message: 4
> Date: Thu, 28 Apr 2011 21:15:01 -0500
> From: Siddhartha Basu
> Subject: [Bioperl-l] Re: GSoC/BioPerl Reorganization Project
> To: bioperl-l at lists.open-bio.org
> Message-ID: <20110429021457.GA351 at Macintosh-235.local>
> Content-Type: text/plain; charset=us-ascii
>
> Hi Robert,
> At what point in the flow will the dependencies between the split modules
> be added? Is there any particular order in which the split modules would be
> created? And how will those split-off modules be released on CPAN: one by
> one as they are generated, or all of them in a batch, after which they
> follow their own release schedules?
>
> -siddhartha
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 29 Apr 2011 10:23:50 +0530
> From: "khush ........"
> Subject: Re: [Bioperl-l] Standalone blast
> To: Dave Messina
> Cc: bioperl-l at lists.open-bio.org
> Message-ID:
> Content-Type: text/plain; charset=ISO-8859-1
>
> Dear Dave,
>
> Thank you for your support.
> > If need to change the following lines like > > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program => 'blastx', > -database => 'nr.fa')); > > $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>"file.fa"); > > I have a simple and basic query for you, as I am beginners in bioperl, that > if I need to download the whole nr database from NCBI to run the code or It > will directly fetch information from the NCBI website. I do not understand > it, because downloading the whole nr d/b itself takes long time for me. > > How could I read whole file instead of simple string "TTTATAGATAGAGACAG" in > -seq (a fasta file). Is there a simple way to do the exercise according to > my conditions. > > Thank you > Kamal > > > On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina >wrote: > > > Hi Kamal, > > > > This is covered in the beginners' HOWTO: > > http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST > > > > > > Dave > > > > > > On Thu, Apr 28, 2011 at 07:22, khush ........ >wrote: > > > >> Hi, > >> > >> I have some sequences ~250 and wanted to use BLASTX to blast against nr > >> database of NCBI, as this is time consuming using web based search. Can > >> some > >> one please tell me how to start BIOPERL with scuh problems. I know that > >> this > >> is possible with bioperl, but do not know how. > >> > >> Any suggestion will be appreciable. 
> >> > >> Thanks in advance > >> Kamal > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > ------------------------------ > > Message: 6 > Date: Thu, 28 Apr 2011 22:15:01 -0700 > From: Robert Buels > Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project > To: BioPerl List > Message-ID: <4DBA4955.2030003 at cornell.edu> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > On 04/28/2011 07:15 PM, Siddhartha Basu wrote: > > At what point in flow the dependencies between the split modules will be > > added. Is there any particular order the split modules would be created. > > Dependencies are added and characterized at the time each distribution > is created. That's why the splitting order starts at Bio::Root, so that > you can proceed up the hierarchy of dependencies without having to > modify the dependency lists of the distributions that have already been > extracted. > > > And how those split off modules will be released in CPAN, one by one as > > they being generated or all of them in a batch after which they will > > follow their release schedule. > > One by one, as they are generated. I think it would be a good idea to > re-release bioperl-live with each split as well. This will probably > lead to bioperl-live being released nearly every week as the split is > ongoing. As a consequence, the master branch of bioperl-live will need > to be kept in very good shape. This is easy if you just follow good > practice: develop in branches, run *all* the tests before committing, go > on IRC and send pull requests for code review, etc. 
> > Rob > > > ------------------------------ > > Message: 7 > Date: Fri, 29 Apr 2011 15:24:45 +1000 > From: Florent Angly > Subject: Re: [Bioperl-l] Standalone blast > To: bioinfo.khush at gmail.com > Cc: bioperl-l at lists.open-bio.org > Message-ID: <4DBA4B9D.1010400 at gmail.com> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi Kamal, > > To run BLAST the way Dave described, you need to have BLAST installed on > your computer, and you need to download BLAST databases to your computer > (or make them yourself with the formatdb command). There are plenty of > databases available on the NCBI FTP website: ftp://ftp.ncbi.nih.gov/. > And yes, some of these databases are very large and will take a long > time to download. By the way, the BLAST may also take a very long time > to execute if you use large databases, so, you'd better run the analysis > on a powerful computer or a server. > > Also read this documentation: > > http://search.cpan.org/~cjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm > < > http://search.cpan.org/%7Ecjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm > > > It stipulates that you can BLAST an entire FASTA file (not just a > sequence object): > > $inputfilename = 't/testquery.fa'; > $blast_report = $factory->blastall($inputfilename); > > > Regards, > > Florent > > > > > On 29/04/11 14:53, khush ........ wrote: > > Dear Dave, > > > > Thank you for your support. > > > > If need to change the following lines like > > > > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program => > 'blastx', > > -database => 'nr.fa')); > > > > $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>"file.fa"); > > > > I have a simple and basic query for you, as I am beginners in bioperl, > that > > if I need to download the whole nr database from NCBI to run the code or > It > > will directly fetch information from the NCBI website. I do not > understand > > it, because downloading the whole nr d/b itself takes long time for me. 
> > > > How could I read whole file instead of simple string "TTTATAGATAGAGACAG" > in > > -seq (a fasta file). Is there a simple way to do the exercise according > to > > my conditions. > > > > Thank you > > Kamal > > > > > > On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina >wrote: > > > >> Hi Kamal, > >> > >> This is covered in the beginners' HOWTO: > >> http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST > >> > >> > >> Dave > >> > >> > >> On Thu, Apr 28, 2011 at 07:22, khush ........ >wrote: > >> > >>> Hi, > >>> > >>> I have some sequences ~250 and wanted to use BLASTX to blast against nr > >>> database of NCBI, as this is time consuming using web based search. Can > >>> some > >>> one please tell me how to start BIOPERL with scuh problems. I know that > >>> this > >>> is possible with bioperl, but do not know how. > >>> > >>> Any suggestion will be appreciable. > >>> > >>> Thanks in advance > >>> Kamal > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > ------------------------------ > > Message: 8 > Date: Fri, 29 Apr 2011 11:16:38 +0530 > From: "khush ........" > Subject: Re: [Bioperl-l] Standalone blast > To: Florent Angly > Cc: bioperl-l at lists.open-bio.org > Message-ID: > Content-Type: text/plain; charset=ISO-8859-1 > > Dear Florent, > > Thank you very much for your kind reply and let me clear the concept of > running the blast. I am working with simple machine so I need to take > permission from my administrator to work on some good server to have whole > nr database from NCBI and run the blastx. > > Thank you > > Kamal > Bioperl is great. 
> > > On Fri, Apr 29, 2011 at 10:54 AM, Florent Angly >wrote: > > > Hi Kamal, > > > > To run BLAST the way Dave described, you need to have BLAST installed on > > your computer, and you need to download BLAST databases to your computer > (or > > make them yourself with the formatdb command). There are plenty of > databases > > available on the NCBI FTP website: ftp://ftp.ncbi.nih.gov/. And yes, > some > > of these databases are very large and will take a long time to download. > By > > the way, the BLAST may also take a very long time to execute if you use > > large databases, so, you'd better run the analysis on a powerful computer > or > > a server. > > > > Also read this documentation: > > > http://search.cpan.org/~cjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm > < > > > http://search.cpan.org/%7Ecjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm > > > > > It stipulates that you can BLAST an entire FASTA file (not just a > sequence > > object): > > > > $inputfilename = 't/testquery.fa'; > > $blast_report = $factory->blastall($inputfilename); > > > > > > Regards, > > > > Florent > > > > > > > > > > > > On 29/04/11 14:53, khush ........ wrote: > > > >> Dear Dave, > >> > >> Thank you for your support. > >> > >> If need to change the following lines like > >> > >> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program => > 'blastx', > >> -database => 'nr.fa')); > >> > >> $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>"file.fa"); > >> > >> I have a simple and basic query for you, as I am beginners in bioperl, > >> that > >> if I need to download the whole nr database from NCBI to run the code or > >> It > >> will directly fetch information from the NCBI website. I do not > understand > >> it, because downloading the whole nr d/b itself takes long time for me. > >> > >> How could I read whole file instead of simple string "TTTATAGATAGAGACAG" > >> in > >> -seq (a fasta file). 
Is there a simple way to do the exercise according > to > >> my conditions. > >> > >> Thank you > >> Kamal > >> > >> > >> On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina >> >wrote: > >> > >> Hi Kamal, > >>> > >>> This is covered in the beginners' HOWTO: > >>> http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST > >>> > >>> > >>> Dave > >>> > >>> > >>> On Thu, Apr 28, 2011 at 07:22, khush ........ >>> >wrote: > >>> > >>> Hi, > >>>> > >>>> I have some sequences ~250 and wanted to use BLASTX to blast against > nr > >>>> database of NCBI, as this is time consuming using web based search. > Can > >>>> some > >>>> one please tell me how to start BIOPERL with scuh problems. I know > that > >>>> this > >>>> is possible with bioperl, but do not know how. > >>>> > >>>> Any suggestion will be appreciable. > >>>> > >>>> Thanks in advance > >>>> Kamal > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > ------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > End of Bioperl-l Digest, Vol 96, Issue 28 > ***************************************** > From cjfields at illinois.edu Fri Apr 29 09:38:00 2011 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 29 Apr 2011 08:38:00 -0500 Subject: [Bioperl-l] AlignIO In-Reply-To: <97343FFF-E3BA-4169-8578-97D55591862E@usc.edu> References: <97343FFF-E3BA-4169-8578-97D55591862E@usc.edu> Message-ID: <34EE11D5-61D8-4A63-AE3A-EFA98F5CAD18@illinois.edu> Daniel, It's hard to determine what's going on in this case w/o seeing the verbose output from one of those test 
reports. I would say it is probably a missing dependency (maybe Data::Stag), but I haven't seen anything like this before.

chris

On Apr 19, 2011, at 8:02 PM, Daniel Campo wrote:

> Hi,
> I am trying to install BioPerl in my MacOSX using CPAN.
> After running the tests I got the following:
>
> Failed Test                          Stat Wstat Total Fail  List of Failed
> -------------------------------------------------------------------------------
> t/AlignIO/AlignIO.t                   255 65280    28   42  8-28
> t/AlignIO/arp.t                       255 65280    48   92  3-48
> t/Annotation/Annotation.t             255 65280   158   83  9 116 118-158
> t/ClusterIO/SequenceFamily.t          255 65280    19   34  3-19
> t/LocalDB/Flat.t                      255 65280    24   20  15-24
> t/LocalDB/Index.t                     255 65280    64   66  32-64
> t/SeqIO/Handler.t                     255 65280   561 1120  2-561
> t/SeqIO/chaos.t                         1   256     8    1  1
> t/SeqIO/swiss.t                       255 65280   240  479  1-240
> t/SeqTools/GuessSeqFormat.t             1   256    49    2  25 50
> t/Tools/Analysis/Protein/Scansite.t   255 65280    14   20  5-14
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 97 wallclock secs (81.97 cusr + 10.56 csys = 92.53 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
> CJFIELDS/BioPerl-1.6.1.tar.gz
> ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
> reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store persistent state
> Running Build install
> make test had returned bad status, won't install without force
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz : make NO
> RKOBES/ExtUtils-Manifest-1.58.tar.gz : install NO
> CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO
>
> And my concern is that I need to run a script that needs the module AlignIO. But that module seems to not have passed the test.
> Could you please help me on this?
> Thank you very much in advance.
>
> Daniel.
>
>
> ---
> Daniel Campo Falgueras
> Postdoctoral Research Associate
> Molecular and Computational Biology
> University of Southern California
> 1050 Childs Way, RRI. 324C
> Los Angeles, CA, 90089-2910
> (+1) 213-821-3976
> dcampo at usc.edu
> http://college.usc.edu/cf/faculty-and-staff/staff.cfm?pid=1027679
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Fri Apr 29 09:54:10 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Apr 2011 08:54:10 -0500
Subject: [Bioperl-l] installation problem with Bio::Tools::Run::StandAloneBlastPlus module
In-Reply-To: <977343.78661.qm@web125416.mail.ne1.yahoo.com>
References: <977343.78661.qm@web125416.mail.ne1.yahoo.com>
Message-ID:

IPC::Run and BioPerl are both listed as dependencies for BioPerl-Run; were these not installed via CPAN?

chris

On Apr 28, 2011, at 3:46 PM, heyi xiao wrote:

> Hi all,
> I am trying to install bioperl module, Bio::Tools::Run::StandAloneBlastPlus, through CPAN. I updated CPAN and Module::Build before installation. But I got the following error message.
> Building and testing BioPerl-1.006900 ... FAIL
> ! Installing Bio::Root::Version failed. See /home/xiao/.cpanm/build.log for details.
> ! Bailing out the installation for BioPerl-Run-1.006900. Retry with --prompt or --force.
>
> I used --force option to install it anyway. But the module can't be loaded or used properly, with the following error:
> Can't locate IPC/Run.pm in @INC..
>
> Obviously there are some problem with BioPerl-Run. But I am not sure what that's, and how to solve it. Any help would be appreciated. Thanks!
> Heyi
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From MEC at stowers.org Fri Apr 29 13:11:16 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 29 Apr 2011 12:11:16 -0500
Subject: [Bioperl-l] Convert fastq to fasta
In-Reply-To: <31492543.post@talk.nabble.com>
References: <31492543.post@talk.nabble.com>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF910215690@EXCHMB-02.stowers-institute.org>

I don't think you want bioperl for this

try fastq_to_fasta part of http://hannonlab.cshl.edu/fastx_toolkit/

usage: fastq_to_fasta [-h] [-r] [-n] [-v] [-z] [-i INFILE] [-o OUTFILE]

Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of perlbio007
> Sent: Wednesday, April 27, 2011 7:27 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Convert fastq to fasta
>
>
> Iam new to Bioperl. Pls help.
> I have a zip folder of sequences which is in fastq format. I need to convert
> it in fasta format?
> How I do that using bioperl?What module do I need?
>
> --
> View this message in context:
> http://old.nabble.com/Convert-fastq-to-fasta-tp31492543p31492543.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From xiaoheyiyh at yahoo.com Fri Apr 29 16:37:55 2011
From: xiaoheyiyh at yahoo.com (heyi xiao)
Date: Fri, 29 Apr 2011 13:37:55 -0700 (PDT)
Subject: [Bioperl-l] installation problem with Bio::Tools::Run::StandAloneBlastPlus module
In-Reply-To:
Message-ID: <443690.86591.qm@web125412.mail.ne1.yahoo.com>

Thanks Chris,
I do have bioperl pre-installed but not IPC::Run.
And I installed IPC::Run. StandAloneBlastPlus works fine this time. Thanks a lot for the hint!
I have a question here though. I understand that CPAN configures all the dependencies and installs them automatically. But isn't IPC::Run a dependency of StandAloneBlastPlus according to the CPAN installer?
Thank you!
Heyi

--- On Fri, 4/29/11, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] installation problem with Bio::Tools::Run::StandAloneBlastPlus module
> To: "heyi xiao"
> Cc: bioperl-l at lists.open-bio.org
> Date: Friday, April 29, 2011, 9:54 AM
> IPC::Run and BioPerl are both listed as dependencies for BioPerl-Run; were these not installed via CPAN?
>
> chris
>
> On Apr 28, 2011, at 3:46 PM, heyi xiao wrote:
>
> > Hi all,
> > I am trying to install bioperl module, Bio::Tools::Run::StandAloneBlastPlus, through CPAN. I updated CPAN and Module::Build before installation. But I got the following error message.
> > Building and testing BioPerl-1.006900 ... FAIL
> > ! Installing Bio::Root::Version failed. See /home/xiao/.cpanm/build.log for details.
> > ! Bailing out the installation for BioPerl-Run-1.006900. Retry with --prompt or --force.
> >
> > I used --force option to install it anyway. But the module can't be loaded or used properly, with the following error:
> > Can't locate IPC/Run.pm in @INC..
> >
> > Obviously there are some problem with BioPerl-Run. But I am not sure what that's, and how to solve it. Any help would be appreciated. Thanks!
> > Heyi
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

From rmb32 at cornell.edu Fri Apr 29 18:50:51 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 29 Apr 2011 15:50:51 -0700
Subject: [Bioperl-l] Question about Bio::Coordinate::Pair
In-Reply-To:
References:
Message-ID: <4DBB40CB.9030901@cornell.edu>

On 04/29/2011 02:53 AM, Dan Bolser wrote:
> The changes I made to Bio::Coordinate::Pair are now defunct, but I'll
> still request that my test script gets pulled into master [1], because
> more tests can't hurt right?

Pulled in and merged by Duke. Thanks Dan, you rock.

Rob

From duxroq at hotmail.com Fri Apr 29 18:45:01 2011
From: duxroq at hotmail.com (duxroq)
Date: Fri, 29 Apr 2011 15:45:01 -0700 (PDT)
Subject: [Bioperl-l] Clustalw problems!!!
Message-ID: <31509401.post@talk.nabble.com>

Hi, I'm not sure whether my program is not finding the clustalw executable or if it is having trouble with the module itself.
Here is my error, in the image below:

http://old.nabble.com/file/p31509401/Untitled.png Untitled.png

here is my code:

#!/usr/bin/local/perl -w

use Bio::Perl;
use Bio::AlignIO;
use Bio::Root::IO;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SimpleAlign;
use Bio::TreeIO;

#--------------------------------------------------------------------------------#

# Main

unless(($#ARGV + 1) > 1) {
    print_start_message();
    exit;
}

BEGIN { $ENV{CLUSTALDIR} = 'C:\Program Files\clustalw.exe'} #-Set CLUSTALDIR to correct directory
use Bio::Tools::Run::Alignment::Clustalw;

my $file_name1 = $ARGV[0]; #-name of file containing all sequences
my @sequences = read_all_sequences($file_name1,'fasta');

# alt_revcom(\@sequences); #-reverse complement of every other seq
# print_sequences(@sequences);

my @params = ('outfile' => 'mult_aln.aln'); #-sets parameters for alignment factory
###################################################
#
#
#
#The error is probably in the next few lines! agh!#
#
#
#
###################################################
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
$clustalfound = Bio::Tools::Run::Alignment::Clustalw->exists_clustal();
if ($clustalfound) {
    print "\n we found it!!!\n" }
my $aln = $factory->align(\@sequences);
print "\nDevins name is bob. \n"; #-creates alignment
print "\nPercentage Identity:\n",$aln->percentage_identity,"\n\n";

my $cons_str = $aln->consensus_iupac(); #-configures consensus using IUPAC codes
my $cons_name = "Consensus_".save_id(@sequences);
my $cons_seq = new_sequence($cons_str, $cons_name);

write_seq_to_file(">cons.fa",$cons_seq); #-writes consensus to file

my $file_name2 = $ARGV[1];
my $lead_seq = read_sequence($file_name2,'fasta'); #-compare consensus sequence to leader

# print_sequences($lead_seq);
# print_sequences($cons_seq);

my @lead_cons_seqs = ($lead_seq, $cons_seq); #-forms array for alignment of leader and consensus
@params = ('pairgap' => 50);

print "\n bobisnotmyname\n";

$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $aln_lead_cons = $factory->align(\@lead_cons_seqs);
print "\nPercentage Identity:\n",$aln_lead_cons->percentage_identity,"\n\n";

my $seq_len = $aln_lead_cons->length();

my @aln_seqs = (); #-array of aligned sequences, including gaps
my $i = 0;
foreach $seq ($aln_lead_cons->each_seq() ) {
    $aln_seqs[$i] = $seq;
    $i++;
}
# print_sequences(@aln_seqs);

my $l_aln_str = ''; #-str of leader sequence from alignment
my $c_aln_str = ''; #-str of consensus sequence from alignment
$found = 0;
$i = 1;
while ($found == 0 && $i < $seq_len) {

    $l_aln_str = substr($aln_seqs[0]->seq(),$i,1); #-gets a substring from l_aln_str
    if ($l_aln_str !~ m/\./i) { #-checks if substring has gap characters
        $cons_slice_str = substr($aln_seqs[1]->seq(),$i,490);
        $found = 1; #-retrieves 490 characters of consensus where
    }
    $i++; #-leader begins in alignment
}

$cons_slice_seq = new_sequence($cons_slice_str,"Sliced_".$cons_name);
# print_sequences($cons_slice_seq);
write_seq_to_file(">cons_slice.fa",$cons_slice_seq);

# End Main

#--------------------------------------------------------------------------------#

# Subroutines

sub alt_revcom {
    my @sequences = @_;
    for ( $i=0; $i <= $#sequences; $i++) {
        if ($i%2==1) {
            $sequences[$i] = reverse_complement($sequences[$i]);
        }
    }
}

sub save_id {
    my @sequences = @_;
    $id = $sequences[0]->display_id;
    return $id;
}

sub print_sequences {
    my @sequences = @_;
    for ($i = 0; $i <= $#sequences; $i++) {
        print "Sequence name:",$sequences[$i]->display_id,"\n";
        print "Sequence acc:",$sequences[$i]->accession_number,"\n";
        print $sequences[$i]->seq(),"\n";
    }
}

sub write_seq_to_file {
    my ($file_name,$seq) = @_;
    write_sequence($file_name,'fasta',$seq);
    print "\n",$seq->display_id," written to file.\n\n";
}

--
View this message in context: http://old.nabble.com/Clustalw-problems%21%21%21-tp31509401p31509401.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at illinois.edu Fri Apr 29 23:26:44 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Apr 2011 22:26:44 -0500
Subject: [Bioperl-l] Clustalw problems!!!
In-Reply-To: <31509401.post@talk.nabble.com>
References: <31509401.post@talk.nabble.com>
Message-ID:

I don't have access to a Windows machine to test this, unfortunately. I did notice you set CLUSTALWDIR to the actual executable, NOT the directory it is in. Also, the executable name is 'clustalw.exe', not 'clustalw', so possibly change that prior to instantiation:

$Bio::Tools::Run::Alignment::Clustalw::PROGRAM_NAME = 'clustalw.exe';

Maybe that's a start?

chris

On Apr 29, 2011, at 5:45 PM, duxroq wrote:

>
> Hi, I'm not sure whether my program is not finding the clustalw executable
> or if it is having trouble with the module itself.
> Here is my error, in the image below:
>
> http://old.nabble.com/file/p31509401/Untitled.png Untitled.png
>
> here is my code:
>
>
> #!/usr/bin/local/perl -w
>
> use Bio::Perl;
> use Bio::AlignIO;
> use Bio::Root::IO;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::SimpleAlign;
> use Bio::TreeIO;
>
> #--------------------------------------------------------------------------------#
>
> # Main
>
> unless(($#ARGV + 1) > 1) {
>     print_start_message();
>     exit;
> }
>
> BEGIN { $ENV{CLUSTALDIR} = 'C:\Program Files\clustalw.exe'} #-Set CLUSTALDIR to correct directory
> use Bio::Tools::Run::Alignment::Clustalw;
>
> my $file_name1 = $ARGV[0]; #-name of file containing all sequences
> my @sequences = read_all_sequences($file_name1,'fasta');
>
> # alt_revcom(\@sequences); #-reverse complement of every other seq
> # print_sequences(@sequences);
>
> my @params = ('outfile' => 'mult_aln.aln'); #-sets parameters for alignment factory
> ###################################################
> #
> #
> #
> #The error is probably in the next few lines! agh!#
> #
> #
> #
> ###################################################
> my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
> $clustalfound = Bio::Tools::Run::Alignment::Clustalw->exists_clustal();
> if ($clustalfound) {
>     print "\n we found it!!!\n" }
> my $aln = $factory->align(\@sequences);
> print "\nDevins name is bob. \n"; #-creates alignment
> print "\nPercentage Identity:\n",$aln->percentage_identity,"\n\n";
>
> my $cons_str = $aln->consensus_iupac(); #-configures consensus using IUPAC codes
> my $cons_name = "Consensus_".save_id(@sequences);
> my $cons_seq = new_sequence($cons_str, $cons_name);
>
> write_seq_to_file(">cons.fa",$cons_seq); #-writes consensus to file
>
> my $file_name2 = $ARGV[1];
> my $lead_seq = read_sequence($file_name2,'fasta'); #-compare consensus sequence to leader
>
> # print_sequences($lead_seq);
> # print_sequences($cons_seq);
>
> my @lead_cons_seqs = ($lead_seq, $cons_seq); #-forms array for alignment of leader and consensus
> @params = ('pairgap' => 50);
>
> print "\n bobisnotmyname\n";
>
> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
> my $aln_lead_cons = $factory->align(\@lead_cons_seqs);
> print "\nPercentage Identity:\n",$aln_lead_cons->percentage_identity,"\n\n";
>
> my $seq_len = $aln_lead_cons->length();
>
> my @aln_seqs = (); #-array of aligned sequences, including gaps
> my $i = 0;
> foreach $seq ($aln_lead_cons->each_seq() ) {
>     $aln_seqs[$i] = $seq;
>     $i++;
> }
> # print_sequences(@aln_seqs);
>
> my $l_aln_str = ''; #-str of leader sequence from alignment
> my $c_aln_str = ''; #-str of consensus sequence from alignment
> $found = 0;
> $i = 1;
> while ($found == 0 && $i < $seq_len) {
>
>     $l_aln_str = substr($aln_seqs[0]->seq(),$i,1); #-gets a substring from l_aln_str
>     if ($l_aln_str !~ m/\./i) { #-checks if substring has gap characters
>         $cons_slice_str = substr($aln_seqs[1]->seq(),$i,490);
>         $found = 1; #-retrieves 490 characters of consensus where
>     }
>     $i++; #-leader begins in alignment
> }
>
> $cons_slice_seq = new_sequence($cons_slice_str,"Sliced_".$cons_name);
> # print_sequences($cons_slice_seq);
> write_seq_to_file(">cons_slice.fa",$cons_slice_seq);
>
>
> # End Main
>
> #--------------------------------------------------------------------------------#
>
> # Subroutines
>
> sub alt_revcom {
>     my @sequences = @_;
>     for ( $i=0; $i <= $#sequences; $i++) {
>         if ($i%2==1) {
>             $sequences[$i] = reverse_complement($sequences[$i]);
>         }
>     }
> }
>
> sub save_id {
>     my @sequences = @_;
>     $id = $sequences[0]->display_id;
>     return $id;
> }
>
> sub print_sequences {
>     my @sequences = @_;
>     for ($i = 0; $i <= $#sequences; $i++) {
>         print "Sequence name:",$sequences[$i]->display_id,"\n";
>         print "Sequence acc:",$sequences[$i]->accession_number,"\n";
>         print $sequences[$i]->seq(),"\n";
>     }
> }
>
> sub write_seq_to_file {
>     my ($file_name,$seq) = @_;
>     write_sequence($file_name,'fasta',$seq);
>     print "\n",$seq->display_id," written to file.\n\n";
> }
>
>
> --
> View this message in context: http://old.nabble.com/Clustalw-problems%21%21%21-tp31509401p31509401.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From tzhu at mail.bnu.edu.cn Sat Apr 30 02:56:54 2011
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Sat, 30 Apr 2011 14:56:54 +0800
Subject: [Bioperl-l] Convert fastq to fasta
Message-ID: <1304146614.5523.5.camel@ubuntu>

Scripts like this are ok:

------------- transform.pl ---------------
use Bio::SeqIO;
my ($file1,$file2)=@ARGV;
my $seqin = Bio::SeqIO -> new (-format => 'fastq',-file => $file1);
my $seqout = Bio::SeqIO -> new (-format => 'fasta',-file => ">$file2");
while (my $seq_obj = $seqin -> next_seq)
{
    $seqout -> write_seq($seq_obj);
}
-------------------------------------------

run as this:
perl transform.pl sequence.fastq sequence.fasta

> Message: 7
> Date: Wed, 27 Apr 2011 17:26:49 -0700 (PDT)
> From: perlbio007
> Subject: [Bioperl-l] Convert fastq to fasta
> To: Bioperl-l at lists.open-bio.org
> Message-ID: <31492543.post at talk.nabble.com>
> Content-Type: text/plain; charset=us-ascii
>
>
> Iam new to Bioperl. Pls help.
> I have a zip folder of sequences which is in fastq format. I need to convert
> it in fasta format?
> How I do that using bioperl?What module do I need?
>

--
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China
Email: tzhu at mail.bnu.edu.cn
Website: http://bnuzt.org (mainly written in Chinese)

From florent.angly at gmail.com Sat Apr 30 19:19:55 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Sun, 01 May 2011 09:19:55 +1000
Subject: [Bioperl-l] Bioperl-l Digest, Vol 96, Issue 28
In-Reply-To:
References:
Message-ID: <4DBC991B.80002@gmail.com>

Kamal,

It looks like you have a typo somewhere: what is 'clustaw'? You probably mean 'clustalw'.

Florent

On 29/04/11 16:34, khush ........ wrote:
> Dear,
>
> I am trying to calculate the Ka/ks ratio of my aligned sequences by clustalx
> and for the same I am using
>
> So I am using the script given at
> https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/pairwise_kaks.PLS
>
> When I try to run it, it alerts me to change the line
>
> "warn("Could not find the executable for $aln_prog, make sure you have
> installed it and have either set ".uc($aln_prog)."DIR or it is in your
> PATH");"
>
> "Could not find the executable for clustaw, make sure you have installed it
> and have either set CLUSTAWDIR or it is in your PATH at kaks.pl line 52."
>
> I have clustalw2 and clustalx installed on my system. How to and where to
> set the path for the same and how to calculate the Ka/Ks ratio for my
> sequences.
>
> Thank you
> Kamal
>
> On Fri, Apr 29, 2011 at 11:16 AM, wrote:
>
>> Send Bioperl-l mailing list submissions to
>> bioperl-l at lists.open-bio.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> or, via email, send a message with subject or body 'help' to
>> bioperl-l-request at lists.open-bio.org
>>
>> You can reach the person managing the list at
>> bioperl-l-owner at lists.open-bio.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Bioperl-l digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Re: GSoC/BioPerl Reorganization Project (Sheena Scroggins)
>> 2. Re: GSoC/BioPerl Reorganization Project (Chris Fields)
>> 3. Re: GSoC/BioPerl Reorganization Project (Robert Buels)
>> 4. Re: GSoC/BioPerl Reorganization Project (Siddhartha Basu)
>> 5. Re: Standalone blast (khush ........)
>> 6. Re: GSoC/BioPerl Reorganization Project (Robert Buels)
>> 7. Re: Standalone blast (Florent Angly)
>> 8. Re: Standalone blast (khush ........)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Thu, 28 Apr 2011 12:53:49 -0700
>> From: Sheena Scroggins
>> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
>> To: Chris Fields
>> Cc: bioperl-l at lists.open-bio.org
>> Message-ID:
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Chris,
>>
>> We haven't talked much about the versioning yet, but it will be on the list
>> to figure out asap.
>>
>> So far, the plan is to split out Bio::Root first, followed by a couple
>> modules that depend only on Bio::Root. The plan I proposed was Bio::Das,
>> Bio::Event then Bio::Location. Depending on how much time is remaining for
>> the GSoC project, the next to split out would be Bio::Factory and
>> Bio::Coordinate, because they depend on Bio::Root and Bio::Location.
>> I plan to still help with the reorganization after the internship is over,
>> but I obviously have to have a stopping point for the GSoC project.
>>
>> Rob provided me with a really nice script to list dependencies of the
>> modules, so I plan to make a roadmap towards the end of the summer that
>> will help guide the rest of the reorganization. At that point, we'll have
>> to deal with the circular dependencies carefully.
>>
>> This is a huge project, much bigger than I can do in one summer. But I plan
>> to get it started in a way that makes it easy for others to contribute.
>>
>> Sheena
>>
>>
>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields wrote:
>>> Sheena,
>>>
>>> Congrats on being accepted! We've talked about doing this over the years,
>>> but it's not an easy task and it needs a dedicated project to get the ball
>>> rolling, so to speak. Hopefully this isn't tl;dr. I'll start off with a
>>> few of my questions/thoughts (Rob could probably chime in as well, but I
>>> think his general thoughts on the project parallel mine):
>>>
>>> 1) The current BioPerl CPAN could just be a simple install script, acting
>>> like a 'Task' or 'Bundle' module, installing the actual Bio-specific
>>> distributions. Doing it this way would allow you to iteratively split off
>>> additional code but retain the original Task/Bundle-based approach to
>>> installation. For instance, the first pass could split out Root, then have
>>> a dependency-light and 'extras' distribution, 2nd round split further based
>>> on function, and so on:
>>>
>>> 1st round (v 1.9) : BioPerl (just an installer) -> installs root,
>>> min-deps, extra-deps
>>> 2nd round (v 1.901) : BioPerl (just an installer) -> root, seq/feature,
>>> other-min-deps, extra-deps
>>> ...
>>> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, seq,
>>> tree, align, coord, map, everything-else
>>> ...
>>>
>>> Also, one could potentially install modules in various ways:
>>> interactively, in predetermined groups, using a user-defined list, etc.
>>> (one could effectively create custom BioPerl installs for GBrowse or other
>>> tools, for instance). Of course I would only pick the easiest route to
>>> start, but maybe that gives some ideas. Regardless, if the dependency tree
>>> is set up correctly, any reliance on other Bio* modules would be defined in
>>> the various Build.PL/Makefile.PL files and then installed via CPAN (as is
>>> any dependency).
>>>
>>> 2) The Bio::Root modules are probably the true core modules and are the
>>> most stable with regard to changes, so those could be moved to something
>>> like BioPerl-Core. Beyond that, what are the proposed splits? (We've
>>> discussed this on-list before, but it's appropriate to bring this up again.)
>>>
>>> 3) How do we want to handle versioning? We can't (and probably shouldn't)
>>> release everything on a synchronized versioning scheme (via
>>> Bio::Root::Version, for instance); that'll quickly fall apart. Personally I
>>> can foresee each split-off dist having its own version, with the BioPerl
>>> network of modules being in effect its own mini-CPAN.
>>>
>>> 5) Related to versioning, in my opinion we should maybe aim at eventually
>>> calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme.
>>> Lincoln has already done something like this with Bio::Graphics, which was
>>> originally part of BioPerl but split off prior to v 1.6.0.
>>>
>>> 6) In some cases I can see particularly thorny problems, such as circular
>>> dependencies. I can think of a few ways to address that (creating a simple
>>> lightweight Bio::Species class as a fallback if Bio::Tree code isn't
>>> present, for instance), but any additional thoughts on this would be
>>> helpful.
>>>
>>> 7) Do we want to set up something like 'git submodule' for the devs to pull
>>> down all BioPerl-relevant code?
>>>
>>> Other thoughts?
>>>
>>> chris
>>>
>>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I wanted to take a minute to introduce myself as one of the Google Summer
>>>> of Code interns. I was the lucky one chosen to work on the BioPerl
>>>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and
>>>> somewhat new to this level of programming, so bear with me as I learn the
>>>> technical jargon. Luckily I have both Rob and Chris to mentor me this
>>>> summer!
>>>>
>>>> Reading through the mailing list archives, I see there have been many
>>>> discussions and differing opinions about tackling this project. Given the
>>>> time frame for GSoC and my limited experience, there is no way I will
>>>> complete this project on my own, but I will at least be able to start it,
>>>> which will hopefully motivate others to pitch in. So far, the plan for the
>>>> GSoC project is to start by breaking out Bio::Root, followed by a couple
>>>> other modules based on their dependencies and the time allowed. Each will
>>>> be published to CPAN independently. You can follow the project (once it
>>>> starts) on github at https://github.com/sheenams.
>>>>
>>>> I look forward to collaborating with many of you on the reorganization
>>>> (hint hint)!
>>>>
>>>> Sheena
>>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Thu, 28 Apr 2011 16:04:51 -0500
>> From: Chris Fields
>> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
>> To: Sheena Scroggins
>> Cc: BioPerl List, Robert Buels
>>
>> Message-ID:<1FF62DC3-941A-4DCB-8464-89D220E4A9C5 at illinois.edu>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> Sounds fine; I think (as you indicate) we can deal with issues along the way.
Rob, anything to add? >> >> chris >> >> On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote: >> >>> Chris, >>> >>> We haven't talked much about the versioning yet, but it will be on the >> list to figure out asap. >>> So far, the plan is to split out Bio::Root first, followed by a couple >> modules that depend only on Bio::Root. The plan I proposed was Bio::Das, >> Bio::Event then Bio::Location. Depending on how much time is remaining for >> the GSoC project, the next to split out would be Bio::Factory and >> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan >> to still help with the reorganization after the internship is over, but I >> obviously have to have a stopping point for the GSoC project. >>> Rob provide me with a really nice scrip to list dependencies of the >> modules, so I plan to make a roadmap towards to end of the summer that will >> help guide the rest of the reorganization. At that point, we'll have to deal >> with the circular dependencies carefully. >>> This is a huge project, much bigger than I can do in one summer. But I >> plan to get it started in a way that makes it easy for others to contribute. >>> Sheena >>> >>> >>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields >> wrote: >>> Sheena, >>> >>> Congrats on being accepted! We've talked about doing this over the years, >> but it's not an easy task and it needs a dedicated project to get the ball >> rolling, so to speak. Hopefully this isn't tl;dr. I'll start off with a >> few of my questions/thoughts (Rob could probably chime in as well, but I >> think his general thoughts on the project parallel mine): >>> 1) The current BioPerl CPAN could just be a simple install script, acting >> like a 'Task' or 'Bundle' module, installing the actual Bio-specific >> distributions. Doing it this way would allow you to iteratively split off >> additional code but retain the original Task/Bundle-based approach to >> installation. 
For instance, the first pass could split out Root, then have >> a dependency-light and 'extras' distribution, 2nd round split further based >> on function, and so on: >>> 1st round (v 1.9) : BioPerl (just an installer) -> installs root, >> min-deps, extra-deps >>> 2nd round (v 1.901) : BioPerl (just an installer) -> root, seq/feature, >> other-min-deps, extra-deps >>> ... >>> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, seq, >> tree, align, coord, map, everything-else >>> ... >>> >>> Also, one could potentially install modules in various ways: >> interactively, in predetermined groups, using a user-defined list, etc (one >> could effectively create custom BioPerl installs for GBrowse or other tools >> for instance). Of course I would only pick the easiest route to start, but >> maybe that gives some ideas. Regardless, if the dependency tree is set up >> correctly any reliance on other Bio* modules would be defined in the various >> Build.PL/Makefile.PL and then installed via CPAN (as is any dependency). >>> 2) The Bio::Root modules are probably the true core modules and are the >> most stable with regards to changes, so those could be moved to something >> like BioPerl-Core. Beyond that, what are the proposed splits? (we've >> discussed this on-list before, but it's appropriate to bring this up again) >>> 3) How do we want to handle versioning? We can't (and probably >> shouldn't) release everything on a synchronized versioning scheme (via >> Bio::Root::Version, for instance), that'll quickly fall apart. Personally I >> can foresee each split-off dist having it's own version, with the BioPerl >> network of modules being in effect it's own mini-CPAN. >>> 5) Related to versioning, in my opinion we should maybe aim on eventually >> calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme. >> Lincoln has already done something like this with Bio::Graphics, which was >> originally part of BioPerl but split off prior to v 1.6.0. 
>>> 6) In some cases I can see particularly thorny problems, such as circular >> dependencies. I can think of a few ways to address that (creating a simple >> lightweight Bio::Species class as a fallback if Bio::Tree code isn't >> present, for instance), but any additional thoughts on this would be >> helpful. >>> 7) Do we want to set up something like 'git submodule' for the devs to >> pull down all BioPerl-relevant code? >>> Other thoughts? >>> >>> chris >>> >>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote: >>> >>>> Hey everyone, >>>> >>>> I wanted to take a minute to introduce myself as one of the Google >> Summer of >>>> Code interns. I was the lucky one chosen to work on the BioPerl >>>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics, >> and >>>> somewhat new to this level of programming so bear with me as I learn >> the >>>> technical jargon. Luckily I have both Rob and Chris to mentor me this >>>> summer! >>>> >>>> Reading through the mailing list archives, I see there have been many >>>> discussion and differing opinions about tackling this project. Given >> the >>>> time frame for GSoC and my limited experience, there is no way I will >>>> complete this project on my own but I will at least be able to start >> it, >>>> which will hopefully motivate others to pitch in. So far, the plan for >> the >>>> GSoC project is to start by breaking out Bio::Root, followed by a >> couple >>>> other modules based on their dependencies and the time allowed. Each >> will be >>>> published to CPAN independently. You can follow the project (once it >> starts) >>>> on github at https://github.com/sheenams. >>>> >>>> I look forward to collaborating with many of you on the reorganization >> (hint >>>> hint)! 
>>>> >>>> Sheena >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> ------------------------------ >> >> Message: 3 >> Date: Thu, 28 Apr 2011 16:19:51 -0700 >> From: Robert Buels >> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project >> To: Chris Fields >> Cc: Sheena Scroggins, BioPerl List >> >> Message-ID:<4DB9F617.6070705 at cornell.edu> >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed >> >> I think you guys are on the right track, here are some slightly more >> detailed plans. I'll use Chris's subject numbering. >> >> 1,2,3,5.) I envision the splitting algorithm going like this: >> >> no strict; # this is pseudocode! >> >> my $split_count = 0; >> for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) { >> >> - take $subsystem modules and tests out of bioperl-live >> >> (my $new_dist_name = $subsystem) =~ s/::/-/g; >> - extract $subsystem modules into new dist called >> $new_dist_name. Make sure all its tests pass, and write >> some more tests if necessary. >> >> - add dep on $subsystem to bioperl-live/Build.PL >> >> - push $new_dist_name and bioperl-live to CPAN. >> $new_dist_name has version '2.000', and bioperl-live has >> version "1.7.$split_count". >> } >> >> and then, at the end of this loop, bioperl-live will be >> nothing but a Build.PL and a couple of other things >> for backcompat, like Bio::Root::Version, Bio::Perl, etc. >> >> Important things to notice about this algorithm are that, at each >> step in the loop: >> >> a.) For users that install bioperl with CPAN, >> doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will >> get you the same set of modules as before the split >> started, with the split-off modules at 2.000 versions, and >> the non-split-off ones at 1.7.x versions. >> >> b.) 
For users (not developers) that are git cloning >> bioperl-live, even though they are naughty (wink), they >> can do 'perl Build.PL; ./Build installdeps' to get the >> split-off modules, downloaded like any other CPAN >> dependency. There may be some lag before the split-off >> thing is downloadable from CPAN, >> >> c.) For BioPerl developers, unless they are working on a >> certain module, they should install the split-off modules >> from CPAN like everybody else, and git clone only the piece >> they are working on. >> >> d.) The version of bioperl-live keeps increasing by 0.001 with >> each split. The systems that are split off have a 2.x >> version number, each slightly different depending on when it >> was split off. After this point, their release schedules >> and version numbers are independent of eachother and of >> bioperl-live. For Bio::Perl and Bio::Root::Version, the >> things that stay in bioperl-live, installing the latest >> version will get you all the split-off modules. >> >> >> 6.) (thorny circular dependencies and stuff) Those will become quickly >> apparent as this process proceeds. They'll take some finesse and/or >> ruthlessness and/or hacking to get around. We'll burn those bridges as >> we come to them. >> >> 7.) (git submodules) Git submodules probably won't be necessary, since >> at each step in the process BioPerl devs can use ./Build installdeps or >> cpanm --installdeps . to install whatever the dependencies are for the >> piece they are working on, whether it's bioperl-live (in the case of a >> module that has not yet been split off), or one of the distributions >> that has already been split off (in which case their improvements will >> probably be releasable to CPAN immediately!). >> >> Lots of detail there. I tried to make it structured and easy to skim >> though. Thoughts? >> >> Rob >> >> >> >> On 04/28/2011 02:04 PM, Chris Fields wrote: >>> Sounds fine; I think (as you indicate) we can deal with issues along the >> way. 
Rob, anything to add? >>> chris >>> >>> On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote: >>> >>>> Chris, >>>> >>>> We haven't talked much about the versioning yet, but it will be on the >> list to figure out asap. >>>> So far, the plan is to split out Bio::Root first, followed by a couple >> modules that depend only on Bio::Root. The plan I proposed was Bio::Das, >> Bio::Event then Bio::Location. Depending on how much time is remaining for >> the GSoC project, the next to split out would be Bio::Factory and >> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan >> to still help with the reorganization after the internship is over, but I >> obviously have to have a stopping point for the GSoC project. >>>> Rob provide me with a really nice scrip to list dependencies of the >> modules, so I plan to make a roadmap towards to end of the summer that will >> help guide the rest of the reorganization. At that point, we'll have to deal >> with the circular dependencies carefully. >>>> This is a huge project, much bigger than I can do in one summer. But I >> plan to get it started in a way that makes it easy for others to contribute. >>>> Sheena >>>> >>>> >>>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields >> wrote: >>>> Sheena, >>>> >>>> Congrats on being accepted! We've talked about doing this over the >> years, but it's not an easy task and it needs a dedicated project to get the >> ball rolling, so to speak. Hopefully this isn't tl;dr. I'll start off with >> a few of my questions/thoughts (Rob could probably chime in as well, but I >> think his general thoughts on the project parallel mine): >>>> 1) The current BioPerl CPAN could just be a simple install script, >> acting like a 'Task' or 'Bundle' module, installing the actual Bio-specific >> distributions. Doing it this way would allow you to iteratively split off >> additional code but retain the original Task/Bundle-based approach to >> installation. 
For instance, the first pass could split out Root, then have >> a dependency-light and 'extras' distribution, 2nd round split further based >> on function, and so on: >>>> 1st round (v 1.9) : BioPerl (just an installer) -> installs root, >> min-deps, extra-deps >>>> 2nd round (v 1.901) : BioPerl (just an installer) -> root, >> seq/feature, other-min-deps, extra-deps >>>> ... >>>> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, >> seq, tree, align, coord, map, everything-else >>>> ... >>>> >>>> Also, one could potentially install modules in various ways: >> interactively, in predetermined groups, using a user-defined list, etc (one >> could effectively create custom BioPerl installs for GBrowse or other tools >> for instance). Of course I would only pick the easiest route to start, but >> maybe that gives some ideas. Regardless, if the dependency tree is set up >> correctly any reliance on other Bio* modules would be defined in the various >> Build.PL/Makefile.PL and then installed via CPAN (as is any dependency). >>>> 2) The Bio::Root modules are probably the true core modules and are the >> most stable with regards to changes, so those could be moved to something >> like BioPerl-Core. Beyond that, what are the proposed splits? (we've >> discussed this on-list before, but it's appropriate to bring this up again) >>>> 3) How do we want to handle versioning? We can't (and probably >> shouldn't) release everything on a synchronized versioning scheme (via >> Bio::Root::Version, for instance), that'll quickly fall apart. Personally I >> can foresee each split-off dist having it's own version, with the BioPerl >> network of modules being in effect it's own mini-CPAN. >>>> 5) Related to versioning, in my opinion we should maybe aim on >> eventually calling this BioPerl v2.0 and starting with a simpler X.Y >> versioning scheme. 
Lincoln has already done something like this with >> Bio::Graphics, which was originally part of BioPerl but split off prior to v >> 1.6.0. >>>> 6) In some cases I can see particularly thorny problems, such as >> circular dependencies. I can think of a few ways to address that (creating >> a simple lightweight Bio::Species class as a fallback if Bio::Tree code >> isn't present, for instance), but any additional thoughts on this would be >> helpful. >>>> 7) Do we want to set up something like 'git submodule' for the devs to >> pull down all BioPerl-relevant code? >>>> Other thoughts? >>>> >>>> chris >>>> >>>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote: >>>> >>>>> Hey everyone, >>>>> >>>>> I wanted to take a minute to introduce myself as one of the Google >> Summer of >>>>> Code interns. I was the lucky one chosen to work on the BioPerl >>>>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics, >> and >>>>> somewhat new to this level of programming so bear with me as I learn >> the >>>>> technical jargon. Luckily I have both Rob and Chris to mentor me this >>>>> summer! >>>>> >>>>> Reading through the mailing list archives, I see there have been many >>>>> discussion and differing opinions about tackling this project. Given >> the >>>>> time frame for GSoC and my limited experience, there is no way I will >>>>> complete this project on my own but I will at least be able to start >> it, >>>>> which will hopefully motivate others to pitch in. So far, the plan for >> the >>>>> GSoC project is to start by breaking out Bio::Root, followed by a >> couple >>>>> other modules based on their dependencies and the time allowed. Each >> will be >>>>> published to CPAN independently. You can follow the project (once it >> starts) >>>>> on github at https://github.com/sheenams. >>>>> >>>>> I look forward to collaborating with many of you on the reorganization >> (hint >>>>> hint)! 
>>>>> >>>>> Sheena >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> >> >> ------------------------------ >> >> Message: 4 >> Date: Thu, 28 Apr 2011 21:15:01 -0500 >> From: Siddhartha Basu >> Subject: [Bioperl-l] Re: GSoC/BioPerl Reorganization Project >> To: bioperl-l at lists.open-bio.org >> Message-ID:<20110429021457.GA351 at Macintosh-235.local> >> Content-Type: text/plain; charset=us-ascii >> >> Hi Robert, >> At what point in flow the dependencies between the split modules will be >> added. Is there any particular order the split modules would be created. >> And how those split off modules will be released in CPAN, one by one as >> they being generated or all of them in a batch after which they will >> follow their release schedule. >> >> -siddhartha >> >> >> >> On Thu, 28 Apr 2011, Robert Buels wrote: >> >>> I think you guys are on the right track, here are some slightly more >>> detailed plans. I'll use Chris's subject numbering. >>> >>> 1,2,3,5.) I envision the splitting algorithm going like this: >>> >>> no strict; # this is pseudocode! >>> >>> my $split_count = 0; >>> for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) { >>> >>> - take $subsystem modules and tests out of bioperl-live >>> >>> (my $new_dist_name = $subsystem) =~ s/::/-/g; >>> - extract $subsystem modules into new dist called >>> $new_dist_name. Make sure all its tests pass, and write >>> some more tests if necessary. >>> >>> - add dep on $subsystem to bioperl-live/Build.PL >>> >>> - push $new_dist_name and bioperl-live to CPAN. >>> $new_dist_name has version '2.000', and bioperl-live has >>> version "1.7.$split_count". >>> } >>> >>> and then, at the end of this loop, bioperl-live will be >>> nothing but a Build.PL and a couple of other things >>> for backcompat, like Bio::Root::Version, Bio::Perl, etc. 
>>> >>> Important things to notice about this algorithm are that, at each >>> step in the loop: >>> >>> a.) For users that install bioperl with CPAN, >>> doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will >>> get you the same set of modules as before the split >>> started, with the split-off modules at 2.000 versions, and >>> the non-split-off ones at 1.7.x versions. >>> >>> b.) For users (not developers) that are git cloning >>> bioperl-live, even though they are naughty (wink), they >>> can do 'perl Build.PL; ./Build installdeps' to get the >>> split-off modules, downloaded like any other CPAN >>> dependency. There may be some lag before the split-off >>> thing is downloadable from CPAN, >>> >>> c.) For BioPerl developers, unless they are working on a >>> certain module, they should install the split-off modules >>> from CPAN like everybody else, and git clone only the piece >>> they are working on. >>> >>> d.) The version of bioperl-live keeps increasing by 0.001 with >>> each split. The systems that are split off have a 2.x >>> version number, each slightly different depending on when it >>> was split off. After this point, their release schedules >>> and version numbers are independent of eachother and of >>> bioperl-live. For Bio::Perl and Bio::Root::Version, the >>> things that stay in bioperl-live, installing the latest >>> version will get you all the split-off modules. >>> >>> >>> 6.) (thorny circular dependencies and stuff) Those will become quickly >>> apparent as this process proceeds. They'll take some finesse and/or >>> ruthlessness and/or hacking to get around. We'll burn those bridges as >> we >>> come to them. >>> >>> 7.) (git submodules) Git submodules probably won't be necessary, since at >>> each step in the process BioPerl devs can use ./Build installdeps or >> cpanm >>> --installdeps . 
to install whatever the dependencies are for the piece >>> they are working on, whether it's bioperl-live (in the case of a module >>> that has not yet been split off), or one of the distributions that has >>> already been split off (in which case their improvements will probably be >>> releasable to CPAN immediately!). >>> >>> Lots of detail there. I tried to make it structured and easy to skim >>> though. Thoughts? >>> >>> Rob >>> >>> >>> >>> On 04/28/2011 02:04 PM, Chris Fields wrote: >>>> Sounds fine; I think (as you indicate) we can deal with issues along >> the >>>> way. Rob, anything to add? >>>> >>>> chris >>>> >>>> On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote: >>>> >>>>> Chris, >>>>> >>>>> We haven't talked much about the versioning yet, but it will be on the >>>>> list to figure out asap. >>>>> >>>>> So far, the plan is to split out Bio::Root first, followed by a couple >>>>> modules that depend only on Bio::Root. The plan I proposed was >> Bio::Das, >>>>> Bio::Event then Bio::Location. Depending on how much time is remaining >>>>> for the GSoC project, the next to split out would be Bio::Factory and >>>>> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I >>>>> plan to still help with the reorganization after the internship is >> over, >>>>> but I obviously have to have a stopping point for the GSoC project. >>>>> >>>>> Rob provide me with a really nice scrip to list dependencies of the >>>>> modules, so I plan to make a roadmap towards to end of the summer that >>>>> will help guide the rest of the reorganization. At that point, we'll >> have >>>>> to deal with the circular dependencies carefully. >>>>> >>>>> This is a huge project, much bigger than I can do in one summer. But I >>>>> plan to get it started in a way that makes it easy for others to >>>>> contribute. >>>>> >>>>> Sheena >>>>> >>>>> >>>>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields >>>>> wrote: >>>>> Sheena, >>>>> >>>>> Congrats on being accepted! 
We've talked about doing this over the >> years, >>>>> but it's not an easy task and it needs a dedicated project to get the >>>>> ball rolling, so to speak. Hopefully this isn't tl;dr. I'll start >> off >>>>> with a few of my questions/thoughts (Rob could probably chime in as >> well, >>>>> but I think his general thoughts on the project parallel mine): >>>>> >>>>> 1) The current BioPerl CPAN could just be a simple install script, >> acting >>>>> like a 'Task' or 'Bundle' module, installing the actual Bio-specific >>>>> distributions. Doing it this way would allow you to iteratively split >>>>> off additional code but retain the original Task/Bundle-based approach >> to >>>>> installation. For instance, the first pass could split out Root, then >>>>> have a dependency-light and 'extras' distribution, 2nd round split >>>>> further based on function, and so on: >>>>> >>>>> 1st round (v 1.9) : BioPerl (just an installer) -> installs >> root, >>>>> min-deps, extra-deps >>>>> 2nd round (v 1.901) : BioPerl (just an installer) -> root, >>>>> seq/feature, other-min-deps, extra-deps >>>>> ... >>>>> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools, >>>>> seq, tree, align, coord, map, everything-else >>>>> ... >>>>> >>>>> Also, one could potentially install modules in various ways: >>>>> interactively, in predetermined groups, using a user-defined list, etc >>>>> (one could effectively create custom BioPerl installs for GBrowse or >>>>> other tools for instance). Of course I would only pick the easiest >> route >>>>> to start, but maybe that gives some ideas. Regardless, if the >> dependency >>>>> tree is set up correctly any reliance on other Bio* modules would be >>>>> defined in the various Build.PL/Makefile.PL and then installed via >> CPAN >>>>> (as is any dependency). 
>>>>> >>>>> 2) The Bio::Root modules are probably the true core modules and are >> the >>>>> most stable with regards to changes, so those could be moved to >> something >>>>> like BioPerl-Core. Beyond that, what are the proposed splits? (we've >>>>> discussed this on-list before, but it's appropriate to bring this up >>>>> again) >>>>> >>>>> 3) How do we want to handle versioning? We can't (and probably >>>>> shouldn't) release everything on a synchronized versioning scheme (via >>>>> Bio::Root::Version, for instance), that'll quickly fall apart. >>>>> Personally I can foresee each split-off dist having it's own version, >>>>> with the BioPerl network of modules being in effect it's own >> mini-CPAN. >>>>> 5) Related to versioning, in my opinion we should maybe aim on >> eventually >>>>> calling this BioPerl v2.0 and starting with a simpler X.Y versioning >>>>> scheme. Lincoln has already done something like this with >> Bio::Graphics, >>>>> which was originally part of BioPerl but split off prior to v 1.6.0. >>>>> >>>>> 6) In some cases I can see particularly thorny problems, such as >> circular >>>>> dependencies. I can think of a few ways to address that (creating a >>>>> simple lightweight Bio::Species class as a fallback if Bio::Tree code >>>>> isn't present, for instance), but any additional thoughts on this >> would >>>>> be helpful. >>>>> >>>>> 7) Do we want to set up something like 'git submodule' for the devs to >>>>> pull down all BioPerl-relevant code? >>>>> >>>>> Other thoughts? >>>>> >>>>> chris >>>>> >>>>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote: >>>>> >>>>>> Hey everyone, >>>>>> >>>>>> I wanted to take a minute to introduce myself as one of the Google >>>>>> Summer of >>>>>> Code interns. I was the lucky one chosen to work on the BioPerl >>>>>> Reorganization (*crowd cheers*). I am a grad student in >> bioinformatics, >>>>>> and >>>>>> somewhat new to this level of programming so bear with me as I learn >> the >>>>>> technical jargon. 
>>>>>> Luckily I have both Rob and Chris to mentor me this summer!
>>>>>>
>>>>>> Reading through the mailing list archives, I see there have been many
>>>>>> discussions and differing opinions about tackling this project. Given the
>>>>>> time frame for GSoC and my limited experience, there is no way I will
>>>>>> complete this project on my own, but I will at least be able to start it,
>>>>>> which will hopefully motivate others to pitch in. So far, the plan for
>>>>>> the GSoC project is to start by breaking out Bio::Root, followed by a
>>>>>> couple of other modules based on their dependencies and the time allowed.
>>>>>> Each will be published to CPAN independently. You can follow the project
>>>>>> (once it starts) on GitHub at https://github.com/sheenams.
>>>>>>
>>>>>> I look forward to collaborating with many of you on the reorganization
>>>>>> (hint hint)!
>>>>>>
>>>>>> Sheena
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Fri, 29 Apr 2011 10:23:50 +0530
>> From: "khush ........"
>> Subject: Re: [Bioperl-l] Standalone blast
>> To: Dave Messina
>> Cc: bioperl-l at lists.open-bio.org
>> Message-ID:
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Dear Dave,
>>
>> Thank you for your support.
>>
>> Do I need to change the following lines, like this?
>>
>> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program  => 'blastx',
>>                                                    -database => 'nr.fa');
>>
>> $seq_obj = Bio::Seq->new(-id => "test query", -seq => "file.fa");
>>
>> I have a simple and basic question, as I am a beginner in BioPerl: do I
>> need to download the whole nr database from NCBI to run the code, or will
>> it fetch information directly from the NCBI website? I do not understand
>> this, because downloading the whole nr database itself takes a long time
>> for me.
>>
>> How can I read a whole file (a FASTA file) instead of a simple string such
>> as "TTTATAGATAGAGACAG" in -seq? Is there a simple way to do this under my
>> conditions?
>>
>> Thank you
>> Kamal
>>
>>
>> On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina wrote:
>>> Hi Kamal,
>>>
>>> This is covered in the beginners' HOWTO:
>>> http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
>>>
>>>
>>> Dave
>>>
>>>
>>> On Thu, Apr 28, 2011 at 07:22, khush ........ wrote:
>>>
>>>> Hi,
>>>>
>>>> I have about 250 sequences and want to use BLASTX to search them against
>>>> the NCBI nr database, as doing this through the web-based search is
>>>> time-consuming. Can someone please tell me how to get started with BioPerl
>>>> on such a problem? I know that this is possible with BioPerl, but I do not
>>>> know how.
>>>>
>>>> Any suggestions would be appreciated.
>>>>
>>>> Thanks in advance
>>>> Kamal
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>
>> ------------------------------
>>
>> Message: 6
>> Date: Thu, 28 Apr 2011 22:15:01 -0700
>> From: Robert Buels
>> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
>> To: BioPerl List
>> Message-ID: <4DBA4955.2030003 at cornell.edu>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> On 04/28/2011 07:15 PM, Siddhartha Basu wrote:
>>> At what point in the flow will the dependencies between the split modules
>>> be added? Is there any particular order in which the split modules will be
>>> created?
>>
>> Dependencies are added and characterized at the time each distribution
>> is created. That's why the splitting order starts at Bio::Root, so that
>> you can proceed up the hierarchy of dependencies without having to
>> modify the dependency lists of the distributions that have already been
>> extracted.
>>
>>> And how will those split-off modules be released on CPAN: one by one as
>>> they are generated, or all of them in a batch, after which they will
>>> follow their own release schedules?
>>
>> One by one, as they are generated. I think it would be a good idea to
>> re-release bioperl-live with each split as well. This will probably
>> lead to bioperl-live being released nearly every week as the split is
>> ongoing. As a consequence, the master branch of bioperl-live will need
>> to be kept in very good shape. This is easy if you just follow good
>> practice: develop in branches, run *all* the tests before committing, go
>> on IRC and send pull requests for code review, etc.
>>
>> Rob
>>
>>
>> ------------------------------
>>
>> Message: 7
>> Date: Fri, 29 Apr 2011 15:24:45 +1000
>> From: Florent Angly
>> Subject: Re: [Bioperl-l] Standalone blast
>> To: bioinfo.khush at gmail.com
>> Cc: bioperl-l at lists.open-bio.org
>> Message-ID: <4DBA4B9D.1010400 at gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Hi Kamal,
>>
>> To run BLAST the way Dave described, you need to have BLAST installed on
>> your computer, and you need to download BLAST databases to your computer
>> (or make them yourself with the formatdb command). There are plenty of
>> databases available on the NCBI FTP website: ftp://ftp.ncbi.nih.gov/.
>> And yes, some of these databases are very large and will take a long
>> time to download. By the way, BLAST may also take a very long time
>> to execute if you use large databases, so you'd better run the analysis
>> on a powerful computer or a server.
>>
>> Also read this documentation:
>> http://search.cpan.org/~cjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm
>>
>> It states that you can BLAST an entire FASTA file (not just a
>> sequence object):
>>
>> $inputfilename = 't/testquery.fa';
>> $blast_report = $factory->blastall($inputfilename);
>>
>>
>> Regards,
>>
>> Florent
>>
>>
>>
>>
>> On 29/04/11 14:53, khush ........ wrote:
>>> Dear Dave,
>>>
>>> Thank you for your support.
>>>
>>> Do I need to change the following lines, like this?
>>>
>>> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program  => 'blastx',
>>>                                                    -database => 'nr.fa');
>>>
>>> $seq_obj = Bio::Seq->new(-id => "test query", -seq => "file.fa");
>>>
>>> I have a simple and basic question, as I am a beginner in BioPerl: do I
>>> need to download the whole nr database from NCBI to run the code, or will
>>> it fetch information directly from the NCBI website?
>>> I do not understand this, because downloading the whole nr database itself
>>> takes a long time for me.
>>>
>>> How can I read a whole file (a FASTA file) instead of a simple string such
>>> as "TTTATAGATAGAGACAG" in -seq? Is there a simple way to do this under my
>>> conditions?
>>>
>>> Thank you
>>> Kamal
>>>
>>>
>>> On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina wrote:
>>>
>>>> Hi Kamal,
>>>>
>>>> This is covered in the beginners' HOWTO:
>>>> http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
>>>>
>>>>
>>>> Dave
>>>>
>>>>
>>>> On Thu, Apr 28, 2011 at 07:22, khush ........ wrote:
>>>>> Hi,
>>>>>
>>>>> I have about 250 sequences and want to use BLASTX to search them against
>>>>> the NCBI nr database, as doing this through the web-based search is
>>>>> time-consuming. Can someone please tell me how to get started with BioPerl
>>>>> on such a problem? I know that this is possible with BioPerl, but I do not
>>>>> know how.
>>>>>
>>>>> Any suggestions would be appreciated.
>>>>>
>>>>> Thanks in advance
>>>>> Kamal
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> ------------------------------
>>
>> Message: 8
>> Date: Fri, 29 Apr 2011 11:16:38 +0530
>> From: "khush ........"
>> Subject: Re: [Bioperl-l] Standalone blast
>> To: Florent Angly
>> Cc: bioperl-l at lists.open-bio.org
>> Message-ID:
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Dear Florent,
>>
>> Thank you very much for your kind reply and for clarifying how running
>> BLAST works. I am working on a simple machine, so I need to get permission
>> from my administrator to work on a good server that can hold the whole
>> nr database from NCBI and run blastx.
>>
>> Thank you
>>
>> Kamal
>> BioPerl is great.
>>
>>
>> On Fri, Apr 29, 2011 at 10:54 AM, Florent Angly wrote:
>>> Hi Kamal,
>>>
>>> To run BLAST the way Dave described, you need to have BLAST installed on
>>> your computer, and you need to download BLAST databases to your computer
>>> (or make them yourself with the formatdb command). There are plenty of
>>> databases available on the NCBI FTP website: ftp://ftp.ncbi.nih.gov/.
>>> And yes, some of these databases are very large and will take a long
>>> time to download. By the way, BLAST may also take a very long time
>>> to execute if you use large databases, so you'd better run the analysis
>>> on a powerful computer or a server.
>>>
>>> Also read this documentation:
>>> http://search.cpan.org/~cjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm
>>>
>>> It states that you can BLAST an entire FASTA file (not just a
>>> sequence object):
>>>
>>> $inputfilename = 't/testquery.fa';
>>> $blast_report = $factory->blastall($inputfilename);
>>>
>>>
>>> Regards,
>>>
>>> Florent
>>>
>>>
>>>
>>>
>>>
>>> On 29/04/11 14:53, khush ........ wrote:
>>>
>>>> Dear Dave,
>>>>
>>>> Thank you for your support.
>>>>
>>>> Do I need to change the following lines, like this?
>>>>
>>>> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program  => 'blastx',
>>>>                                                    -database => 'nr.fa');
>>>>
>>>> $seq_obj = Bio::Seq->new(-id => "test query", -seq => "file.fa");
>>>>
>>>> I have a simple and basic question, as I am a beginner in BioPerl: do I
>>>> need to download the whole nr database from NCBI to run the code, or will
>>>> it fetch information directly from the NCBI website? I do not understand
>>>> this, because downloading the whole nr database itself takes a long time
>>>> for me.
>>>>
>>>> How can I read a whole file (a FASTA file) instead of a simple string such
>>>> as "TTTATAGATAGAGACAG" in -seq?
>>>> Is there a simple way to do this under my conditions?
>>>>
>>>> Thank you
>>>> Kamal
>>>>
>>>>
>>>> On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina wrote:
>>>>
>>>>> Hi Kamal,
>>>>>
>>>>> This is covered in the beginners' HOWTO:
>>>>> http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
>>>>>
>>>>>
>>>>> Dave
>>>>>
>>>>>
>>>>> On Thu, Apr 28, 2011 at 07:22, khush ........ wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have about 250 sequences and want to use BLASTX to search them against
>>>>>> the NCBI nr database, as doing this through the web-based search is
>>>>>> time-consuming. Can someone please tell me how to get started with BioPerl
>>>>>> on such a problem? I know that this is possible with BioPerl, but I do not
>>>>>> know how.
>>>>>>
>>>>>> Any suggestions would be appreciated.
>>>>>>
>>>>>> Thanks in advance
>>>>>> Kamal
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> End of Bioperl-l Digest, Vol 96, Issue 28
>> *****************************************
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l