From du at ibio.jp Tue Sep 4 07:05:59 2012
From: du at ibio.jp (Du, Peng)
Date: Tue, 4 Sep 2012 20:05:59 +0900
Subject: [Bioperl-l] PAML problem
In-Reply-To: <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com>
References: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com>
 <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com>
Message-ID:

Hi Daisie,

Thank you for your reply.
Where could I get the bioperl-live you mentioned? Is it something
equivalent to the bioperl core package?
If it is, do I need to reinstall the whole bioperl package?

Thank you.
Peng

On Fri, Aug 17, 2012 at 3:38 AM, Daisie Huang wrote:
> I'm not sure which PAML component caused this particular outcome, but the
> bugs and fixes I pushed to bioperl-live might fix this. When will those get
> pulled into the master?
>
> If those particular fixes don't help, I'd be happy to take a peek at the
> originator's code and see if it's a quick re-parsing fix.
>
> Daisie
>
>
> On Tuesday, June 26, 2012 6:37:55 PM UTC-7, Jason Stajich wrote:
>>
>> Peng -
>>
>> This module needs a person whose sole job is to keep tracking bugs and
>> updating it with new versions of the program. So far it has burned out
>> several developers working on it, since it is not stable.
>>
>> I am not sure what the answer is to the problem, but often it depends on
>> the extra parameters used as this changes the order of the output making it
>> hard to parse.
>>
>> So I don't have a solution for you except that you'll have to post the bug
>> and the problem output mlc file to redmine and hope that we can entice some
>> developers to bang their head against this some more.
>>
>> Jason
>> On Jun 26, 2012, at 6:28 PM, Du, Peng wrote:
>>
>> > Hi everyone,
>> >
>> > I am using bioperl to parse paml output, and I saw this
>> >
>> > ------------- EXCEPTION: Bio::Root::NotImplemented -------------
>> > MSG: Unknown format of PAML output did not see seqtype
>> > STACK: Error::throw
>> > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
>> > STACK: Bio::Tools::Phylo::PAML::_parse_summary /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:461
>> > STACK: Bio::Tools::Phylo::PAML::next_result /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:270
>> > STACK: main::cal_dn_ds dn_ds.pl:131
>> > STACK: dn_ds.pl:44
>> > ----------------------------------------------------------------
>> >
>> > I googled and found that it was caused by PAML version
>> > incompatibility. I tried 3.13, 3.14, 4.1, 4.2, 4.5 and none of them
>> > worked. Could someone tell me which version is fine?
>> >
>> > My bioperl version is 1.006001. Thank you very much.
>> >
>> > --
>> >
>> > Peng Du
>> > Graduate School of Information Science and Technology, Hokkaido
>> > University
>> > Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814
>> > Email: d... at ibio.jp Tel: +81 80 3268 9713
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Biop... at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Jason Stajich
>> jason.... at gmail.com
>> ja... at bioperl.org
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Biop... at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--

Peng Du
Graduate School of Information Science and Technology, Hokkaido University
Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814
Email: du at ibio.jp Tel: +81 80 3268 9713

From du at ibio.jp Tue Sep 4 07:34:37 2012
From: du at ibio.jp (Du, Peng)
Date: Tue, 4 Sep 2012 20:34:37 +0900
Subject: [Bioperl-l] PAML problem
In-Reply-To:
References: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com>
 <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com>
Message-ID:

Hi Daisie,

Problem got fixed~~
Thank you ^_^.

On Tue, Sep 4, 2012 at 8:05 PM, Du, Peng wrote:
> Hi Daisie,
>
> Thank you for your reply.
> Where could I get the bioperl-live you mentioned? Is it something
> equivalent to bioperl core package?
> If it is, do I need to reinstall the whole bioperl package?
>
> Thank you.
> Peng

--

Peng Du
Graduate School of Information Science and Technology, Hokkaido University
Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814
Email: du at ibio.jp Tel: +81 80 3268 9713

From bottomsc at missouri.edu Thu Sep 6 18:34:17 2012
From: bottomsc at missouri.edu (Christopher Bottoms)
Date: Thu, 6 Sep 2012 17:34:17 -0500
Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis
In-Reply-To:
References: <118F034CF4C3EF48A96F86CE585B94BF33B85703@CHIMBX5.ad.uillinois.edu>
Message-ID:

Dear BioPerl Community,

I welcome further comments on Bio::App::SELEX::RNAmotifAnalysis. Below this message is my updated "perldoc" for it. A special thanks to Leon Timmermans and Chris Fields for their prior feedback.

Leon,
I made several improvements based on your feedback. Now, a wrapper script is used instead of the module file itself. FASTQ files are an acceptable input format. And the installer was improved. I found that installing Module::Build before our module cleared up issues with several dependencies. I think that our instructions for using cpanminus effectively give the same results as using local::lib. As for Alien packages, I would really like to work on them, but the time (i.e. funding) is currently too limited.

Chris,
Yes, the "App" part of the proposed name was chosen because this is designed more to be an application than to be modules to be reused. I fully intend to support this distribution myself. If you think it better to separate it more from the BioPerl namespace, I have considered calling it App::Bio::SELEX::RNAmotifAnalysis. What do you think?

I welcome additional feedback.
Thanks,
Christopher Bottoms

Perldoc for Bio::App::SELEX::RNAmotifAnalysis:

SYNOPSIS
    RNAmotifAnalysis --fastq seqs.fq --cpus 4 --run

DESCRIPTION
    This module pipelines steps in the analysis of SELEX (Systematic Evolution of Ligands through EXponential enrichment) data. This main module creates scripts to do the following:

    (1) Cluster similar sequences based on edit distance.
    (2) Align sequences within each cluster (using mafft).
    (3) Calculate the secondary structure of the aligned sequences (using RNAalifold, from the Vienna RNA package).
    (4) Build covariance models using cmbuild from Infernal.

    Another useful utility installed with this distribution is "selex_covarianceSearch" for doing iterative refinements of covariance models.

    If you want to use files that simply list sequences, then use the "--simple" flag instead of the "--fastq" flag.

    This script assumes that you've already done all of the quality control of your sequences beforehand. If the FASTQ format is used, quality scores are ignored.

EXAMPLE USE
    RNAmotifAnalysis --infile seqs.fq --cpus 4 --run

    This will cluster the sequences found in 'seqs.fq' and create a FASTA file for each cluster. The FASTA files will be grouped into batches (i.e. one per cpu requested) that will be placed in a separate directory for each batch, and processed within that directory. At the end of processing, for each cluster there will be a covariance model and PostScript illustration files. The batch script used to process each batch will be located in the respective batch directory.

    To produce the scripts without running them, simply exclude the --run flag from the command line.

CONFIGURATION AND ENVIRONMENT
    As written, this code makes heavy use of UNIX utilities and is therefore only supported on UNIX-like environments (e.g. Linux, UNIX, Mac OS X).
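    A quick pre-flight check along these lines may help before the first run. This is only a minimal sketch, assuming a POSIX shell; check_tools is a hypothetical helper name (it is not part of this distribution), and the tool names are the executables listed under INSTALLATION step (1).

```shell
#!/bin/sh
# Pre-flight sketch: report which external tools are not reachable on PATH.
# check_tools is a hypothetical helper, not something this distribution ships.
check_tools() {
    missing=0
    for tool in "$@"; do
        # "command -v" is the POSIX way to ask whether a command name resolves
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every requested tool was found
    [ "$missing" -eq 0 ]
}

# Executables named in INSTALLATION step (1)
check_tools cmbuild cmcalibrate cmsearch cmalign mafft RNAalifold \
    || echo "Add the directories containing these tools to PATH, then re-run."
```

    If anything is reported missing, adding its directory to PATH before the first run should let the generated cluster.cfg pick up the right locations.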
    Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add the directories containing their executables to your PATH, so that the first time you run RNAmotifAnalysis.pm the configuration file (cluster.cfg) that is generated will have all of the correct parameters. Otherwise, you'll need to update the configuration file manually.

    To update the PATH environment variable with the directory '/usr/local/myapps/bin/', update your .bashrc file, thus:

        echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc

    Now, every time you open a new terminal window, the PATH environment variable will contain '/usr/local/myapps/bin/'. To make your new .bashrc file effective immediately (i.e. without having to open a new terminal window), use the following command:

        source ~/.bashrc

INSTALLATION
    These installation instructions assume being able to open and use a terminal window on Linux.

    (0) Some systems need several dependencies installed ahead of time. You may be able to skip this step. However, if subsequent steps don't work, then be sure that some basic libraries are installed, as shown below (or ask a system administrator to take care of it).

        For the applicable distribution, open a terminal and then type the commands as indicated:

        For RedHat or CentOS 5.x systems (tested on CentOS 5.5)
            sudo yum install gcc

        For RedHat or CentOS 6.x systems (tested on "Minimal Desktop" CentOS 6.0)
            sudo yum install gcc
            sudo yum install perl-devel

        For Ubuntu systems (tested on Ubuntu 12.04 LTS)
            sudo apt-get install curl

        For Debian 5.x systems:
            sudo apt-get install build-essential

    (1) Install the non-Perl dependencies:
        (Versions shown are those that we've tested. Please contact us if newer versions do not work.)
        Infernal 1.0.2 (http://infernal.janelia.org/)
        MAFFT 6.849b (http://mafft.cbrc.jp/alignment/software/)
        RNA Vienna package 1.8.4 (http://www.tbi.univie.ac.at/~ivo/RNA/)

        After installing these, make sure all of the following executables are in directories within your PATH:

            cmbuild
            cmcalibrate
            cmsearch
            cmalign
            mafft
            RNAalifold

    (2) Either (a) download and run our installer or (b) use a CPAN client to install Bio::App::SELEX::RNAmotifAnalysis.

        Note that our installer creates the directory 'perl5' inside your home directory. This directory is for holding Perl modules, including this module and any Perl module dependencies not already included on your system. The installer also appends commands to your .bashrc file to make it easy for the Perl runtime to find these new modules (i.e. it includes your local 'perl5/lib/perl5' directory in the PERL5LIB environment variable).

        (a) Installation method: Use the installer

            i. Download installer (and name it "installer")
                curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/200

            ii. Make it executable
                chmod u+x installer

            iii. Run it.
                ./installer

        (b) Installation method: Use a CPAN client. Here we demonstrate the use of cpanminus to install it to a local Perl module directory. These instructions assume absolutely no experience with cpanminus.

            i. Download cpanminus
                curl -LOk http://xrl.us/cpanm

            ii. Make it executable
                chmod u+x cpanm

            iii. Make a local perl5 directory (if it doesn't already exist)
                mkdir -p ~/perl5

            iv. Add relevant directories to your PERL5LIB and PATH environment variables by adding the following text to your ~/.bashrc file:

                # Set PERL5LIB if it doesn't already exist, and make sure it is exported
                : ${PERL5LIB:=$HOME/perl5/lib/perl5}
                export PERL5LIB

                # Prepend to PERL5LIB if directory not already found in PERL5LIB
                # ($HOME is used because ~ is not expanded inside double quotes)
                if ! echo "$PERL5LIB" | grep -Eq "(^|:)$HOME/perl5/lib/perl5(\$|:)"; then
                    export PERL5LIB=$HOME/perl5/lib/perl5:$PERL5LIB
                fi

                # Prepend to PATH if directory not already found in PATH
                if ! echo "$PATH" | grep -Eq "(^|:)$HOME/perl5/bin(\$|:)"; then
                    export PATH=$HOME/perl5/bin:$PATH
                fi

            v. Update environment variables immediately
                source ~/.bashrc

            vi. Install Module::Build
                ./cpanm -l ~/perl5 Module::Build

            vii. Install Bio::App::SELEX::RNAmotifAnalysis
                ./cpanm -l ~/perl5 Bio::App::SELEX::RNAmotifAnalysis

    Please contact the author if, after consulting this documentation and searching Google with error messages, you still encounter difficulties during the installation process using one of these two methods.

INCOMPATIBILITIES
    None known

BUGS AND LIMITATIONS
    There are no known bugs in this module. Please report problems to molecules cpan org

    Patches are welcome.

RELATED PUBLICATIONS
    Ditzler et al. Manuscript currently in review.

From shalabh.sharma7 at gmail.com Fri Sep 7 15:41:10 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 7 Sep 2012 15:41:10 -0400
Subject: [Bioperl-l] combining multiple gbk files
Message-ID:

Hi All,
Is there a way to combine multiple *gbk files into one file?

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From roy.chaudhuri at gmail.com Fri Sep 7 19:40:22 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Sat, 08 Sep 2012 00:40:22 +0100
Subject: [Bioperl-l] combining multiple gbk files
In-Reply-To:
References:
Message-ID: <504A85E6.50809@gmail.com>

Just concatenate them, e.g. using the Unix command cat:

cat file1.gbk file2.gbk > multi.gbk

On 07/09/2012 20:41, shalabh sharma wrote:
> Hi All,
> Is there a way to combine multiple *gbk files into one file?
>
> Thanks
> Shalabh
>

From dan.kortschak at adelaide.edu.au Fri Sep 7 22:28:04 2012
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Sat, 08 Sep 2012 11:58:04 +0930
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
Message-ID: <1347071284.3037.18.camel@sueno>

Hello, this is slightly off-topic, but bioperl devs seemed like the most likely people to get an answer to this from.

Is there an ebnf or other formal description of the genbank and embl formats available somewhere? The only thing I can find is the verbosity describing them in English at [1].

thanks
Dan

[1]http://www.insdc.org/documents/feature_table.html

From dan.kortschak at adelaide.edu.au Fri Sep 7 23:43:59 2012
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Sat, 08 Sep 2012 13:13:59 +0930
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu>
References: <1347071284.3037.18.camel@sueno>
 <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu>
Message-ID: <1347075839.3037.21.camel@sueno>

Thanks Chris. That's remarkable, so many words and not an actual formal specification. I guess I have some work ahead of me. I found the example, but examples rarely contain all edges and corners.

Dan

On Sat, 2012-09-08 at 03:39 +0000, Fields, Christopher J wrote:
> Re: Genbank, the only specification I know of is for the feature
> table portion of the format as you have below. They do have a
> (possibly out of date) example file, note it isn't easily found unless
> you search for it:
>
> http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord
>
> EMBL is better in this regard:
>
> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>
> Note that UniProt Knowledgebase also has a user manual outlining the
> similarities and differences with EMBL:
>
> http://web.expasy.org/docs/userman.html
>
> chris

From cjfields at illinois.edu Fri Sep 7 23:39:44 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sat, 8 Sep 2012 03:39:44 +0000
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
In-Reply-To: <1347071284.3037.18.camel@sueno>
References: <1347071284.3037.18.camel@sueno>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu>

Re: Genbank, the only specification I know of is for the feature table portion of the format as you have below. They do have a (possibly out of date) example file; note it isn't easily found unless you search for it:

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord

EMBL is better in this regard:

http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html

Note that UniProt Knowledgebase also has a user manual outlining the similarities and differences with EMBL:

http://web.expasy.org/docs/userman.html

chris

On Sep 7, 2012, at 9:28 PM, Dan Kortschak wrote:

> Hello, this is slightly off-topic, but bioperl devs seemed like the most
> likely people to get an answer to this from.
>
> Is there an ebnf or other formal description of the genbank and embl
> formats available some where. The only thing I can find is the verbosity
> describing them in English at [1].
>
> thanks
> Dan
>
> [1]http://www.insdc.org/documents/feature_table.html
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From molecules at cpan.org Tue Sep 11 10:10:22 2012
From: molecules at cpan.org (Christopher Bottoms)
Date: Tue, 11 Sep 2012 09:10:22 -0500
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
In-Reply-To: <1347075839.3037.21.camel@sueno>
References: <1347071284.3037.18.camel@sueno>
 <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu>
 <1347075839.3037.21.camel@sueno>
Message-ID:

Dan,

Why not use BioPerl's Bio::SeqIO, which can parse GenBank files?

--Christopher Bottoms

On Fri, Sep 7, 2012 at 10:43 PM, Dan Kortschak wrote:
> Thanks Chris. That's remarkable, so many words and not an actual formal
> specification. I guess I have some work ahead of me. I found the
> example, but examples rarely contain all edges and corners.
>
> Dan

From cjfields at illinois.edu Tue Sep 11 10:39:19 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 11 Sep 2012 14:39:19 +0000
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
In-Reply-To:
References: <1347071284.3037.18.camel@sueno>
 <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu>
 <1347075839.3037.21.camel@sueno>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu>

Christopher,

I think Dan's question is orthogonal to actually parsing a file; it relates more to proper formatting for a particular format based on a specification as well as potential downstream validation. Bio::SeqIO::genbank is geared for flexibility and can handle a lot of mis-formatted data, it can massage some data into the proper format if needed. One must recognize the primary driver for the parsers is to get data into objects, not as a format converter (that just happens to be a nice useful side effect).

The problem is, like many formats, a formal specification for Genbank format doesn't exist outside of the NCBI example file (old and incomplete) and the FT definition as far as I know, so calling something 'official' Genbank format isn't possible outside of NCBI.
chris (f)

On Sep 11, 2012, at 9:10 AM, Christopher Bottoms wrote:

> Dan,
>
> Why not use BioPerl's Bio::SeqIO, which can parse GenBank files?
>
> --Christopher Bottoms

From dan.kortschak at adelaide.edu.au Tue Sep 11 17:39:06 2012
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 11 Sep 2012 21:39:06 +0000
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu>
References: <1347071284.3037.18.camel@sueno>
 <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu>
 <1347075839.3037.21.camel@sueno>
 , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu>
Message-ID:

Thanks Chris. It is related to both really, and more.
Taking the second point first, I continue to be amazed at the lack of specification or testing in a significant portion of software in the bioinformatics realm (bioperl is a nice counter example and one that I am grateful for having had as a training ground - and the work that has obviously gone into working through parsing and formatting un- or under-specified formats by the core and other developers is phenomenal).

But to the first point, I am unable to use bioperl to parse/format these formats for my project as it is a new project, not written in Perl - apologies for abusing the list - but rather in Go. I could go through the Perl to reimplement based on that, but I was hoping to use a parser generator from a spec, so that I can guarantee the parser/formatter is correct formally.

I asked here because I believe the developers of bioperl are some of the foremost experts in parsing the collection of "weakly defined, internally redundant, ambiguous, bulky fruit salad[s] of ... data format[s]" [1] that constitute the majority of the file formats out there (this is not a pejorative against the bioperl devs, but rather a testament to their fortitude and strength - I have only implemented the bare minimum of formats in my library so far).

thanks
Dan

[1]http://www.biostars.org/post/show/7126/what-are-the-most-common-stupid-mistakes-in-bioinformatics/#7136

On 12/09/2012, at 12:09 AM, "Fields, Christopher J" wrote:

> Christopher,
>
> I think Dan's question is orthogonal to actually parsing a file; it relates more to proper formatting for a particular format based on a specification as well as potential downstream validation. Bio::SeqIO::genbank is geared for flexibility and can handle a lot of mis-formatted data, it can massage some data into the proper format if needed. One must recognize the primary driver for the parsers is to get data into objects, not as a format converter (that just happens to be a nice useful side effect).
>
> The problem is, like many formats, a formal specification for Genbank format doesn't exist outside of the NCBI example file (old and incomplete) and the FT definition as far as I know, so calling something 'official' Genbank format isn't possible outside of NCBI.
>
> chris (f)

From ahlberg.gustav at gmail.com Wed Sep 5 11:41:13 2012
From: ahlberg.gustav at gmail.com (Gustav Ahlberg)
Date: Wed, 5 Sep 2012 17:41:13 +0200
Subject: [Bioperl-l] clustalw2
Message-ID:

Hi!

I am trying to run clustalw2 through bioperl. It is not working. I have tried exporting the path etc etc. clustalw2 is executable in the terminal.
I have included

BEGIN { $ENV{CLUSTALDIR} = ' /usr/local/bin/clustalw2/'}

in each of my scripts. What is wrong here?

Thanks in advance for your answer.

Best
Gustav

From ahlberg.gustav at gmail.com Thu Sep 6 11:14:04 2012
From: ahlberg.gustav at gmail.com (Gahl)
Date: Thu, 6 Sep 2012 08:14:04 -0700 (PDT)
Subject: [Bioperl-l] PAML problem
In-Reply-To:
References: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com>
 <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com>
Message-ID: <34398168.post@talk.nabble.com>

Hello,

I have the same problem. How did you fix it?

Du, Peng wrote:
>
> Hi Daisie,
>
> Problem got fixed~~
> Thank you ^_^.
>

--
View this message in context: http://old.nabble.com/PAML-problem-tp34076014p34398168.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From hlapp at drycafe.net Tue Sep 11 18:02:46 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 11 Sep 2012 18:02:46 -0400
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
In-Reply-To:
References: <1347071284.3037.18.camel@sueno>
 <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu>
 <1347075839.3037.21.camel@sueno>
 , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu>
Message-ID:

One of the problems in Perl with using a language-neutral definition of the format as a context-free grammar has been that RecDescent was just way too slow for this.

One of the Google Summer of Code students working on fast parsers (for SAM/BAM I think) used Ragel (http://www.complang.org/ragel/), which looks quite cool, but unfortunately doesn't support Perl (nor Go :-)

-hilmar

On Sep 11, 2012, at 5:39 PM, Dan Kortschak wrote:

> Thanks Chris. It is related to both really, and more.
> > Second first, I continue to be amazed at the lack of specification or testing in a significant portion of software in the bioinformatics realm (bioperl is a nice counter example and one that I am grateful for having had as a training ground - and the work that has obviously gone into working through parsing and formatting un- or under-specified formats by the core and other developers is phenomenal). > > But to the first point, I am unable to use bioperl to parse/format these formats for my project as it is a new project, not written in Perl - apologies for abusing the list - but rather in Go. I could go through the Perl to reimplement based on that, but I was hoping to use a parser generator from a spec, so that I can guarantee the parser/formatter is correct formally. > > I asked here because I believe the developers of bioperl are some of the foremost experts in parsing the collection of "weakly defined, internally redundant, ambiguous, bulky fruit salad[s] of ... data format[s]" [1] that constute the majority of the file formats out there (this is not a pejorative against the bioperl devs, but rather a testament to their fortitude and strength - I have only implemented the bare minimum of formats in my library so far). > > thanks > Dan > > > [1]http://www.biostars.org/post/show/7126/what-are-the-most-common-stupid-mistakes-in-bioinformatics/#7136 > > On 12/09/2012, at 12:09 AM, "Fields, Christopher J" wrote: > >> Christopher, >> >> I think Dan's question is orthogonal to actually parsing a file; it relates more to proper formatting for a particular format based on a specification as well as potential downstream validation. Bio::SeqIO::genbank is geared for flexibility and can handle a lot of mis-formatted data, it can massage some data into the proper format if needed. One must recognize the primary driver for the parsers is to get data into objects, not as a format converter (that just happens to be a nice useful side effect). 
>> >> The problem is, like many formats, a formal specification for Genbank format doesn't exist outside of the NCBI example file (old and incomplete) and the FT definition as far as I know, so calling something 'official' Genbank format isn't possible outside of NCBI. >> >> chris (f) >> >> On Sep 11, 2012, at 9:10 AM, Christopher Bottoms wrote: >> >>> Dan, >>> >>> Why not use BioPerl's Bio::SeqIO, which can parse GenBank files? >>> >>> --Christopher Bottoms >>> >>> On Fri, Sep 7, 2012 at 10:43 PM, Dan Kortschak >>> wrote: >>>> Thanks Chris. That's remarkable, so many words and not an actual formal >>>> specification. I guess I have some work ahead of me. I found the >>>> example, but examples rarely contain all edges and corners. >>>> >>>> Dan >>>> >>>> On Sat, 2012-09-08 at 03:39 +0000, Fields, Christopher J wrote: >>>>> Re: Genbank, the only know specification I know of is for the feature >>>>> table portion of the format as you have below. They do have a >>>>> (possibly out of date) example file, note it isn't easily found unless >>>>> you search for it: >>>>> >>>>> http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord >>>>> >>>>> EMBL is better in this regard: >>>>> >>>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html >>>>> >>>>> Note that UniProt Knowledgebase also has a user manual outlining the >>>>> similarities and differences with EMBL: >>>>> >>>>> http://web.expasy.org/docs/userman.html >>>>> >>>>> chris >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== 
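For readers who only need the records as objects, the Bio::SeqIO suggestion quoted above can be sketched roughly as follows. This is a minimal sketch, not a validator: the function name summarize_genbank and the command-line usage are illustrative assumptions, while Bio::SeqIO->new, next_seq, accession_number, length, get_SeqFeatures and primary_tag are existing BioPerl methods. As noted in the thread, this gets data into objects; it does not check a file against any formal specification.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: read every record from an open GenBank filehandle
# and return one summary hash per record.
sub summarize_genbank {
    my ($fh) = @_;
    my $in = Bio::SeqIO->new(-fh => $fh, -format => 'genbank');
    my @records;
    while (my $seq = $in->next_seq) {
        push @records, {
            accession => $seq->accession_number,
            length    => $seq->length,
            # feature keys such as "source", "gene", "CDS"
            features  => [ map { $_->primary_tag } $seq->get_SeqFeatures ],
        };
    }
    return @records;
}

# When a file name is given on the command line, print one line per record.
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    for my $rec (summarize_genbank($fh)) {
        printf "%s\t%d bp\t%s\n", $rec->{accession}, $rec->{length},
            join(',', @{ $rec->{features} });
    }
    close $fh;
}
```

Saved under a name of your choosing, it would be run as "perl summarize_gb.pl records.gb" (both file names are placeholders). Because the parser is deliberately lenient, a clean run here says nothing about whether the file is "official" GenBank format.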
From dan.kortschak at adelaide.edu.au Tue Sep 11 18:26:16 2012
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 11 Sep 2012 22:26:16 +0000
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
In-Reply-To:
References: <1347071284.3037.18.camel@sueno> <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu> <1347075839.3037.21.camel@sueno> , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu> ,
Message-ID: <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au>

Hi Hilmar,

Yes, I plan to use ragel, which recently does again support Go at least in a non-official fork, which looks like it will be merged into Adrian's repo. (It might be a nice project for a student to implement a perl back end for ragel - though the absence of formal format descriptions makes its utility for bioperl somewhat moot ;).

I'd be interested to see the GSoC project if it's public yet - I'm in the process of writing a pure Go SAM/BAM package to replace my boom interface to libbam.

I can see why a generic parser is not appropriate for a perl-based parser (for the same reason I'm not using a generic Go parser, but rather a parser generator). The fact remains, a formal specification is beneficial for testing correctness and avoiding some of the problems bioperl has had in the past when NCBI has changed formats under bioperl's feet.

cheers
Dan

On 12/09/2012, at 7:32 AM, "Hilmar Lapp" wrote:

> One of the problems in Perl with using a language-neutral definition of the format as a context-free grammar has been that RecDescent was just way too slow for this.
> > One of the Google Summer of Code students working on fast parsers (for SAM/BAM I think) used Ragel (http://www.complang.org/ragel/), which looks quite cool, but unfortunately doesn't support Perl (nor Go :-) > > -hilmar > > On Sep 11, 2012, at 5:39 PM, Dan Kortschak wrote: > >> Thanks Chris. It is related to both really, and more. >> >> Second first, I continue to be amazed at the lack of specification or testing in a significant portion of software in the bioinformatics realm (bioperl is a nice counter example and one that I am grateful for having had as a training ground - and the work that has obviously gone into working through parsing and formatting un- or under-specified formats by the core and other developers is phenomenal). >> >> But to the first point, I am unable to use bioperl to parse/format these formats for my project as it is a new project, not written in Perl - apologies for abusing the list - but rather in Go. I could go through the Perl to reimplement based on that, but I was hoping to use a parser generator from a spec, so that I can guarantee the parser/formatter is correct formally. >> >> I asked here because I believe the developers of bioperl are some of the foremost experts in parsing the collection of "weakly defined, internally redundant, ambiguous, bulky fruit salad[s] of ... data format[s]" [1] that constute the majority of the file formats out there (this is not a pejorative against the bioperl devs, but rather a testament to their fortitude and strength - I have only implemented the bare minimum of formats in my library so far). 
>> >> thanks >> Dan >> >> >> [1]http://www.biostars.org/post/show/7126/what-are-the-most-common-stupid-mistakes-in-bioinformatics/#7136 >> >> On 12/09/2012, at 12:09 AM, "Fields, Christopher J" wrote: >> >>> Christopher, >>> >>> I think Dan's question is orthogonal to actually parsing a file; it relates more to proper formatting for a particular format based on a specification as well as potential downstream validation. Bio::SeqIO::genbank is geared for flexibility and can handle a lot of mis-formatted data, it can massage some data into the proper format if needed. One must recognize the primary driver for the parsers is to get data into objects, not as a format converter (that just happens to be a nice useful side effect). >>> >>> The problem is, like many formats, a formal specification for Genbank format doesn't exist outside of the NCBI example file (old and incomplete) and the FT definition as far as I know, so calling something 'official' Genbank format isn't possible outside of NCBI. >>> >>> chris (f) >>> >>> On Sep 11, 2012, at 9:10 AM, Christopher Bottoms wrote: >>> >>>> Dan, >>>> >>>> Why not use BioPerl's Bio::SeqIO, which can parse GenBank files? >>>> >>>> --Christopher Bottoms >>>> >>>> On Fri, Sep 7, 2012 at 10:43 PM, Dan Kortschak >>>> wrote: >>>>> Thanks Chris. That's remarkable, so many words and not an actual formal >>>>> specification. I guess I have some work ahead of me. I found the >>>>> example, but examples rarely contain all edges and corners. >>>>> >>>>> Dan >>>>> >>>>> On Sat, 2012-09-08 at 03:39 +0000, Fields, Christopher J wrote: >>>>>> Re: Genbank, the only know specification I know of is for the feature >>>>>> table portion of the format as you have below. 
They do have a >>>>>> (possibly out of date) example file, note it isn't easily found unless >>>>>> you search for it: >>>>>> >>>>>> http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord >>>>>> >>>>>> EMBL is better in this regard: >>>>>> >>>>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html >>>>>> >>>>>> Note that UniProt Knowledgebase also has a user manual outlining the >>>>>> similarities and differences with EMBL: >>>>>> >>>>>> http://web.expasy.org/docs/userman.html >>>>>> >>>>>> chris >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > From hlapp at drycafe.net Tue Sep 11 18:42:10 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 11 Sep 2012 18:42:10 -0400 Subject: [Bioperl-l] genbank/embl format ebnf or other formal description In-Reply-To: <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au> References: <1347071284.3037.18.camel@sueno> <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu> <1347075839.3037.21.camel@sueno> , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu> , <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au> Message-ID: On Sep 11, 2012, at 6:26 PM, Dan Kortschak wrote: > I'd be interested to see the GSoC project if it's public yet Of course it is - it's GSoC after all, and an OBF student. 
You should be able to find links etc on the student's respective posts on his blog: http://lomereiter.wordpress.com -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From dan.kortschak at adelaide.edu.au Tue Sep 11 18:44:56 2012 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 11 Sep 2012 22:44:56 +0000 Subject: [Bioperl-l] genbank/embl format ebnf or other formal description In-Reply-To: References: <1347071284.3037.18.camel@sueno> <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu> <1347075839.3037.21.camel@sueno> , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu> , <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au>, Message-ID: Cool, I'd already emailed him about how he's parallelising reads and writes. As I understand he's writing the project in D. The quality of the code looked good to my D-untrained eye. Dan On 12/09/2012, at 8:12 AM, "Hilmar Lapp" wrote: > > On Sep 11, 2012, at 6:26 PM, Dan Kortschak wrote: > >> I'd be interested to see the GSoC project if it's public yet > > Of course it is - it's GSoC after all, and an OBF student. 
You should be able to find links etc on the student's respective posts on his blog: http://lomereiter.wordpress.com > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > From hlapp at drycafe.net Tue Sep 11 19:00:10 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 11 Sep 2012 19:00:10 -0400 Subject: [Bioperl-l] genbank/embl format ebnf or other formal description In-Reply-To: <999F1000-E137-460D-8303-D398B3995003@drycafe.net> References: <1347071284.3037.18.camel@sueno> <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu> <1347075839.3037.21.camel@sueno> , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu> , <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au> <999F1000-E137-460D-8303-D398B3995003@drycafe.net> Message-ID: And I should say that the repo is here: https://github.com/lomereiter/sambamba -hilmar On Sep 11, 2012, at 6:41 PM, Hilmar Lapp wrote: > > On Sep 11, 2012, at 6:26 PM, Dan Kortschak wrote: > >> I'd be interested to see the GSoC project if it's public yet > > Of course it is - it's GSoC after all, and an OBF student. 
You should be able to find links etc on the student's respective posts on his blog: http://lomereiter.wordpress.com > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Tue Sep 11 21:29:04 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Sep 2012 01:29:04 +0000 Subject: [Bioperl-l] genbank/embl format ebnf or other formal description In-Reply-To: References: <1347071284.3037.18.camel@sueno> <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu> <1347075839.3037.21.camel@sueno> , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu> , <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au> <999F1000-E137-460D-8303-D398B3995003@drycafe.net> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BA633E@CHIMBX5.ad.uillinois.edu> On the subject of context-free grammars and Perl, has anyone looked at Marpa? The latest version has a C lib that is intended for use outside of Perl. https://metacpan.org/module/Marpa::R2 (and ragel looks very promising as well) chris On Sep 11, 2012, at 6:00 PM, Hilmar Lapp wrote: > And I should say that the repo is here: https://github.com/lomereiter/sambamba > > -hilmar > > On Sep 11, 2012, at 6:41 PM, Hilmar Lapp wrote: > >> >> On Sep 11, 2012, at 6:26 PM, Dan Kortschak wrote: >> >>> I'd be interested to see the GSoC project if it's public yet >> >> Of course it is - it's GSoC after all, and an OBF student. 
You should be able to find links etc on the student's respective posts on his blog: http://lomereiter.wordpress.com >> >> -hilmar >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Sep 11 21:31:36 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Sep 2012 01:31:36 +0000 Subject: [Bioperl-l] clustalw2 In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BA6380@CHIMBX5.ad.uillinois.edu> Can you call /usr/local/bin/clustalw2/clustalw directly? chris On Sep 5, 2012, at 10:41 AM, Gustav Ahlberg wrote: > Hi! > I am trying to run clustalw2 through bioperl. It is not working. > > I have tried exporting the path etc etc. clustalw2 is executable in the > terminal. I have included BEGIN { $ENV{CLUSTALDIR} = ' > /usr/local/bin/clustalw2/'} in each of my script. What is wrong here. > Thanks in advance for your answer. 
> > Best Gustav > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Sep 11 21:33:33 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Sep 2012 01:33:33 +0000 Subject: [Bioperl-l] PAML problem In-Reply-To: <34398168.post@talk.nabble.com> References: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com> <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com> <34398168.post@talk.nabble.com> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BA63AC@CHIMBX5.ad.uillinois.edu> Gahl, Daisie fixed PAML in the latest bioperl-live: https://github.com/bioperl/bioperl-live You can use git to check it out and use it, or use https://github.com/bioperl/bioperl-live/downloads and download a tarball/zip. chris On Sep 6, 2012, at 10:14 AM, Gahl wrote: > > Hello, > I have the sam problem. How did you fix it? > > Du, Peng wrote: >> >> Hi Daisie, >> >> Problem got fixed~~ >> Thank you ^_^. >> >> On Tue, Sep 4, 2012 at 8:05 PM, Du, Peng wrote: >>> Hi Daisie, >>> >>> Thank you for your reply. >>> Where could I get the bioperl-live you mentioned? Is it something >>> equivalent to bioperl core package? >>> If it is, do I need to reinstall the whole bioperl package? >>> >>> Thank you. >>> Peng >>> >>> On Fri, Aug 17, 2012 at 3:38 AM, Daisie Huang wrote: >>>> I'm not sure which PAML component caused this particular outcome, but >>>> the >>>> bugs and fixes I pushed to bioperl-live might fix this. When will those >>>> get >>>> pulled into the master? >>>> >>>> If those particular fixes don't help, I'd be happy to take a peek at the >>>> originator's code and see if it's a quick re-parsing fix. 
>>>> >>>> Daisie >>>> >>>> >>>> On Tuesday, June 26, 2012 6:37:55 PM UTC-7, Jason Stajich wrote: >>>>> >>>>> Peng - >>>>> >>>>> This module needs a person who's sole job is to keep tracking bugs and >>>>> updating it with new versions of the program. so far it has burned out >>>>> several developers on working on it since it not stable. >>>>> >>>>> I am not sure what the answer is to the problem, but often it depends >>>>> on >>>>> the extra parameters used as this changes the order of the output >>>>> making it >>>>> hard to parse. >>>>> >>>>> So I don't have a solution for you except that you'll have to post the >>>>> bug >>>>> and the problem output mlc file to redmine and hope that we can entice >>>>> some >>>>> developers to bang their head against this some more. >>>>> >>>>> Jason >>>>> On Jun 26, 2012, at 6:28 PM, Du, Peng wrote: >>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I am using bioperl to parse paml output, and I saw this >>>>>> >>>>>> ------------- EXCEPTION: Bio::Root::NotImplemented ------------- >>>>>> MSG: Unknown format of PAML output did not see seqtype >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw >>>>>> /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368 >>>>>> STACK: Bio::Tools::Phylo::PAML::_parse_summary >>>>>> /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:461 >>>>>> STACK: Bio::Tools::Phylo::PAML::next_result >>>>>> /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:270 >>>>>> STACK: main::cal_dn_ds dn_ds.pl:131 >>>>>> STACK: dn_ds.pl:44 >>>>>> ---------------------------------------------------------------- >>>>>> >>>>>> I googled and found that, it was caused by PAML version >>>>>> incompatibility. I tried 3.13, 3.14, 4.1, 4.2, 4.5 and none of them >>>>>> worked. Could someone tell me which version is fine? >>>>>> >>>>>> My bioperl version is 1.006001. Thank you very much. 
>>>>>> >>>>>> -- >>>>>> >>>>>> Peng Du >>>>>> Graduate School of Information Science and Technology, Hokkaido >>>>>> University >>>>>> Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814 >>>>>> Email: d... at ibio.jp Tel: +81 80 3268 9713 >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Biop... at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> Jason Stajich >>>>> jason.... at gmail.com >>>>> ja... at bioperl.org >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Biop... at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> >>> Peng Du >>> Graduate School of Information Science and Technology, Hokkaido >>> University >>> Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814 >>> Email: du at ibio.jp Tel: +81 80 3268 9713 >> >> >> >> -- >> >> Peng Du >> Graduate School of Information Science and Technology, Hokkaido University >> Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814 >> Email: du at ibio.jp Tel: +81 80 3268 9713 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://old.nabble.com/PAML-problem-tp34076014p34398168.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bottomsc at missouri.edu Tue Sep 11 23:02:49 2012 From: bottomsc at missouri.edu (Christopher Bottoms) Date: Tue, 11 Sep 2012 22:02:49 -0500 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF33B85703@CHIMBX5.ad.uillinois.edu> Message-ID: Dear Leon and Chris, Thanks for your feedback. 
Leon,
I made several improvements based on your feedback. Now, a wrapper script is used instead of the module file itself. FASTQ files are an acceptable input format. And the installer was improved. I found that installing Module::Build before our module cleared up issues with several dependencies. I think that our instructions for using cpanminus effectively give the same results as using local::lib. As for Alien packages, I would really like to work on them, but the time (i.e. funding) is currently too limited.

Chris,
Yes, the "App" part of the proposed name was chosen because this is designed more to be an application than to be modules to be reused. I fully intend to support this distribution myself. If you think it better to separate it more from the BioPerl namespace, I have considered calling it App::Bio::SELEX::RNAmotifAnalysis. What do you think?

I plan to upload tomorrow if there aren't any objections.

Thanks,
Christopher Bottoms

From taozhu at mail.bnu.edu.cn Wed Sep 12 08:28:31 2012
From: taozhu at mail.bnu.edu.cn (Tao Zhu)
Date: Wed, 12 Sep 2012 20:28:31 +0800
Subject: [Bioperl-l] How to change a fasta format alignment into clustalw format?
Message-ID: <50507FEF.2090504@mail.bnu.edu.cn>

Hello, everyone

I have a multiple protein sequence alignment in FASTA format:

>SPOG_04578#scry
MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
>SOCG_01498#soct
----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
>SPAC1002.07c#spom
-----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
>SJAG_03288#sjap
--MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE

I want to change it to CLUSTALW format. It should have been easy:

use strict;
use warnings;
use Bio::AlignIO;

# input FASTA alignment and output CLUSTALW file, both given on the command line
my $in = shift;
my $out = shift;
my $alignio = Bio::AlignIO->new(-file=>$in, -format=>'fasta');
my $writeio = Bio::AlignIO->new(-file=>">$out", -format=>'clustalw');
# write out each alignment found in the input
while ( my $align_obj = $alignio->next_aln ) {
    $writeio->write_aln($align_obj);
}

That should be OK. However it doesn't work, because it says "seq doesn't validate".
In fact there has letter "2" in the alignment. Such "2" is intentionally marked by myself, meaning a phase-2 intron exists here. I hope to keep these markers in the output clustalw format. Is there any methods? -- Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China Email: tzhu at mail.bnu.edu.cn From hlapp at drycafe.net Wed Sep 12 09:02:28 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 12 Sep 2012 09:02:28 -0400 Subject: [Bioperl-l] genbank/embl format ebnf or other formal description In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33BA633E@CHIMBX5.ad.uillinois.edu> References: <1347071284.3037.18.camel@sueno> <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu> <1347075839.3037.21.camel@sueno> , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu> , <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au> <999F1000-E137-460D-8303-D398B3995003@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF33BA633E@CHIMBX5.ad.uillinois.edu> Message-ID: I hadn't seen that, but it looks interesting. However, the definition of the grammar is again Perl-style. What'd be really nice is to have grammars for which one can generate a state machine in one language, verify through testing that it (the grammar) is correct, and then be able to generate state machines in other languages without also having to rewrite all the tests. My sense of Ragel is that it's pretty close to that idea. -hilmar On Sep 11, 2012, at 9:29 PM, Fields, Christopher J wrote: > On the subject of context-free grammars and Perl, has anyone looked at Marpa? The latest version has a C lib that is intended for use outside of Perl. 
> > https://metacpan.org/module/Marpa::R2 > > (and ragel looks very promising as well) > > chris > > On Sep 11, 2012, at 6:00 PM, Hilmar Lapp > wrote: > >> And I should say that the repo is here: https://github.com/lomereiter/sambamba >> >> -hilmar >> >> On Sep 11, 2012, at 6:41 PM, Hilmar Lapp wrote: >> >>> >>> On Sep 11, 2012, at 6:26 PM, Dan Kortschak wrote: >>> >>>> I'd be interested to see the GSoC project if it's public yet >>> >>> Of course it is - it's GSoC after all, and an OBF student. You should be able to find links etc on the student's respective posts on his blog: http://lomereiter.wordpress.com >>> >>> -hilmar >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed Sep 12 09:12:10 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Sep 2012 13:12:10 +0000 Subject: [Bioperl-l] genbank/embl format ebnf or other formal description In-Reply-To: References: <1347071284.3037.18.camel@sueno> <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu> <1347075839.3037.21.camel@sueno> , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu> , <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au> <999F1000-E137-460D-8303-D398B3995003@drycafe.net> 
<118F034CF4C3EF48A96F86CE585B94BF33BA633E@CHIMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BA806F@CHIMBX5.ad.uillinois.edu> Yes, that's correct re: Marpa, though I believe the long-term goal is for language independence (so there would have to be a way to generate such a state machine). And I agree, Ragel is pretty close to that. chris On Sep 12, 2012, at 8:02 AM, Hilmar Lapp wrote: > I hadn't seen that, but it looks interesting. However, the definition of the grammar is again Perl-style. What'd be really nice is to have grammars for which one can generate a state machine in one language, verify through testing that it (the grammar) is correct, and then be able to generate state machines in other languages without also having to rewrite all the tests. My sense of Ragel is that it's pretty close to that idea. > > -hilmar > > On Sep 11, 2012, at 9:29 PM, Fields, Christopher J wrote: > >> On the subject of context-free grammars and Perl, has anyone looked at Marpa? The latest version has a C lib that is intended for use outside of Perl. >> >> https://metacpan.org/module/Marpa::R2 >> >> (and ragel looks very promising as well) >> >> chris >> >> On Sep 11, 2012, at 6:00 PM, Hilmar Lapp >> wrote: >> >>> And I should say that the repo is here: https://github.com/lomereiter/sambamba >>> >>> -hilmar >>> >>> On Sep 11, 2012, at 6:41 PM, Hilmar Lapp wrote: >>> >>>> >>>> On Sep 11, 2012, at 6:26 PM, Dan Kortschak wrote: >>>> >>>>> I'd be interested to see the GSoC project if it's public yet >>>> >>>> Of course it is - it's GSoC after all, and an OBF student. 
You should be able to find links etc on the student's respective posts on his blog: http://lomereiter.wordpress.com >>>> >>>> -hilmar >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > From cjfields at illinois.edu Wed Sep 12 09:37:46 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Sep 2012 13:37:46 +0000 Subject: [Bioperl-l] How to change a fasta format alignment into clustalw format? In-Reply-To: <50507FEF.2090504@mail.bnu.edu.cn> References: <50507FEF.2090504@mail.bnu.edu.cn> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BA833F@CHIMBX5.ad.uillinois.edu> The below worked fine for me using the latest bioperl-live. Are you using an older version? 
chris

[cjfields at pyrimidine-laptop clustalw]$ cat convert.pl
#!/usr/bin/env perl

use Modern::Perl;
use Bio::AlignIO;

my $in = Bio::AlignIO->new(-file => shift, -format => 'fasta');
my $out = Bio::AlignIO->new(-format => 'clustalw');

while (my $aln = $in->next_aln) {
    $out->write_aln($aln);
}

[cjfields at pyrimidine-laptop clustalw]$ cat test.fa
>SPOG_04578#scry
MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
>SOCG_01498#soct
----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
>SPAC1002.07c#spom
-----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
>SJAG_03288#sjap
--MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE

[cjfields at pyrimidine-laptop clustalw]$ perl convert.pl test.fa
CLUSTAL W (1.81) multiple sequence alignment


SPOG_04578#scry/1-54   MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
SOCG_01498#soct/1-51   ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
SPAC1002.07c#spom/1-50 -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
SJAG_03288#sjap/1-53   --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE
                              :. :* : .:* ..* **** ***:::.  :*****  .*  .***

On Sep 12, 2012, at 7:28 AM, Tao Zhu wrote:

> Hello, everyone
>
> I have a multiple protein sequence alignment in FASTA format:
>
> >SPOG_04578#scry
> MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
> >SOCG_01498#soct
> ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
> >SPAC1002.07c#spom
> -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
> >SJAG_03288#sjap
> --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE
>
> I want to change it to CLUSTALW format. It should have been easy:
>
> my $in = shift;
> my $out = shift;
> my $alignio = Bio::AlignIO->new(-file=>$in, -format=>'fasta');
> my $writeio = Bio::AlignIO->new(-file=>">$out", -format=>'clustalw');
> while ( my $align_obj = $alignio->next_aln ) {
>     $writeio->write_aln($align_obj);
> }
>
> That should be OK. However it doesn't work, because it says "seq doesn't validate".
>
> In fact there has letter "2" in the alignment. Such "2" is intentionally
> marked by myself, meaning a phase-2 intron exists here. I hope to keep
> these markers in the output clustalw format. Is there any methods?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From BottomsC at missouri.edu Wed Sep 12 09:03:23 2012
From: BottomsC at missouri.edu (Bottoms, Christopher A)
Date: Wed, 12 Sep 2012 13:03:23 +0000
Subject: [Bioperl-l] genbank/embl format ebnf or other formal description
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33BA633E@CHIMBX5.ad.uillinois.edu>
References: <1347071284.3037.18.camel@sueno> <118F034CF4C3EF48A96F86CE585B94BF33BA1EB2@CHIMBX5.ad.uillinois.edu> <1347075839.3037.21.camel@sueno> , <118F034CF4C3EF48A96F86CE585B94BF33BA5125@CHIMBX5.ad.uillinois.edu> , <11975870-010F-4E23-9C44-A7FF6070D3A3@adelaide.edu.au> <999F1000-E137-460D-8303-D398B3995003@drycafe.net> , <118F034CF4C3EF48A96F86CE585B94BF33BA633E@CHIMBX5.ad.uillinois.edu>
Message-ID:

For Marpa, see also http://blogs.perl.org/users/jeffrey_kegler/2012/07/two-new-interfaces-to-marpa.html

________________________________________
From: Fields, Christopher J [cjfields at illinois.edu]
Sent: Tuesday, September 11, 2012 8:29 PM
To: Hilmar Lapp
Cc: Dan Kortschak; ; Christopher Bottoms
Subject: Re: [Bioperl-l] genbank/embl format ebnf or other formal description

On the subject of context-free grammars and Perl, has anyone looked at Marpa? The latest version has a C lib that is intended for use outside of Perl.
https://metacpan.org/module/Marpa::R2

(and ragel looks very promising as well)

chris

On Sep 11, 2012, at 6:00 PM, Hilmar Lapp wrote:

> And I should say that the repo is here: https://github.com/lomereiter/sambamba
>
> -hilmar
>
> On Sep 11, 2012, at 6:41 PM, Hilmar Lapp wrote:
>
>>
>> On Sep 11, 2012, at 6:26 PM, Dan Kortschak wrote:
>>
>>> I'd be interested to see the GSoC project if it's public yet
>>
>> Of course it is - it's GSoC after all, and an OBF student. You should be able to find links etc on the student's respective posts on his blog: http://lomereiter.wordpress.com
>>
>> -hilmar
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From taozhu at mail.bnu.edu.cn Thu Sep 13 01:26:56 2012
From: taozhu at mail.bnu.edu.cn (Tao Zhu)
Date: Thu, 13 Sep 2012 13:26:56 +0800
Subject: [Bioperl-l] How to change a fasta format alignment into clustalw format?
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33BA833F@CHIMBX5.ad.uillinois.edu>
References: <50507FEF.2090504@mail.bnu.edu.cn> <118F034CF4C3EF48A96F86CE585B94BF33BA833F@CHIMBX5.ad.uillinois.edu>
Message-ID: <50516EA0.70407@mail.bnu.edu.cn>

Thank you! I'm using an old version, perhaps 1.6.1? I don't know how to check the version. After I switched to version 1.6.9, the problem was solved.

On 2012-09-12 21:37, Fields, Christopher J wrote:
> The below worked fine for me using the latest bioperl-live. Are you using an older version?
> > chris > > [cjfields at pyrimidine-laptop clustalw]$ cat convert.pl > #!/usr/bin/env perl > use Modern::Perl; > use Bio::AlignIO; > > my $in = Bio::AlignIO->new(-file => shift, > -format => 'fasta'); > > my $out = Bio::AlignIO->new(-format => 'clustalw'); > > while (my $aln = $in->next_aln) { > $out->write_aln($aln); > } > > [cjfields at pyrimidine-laptop clustalw]$ cat test.fa >> SPOG_04578#scry > MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT- >> SOCG_01498#soct > ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE >> SPAC1002.07c#spom > -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID >> SJAG_03288#sjap > --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE > > [cjfields at pyrimidine-laptop clustalw]$ perl convert.pl test.fa > CLUSTAL W (1.81) multiple sequence alignment > > > SPOG_04578#scry/1-54 MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT- > SOCG_01498#soct/1-51 ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE > SPAC1002.07c#spom/1-50 -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID > SJAG_03288#sjap/1-53 --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE > :. :* : .:* ..* **** ***:::. :***** .* .*** > > On Sep 12, 2012, at 7:28 AM, Tao Zhu wrote: > >> Hello, everyone >> >> I have an multiple protein sequence alignment in FASTA format: >> >>> SPOG_04578#scry >> MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT- >>> SOCG_01498#soct >> ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE >>> SPAC1002.07c#spom >> -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID >>> SJAG_03288#sjap >> --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE >> >> I want to change it to CLUSTALW format. It could have been easy: >> >> my $in = shift; >> my $out = shift; >> my $alignio = Bio::AlignIO->new(-file=>$in, -format=>'fasta'); >> my $writeio = Bio::AlignIO->new(-file=>">$out", -format=>'clustalw'); >> while ( my $align_obj = $alignio->next_aln ) { >> $writeio->write_aln($align_obj); >> } >> >> That'OK. 
However it doesn't work, because it says "seq doesn't validate".
>>
>> In fact there has letter "2" in the alignment. Such "2" is intentionally
>> marked by myself, meaning a phase-2 intron exists here. I hope to keep
>> these markers in the output clustalw format. Is there any methods?
>>
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China
Email: tzhu at mail.bnu.edu.cn

From rbuels at gmail.com Thu Sep 13 14:42:08 2012
From: rbuels at gmail.com (Robert Buels)
Date: Thu, 13 Sep 2012 14:42:08 -0400
Subject: [Bioperl-l] bp_fetch.pl genbank example doesn't work anymore
Message-ID: <50522900.1030806@gmail.com>

Sigh.

rob at x bioperl-live$ perl -I. scripts/*/bp_fetch.pl net::genbank:X47072
Sequence X47072 in Database genbank in net::genbank:X47072 is not loadable. Skipping.

Error
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: id 'X47072' does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:486
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id Bio/DB/WebDBSeqI.pm:167
STACK: scripts/index/bp_fetch.pl:307
-----------------------------------------------------------
Sequence X47072 in Database genbank in net::genbank:X47072 is not loadable. Skipping.
Error
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
HTTP/1.1 400 Bad Request
Cache-Control: private
Connection: close
Date: Thu, 13 Sep 2012 18:40:49 GMT
Server: Apache
Content-Type: text/plain; charset=UTF-8
Access-Control-Allow-Origin: *
Client-Date: Thu, 13 Sep 2012 18:40:49 GMT
Client-Peer: 165.112.7.20:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
NCBI-SID: 73BF4D7E05228B11_0006SID
Set-Cookie: ncbi_sid=73BF4D7E05228B11_0006SID; domain=.nih.gov; path=/; expires=Fri, 13 Sep 2013 18:40:49 GMT

Cannot process ID list: OUT OF RANGE. Parameter retstart=0 + 1 is greater than number of IDs supplied in request. Incoming request includes 0 IDs.

STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:486
STACK: Bio::DB::WebDBSeqI::_stream_request Bio/DB/WebDBSeqI.pm:773
STACK: Bio::DB::WebDBSeqI::get_seq_stream Bio/DB/WebDBSeqI.pm:467
STACK: Bio::DB::WebDBSeqI::get_Stream_by_id Bio/DB/WebDBSeqI.pm:288
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id Bio/DB/WebDBSeqI.pm:158
STACK: scripts/index/bp_fetch.pl:307
-----------------------------------------------------------

From jason.stajich at gmail.com Thu Sep 13 18:33:53 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 13 Sep 2012 15:33:53 -0700
Subject: [Bioperl-l] bp_fetch.pl genbank example doesn't work anymore
In-Reply-To: <50522900.1030806@gmail.com>
References: <50522900.1030806@gmail.com>
Message-ID: <5BD449D2-0177-4CB4-87A2-8655D01E20C6@gmail.com>

Rob - that accession does not exist, but it works with a valid accession, what would you like it to do when the accession is not loadable?

perl scripts/index/bp_fetch.pl net::genbank:JX295726.1

On Sep 13, 2012, at 11:42 AM, Robert Buels wrote:

> Sigh.
>
> rob at x bioperl-live$ perl -I. scripts/*/bp_fetch.pl net::genbank:X47072
> Sequence X47072 in Database genbank in net::genbank:X47072 is not loadable. Skipping.
> > Error > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: id 'X47072' does not exist > STACK: Error::throw > STACK: Bio::Root::Root::throw Bio/Root/Root.pm:486 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_id Bio/DB/WebDBSeqI.pm:167 > STACK: scripts/index/bp_fetch.pl:307 > ----------------------------------------------------------- > Sequence X47072 in Database genbank in net::genbank:X47072 is not loadable. Skipping. > > Error > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: WebDBSeqI Request Error: > HTTP/1.1 400 Bad Request > Cache-Control: private > Connection: close > Date: Thu, 13 Sep 2012 18:40:49 GMT > Server: Apache > Content-Type: text/plain; charset=UTF-8 > Access-Control-Allow-Origin: * > Client-Date: Thu, 13 Sep 2012 18:40:49 GMT > Client-Peer: 165.112.7.20:80 > Client-Response-Num: 1 > Client-Transfer-Encoding: chunked > NCBI-SID: 73BF4D7E05228B11_0006SID > Set-Cookie: ncbi_sid=73BF4D7E05228B11_0006SID; domain=.nih.gov; path=/; expires=Fri, 13 Sep 2013 18:40:49 GMT > > Cannot process ID list: OUT OF RANGE. Parameter retstart=0 + 1 is greater than number of IDs supplied in request. Incoming request includes 0 IDs. 
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:486
> STACK: Bio::DB::WebDBSeqI::_stream_request Bio/DB/WebDBSeqI.pm:773
> STACK: Bio::DB::WebDBSeqI::get_seq_stream Bio/DB/WebDBSeqI.pm:467
> STACK: Bio::DB::WebDBSeqI::get_Stream_by_id Bio/DB/WebDBSeqI.pm:288
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_id Bio/DB/WebDBSeqI.pm:158
> STACK: scripts/index/bp_fetch.pl:307
> -----------------------------------------------------------
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From rbuels at gmail.com Fri Sep 14 15:57:39 2012
From: rbuels at gmail.com (Robert Buels)
Date: Fri, 14 Sep 2012 15:57:39 -0400
Subject: [Bioperl-l] bp_fetch.pl genbank example doesn't work anymore
In-Reply-To: <5BD449D2-0177-4CB4-87A2-8655D01E20C6@gmail.com>
References: <50522900.1030806@gmail.com> <5BD449D2-0177-4CB4-87A2-8655D01E20C6@gmail.com>
Message-ID: <50538C33.5080904@gmail.com>

Well, two things then:

1. the POD example should be updated to one that obviously works (I did this just now, https://github.com/bioperl/bioperl-live/commit/e59237909386e672f8b0e22004cc5e4196221321)

2. It would be nice to translate NCBI's useless error message to something intelligible.

I'll settle for number 1 for now though.

R

On 09/13/2012 06:33 PM, Jason Stajich wrote:
> Rob - that accession does not exist, but it works with a valid accession, what would you like it to do when the accession is not loadable?
>
> perl scripts/index/bp_fetch.pl net::genbank:JX295726.1
>
> On Sep 13, 2012, at 11:42 AM, Robert Buels wrote:
>
>> Sigh.
>>
>> rob at x bioperl-live$ perl -I. scripts/*/bp_fetch.pl net::genbank:X47072
>> Sequence X47072 in Database genbank in net::genbank:X47072 is not loadable. Skipping.
>> >> Error >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: id 'X47072' does not exist >> STACK: Error::throw >> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:486 >> STACK: Bio::DB::WebDBSeqI::get_Seq_by_id Bio/DB/WebDBSeqI.pm:167 >> STACK: scripts/index/bp_fetch.pl:307 >> ----------------------------------------------------------- >> Sequence X47072 in Database genbank in net::genbank:X47072 is not loadable. Skipping. >> >> Error >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: WebDBSeqI Request Error: >> HTTP/1.1 400 Bad Request >> Cache-Control: private >> Connection: close >> Date: Thu, 13 Sep 2012 18:40:49 GMT >> Server: Apache >> Content-Type: text/plain; charset=UTF-8 >> Access-Control-Allow-Origin: * >> Client-Date: Thu, 13 Sep 2012 18:40:49 GMT >> Client-Peer: 165.112.7.20:80 >> Client-Response-Num: 1 >> Client-Transfer-Encoding: chunked >> NCBI-SID: 73BF4D7E05228B11_0006SID >> Set-Cookie: ncbi_sid=73BF4D7E05228B11_0006SID; domain=.nih.gov; path=/; expires=Fri, 13 Sep 2013 18:40:49 GMT >> >> Cannot process ID list: OUT OF RANGE. Parameter retstart=0 + 1 is greater than number of IDs supplied in request. Incoming request includes 0 IDs. 
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:486
>> STACK: Bio::DB::WebDBSeqI::_stream_request Bio/DB/WebDBSeqI.pm:773
>> STACK: Bio::DB::WebDBSeqI::get_seq_stream Bio/DB/WebDBSeqI.pm:467
>> STACK: Bio::DB::WebDBSeqI::get_Stream_by_id Bio/DB/WebDBSeqI.pm:288
>> STACK: Bio::DB::WebDBSeqI::get_Seq_by_id Bio/DB/WebDBSeqI.pm:158
>> STACK: scripts/index/bp_fetch.pl:307
>> -----------------------------------------------------------
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>
>

From cjfields at illinois.edu Fri Sep 14 16:23:10 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 14 Sep 2012 20:23:10 +0000
Subject: [Bioperl-l] bp_fetch.pl genbank example doesn't work anymore
In-Reply-To: <50538C33.5080904@gmail.com>
References: <50522900.1030806@gmail.com> <5BD449D2-0177-4CB4-87A2-8655D01E20C6@gmail.com> <50538C33.5080904@gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BB0645@CHIMBX5.ad.uillinois.edu>

Should the script try to retrieve 0 IDs? That seems like a bug.

chris

On Sep 14, 2012, at 2:57 PM, Robert Buels wrote:

> Well, two things then:
>
> 1. the POD example should be updated to one that obviously works (I did this just now, https://github.com/bioperl/bioperl-live/commit/e59237909386e672f8b0e22004cc5e4196221321)
>
> 2. It would be nice to translate NCBI's useless error message to something intelligible.
>
> I'll settle for number 1 for now though.
>
> R
>
> On 09/13/2012 06:33 PM, Jason Stajich wrote:
>> Rob - that accession does not exist, but it works with a valid accession, what would you like it to do when the accession is not loadable?
>>
>> perl scripts/index/bp_fetch.pl net::genbank:JX295726.1
>>
>> On Sep 13, 2012, at 11:42 AM, Robert Buels wrote:
>>
>>> Sigh.
>>>
>>> rob at x bioperl-live$ perl -I.
scripts/*/bp_fetch.pl net::genbank:X47072 >>> Sequence X47072 in Database genbank in net::genbank:X47072 is not loadable. Skipping. >>> >>> Error >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: id 'X47072' does not exist >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:486 >>> STACK: Bio::DB::WebDBSeqI::get_Seq_by_id Bio/DB/WebDBSeqI.pm:167 >>> STACK: scripts/index/bp_fetch.pl:307 >>> ----------------------------------------------------------- >>> Sequence X47072 in Database genbank in net::genbank:X47072 is not loadable. Skipping. >>> >>> Error >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: WebDBSeqI Request Error: >>> HTTP/1.1 400 Bad Request >>> Cache-Control: private >>> Connection: close >>> Date: Thu, 13 Sep 2012 18:40:49 GMT >>> Server: Apache >>> Content-Type: text/plain; charset=UTF-8 >>> Access-Control-Allow-Origin: * >>> Client-Date: Thu, 13 Sep 2012 18:40:49 GMT >>> Client-Peer: 165.112.7.20:80 >>> Client-Response-Num: 1 >>> Client-Transfer-Encoding: chunked >>> NCBI-SID: 73BF4D7E05228B11_0006SID >>> Set-Cookie: ncbi_sid=73BF4D7E05228B11_0006SID; domain=.nih.gov; path=/; expires=Fri, 13 Sep 2013 18:40:49 GMT >>> >>> Cannot process ID list: OUT OF RANGE. Parameter retstart=0 + 1 is greater than number of IDs supplied in request. Incoming request includes 0 IDs. 
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:486
>>> STACK: Bio::DB::WebDBSeqI::_stream_request Bio/DB/WebDBSeqI.pm:773
>>> STACK: Bio::DB::WebDBSeqI::get_seq_stream Bio/DB/WebDBSeqI.pm:467
>>> STACK: Bio::DB::WebDBSeqI::get_Stream_by_id Bio/DB/WebDBSeqI.pm:288
>>> STACK: Bio::DB::WebDBSeqI::get_Seq_by_id Bio/DB/WebDBSeqI.pm:158
>>> STACK: scripts/index/bp_fetch.pl:307
>>> -----------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Mon Sep 17 08:13:51 2012
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 17 Sep 2012 12:13:51 +0000
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
Message-ID:

Hello all,
If anyone is experimenting with Neo4j (www.neo4j.org) and needs a full-featured Perl OO interface, please give REST::Neo4p (https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT!
cheers,
MAJ

From guhli007 at umn.edu Mon Sep 17 10:32:12 2012
From: guhli007 at umn.edu (Joseph Guhlin)
Date: Mon, 17 Sep 2012 09:32:12 -0500
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
In-Reply-To:
References:
Message-ID:

Hey thanks. I've been using the REST interface directly with Mojo::UA but will check this out. I've got a relatively large project that uses Neo4j and am now just writing the interface library.

Is it possible to build a neo4j batch importer in perl or does it require java or do you know? I've got > 3.5mln nodes going in and several indexes, and it increases for each organism.

--Joseph

On Mon, Sep 17, 2012 at 7:13 AM, Mark A.
Jensen wrote:

> Hello all,
> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a
> full-featured Perl OO interface, please give REST::Neo4p (
> https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT!
> cheers,
> MAJ
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From hlapp at drycafe.net Mon Sep 17 11:00:43 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 17 Sep 2012 11:00:43 -0400
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
In-Reply-To:
References:
Message-ID: <44E5DC0A-7056-4209-BB07-F86A32F96635@drycafe.net>

Very nice, Mark! -hilmar

On Sep 17, 2012, at 8:13 AM, Mark A. Jensen wrote:

> Hello all,
> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a full-featured Perl OO interface, please give REST::Neo4p (https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT!
> cheers,
> MAJ
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From cjfields at illinois.edu Mon Sep 17 13:08:28 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 17 Sep 2012 17:08:28 +0000
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
In-Reply-To: <44E5DC0A-7056-4209-BB07-F86A32F96635@drycafe.net>
References: <44E5DC0A-7056-4209-BB07-F86A32F96635@drycafe.net>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BB2D13@CHIMBX5.ad.uillinois.edu>

Agree, very nice!

chris

On Sep 17, 2012, at 10:00 AM, Hilmar Lapp wrote:

> Very nice, Mark! -hilmar
>
> On Sep 17, 2012, at 8:13 AM, Mark A.
Jensen wrote:
>
>> Hello all,
>> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a full-featured Perl OO interface, please give REST::Neo4p (https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT!
>> cheers,
>> MAJ
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon Sep 17 13:28:42 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 17 Sep 2012 17:28:42 +0000
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33BB2D13@CHIMBX5.ad.uillinois.edu>
References: <44E5DC0A-7056-4209-BB07-F86A32F96635@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF33BB2D13@CHIMBX5.ad.uillinois.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BB2DBC@CHIMBX5.ad.uillinois.edu>

Following up on this, any specific modules where this would be a boon? I could see this being very useful for Bio::Tree (and Jason's recent work on creating a persistent backend for storing tree info). Bio::Ontology also?

chris

On Sep 17, 2012, at 12:08 PM, "Fields, Christopher J" wrote:

> Agree, very nice!
>
> chris
>
> On Sep 17, 2012, at 10:00 AM, Hilmar Lapp wrote:
>
>> Very nice, Mark! -hilmar
>>
>> On Sep 17, 2012, at 8:13 AM, Mark A. Jensen wrote:
>>
>>> Hello all,
>>> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a full-featured Perl OO interface, please give REST::Neo4p (https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT!
>>> cheers,
>>> MAJ
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hlapp at drycafe.net Mon Sep 17 13:44:14 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 17 Sep 2012 11:44:14 -0600
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33BB2DBC@CHIMBX5.ad.uillinois.edu>
References: <44E5DC0A-7056-4209-BB07-F86A32F96635@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF33BB2D13@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF33BB2DBC@CHIMBX5.ad.uillinois.edu>
Message-ID:

Yes, indeed. And lo and behold, the Open Tree of Life (http://opentreeoflife.org) folks are at present building their tree store on top of Neo4J. They're mostly Python, though. (Which I'm sure our Biopython colleagues will love to hear.)

-hilmar

On Sep 17, 2012, at 11:28 AM, Fields, Christopher J wrote:

> Following up on this, any specific modules where this would be a boon? I could see this being very useful for Bio::Tree (and Jason's recent work on creating a persistent backend for storing tree info). Bio::Ontology also?
>
> chris
>
> On Sep 17, 2012, at 12:08 PM, "Fields, Christopher J" wrote:
>
>> Agree, very nice!
>>
>> chris
>>
>> On Sep 17, 2012, at 10:00 AM, Hilmar Lapp wrote:
>>
>>> Very nice, Mark! -hilmar
>>>
>>> On Sep 17, 2012, at 8:13 AM, Mark A.
Jensen wrote:
>>>
>>>> Hello all,
>>>> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a full-featured Perl OO interface, please give REST::Neo4p (https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT!
>>>> cheers,
>>>> MAJ
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From guhli007 at umn.edu Mon Sep 17 14:21:29 2012
From: guhli007 at umn.edu (Joseph Guhlin)
Date: Mon, 17 Sep 2012 13:21:29 -0500
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
In-Reply-To:
References: <44E5DC0A-7056-4209-BB07-F86A32F96635@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF33BB2D13@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF33BB2DBC@CHIMBX5.ad.uillinois.edu>
Message-ID:

Another project using Neo4j is Bio4j. The database is generated using java but can be queried through the REST interface with CYPHER.

I haven't had a chance to build it yet or look at it, but may do so in the near future.

I am using Neo4j for my own project and the only thing generic enough I could contribute with (that also overlaps with another module) is The Gene Ontology.
I import it (~60k nodes, ~600k relationships, but I ignore a few that others may consider key) and could clean up and contribute that code. I do use a bulk import script written in java that uses the native libraries to process everything so would have to make it work for this module directly (or keep the java import, as it is open source).

--Joseph

On Mon, Sep 17, 2012 at 12:44 PM, Hilmar Lapp wrote:

> Yes, indeed. And lo and behold, the Open Tree of Life (
> http://opentreeoflife.org) folks are at present building their tree store
> on top of Neo4J. They're mostly Python, though. (Which I'm sure our
> Biopython colleagues will love to hear.)
>
> -hilmar
>
> On Sep 17, 2012, at 11:28 AM, Fields, Christopher J wrote:
>
> > Following up on this, any specific modules where this would be a boon?
> I could see this being very useful for Bio::Tree (and Jason's recent work
> on creating a persistent backend for storing tree info). Bio::Ontology also?
> >
> > chris
> >
> > On Sep 17, 2012, at 12:08 PM, "Fields, Christopher J" <
> cjfields at illinois.edu> wrote:
> >
> >> Agree, very nice!
> >>
> >> chris
> >>
> >> On Sep 17, 2012, at 10:00 AM, Hilmar Lapp wrote:
> >>
> >>> Very nice, Mark! -hilmar
> >>>
> >>> On Sep 17, 2012, at 8:13 AM, Mark A. Jensen wrote:
> >>>
> >>>> Hello all,
> >>>> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a
> full-featured Perl OO interface, please give REST::Neo4p (
> https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT!
> >>>> cheers,
> >>>> MAJ
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>> --
> >>> ===========================================================
> >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> >>> ===========================================================
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From guhli007 at umn.edu Mon Sep 17 14:22:39 2012
From: guhli007 at umn.edu (Joseph Guhlin)
Date: Mon, 17 Sep 2012 13:22:39 -0500
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
In-Reply-To:
References: <44E5DC0A-7056-4209-BB07-F86A32F96635@drycafe.net> <118F034CF4C3EF48A96F86CE585B94BF33BB2D13@CHIMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF33BB2DBC@CHIMBX5.ad.uillinois.edu>
Message-ID:

Forgot the link, sorry:

http://www.bio4j.com/

"Schema" is here:
http://blog.bio4j.com/2012/05/new-bio4j-general-domain-model-schema-available/

--Joseph

On Mon, Sep 17, 2012 at 1:21 PM, Joseph Guhlin wrote:

> Another project using Neo4j is Bio4j. The database is generated using java
> but can be queried through the REST interface with CYPHER.
> > I haven't had a chance to build it yet or look at it, but may do so in the > near future. > > I am using Neo4j for my own project and the only thing generic enough I > could contribute with (that also overlaps with another module) is The Gene > Ontology. I import it (~60k node, ~600k relationships, but I ignore a few > that others may consider key) and could clean up and contribute that code. > I do use a bulk import script written in java that uses the native > libraries to process everything so would have to make it work for this > module directly (or keep the java import, as it is open source). > > --Joseph > > On Mon, Sep 17, 2012 at 12:44 PM, Hilmar Lapp wrote: > >> Yes, indeed. And lo and behold, the Open Tree of Life ( >> http://opentreeoflife.org) folks are at present building their tree >> store on top of Neo4J. They're mostly Python, though. (Which I'm sure our >> Biopython colleagues will love to hear.) >> >> -hilmar >> >> On Sep 17, 2012, at 11:28 AM, Fields, Christopher J wrote: >> >> > Following up on this, any specific modules where this would be a boon? >> I could see this being very useful for Bio::Tree (and Jason's recent work >> on creating a persistent backend for storing tree info). Bio::Ontology also? >> > >> > chris >> > >> > On Sep 17, 2012, at 12:08 PM, "Fields, Christopher J" < >> cjfields at illinois.edu> wrote: >> > >> >> Agree, very nice! >> >> >> >> chris >> >> >> >> On Sep 17, 2012, at 10:00 AM, Hilmar Lapp wrote: >> >> >> >>> Very nice, Mark! -hilmar >> >>> >> >>> On Sep 17, 2012, at 8:13 AM, Mark A. Jensen wrote: >> >>> >> >>>> Hello all, >> >>>> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a >> full-featured Perl OO interface, please give REST::Neo4p ( >> https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT! 
>> >>>> cheers,
>> >>>> MAJ
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> Bioperl-l mailing list
>> >>>> Bioperl-l at lists.open-bio.org
>> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>>
>> >>> --
>> >>> ===========================================================
>> >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> >>> ===========================================================
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Bioperl-l mailing list
>> >>> Bioperl-l at lists.open-bio.org
>> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>
>> >>
>> >> _______________________________________________
>> >> Bioperl-l mailing list
>> >> Bioperl-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

From maj at fortinbras.us Mon Sep 17 15:02:12 2012
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 17 Sep 2012 19:02:12 +0000
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
Message-ID:

W00t! Glad to see there is scope for this; now I can say I did one useful thing for BP in two years! There is batch support (as well as traversals) in the REST api; looks like the next two features I should add.
cheers all
MAJ

Batch is an interesting problem,

>-----Original Message-----
>From: Joseph Guhlin [mailto:guhli007 at umn.edu]
>Sent: Monday, September 17, 2012 02:21 PM
>To: 'Hilmar Lapp'
>Cc: 'Fields, Christopher J', , 'Mark A. Jensen'
>Subject: Re: [Bioperl-l] New Perl ORM for graph db Neo4j
>
>Another project using Neo4j is Bio4j.
The database is generated using java >but can be queried through the REST interface with CYPHER. > >I haven't had a chance to build it yet or look at it, but may do so in the >near future. > >I am using Neo4j for my own project and the only thing generic enough I >could contribute with (that also overlaps with another module) is The Gene >Ontology. I import it (~60k node, ~600k relationships, but I ignore a few >that others may consider key) and could clean up and contribute that code. >I do use a bulk import script written in java that uses the native >libraries to process everything so would have to make it work for this >module directly (or keep the java import, as it is open source). > >--Joseph > >On Mon, Sep 17, 2012 at 12:44 PM, Hilmar Lapp wrote: > >> Yes, indeed. And lo and behold, the Open Tree of Life ( >> http://opentreeoflife.org) folks are at present building their tree store >> on top of Neo4J. They're mostly Python, though. (Which I'm sure our >> Biopython colleagues will love to hear.) >> >> -hilmar >> >> On Sep 17, 2012, at 11:28 AM, Fields, Christopher J wrote: >> >> > Following up on this, any specific modules where this would be a boon? >> I could see this being very useful for Bio::Tree (and Jason's recent work >> on creating a persistent backend for storing tree info). Bio::Ontology also? >> > >> > chris >> > >> > On Sep 17, 2012, at 12:08 PM, "Fields, Christopher J" < >> cjfields at illinois.edu> wrote: >> > >> >> Agree, very nice! >> >> >> >> chris >> >> >> >> On Sep 17, 2012, at 10:00 AM, Hilmar Lapp wrote: >> >> >> >>> Very nice, Mark! -hilmar >> >>> >> >>> On Sep 17, 2012, at 8:13 AM, Mark A. Jensen wrote: >> >>> >> >>>> Hello all, >> >>>> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a >> full-featured Perl OO interface, please give REST::Neo4p ( >> https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via RT! 
>> >>>> cheers, >> >>>> MAJ >> >>>> >> >>>> >> >>>> _______________________________________________ >> >>>> Bioperl-l mailing list >> >>>> Bioperl-l at lists.open-bio.org >> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> >> >>> -- >> >>> =========================================================== >> >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> >>> =========================================================== >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bill_zt at sina.com Tue Sep 18 05:28:19 2012 From: bill_zt at sina.com (Tao Zhu) Date: Tue, 18 Sep 2012 17:28:19 +0800 Subject: [Bioperl-l] converting to clustalw alignment format Message-ID: <50583EB3.4040607@sina.com> I have a DNA seq alignment in FASTA format: > scry TATAAAAACATTAAGTGCTACTGCACTGATATAATTGTTAAC > soct TACAAAAATTTCAAATGCTATTGTACTGATATCATTGTCGAC > spom TTTAAAAATGTAGAGTGTTATTGCACTAATTTAAGCATTGAC > sjap GAAAAAGAAATCGAGTGTTATTGTACAGACTTGATTGTAAAC I changed it to CLUSTALW format using Bio::AlignIO my $in = shift; my $out = shift; use Bio::AlignIO; my $align_obj 
= Bio::AlignIO->new(-file=>$in, -format=>'fasta')->next_aln; my $writeio_obj = Bio::AlignIO->new(-file=>">$out", -format=>'clustalw'); $writeio_obj->write_aln($align_obj); The result is: scry/1-139 TATAAAAACATTAAGTGCTACTGCACTGATATAATTGTTAAC soct/1-137 TACAAAAATTTCAAATGCTATTGTACTGATATCATTGTCGAC spom/1-135 TTTAAAAATGTAGAGTGTTATTGCACTAATTTAAGCATTGAC sjap/1-87 GAAAAAGAAATCGAGTGTTATTGTACAGACTTGATTGTAAAC *** * * * ** ** ** ** * * * * ** I didn't want positions like "/1-139" to be shown, I just want the same sequence name as original. How should I do to eliminate positions like "/1-XXX"? Thank you! From roy.chaudhuri at gmail.com Tue Sep 18 07:41:57 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 18 Sep 2012 12:41:57 +0100 Subject: [Bioperl-l] converting to clustalw alignment format In-Reply-To: <50583EB3.4040607@sina.com> References: <50583EB3.4040607@sina.com> Message-ID: Include the option: -displayname_flat=>1 when you create your writeio_obj, ie.: my $writeio_obj = Bio::AlignIO->new(-file=>">$out", -format=>'clustalw', -displayname_flat=>1); See: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/AlignIO.html#POD1 Cheers, Roy. 
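For reference, the same effect can probably also be had on the alignment object itself rather than on the writer. What follows is a minimal sketch under stated assumptions, not code from the thread: the subroutine name fasta_to_clustalw and the argument handling are made up for illustration, and it relies on the set_displayname_flat method of Bio::SimpleAlign and on the input holding a single alignment.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;

# Hypothetical helper (the name is an assumption, chosen for illustration):
# convert one FASTA alignment to ClustalW format with bare sequence names.
sub fasta_to_clustalw {
    my ($in, $out) = @_;

    my $reader = Bio::AlignIO->new(-file => $in,     -format => 'fasta');
    my $writer = Bio::AlignIO->new(-file => ">$out", -format => 'clustalw');

    # Read the single alignment held in the input file.
    my $aln = $reader->next_aln
        or die "No alignment found in $in\n";

    # Display each sequence by its id alone ("scry"), not "scry/1-139".
    $aln->set_displayname_flat;

    $writer->write_aln($aln);
    $writer->close;    # flush and close the output file
    return;
}

# Run the conversion only when given exactly two arguments:
# the input FASTA file and the output file name.
fasta_to_clustalw(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing, it would be called with the input and output file names as its two arguments. Setting the flag on the alignment is mostly useful when the same object is written out more than once; for a one-off conversion the -displayname_flat option above is shorter.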
From dan.nasko at gmail.com Thu Sep 20 08:47:55 2012
From: dan.nasko at gmail.com (Dan Nasko)
Date: Thu, 20 Sep 2012 08:47:55 -0400
Subject: [Bioperl-l] Update of SeqIO:: fastq Module for PacBio
Message-ID: <58001297-BD1D-460C-8749-B9DCA683A557@gmail.com>

Hi,

I've recently begun working through some PacBio sequencing data and it has
been choking the current BioPerl FASTQ I/O modules. Here are the problems
I'm running into:

[1] PacBio will report quality scores up to 100 - I believe there's an upper
limit of 93 and the FASTQ parser will throw an error if that's surpassed.

[2] Very often PacBio will have one-base sequences, e.g.:

@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589
T
+
0
@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321
G
+
(

If this one-base sequence has a quality character of "0" (quality score 15),
shown above, I/O will throw the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Quality string [0@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321] of length [78] doesn't match length of sequence T [1], line: 86394
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.12/Bio/Root/Root.pm:472
STACK: Bio::SeqIO::fastq::next_dataset /Library/Perl/5.12/Bio/SeqIO/fastq.pm:102
STACK: Bio::SeqIO::fastq::next_seq /Library/Perl/5.12/Bio/SeqIO/fastq.pm:29
STACK: quality_length_filter.pl:146
-----------------------------------------------------------

For some reason, when it encounters ^0$ on the quality line, it won't see the
[\n] and will take up the next sequence's header as quality scores (i.e.
@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321
was the name of the next sequence).

Thanks,
Dan

From p.j.a.cock at googlemail.com Thu Sep 20 10:30:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 20 Sep 2012 15:30:07 +0100
Subject: [Bioperl-l] Update of SeqIO:: fastq Module for PacBio
In-Reply-To: <58001297-BD1D-460C-8749-B9DCA683A557@gmail.com>
References: <58001297-BD1D-460C-8749-B9DCA683A557@gmail.com>
Message-ID:

On Thu, Sep 20, 2012 at 1:47 PM, Dan Nasko wrote:
> Hi,
>
> I've recently begun working through some PacBio sequencing data
> and it has been choking the current BioPerl FASTQ I/O modules.
> Here are the problems I'm running into:
>
> [1] PacBio will report quality scores up to 100 - I believe there's
> an upper limit of 93 and the FASTQ parser will throw an error if
> that's surpassed.

How exactly? The 93 limit comes from the fact that the top printable
ASCII character is '~', 126 - 33 = 93. Are PacBio joining in the game
of redefining FASTQ encodings? An example would be very
interesting.

> [2] Very often PacBio will have one-base sequences, e.g.:
>
> @m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589
> T
> +
> 0
> @m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321
> G
> +
> (
>
> If this one-base sequence has a quality character of "0"
> (quality score 15), shown above, I/O will throw the following error:

I thought that BioPerl bug had been fixed... or maybe it was the
very similar situation of a quality score using the zero character?

Peter

From cjfields at illinois.edu Thu Sep 20 12:08:21 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 20 Sep 2012 16:08:21 +0000
Subject: [Bioperl-l] Update of SeqIO:: fastq Module for PacBio
In-Reply-To:
References: <58001297-BD1D-460C-8749-B9DCA683A557@gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BC0E0C@CITESMBX5.ad.uillinois.edu>

On Sep 20, 2012, at 9:30 AM, Peter Cock wrote:
> I thought that BioPerl bug had been fixed... or maybe it was the
> very similar situation of a quality score using the zero character?
>
> Peter

This should be fixed. Is this using the latest CPAN release? The latest
code from GitHub?

chris

From p.j.a.cock at googlemail.com Thu Sep 20 12:31:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 20 Sep 2012 17:31:50 +0100
Subject: [Bioperl-l] Update of SeqIO:: fastq Module for PacBio
In-Reply-To: <002159B2-0A4A-4B7B-863C-EE26D21088DE@gmail.com>
References: <58001297-BD1D-460C-8749-B9DCA683A557@gmail.com> <002159B2-0A4A-4B7B-863C-EE26D21088DE@gmail.com>
Message-ID:

Thanks Dan - I got the attachment too. I've CC'd the list again.

On Thu, Sep 20, 2012 at 5:01 PM, Dan Nasko wrote:
> Hi Peter,
>
> Not sure how familiar you are with PacBio, but here's a quick synopsis.
>
> For each read the sequencer sequences 2 to 6 copies of the read at extremely
> low quality (0-14). Many of these reads are exceedingly small (1 to 3 bases)
> -- which is where I run into that issue of a quality encoded value of 0
> throwing an error.
>
> From there these 2 to 6 copies of the read are aligned into a final
> consensus sequence which will have extremely high quality scores -- which is
> where I'm getting this >93 problem.
>
> I was at a PacBio conference a couple of weeks ago in Baltimore and one of
> their representatives assured me they're encoding in Phred33, so this isn't
> the issue. Using Perl's ord function I created a hash indexing quality
> character, the score, and number of occurrences. Here are the last lines of
> output:
>
> ...
> y 88 3683
> z 89 3662
> { 90 4363
> | 91 3750
> } 92 2831
> ~ 93 3150
>   94 9219
> ? 95 2221
> ? 96 4232
> ? 97 1554
> ? 98 1399
> ? 99 1723
> ? 100 25991

Right - I understand, but believe this is not valid FASTQ because they
are using non-printing characters, i.e. I think BioPerl is doing the right
thing here (and Biopython, BioRuby, BioJava, EMBOSS should too):
http://dx.doi.org/10.1093/nar/gkp1137

To stay within the printable character set, PacBio must cap their FASTQ
output at chr(126), i.e. '~'. This equivalently means capping the PHRED
scores at 93 (although it might be wise to cap at less than that as IIRC
some tools may treat the max value specially - so perhaps cap at 90?).

The same restriction also applies to SAM/BAM, since SAM also
uses the Sanger FASTQ encoding (PHRED + 33). This range is
explicit in the specification as [!-~]+ (regular expression).

Do you have contact details for a PacBio technical representative?
I think they will need to fix this.

Regards,

Peter

From dan.nasko at gmail.com Thu Sep 20 14:00:11 2012
From: dan.nasko at gmail.com (Dan Nasko)
Date: Thu, 20 Sep 2012 14:00:11 -0400
Subject: [Bioperl-l] Update of SeqIO:: fastq Module for PacBio
In-Reply-To:
References: <58001297-BD1D-460C-8749-B9DCA683A557@gmail.com> <002159B2-0A4A-4B7B-863C-EE26D21088DE@gmail.com>
Message-ID:

Peter,

Here is Frank Boellmann's email; he's a Field Application Specialist and
Bioinformatician at PacBio who is generally great at resolving these issues:
fboellmann at pacificbiosciences.com

I've gone ahead and contacted another one of their software engineers and am
waiting for a response from him.

Dan

From maj at fortinbras.us Sun Sep 23 17:46:32 2012
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Sun, 23 Sep 2012 21:46:32 +0000
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
Message-ID:

All,

I have added support for batch processing to REST::Neo4p; please have a look
at https://metacpan.org/module/REST::Neo4p::Batch.

To use it, you write in the context of the object framework, but surround
your code with a "batch" block; e.g.:

#!perl
# loader...
use REST::Neo4p;
use REST::Neo4p::Batch;

open $f, shift() or die $!;
batch {
  # one node per tab-separated "name<TAB>value" input line
  while (<$f>) {
    chomp;
    ($name, $value) = split /\t/;
    REST::Neo4p::Node->new({name => $name, value => $value});
  }
} 'discard_objs';
exit(0);

The code takes care of packaging the contained individual API calls and
sending them off to the server in neo4j's batch format.

Please send me bugs on RT
(https://rt.cpan.org/Public/Bug/Report.html?Queue=REST-Neo4p)

cheers
MAJ

From guhli007 at umn.edu Tue Sep 25 10:53:37 2012
From: guhli007 at umn.edu (Joseph Guhlin)
Date: Tue, 25 Sep 2012 09:53:37 -0500
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
In-Reply-To:
References:
Message-ID:

Thanks, I'll try to run some tests on this this week. Is there a limit to
the number of things you can do?
Should I stagger at 10k or 100k or somewhere, if you know, or will it
handle that itself?

Best,
--Joseph

From maj at fortinbras.us Tue Sep 25 13:07:25 2012
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 25 Sep 2012 17:07:25 +0000
Subject: [Bioperl-l] New Perl ORM for graph db Neo4j
Message-ID:

Hi Joseph,

Yes, it does its own chunking.
You can control with $REST::Neo4p::JOB_CHUNK which defaults to 1024. That is a completely untested number, so any info is welcomed. The running by chunk should be transparent to the user, so if it borks let me know that too-- cheers MAJ >-----Original Message----- >From: Joseph Guhlin [mailto:guhli007 at umn.edu] >Sent: Tuesday, September 25, 2012 10:53 AM >To: 'Mark A. Jensen' >Cc: 'Hilmar Lapp', 'Fields, Christopher J', >Subject: Re: [Bioperl-l] New Perl ORM for graph db Neo4j > >Thanks, I'll try to run some tests on this this week. Is there a limit to >the number of things you can do? Should I stagger at 10k or 100k or >somewhere or do you know, or will it handle it itself? > >Best, >--Joseph > >On Sun, Sep 23, 2012 at 4:46 PM, Mark A. Jensen wrote: > >> All, >> I have added support for batch processing to REST::Neo4p, please have a >> look at https://metacpan.org/module/REST::Neo4p::Batch. >> >> To use it, you write in the context of the object framework, but surround >> your code with a "batch" block; e.g.: >> >> #!perl >> # loader... >> use REST::Neo4p; >> use REST::Neo4p::Batch; >> >> open $f, shift() or die $!; >> batch { >> while (<$f>) { >> chomp; >> ($name, $value) = split /\t/; >> REST::Neo4p::Node->new({name => $name, value => $value}); >> } 'discard_objs'; >> exit(0); >> >> The code takes care of packaging the contained individual API calls and >> sending them off to the server in neo4j's batch format. >> >> Please send me bugs on RT ( >> https://rt.cpan.org/Public/Bug/Report.html?Queue=REST-Neo4p) >> >> cheers MAJ >> >> >-----Original Message----- >> >From: Mark A. Jensen [mailto:maj at fortinbras.us] >> >Sent: Monday, September 17, 2012 03:02 PM >> >To: 'Joseph Guhlin', 'Hilmar Lapp' >> >Cc: 'Fields, Christopher J', , 'Mark A. >> Jensen' >> >Subject: Re: [Bioperl-l] New Perl ORM for graph db Neo4j >> > >> >W00t! Glad to see there is scope for this; now I can say I did one useful >> thing for BP in two years! 
>> > >> >There is batch support (as well as traversals) in the REST api; looks >> like the next two features I should add. >> > >> >cheers all MAJ >> > >> >Batch is an interesting problem, >> >>-----Original Message----- >> >>From: Joseph Guhlin [mailto:guhli007 at umn.edu] >> >>Sent: Monday, September 17, 2012 02:21 PM >> >>To: 'Hilmar Lapp' >> >>Cc: 'Fields, Christopher J', , 'Mark A. >> Jensen' >> >>Subject: Re: [Bioperl-l] New Perl ORM for graph db Neo4j >> >> >> >>Another project using Neo4j is Bio4j. The database is generated using >> java >> >>but can be queried through the REST interface with CYPHER. >> >> >> >>I haven't had a chance to build it yet or look at it, but may do so in >> the >> >>near future. >> >> >> >>I am using Neo4j for my own project and the only thing generic enough I >> >>could contribute with (that also overlaps with another module) is The >> Gene >> >>Ontology. I import it (~60k node, ~600k relationships, but I ignore a few >> >>that others may consider key) and could clean up and contribute that >> code. >> >>I do use a bulk import script written in java that uses the native >> >>libraries to process everything so would have to make it work for this >> >>module directly (or keep the java import, as it is open source). >> >> >> >>--Joseph >> >> >> >>On Mon, Sep 17, 2012 at 12:44 PM, Hilmar Lapp wrote: >> >> >> >>> Yes, indeed. And lo and behold, the Open Tree of Life ( >> >>> http://opentreeoflife.org) folks are at present building their tree >> store >> >>> on top of Neo4J. They're mostly Python, though. (Which I'm sure our >> >>> Biopython colleagues will love to hear.) >> >>> >> >>> -hilmar >> >>> >> >>> On Sep 17, 2012, at 11:28 AM, Fields, Christopher J wrote: >> >>> >> >>> > Following up on this, any specific modules where this would be a >> boon? >> >>> I could see this being very useful for Bio::Tree (and Jason's recent >> work >> >>> on creating a persistent backend for storing tree info). Bio::Ontology >> also? 
>> >>> > >> >>> > chris >> >>> > >> >>> > On Sep 17, 2012, at 12:08 PM, "Fields, Christopher J" < >> >>> cjfields at illinois.edu> wrote: >> >>> > >> >>> >> Agree, very nice! >> >>> >> >> >>> >> chris >> >>> >> >> >>> >> On Sep 17, 2012, at 10:00 AM, Hilmar Lapp >> wrote: >> >>> >> >> >>> >>> Very nice, Mark! -hilmar >> >>> >>> >> >>> >>> On Sep 17, 2012, at 8:13 AM, Mark A. Jensen wrote: >> >>> >>> >> >>> >>>> Hello all, >> >>> >>>> If anyone is experimenting with Neo4j (www.neo4j.org) and needs a >> >>> full-featured Perl OO interface, please give REST::Neo4p ( >> >>> https://metacpan.org/module/REST::Neo4p) a try. Send me the bugs via >> RT! >> >>> >>>> cheers, >> >>> >>>> MAJ >> >>> >>>> >> >>> >>>> >> >>> >>>> _______________________________________________ >> >>> >>>> Bioperl-l mailing list >> >>> >>>> Bioperl-l at lists.open-bio.org >> >>> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> >>> >> >>> >>> -- >> >>> >>> =========================================================== >> >>> >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> >>> >>> =========================================================== >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> _______________________________________________ >> >>> >>> Bioperl-l mailing list >> >>> >>> Bioperl-l at lists.open-bio.org >> >>> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> >> >> >>> >> >> >>> >> _______________________________________________ >> >>> >> Bioperl-l mailing list >> >>> >> Bioperl-l at lists.open-bio.org >> >>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> > >> >>> >> >>> -- >> >>> =========================================================== >> >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> >>> =========================================================== >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l at 
lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> >> >> >> >>_______________________________________________ >> >>Bioperl-l mailing list >> >>Bioperl-l at lists.open-bio.org >> >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > >> > >> >_______________________________________________ >> >Bioperl-l mailing list >> >Bioperl-l at lists.open-bio.org >> >http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> >> > From damienmohalloran at yahoo.co.uk Wed Sep 26 16:55:49 2012 From: damienmohalloran at yahoo.co.uk (Damien OHalloran) Date: Wed, 26 Sep 2012 21:55:49 +0100 (BST) Subject: [Bioperl-l] Help connecting to Genbank...? In-Reply-To: References: Message-ID: <1348692949.12401.YahooMailNeo@web133206.mail.ir2.yahoo.com> I'm a newbie trying to download DNA sequences from Genbank. I used the script below but I get the error "Can't locate Bio/SeqIO.pm" - help? #!/usr/bin/perl -w use Bio::SeqIO; use Bio::DB::GenBank; $genBank = new Bio::DB::GenBank; my $seq = $genBank->get_Seq_by_acc('AF060485'); my $seqOut = new Bio::SeqIO(-format => 'genbank'); $seqOut->write_seq($seq); From shalabh.sharma7 at gmail.com Wed Sep 26 17:13:22 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 26 Sep 2012 17:13:22 -0400 Subject: [Bioperl-l] Help connecting to Genbank...? In-Reply-To: <1348692949.12401.YahooMailNeo@web133206.mail.ir2.yahoo.com> References: <1348692949.12401.YahooMailNeo@web133206.mail.ir2.yahoo.com> Message-ID: Hey Damien, Can you send the error you are getting? This should work fine. -Shalabh On Wed, Sep 26, 2012 at 4:55 PM, Damien OHalloran < damienmohalloran at yahoo.co.uk> wrote: > > I'm a newbie trying to download DNA sequences from Genbank. I used the > script below but I get the error "Can't locate Bio/SeqIO.pm" - help?
> > #!/usr/bin/perl -w > use Bio::SeqIO; > use Bio::DB::GenBank; > $genBank = new Bio::DB::GenBank; > my $seq = $genBank->get_Seq_by_acc('AF060485'); > my $seqOut = new Bio::SeqIO(-format => 'genbank'); > $seqOut->write_seq($seq); > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From shalabh.sharma7 at gmail.com Wed Sep 26 17:13:56 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 26 Sep 2012 17:13:56 -0400 Subject: [Bioperl-l] Help connecting to Genbank...? In-Reply-To: References: <1348692949.12401.YahooMailNeo@web133206.mail.ir2.yahoo.com> Message-ID: You sure you installed bioperl ? On Wed, Sep 26, 2012 at 5:13 PM, shalabh sharma wrote: > Hey Damein, > Can you send the error you getting because this > should work fine. > > -Shalabh > > > On Wed, Sep 26, 2012 at 4:55 PM, Damien OHalloran < > damienmohalloran at yahoo.co.uk> wrote: > >> >> I'm a newbie trying to to download DNA sequences from Genbank. I used the >> script below but I get the error "cant locate Bio/SeqIO.pm" - help? 
>> >> #!/usr/bin/perl -w >> use Bio::SeqIO; >> use Bio::DB::GenBank; >> $genBank = new Bio::DB::GenBank; >> my $seq = $genBank->get_Seq_by_acc('AF060485'); >> my $seqOut = new Bio::SeqIO(-format => 'genbank'); >> $seqOut->write_seq($seq); >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From damienmohalloran at yahoo.co.uk Wed Sep 26 17:34:19 2012 From: damienmohalloran at yahoo.co.uk (Damien OHalloran) Date: Wed, 26 Sep 2012 22:34:19 +0100 (BST) Subject: [Bioperl-l] previous problem fixed Message-ID: <1348695259.49515.YahooMailNeo@web133201.mail.ir2.yahoo.com> previous problem outlined in last thread is now fixed - thanks! From thomas.sharpton at gmail.com Wed Sep 26 17:54:37 2012 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 26 Sep 2012 14:54:37 -0700 Subject: [Bioperl-l] Help connecting to Genbank...? In-Reply-To: References: <1348692949.12401.YahooMailNeo@web133206.mail.ir2.yahoo.com> Message-ID: Damien, It would be helpful to see the specific error. But, I'm guessing that Perl can't find the BioPerl package (as opposed to an internal BioPerl error). If that's the case, then check to make sure that you added the location of the BioPerl root directory (it should contain Bio/) to your PERL5LIB system variable. Hope this helps, Tom On Wed, Sep 26, 2012 at 2:13 PM, shalabh sharma wrote: > You sure you installed bioperl ? > > On Wed, Sep 26, 2012 at 5:13 PM, shalabh sharma > wrote: > > > Hey Damein, > > Can you send the error you getting because this > > should work fine. 
> > > > -Shalabh > > > > > > On Wed, Sep 26, 2012 at 4:55 PM, Damien OHalloran < > > damienmohalloran at yahoo.co.uk> wrote: > > > >> > >> I'm a newbie trying to to download DNA sequences from Genbank. I used > the > >> script below but I get the error "cant locate Bio/SeqIO.pm" - help? > >> > >> #!/usr/bin/perl -w > >> use Bio::SeqIO; > >> use Bio::DB::GenBank; > >> $genBank = new Bio::DB::GenBank; > >> my $seq = $genBank->get_Seq_by_acc('AF060485'); > >> my $seqOut = new Bio::SeqIO(-format => 'genbank'); > >> $seqOut->write_seq($seq); > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > Shalabh Sharma > > Scientific Computing Professional Associate (Bioinformatics Specialist) > > Department of Marine Sciences > > University of Georgia > > Athens, GA 30602-3636 > > > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From alexeymorozov1991 at gmail.com Thu Sep 27 00:46:41 2012 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Thu, 27 Sep 2012 13:46:41 +0900 Subject: [Bioperl-l] Help connecting to Genbank...? In-Reply-To: References: <1348692949.12401.YahooMailNeo@web133206.mail.ir2.yahoo.com> Message-ID: 2012/9/27 Thomas Sharpton > Damien, > > It would be helpful to see the specific error. But, I'm guessing that Perl > can't find the BioPerl package (as opposed to an internal BioPerl error). > If that's the case, then check to make sure that you added the location of > the BioPerl root directory (it should contain Bio/) to your PERL5LIB system > variable. 
> > Hope this helps, > Tom > > On Wed, Sep 26, 2012 at 2:13 PM, shalabh sharma > wrote: > > > You sure you installed bioperl ? > > > > On Wed, Sep 26, 2012 at 5:13 PM, shalabh sharma > > wrote: > > > > > Hey Damein, > > > Can you send the error you getting because this > > > should work fine. > > > > > > -Shalabh > > > > > > > > > On Wed, Sep 26, 2012 at 4:55 PM, Damien OHalloran < > > > damienmohalloran at yahoo.co.uk> wrote: > > > > > >> > > >> I'm a newbie trying to to download DNA sequences from Genbank. I used > > the > > >> script below but I get the error "cant locate Bio/SeqIO.pm" - help? > > >> > > >> #!/usr/bin/perl -w > > >> use Bio::SeqIO; > > >> use Bio::DB::GenBank; > > >> $genBank = new Bio::DB::GenBank; > > >> my $seq = $genBank->get_Seq_by_acc('AF060485'); > > >> my $seqOut = new Bio::SeqIO(-format => 'genbank'); > > >> $seqOut->write_seq($seq); > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > > > > > > > > > > -- > > > Shalabh Sharma > > > Scientific Computing Professional Associate (Bioinformatics Specialist) > > > Department of Marine Sciences > > > University of Georgia > > > Athens, GA 30602-3636 > > > > > > > > > > > -- > > Shalabh Sharma > > Scientific Computing Professional Associate (Bioinformatics Specialist) > > Department of Marine Sciences > > University of Georgia > > Athens, GA 30602-3636 > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > Why not type into terminal 'sudo apt-get install bioperl' just in case you haven't installed bioperl properly? 
It will also drag along a lot of useful tools like blast and muscle. Alexey Morozov Limnological institute SB RAS Irkutsk, Russia. From antonio.nhanijr at gmail.com Thu Sep 27 04:39:41 2012 From: antonio.nhanijr at gmail.com (Antonio Nhani Jr) Date: Thu, 27 Sep 2012 10:39:41 +0200 Subject: [Bioperl-l] Clustalw alignment parsing Message-ID: <60A311B7-3CE2-49E8-AB37-1D052D9C98E4@gmail.com> Hi All, I need to parse a several seqs clustalw alignment file, searching for regions to design primers (preference for not degenerated). The amplified region must have base(s) differences, to identify each seq. Haven't found a method in Bio::SimpleAlignIO to do this? Any hint? All the very best, Antonio From e.osimo at gmail.com Thu Sep 27 08:58:57 2012 From: e.osimo at gmail.com (Emanuele F. Osimo) Date: Thu, 27 Sep 2012 14:58:57 +0200 Subject: [Bioperl-l] Bio::Graphics xyplot labels Message-ID: Dear all, is it possible to make xyplot add a label for each point, indicating the numerical value? Thanks Emanuele Osimo From scott at scottcain.net Thu Sep 27 09:37:51 2012 From: scott at scottcain.net (Scott Cain) Date: Thu, 27 Sep 2012 09:37:51 -0400 Subject: [Bioperl-l] Bio::Graphics xyplot labels In-Reply-To: References: Message-ID: Hi Emanuele, I am pretty sure you can't do that with the stock xyplot. You can specify individual colors for each point which could be scaled to the score though. Scott On Thu, Sep 27, 2012 at 8:58 AM, Emanuele F. Osimo wrote: > Dear all, > is it possible to make xyplot add a label for each point, indicating the > numerical value? > Thanks > Emanuele Osimo > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From alexeymorozov1991 at gmail.com Thu Sep 27 23:09:49 2012 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Fri, 28 Sep 2012 12:09:49 +0900 Subject: [Bioperl-l] Clustalw alignment parsing In-Reply-To: <60A311B7-3CE2-49E8-AB37-1D052D9C98E4@gmail.com> References: <60A311B7-3CE2-49E8-AB37-1D052D9C98E4@gmail.com> Message-ID: 2012/9/27 Antonio Nhani Jr > Hi All, > > I need to parse a several seqs clustalw alignment file, searching for > regions to design primers (preference for not degenerated). > The amplified region must have base(s) differences, to identify each seq. > Haven't found a method in Bio::SimpleAlignIO to do this? > > Any hint? > > All the very best, > > Antonio > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > Hi Antonio, Seems like there is no primer design module, Bio::SeqFeature::Primer allows only to annotate them and find melting temperatures. Is it truly necessary to find primers via pure bioperl? I mean, there are lots of primer design tools out there, and if it's part of longer pipeline you can just write a parser for their output. Alexey From cjfields at illinois.edu Fri Sep 28 10:56:49 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 28 Sep 2012 14:56:49 +0000 Subject: [Bioperl-l] Clustalw alignment parsing In-Reply-To: References: <60A311B7-3CE2-49E8-AB37-1D052D9C98E4@gmail.com> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33BD0AF9@CHIMBX5.ad.uillinois.edu> On Sep 27, 2012, at 10:09 PM, Alexey Morozov wrote: > 2012/9/27 Antonio Nhani Jr > >> Hi All, >> >> I need to parse a several seqs clustalw alignment file, searching for >> regions to design primers (preference for not degenerated). >> The amplified region must have base(s) differences, to identify each seq. 
>> Haven't found a method in Bio::SimpleAlignIO to do this? >> >> Any hint? >> >> All the very best, >> >> Antonio >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > Hi Antonio, > Seems like there is no primer design module, Bio::SeqFeature::Primer allows > only to annotate them and find melting temperatures. Is it truly necessary > to find primers via pure bioperl? I mean, there are lots of primer design > tools out there, and if it's part of longer pipeline you can just write a > parser for their output. > > Alexey Have a look at Bio::Tools::Primer3Redux (on CPAN) for a BioPerl-based wrapper around Primer3. https://metacpan.org/release/Bio-Tools-Primer3Redux chris From fedeabascal at yahoo.es Sat Sep 29 13:30:20 2012 From: fedeabascal at yahoo.es (Federico Abascal) Date: Sat, 29 Sep 2012 18:30:20 +0100 (BST) Subject: [Bioperl-l] Genbank query problem Message-ID: <1348939820.45184.YahooMailNeo@web171302.mail.ir2.yahoo.com> Dear colleagues, I have a script (mitobank.pl) that is used by some people. It is meant to retrieve mitochondrial genomes for a given taxonomic id. The problem arose when, some months ago, the NCBI reorganized the way genomes are queried and the script no longer worked. I have tried modifying the query string with no success. The script's query looked like this: my $seq; my $gb = new Bio::DB::GenBank; my $query = Bio::DB::Query::GenBank->new (-query => '(txid314147[Organism:exp] AND mitochondrial[title] AND genome[ti] NOT plasmid[title] NOT chromosome NOT chloroplast) OR (txid314147[Organism:exp] AND mitochondrion[title] AND genome[ti] NOT plasmid[title] NOT chromosome NOT chloroplast)', -db => 'genome'); It used to return the list of genomes available for that taxonomic id. However, the NCBI now returns a different kind of result.
I tried to modify the script and query the "nucleotide" database, but this does not work properly. Any one could help me, please? Thanks in advance, Federico From jason.stajich at gmail.com Sun Sep 30 18:37:30 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Sun, 30 Sep 2012 15:37:30 -0700 Subject: [Bioperl-l] Fwd: small tree problem References: Message-ID: <59EDC376-BE44-43D8-AD55-65A6A6896C2D@gmail.com> Greg - resending to mailing list too. Begin forwarded message: > > Subject: small tree problem > Date: September 30, 2012 3:26:39 PM PDT > To: Gregory Jordan > > Greg - > > Using the latest tree code, we're getting some errors just when writing out sequences. > > I get these errors because the method is trying to call a NodeFunctionsI method on a Tree object - I then differed it to use the root node, but I guess there can be cases where the Root is also undefined. > Can't call method "_node_from_arg" on an undefined value at XXXX/lib/perl5/Bio/Tree/TreeFunctionsI.pm line 130, > > What do you think is the best solution here? > > Jason > -- > Jason E Stajich, PhD > Assistant Professor > Plant Pathology & Microbiology > University of California, Riverside > 951.827.2363 > http://lab.stajich.org http://fungalgenomes.org http://fungidb.org http://1000.fungalgenomes.org/ > twitter @stajichlab @hyphaltip @fungalgenomes @fungidb > http://plantpathology.ucr.edu http://genomics.ucr.edu > Jason Stajich jason.stajich at gmail.com jason at bioperl.org From p.j.a.cock at googlemail.com Sun Sep 30 18:38:24 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 30 Sep 2012 23:38:24 +0100 Subject: [Bioperl-l] Update of SeqIO:: fastq Module for PacBio In-Reply-To: References: <58001297-BD1D-460C-8749-B9DCA683A557@gmail.com> <002159B2-0A4A-4B7B-863C-EE26D21088DE@gmail.com> Message-ID: Hi Dan, Did you get any reply from PacBio? 
Re: http://lists.open-bio.org/pipermail/bioperl-l/2012-September/036778.html Thanks, Peter On Thu, Sep 20, 2012 at 7:00 PM, Dan Nasko wrote: > Peter, > > Here is Frank Boellmann's email, he's a Field Application Specialist and > Bioinformatician at PacBio who is generally great at resolving these issues: > fboellmann at pacificbiosciences.com > > I've gone ahead and contacted another one of their software engineers and am > waiting for a response from him. > > Dan From jason.stajich at gmail.com Sun Sep 30 21:09:30 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Sun, 30 Sep 2012 18:09:30 -0700 Subject: [Bioperl-l] Genbank query problem In-Reply-To: <1348939820.45184.YahooMailNeo@web171302.mail.ir2.yahoo.com> References: <1348939820.45184.YahooMailNeo@web171302.mail.ir2.yahoo.com> Message-ID: <35C9CD61-0108-4214-9033-65CC8AEB06E9@gmail.com> Are they organized in the bioprojects at least? I've been working on something related: dumping genomes based on what is in the bioprojects part of NCBI. It isn't documented yet since it is still in development, but you can try these three scripts. You need to give each one a place to write with the (-b) option, and you'll want to change the query for the 1st script with the -q option. - fix the query to the taxon you want bioprojects from: https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/download_eutils_bioproject.pl - then run this to download the sequences https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/download_sequences_from_bioproject_staging.pl - then run this to get the assemblies that for some reason aren't available at nuclids in the bioprojects file but can be gleaned from the genbank file -- maybe not needed for your MT genome project anyway.
https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/download_sequences_from_bioproject_cleanup_missing.pl Jason On Sep 29, 2012, at 10:30 AM, Federico Abascal wrote: > Dear colleagues, > > I have a script (mitobank.pl) that is used by some people. It is aimed to retrieve mitochondrial genomes for a given taxonomic id. The problem arose when, some months ago, the NCBI reorganized the way genomes are queried and the script no longer worked. I have tried modifying the query string with no success. > > What the script asked for was like: > > > my $seq; > my $gb = new Bio::DB::GenBank; > my $query = Bio::DB::Query::GenBank->new > (-query =>(txid314147[Organism:exp] AND mitochondrial[title] AND genome[ti] NOT plasmid[title] NOT chromosome NOT chloroplast) OR (txid314147[Organism:exp] AND mitochondrion[title] AND genome[ti] NOT plasmid[title] NOT chromosome NOT chloroplast), > -db => 'genome'); > > It used to return the list of genomes available for that taxonomic id. However, the NCBI now returns a different kind of results. > I tried to modify the script and query the "nucleotide" database, but this does not work properly. > > Any one could help me, please? > > Thanks in advance, > Federico > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From fedeabascal at yahoo.es Sat Sep 29 06:47:09 2012 From: fedeabascal at yahoo.es (Federico Abascal) Date: Sat, 29 Sep 2012 11:47:09 +0100 (BST) Subject: [Bioperl-l] Genbank query problem Message-ID: <1348915629.74810.YahooMailNeo@web171306.mail.ir2.yahoo.com> Dear colleagues, I have a script (mitobank.pl) that is used by some people. It is aimed to retrieve mitochondrial genomes for a given taxonomic id. The problem arose when, some months ago, the NCBI reorganized the way genomes are queried and the script no longer worked. 
I have tried modifying the query string with no success. The script's query looked like this: my $seq; my $gb = new Bio::DB::GenBank; my $query = Bio::DB::Query::GenBank->new (-query => '(txid314147[Organism:exp] AND mitochondrial[title] AND genome[ti] NOT plasmid[title] NOT chromosome NOT chloroplast) OR (txid314147[Organism:exp] AND mitochondrion[title] AND genome[ti] NOT plasmid[title] NOT chromosome NOT chloroplast)', -db => 'genome'); It used to return the list of genomes available for that taxonomic id. However, the NCBI now returns a different kind of result. I tried to modify the script and query the "nucleotide" database, but this does not work properly. Could anyone help me, please? Thanks in advance, Federico
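
A minimal sketch, for illustration only, of how the query described above might be assembled as one Perl string and sent to the "nucleotide" database rather than "genome". Assumptions, none of which come from the original script: build_mito_query is a hypothetical helper name; the RUN_ENTREZ environment variable is a hypothetical guard so the network part only runs on request; and it has not been verified that these search terms return the wanted mitochondrial genomes from "nucleotide", so the terms may still need adjusting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): builds the Entrez query
# from the message above for a given NCBI taxonomy id.
sub build_mito_query {
    my ($taxid) = @_;
    my $exclude = 'NOT plasmid[title] NOT chromosome NOT chloroplast';
    # One clause per spelling used in record titles.
    my @clauses = map {
        sprintf '(txid%d[Organism:exp] AND %s[title] AND genome[ti] %s)',
            $taxid, $_, $exclude;
    } qw(mitochondrial mitochondrion);
    return join ' OR ', @clauses;
}

my $query_string = build_mito_query(314147);
print "$query_string\n";

# Network part: runs only if RUN_ENTREZ is set (an assumption of this
# sketch) and BioPerl can be loaded.
if ( $ENV{RUN_ENTREZ}
    && eval { require Bio::DB::GenBank; require Bio::DB::Query::GenBank; 1 } )
{
    my $query = Bio::DB::Query::GenBank->new(
        -db    => 'nucleotide',
        -query => $query_string,
    );
    my $gb     = Bio::DB::GenBank->new;
    my $stream = $gb->get_Stream_by_query($query);
    while ( my $seq = $stream->next_seq ) {
        print join( "\t", $seq->accession_number, $seq->length ), "\n";
    }
}
```

Run as is, the script only prints the query string, which can be pasted into the NCBI web search to check what it matches before any sequences are fetched.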