From s.denaxas at gmail.com Tue Feb 1 06:59:23 2011
From: s.denaxas at gmail.com (Spiros Denaxas)
Date: Tue, 1 Feb 2011 11:59:23 +0000
Subject: [Bioperl-l] medperl, something kinda like bioperl
In-Reply-To: <4D4749EE.3010901@cornell.edu>
References: <4D4749EE.3010901@cornell.edu>
Message-ID:

On Mon, Jan 31, 2011 at 11:46 PM, Robert Buels wrote:
> Hi Spiros,
>
> This is a fine idea. My most important piece of advice is to keep the code
> loosely coupled and flexible.
>
> Don't try to make big monolithic distributions like Bioperl. Keep the code
> as loosely-coupled as possible: think carefully before making something be a
> subclass of something else, or have some other kind of direct dependency
> upon it. Things change. Coding practices change. Technology changes too,
> both on the bio/med side, *and* on the code side.
>
> For the project to stay healthy for the long haul, it needs to stay easy
> for people to wrap their minds around the codebase, and then work on it:
> developers need to be able to focus their efforts on the code that they are
> interested in without having to worry about huge amounts of other code. For
> this to be possible, the various parts of the codebase need to stay
> organized and compartmentalized, with minimal, well-characterized dependency
> relationships between them.
>
> Good luck!
>
> Rob
>
>

Hello Rob,

thanks for the feedback. It will definitely be a learning experience for me
as well. I'm planning on setting up some sort of public resource for people
to have a look at and discuss / make a plan before actual coding gets done.
Will keep you posted.

thanks
Spiros

> Spiros Denaxas wrote:
>
>> Hello,
>>
>> I am sending this email here since I consider all people that contribute
>> and/or follow the bioperl project as the best starting point for advice on
>> a new project I am currently planning; my apologies if it's considered
>> off-topic.
>>
>> While the bioinformatics community has greatly benefitted from the Perl
>> community, with the shining example of bioperl, the medical community is
>> sadly a bit behind. I am currently employed in a public health /
>> epidemiology environment and have on numerous occasions discovered
>> opportunities to contribute code to CPAN that has made my life easier. I
>> know I am not alone, but a very quick search on CPAN for related modules
>> from the medical / biomedical domain does not return much for now.
>>
>> I recently gave a presentation at the London Perl Workshop [1] and while
>> creating it, I thought, would it be useful to have something similar to
>> bioperl for modules which largely contribute to the medical /
>> epidemiological domain? I was thinking of creating something like medperl,
>> alas similar to bioperl, but in a very very simple form. It would serve as
>> a reference point to the (albeit small) number of modules that are
>> currently on CPAN and will also hopefully urge people to contribute some
>> of their code along the way.
>>
>> So I would like to request your advice on:
>>
>> a) Can you think of any reasons for not doing this?
>> b) Does anybody know of something similar?
>> c) Does anybody feel like they could contribute?
>>
>> Regards,
>> Spiros Denaxas
>>
>> [1]
>> http://www.slideshare.net/spirosd/perl-cures-coronary-heart-disease-lpw2010
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

From clements at nescent.org Tue Feb 1 00:57:52 2011
From: clements at nescent.org (Dave Clements)
Date: Mon, 31 Jan 2011 21:57:52 -0800
Subject: [Bioperl-l] March 2011 GMOD Meeting Registration is now open
In-Reply-To:
References:
Message-ID:

Hello all,

Registration is now open for the March 2011 GMOD Meeting
(http://gmod.oicr.on.ca/wiki/March_2011_GMOD_Meeting).
This meeting will be held March 5-6, as part of GMOD Americas 2011, which
also includes a day of Satellite Meetings, a GMOD Course (already full),
and for the first time, an "Introduction to GMOD" session the night before
the meeting for GMOD newcomers. GMOD Americas 2011 events are being held at
the US National Evolutionary Synthesis Center (NESCent) in Durham, North
Carolina, United States.

As with previous GMOD meetings, this meeting will have a mixture of project
talks, component talks, and user talks. Our guest speaker is Dr. Eric Stone
of North Carolina State University. Dr. Stone will talk about his
experience on the "Drosophila Genome Reference Panel," a project that is
sequencing 192 lines. See
http://gmod.oicr.on.ca/wiki/March_2011_GMOD_Meeting#Agenda for more.

The agenda is driven by attendee suggestions, and you are encouraged to add
your suggestions now
(http://gmod.oicr.on.ca/wiki/March_2011_GMOD_Meeting#Agenda_Proposals).

For examples of what happens at a GMOD meeting, see the writeup of the
September 2010 GMOD Meeting
(http://gmod.oicr.on.ca/wiki/September_2010_GMOD_Meeting), or any previous
meeting. GMOD meetings are an excellent way to meet GMOD developers and
users and to learn (and affect) what's coming in the project.

Registration for the March 2011 GMOD Meeting is
  $80 on or before February 18   <<<<=======
  $95 after February 18

Please register early, both to save money, and ensure a spot. You are also
strongly encouraged to sign up for (or propose) a Satellite Meeting (more
details to come). Details on transportation, suggested lodging, and other
logistics are on the GMOD Americas 2011 page.

This meeting, and all GMOD Americas 2011 events, are jointly sponsored by
NESCent and the Galaxy Project.

Dave Clements
Galaxy Project
--
http://gmod.org/wiki/GMOD_Americas_2011
http://gmod.org/wiki/GMOD_News
http://nescent.org
http://usegalaxy.org/

From greg at ebi.ac.uk Tue Feb 1 09:21:58 2011
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Tue, 1 Feb 2011 14:21:58 +0000
Subject: [Bioperl-l] nucleotide changes along tree
In-Reply-To: <9b7468ad-fe3a-4ace-8cd2-69b146fddd28@j11g2000yqh.googlegroups.com>
References: <9b7468ad-fe3a-4ace-8cd2-69b146fddd28@j11g2000yqh.googlegroups.com>
Message-ID:

Hi Nicholas,

PAML is the de facto standard for ancestral reconstruction of DNA
sequences. http://abacus.gene.ucl.ac.uk/software/paml.html

BioPerl contains a Bio::Tools::Phylo::PAML module for running PAML and
parsing the output. There's a how-to
(http://www.bioperl.org/wiki/HOWTO:PAML) and documentation on the specific
methods to access ancestral state reconstructions
(http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Phylo/PAML/Result.html#Synopsis).

If you're just doing a one-off run and don't need to automate things, you
may be better off running PAML on its own. Extensive documentation is
available from the PAML site (first link above).

Cheers,
Greg

On Mon, Jan 31, 2011 at 4:30 PM, Nicholas Price wrote:
> Hi
>
> I have three nucleotide sequences from human, chimp and Orangutan and
> the corresponding tree. I want to align the sequences and for each column
> in the alignment where there are substitutions, I want to infer on
> which branches the changes occurred using a maximum likelihood method.
> Is there a way to do this in Bioperl??
>
> thank you
>
> Nicholas

From fs5 at sanger.ac.uk Tue Feb 1 09:22:41 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 01 Feb 2011 14:22:41 +0000
Subject: [Bioperl-l] Proposed improvement to to Bio::Tools::Run::Primer3Redux
In-Reply-To:
References: <344D48F6FA61134A9B17AE445882A195010C77@HC-MAILBOXC1-N5.healthcare.uiowa.edu>
Message-ID: <1296570161.19678.26.camel@deskpro15336.internal.sanger.ac.uk>

Hi John and Chris,

I was wondering about the state of affairs with this new
Bio-Tools-Primer3Redux module. I need to run and parse Primer3 v2.xx as
well and I also need the SEQUENCE_PRIMER_PAIR_OK_REGION_LIST function. I
was about to put together a module for my own use when I saw your
messages. So, if there is anything I can do to help with this I would be
happy to do so (instead of re-inventing the wheel).

Frank

On Mon, 2011-01-24 at 12:41 -0600, Chris Fields wrote:
> John,
>
> This patch is made off an older version of Bio-Tools-Primer3Redux, which is now hosted in a separate repo on GitHub:
>
> https://github.com/cjfields/Bio-Tools-Primer3Redux
>
> I get one patch failure against the latest code which is easily added (the SEQUENCE_PRIMER_PAIR_OK_REGION_LIST parameter), but tests now fail (see below). Can you resubmit this against the latest code?
>
> chris
>
>
> $ ./Build test --test-files t/Run/Primer3Redux.t --verbose
> t/Run/Primer3Redux.t .. Subroutine p3_settings_file redefined at /Users/cjfields/bioperl/Bio-Tools-Primer3Redux/blib/lib/Bio/Tools/Run/Primer3Redux.pm line 620.
>
> ok 1 - use Bio::Tools::Run::Primer3Redux;
> ok 2
> ok 3 - program_name
> SEQUENCE_ID=Test1
>
> SEQUENCE_TEMPLATE=AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
>
> PRIMER_EXPLAIN_FLAG=1
>
> PRIMER_PRODUCT_SIZE_RANGE=100-250
>
> PRIMER_SALT_CORRECTIONS=1
>
> PRIMER_TASK=pick_pcr_primers
>
> PRIMER_TM_FORMULA=1
>
> =
> Unknown open() mode '/Users/cjfields/bin/primer3_core
> # Tests were run but no plan was declared and done_testing() was not seen.
> Dubious, test returned 255 (wstat 65280, 0xff00)
> All 3 subtests passed
>
> Test Summary Report
> -------------------
> t/Run/Primer3Redux.t (Wstat: 65280 Tests: 3 Failed: 0)
>   Non-zero exit status: 255
>   Parse errors: No plan found in TAP output
> Files=1, Tests=3, 1 wallclock secs ( 0.02 usr 0.01 sys + 0.20 cusr 0.04 csys = 0.27 CPU)
> Result: FAIL
> Failed 1/1 test programs. 0/3 subtests failed.
>
> On Jan 24, 2011, at 11:44 AM, Ma, Man Chun John wrote:
>
> > Hi,
> >
> > Attached is my proposed diff for some changes to Bio::Tools::Run::Primer3Redux to more fully implement the new features of Primer3 version 2.x.x:
> >
> > 1. Adding support for the command-line argument p3_settings_file that has been available for all 2.x.x versions, and
> > 2. Adding support for the "Sequence" tag SEQUENCE_PRIMER_PAIR_OK_REGION_LIST, a new function in version 2.2.3
> >
> > Although I have used this module quite heavily in my projects and it appeared to run well, I'm not sure if there are bugs--not to mention I have yet to understand how to write /t scripts, so I wonder if someone would like to test this.
> >
> > Cheers,
> >
> > John MC Ma
> > Graduate Assistant
> > Kwitek Lab
> > Department of Internal Medicine
> > 3125E MERF
> > 375 Newton Road
> > Iowa City IA 52242

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From cjfields at illinois.edu Tue Feb 1 09:57:45 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 1 Feb 2011 08:57:45 -0600
Subject: [Bioperl-l] Proposed improvement to to Bio::Tools::Run::Primer3Redux
In-Reply-To: <1296570161.19678.26.camel@deskpro15336.internal.sanger.ac.uk>
References: <344D48F6FA61134A9B17AE445882A195010C77@HC-MAILBOXC1-N5.healthcare.uiowa.edu> <1296570161.19678.26.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

Frank,

You are more than welcome to look at the code on github and improve it. In
fact, let me know if you have a github account and I can add you as a
collaborator (John, same for you).

I'll probably work on conversion to Dist::Zilla at some point for easier
distribution building, but will keep a stub Build.PL for easy installation
from github if needed. Key thing I want to make sure we keep up is tests
and test coverage. Could probably improve the backend a bit more as well,
but it works for now.

chris

On Feb 1, 2011, at 8:22 AM, Frank Schwach wrote:

> Hi John and Chris,
>
> I was wondering about the state of affairs with this new
> Bio-Tools-Primer3Redux module. I need to run and parse Primer3 v2.xx as
> well and I also need the SEQUENCE_PRIMER_PAIR_OK_REGION_LIST function.
> I was about to put together a module for my own use when I saw your
> messages. So, if there is anything I can do to help with this I would be
> happy to do so (instead of re-inventing the wheel).
>
> Frank

From cjfields at illinois.edu Tue Feb 1 09:58:52 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 1 Feb 2011 08:58:52 -0600
Subject: [Bioperl-l] nucleotide changes along tree
In-Reply-To:
References: <9b7468ad-fe3a-4ace-8cd2-69b146fddd28@j11g2000yqh.googlegroups.com>
Message-ID: <6FC85CEA-1583-4D87-988B-A539AF2270FF@illinois.edu>

+1 on all this, just wish the output were easily parsable from version to
version :P

chris

On Feb 1, 2011, at 8:21 AM, Gregory Jordan wrote:

> Hi Nicholas,
>
> PAML is the de facto standard for ancestral reconstruction of DNA
> sequences. http://abacus.gene.ucl.ac.uk/software/paml.html
>
> BioPerl contains a Bio::Tools::Phylo::PAML module for running PAML and
> parsing the output. There's a how-to
> (http://www.bioperl.org/wiki/HOWTO:PAML) and documentation on the specific
> methods to access ancestral state reconstructions
> (http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Phylo/PAML/Result.html#Synopsis).
>
> If you're just doing a one-off run and don't need to automate things, you
> may be better off running PAML on its own. Extensive documentation is
> available from the PAML site (first link above).
>
> Cheers,
> Greg
>
> On Mon, Jan 31, 2011 at 4:30 PM, Nicholas Price wrote:
>
>> Hi
>>
>> I have three nucleotide sequences from human, chimp and Orangutan and
>> the corresponding tree. I want to align the sequences and for each column
>> in the alignment where there are substitutions, I want to infer on
>> which branches the changes occurred using a maximum likelihood method.
>> Is there a way to do this in Bioperl??
>>
>> thank you
>>
>> Nicholas

From zsuzjsl at gmail.com Tue Feb 1 12:16:37 2011
From: zsuzjsl at gmail.com (Jinshun Zhong)
Date: Tue, 1 Feb 2011 11:16:37 -0600
Subject: [Bioperl-l] Fail to install DBD-mysql
Message-ID:

Hi folks,

I followed the instructions to install the program, using the GUI, command
line or cygwin. But every time I failed to find DBD-mysql, and then I could
not proceed with the "make" command. Is there anyone who knows how to deal
with this?

Thanks.
Jinshun

From scott at scottcain.net Tue Feb 1 12:37:06 2011
From: scott at scottcain.net (Scott Cain)
Date: Tue, 1 Feb 2011 12:37:06 -0500
Subject: [Bioperl-l] Fail to install DBD-mysql
In-Reply-To:
References:
Message-ID:

Hi Jinshun,

What version are you trying to install that involves make? It must be
very old, so that is a problem to start with. DBD::mysql is not a
required prerequisite for BioPerl, so that shouldn't be a problem.

Scott

On Tue, Feb 1, 2011 at 12:16 PM, Jinshun Zhong wrote:
> Hi folks,
>
> I followed the instructions to install the program, using the GUI, command
> line or cygwin. But every time I failed to find DBD-mysql, and then I could
> not proceed with the "make" command. Is there anyone who knows how to deal
> with this?
>
> Thanks.
> Jinshun

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

From harpactocrates at googlemail.com Tue Feb 1 13:10:22 2011
From: harpactocrates at googlemail.com (pablo marin-garcia)
Date: Tue, 1 Feb 2011 18:10:22 +0000
Subject: [Bioperl-l] medperl, something kinda like bioperl
Message-ID:

Hello Spiros,

I have been writing a set of modules and scripts called pmGWAS (perl
modules for genome wide association) and PMG (Perl for Medical Genetics).
They are based on bioperl and ensembl and I was going to release them on
github and later on CPAN. They are modules for visualization and creating
report analyses for medical sequencing and genotyping. Also I have been
writing parsers for the common genotype and SNP file formats that at some
point I would like to 'bioperlize' following the SeqIO factory style.

It would be nice to use the bioperl wiki to organize and coordinate these
kinds of modules.

The current view on bioperl development, I think, is more like the unix
motto: do one thing, do it well and interact well with others. As Rob said,
the trick is to keep things simple and modular, following the bioperl
interfaces or defining new ones, and try to convince people to adhere.

Nice to see more people wanting to increase the perl contribution in this
area.

-Pablo

>
>
> Hello Rob,
>
> thanks for the feedback. It will definitely be a learning experience for me
> as well. I'm planning on setting up some sort of public resource for people
> to have a look at and discuss / make a plan before actual coding gets done.
> Will keep you posted.
>
> thanks
> Spiros
>
> > Spiros Denaxas wrote:
> >
> >> Hello,
> >>
> >> I am sending this email here since I consider all people that contribute
> >> and/or follow the bioperl project as the best starting point for advice on
> >> a new project I am currently planning; my apologies if it's considered
> >> off-topic.
> >>
> >> While the bioinformatics community has greatly benefitted from the Perl
> >> community, with the shining example of bioperl, the medical community is
> >> sadly a bit behind. I am currently employed in a public health /
> >> epidemiology environment and have on numerous occasions discovered
> >> opportunities to contribute code to CPAN that has made my life easier. I
> >> know I am not alone, but a very quick search on CPAN for related modules
> >> from the medical / biomedical domain does not return much for now.
> >>
> >> I recently gave a presentation at the London Perl Workshop [1] and while
> >> creating it, I thought, would it be useful to have something similar to
> >> bioperl for modules which largely contribute to the medical /
> >> epidemiological domain? I was thinking of creating something like medperl,
> >> alas similar to bioperl, but in a very very simple form. It would serve as
> >> a reference point to the (albeit small) number of modules that are
> >> currently on CPAN and will also hopefully urge people to contribute some
> >> of their code along the way.
> >>
> >> So I would like to request your advice on:
> >>
> >> a) Can you think of any reasons for not doing this?
> >> b) Does anybody know of something similar?
> >> c) Does anybody feel like they could contribute?
> >>
> >> Regards,
> >> Spiros Denaxas

--
- Pablo Marin-Garcia

From clements at nescent.org Wed Feb 2 01:25:39 2011
From: clements at nescent.org (Dave Clements)
Date: Tue, 1 Feb 2011 22:25:39 -0800
Subject: [Bioperl-l] GMOD Satellite Meetings, March 7, NESCent, Durham, NC
In-Reply-To:
References:
Message-ID:

Hello all,

Yesterday, we opened registration for the March 2011 GMOD Meeting (see
http://gmod.oicr.on.ca/wiki/March_2011_GMOD_Meeting). *That meeting is part
of a larger event, GMOD Americas 2011, that also includes several Satellite
Meetings on March 7*, the day after the meeting ends.

Satellite meetings are smaller groups of people meeting to discuss a common
interest, or work on a common problem (think special interest groups /
birds-of-a-feather). Unlike the GMOD Meeting, there is *no registration
fee* for the Satellites, and you don't even need to go to any other GMOD
Americas events to participate in the Satellites. *If you are in the area,
or attending other GMOD Americas events, or are just very interested in the
topic, we strongly encourage you to attend.*

The current list of Satellites (see
http://gmod.oicr.on.ca/wiki/Satellite_Meetings_-_GMOD_Americas_2011)
includes:

   - *GMOD Evo Hackathon Followup*, organized by Duke Leto. A followup to
     the GMOD Evo Hackathon held at NESCent in November 2010. You didn't
     need to participate in the original event to participate in this
     followup. Also, if there is interest, this satellite can extend for
     more than one day.
   - *Customizing and Extending JBrowse*, organized by Mitch Skinner.
     JBrowse has a few different extension points, but they're not (yet)
     well-documented or widely used. The GMOD meeting would be a good time
     to review those APIs, relate them to the things that people want to do
     with them, discuss any potential changes or new APIs to support
     specific use cases, and potentially start to implement an extension.
   - *GMOD Web services toolkit*, organized by Josh Goodman. Come to work
     on or discuss the GMOD Web services API and the toolkit.
   - *GMOD in the Sequencing Center*, organized by Chris Hemmerich and Dave
     Clements. Sequencing centers have tremendous bioinformatics needs that
     GMOD can help address. Attend this satellite to find out what other
     sequencing centers are doing with GMOD, and how GMOD can help you help
     your researchers.

If you are interested in participating in these, please contact the
organizers, and/or add your name to the satellite's participants list on
the wiki.

Satellites can be organized by anyone.
If you have a topic you would like to cover, please add it to the list and
announce it on the appropriate mailing lists. Several previous satellites
are written up on the GMOD wiki, if you want an idea of what happens at a
satellite.

Finally, please let me and Scott know if you have any questions.

Thanks, and hope to see you in March!

Dave C.
--
http://gmod.org/wiki/GMOD_Americas_2011
http://gmod.org/wiki/GMOD_News
http://usegalaxy.org/

From djsomers at wisc.edu Thu Feb 3 14:58:44 2011
From: djsomers at wisc.edu (Dana J. Wohlbach)
Date: Thu, 3 Feb 2011 13:58:44 -0600
Subject: [Bioperl-l] Bio::Search::Tiling::MapTiling error
Message-ID:

Hi all,

I have been trying to use the Bio::Search::Tiling::MapTiling module to
parse the output of a WU-Blast report, but have been encountering the error
message below. I have tested this on perl v5.10.0 or v5.8.8 with bioperl
v1.6.1 or v1.5.21. I have also tried parsing either tBlastN or BlastN
(sample attached to this message) reports but get the same error message,
which I have been unable to resolve. Any suggestions would be much
appreciated.

The script I am running is:

#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
use Bio::SearchIO;
use Bio::Search::Tiling::MapTiling;

my $file = shift;
my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
while ( my $result = $in->next_result ) {
    print "\nQUERY: ", $result->query_name, "\n";
    while ( my $hit = $result->next_hit ) {
        print " HIT: ", $hit->name;
        my $tiling = Bio::Search::Tiling::MapTiling->new(-hit => $hit);
        my $numID = $tiling->identities();
        print " NUMID: $numID\n";
    }
}

And once it gets to a hit with multiple HSPs, it chokes with the message:

--------------------- WARNING ---------------------
MSG: No HSPS present for type 'query' in context 'p_' for this hit
---------------------------------------------------
Can't use an undefined value as an ARRAY reference at
    /Library/Perl/5.10.0/Bio/Search/Tiling/MapTiling.pm line 963, line 820 (#1)
    (F) A value used as either a hard reference or a symbolic reference must
    be a defined value. This helps to delurk some insidious errors.

Uncaught exception from user code:
    Can't use an undefined value as an ARRAY reference at /Library/Perl/5.10.0/Bio/Search/Tiling/MapTiling.pm line 963, line 820.
 at /Library/Perl/5.10.0/Bio/Search/Tiling/MapTiling.pm line 963
    Bio::Search::Tiling::MapTiling::_calc_stats('Bio::Search::Tiling::MapTiling=HASH(0x100c37400)', 'query', 'exact', 'p_') called at /Library/Perl/5.10.0/Bio/Search/Tiling/MapTiling.pm line 279
    Bio::Search::Tiling::MapTiling::identities('Bio::Search::Tiling::MapTiling=HASH(0x100c37400)') called at DummyParseBlastN.pl line 22

Also, is there any reason that identities (i.e. $tiling->identities())
would return a non-integer? When I run the above script, I am getting
numbers like 150.686622649381 for some of the tiled HSP identity values.

Thanks in advance for any help!

Cheers,
Dana

--
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
Dana J. Wohlbach, Ph.D.
Research Associate
University of Wisconsin-Madison
Department of Genetics
Gasch Lab
Genetics-Biotechnology Center
425 Henry Mall
Madison, WI 53706
608-265-0863
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sampleBlastNOut
Type: application/octet-stream
Size: 483095 bytes
Desc: not available
URL:

From witch.of.agnessi at gmail.com Thu Feb 3 17:37:10 2011
From: witch.of.agnessi at gmail.com (B Enn)
Date: Thu, 3 Feb 2011 17:37:10 -0500
Subject: [Bioperl-l] Parsing Tabulated Blast result
Message-ID:

Hi all,

I'm new to Bioperl and trying to parse some tabulated results (-m 9 option)
generated by TBLASTX (Blast 2.2.24). I wish to know if the top hit comes
from the minus strand of the query sequence or not. I believe one easy way
to tell is to check whether start_query is greater than end_query. However,
I found that, while using Bio::SearchIO, the start(query) is always
reported as smaller than end(query), even if the alignment shows it in a
different manner. I'm trying to find out using strand('query'). I think if
the query sequence match comes from the complementary strand of the
sequence supplied, it is marked as -1. Am I doing it correctly?

I'll also be very grateful if someone can provide any pointer/link on how
to know which reading frame the TOP query match belongs to. For that I
possibly need a full BLAST report.

The code (crude) I wrote is as follows:

use strict;
use warnings;
use Bio::SearchIO;

my $in = Bio::SearchIO->new(-format => 'blasttable', -file => 'test.out');
while ( my $result = $in->next_result ) {
    # only the top hit and its first HSP are inspected
    my $hit = $result->next_hit;
    my $hsp = $hit->next_hsp;
    print $hsp->start('query'), "\t", $hsp->end('query'), "\t",
          $hsp->strand('query'), "\n";
}

Thanks in advance

From anjan.purkayastha at gmail.com Thu Feb 3 17:20:40 2011
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Thu, 3 Feb 2011 17:20:40 -0500
Subject: [Bioperl-l] Problem installing CJFIELDS/BioPerl-1.6.1.tar.gz
Message-ID:

Hello,
I am unable to install distribution CJFIELDS/BioPerl-1.6.1.tar.gz on my Mac
10.6.6.
Here is the tail end of the installation report:

Result: FAIL
Failed 9/329 test programs. 8/19347 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Warning (usually harmless): 'YAML' not installed, will not store persistent state
Running Build install
  make test had returned bad status, won't install without force
Could not read '/Users/anjan/.cpan/build/GraphViz-2.04-NQf6M6/META.yml'.
Falling back to other methods to determine prerequisites
Failed during this command:
  SHLOMIF/Error-0.17016.tar.gz                 : install NO
  JSWARTZ/Cache-Cache-1.06.tar.gz              : install NO
  LDS/AcePerl-1.92.tar.gz                      : install NO
  JMCNAMARA/OLE-Storage_Lite-0.19.tar.gz       : install NO
  JMCNAMARA/Spreadsheet-ParseExcel-0.58.tar.gz : install NO
  GROMMEL/Math-Random-0.71.tar.gz              : install NO
  JHI/Graph-0.94.tar.gz                        : install NO
  JARW/Math-Spline-0.01.tar.gz                 : install NO
  SHLOMIF/Statistics-Descriptive-3.0201.tar.gz : install NO
  RONAN/SVG-2.50.tar.gz                        : install NO
  JARW/Math-Derivative-0.01.tar.gz             : install NO
  COGENT/Tree-DAG_Node-1.06.tar.gz             : install NO
  ALLENDAY/SVG-Graph-0.02.tar.gz               : install NO
  ADAMK/Task-Weaken-1.03.tar.gz                : install NO
  ADAMK/Class-Inspector-1.25.tar.gz            : install NO
  MKUTTER/SOAP-Lite-0.712.tar.gz               : install NO
  CMUNGALL/Data-Stag-0.11.tar.gz               : install NO
  LBROCARD/GraphViz-2.04.tar.gz                : writemakefile NO '/usr/bin/perl Makefile.PL' returned status 512
  TOKUHIROM/Test-Requires-0.06.tar.gz          : install NO
  DOY/Try-Tiny-0.09.tar.gz                     : install NO
  RJBS/Test-Fatal-0.003.tar.gz                 : install NO
  ADAMK/Params-Util-1.03.tar.gz                : install NO
  RJBS/Sub-Install-0.925.tar.gz                : install NO
  DROLSKY/Package-DeprecationManager-0.10.tar.gz: install NO
  FLORA/MRO-Compat-0.11.tar.gz                 : install NO
  DOY/Package-Stash-XS-0.21.tar.gz             : install NO
  DOY/Package-Stash-0.25.tar.gz                : make_test NO
  FLORA/Sub-Name-0.05.tar.gz                   : install NO
  RJBS/Data-OptList-0.106.tar.gz               : install NO
  CHOCOLATE/Scope-Guard-0.20.tar.gz            : install NO
  RJBS/Sub-Exporter-0.982.tar.gz               : install NO
  FLORA/Devel-GlobalDestruction-0.03.tar.gz    : install NO
  FLORA/Class-MOP-1.12.tar.gz                  : make_test NO
  DROLSKY/Moose-1.21.tar.gz                    : make_test NO
  DAVECROSS/Array-Compare-2.01.tar.gz          : make_test NO
  MHX/Convert-Binary-C-0.74.tar.gz             : install NO
  TPEDERSE/Algorithm-Munkres-0.08.tar.gz       : install NO
  MIROD/XML-Twig-3.37.tar.gz                   : install NO
  JHI/Set-Scalar-1.25.tar.gz                   : install NO
  DCONWAY/Parse-RecDescent-1.965001.tar.gz     : install NO
  JMCNAMARA/Spreadsheet-WriteExcel-2.37.tar.gz : install NO
  KMACLEOD/libxml-perl-0.08.tar.gz             : install NO
  RBERJON/XML-Filter-BufferText-1.01.tar.gz    : install NO
  PERIGRIN/XML-SAX-Writer-0.53.tar.gz          : install NO
  RDF/Clone-0.31.tar.gz                        : install NO
  MIROD/XML-XPathEngine-0.12.tar.gz            : install NO
  TJMATHER/XML-RegExp-0.03.tar.gz              : install NO
  TJMATHER/XML-DOM-1.44.tar.gz                 : install NO
  MIROD/XML-DOM-XPath-0.14.tar.gz              : install NO
  SHAWNPW/PostScript-0.06.tar.gz               : install NO
  CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO

Any feedback on what may be going wrong will be immensely helpful.
Thanks,
Anjan

--
===================================
Anjan Purkayastha, PhD
Senior Computational Biologist
TessArae LLC
46090 Lake Center Plaza, Suite 304
Potomac Falls, VA 20165**
Mobile-703.740.6939
===================================

From scott at scottcain.net Thu Feb 3 22:03:44 2011
From: scott at scottcain.net (Scott Cain)
Date: Thu, 3 Feb 2011 22:03:44 -0500
Subject: [Bioperl-l] Problem installing CJFIELDS/BioPerl-1.6.1.tar.gz
In-Reply-To:
References:
Message-ID:

Hi Anjan,

Can you tell us what tests failed?

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

From chad.a.davis at gmail.com Fri Feb 4 08:20:16 2011
From: chad.a.davis at gmail.com (Chad Davis)
Date: Fri, 4 Feb 2011 14:20:16 +0100
Subject: [Bioperl-l] Problem installing CJFIELDS/BioPerl-1.6.1.tar.gz
In-Reply-To:
References:
Message-ID:

I may be off track here, but I saw a reference to GraphViz (though no
specific error) above. This reminded me of many problems I've had with
the GraphViz module over the last several years, though I think it's
better today. I generally had to, in CPAN:

force install GraphViz

before installing BioPerl. Make sure you have graphviz installed, first:

fink install graphviz

(I don't think 'graphviz-dev' is required?)

But, of course, it would be better to find the actual source of the
problem and fix it. If you could do:

test GraphViz

and report the errors, in addition to any other BioPerl tests that might
be failing, that would help.

Chad

From harpactocrates at googlemail.com Fri Feb 4 09:12:54 2011
From: harpactocrates at googlemail.com (pablo marin-garcia)
Date: Fri, 4 Feb 2011 14:12:54 +0000
Subject: [Bioperl-l] Problem installing CJFIELDS/BioPerl-1.6.1.tar.gz
In-Reply-To:
References:
Message-ID:

On Fri, Feb 4, 2011 at 1:20 PM, Chad Davis wrote:
> I may be off track here, but I saw a reference to GraphViz (though no
> specific error) above. This reminded me of many problems I've had with
> the GraphViz module over the last several years, though I think it's
> better today. I generally had to, in CPAN:
>
> force install GraphViz
>
> before installing BioPerl. Make sure you have graphviz installed, first:
>

Hello Chad,

Regarding GraphViz and other dependencies when installing bioperl and
biographics [now independent of bioperl]: I wrote up the problems I had
before here:

http://pablomarin-garcia.blogspot.com/2010/04/installing-graphviz-from-perl-cpan.html
http://pablomarin-garcia.blogspot.com/2010/04/perl-biographics-module-dependencies-in.html

In the case of GraphViz, it needs IPC::Run; the installer reported
"Warning: prerequisite IPC::Run 0.6 not found."

What I needed to install it on a new Ubuntu 9.04 box (on a Mac it would
be different):

[Note: the following dependencies might be needed not by bioperl but by
biographics, or by both, but it is nice to have them anyway]

[sudo aptitude install]
- libgd2-xpm-dev # gd for GD.
I don't know the difference between xpm and noxpm, but anyway
- libexpat-dev # expat for XML::Parser
- graphviz graphviz-dev graphviz-doc libgraphviz-dev graphviz-cairo # for GraphViz
- libdb4.6-dev # for DB_File (the headers for 4.7 were also available, but I had 4.6 installed)

[CPAN]
- install IPC::Run # for GraphViz and other modules
- install DB_File # used by several modules, so it is better to install it first
- install XML::Parser
- install GraphViz
- install GD
- install CJFIELDS/BioPerl-1.6.1.tar.gz

The only caveats here were that you need to know that expat is required
for XML::Parser, that IPC::Run is a GraphViz dependency, and that DB_File
uses the Berkeley DB headers (I had to google to find the package that
contains them).

--
- Pablo Marin-Garcia

From buiduyminh at gmail.com Fri Feb 4 10:44:26 2011
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 4 Feb 2011 10:44:26 -0500
Subject: [Bioperl-l] How to install 1.6.2
Message-ID:

Hi,

I am trying to use the "summary" feature in bp_seqfeature_load.pl (for
Gbrowse), but it requires Bioperl 1.6.2. I have been trying to install
1.6.2, but with no luck.

First, I used Git:

git clone git://github.com/bioperl/bioperl-live.git

and I got this error:
*
"github.com[0: 207.97.227.239]: errno=Connection timed out
fatal: unable to connect a socket (Connection timed out)"

*
Then I downloaded the Core Modules snapshot from this page
http://www.bioperl.org/wiki/Getting_BioPerl but it doesn't work either.

Could someone please show me how to install 1.6.2, or, if there is any
other way to use the summary feature, please let me know.

Thank you very much,

--
Minh Bui.

From buiduyminh at gmail.com Fri Feb 4 11:05:52 2011
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 4 Feb 2011 11:05:52 -0500
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To:
References:
Message-ID:

Oh,
one more thing: I just downloaded Bioperl 1.6.2, and

when I tried to run perl Build.PL, I got this message.
*
Checking whether your kit is complete...
WARNING: the following files are missing in your kit:
    .shipit
Please inform the author.

Checking prerequisites...
Install [a]ll optional external modules, [n]one, or choose [i]nteractively? [n]
*

Then I ran ./Build test but it failed.

Test Summary Report
-------------------
t/SeqIO/embl.t                             (Wstat: 512 Tests: 85 Failed: 0)
  Non-zero exit status: 2
  Parse errors: Bad plan.  You planned 95 tests but ran 85.
Files=350, Tests=22459, 96 wallclock secs ( 2.78 usr  0.66 sys + 79.46 cusr  4.21 csys = 87.11 CPU)
Result: FAIL
Failed 1/350 test programs. 0/22459 subtests failed.

Is there any way to install 1.6.2? I really need it. Thank you for
your help.

--
Minh Bui.

From scott at scottcain.net Fri Feb 4 11:09:07 2011
From: scott at scottcain.net (Scott Cain)
Date: Fri, 4 Feb 2011 11:09:07 -0500
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To:
References:
Message-ID:

Hi Minh,

I don't know why that test failed, but if you don't need to parse or
produce EMBL files, it doesn't matter (and even if you do, it still
may not matter, depending on what the test was). I suggest installing.
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From buiduyminh at gmail.com Fri Feb 4 11:14:14 2011
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 4 Feb 2011 11:14:14 -0500
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To:
References:
Message-ID:

Hi Scott,

I installed it anyway, but the terminal (I am using Ubuntu 10.10) says
that bp_seqfeature_load.pl is (unchanged), so I still can't use the
--summary feature.

Is there another way to use this feature?

--
Minh Bui.

From scott at scottcain.net Fri Feb 4 11:23:54 2011
From: scott at scottcain.net (Scott Cain)
Date: Fri, 4 Feb 2011 11:23:54 -0500
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To:
References:
Message-ID:

Hi Minh,

Where did you get BioPerl 1.6.2 from? It hasn't been released yet.
You need to get BioPerl from GitHub to use summary features:

https://github.com/bioperl/bioperl-live

and click the download link on the right and select either tar.gz or
zip file to download.
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From buiduyminh at gmail.com Fri Feb 4 11:27:48 2011
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 4 Feb 2011 11:27:48 -0500
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To:
References:
Message-ID:

I got it from here
https://github.com/bioperl/bioperl-live/tree/release-1-6-2

--
Minh Bui.
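
When an install reports a script as unchanged, it can help to confirm which copy of bp_seqfeature_load.pl the shell actually runs and whether that copy mentions the option at all. The following is only a sketch: the helper name script_mentions_option is made up for illustration, and it assumes the option name appears literally somewhere in the script source (for example in its option parsing or usage text), which is not guaranteed.

```shell
# Hypothetical helper (not part of BioPerl): succeed if the readable file
# named by $1 contains the literal string $2 anywhere in its text.
script_mentions_option() {
    file=$1
    option=$2
    # -q: no output, -F: fixed string; "--" ends grep's own options so the
    # search string may itself begin with a dash.
    [ -r "$file" ] && grep -qF -- "$option" "$file"
}

# Example use (commented out so that sourcing this file has no side effects):
#   script_path=$(command -v bp_seqfeature_load.pl)
#   script_mentions_option "$script_path" summary && echo "$script_path mentions summary"
```

A match only shows that the word occurs in the file; running the script's own help output remains the more reliable check.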
From scott at scottcain.net Fri Feb 4 11:33:37 2011
From: scott at scottcain.net (Scott Cain)
Date: Fri, 4 Feb 2011 11:33:37 -0500
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To:
References:
Message-ID:

OK, the seqfeature_load script in that branch has the summary feature
option in it. Are you sure that the script you have doesn't?

Scott


On Fri, Feb 4, 2011 at 11:27 AM, Minh Bui wrote:
> I got it from here
> https://github.com/bioperl/bioperl-live/tree/release-1-6-2

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From minou.nowrousian at rub.de Fri Feb 4 11:37:46 2011
From: minou.nowrousian at rub.de (Minou Nowrousian)
Date: 4 Feb 2011 17:37:46 +0100
Subject: [Bioperl-l] BioPerl with Active Perl 5.12.2 ?
Message-ID: <000001cbc489$da869d90$8f93d8b0$@rub.de>

Hi all,

is Active Perl 5.12.2 (for Windows) supported by BioPerl or only Active Perl 5.10 ?

Regards,
Minou

Minou Nowrousian, Ph.D.
Department of General and Molecular Botany
Ruhr-University Bochum
ND 6/165
Universitaetsstr. 150
44780 Bochum
Germany
phone +49 234 3224588
fax +49 234 3214184
email minou.nowrousian at rub.de

From buiduyminh at gmail.com Fri Feb 4 11:37:59 2011
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 4 Feb 2011 11:37:59 -0500
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To:
References:
Message-ID:

I installed the version you sent, and it works. Thank you very much, Scott.
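
For anyone who ends up in the same "(unchanged)" situation, it can help to confirm which BioPerl and which copy of bp_seqfeature_load.pl are actually being picked up. What follows is only a sketch under stated assumptions: the helper names (bioperl_version, find_on_path, mentions_option) are made up for illustration, and searching the script text for the word "summary" is a crude heuristic rather than an official check.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the installed BioPerl version string,
# or undef if Bio::Root::Version cannot be loaded.
sub bioperl_version {
    return eval { require Bio::Root::Version; Bio::Root::Version->VERSION };
}

# Hypothetical helper: list every copy of $name found in the given
# directories (defaults to the directories in PATH).
sub find_on_path {
    my ($name, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    return grep { -f $_ } map { File::Spec->catfile($_, $name) } @dirs;
}

# Hypothetical helper: true if the file's text contains $option anywhere.
# This is a plain substring search, so treat a hit as a hint only.
sub mentions_option {
    my ($file, $option) = @_;
    open my $fh, '<', $file or return 0;
    while (my $line = <$fh>) {
        return 1 if index($line, $option) >= 0;
    }
    return 0;
}

my $version = bioperl_version();
print 'BioPerl version: ', (defined $version ? $version : 'not found'), "\n";

for my $script (find_on_path('bp_seqfeature_load.pl')) {
    my $note = mentions_option($script, 'summary')
        ? 'mentions "summary"'
        : 'no "summary" text found';
    print "$script: $note\n";
}
```

If more than one copy of the script shows up, the first one listed is the one the shell would normally run, which may not be the copy that was just installed.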

On Fri, Feb 4, 2011 at 11:33 AM, Scott Cain wrote:
> OK, the seqfeature_load script in that branch has the summary feature
> option in it. Are you sure that the script you have doesn't?
>
> Scott

--
Minh Bui.

From cjfields at illinois.edu Fri Feb 4 11:50:22 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 4 Feb 2011 10:50:22 -0600
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To:
References:
Message-ID: <9DC67C84-FA88-4F3C-90B9-3D007E4B64E7@illinois.edu>

Note that is a branch, not a tagged release. We created a new branch for the eventual 1.6.2 release due to problems cherry-picking commits over the last year or so. It's not really different from master, with the exception that we'll be merging code in from the recent evo hackathon soon to master and NOT to the release.

chris

On Feb 4, 2011, at 10:33 AM, Scott Cain wrote:

> OK, the seqfeature_load script in that branch has the summary feature
> option in it. Are you sure that the script you have doesn't?
>
> Scott

From buiduyminh at gmail.com Fri Feb 4 12:52:21 2011
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 4 Feb 2011 12:52:21 -0500
Subject: [Bioperl-l] How to install 1.6.2
In-Reply-To: <9DC67C84-FA88-4F3C-90B9-3D007E4B64E7@illinois.edu>
References: <9DC67C84-FA88-4F3C-90B9-3D007E4B64E7@illinois.edu>
Message-ID:

Thank you so much for your help and clarification.

Minh.

On Fri, Feb 4, 2011 at 11:50 AM, Chris Fields wrote:

> Note that is a branch, not a tagged release. We created a new branch for
> the eventual 1.6.2 release due to problems cherry-picking commits over the
> last year or so. It's not really different from master, with the exception
> that we'll be merging code in from the recent evo hackathon soon to master
> and NOT to the release.
>
> chris

--
Minh Bui.

From cjfields at illinois.edu Fri Feb 4 16:19:13 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 4 Feb 2011 15:19:13 -0600
Subject: [Bioperl-l] Bioperl-run Wrappers
In-Reply-To:
References:
Message-ID:

On Jan 28, 2011, at 12:18 PM, Ben Bimber wrote:

> Hello,
>
> I'm using CommandExts to wrap a number of tools. In a pipeline I was
> looking to make the tools log their current version. I realized that
> instead of using run() in a unique way for each tool, perhaps there
> should be a consistent method that gets called and returns a version
> string. because obtaining this version string is specific to the
> tool, perhaps each wrapper could provide a version() method that runs
> the appropriate command on the executable, parses, then returns some
> string. has something like this been discussed? have others already
> solved this?
>
> Thanks,
> Ben

(apologies for the late response, maybe you worked it out?)

If you mean a version string for the wrapped tool, there is a Bio::Tools::Run::WrapperBase method called version() I believe (not implemented for obvious reasons, but implemented by each wrapper as needed). If you want a specific version of the module (say, an API version) you may want to assign $VERSION or create a new global ($API_VERSION, perhaps) in case it conflicts with the BioPerl core version.

chris

From bbimber at gmail.com Fri Feb 4 16:33:57 2011
From: bbimber at gmail.com (Ben Bimber)
Date: Fri, 4 Feb 2011 15:33:57 -0600
Subject: [Bioperl-l] Bioperl-run Wrappers
In-Reply-To:
References:
Message-ID:

Hi Chris,

I actually was referring to the software version of the executable itself, not the perl code. for example, I added something like this to a mosaik wrapper I made:

sub version {
    my $self = shift;
    my ($out, $err);
    IPC::Run::run([$self->executable, '-h'], '>', \$out, '2>', \$err);
    my @out = split("\n", $out);
    my $version = join(';', grep( /^Mosaik/i, @out));
    $version =~ m/^.*0m ([\d\.]*)\b/i;
    return "Mosaik Version: $1\n";
}

It runs the executable, parses the output (which is specific to mosaik in this case) and returns it. I had an issue with a tool version recently, so I decided it was probably a good idea to start recording them with pipelines. In hindsight version() is probably the wrong name since it's confusing with perl's VERSION, but maybe exe_version() or something makes sense. I would personally find it useful if there were a standard, but optional method across BioPerl wrappers to do this sort of thing. implementing it would be optional per wrapper. all it would really need to do is return a string.

-Ben

On Fri, Feb 4, 2011 at 3:19 PM, Chris Fields wrote:
> (apologies for the late response, maybe you worked it out?)
>
> If you mean a version string for the wrapped tool, there is a
> Bio::Tools::Run::WrapperBase method called version() I believe (not
> implemented for obvious reasons, but implemented by each wrapper as
> needed). If you want a specific version of the module (say, an API
> version) you may want to assign $VERSION or create a new global
> ($API_VERSION, perhaps) in case it conflicts with the BioPerl core
> version.
>
> chris

From jason.stajich at gmail.com Fri Feb 4 17:03:48 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 04 Feb 2011 14:03:48 -0800
Subject: [Bioperl-l] Bioperl-run Wrappers
In-Reply-To:
References:
Message-ID: <4D4C77C4.7080506@gmail.com>

There is such a method in the Wrapper interface - it is implemented for tools (exe) that support outputting the version in some way. e.g. here is implementation for infernal (Bio::Tools::Run::Infernal)

=head2  version

 Title   : version
 Usage   : $v = $prog->version();
 Function: Determine the version number of the program (uses cmsearch)
 Example :
 Returns : float or undef
 Args    : none

=cut

sub version {
    my ($self) = @_;
    return unless $self->executable;
    my $exe = $self->executable;
    my $string = `$exe -h 2>&1`;
    my $v;
    if ($string =~ m{Infernal\s([\d.]+)}) {
        $v = $1;
        $self->deprecated(-message => "Only Infernal 1.0 and above is supported.",
                          -version => 1.006001) if $v < 1;
    }
    return $self->{'_progversion'} = $v || undef;
}

Ben Bimber wrote:
> Hi Chris,
>
> I actually was referring to the software version of the executable
> itself, not the perl code. for example, I added something like this
> to a mosaik wrapper I made:
>
> sub version {
>     my $self = shift;
>     my ($out, $err);
>     IPC::Run::run([$self->executable, '-h'], '>', \$out, '2>', \$err);
>     my @out = split("\n", $out);
>     my $version = join(';', grep( /^Mosaik/i, @out));
>     $version =~ m/^.*0m ([\d\.]*)\b/i;
>     return "Mosaik Version: $1\n";
> }
>
> It runs the executable, parses the output (which is specific to mosaik
> in this case) and returns it. I had an issue with a tool version
> recently, so I decided it was probably a good idea to start recording
> them with pipelines. In hindsight version() is probably the wrong
> name since it's confusing with perl's VERSION, but maybe exe_version()
> or something makes sense. I would personally find it useful if there
> were a standard, but optional method across BioPerl wrappers to do
> this sort of thing. implementing it would be optional per wrapper.
> all it would really need to do is return a string.
>
> -Ben

--
Jason Stajich

From bbimber at gmail.com Fri Feb 4 17:10:10 2011
From: bbimber at gmail.com (Ben Bimber)
Date: Fri, 4 Feb 2011 16:10:10 -0600
Subject: [Bioperl-l] Bioperl-run Wrappers
In-Reply-To: <4D4C77C4.7080506@gmail.com>
References: <4D4C77C4.7080506@gmail.com>
Message-ID:

Jason,

thanks for the reply - i will look into that more, but that looks roughly comparable to what i tried to do. the one comment i have is that "Returns : float or undef" may or may not be the right thing. i forget which forum i saw this in, but relatively recently there was a similar discussion about whether version was a string or numeric. it happened because some tool reported its version as "1.3b". one could argue that this is the tool's fault and a version is always numeric, but it should be noted that sometimes it isn't reported as such.
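
One way to avoid forcing the float-or-string choice is to keep the raw string a tool reports and derive the numeric part separately, so a value like "1.3b" is neither rejected nor silently truncated. This is a minimal sketch only: parse_tool_version() is a hypothetical helper written for illustration and is not part of Bio::Tools::Run::WrapperBase or any existing wrapper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a tool-reported version into its leading
# dotted-number part and whatever trails it, keeping the raw string too.
# Returns a hash reference, or nothing if no leading number is found.
sub parse_tool_version {
    my ($raw) = @_;
    return unless defined $raw && $raw =~ /^\s*v?(\d+(?:\.\d+)*)(\S*)/;
    return { raw => $raw, number => $1, suffix => $2 };
}

for my $reported ('1.3b', '1.0', '2.10.1') {
    my $v = parse_tool_version($reported);
    printf "raw=%s number=%s suffix='%s'\n",
        $v->{raw}, $v->{number}, $v->{suffix};
}
```

A caller that only needs to log the version can use the raw field as is, while code that wants a minimum-version check can compare the number field and decide for itself what to do with a non-empty suffix.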

-Ben

On Fri, Feb 4, 2011 at 4:03 PM, Jason Stajich wrote:
> There is such a method in the Wrapper interface - it is implemented for
> tools (exe) that support outputting the version in some way. e.g. here is
> implementation for infernal (Bio::Tools::Run::Infernal)
>
> =head2  version
>
>  Title   : version
>  Usage   : $v = $prog->version();
>  Function: Determine the version number of the program (uses cmsearch)
>  Example :
>  Returns : float or undef
>  Args    : none
>
> =cut
>
> sub version {
>     my ($self) = @_;
>     return unless $self->executable;
>     my $exe = $self->executable;
>     my $string = `$exe -h 2>&1`;
>     my $v;
>     if ($string =~ m{Infernal\s([\d.]+)}) {
>         $v = $1;
>         $self->deprecated(-message => "Only Infernal 1.0 and above is
> supported.",
>                           -version => 1.006001) if $v < 1;
>     }
>     return $self->{'_progversion'} = $v || undef;
> }

From aradwen at gmail.com Sat Feb 5 10:23:52 2011
From: aradwen at gmail.com (Radhouane Aniba)
Date: Sat, 5 Feb 2011 10:23:52 -0500
Subject: [Bioperl-l] Biocoders.net
Message-ID:

Dear colleagues,

biocoders.net is now open for perl/bioperl and java/biojava coders since I received many requests for that. Feel free to join and to share your source codes and snippets.

Regards

Radhouane

http://biocoders.net

--
*Radhouane Aniba*
*Bioinformatics Research Associate*
*Institute for Advanced Computer Studies Center for Bioinformatics and Computational Biology* *(CBCB)*
*University of Maryland, College Park MD 20742*

From hlapp at drycafe.net Sat Feb 5 18:45:47 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Sat, 5 Feb 2011 18:45:47 -0500
Subject: [Bioperl-l] NESCent Seeks Hackathon Whitepapers
In-Reply-To: <0D7D89E4-C0D4-4347-A94C-21800E927746@ad.unc.edu>
References: <0D7D89E4-C0D4-4347-A94C-21800E927746@ad.unc.edu>
Message-ID: <066A2391-6041-408C-B26E-9B867DE785C7@drycafe.net>

The National Evolutionary Synthesis Center (NESCent), in keeping with its objective to promote collaborative development of open-source, reusable, and standards-supporting informatics resources, sponsors highly collaborative, face-to-face software development events, called "hackathons" (see [1]).

To ensure that this program continues to be responsive to user needs and to tap into the expertise and creativity of the evolutionary biology community, NESCent is soliciting short whitepapers (2-6 pages) [2] on potential target areas for future hackathons.
To further encourage submissions, we have now distilled specific guidelines for proposing hackathon events, based on the experiences gained from events we have sponsored in the past: http://informatics.nescent.org/wiki/Hackathon_Whitepaper_Guidelines The Center's Call for Informatics Whitepapers [3] includes not only hackathons, but also a large spectrum of other initiatives to be undertaken by the Center, including training, software development, collaborative ontology development, and coordination of data standards. Whitepapers are accepted at any time and reviewed on an on- going basis. URLs: [1] Collaborative cyberinfrastructure events and programs organized by NESCent: http://informatics.nescent.org/wiki/Main_Page [2] NESCent Call for Informatics Whitepapers http://www.nescent.org/informatics/whitepapers.php [3] Hackathon Whitepaper Guidelines: http://informatics.nescent.org/wiki/Hackathon_Whitepaper_Guidelines [4] Past NESCent-sponsored hackathons: http://informatics.nescent.org/wiki/Main_Page#Hackathons From shalabh.sharma7 at gmail.com Sun Feb 6 15:42:02 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Sun, 6 Feb 2011 15:42:02 -0500 Subject: [Bioperl-l] Reading and writing fastq files Message-ID: Hi, i am trying to read and write fastq files. I can read them and can change format (like fastq files to fasta) but when i try to write them back as 'fastq' format i am getting a warning: --------------------- WARNING --------------------- MSG: You can't write FASTQ without supplying a Bio::Seq::Quality object! and i am not getting getting any output. 
I am using bioperl 1.61

My part of code looks like this:

#!/usr/bin/perl -w
use Bio::SeqIO;
$in = Bio::SeqIO->new(-file => "$ARGV[0]", -format => 'fastq');
- - - - -- -
 - - - -- - -

while(my $seq = $in->next_seq){
        -- - - - -  -
        - - - - - - -
        $outr->write_seq($in);
}

fastq file looks like this:

@1477:2:1:1143:901/1
NTCGGTACAGCGACAAACAGACGATATCACCGGCTAAGCTCGATGGTGGTTACGGATGCGAAACAACGTGGTAGCTCAGGTAAGGATTTAAGGCCTTCTATTACTTTGGTTAATGAAGGCCGTGAACCAATTTGTGTGCCTGGACTCAATA
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@1477:2:1:1143:901/2
ATTAACCACCGCACCTGCAGGCATTACATAATGCACCGCGATATTGGTTCCAGCCACCCAAATTGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNGCTTCCCATCGNTAACCACCATC
+
\cf_ff[fdcfebcad^e\YadcYdceWe\^deaed_Y[c\dd[ce^bfbdbRLWa]R^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@1477:2:1:1166:923/1
NTACTCCAGCGGAAAATGCTACGCTTCGATCATTGCTAATATCAAATAACGTTTTTTGCTCAACCGATGAGCTTTCCAGTCGGTAAGGAAGCGGTTCATTAGCCTGAGCGAGCGGGTCAAAAACGATATCTTCGCGAGCTTCATACTTAAC
+
BKOHONNNMN_______________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

I would really appreciate it if someone could help me out.

Thanks
Shalabh

From bbimber at gmail.com  Sun Feb  6 16:09:31 2011
From: bbimber at gmail.com (Ben Bimber)
Date: Sun, 6 Feb 2011 15:09:31 -0600
Subject: [Bioperl-l] Reading and writing fastq files
In-Reply-To:
References:
Message-ID:

i think you want:

$outr->write_seq($seq);

instead of trying to write $in.

-ben

From shalabh.sharma7 at gmail.com  Sun Feb  6 16:12:39 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Sun, 6 Feb 2011 16:12:39 -0500
Subject: [Bioperl-l] Reading and writing fastq files
In-Reply-To:
References:
Message-ID:

Hi Ben,
    Thanks a lot, I didn't even notice that, my bad.
I really appreciate it.

Thanks
Shalabh

From sharmashalu.bio at gmail.com  Mon Feb  7 17:07:33 2011
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Mon, 7 Feb 2011 17:07:33 -0500
Subject: [Bioperl-l] randomizing fastq sequences
Message-ID:

Hi,
    I am trying to test one program for which I need to change the order of
sequences in a fastq file.
My fastq file contains about 50,000 sequences.
Is there any script that can do this task?

Thanks
Shalu

From simon.andrews at bbsrc.ac.uk  Tue Feb  8 03:41:10 2011
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Tue, 8 Feb 2011 08:41:10 +0000
Subject: [Bioperl-l] randomizing fastq sequences
In-Reply-To:
References:
Message-ID: <0B4CB72F-2479-48B3-93BF-96733C670C7A@bbsrc.ac.uk>

On 7 Feb 2011, at 22:07, shalu sharma wrote:

> Hi,
> i am trying to test one program for which i need to change order of
> sequences in a fastq file.
> My fastq file contains about 50,000 sequences.
> Is there any script that can do this task?

Since FastQ is supported in SeqIO you could do something like (untested):

#!/usr/bin/perl
use warnings;
use strict;
use List::Util 'shuffle';
use Bio::SeqIO;

my @seqs;

my $in = Bio::SeqIO->new(-file   => 'your_input.fastq',
                         -format => 'Fastq');

while (my $seq = $in->next_seq()) {
    push @seqs, $seq;
}

@seqs = shuffle(@seqs);

my $out = Bio::SeqIO->new(-file   => '>your_output.fastq',
                          -format => 'Fastq');

foreach my $seq (@seqs) {
    $out->write_seq($seq);
}

## End

This has the disadvantage that it will hold all of the sequences in memory
whilst shuffling, but I don't think there's an easy way around that.

Simon.

From fs5 at sanger.ac.uk  Tue Feb  8 04:08:37 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 08 Feb 2011 09:08:37 +0000
Subject: [Bioperl-l] randomizing fastq sequences
In-Reply-To: <0B4CB72F-2479-48B3-93BF-96733C670C7A@bbsrc.ac.uk>
References: <0B4CB72F-2479-48B3-93BF-96733C670C7A@bbsrc.ac.uk>
Message-ID: <4D510815.1020800@sanger.ac.uk>

If memory is an issue then I guess you could create a file of just the
sequence IDs (one per line), then shuffle those (using List::Util like
Simon demonstrated). In the end you would substitute the IDs for the
whole fastq entry again, which you can do without reading an entire file
into memory (might be a bit slow but that probably doesn't matter).

Frank

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

From roy.chaudhuri at gmail.com  Tue Feb  8 06:31:15 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 08 Feb 2011 11:31:15 +0000
Subject: [Bioperl-l] randomizing fastq sequences
In-Reply-To: <4D510815.1020800@sanger.ac.uk>
References: <0B4CB72F-2479-48B3-93BF-96733C670C7A@bbsrc.ac.uk>
	<4D510815.1020800@sanger.ac.uk>
Message-ID: <4D512983.1040008@gmail.com>

TMTOWTDI, maybe also use the Tie::File module?

Something like:

#!/usr/bin/perl
use warnings FATAL=>qw(all);
use Modern::Perl;
use Tie::File;
use Fcntl qw(O_RDONLY);
use List::Util qw(shuffle);
my @fastq;
tie @fastq, 'Tie::File', $ARGV[0], mode=>O_RDONLY or die $!;
say join "\n", @fastq[4*$_..4*$_+3] for shuffle 0..$#fastq/4;

Cheers,
Roy.
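
A minimal sketch of the low-memory idea raised in this thread, with one change stated plainly: it shuffles byte offsets of records rather than sequence IDs, so duplicate IDs are not a problem. It assumes strict four-line FASTQ records (no wrapped sequence or quality lines) and dies if the layout looks different. The sub names index_fastq_offsets and write_shuffled_fastq are hypothetical, not part of BioPerl; only core Perl and List::Util are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return an array ref holding the byte offset of every record start.
# Assumes strict 4-line records (header, sequence, '+', quality);
# dies on anything else rather than silently mis-shuffling.
sub index_fastq_offsets {
    my ($fh) = @_;
    my @offsets;
    seek $fh, 0, 0 or die "Cannot rewind input: $!";
    while (1) {
        my $pos = tell $fh;
        my @rec = map { scalar <$fh> } 1 .. 4;
        last unless defined $rec[0];    # clean end of file
        die "Truncated record at byte $pos\n" if grep { !defined } @rec;
        die "Unexpected layout at byte $pos\n"
            unless substr($rec[0], 0, 1) eq '@'
                && substr($rec[2], 0, 1) eq '+';
        push @offsets, $pos;
    }
    return \@offsets;
}

# Visit the offsets in random order and copy one record per offset.
# Only the offsets are held in memory, never the records themselves.
sub write_shuffled_fastq {
    my ($in_fh, $out_fh, $offsets) = @_;
    for my $pos (shuffle @$offsets) {
        seek $in_fh, $pos, 0 or die "Cannot seek to byte $pos: $!";
        for (1 .. 4) {
            my $line = <$in_fh>;
            $line .= "\n" unless $line =~ /\n\z/;   # last line may lack one
            print {$out_fh} $line;
        }
    }
    return;
}

# Usage: shuffle_fastq.pl input.fastq output.fastq
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in_fh,  '<', $in_file  or die "Cannot read $in_file: $!";
    open my $out_fh, '>', $out_file or die "Cannot write $out_file: $!";
    write_shuffled_fastq($in_fh, $out_fh, index_fastq_offsets($in_fh));
    close $out_fh or die "Cannot close $out_file: $!";
}
```

The random seeks make this slower than an in-memory shuffle, and the offset list still costs one number per read, so for tens of millions of reads an on-disk index may be the better trade.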
From fs5 at sanger.ac.uk  Tue Feb  8 06:57:12 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 08 Feb 2011 11:57:12 +0000
Subject: [Bioperl-l] randomizing fastq sequences
In-Reply-To: <4D512983.1040008@gmail.com>
References: <0B4CB72F-2479-48B3-93BF-96733C670C7A@bbsrc.ac.uk>
	<4D510815.1020800@sanger.ac.uk> <4D512983.1040008@gmail.com>
Message-ID: <4D512F98.2020005@sanger.ac.uk>

nice one - but if I understand it correctly it relies on there being
exactly 4 lines for each record. This is probably the case but it would
be a good idea to double-check the fastq file in question, just to make
sure.

Frank

From roy.chaudhuri at gmail.com  Tue Feb  8 07:09:39 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 08 Feb 2011 12:09:39 +0000
Subject: [Bioperl-l] randomizing fastq sequences
In-Reply-To: <4D512F98.2020005@sanger.ac.uk>
References: <0B4CB72F-2479-48B3-93BF-96733C670C7A@bbsrc.ac.uk>
	<4D510815.1020800@sanger.ac.uk> <4D512983.1040008@gmail.com>
	<4D512F98.2020005@sanger.ac.uk>
Message-ID: <4D513283.2060205@gmail.com>

Sorry, I should have included that caveat.

From MEC at stowers.org  Tue Feb  8 10:12:47 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 8 Feb 2011 09:12:47 -0600
Subject: [Bioperl-l] randomizing fastq sequences
In-Reply-To:
References:
Message-ID:

Gotta chime in....

If
	you're working with fastq files
	and are working in unix and have the `shuf` command available

I recommend you install cdbyank http://sourceforge.net/projects/cdbfasta/
which provides for indexing fasta and fastq files and providing random
access to them.

Index the fastq, then extract the IDs with cdbyank, pipe them through
`shuf` and then through cdbyank again to pull out the sequences.
Like this example, which uses a test fastq from my local install of bioperl:

> cd ~/local/src/bioperl-live/t/data/fastq/
> cdbfasta -Q example.fastq
3 entries from file example.fastq were indexed in file example.fastq.cidx
> cdbyank -l example.fastq.cidx | shuf | cdbyank example.fastq.cidx > shuf_example.fastq

There would be issues if your IDs are not unique.

Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri  USA

From cjfields at illinois.edu  Tue Feb  8 10:53:27 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 8 Feb 2011 09:53:27 -0600
Subject: [Bioperl-l] randomizing fastq sequences
In-Reply-To:
References:
Message-ID: <16863A4B-A091-42C9-BD6A-2CA383743ED0@illinois.edu>

Just to note, I have been thinking about wrapping this for fast indexing
and retrieval of FASTQ for bioperl (this came up in a prior thread, with
the same suggestion from Malcolm IIRC).

chris
From sharmashalu.bio at gmail.com  Tue Feb  8 11:48:44 2011
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Tue, 8 Feb 2011 11:48:44 -0500
Subject: [Bioperl-l] randomizing fastq sequences
In-Reply-To: <16863A4B-A091-42C9-BD6A-2CA383743ED0@illinois.edu>
References: <16863A4B-A091-42C9-BD6A-2CA383743ED0@illinois.edu>
Message-ID:

Hi All,
    Thanks for all the suggestions.
@Simon Andrews and Roy:
Your method worked perfectly, but now memory is the issue.
Now I have to randomly select 50K fastq sequences from Illumina data (around
70 million reads), so is there again any module that can select random
sequences from a fastq file?

I can still use the same methods on 50K sequences, but getting 50K from a
huge data set is a problem.
Also, at some point I need to shuffle the fastq reads (order of nucleotides).

I am really sorry for asking a lot of things; I know I am really bad at
handling fastq sequences.
I would really appreciate your suggestions.

Thanks
Shalu
> >> > >> Thanks > >> Shalu > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue Feb 8 12:29:56 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 8 Feb 2011 11:29:56 -0600 Subject: [Bioperl-l] randomizing fastq sequences In-Reply-To: References: <16863A4B-A091-42C9-BD6A-2CA383743ED0@illinois.edu> Message-ID: Shalu, (Note: this isn't a Perl solution): I do think this problem has been solved somewhat in R/BioC if you have it installed, in the ShortRead package (see 'Sampler-class' in the ShortRead docs). I think using perl and the current BioPerl Bio::Index::Fastq indexing scheme for FASTQ will be problematic/slow for very large files with millions of sequences (i.e. pretty much anything that is rolling out of modern day sequencing pipelines), as the current indexing implementation uses a very simple indexing scheme using DB_File, originally designed years ago for much smaller sequencing samples. Think: Sanger sequencing. Of course, this is with the caveat that I haven't tested this out personally, but I recall some complaints about this in the past (Jason?). There was an effort to deal with this at one point with AnyDBM_File (which allows SQLite now) but I don't think it progressed very far, primarily b/c there simply hasn't been enough demand. Most users seem to sample randomly from BAM files instead, which are conveniently accessible via samtools/Picard/bamtools/etc (bamtools has a 'random' option for this purpose). chris On Feb 8, 2011, at 10:48 AM, shalu sharma wrote: > Hi All, > Thanks for all the suggestions. > @Simon Andrew and Roy: > Your method worked perfect but now memory is the issue. 
> Now i have to select 50K fastq sequences from a illumina data (around 70 mil reads) randomly , so is there again any module that can select random sequences from fastq file? > > I can still use same methods on 50k sequences but getting 50k from huge data set is a problem. > Also at some point i need to shuffle the fastq reads (order of nucleotides). > > I am really sorry for asking lot of things , i know i am really bad in handling fastq sequences. > i would really appreciate your suggestions. > > Thanks > Shalu > > On Tue, Feb 8, 2011 at 10:53 AM, Chris Fields wrote: > Just to note, I have been thinking about wrapping this for fast indexing and retrieval of FASTQ for bioperl (this came up in a prior thread, with the same suggestion from Malcolm IIRC). > > chris > > On Feb 8, 2011, at 9:12 AM, Cook, Malcolm wrote: > > > Gotta chime in.... > > > > If > > you're working with fastq files > > are working in unix and have the `shuf` command available > > > > I recommand you to install cdbyank http://sourceforge.net/projects/cdbfasta/ which provides for indexing fasta and fastq files and providing random access to them > > > > Index the fastq, then extract the IDs with cdyank, pipe them through `shuf` and then through cdyank again to pull out the sequences. > > > > Like this example, which uses a test fastq from my local install of bioperl: > > > >> cd ~/local/src/bioperl-live/t/data/fastq/ > >> cdbfasta -Q example.fastq > > 3 entries from file example.fastq were indexed in file example.fastq.cidx > >> cdbyank -l example.fastq.cidx | shuf | cdbyank example.fastq.cidx > shuf_example.fastq > > > > There would be issues if your IDs are not unique. 
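
For the "pick 50K reads out of roughly 70 million" question, one option that needs neither an index nor much memory is reservoir sampling. This is a technique swapped in here for illustration, not something proposed by anyone in the thread. The sketch below makes a single pass over the file and keeps only the requested number of records in memory. It assumes strict four-line FASTQ records, and the sub name reservoir_sample_fastq is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return an array ref of $k records chosen uniformly at random from the
# FASTQ stream in $fh (or all records, if the file holds fewer than $k).
# Assumes strict 4-line records; each record is kept as one string.
sub reservoir_sample_fastq {
    my ($fh, $k) = @_;
    my @reservoir;
    my $seen = 0;
    while (defined(my $header = <$fh>)) {
        my @rest = map { scalar <$fh> } 1 .. 3;
        die "Truncated record after $seen reads\n" if grep { !defined } @rest;
        my $record = join '', $header, @rest;
        $seen++;
        if (@reservoir < $k) {
            push @reservoir, $record;       # fill the reservoir first
        }
        else {
            my $j = int rand $seen;         # uniform in 0 .. $seen - 1
            $reservoir[$j] = $record if $j < $k;
        }
    }
    return \@reservoir;
}

# Usage: sample_fastq.pl reads.fastq 50000 > sample.fastq
if (@ARGV == 2) {
    my ($file, $k) = @ARGV;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    print @{ reservoir_sample_fastq($fh, $k) };
}
```

The sampled reads come out in no particular order, so a separate shuffle is only needed if a uniformly random order matters downstream.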
> > > > Malcolm Cook > > Stowers Institute for Medical Research - Bioinformatics > > Kansas City, Missouri USA > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org > >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > >> shalu sharma > >> Sent: Monday, February 07, 2011 4:08 PM > >> To: bioperl-l at lists.open-bio.org > >> Subject: [Bioperl-l] randomizing fastq sequences > >> > >> Hi, > >> i am trying to test one program for which i need to change > >> order of sequences in a fastq file. > >> My fastq file contains about 50,000 sequences. > >> Is there any script that can do this task? > >> > >> Thanks > >> Shalu > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From simon.andrews at bbsrc.ac.uk Wed Feb 9 03:35:34 2011 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Wed, 9 Feb 2011 08:35:34 +0000 Subject: [Bioperl-l] randomizing fastq sequences In-Reply-To: References: <16863A4B-A091-42C9-BD6A-2CA383743ED0@illinois.edu> Message-ID: <9A0DA8F0-A3F1-47F6-951F-AD884F5630C8@bbsrc.ac.uk> On 8 Feb 2011, at 16:48, shalu sharma wrote: > Hi All, > Thanks for all the suggestions. > @Simon Andrew and Roy: > Your method worked perfect but now memory is the issue. > Now i have to select 50K fastq sequences from a illumina data (around 70 mil > reads) randomly , so is there again any module that can select random > sequences from fastq file? The simple approach to this is to do it in two passes. In the first pass you simply find out how many fastq entries you have in your file. You then randomly select 50K numbers from 1..[number of fastq seqs in file]. 
In the second pass you pull out any sequences at an index position you randomly selected. If you don't mind them being in the same order then you can just write them out immediately and use virtually no memory, or you could put them in an array and shuffle them before writing (using the same memory as the 50K experiment). If you're going to be doing a lot of shuffling on the same dataset then it would be worth looking into doing a proper indexing of your file, as others have suggested, but if you're only going to do this once per dataset then it might not save you any time. > Also at some point i need to shuffle the fastq reads (order of > nucleotides). Same basic process - extract the sequence, split it into an array, shuffle the array, reassign the sequence. If you want to keep the original quality scores associated with the same bases you'll need to shuffle indices and then reassemble both the shuffled sequences and qualities at the same time. Simon. From jordi.durban at gmail.com Mon Feb 14 05:41:56 2011 From: jordi.durban at gmail.com (Jordi Durban) Date: Mon, 14 Feb 2011 11:41:56 +0100 Subject: [Bioperl-l] parse columns file Message-ID: Hi all! I'm trying to parse a three columns file. The first one could be repeated. However, I would like to obtain the results for the first one. That is: my file: *uaccno=FF56QEU12HD1LC * *gi|166216293|sp|P0C616.1|PA2HA_BOTAS 3.52797e-18* uaccno=FF56QEU12HD1LC gi|164421989|gb|ABY55159.1| 3.52797e-18 *uaccno=FF56QEU12HMBY2* *gi|166216293|sp|P0C616.1|PA2HA_BOTAS 9.01317e-19* uaccno=FF56QEU12HMBY2 gi|164421989|gb|ABY55159.1| 9.01317e-19 *uaccno=FF56QEU12HDB9V * * gi|166215047|sp|P24605.3|PA2H2_BOTAS 2.61668e-25* uaccno=FF56QEU12HDB9V gi|71041979|pdb|1Y4L|A 2.61668e-25 I'm interested in the bold lines. I'll be grateful for some advices. 
Thanks
--
Jordi

From jordi.durban at gmail.com  Mon Feb 14 06:34:36 2011
From: jordi.durban at gmail.com (Jordi Durban)
Date: Mon, 14 Feb 2011 12:34:36 +0100
Subject: [Bioperl-l] parse columns file
In-Reply-To:
References:
Message-ID:

Thank you very much John. The output should be the two fields from each
entry. In the example above, it should be:

*uaccno=FF56QEU12HD1LC* *gi|166216293|sp|P0C616.1|PA2HA_BOTAS*
*uaccno=FF56QEU12HMBY2* *gi|166216293|sp|P0C616.1|PA2HA_BOTAS*
*uaccno=FF56QEU12HDB9V* *gi|166215047|sp|P24605.3|PA2H2_BOTAS*

According to http://perl.about.com/od/filesystem/a/perl_parse_tabs.htm I
have to do:

open (FILE, 'data.txt');
while (<FILE>) {
    chomp;
    ($name, $email, $phone) = split("\t");
    print "Name: $name\n";
    print "Email: $email\n";
    print "Phone: $phone\n";
    print "---------\n";
}
close (FILE);

But this script doesn't deal with the duplicated lines...

2011/2/14 John SJ Anderson

> On Mon, Feb 14, 2011 at 05:41, Jordi Durban
> wrote:
> > Hi all!
> > I'm trying to parse a three columns file. The first one could be
> repeated.
> > However, I would like to obtain the results for the first one.
> [ snip ]
>
> You've given us the input. What should the output look like?
>
> j.
>

--
Jordi

From jordi.durban at gmail.com  Mon Feb 14 07:27:58 2011
From: jordi.durban at gmail.com (Jordi Durban)
Date: Mon, 14 Feb 2011 13:27:58 +0100
Subject: [Bioperl-l] parse columns file
In-Reply-To: <1297685447.4541.113.camel@deskpro15336.internal.sanger.ac.uk>
References: <1297685447.4541.113.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

Ok. That's true. I suspected that was a "hash" question but I don't know
much about hashes and I expected to find some bioperl scripts to do that.
Thank you very much.

2011/2/14 Frank Schwach

> Hi Jordi,
>
> you will want to use a *hash* to store your IDs if you want to make them
> non-redundant. This is not BioPerl related - it's more of a basic Perl
> question.
My recommendation would be to get the excellent "Beginning > Perl for Bioinformatics" book from O'Reilly and also check out the > general-Perl sites and forums, such as perlmonks.com. You can also just > google for Perl hashes and see if it makes sense to you and try to work > it into the example script you found already. Good luck!! > > Frank > > > On Mon, 2011-02-14 at 12:34 +0100, Jordi Durban wrote: > > Thank you very much John. The output should be the two fields from each > > entry. In the example above, it should be: > > *uaccno=FF56QEU12HD1LC * *gi|166216293|sp|P0C616.1|PA2HA_BOTAS > > **uaccno=FF56QEU12HMBY2* *gi|166216293|sp|P0C616.1|PA2HA_BOTAS * > > *uaccno=FF56QEU12HDB9V * * gi|166215047|sp|P24605.3|PA2H2_BOTAS > > > > *According to http://perl.about.com/od/filesystem/a/perl_parse_tabs.htmI > > have to do: > > > > open (FILE, 'data.txt'); > > while () { > > chomp; > > ($name, $email, $phone) = split("\t"); > > print "Name: $name\n"; > > print "Email: $email\n"; > > print "Phone: $phone\n"; > > > > print "---------\n"; > > } > > > > close (FILE); > > > > But this script doesn't deal with the duplicated lines... > > > > 2011/2/14 John SJ Anderson > > > > > On Mon, Feb 14, 2011 at 05:41, Jordi Durban > > > wrote: > > > > Hi all! > > > > I'm trying to parse a three columns file. The first one could be > > > repeated. > > > > However, I would like to obtain the results for the first one. > > > [ snip ] > > > > > > You've given us the input. What should the output look like? > > > > > > j. > > > > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. 
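Frank's hash suggestion quoted above can be sketched as follows. This is only a minimal illustration, assuming whitespace-separated columns and that "the first one" means the first line seen for each uaccno ID. The inline sample rows are copied from the thread; the names %seen and @kept are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows copied from the thread: ID, hit, e-value (whitespace-separated).
my @lines = (
    'uaccno=FF56QEU12HD1LC gi|166216293|sp|P0C616.1|PA2HA_BOTAS 3.52797e-18',
    'uaccno=FF56QEU12HD1LC gi|164421989|gb|ABY55159.1| 3.52797e-18',
    'uaccno=FF56QEU12HMBY2 gi|166216293|sp|P0C616.1|PA2HA_BOTAS 9.01317e-19',
    'uaccno=FF56QEU12HMBY2 gi|164421989|gb|ABY55159.1| 9.01317e-19',
    'uaccno=FF56QEU12HDB9V gi|166215047|sp|P24605.3|PA2H2_BOTAS 2.61668e-25',
    'uaccno=FF56QEU12HDB9V gi|71041979|pdb|1Y4L|A 2.61668e-25',
);

my %seen;    # IDs that have already been reported (illustrative name)
my @kept;    # "ID<TAB>hit" for the first line of each ID (illustrative name)

for my $line (@lines) {
    my ($id, $hit, $evalue) = split ' ', $line;
    next unless defined $hit;    # skip blank or malformed lines
    next if $seen{$id}++;        # true from the second occurrence onward
    push @kept, "$id\t$hit";
}

print "$_\n" for @kept;
```

To read from a file instead, the loop over @lines could be replaced by a `while (my $line = <$fh>)` loop after opening the file with a lexical handle; the hash test stays the same.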
> -- Jordi From fs5 at sanger.ac.uk Mon Feb 14 07:10:46 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 14 Feb 2011 12:10:46 +0000 Subject: [Bioperl-l] parse columns file In-Reply-To: References: Message-ID: <1297685447.4541.113.camel@deskpro15336.internal.sanger.ac.uk> Hi Jordi, you will want to use a *hash* to store your IDs if you want to make them non-redundant. This is not BioPerl related - it's more of a basic Perl question. My recommendation would be to get the excellent "Beginning Perl for Bioinformatics" book from O'Reilly and also check out the general-Perl sites and forums, such as perlmonks.com. You can also just google for Perl hashes and see if it makes sense to you and try to work it into the example script you found already. Good luck!! Frank On Mon, 2011-02-14 at 12:34 +0100, Jordi Durban wrote: > Thank you very much John. The output should be the two fields from each > entry. In the example above, it should be: > *uaccno=FF56QEU12HD1LC * *gi|166216293|sp|P0C616.1|PA2HA_BOTAS > **uaccno=FF56QEU12HMBY2* *gi|166216293|sp|P0C616.1|PA2HA_BOTAS * > *uaccno=FF56QEU12HDB9V * * gi|166215047|sp|P24605.3|PA2H2_BOTAS > > *According to http://perl.about.com/od/filesystem/a/perl_parse_tabs.htm I > have to do: > > open (FILE, 'data.txt'); > while () { > chomp; > ($name, $email, $phone) = split("\t"); > print "Name: $name\n"; > print "Email: $email\n"; > print "Phone: $phone\n"; > > print "---------\n"; > } > > close (FILE); > > But this script doesn't deal with the duplicated lines... > > 2011/2/14 John SJ Anderson > > > On Mon, Feb 14, 2011 at 05:41, Jordi Durban > > wrote: > > > Hi all! > > > I'm trying to parse a three columns file. The first one could be > > repeated. > > > However, I would like to obtain the results for the first one. > > [ snip ] > > > > You've given us the input. What should the output look like? > > > > j. 
> > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From rondonbio at yahoo.com.br Mon Feb 14 12:49:01 2011 From: rondonbio at yahoo.com.br (Rondon Neto) Date: Mon, 14 Feb 2011 09:49:01 -0800 (PST) Subject: [Bioperl-l] extract overhangs from Clustalw Message-ID: <285603.48757.qm@web130206.mail.mud.yahoo.com> How to extract overhangs from Clustalw alignments and return the aln file without the overhangs? I'm trying to use Bio::AlignIO, but Im confused. Thank you very much. Rondon From rene.malenfant at gmail.com Thu Feb 10 14:11:39 2011 From: rene.malenfant at gmail.com (=?ISO-8859-1?Q?Ren=E9_Malenfant?=) Date: Thu, 10 Feb 2011 12:11:39 -0700 Subject: [Bioperl-l] BioPerl Fastq Parser Error? Message-ID: <565AB0AA-CAB9-4D26-B663-D40618778A5C@gmail.com> Hi. I think there may be an error in BioPerl's FASTQ parser. It fails when the last character of a quality sequence is a zero and it is alone on a line by itself. I've attached the error message below: Thanks, Rene Malenfant Error message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Quality string [!!) WZ`cii~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {uuooWWTWWWW>WWWQWWWWW]QKHHHHHHHHHHHHHHEEEEEEEEEEEEBEEEEBBBBBBBBBBBB] of length [120] doesn't match length of sequence nnAGACTTTGTATTTATGTTCCTTTTTGGTGGATTCTTAATGACTATATCGTTACTGTCGAATGCTAGAAGA AGGCTCTTTCCGAGGTCGGACAGCAGACTTTGTATTtatgttccttttt [121], line: 322124 STACK: Error::throw STACK: Bio::Root::Root::throw /home/rene/perl5/lib/perl5/Bio/Root/ Root.pm:368 STACK: Bio::SeqIO::fastq::next_dataset /home/rene/perl5/lib/perl5/Bio/ SeqIO/fastq.pm:102 STACK: Bio::SeqIO::fastq::next_seq /home/rene/perl5/lib/perl5/Bio/ SeqIO/fastq.pm:29 STACK: ./foo.pl:12 ----------------------------------------------------------- Program I'm trying to use: ==== #! 
/usr/bin/perl -w use strict; use Bio::SeqIO; my $inputfilename = shift; my $in = Bio::SeqIO->new(-file => "$inputfilename", -format => 'Fastq'); my $out = Bio::SeqIO->new(-file => ">outputfilename", -format => 'Fasta'); while ( my $seq = $in->next_seq() ) { $out->write_seq($seq); } exit; ==== Sequence that appears to be messing things up. It looks fine to me: ==== @21_NODE_24053-0.94|PREMKED.FA|PREDICTED: nnAGACTTTGTATTTATGTTCCTTTTTGGTGGATTCTTAATGACTATATCGTTACTGTCG AATGCTAGAAGAAGGCTCTTTCCGAGGTCGGACAGCAGACTTTGTATTtatgttcctttt t + !!)WZ`cii~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{uuooWWT W0WW>WWWQWWWWW]QKHHHHHHHHHHHHHHEEEEEEEEEEEEBEEEEBBBBBBBBBBBB 0 ==== From clements at nescent.org Mon Feb 14 14:43:22 2011 From: clements at nescent.org (Dave Clements) Date: Mon, 14 Feb 2011 11:43:22 -0800 Subject: [Bioperl-l] March 2011 GMOD Meeting early registration closes this Friday In-Reply-To: References: Message-ID: Hello all, *Register for the March 2011 GMOD Meeting ( http://gmod.org/wiki/March_2011_GMOD_Meeting) by this Friday, Feb 18, and save over 15%.* The March meeting is part of GMOD Americas 2011 ( http://gmod.org/wiki/GMOD_Americas_2011), an event that includes Satellite Meetings, a GMOD Course (already full), and for the first time, an "Introduction to GMOD" session the night before the meeting for GMOD newcomers. GMOD Americas 2011 events are being held at the US National Evolutionary Synthesis Center (NESCent) in Durham, North Carolina, United States. Also, if you have a topic you want to discuss at the meeting, please send it to Scott Cain ASAP so he can get you on the program. *About GMOD:* GMOD is the Generic Model Organism Database project, a collection of interoperable open-source software components for annotating, visualizing, managing and analyzing biological data. GMOD is also an active community of software developers and biologists addressing common challenges with their data. 
The GMOD suite includes widely used tools such as GBrowse and JBrowse (and WebGBrowse) for genome browsing, Apollo and MAKER for genome annotation, GBrowse_syn and CMap for comparative genomics visualization, Chado, BioMart and InterMine for data integration, management, and querying, and Galaxy and Ergatis (and ISGA) for data analysis. *Meeting Overview: *As with previous GMOD meetings, this meeting will have a mixture of project talks, component talks, and user talks. Our guest speaker is Dr. Eric Stone of North Carolina State University. Dr. Stone will talk about his experience on the "Drosophila Genome Reference Panel," a project that is sequencing and annotating almost 200 inbred lines. The agenda is driven by attendee suggestions, and you are encouraged to add your suggestions now. For an idea of what happens at a GMOD meeting, see the writeup of the September 2010 GMOD Meeting. GMOD meetings are an excellent way to meet GMOD developers and users and to learn (and affect) what's coming in the project. *Registration: *Registration for the March 2011 GMOD Meeting is $80 on or before February 18 $95 after February 18 Please register early, both to save money, and ensure a spot. You are also strongly encouraged to submit a talk and/or sign up for (or propose) a Satellite Meeting. Details on transportation, suggested lodging, and other logistics are on the GMOD Americas 2011 page. This meeting, and all GMOD Americas 2011 events, are jointly sponsored by NESCent and the Galaxy Project. Hope to see you in North Carolina! Dave Clements Galaxy Project -- http://gmod.org/wiki/GMOD_Americas_2011 http://gmod.org/wiki/GMOD_News http://usegalaxy.org/ http://getgalaxy.org/ From p.j.a.cock at googlemail.com Mon Feb 14 15:14:23 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 14 Feb 2011 20:14:23 +0000 Subject: [Bioperl-l] BioPerl Fastq Parser Error? 
In-Reply-To: <565AB0AA-CAB9-4D26-B663-D40618778A5C@gmail.com>
References: <565AB0AA-CAB9-4D26-B663-D40618778A5C@gmail.com>
Message-ID:

On Thu, Feb 10, 2011 at 7:11 PM, René Malenfant wrote:
> Hi. I think there may be an error in BioPerl's FASTQ parser. It fails when
> the last character of a quality sequence is a zero and it is alone on a line
> by itself.

Interesting example - and it does look valid to me too. It would
probably make a good test case, since automatic conversion of
a string to a boolean may treat "0" (zero) as false.

I'm curious where the file came from, since not many tools output
line wrapped FASTQ (unwrapped records make parsing and indexing so
much easier).

Peter

From cjfields at illinois.edu  Mon Feb 14 15:14:44 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 14 Feb 2011 14:14:44 -0600
Subject: [Bioperl-l] BioPerl Fastq Parser Error?
In-Reply-To: <565AB0AA-CAB9-4D26-B663-D40618778A5C@gmail.com>
References: <565AB0AA-CAB9-4D26-B663-D40618778A5C@gmail.com>
Message-ID: <75D5224A-C239-4353-9610-422D547B4333@illinois.edu>

Rene,

This is a bug, now fixed in bioperl-live master on github. The parsed lines
were being tested for truth rather than for definedness, so the very last
line ("0") evaluated as false and prematurely ended the parse. It should
work fine now.

chris

On Feb 10, 2011, at 1:11 PM, René Malenfant wrote:

> Hi. I think there may be an error in BioPerl's FASTQ parser. It fails when the last character of a quality sequence is a zero and it is alone on a line by itself.
> > I've attached the error message below: > > > Thanks, > > Rene Malenfant > > > > Error message: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Quality string [!!)WZ`cii~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{uuooWWTWWWW>WWWQWWWWW]QKHHHHHHHHHHHHHHEEEEEEEEEEEEBEEEEBBBBBBBBBBBB] of length [120] > doesn't match length of sequence nnAGACTTTGTATTTATGTTCCTTTTTGGTGGATTCTTAATGACTATATCGTTACTGTCGAATGCTAGAAGAAGGCTCTTTCCGAGGTCGGACAGCAGACTTTGTATTtatgttccttttt > [121], line: 322124 > STACK: Error::throw > STACK: Bio::Root::Root::throw /home/rene/perl5/lib/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SeqIO::fastq::next_dataset /home/rene/perl5/lib/perl5/Bio/SeqIO/fastq.pm:102 > STACK: Bio::SeqIO::fastq::next_seq /home/rene/perl5/lib/perl5/Bio/SeqIO/fastq.pm:29 > STACK: ./foo.pl:12 > ----------------------------------------------------------- > > > Program I'm trying to use: > ==== > #! /usr/bin/perl -w > use strict; > use Bio::SeqIO; > > my $inputfilename = shift; > my $in = Bio::SeqIO->new(-file => "$inputfilename", > -format => 'Fastq'); > my $out = Bio::SeqIO->new(-file => ">outputfilename", > -format => 'Fasta'); > > while ( my $seq = $in->next_seq() ) { > $out->write_seq($seq); > } > > exit; > ==== > > > Sequence that appears to be messing things up. It looks fine to me: > ==== > @21_NODE_24053-0.94|PREMKED.FA|PREDICTED: > nnAGACTTTGTATTTATGTTCCTTTTTGGTGGATTCTTAATGACTATATCGTTACTGTCG > AATGCTAGAAGAAGGCTCTTTCCGAGGTCGGACAGCAGACTTTGTATTtatgttcctttt > t > + > !!)WZ`cii~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{uuooWWT > W0WW>WWWQWWWWW]QKHHHHHHHHHHHHHHEEEEEEEEEEEEBEEEEBBBBBBBBBBBB > 0 > ==== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Feb 14 15:21:16 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 14 Feb 2011 14:21:16 -0600 Subject: [Bioperl-l] BioPerl Fastq Parser Error? 
In-Reply-To:
References: <565AB0AA-CAB9-4D26-B663-D40618778A5C@gmail.com>
Message-ID: <88BEE816-B64E-410E-90FA-2A698B436870@illinois.edu>

On Feb 14, 2011, at 2:14 PM, Peter Cock wrote:

> On Thu, Feb 10, 2011 at 7:11 PM, René Malenfant
> wrote:
>> Hi. I think there may be an error in BioPerl's FASTQ parser. It fails when
>> the last character of a quality sequence is a zero and it is alone on a line
>> by itself.
>
> Interesting example - and it does look valid to me too. It would
> probably make a good test case, since automatic conversion of
> a string to a boolean may treat "0" (zero) as false.

Right, that was the problem: here's the snippet from fastq.pm prior to my fixing it:

    while ($line) { # changed to 'while (defined $line) {...}'
        ....
    }

> I'm curious where the file came from, since not many tools output
> line wrapped FASTQ (it makes parsing and indexing so much easier).
>
> Peter

Agreed, would be good to know. I haven't come across any tools that wrap FASTQ myself.

chris

From jordi.durban at gmail.com  Mon Feb 14 15:52:23 2011
From: jordi.durban at gmail.com (Jordi Durban)
Date: Mon, 14 Feb 2011 21:52:23 +0100
Subject: [Bioperl-l] extract overhangs from Clustalw
In-Reply-To: <285603.48757.qm@web130206.mail.mud.yahoo.com>
References: <285603.48757.qm@web130206.mail.mud.yahoo.com>
Message-ID:

What do you mean by "overhangs"?
Have you heard about Extending Bio::Tools::Run::Clustalw?
Hope this helps.

2011/2/14 Rondon Neto

> How to extract overhangs from Clustalw alignments and return the aln file
> without the overhangs? I'm trying to use Bio::AlignIO, but Im confused.
> Thank you very much.
> > Rondon > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jordi From comp_sea at yahoo.com Mon Feb 14 16:51:46 2011 From: comp_sea at yahoo.com (Syed Mustafa Hussain) Date: Mon, 14 Feb 2011 13:51:46 -0800 (PST) Subject: [Bioperl-l] Bio::Graphics Message-ID: <707016.34332.qm@web33602.mail.mud.yahoo.com> Hi, We had recently updated BioPerl and Bio::Graphics and found some applications not working properly. As an example simple graphics script like: ##################################################### use Bio::Graphics::Panel; use Bio::SeqFeature::Generic; use CGI; # or any other CGI:: form handler/decoder print "Content-type: text/html\n\n"; my $panel = Bio::Graphics::Panel->new(-length => 700, -width => 700 ); my $track = $panel->add_track(-glyph => 'generic', -label => 1 ); my $feature = Bio::SeqFeature::Generic->new(-start => 1, -end => 400 ); $track->add_feature($feature); print "TEST IMAGE:
"; open GRAPH, "> /srv/www/htdocs/tmp/test.png" or die "could not open image file"; print GRAPH $panel->png; close(GRAPH); print ""; print ""; ##################################################### is giving following error when I debug: DB<1> n Can't locate object method "attributes" via package "Bio::SeqFeature::Generic" at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 703. at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 703 Bio::Graphics::Glyph::bgcolor('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)') called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 1299 Bio::Graphics::Glyph::filled_box('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 400, 7) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 1471 Bio::Graphics::Glyph::draw_component('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph/generic.pm line 347 Bio::Graphics::Glyph::generic::draw_component('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 1050 Bio::Graphics::Glyph::draw('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph/generic.pm line 338 Bio::Graphics::Glyph::generic::draw('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph/track.pm line 35 Bio::Graphics::Glyph::track::draw('Bio::Graphics::Glyph::track=HASH(0x1d5bb10)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Panel.pm line 588 Bio::Graphics::Panel::gd('Bio::Graphics::Panel=HASH(0x1b1f6b0)') called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Panel.pm line 1067 
Bio::Graphics::Panel::png('Bio::Graphics::Panel=HASH(0x1b1f6b0)') called at test.cgi line 37
Debugged program terminated.  Use q to quit or R to restart,
  use o inhibit_exit to avoid stopping after program termination,
  h q, h R or h o to get additional info.
#####################################################

Is it because of some incompatibility between the versions of BioPerl and
Bio::Graphics, or something else?

Thanks,
Mustafa.

From roy.chaudhuri at gmail.com  Tue Feb 15 07:19:00 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 15 Feb 2011 12:19:00 +0000
Subject: [Bioperl-l] extract overhangs from Clustalw
In-Reply-To:
References: <285603.48757.qm@web130206.mail.mud.yahoo.com>
Message-ID: <4D5A6F34.9070109@gmail.com>

Hi Rondon,

I'm assuming you mean "how do I remove columns with gaps at either end
of the alignment?" (since your alignment should be flush if it has come
from ClustalW). In future, when you ask questions on the list, please try
to give as much information as possible, so that we do not have to
guess at what you mean.

If those are the only gaps in the alignment, then this will work:

my $gapfree=$aln->remove_gaps;

However, if there are gaps in the central region that you want to keep,
then try something like this:

$aln->gap_line=~/^(-*).*[^-](-*)$/;
my $endgapfree=$aln->remove_columns([0,length($1)-1],
    [$aln->length-length($2), $aln->length-1]);

This will not work correctly if there aren't gaps at both ends of the
alignment, so you may have to add in a few checks if you can't make that
assumption.

Cheers,
Roy.

On 14/02/2011 20:52, Jordi Durban wrote:
> What do you mean by "overhangs"?
> Have ypu heard about Extending Bio::Tools::Run::Clustalw?
> hope this helps.
> 2011/2/14 Rondon Neto
>
>> How to extract overhangs from Clustalw alignments and return the aln file
>> without the overhangs? I'm trying to use Bio::AlignIO, but Im confused.
>> Thank you very much.
>>
>> Rondon
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

From jovel_juan at hotmail.com  Tue Feb 15 13:36:11 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 15 Feb 2011 18:36:11 +0000
Subject: [Bioperl-l] Fishing redundant sequences in FASTA files
In-Reply-To: <4D3F44A8.8020001@bioperl.org>
References: <000801cbbcd6$ddd295e0$9977c1a0$@edu>,<4D3F44A8.8020001@bioperl.org>
Message-ID:

Good Morning guys,

sorry for the naive question: What's the simplest way to fish redundant
sequences (complete or partial) between two (or more) fasta files?

I was thinking of just doing it with SeqIO, opening two files and comparing
each sequence of file_1 to each record of file_2, like:

# Read each record of file 1 and compare to each read of file 2
while(my $dna1 = $seqin1->next_seq){
    my $seq1 = $dna1->seq;
    my $id1 = $dna1->id;
    # Iterate inside the second fasta file
    while(my $dna2 = $seqin2->next_seq){
        my $seq2 = $dna2->seq;
        my $id2 = $dna2->id;
        if(($seq1 =~ /$seq2/)||($seq2 =~ /$seq1/)){
            print "Match found \n";
            print OUT "Records $id1 and $id2 are redundant";
            ...

I am afraid it is going to be slow for large files. AND, more importantly,
how do I reset the object containing the second file to the first line, as
done in Perl with seek(IN, 0, 0) for example? Does SeqIO allow that (sorry,
I am not a frequent user of SeqIO)? If there is another, more elaborate
module to fish such redundant sequences, I would appreciate knowing about it.
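One way around the rewind problem raised above is to avoid re-reading the second file at all: load its records into memory once, then scan them for each record of the first file. The sketch below is an illustration under stated assumptions, not a BioPerl API. It assumes both files fit in memory, uses a hand-rolled FASTA reader (read_fasta and find_redundant are hypothetical helpers) so that it runs without BioPerl, upper-cases sequences before comparing, and uses index() for literal substring tests so that sequence characters are never interpreted as regex metacharacters. It is still an all-against-all comparison, so the running time remains quadratic in the number of records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a file handle into a list
# of [id, sequence] pairs. Sequences are upper-cased (an assumption).
sub read_fasta {
    my ($fh) = @_;
    my (@records, $id, $seq);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            push @records, [$id, $seq] if defined $id;
            ($id, $seq) = ($1, '');
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;
            $seq .= uc $line;
        }
    }
    push @records, [$id, $seq] if defined $id;
    return @records;
}

# Hypothetical helper: return [id1, id2] pairs where one sequence is
# wholly contained in the other (complete or partial redundancy).
sub find_redundant {
    my ($set1, $set2) = @_;
    my @hits;
    for my $r1 (@$set1) {
        my ($id1, $seq1) = @$r1;
        for my $r2 (@$set2) {
            my ($id2, $seq2) = @$r2;
            next unless length $seq1 && length $seq2;
            if (index($seq1, $seq2) >= 0 || index($seq2, $seq1) >= 0) {
                push @hits, [$id1, $id2];
            }
        }
    }
    return @hits;
}

# In-memory demo data standing in for the two FASTA files.
my $fasta1 = ">a1\nACGTACGTAA\n>a2\nTTTTGGGG\n";
my $fasta2 = ">b1\nGTACG\n>b2\nCCCC\n>b3\nTTTTGGGGCC\n";

open my $fh1, '<', \$fasta1 or die "cannot open in-memory file 1: $!";
open my $fh2, '<', \$fasta2 or die "cannot open in-memory file 2: $!";
my @set1 = read_fasta($fh1);
my @set2 = read_fasta($fh2);    # read once; no rewinding needed

my @hits = find_redundant(\@set1, \@set2);
print "Records $_->[0] and $_->[1] are redundant\n" for @hits;
```

With Bio::SeqIO the same idea would be to push the objects returned by next_seq for the second file into an array before entering the outer loop, then iterate over that array instead of the stream.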
Thanks,
JUAN

From jovel_juan at hotmail.com  Tue Feb 15 14:34:58 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 15 Feb 2011 19:34:58 +0000
Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]
In-Reply-To:
References: <000801cbbcd6$ddd295e0$9977c1a0$@edu>, <4D3F44A8.8020001@bioperl.org>,
Message-ID:

Good Morning guys,

sorry for the naive question: What's the simplest way to fish redundant
sequences (complete or partial) between two (or more) fasta files?

I was thinking of just doing it with SeqIO, opening two files and comparing
each sequence of file_1 to each record of file_2, like:

# Read each record of file 1 and compare to each read of file 2
while(my $dna1 = $seqin1->next_seq){
    my $seq1 = $dna1->seq;
    my $id1 = $dna1->id;
    # Iterate inside the second fasta file
    while(my $dna2 = $seqin2->next_seq){
        my $seq2 = $dna2->seq;
        my $id2 = $dna2->id;
        if(($seq1 =~ /$seq2/)||($seq2 =~ /$seq1/)){
            print "Match found \n";
            print OUT "Records $id1 and $id2 are redundant";

I am afraid it is going to be slow for large files. AND, more importantly,
how do I reset the object containing the second file to the first line, as
done in Perl with seek(IN, 0, 0) for example? Does SeqIO allow that (sorry,
I am not a frequent user of SeqIO)? If there is another, more elaborate
module to fish such redundant sequences, I would appreciate knowing about it.

Thanks,
JUAN

From cjfields at illinois.edu  Tue Feb 15 15:02:38 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 15 Feb 2011 14:02:38 -0600
Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]
In-Reply-To:
References: <000801cbbcd6$ddd295e0$9977c1a0$@edu>, <4D3F44A8.8020001@bioperl.org>,
Message-ID:

Juan,

If you are checking for simple complete matches, I would suggest using a hash. However, you are also looking for partial matches.
In this case it seems like you should be (ab)using something akin to mcl to cluster like sequences together; you're essentially performing an all-v-all comparison anyway, at least take advantage of faster tools. So, basically:

1) Run an all-v-all comparison, filtering on 100% identity, no gaps
2) cluster using mcl

Note the BLAST-related programs here for that purpose:

http://www.micans.org/mcl/man/mclfamily.html

I think you can also use other tools instead of BLAST, just can't recall the mcl pipeline at the moment to use.

chris

On Feb 15, 2011, at 1:34 PM, Juan Jovel wrote:

> Good Morning guys,
>> sorry for the naive question: What's the simplest way to fish redundant sequences (complete or partial) between two (or more) fasta files.
>> I was thinking just to do it with SeqIO, opening two files, and compare each sequence of file_1 to each record of file_2, like:
> # Read each record of file 1 and compare to each read of file 2
> while(my $dna1 = $seqin1->next_seq){
>     my $seq1 = $dna1->seq;
>     my $id1 = $dna1->id;
>     # Iterate inside de second fasta file
>     while(my $dna2 = $seqin2->next_seq){
>         my $seq2 = $dna2->seq;
>         my $id2 = $dna2->id;
>         if(($seq1 =~ /$seq2/)||($seq2 =~ /$seq1/)){
>             print "Match found \n";
>             print OUT "Records $id1 and $id2 are redundants";
>
> I am afraid it is going to be slow for large files. AND, more importantly, how do I reset the object containing the second file to the first line, as done in Perl with (SEEK(IN, 0,0)) for example. Does SeqIO allows that (sorry, I am not a frequent user of SeqIO). If there is another more-elaborated module to fish such redundant sequences, I will appreciate to know.
> > Thanks, > JUAN From David.Messina at sbc.su.se Tue Feb 15 15:09:35 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 15 Feb 2011 21:09:35 +0100 Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting] In-Reply-To: References: <000801cbbcd6$ddd295e0$9977c1a0$@edu> <4D3F44A8.8020001@bioperl.org> Message-ID: Hi Juan, There's a nice example script in the BioPerl distribution that Jason Stajich wrote which uses MD5 checksums to do the sequence comparison: https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/bp_nrdb.PLS There are also faster, nonBioPerl tools for this, such as the one that comes with UCLUST: http://www.drive5.com/usearch/ Dave From cjfields at illinois.edu Tue Feb 15 15:25:07 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 15 Feb 2011 14:25:07 -0600 Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting] In-Reply-To: References: <000801cbbcd6$ddd295e0$9977c1a0$@edu> <4D3F44A8.8020001@bioperl.org> Message-ID: SHA should work as well, didn't think of that (though I suppose the encoding step for either would be rate-limiting?). Will have to keep an eye on UCLUST, didn't know about that one. 
chris On Feb 15, 2011, at 2:09 PM, Dave Messina wrote: > Hi Juan, > > There's a nice example script in the BioPerl distribution that Jason Stajich > wrote which uses MD5 checksums to do the sequence comparison: > > > https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/bp_nrdb.PLS > > > There are also faster, nonBioPerl tools for this, such as the one that comes > with UCLUST: > > http://www.drive5.com/usearch/ > > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From MEC at stowers.org Tue Feb 15 15:28:09 2011 From: MEC at stowers.org (Cook, Malcolm) Date: Tue, 15 Feb 2011 14:28:09 -0600 Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting] In-Reply-To: References: <000801cbbcd6$ddd295e0$9977c1a0$@edu> <4D3F44A8.8020001@bioperl.org> Message-ID: there there is CD-HIT and blastclust from ncbi (which I think still gets installed as part of installed NCBI blast suite) Malcolm Cook Stowers Institute for Medical Research - Bioinformatics Kansas City, Missouri USA > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Chris Fields > Sent: Tuesday, February 15, 2011 2:25 PM > To: Dave Messina > Cc: Juan Jovel; bioperl > Subject: Re: [Bioperl-l] Fishing redundant sequences in FASTA > files [Right formatting] > > SHA should work as well, didn't think of that (though I > suppose the encoding step for either would be rate-limiting?). > > Will have to keep an eye on UCLUST, didn't know about that one. 
> > chris > > On Feb 15, 2011, at 2:09 PM, Dave Messina wrote: > > > Hi Juan, > > > > There's a nice example script in the BioPerl distribution > that Jason > > Stajich wrote which uses MD5 checksums to do the sequence > comparison: > > > > > > > https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/ > > bp_nrdb.PLS > > > > > > There are also faster, nonBioPerl tools for this, such as > the one that > > comes with UCLUST: > > > > http://www.drive5.com/usearch/ > > > > > > Dave > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Tue Feb 15 15:47:20 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 15 Feb 2011 21:47:20 +0100 Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting] In-Reply-To: References: <000801cbbcd6$ddd295e0$9977c1a0$@edu> <4D3F44A8.8020001@bioperl.org> Message-ID: SHA should work as well, didn't think of that (though I suppose the encoding > step for either would be rate-limiting?). > I haven't tested it, but I suspect that encoding either MD5 or SHA would be relatively quick compared to the sequence I/O, no? Will have to keep an eye on UCLUST, didn't know about that one. As it happens, my current pipeline uses MCL but I'm testing UCLUST as a replacement since it's waaay faster. I'll let you know how the comparison turns out. And for that matter, if anyone listening has experience with UCLUST or CD-HIT or other clustering methods (ideally in the context of metagenomic next-gen sequence), please chime in with your thoughts. 
Dave From cjfields at illinois.edu Tue Feb 15 16:01:49 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 15 Feb 2011 15:01:49 -0600 Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting] In-Reply-To: References: <000801cbbcd6$ddd295e0$9977c1a0$@edu> <4D3F44A8.8020001@bioperl.org> Message-ID: <1A27685E-82ED-497E-97DB-DA877BFE91B3@illinois.edu> On Feb 15, 2011, at 2:47 PM, Dave Messina wrote: > SHA should work as well, didn't think of that (though I suppose the encoding >> step for either would be rate-limiting?). >> > > I haven't tested it, but I suspect that encoding either MD5 or SHA would be > relatively quick compared to the sequence I/O, no? Possibly. But one nice thing is clustering allows for partial matches (which I think is the original criterion). I don't believe SHA/MD5 would work for that purpose. > Will have to keep an eye on UCLUST, didn't know about that one. > > > As it happens, my current pipeline uses MCL but I'm testing UCLUST as a > replacement since it's waaay faster. I'll let you know how the comparison > turns out. > > And for that matter, if anyone listening has experience with UCLUST or > CD-HIT or other clustering methods (ideally in the context of metagenomic > next-gen sequence), please chime in with your thoughts. As malcolm pointed out, blastclust is also available with legacy BLAST, though I'm not sure it's available with BLAST+ (didn't see anything obvious with BLAST+ for that purpose). 
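
To make the exact-match point in this thread concrete, here is a minimal sketch of checksum-based redundancy removal. It is an illustration only, not the bp_nrdb script mentioned earlier, and it uses nothing beyond core Perl and Digest::MD5. The helper names parse_fasta and nonredundant are hypothetical, and folding the sequence to upper case before hashing is an assumption about what should count as "the same" sequence.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper (not part of BioPerl): parse FASTA text into
# [header, sequence] pairs using only core Perl.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Keep the first record seen for each distinct sequence. Case is folded
# before hashing (an assumption); the comparison is otherwise exact and
# full-length, so a sequence contained in another one is NOT detected.
sub nonredundant {
    my (@records) = @_;
    my (%seen, @unique);
    for my $rec (@records) {
        my $digest = md5_hex(uc $rec->[1]);
        push @unique, $rec unless $seen{$digest}++;
    }
    return @unique;
}

my $fasta = <<'END';
>seq1
ACGTACGT
>seq2
acgtacgt
>seq3
ACGTACG
END

for my $rec (nonredundant(parse_fasta($fasta))) {
    print ">$rec->[0]\n$rec->[1]\n";
}
```

In this toy input seq2 is dropped as a duplicate of seq1, while seq3 is kept even though it is a prefix of seq1; catching that kind of partial match is what the clustering tools discussed here (CD-HIT, UCLUST, blastclust) are for.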
chris From David.Messina at sbc.su.se Tue Feb 15 16:22:58 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 15 Feb 2011 22:22:58 +0100 Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting] In-Reply-To: <1A27685E-82ED-497E-97DB-DA877BFE91B3@illinois.edu> References: <000801cbbcd6$ddd295e0$9977c1a0$@edu> <4D3F44A8.8020001@bioperl.org> <1A27685E-82ED-497E-97DB-DA877BFE91B3@illinois.edu> Message-ID: > > But one nice thing is clustering allows for partial matches (which I think > is the original criterion). I don't believe SHA/MD5 would work for that > purpose. Yep, for sure. Checksums will find full-length exact matches only. Dave From jason at bioperl.org Tue Feb 15 17:21:34 2011 From: jason at bioperl.org (Jason Stajich) Date: Tue, 15 Feb 2011 14:21:34 -0800 Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting] In-Reply-To: References: <000801cbbcd6$ddd295e0$9977c1a0$@edu> <4D3F44A8.8020001@bioperl.org> <1A27685E-82ED-497E-97DB-DA877BFE91B3@illinois.edu> Message-ID: <4D5AFC6E.9020209@bioperl.org> also see cd-hit which allows you to tune the %id matching. Dave Messina wrote: >> But one nice thing is clustering allows for partial matches (which I think >> is the original criterion). I don't believe SHA/MD5 would work for that >> purpose. > > > Yep, for sure. Checksums will find full-length exact matches only. 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki From roy.chaudhuri at gmail.com Wed Feb 16 05:49:39 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 16 Feb 2011 10:49:39 +0000 Subject: [Bioperl-l] Res: extract overhangs from Clustalw In-Reply-To: <93231.32484.qm@web130201.mail.mud.yahoo.com> References: <285603.48757.qm@web130206.mail.mud.yahoo.com> <4D5A6F34.9070109@gmail.com> <93231.32484.qm@web130201.mail.mud.yahoo.com> Message-ID: <4D5BABC3.9000601@gmail.com> Hi Rondon, Please remember to cc the mailing list when you reply, that way others can chip in with an answer. The $endgapfree variable contains a Bio::SimpleAlign object, not a Bio::AlignIO. So you can get rid of the while loop and just say: $out->write_aln($endgapfree); Cheers, Roy. On 15/02/2011 18:05, Rondon Neto wrote: > Thank you! Sorry my poor question. > > I have gaps in the central region and want to keep them. I use your > suggestion and that works. I tried to print the alignment without > success, so I'm thinking to save it in a new file, like we do in SeqIO, > but its not working. Is that the way? 
> see my code:
>
> -----------------------------------------------
> #/usr/bin/perl
> use warnings;
> use strict;
> use Bio::AlignIO;
>
> my $str = Bio::AlignIO->new('-file' => $ARGV[0]);
> my $aln = $str->next_aln();
> $aln->gap_line=~/^(-*).*[^-](-*)$/;
> my $endgapfree=$aln->remove_columns([0,length($1)-1],
>     [$aln->length-length($2), $aln->length-1]);
>
> my $out = Bio::AlignIO->new(-file => ">test.out",
>     -format => 'clustalw');
>
> while ( my $aln = $endgapfree->next_aln() ) {
>     $out->write_aln($aln);
> }
>
> exit;
> ------------------------------------------------
>
> Thank you again,
>
> Rondon Neto
>
> ------------------------------------------------------------------------
> *From:* Roy Chaudhuri
> *To:* Rondon Neto
> *Cc:* Jordi Durban ; bioperl-l at lists.open-bio.org
> *Sent:* Tuesday, 15 February 2011 10:19:00
> *Subject:* Re: [Bioperl-l] extract overhangs from Clustalw
>
> Hi Rondon,
>
> I'm assuming you mean "how do I remove columns with gaps at either end
> of the alignment?" (since your alignment should be flush if it has come
> from ClustalW). In future, when you ask questions on the list please try
> and give as much information as possible, that way we do not have to
> guess at what you mean.
>
> If those are the only gaps in the alignment, then this will work:
>
> my $gapfree=$aln->remove_gaps;
>
> However, if there are gaps in the central region that you want to keep,
> then try something like this:
>
> $aln->gap_line=~/^(-*).*[^-](-*)$/;
> my $endgapfree=$aln->remove_columns([0,length($1)-1],
>     [$aln->length-length($2), $aln->length-1]);
>
> This will not work correctly if there aren't gaps at both ends of the
> alignment, so you may have to add in a few checks if you can't make that
> assumption.
>
> Cheers,
> Roy.
>
> On 14/02/2011 20:52, Jordi Durban wrote:
> > What do you mean by "overhangs"?
> > Have you heard about Extending Bio::Tools::Run::Clustalw?
> > hope this helps.
> > 2011/2/14 Rondon Neto > > > > >> How to extract overhangs from Clustalw alignments and return the aln > file > >> without the overhangs? I'm trying to use Bio::AlignIO, but Im confused. > >> Thank you very much. > >> > >> Rondon > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > From ghai.rohit at gmail.com Wed Feb 16 06:03:44 2011 From: ghai.rohit at gmail.com (Rohit Ghai) Date: Wed, 16 Feb 2011 12:03:44 +0100 Subject: [Bioperl-l] Res: extract overhangs from Clustalw In-Reply-To: <4D5BABC3.9000601@gmail.com> References: <285603.48757.qm@web130206.mail.mud.yahoo.com> <4D5A6F34.9070109@gmail.com> <93231.32484.qm@web130201.mail.mud.yahoo.com> <4D5BABC3.9000601@gmail.com> Message-ID: Hi You could also consider using trimAL for cutting out selected columns from an alignment. http://trimal.cgenomics.org/ cheers Rohit On Wed, Feb 16, 2011 at 11:49 AM, Roy Chaudhuri wrote: > Hi Rondon, > > Please remember to cc the mailing list when you reply, that way others can > chip in with an answer. > > The $endgapfree variable contains a Bio::SimpleAlign object, not a > Bio::AlignIO. So you can get rid of the while loop and just say: > > $out->write_aln($endgapfree); > > Cheers, > Roy. > > On 15/02/2011 18:05, Rondon Neto wrote: > >> Thank you! Sorry my poor question. >> >> I have gaps in the central region and want to keep them. I use your >> suggestion and that works. I tried to print the alignment without >> success, so I'm thinking to save it in a new file, like we do in SeqIO, >> but its not working. Is that the way? 
From adsj at novozymes.com Wed Feb 16 08:01:31 2011
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 16 Feb 2011 14:01:31 +0100
Subject: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]
In-Reply-To: (Chris Fields's message of "Tue, 15 Feb 2011 14:25:07 -0600")
References: <000801cbbcd6$ddd295e0$9977c1a0$@edu> <4D3F44A8.8020001@bioperl.org>
Message-ID: <87d3msf80k.fsf@topper.koldfront.dk>

On Tue, 15 Feb 2011 14:25:07 -0600, Chris wrote:

> SHA should work as well, didn't think of that (though I suppose the
> encoding step for either would be rate-limiting?).

Disk I/O might be the bottleneck - on a 3+ year old desktop I get ~144 MB/s
for sha1 and ~217 MB/s for md5 in a simple test:

  $ dd if=/dev/zero bs=1M count=1024 | sha1sum -
  1024+0 records in
  1024+0 records out
  1073741824 bytes (1.1 GB) copied, 7.44032 s, 144 MB/s
  2a492f15396a6768bcbca016993f4b4c8b0b5307 -
  $ dd if=/dev/zero bs=1M count=1024 | md5sum -
  1024+0 records in
  1024+0 records out
  1073741824 bytes (1.1 GB) copied, 4.94205 s, 217 MB/s
  cd573cfaace07e7949bc0c46028904ff -

On a reasonably new standard Dell desktop I get ~249 MB/s and ~410 MB/s
respectively.
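
The same kind of measurement can be made from inside Perl, which is closer to what a BioPerl script would actually do. The sketch below is only an approximation of the dd test above: it assumes the core Digest::MD5, Digest::SHA and Time::HiRes modules, the helper name digest_rate is made up for illustration, and the 64 MB default is an arbitrary size chosen to keep the run short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use Digest::SHA;
use Time::HiRes qw(time);

# Hypothetical helper: feed $mb megabytes of zero bytes to a digest
# object in 1 MB chunks and return (hex digest, MB per second).
sub digest_rate {
    my ($ctx, $mb) = @_;
    my $chunk = "\0" x (1024 * 1024);
    my $start = time();
    $ctx->add($chunk) for 1 .. $mb;
    my $hex     = $ctx->hexdigest;
    my $elapsed = time() - $start;
    my $rate    = $elapsed > 0 ? $mb / $elapsed : 0;
    return ($hex, $rate);
}

my $mb = 64;    # arbitrary; raise it for a steadier estimate
for my $test ( [ md5 => Digest::MD5->new ], [ sha1 => Digest::SHA->new(1) ] ) {
    my ($name, $ctx) = @$test;
    my ($hex, $rate) = digest_rate($ctx, $mb);
    printf "%-4s %s  %.0f MB/s\n", $name, $hex, $rate;
}
```

With $mb set to 1024 the input is the same 1 GiB of zero bytes as in the dd commands, so the digests should match the ones printed by sha1sum and md5sum above; the throughput figures will of course depend on the machine.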
Best regards,

Adam

-- 
Adam Sjøgren
adsj at novozymes.com

From lembark at wrkhors.com Wed Feb 16 10:12:18 2011
From: lembark at wrkhors.com (Steven Lembark)
Date: Wed, 16 Feb 2011 09:12:18 -0600
Subject: [Bioperl-l] YAPC::NA 2011 is going to be at a resort this year...
Message-ID: <20110216091218.36a10773.lembark_wrkhors.com@wrkhors.com>

Comfortable place to meet, and much cheaper than
any of the Bioinformatics conferences I know about.

It'd be great to get a track on bioinformatics
there. The point is getting talks that are useful
to people who use Perl and BioPerl -- not just
about Perl or Bio::* internals themselves.

Useful topics include things like how-to talks
about getting work done with Perl, BioPerl, Bio::*,
or even integrating Perl with R.

The website, including submissions page, are at:

-- 
Steven Lembark                          3646 Flora Pl
Workhorse Computing                     St Louis, MO 63110
lembark at wrkhors.com                  +1 888 359 3508

From shalabh.sharma7 at gmail.com Wed Feb 16 10:38:16 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 16 Feb 2011 10:38:16 -0500
Subject: [Bioperl-l] Trimming low quality reads
Message-ID:

Hi,

Is there any BioPerl module available for quality trimming of data in
fasta-qual format? I am a little worried about efficiency, as I have a huge
amount of data (~50 GB). I would also really appreciate any other suggestions.

Thanks
Shalabh

From jordi.durban at gmail.com Wed Feb 16 11:36:06 2011
From: jordi.durban at gmail.com (Jordi Durban)
Date: Wed, 16 Feb 2011 17:36:06 +0100
Subject: [Bioperl-l] Trimming low quality reads
In-Reply-To:
References:
Message-ID:

Well, there's a program called Seqtrim that uses BioPerl to trim the
sequences.
More information here:
http://www.scbi.uma.es/cgi-bin/seqtrim/seqtrim_login.cgi
Hope this helps.

2011/2/16 shalabh sharma

> Hi,
> Is there any bioperl module available to quality trim in fasta-qual
> format.
> i am little worried about the efficiency as i have huge data (~ 50 gb).
> Also i would really appreciate if some one has some other suggestions. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jordi From jason.stajich at gmail.com Wed Feb 16 11:41:40 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Wed, 16 Feb 2011 08:41:40 -0800 Subject: [Bioperl-l] Trimming low quality reads In-Reply-To: References: Message-ID: <4D5BFE44.80604@gmail.com> I would use a faster implementation like the fastx toolkit - http://hannonlab.cshl.edu/fastx_toolkit/ There are lots of answers to NGS questions on seqanswers too http://www.google.com/search?q=site:seqanswers.com+trim Jordi Durban wrote: > Well, there's a program called Seqtrim that uses bioperl to trim the > sequences. > Here more information: > http://www.scbi.uma.es/cgi-bin/seqtrim/seqtrim_login.cgi > Hope this helps. > > 2011/2/16 shalabh sharma > >> Hi, >> Is there any bioperl module available to quality trim in fasta-qual >> format. >> i am little worried about the efficiency as i have huge data (~ 50 gb). >> Also i would really appreciate if some one has some other suggestions. >> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- Jason Stajich From cjfields at illinois.edu Wed Feb 16 11:59:30 2011 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 16 Feb 2011 10:59:30 -0600 Subject: [Bioperl-l] Trimming low quality reads In-Reply-To: <4D5BFE44.80604@gmail.com> References: <4D5BFE44.80604@gmail.com> Message-ID: <796DBEB4-6788-4AF5-AE81-A17B10E95388@illinois.edu> +1 on using fastx. I believe this is what our local seq pipeline uses prior to us sending out the processed stuff. 
chris On Feb 16, 2011, at 10:41 AM, Jason Stajich wrote: > I would use a faster implementation like the fastx toolkit - http://hannonlab.cshl.edu/fastx_toolkit/ > > There are lots of answers to NGS questions on seqanswers too > http://www.google.com/search?q=site:seqanswers.com+trim > > > Jordi Durban wrote: >> Well, there's a program called Seqtrim that uses bioperl to trim the >> sequences. >> Here more information: >> http://www.scbi.uma.es/cgi-bin/seqtrim/seqtrim_login.cgi >> Hope this helps. >> >> 2011/2/16 shalabh sharma >> >>> Hi, >>> Is there any bioperl module available to quality trim in fasta-qual >>> format. >>> i am little worried about the efficiency as i have huge data (~ 50 gb). >>> Also i would really appreciate if some one has some other suggestions. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> > > -- > Jason Stajich > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jovel_juan at hotmail.com Wed Feb 16 12:01:21 2011 From: jovel_juan at hotmail.com (Juan Jovel) Date: Wed, 16 Feb 2011 17:01:21 +0000 Subject: [Bioperl-l] Trimming low quality reads In-Reply-To: References: Message-ID: Hello Shalabh, http://hannonlab.cshl.edu/fastx_toolkit/ is pretty good. Do not re-invent the wheel if not necessary. The Hannon's lab has done a lot of work on deep sequencing analyses which led to many fundamental discoveries.....this is a reason to trust their processing algorithms. 
Best,

JUAN

From bbimber at gmail.com Wed Feb 16 12:10:34 2011
From: bbimber at gmail.com (Ben Bimber)
Date: Wed, 16 Feb 2011 11:10:34 -0600
Subject: [Bioperl-l] Trimming low quality reads
In-Reply-To: <796DBEB4-6788-4AF5-AE81-A17B10E95388@illinois.edu>
References: <4D5BFE44.80604@gmail.com> <796DBEB4-6788-4AF5-AE81-A17B10E95388@illinois.edu>
Message-ID:

Fastx is great and we use those tools in a number of places, but if
I'm not mistaken, doesn't its trimming involve filters that either
include or exclude the read as a whole, rather than end clipping?

The need probably depends on your data. With short reads, I could
imagine that's what you want. With 500 bp 454 reads, end clipping is
nice. I ended up making a simple little (and not terribly efficient)
script that does 3' end clipping. My datasets are orders of magnitude
smaller than what you posted, though.

-Ben

On Wed, Feb 16, 2011 at 10:59 AM, Chris Fields wrote:
> +1 on using fastx. I believe this is what our local seq pipeline uses prior to us sending out the processed stuff.
>
> chris
>
> On Feb 16, 2011, at 10:41 AM, Jason Stajich wrote:
>
>> I would use a faster implementation like the fastx toolkit - http://hannonlab.cshl.edu/fastx_toolkit/
>>
>> There are lots of answers to NGS questions on seqanswers too
>> http://www.google.com/search?q=site:seqanswers.com+trim
>>
>> Jordi Durban wrote:
>>> Well, there's a program called Seqtrim that uses bioperl to trim the
>>> sequences.
>>> Here more information:
>>> http://www.scbi.uma.es/cgi-bin/seqtrim/seqtrim_login.cgi
>>> Hope this helps.
>>>
>>> 2011/2/16 shalabh sharma
>>>
>>>> Hi,
>>>> Is there any bioperl module available to quality trim in fasta-qual
>>>> format.
>>>> i am little worried about the efficiency as i have huge data (~ 50 gb).
>>>> Also i would really appreciate if some one has some other suggestions.
From shalabh.sharma7 at gmail.com Wed Feb 16 12:28:03 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 16 Feb 2011 12:28:03 -0500
Subject: [Bioperl-l] Trimming low quality reads
In-Reply-To:
References: <4D5BFE44.80604@gmail.com> <796DBEB4-6788-4AF5-AE81-A17B10E95388@illinois.edu>
Message-ID:

Hi,

Thanks all for the valuable suggestions. Actually, I was looking at the FASTX
toolkit, but at a glance I saw that the quality filter is only implemented
for FASTQ files and not for FASTA files. But I will take a look again.

Thanks
Shalabh

On Wed, Feb 16, 2011 at 12:10 PM, Ben Bimber wrote:

> Fastx is great and we use those tools in a number of places, but if
> I'm not mistaken, doesn't its trimming involve filters that either
> include or exclude the read as a whole, rather than end clipping?
>
> The need probably depends on your data. With short reads, I could
> imagine that's what you want. With 500 bp 454 reads, end clipping is
> nice. I ended up making a simple little (and not terribly efficient)
> script that does 3' end clipping. My datasets are orders of magnitude
> smaller than what you posted, though.
>
> -Ben
>
> On Wed, Feb 16, 2011 at 10:59 AM, Chris Fields
> wrote:
> > +1 on using fastx. I believe this is what our local seq pipeline uses
> prior to us sending out the processed stuff.
> > > > chris > > > > On Feb 16, 2011, at 10:41 AM, Jason Stajich wrote: > > > >> I would use a faster implementation like the fastx toolkit - > http://hannonlab.cshl.edu/fastx_toolkit/ > >> > >> There are lots of answers to NGS questions on seqanswers too > >> http://www.google.com/search?q=site:seqanswers.com+trim > >> > >> > >> Jordi Durban wrote: > >>> Well, there's a program called Seqtrim that uses bioperl to trim the > >>> sequences. > >>> Here more information: > >>> http://www.scbi.uma.es/cgi-bin/seqtrim/seqtrim_login.cgi > >>> Hope this helps. > >>> > >>> 2011/2/16 shalabh sharma > >>> > >>>> Hi, > >>>> Is there any bioperl module available to quality trim in fasta-qual > >>>> format. > >>>> i am little worried about the efficiency as i have huge data (~ 50 > gb). > >>>> Also i would really appreciate if some one has some other suggestions. > >>>> > >>>> Thanks > >>>> Shalabh > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >>> > >>> > >> > >> -- > >> Jason Stajich > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at illinois.edu Wed Feb 16 12:36:01 2011 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 16 Feb 2011 11:36:01 -0600 Subject: [Bioperl-l] Trimming low quality reads In-Reply-To: References: <4D5BFE44.80604@gmail.com> <796DBEB4-6788-4AF5-AE81-A17B10E95388@illinois.edu> Message-ID: Ah, you need qual filtering tied to a specific fasta file. 
As most tools like fastx expect FASTQ input, it might be advisable to convert to FASTQ (which Bio::SeqIO::fastq does if I'm not mistaken). chris On Feb 16, 2011, at 11:28 AM, shalabh sharma wrote: > Hi , > Thanks all for valuable suggestions, actually i was looking at FASTX tool kit but at i glance i saw that the quality filter is only implemented for FASTQ files and not for fasta files. > But i will take a look again. > > Thanks > Shalabh > > > On Wed, Feb 16, 2011 at 12:10 PM, Ben Bimber wrote: > Fastx is great and we use those tools in a number of places, but If > I'm not mistaken, doesnt its trimming involve filters their either > including or excluding the read as a whole, rather than end clipping? > > The need probably depends on your data. With short reads, I could > imagine that's what you want. With 500bp 454 reads, end clipping is > nice. I ended up making a simple little (and not terribly efficient) > script that does 3' end clipping. My datasets are orders of magnitude > smaller than what you posted though.... > > -Ben > > > > > On Wed, Feb 16, 2011 at 10:59 AM, Chris Fields wrote: > > +1 on using fastx. I believe this is what our local seq pipeline uses prior to us sending out the processed stuff. > > > > chris > > > > On Feb 16, 2011, at 10:41 AM, Jason Stajich wrote: > > > >> I would use a faster implementation like the fastx toolkit - http://hannonlab.cshl.edu/fastx_toolkit/ > >> > >> There are lots of answers to NGS questions on seqanswers too > >> http://www.google.com/search?q=site:seqanswers.com+trim > >> > >> > >> Jordi Durban wrote: > >>> Well, there's a program called Seqtrim that uses bioperl to trim the > >>> sequences. > >>> Here more information: > >>> http://www.scbi.uma.es/cgi-bin/seqtrim/seqtrim_login.cgi > >>> Hope this helps. > >>> > >>> 2011/2/16 shalabh sharma > >>> > >>>> Hi, > >>>> Is there any bioperl module available to quality trim in fasta-qual > >>>> format. 
> >>>> i am little worried about the efficiency as i have huge data (~ 50 gb). > >>>> Also i would really appreciate if some one has some other suggestions. > >>>> > >>>> Thanks > >>>> Shalabh > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >>> > >>> > >> > >> -- > >> Jason Stajich > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at illinois.edu Wed Feb 16 12:49:34 2011 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 16 Feb 2011 11:49:34 -0600 Subject: [Bioperl-l] Trimming low quality reads In-Reply-To: References: <4D5BFE44.80604@gmail.com> <796DBEB4-6788-4AF5-AE81-A17B10E95388@illinois.edu> Message-ID: <1B27EE5A-86FE-4C74-B4DC-0899881B3A41@illinois.edu> Ben, I haven't used fastx directly, but from the docs I would guess fastx_clipper returns everything that isn't adaptor-only, has just N's, or is above a specified length. (e.g. I would assume running 'fastx_clipper -k -n -l 0 ' would return everything). Is that not the case? chris On Feb 16, 2011, at 11:10 AM, Ben Bimber wrote: > Fastx is great and we use those tools in a number of places, but If > I'm not mistaken, doesnt its trimming involve filters their either > including or excluding the read as a whole, rather than end clipping? > > The need probably depends on your data. With short reads, I could > imagine that's what you want. With 500bp 454 reads, end clipping is > nice. I ended up making a simple little (and not terribly efficient) > script that does 3' end clipping. 
From shalabh.sharma7 at gmail.com Wed Feb 16 12:50:39 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 16 Feb 2011 12:50:39 -0500
Subject: [Bioperl-l] Trimming low quality reads
In-Reply-To:
References: <4D5BFE44.80604@gmail.com> <796DBEB4-6788-4AF5-AE81-A17B10E95388@illinois.edu>
Message-ID:

@chris,

Thanks for the suggestions. As I mentioned earlier, I have a very large
dataset, so before starting anything I am trying to run a few tests to see
the efficiency of SeqTrim, so I can avoid converting FASTA to FASTQ.

Thanks
Shalabh

On Wed, Feb 16, 2011 at 12:36 PM, Chris Fields wrote:

> Ah, you need qual filtering tied to a specific fasta file. As most tools
> like fastx expect FASTQ input, it might be advisable to convert to FASTQ
> (which Bio::SeqIO::fastq does if I'm not mistaken).
>
> chris
>
> On Feb 16, 2011, at 11:28 AM, shalabh sharma wrote:
>
> > Hi ,
> > Thanks all for valuable suggestions, actually i was looking at
> FASTX tool kit but at i glance i saw that the quality filter is only
> implemented for FASTQ files and not for fasta files.
> > But i will take a look again.
From bbimber at gmail.com Wed Feb 16 13:15:24 2011
From: bbimber at gmail.com (Ben Bimber)
Date: Wed, 16 Feb 2011 12:15:24 -0600
Subject: [Bioperl-l] Trimming low quality reads
In-Reply-To: <1B27EE5A-86FE-4C74-B4DC-0899881B3A41@illinois.edu>
References: <4D5BFE44.80604@gmail.com> <796DBEB4-6788-4AF5-AE81-A17B10E95388@illinois.edu> <1B27EE5A-86FE-4C74-B4DC-0899881B3A41@illinois.edu>
Message-ID:

Hi Chris,

You're right about fastx_clipper, but I think the tool I'm remembering is
called quality filter or something. It does things like filter reads
containing Ns, reads with total quality below a threshold, etc. As the name
suggests, it removes low-quality reads, but does not appear to trim per se.

-ben

On Wed, Feb 16, 2011 at 11:49 AM, Chris Fields wrote:
> Ben,
>
> I haven't used fastx directly, but from the docs I would guess fastx_clipper returns everything that isn't adaptor-only, has just N's, or is above a specified length. (e.g. I would assume running 'fastx_clipper -k -n -l 0 ' would return everything). Is that not the case?
> > chris > > On Feb 16, 2011, at 11:10 AM, Ben Bimber wrote: > >> Fastx is great and we use those tools in a number of places, but If >> I'm not mistaken, doesnt its trimming involve filters their either >> including or excluding the read as a whole, rather than end clipping? >> >> The need probably depends on your data. ?With short reads, I could >> imagine that's what you want. ?With 500bp 454 reads, end clipping is >> nice. ?I ended up making a simple little (and not terribly efficient) >> script that does 3' end clipping. ?My datasets are orders of magnitude >> smaller than what you posted though.... >> >> -Ben >> >> >> >> >> On Wed, Feb 16, 2011 at 10:59 AM, Chris Fields wrote: >>> +1 on using fastx. ?I believe this is what our local seq pipeline uses prior to us sending out the processed stuff. >>> >>> chris >>> >>> On Feb 16, 2011, at 10:41 AM, Jason Stajich wrote: >>> >>>> I would use a faster implementation like the fastx toolkit - http://hannonlab.cshl.edu/fastx_toolkit/ >>>> >>>> There are lots of answers to NGS questions on seqanswers too >>>> http://www.google.com/search?q=site:seqanswers.com+trim >>>> >>>> >>>> Jordi Durban wrote: >>>>> Well, there's a program called Seqtrim that uses bioperl to trim the >>>>> sequences. >>>>> Here more information: >>>>> http://www.scbi.uma.es/cgi-bin/seqtrim/seqtrim_login.cgi >>>>> Hope this helps. >>>>> >>>>> 2011/2/16 shalabh sharma >>>>> >>>>>> Hi, >>>>>> ? ?Is there any bioperl module available to quality trim in fasta-qual >>>>>> ?format. >>>>>> i am little worried about the efficiency as i have huge data (~ 50 gb). >>>>>> Also i would really appreciate if some one has some other suggestions. 
>>>>>> >>>>>> Thanks >>>>>> Shalabh >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> >>>>> >>>>> >>>> >>>> -- >>>> Jason Stajich >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jonathan at leto.net Wed Feb 16 17:02:39 2011 From: jonathan at leto.net (Jonathan "Duke" Leto) Date: Wed, 16 Feb 2011 14:02:39 -0800 Subject: [Bioperl-l] YAPC::NA 2011 is going to be at a resort this year... In-Reply-To: <20110216091218.36a10773.lembark_wrkhors.com@wrkhors.com> References: <20110216091218.36a10773.lembark_wrkhors.com@wrkhors.com> Message-ID: Howdy, I plan on attending this conf and giving some talks relating to Perl + bioinformatics. Perhaps we should organize some kind of hackathon? Duke On Wed, Feb 16, 2011 at 7:12 AM, Steven Lembark wrote: > > Comfortable place to meet, and much cheaper than > any of the Bioinformatics conferences I know about. > > It'd be great to get a track on bioinfrmatics > there. The point is getting talks that are useful > to people who use Perl and BioPerl -- not just > about Perl or Bio::* internals themselves. > > Useful topics include things like how-to talks > about getting work done with Perl, BioPerl, Bio::*, > or even integrating Perl with R. > > The website, including submissions page, are at: > > ? ? > > -- > Steven Lembark ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
3646 Flora Pl > Workhorse Computing ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? St Louis, MO 63110 > lembark at wrkhors.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?+1 888 359 3508 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jonathan "Duke" Leto jonathan at leto.net http://leto.net From jw12 at sanger.ac.uk Thu Feb 17 09:30:08 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Thu, 17 Feb 2011 14:30:08 +0000 Subject: [Bioperl-l] DAS Workshop Registration Closing Soon Message-ID: <0BCCE860-9AEA-4377-A9D6-F28E264DE43A@sanger.ac.uk> Registration closes for the DAS workshop at 5pm this Friday GMT. Limited places still available. Please note that for the tutorials day (Day 1) it is advisable to know at least one of PERL, Java or Javascript. Further information and registration from here: http://www.ebi.ac.uk/training/onsite/110302DAS.html There are still a few places for short talks on the second day if you have anything to talk about of interest to the DAS community. Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From bpcwhite at gmail.com Fri Feb 18 09:28:28 2011 From: bpcwhite at gmail.com (Bryan White) Date: Fri, 18 Feb 2011 06:28:28 -0800 (PST) Subject: [Bioperl-l] Append a whole tree to a node Message-ID: Hello, I was wondering if there is built in functionality to append a whole subtree to a node? For instance, I want to replace the Primate node on an order-level mammal tree with a Primate species-level tree. 
It seems that if there isn't already a function for it, iterating
through add_descendant from Bio::Tree::NodeI would be my best bet.

Thanks,
Bryan

From laurent.frantz at wur.nl Fri Feb 18 12:51:26 2011
From: laurent.frantz at wur.nl (Frantz, Laurent)
Date: Fri, 18 Feb 2011 18:51:26 +0100
Subject: [Bioperl-l] PAML Codeml version supported
Message-ID:

Dear Bioperl Gurus,

I have a large set of genes that I need to check separately for dS/dN.
I am trying to use the Bioperl wrapper for PAML, but I cannot get it to
work. I have read that the output of newer versions of PAML does not fit
the wrapper. I have tried 4.4, 4.1 and then 3.15; none of those work.

Here is my error message (same with all those versions):

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output did not see seqtype
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
STACK: Bio::Tools::Phylo::PAML::_parse_summary /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:461
STACK: Bio::Tools::Phylo::PAML::next_result /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:270
STACK: paml_gene_parse.pl:26
----------------------------------------------------------------

Could anyone tell me which of the PAML Codeml versions gives an output
that can be parsed by Bioperl? I am surprised that it does not work with
version 3.15, as it is supposed to (http://www.bioperl.org/wiki/PAML).

I am also wondering if my syntax is correct.
@files = @ARGV;

foreach $files (@files) {
    my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        ( -params => { 'runmode' => -2,
                       'seqtype' => 1,
                     } );
    $kaks_factory->alignment($files);
    my ($rc,$parser) = $kaks_factory->run();
    my $result = $parser->next_result;
    my $MLmatrix = $result->get_MLmatrix();
    print "$MLmatrix\n";
}

The files that I input are already formatted for Codeml; they work
perfectly fine when I feed them to Codeml myself through the config
file.

I hope someone can help me.

Thank you,

Laurent Frantz
PhD Student - Comparative Genomics
Wageningen University
laurent.frantz at wur.nl

From sharmashalu.bio at gmail.com Fri Feb 18 16:13:34 2011
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Fri, 18 Feb 2011 16:13:34 -0500
Subject: [Bioperl-l] reading Quality files
Message-ID:

Hi,

Can i read quality files the same way i read fasta files (by using
Bio::SeqIO)? I tried reading with -format => 'fasta', but in the output
all the spaces between the quality scores are deleted.

ex:
>1508:1:1:2228:1817_55bp_47.8_0.74
3939383938393935393939393936383735393533383338382735233838333835381135353838383836173536362735383838353835353638353636293835383238293838403939314040357241596965596364474950225467646314607056636769717175707161732538333520353735382029254040294037392036393229253535371940403537353540402940423137423737223540222227202220

And when i tried to read it with -format => 'qual' i got an error
message:

Can't locate object method "seq" via package "Bio::Seq::PrimaryQual" at readQual.pl line 6, line 1

Do i have to call different methods when using quality files?

I will really appreciate your help.
Thanks
Shalu

From florent.angly at gmail.com Fri Feb 18 20:02:17 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Sat, 19 Feb 2011 11:02:17 +1000
Subject: [Bioperl-l] reading Quality files
In-Reply-To:
References:
Message-ID: <4D5F1699.7010409@gmail.com>

On 19/02/11 07:13, shalu sharma wrote:
> Hi ,
> Can i read quality files just as the same way i read fasta files (by
> using Bio::SeqIO) ?
> I tried reading with using -format => 'fasta' but as the output all the
> spaces between the quality scores are deleted.
> ex:
>> 1508:1:1:2228:1817_55bp_47.8_0.74
> 3939383938393935393939393936383735393533383338382735233838333835381135353838383836173536362735383838353835353638353636293835383238293838403939314040357241596965596364474950225467646314607056636769717175707161732538333520353735382029254040294037392036393229253535371940403537353540402940423137423737223540222227202220
>
Yes, that is not correct. There is a specific IO module for quality
scores: Bio::SeqIO::qual

> and when i tried to read it with -format => 'qual'
That is good: see
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/SeqIO/qual.pm
Once you have initialized a Bio::SeqIO object, you can call next_seq()
to get the next Bio::Seq::Quality object.

> i got error message.
> Can't locate object method "seq" via package "Bio::Seq::PrimaryQual" at
> readQual.pl line 6, line 1
Once you have a Bio::Seq::Quality object, you can use the qual() method
on it. See
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Seq/Quality.pm

> Do i have to call different methods when using quality files?
>
> I will really appreciate your help.
>
> Thanks
> Shalu

From mm809 at cam.ac.uk Sat Feb 19 09:53:27 2011
From: mm809 at cam.ac.uk (Mingwei Min)
Date: Sat, 19 Feb 2011 14:53:27 +0000
Subject: [Bioperl-l] question about positioning peptide in a full protein sequence
Message-ID:

Hi,

I am trying to position some post-translational modification sites,
which are marked in peptides, in a full-length protein sequence. Would
anyone be kind enough to tell me which module I could use for this?

Many thanks

Mingwei

From David.Messina at sbc.su.se Sat Feb 19 10:25:17 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 19 Feb 2011 16:25:17 +0100
Subject: [Bioperl-l] PAML Codeml version supported
In-Reply-To:
References:
Message-ID:

Hi Laurent,

The last version of PAML/CODEML that I personally verified to work with
BioPerl was 4.3b. Unfortunately, PAML's author has a bad habit of
constantly changing his programs' output format (and constantly
releasing new versions), so it's difficult for us to keep up.

Are you using a relatively recent version of bioperl-live downloaded
from github? If so, would you please file this as a bug? Please include
with that bug a codeml output file from version 4.4 (assuming that's the
version you want to use).

http://bugzilla.open-bio.org/

I haven't tested it, but I think your code is fine; the error you posted
is in the parsing code, not in the run-PAML code.

Incidentally, I'm surprised 3.15 doesn't work because we have tests
specifically for that version that pass AFAIK. But of course tests don't
cover everything.

Dave

On Fri, Feb 18, 2011 at 18:51, Frantz, Laurent wrote:
> Dear Bioperl Gurus,
>
> I have a large set of genes that I need to check separately for dS/dN. I am
> trying to use the Bioperl wrapper for PAML, however I can not get it to
> work.
> I have read that there are problems with the output of newer version of
> PAML that does not fit to the wrapper. I have tried 4.4, 4.1 and then 3.15
> none of those work.
>
> Here is my error message (same with all those versions):
>
> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
> MSG: Unknown format of PAML output did not see seqtype
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
> STACK: Bio::Tools::Phylo::PAML::_parse_summary
> /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:461
> STACK: Bio::Tools::Phylo::PAML::next_result
> /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:270
> STACK: paml_gene_parse.pl:26
> ----------------------------------------------------------------
>
> Could anyone tell me which of the PAML - Codeml versions gives an output
> that can be parsed by Bioperl?
> I am surprised that it does not work with the version 3.15 as it is suppose
> to (http://www.bioperl.org/wiki/PAML).
>
> I am also wondering if my syntax is correct.
>
> @files = @ARGV;
>
> foreach $files (@files) {
>     my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
>         ( -params => { 'runmode' => -2,
>                        'seqtype' => 1,
>                      } );
>     $kaks_factory->alignment($files);
>     my ($rc,$parser) = $kaks_factory->run();
>     my $result = $parser->next_result;
>     my $MLmatrix = $result->get_MLmatrix();
>     print "$MLmatrix\n";
> }
>
> The files that I input are already formated for Codeml, they work perfectly
> fine when I feed them myself to Codeml through the config file.
>
> I hope someone can help me..
>
> Thank you,
>
> Laurent Frantz
> PhD Student - Comparative Genomics
> Wageningen University
> laurent.frantz at wur.nl

From David.Messina at sbc.su.se Sun Feb 20 16:18:50 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 20 Feb 2011 22:18:50 +0100
Subject: [Bioperl-l] question about positioning peptide in a full protein sequence
In-Reply-To:
References:
Message-ID:

Hi Mingwei,

I'm not sure what you mean by "positioning" here. Do you want to get the
coordinates of the post-translational sites out of a protein sequence
database record? Or do you want to draw the post-translational sites on
a picture of the protein sequence? Or something else entirely?

Dave

On Sat, Feb 19, 2011 at 15:53, Mingwei Min wrote:
> Hi,
>
> I am trying to positioning some post-tranlational modification sites,
> which is marked in peptides, in a full length protein sequence. Anyone
> would be kind to tell me the model I could use for this?
>
> Many thanks
>
> Mingwei

From mm809 at cam.ac.uk Sun Feb 20 16:41:36 2011
From: mm809 at cam.ac.uk (Mingwei Min)
Date: Sun, 20 Feb 2011 21:41:36 +0000
Subject: [Bioperl-l] question about positioning peptide in a full protein sequence
In-Reply-To:
References:
Message-ID:

Hi Dave,

Sorry for not making it clear. Yes, I just want to get the coordinates
of the post-translational sites out of a protein sequence. What I have
now is the peptide sequence with a marker on the post-translationally
modified residue. What should I do to map these to the whole protein
sequence and get the coordinates? The only way I could come up with is
BLAST, but it seems to be too complex for this simple job....
Many thanks,

Mingwei

2011/2/20 Dave Messina :
> Hi Mingwei,
> I'm not sure what you mean by "positioning" here. Do you want to get the
> coordinates of the post-translational sites out of a protein sequence
> database record? Or do you want to draw the post-translational sites on a
> picture of the protein sequence? Or something else entirely?
>
> Dave

From David.Messina at sbc.su.se Sun Feb 20 17:00:33 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 20 Feb 2011 23:00:33 +0100
Subject: [Bioperl-l] question about positioning peptide in a full protein sequence
In-Reply-To:
References:
Message-ID:

Hi Mingwei,

Please remember to "reply all" so others on the mailing list can follow
the conversation.

Unless you have some other way of mapping the coordinates of the
sequence with the post-translational sites to the coordinates of the
full sequence, I think you'll have to do a similarity search of some
form.

BLAST may not be best for this, given that it's sloppy with the ends of
an alignment, but there are plenty of options for BLAST that may improve
your results. Again, you'll need to be specific about your problem for
us to help. I don't know what "too complex for this simple job" means.
Is it too slow? Are you getting too many hits?

Dave

On Sun, Feb 20, 2011 at 22:35, Mingwei Min wrote:
> Hi Dave,
>
> Sorry for not making it clear. Yes, I just want to get the coordinates
> of the post-translational sites out of a protein sequence. And what I
> have now is the peptide sequence with marker on the post-translated
> residue... what should i do to map them to the whole protein sequence
> and get the coordinates? The only way I could come up with is blast.
> But it seems to be too complex for this simple job....
>
> Many thanks,
>
> Mingwei

From mm809 at cam.ac.uk Sun Feb 20 17:28:31 2011
From: mm809 at cam.ac.uk (Mingwei Min)
Date: Sun, 20 Feb 2011 22:28:31 +0000
Subject: [Bioperl-l] question about positioning peptide in a full protein sequence
In-Reply-To:
References:
Message-ID:

Hi Dave,

Thank you for your suggestion. When I said "too complex for this simple
job", I just thought that there might be some particular module that
could do this straightforwardly. I'll give BLAST a try anyway.
Thank you.

Mingwei

2011/2/20 Dave Messina :
> Hi Mingwei,
> Please remember to "reply all" so others on the mailing list can follow the
> conversation.
> Unless you have some other way of mapping the coordinates of the
> sequence with the post-translational sites to the coordinates of the full
> sequence, I think you'll have to do a similarity search of some form.
> BLAST may not be best for this, given that it's sloppy with the ends of an
> alignment, but there are plenty of options for BLAST that may improve your
> results. Again, you'll need to be specific about your problem for us to
> help. I don't know what "too complex for this simple job" means. Is it too
> slow? Are you getting too many hits?
>
> Dave

--
Mingwei Min  PhD student
University of Cambridge
Department of Genetics
Downing St
CB2 3EH
UK

From cjfields at illinois.edu Sun Feb 20 21:57:44 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 20 Feb 2011 20:57:44 -0600
Subject: [Bioperl-l] question about positioning peptide in a full protein sequence
In-Reply-To:
References:
Message-ID: <43195C0D-159E-4658-B8ED-6F6E76F37F61@illinois.edu>

If this is a direct string match (no ambiguity), just use perl's index
function:

    index STR,SUBSTR,POSITION
    index STR,SUBSTR
        The index function searches for one string within another, but
        without the wildcard-like behavior of a full regular-expression
        pattern match. It returns the position of the first occurrence
        of SUBSTR in STR at or after POSITION. If POSITION is omitted,
        starts searching from the beginning of the string. POSITION
        before the beginning of the string or after its end is treated
        as if it were the beginning or the end, respectively. POSITION
        and the return value are based at 0 (or whatever you've set the
        $[ variable to--but don't do that). If the substring is not
        found, "index" returns one less than the base, ordinarily "-1".

Also see here:

http://perlmeme.org/howtos/perlfunc/index_function.html

chris

On Feb 20, 2011, at 4:28 PM, Mingwei Min wrote:
> Hi Dave,
>
> Thank you for your suggestion. when I said "too comple for this simple
> job", I just thought that there might be some particular module that
> could do this straightforwardly. I'll have a try of BLAST anyway.
> Thank you.
>
> Mingwei

From fs5 at sanger.ac.uk Mon Feb 21 04:26:21 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 21 Feb 2011 09:26:21 +0000
Subject: [Bioperl-l] question about positioning peptide in a full protein sequence
In-Reply-To: <43195C0D-159E-4658-B8ED-6F6E76F37F61@illinois.edu>
References: <43195C0D-159E-4658-B8ED-6F6E76F37F61@illinois.edu>
Message-ID: <1298280381.4541.380.camel@deskpro15336.internal.sanger.ac.uk>

Hi Mingwei,

I guess this is MS data for phosphorylation sites? We are doing the same
here. I don't know what software you are using in your MS pipeline, but
it may already map the peptides to the full-length protein for you. If
not, you probably get peptide sequences with the probabilities of a site
carrying a phosphate (or whatever post-translational modification)
encoded in the string, e.g. the data I'm working with will show me
something like "..LKS[0.99]S[0.01]..." to indicate probabilities of 99%
and 1% of those two serines being modified. You then have to extract
that data from the peptide string using a regex. Then you can identify
the most probable site within the string and map the peptide string to
the full-length protein sequence using index (or a regex) as Chris
suggested. You can then calculate the position of the actual modified
site from the match position of the peptide and the position of the site
within the peptide.
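
A minimal sketch of those steps, assuming the annotation style from the
example above (a residue letter followed by its probability in square
brackets) and an exact match of the peptide in the protein. The sub
names and the test sequence are made up for illustration and are not
part of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper: split an annotated peptide such as
# "LKS[0.99]S[0.01]R" into the plain sequence and a list of
# [offset, probability] pairs (offset is 0-based within the peptide).
# The "residue followed by [probability]" layout is an assumption;
# adjust the regex to whatever your own pipeline emits.
sub parse_annotated_peptide {
    my ($annotated) = @_;
    my $plain = '';
    my @sites;
    while ($annotated =~ /\G([A-Za-z])(?:\[([0-9.]+)\])?/gc) {
        push @sites, [ length($plain), $2 ] if defined $2;
        $plain .= $1;
    }
    return ($plain, \@sites);
}

# Hypothetical helper: return the 1-based protein coordinate of the most
# probable modified residue, or undef if the peptide carries no
# annotation or does not occur exactly in the protein.
sub map_site_to_protein {
    my ($protein, $annotated) = @_;
    my ($plain, $sites) = parse_annotated_peptide($annotated);
    return unless @$sites;
    my $start = index($protein, $plain);    # 0-based, -1 if not found
    return if $start < 0;
    # Sort sites by probability, highest first, and take the top one.
    my ($best) = sort { $b->[1] <=> $a->[1] } @$sites;
    return $start + $best->[0] + 1;
}

my $protein = 'MAAGLKSSRTE';    # made-up sequence for illustration
my $pos = map_site_to_protein($protein, 'LKS[0.99]S[0.01]R');
print defined $pos
    ? "Most probable site at residue $pos\n"
    : "Peptide not found\n";
```

Note that index only reports the first occurrence, so a peptide that
occurs more than once in the protein would need a loop that calls index
again with a start position.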
I don't think there is any ready-made solution for this, as it is
basically just simple string matching, but please do let me know if you
get stuck and I can help you further.

Cheers,

Frank

On Sun, 2011-02-20 at 20:57 -0600, Chris Fields wrote:
> If this is a direct string match (no ambiguity), just use perl's index function:
>
>     index STR,SUBSTR,POSITION
>     index STR,SUBSTR
>         The index function searches for one string within another, but
>         without the wildcard-like behavior of a full regular-expression
>         pattern match. It returns the position of the first occurrence
>         of SUBSTR in STR at or after POSITION. If POSITION is omitted,
>         starts searching from the beginning of the string. POSITION
>         before the beginning of the string or after its end is treated
>         as if it were the beginning or the end, respectively. POSITION
>         and the return value are based at 0 (or whatever you've set the
>         $[ variable to--but don't do that). If the substring is not
>         found, "index" returns one less than the base, ordinarily "-1".
>
> Also see here:
>
> http://perlmeme.org/howtos/perlfunc/index_function.html
>
> chris

From shalabh.sharma7 at gmail.com Mon Feb 21 09:44:33 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 21 Feb 2011 09:44:33 -0500
Subject: [Bioperl-l] fasta to fastq
Message-ID:

Hi,

Is there any module that can convert fasta and qual files to FASTQ
format? I found a lot of programs that can do it the other way, but not
fasta to fastq.

I would really appreciate your help.

Thanks
Shalabh

From p.j.a.cock at googlemail.com Mon Feb 21 10:13:01 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 21 Feb 2011 15:13:01 +0000
Subject: [Bioperl-l] fasta to fastq
In-Reply-To:
References:
Message-ID:

On Mon, Feb 21, 2011 at 2:44 PM, shalabh sharma wrote:
> Hi,
>    Is there any module that can convert fasta and qual files to FASTQ
> format?
> I found lot of programs that can do it the other way but not fasta to fastq.
>
> I would really appreciate your help.
> > Thanks > Shalabh Hi, Yes, you can do FASTA+QUAL to FASTQ with BioPerl, see: http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ For comparison the Biopython Tutorial has a Python example (section "Converting FASTA and QUAL files into FASTQ files"): http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Peter From awitney at sgul.ac.uk Mon Feb 21 10:06:50 2011 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 21 Feb 2011 15:06:50 +0000 Subject: [Bioperl-l] fasta to fastq In-Reply-To: References: Message-ID: <69FFF792-E3D2-4BC1-B628-3B161434112B@sgul.ac.uk> I've never tried it, but this looks like it will do what you ask for... http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ adam On 21 Feb 2011, at 14:44, shalabh sharma wrote: > Hi, > Is there any module that can convert fasta and qual files to FASTQ > format? > I found a lot of programs that can do it the other way but not fasta to fastq. > > I would really appreciate your help. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Mon Feb 21 11:14:38 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 21 Feb 2011 11:14:38 -0500 Subject: [Bioperl-l] fasta to fastq In-Reply-To: References: Message-ID: Hi, @Adam, Peter. Thanks a lot, "Merging with Bioperl worked". Actually, I was looking for exactly something like this. Thanks Shalabh On Mon, Feb 21, 2011 at 10:13 AM, Peter Cock wrote: > On Mon, Feb 21, 2011 at 2:44 PM, shalabh sharma > wrote: > > Hi, > > Is there any module that can convert fasta and qual files to FASTQ > > format? > > I found a lot of programs that can do it the other way but not fasta to > fastq. > > > > I would really appreciate your help.
> > > > Thanks > > Shalabh > > Hi, > > Yes, you can do FASTA+QUAL to FASTQ with BioPerl, see: > > http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ > > For comparison the Biopython Tutorial has a Python example > (section "Converting FASTA and QUAL files into FASTQ files"): > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > > Peter > From mm809 at cam.ac.uk Mon Feb 21 15:23:20 2011 From: mm809 at cam.ac.uk (Mingwei Min) Date: Mon, 21 Feb 2011 20:23:20 +0000 Subject: [Bioperl-l] question about positioning peptide in a full protein sequence In-Reply-To: <1298280381.4541.380.camel@deskpro15336.internal.sanger.ac.uk> References: <43195C0D-159E-4658-B8ED-6F6E76F37F61@illinois.edu> <1298280381.4541.380.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: Hi Frank, Yes, this is MS data for phosphorylation sites, as well as ubiquitination sites. I am now dealing with some published data, some of which only shows the peptide (and yes, some with the modification probabilities) without coordinates in the sequence. The way Chris suggested is already very straightforward. I'll give it a try. Thank you very much for your help. Cheers, Mingwei 2011/2/21 Frank Schwach : > Hi Mingwei, > > I guess this is MS data for phosphorylation sites? We are doing the same > here. I don't know what software you are using in your MS pipeline but > it may already map the peptides to the full-length protein for you. If > not, you probably get peptide sequences with the probabilities of a site > carrying a phosphate (or whatever post-translational modification) > encoded in the string, e.g. the data I'm working with will show me > something like "..LKS[0.99]S[0.01]..." to indicate probabilities of 99% > and 1% of those two serines being modified. You then have to extract > that data from the peptide string using a regex.
Then you can identify > the most probable site within the string and map the peptide string to > the full-length protein sequence using index (or a regex) as Chris > suggested. You can then calculate the position of the actual modified > site from the match position of the peptide and the position of the site > within the peptide. I don't think there is any ready-made solution for > this as it is basically just simple string matching, but please do let me > know if you are getting stuck and I can help you further. > > Cheers, > > Frank > > > > On Sun, 2011-02-20 at 20:57 -0600, Chris Fields wrote: >> If this is a direct string match (no ambiguity), just use perl's index function: >> >> index STR,SUBSTR,POSITION >> index STR,SUBSTR >> The index function searches for one string within another, but >> without the wildcard-like behavior of a full regular-expression >> pattern match. It returns the position of the first occurrence >> of SUBSTR in STR at or after POSITION. If POSITION is omitted, >> starts searching from the beginning of the string. POSITION >> before the beginning of the string or after its end is treated >> as if it were the beginning or the end, respectively. POSITION >> and the return value are based at 0 (or whatever you've set the >> $[ variable to--but don't do that). If the substring is not >> found, "index" returns one less than the base, ordinarily "-1". >> >> Also see here: >> >> http://perlmeme.org/howtos/perlfunc/index_function.html >> >> chris >> >> On Feb 20, 2011, at 4:28 PM, Mingwei Min wrote: >> >> > Hi Dave, >> > >> > Thank you for your suggestion. When I said "too complex for this simple >> > job", I just thought that there might be some particular module that >> > could do this straightforwardly. I'll give BLAST a try anyway. >> > Thank you.
>> > >> > Mingwei >> > >> > 2011/2/20 Dave Messina : >> >> Hi Mingwei, >> >> Please remember to "reply all" so others on the mailing list can follow the >> >> conversation. >> >> Unless you have some other way of mapping the coordinates of the >> >> sequence with the post-translational sites to the coordinates of the full >> >> sequence, I think you'll have to do a similarity search of some form. >> >> BLAST may not be best for this, given that it's sloppy with the ends of an >> >> alignment, but there are plenty of options for BLAST that may improve your >> >> results. Again, you'll need to be specific about your problem for us to >> >> help. I don't know what "too complex for this simple job" means. Is it too slow? >> >> Are you getting too many hits? >> >> >> >> >> >> Dave >> >> >> >> >> >> On Sun, Feb 20, 2011 at 22:35, Mingwei Min wrote: >> >>> >> >>> Hi Dave, >> >>> >> >>> Sorry for not making it clear. Yes, I just want to get the coordinates >> >>> of the post-translational sites out of a protein sequence. And what I >> >>> have now is the peptide sequence with a marker on the post-translationally modified >> >>> residue... What should I do to map them to the whole protein sequence >> >>> and get the coordinates? The only way I could come up with is BLAST. >> >>> But it seems to be too complex for this simple job.... >> >>> >> >>> Many thanks, >> >>> >> >>> Mingwei >> >>> >> >>> 2011/2/20 Dave Messina : >> >>>> Hi Mingwei, >> >>>> I'm not sure what you mean by "positioning" here. Do you want to get the >> >>>> coordinates of the post-translational sites out of a protein sequence >> >>>> database record? Or do you want to draw the post-translational sites on a >> >>>> picture of the protein sequence? Or something else entirely?
>> >>>> >> >>>> Dave >> >>>> >> >>>> >> >>>> >> >>>> On Sat, Feb 19, 2011 at 15:53, Mingwei Min wrote: >> >>>>> >> >>>>> Hi, >> >>>>> >> >>>>> I am trying to position some post-translational modification sites, >> >>>>> which are marked in peptides, in a full-length protein sequence. Would anyone >> >>>>> be kind enough to tell me which module I could use for this? >> >>>>> >> >>>>> Many thanks >> >>>>> >> >>>>> Mingwei >> >>>>> _______________________________________________ >> >>>>> Bioperl-l mailing list >> >>>>> Bioperl-l at lists.open-bio.org >> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>>> >> >>>> >> >> >> >> >> > >> > >> > >> > -- >> > Mingwei Min PhD student >> > University of Cambridge >> > Department of Genetics >> > Downing St >> > CB2 3EH >> > UK >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. From snamln at unife.it Mon Feb 21 15:25:41 2011 From: snamln at unife.it (Maria elena Sana) Date: Mon, 21 Feb 2011 21:25:41 +0100 Subject: [Bioperl-l] bio::Db::sam bioperl version Message-ID: Hi All, I'm using a perl script with a BioPerl / Bio::DB::Sam object, but it runs differently on two servers on which there are two different perl versions (5.8.8 and 5.10.1 respectively). On the first machine, on which there is the 5.8.8 perl version, the script runs correctly; on the second it doesn't return errors but doesn't work correctly. Any explanation?
Specifically, I have a problem with the calculation of the query position in the pileup routine (Bio::DB::Bam::PILEUP). Thanks a lot -- Dott.ssa Maria Elena Sana Università di Ferrara Dipartimento di Morfologia ed Embriologia Via Fossato di Mortara, 70 (Viale Eliporto presso CUBO) 44121 - Ferrara Italy Phone 0532-455851 Mail mariaelena.sana at unife.it From mm809 at cam.ac.uk Mon Feb 21 15:11:59 2011 From: mm809 at cam.ac.uk (Mingwei Min) Date: Mon, 21 Feb 2011 20:11:59 +0000 Subject: [Bioperl-l] question about positioning peptide in a full protein sequence In-Reply-To: <43195C0D-159E-4658-B8ED-6F6E76F37F61@illinois.edu> References: <43195C0D-159E-4658-B8ED-6F6E76F37F61@illinois.edu> Message-ID: Many thanks, Chris, that's exactly what I need. Mingwei 2011/2/21 Chris Fields : > If this is a direct string match (no ambiguity), just use perl's index function: > > index STR,SUBSTR,POSITION > index STR,SUBSTR > The index function searches for one string within another, but > without the wildcard-like behavior of a full regular-expression > pattern match. It returns the position of the first occurrence > of SUBSTR in STR at or after POSITION. If POSITION is omitted, > starts searching from the beginning of the string. POSITION > before the beginning of the string or after its end is treated > as if it were the beginning or the end, respectively. POSITION > and the return value are based at 0 (or whatever you've set the > $[ variable to--but don't do that). If the substring is not > found, "index" returns one less than the base, ordinarily "-1". > > Also see here: > > http://perlmeme.org/howtos/perlfunc/index_function.html > > chris > > On Feb 20, 2011, at 4:28 PM, Mingwei Min wrote: > >> Hi Dave, >> >> Thank you for your suggestion.
When I said "too complex for this simple >> job", I just thought that there might be some particular module that >> could do this straightforwardly. I'll give BLAST a try anyway. >> Thank you. >> >> Mingwei >> >> 2011/2/20 Dave Messina : >>> Hi Mingwei, >>> Please remember to "reply all" so others on the mailing list can follow the >>> conversation. >>> Unless you have some other way of mapping the coordinates of the >>> sequence with the post-translational sites to the coordinates of the full >>> sequence, I think you'll have to do a similarity search of some form. >>> BLAST may not be best for this, given that it's sloppy with the ends of an >>> alignment, but there are plenty of options for BLAST that may improve your >>> results. Again, you'll need to be specific about your problem for us to >>> help. I don't know what "too complex for this simple job" means. Is it too slow? >>> Are you getting too many hits? >>> >>> >>> Dave >>> >>> >>> On Sun, Feb 20, 2011 at 22:35, Mingwei Min wrote: >>>> >>>> Hi Dave, >>>> >>>> Sorry for not making it clear. Yes, I just want to get the coordinates >>>> of the post-translational sites out of a protein sequence. And what I >>>> have now is the peptide sequence with a marker on the post-translationally modified >>>> residue... What should I do to map them to the whole protein sequence >>>> and get the coordinates? The only way I could come up with is BLAST. >>>> But it seems to be too complex for this simple job.... >>>> >>>> Many thanks, >>>> >>>> Mingwei >>>> >>>> 2011/2/20 Dave Messina : >>>>> Hi Mingwei, >>>>> I'm not sure what you mean by "positioning" here. Do you want to get the >>>>> coordinates of the post-translational sites out of a protein sequence >>>>> database record? Or do you want to draw the post-translational sites on a >>>>> picture of the protein sequence? Or something else entirely?
>>>>> >>>>> Dave >>>>> >>>>> >>>>> >>>>> On Sat, Feb 19, 2011 at 15:53, Mingwei Min wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am trying to position some post-translational modification sites, >>>>>> which are marked in peptides, in a full-length protein sequence. Would anyone >>>>>> be kind enough to tell me which module I could use for this? >>>>>> >>>>>> Many thanks >>>>>> >>>>>> Mingwei >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>> >>> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From scott at scottcain.net Mon Feb 21 15:30:39 2011 From: scott at scottcain.net (Scott Cain) Date: Mon, 21 Feb 2011 15:30:39 -0500 Subject: [Bioperl-l] bio::Db::sam bioperl version In-Reply-To: References: Message-ID: Hi Maria, I have no idea if the version of perl is an issue, but I think it is more likely that the versions of either Bio::DB::Sam or BioPerl are different. If you are running BioPerl from a git checkout, for example, it could easily vary between the two servers. In past experience with people having similar problems with GBrowse, that is usually what the problem is. Scott On Mon, Feb 21, 2011 at 3:25 PM, Maria elena Sana wrote: > Hi All, > I'm using a perl script with a BioPerl / Bio::DB::Sam object, but it runs > differently on two servers on which there are two different perl > versions (5.8.8 and 5.10.1 respectively). On the first machine, on which > there is the 5.8.8 perl version, the script runs correctly; on the > second it doesn't return errors but doesn't work correctly. Any > explanation? > Specifically, I have a problem with the calculation of the query position in the pileup > routine (Bio::DB::Bam::PILEUP). > Thanks a lot > > -- > Dott.ssa Maria Elena Sana > Università
di Ferrara > Dipartimento di Morfologia ed Embriologia > Via Fossato di Mortara, 70 > (Viale Eliporto presso CUBO) > 44121 - Ferrara > Italy Phone 0532-455851 > Mail mariaelena.sana at unife.it > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From comp_sea at yahoo.com Wed Feb 23 11:21:01 2011 From: comp_sea at yahoo.com (Syed Mustafa Hussain) Date: Wed, 23 Feb 2011 08:21:01 -0800 (PST) Subject: [Bioperl-l] Can't locate object method "attributes" via package "Bio::SeqFeature::Generic" at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 703, line 192. Message-ID: <700273.28443.qm@web33605.mail.mud.yahoo.com> > Hi, > > We recently updated BioPerl and Bio::Graphics and found that > some applications no longer work properly. As an example, a simple > graphics script like: > > ##################################################### > > use Bio::Graphics::Panel; > use Bio::SeqFeature::Generic; > > use CGI; # or any other CGI:: form handler/decoder > > print "Content-type: text/html\n\n"; > > my $panel = Bio::Graphics::Panel->new(-length => 700, > -width => 700 > ); > > my $track = $panel->add_track(-glyph => 'generic', > -label => 1 > ); > > my $feature = Bio::SeqFeature::Generic->new(-start => 1, > -end => 400 > ); > $track->add_feature($feature); > > print "TEST IMAGE:
"; > > open GRAPH, "> /srv/www/htdocs/tmp/test.png" or die "could not open image file"; > print GRAPH $panel->png; > close(GRAPH); > > print ""; > print ""; > > ##################################################### > > gives the following error when I debug: > > DB<1> n > Can't locate object method "attributes" via package "Bio::SeqFeature::Generic" at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 703. > at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 703 > Bio::Graphics::Glyph::bgcolor('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)') called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 1299 > Bio::Graphics::Glyph::filled_box('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 400, 7) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 1471 > Bio::Graphics::Glyph::draw_component('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph/generic.pm line 347 > Bio::Graphics::Glyph::generic::draw_component('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph.pm line 1050 > Bio::Graphics::Glyph::draw('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph/generic.pm line 338 > Bio::Graphics::Glyph::generic::draw('Bio::Graphics::Glyph::generic=HASH(0x1d62f90)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Glyph/track.pm line 35 > Bio::Graphics::Glyph::track::draw('Bio::Graphics::Glyph::track=HASH(0x1d5bb10)', 'GD::Image=SCALAR(0x2002460)', 0, 0, 0, 1) called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Panel.pm line 588
> Bio::Graphics::Panel::gd('Bio::Graphics::Panel=HASH(0x1b1f6b0)') called at /usr/lib/perl5/site_perl/5.8.8/Bio/Graphics/Panel.pm line 1067 > Bio::Graphics::Panel::png('Bio::Graphics::Panel=HASH(0x1b1f6b0)') called at test.cgi line 37 > Debugged program terminated. Use q to quit or R to restart, > use o inhibit_exit to avoid stopping after program termination, > h q, h R or h o to get additional info. > > ##################################################### > > Is it because of some incompatibility between the versions of > BioPerl and Bio::Graphics, or something else? > > Thanks, > Mustafa. > From chapmanb at 50mail.com Thu Feb 24 13:25:48 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 24 Feb 2011 13:25:48 -0500 Subject: [Bioperl-l] BOSC 2011 topic organizers and Codefest Message-ID: <20110224182548.GK20125@sobchak.mgh.harvard.edu> Hi all; This year the Bioinformatics Open Source Conference (BOSC) will be taking place in Vienna, Austria on July 15-16th. This is a yearly opportunity for open source bioinformatics developers to get together in person and discuss on-going projects. Nomi Harris, Peter Rice and the other organizing committee members are already hard at work planning for the conference: http://www.open-bio.org/wiki/BOSC_2011 The call for abstracts opens next Monday, and extends through April 18th, and we've been brainstorming potential session topics. This year we've tried to focus each of the sessions around a particular biological problem or computational approach. We hope this will draw some interesting parallels between work being done in different groups, and encourage even more collaboration.
We are actively looking for community members who are interested in heading up the organization of a topic. The general idea is to build a cohesive set of talks within a session. How you'd like to do this is completely flexible but some of the ideas we've been discussing are: - Having a short introductory talk to provide an overview of an area, framing the different talks within this context. - Forgoing individual question/answer and instead combining this time into a longer panel-style discussion with all of the speakers. This would help stimulate back and forth between the different projects and the audience. If you are interested in a particular topic and would like to help with the organization, please send an e-mail to the BOSC mailing list: bosc at lists.open-bio.org. We're also open to new topic suggestions, and will look to add one or two more topics to our current list. Finally, there will be a two day coding session prior to BOSC as a follow up to last year's fun and productive Codefest: http://www.open-bio.org/wiki/Codefest_2011 The Metalab, a unique hacker space in Vienna, has kindly agreed to host us for the two days. If you are at all interested, please add your name to the attendees list on the wiki. Since the Metalab organizers don't know us personally, we'd like to demonstrate there is interest and that we'll really show up with a bunch of bioinformatics hackers. More details will be in the works as the summer draws closer. Looking forward to the sound of music, Brad From dan.kortschak at adelaide.edu.au Sun Feb 27 21:52:50 2011 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 28 Feb 2011 13:22:50 +1030 Subject: [Bioperl-l] Bio::DB::Sam Message-ID: <1298861570.30299.66.camel@sueno> Sorry for a post not strictly bioperl related, but I figured this forum will get the best appropriate coverage and Lincoln is often here. 
I have just tried to install Bio::DB::Sam on a new server I have been tasked with setting up, but it fails to build with: cpan[2]> install LDS/Bio-SamTools-1.27.tar.gz Running make for L/LD/LDS/Bio-SamTools-1.27.tar.gz CPAN: checksum security checks disabled because Digest::SHA not installed. Please consider installing the Digest::SHA module. Scanning cache /root/.cpan/build for sizes ............................................................................DONE CPAN: Archive::Tar loaded ok (v1.76) Bio-SamTools-1.27 Bio-SamTools-1.27/Changes Bio-SamTools-1.27/LICENSE Bio-SamTools-1.27/DISCLAIMER Bio-SamTools-1.27/Build.PL Bio-SamTools-1.27/typemap Bio-SamTools-1.27/META.yml Bio-SamTools-1.27/README Bio-SamTools-1.27/MANIFEST Bio-SamTools-1.27/bin Bio-SamTools-1.27/bin/bamToGBrowse.pl Bio-SamTools-1.27/lib Bio-SamTools-1.27/lib/Bio Bio-SamTools-1.27/lib/Bio/DB Bio-SamTools-1.27/lib/Bio/DB/Sam.pm Bio-SamTools-1.27/lib/Bio/DB/Sam.xs Bio-SamTools-1.27/lib/Bio/DB/Sam Bio-SamTools-1.27/lib/Bio/DB/Sam/SamToGBrowse.pm Bio-SamTools-1.27/lib/Bio/DB/Sam/Constants.pm Bio-SamTools-1.27/lib/Bio/DB/Sam/Segment.pm Bio-SamTools-1.27/lib/Bio/DB/Bam Bio-SamTools-1.27/lib/Bio/DB/Bam/Alignment.pm Bio-SamTools-1.27/lib/Bio/DB/Bam/Target.pm Bio-SamTools-1.27/lib/Bio/DB/Bam/FetchIterator.pm Bio-SamTools-1.27/lib/Bio/DB/Bam/PileupWrapper.pm Bio-SamTools-1.27/lib/Bio/DB/Bam/AlignWrapper.pm Bio-SamTools-1.27/lib/Bio/DB/Bam/Query.pm Bio-SamTools-1.27/lib/Bio/DB/Bam/Pileup.pm Bio-SamTools-1.27/lib/Bio/DB/Bam/ReadIterator.pm Bio-SamTools-1.27/t Bio-SamTools-1.27/t/01sam.t Bio-SamTools-1.27/t/data Bio-SamTools-1.27/t/data/ex1.sam.gz Bio-SamTools-1.27/t/data/dm3_3R_4766911_4767130.sam.bam Bio-SamTools-1.27/t/data/ex1.bam Bio-SamTools-1.27/t/data/dm3_3R_4766911_4767130.sam Bio-SamTools-1.27/t/data/00README.txt Bio-SamTools-1.27/t/data/dm3_3R_4766911_4767130.sam.sorted.bam Bio-SamTools-1.27/t/data/ex1.fa CPAN: File::Temp loaded ok (v0.22) CPAN.pm: Going to build L/LD/LDS/Bio-SamTools-1.27.tar.gz 
This module requires samtools 0.1.9 or higher (samtools.sourceforge.net). Can't ioctl TIOCGETP: Invalid argument Consider installing Term::ReadKey from CPAN site nearby at http://www.perl.com/CPAN Or use perl -MCPAN -e shell to reach CPAN. Falling back to 'stty'. If you do not want to see this warning, set PERL_READLINE_NOWARN in your environment. Please enter the location of the bam.h and compiled libbam.a files: /usr/local/src/samtools_latest Found /usr/local/src/samtools_latest/bam.h and /usr/local/src/samtools_latest/libbam.a. Creating new 'MYMETA.yml' with configuration results Creating new 'Build' script for 'Bio-SamTools' version '1.27' CPAN: Module::Build loaded ok (v0.3624) Building Bio-SamTools gcc -I/usr/local/src/samtools_latest -I/usr/lib64/perl5/CORE -DXS_VERSION="1.27" -DVERSION="1.27" -fPIC -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -Wformat=0 -c -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -o lib/Bio/DB/Sam.o lib/Bio/DB/Sam.c lib/Bio/DB/Sam.xs: In function ‘invoke_pileup_callback_fun’: lib/Bio/DB/Sam.xs:106: warning: unused variable ‘pileup_obj’ lib/Bio/DB/Sam.c: In function ‘XS_Bio__DB__Bam_open’: lib/Bio/DB/Sam.c:530: warning: unused variable ‘packname’ lib/Bio/DB/Sam.c: In function ‘XS_Bio__DB__Bam_index_build’: lib/Bio/DB/Sam.c:590: warning: unused variable ‘packname’ lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam_sort_core’: lib/Bio/DB/Sam.xs:306: warning: implicit declaration of function ‘bam_sort_core’ lib/Bio/DB/Sam.c:614: warning: unused variable ‘packname’ lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam_tell’: lib/Bio/DB/Sam.xs:350: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘int64_t’
lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment_qseq’: lib/Bio/DB/Sam.xs:500: warning: operation on ‘seq’ may be undefined lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment__qscore’: lib/Bio/DB/Sam.xs:514: warning: pointer targets in passing argument 2 of ‘Perl_newSVpv’ differ in signedness /usr/lib64/perl5/CORE/proto.h:2210: note: expected ‘const char *’ but argument is of type ‘uint8_t *’ lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment_aux’: lib/Bio/DB/Sam.xs:599: warning: pointer targets in passing argument 2 of ‘strncat’ differ in signedness /usr/include/bits/string3.h:151: note: expected ‘const char * __restrict__’ but argument is of type ‘uint8_t *’ lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment_aux_keys’: lib/Bio/DB/Sam.xs:671: warning: pointer targets in passing argument 2 of ‘Perl_newSVpv’ differ in signedness /usr/lib64/perl5/CORE/proto.h:2210: note: expected ‘const char *’ but argument is of type ‘uint8_t *’ lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment_data’: lib/Bio/DB/Sam.xs:694: warning: pointer targets in assignment differ in signedness lib/Bio/DB/Sam.xs:697: warning: pointer targets in passing argument 2 of ‘Perl_newSVpv’ differ in signedness /usr/lib64/perl5/CORE/proto.h:2210: note: expected ‘const char *’ but argument is of type ‘uint8_t *’ lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Header_view1’: lib/Bio/DB/Sam.xs:900: warning: implicit declaration of function ‘bam_view1’ lib/Bio/DB/Sam.c: In function ‘XS_Bio__DB__Bam__Index_coverage’: lib/Bio/DB/Sam.xs:983: warning: unused variable ‘cov’ lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Pileup_is_refskip’: lib/Bio/DB/Sam.xs:1078: error: ‘bam_pileup1_t’ has no member named ‘is_refskip’ error building lib/Bio/DB/Sam.o from 'lib/Bio/DB/Sam.c' at /usr/local/share/perl5/ExtUtils/CBuilder/Base.pm line 175.
LDS/Bio-SamTools-1.27.tar.gz ./Build -- NOT OK Running Build test Can't test without successful make Running Build install Make had returned bad status, install seems impossible Failed during this command: LDS/Bio-SamTools-1.27.tar.gz : make NO Has anyone else seen this problem or have any idea about how to resolve it? thanks Dan Kortschak From cjfields at illinois.edu Sun Feb 27 22:08:46 2011 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 27 Feb 2011 21:08:46 -0600 Subject: [Bioperl-l] Bio::DB::Sam In-Reply-To: <1298861570.30299.66.camel@sueno> References: <1298861570.30299.66.camel@sueno> Message-ID: <05DFC1BC-A13E-4E89-8571-E9163BCB73F3@illinois.edu> Check the README; my guess is you need to compile samtools with the -fPIC flag: http://cpansearch.perl.org/src/LDS/Bio-SamTools-1.27/README Could also be an out-of-date version of samtools itself, the latest Bio-Samtools requires v.0.1.9 or higher. chris On Feb 27, 2011, at 8:52 PM, Dan Kortschak wrote: > Sorry for a post not strictly bioperl related, but I figured this forum > will get the best appropriate coverage and Lincoln is often here. > > I have just tried to install Bio::DB::Sam on a new server I have been > tasked with setting up, but it fails to build with: > > cpan[2]> install LDS/Bio-SamTools-1.27.tar.gz > Running make for L/LD/LDS/Bio-SamTools-1.27.tar.gz > > CPAN: checksum security checks disabled because Digest::SHA not installed. > Please consider installing the Digest::SHA module. 
> > Scanning cache /root/.cpan/build for sizes > ............................................................................DONE > CPAN: Archive::Tar loaded ok (v1.76) > Bio-SamTools-1.27 > Bio-SamTools-1.27/Changes > Bio-SamTools-1.27/LICENSE > Bio-SamTools-1.27/DISCLAIMER > Bio-SamTools-1.27/Build.PL > Bio-SamTools-1.27/typemap > Bio-SamTools-1.27/META.yml > Bio-SamTools-1.27/README > Bio-SamTools-1.27/MANIFEST > Bio-SamTools-1.27/bin > Bio-SamTools-1.27/bin/bamToGBrowse.pl > Bio-SamTools-1.27/lib > Bio-SamTools-1.27/lib/Bio > Bio-SamTools-1.27/lib/Bio/DB > Bio-SamTools-1.27/lib/Bio/DB/Sam.pm > Bio-SamTools-1.27/lib/Bio/DB/Sam.xs > Bio-SamTools-1.27/lib/Bio/DB/Sam > Bio-SamTools-1.27/lib/Bio/DB/Sam/SamToGBrowse.pm > Bio-SamTools-1.27/lib/Bio/DB/Sam/Constants.pm > Bio-SamTools-1.27/lib/Bio/DB/Sam/Segment.pm > Bio-SamTools-1.27/lib/Bio/DB/Bam > Bio-SamTools-1.27/lib/Bio/DB/Bam/Alignment.pm > Bio-SamTools-1.27/lib/Bio/DB/Bam/Target.pm > Bio-SamTools-1.27/lib/Bio/DB/Bam/FetchIterator.pm > Bio-SamTools-1.27/lib/Bio/DB/Bam/PileupWrapper.pm > Bio-SamTools-1.27/lib/Bio/DB/Bam/AlignWrapper.pm > Bio-SamTools-1.27/lib/Bio/DB/Bam/Query.pm > Bio-SamTools-1.27/lib/Bio/DB/Bam/Pileup.pm > Bio-SamTools-1.27/lib/Bio/DB/Bam/ReadIterator.pm > Bio-SamTools-1.27/t > Bio-SamTools-1.27/t/01sam.t > Bio-SamTools-1.27/t/data > Bio-SamTools-1.27/t/data/ex1.sam.gz > Bio-SamTools-1.27/t/data/dm3_3R_4766911_4767130.sam.bam > Bio-SamTools-1.27/t/data/ex1.bam > Bio-SamTools-1.27/t/data/dm3_3R_4766911_4767130.sam > Bio-SamTools-1.27/t/data/00README.txt > Bio-SamTools-1.27/t/data/dm3_3R_4766911_4767130.sam.sorted.bam > Bio-SamTools-1.27/t/data/ex1.fa > CPAN: File::Temp loaded ok (v0.22) > > CPAN.pm: Going to build L/LD/LDS/Bio-SamTools-1.27.tar.gz > > This module requires samtools 0.1.9 or higher (samtools.sourceforge.net). 
> Can't ioctl TIOCGETP: Invalid argument > Consider installing Term::ReadKey from CPAN site nearby > at http://www.perl.com/CPAN > Or use > perl -MCPAN -e shell > to reach CPAN. Falling back to 'stty'. > If you do not want to see this warning, set PERL_READLINE_NOWARN > in your environment. > Please enter the location of the bam.h and compiled libbam.a files: /usr/local/src/samtools_latest > > Found /usr/local/src/samtools_latest/bam.h and /usr/local/src/samtools_latest/libbam.a. > Creating new 'MYMETA.yml' with configuration results > Creating new 'Build' script for 'Bio-SamTools' version '1.27' > CPAN: Module::Build loaded ok (v0.3624) > Building Bio-SamTools > gcc -I/usr/local/src/samtools_latest -I/usr/lib64/perl5/CORE -DXS_VERSION="1.27" -DVERSION="1.27" -fPIC -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -Wformat=0 -c -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -o lib/Bio/DB/Sam.o lib/Bio/DB/Sam.c > lib/Bio/DB/Sam.xs: In function ‘invoke_pileup_callback_fun’: > lib/Bio/DB/Sam.xs:106: warning: unused variable ‘pileup_obj’ > lib/Bio/DB/Sam.c: In function ‘XS_Bio__DB__Bam_open’: > lib/Bio/DB/Sam.c:530: warning: unused variable ‘packname’ > lib/Bio/DB/Sam.c: In function ‘XS_Bio__DB__Bam_index_build’: > lib/Bio/DB/Sam.c:590: warning: unused variable ‘packname’ > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam_sort_core’: > lib/Bio/DB/Sam.xs:306: warning: implicit declaration of function ‘bam_sort_core’ > lib/Bio/DB/Sam.c:614: warning: unused variable ‘packname’ > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam_tell’: > lib/Bio/DB/Sam.xs:350: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘int64_t’ > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment_qseq’: > lib/Bio/DB/Sam.xs:500: warning: operation on ‘seq’
may be undefined > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment__qscore’: > lib/Bio/DB/Sam.xs:514: warning: pointer targets in passing argument 2 of ‘Perl_newSVpv’ differ in signedness > /usr/lib64/perl5/CORE/proto.h:2210: note: expected ‘const char *’ but argument is of type ‘uint8_t *’ > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment_aux’: > lib/Bio/DB/Sam.xs:599: warning: pointer targets in passing argument 2 of ‘strncat’ differ in signedness > /usr/include/bits/string3.h:151: note: expected ‘const char * __restrict__’ but argument is of type ‘uint8_t *’ > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment_aux_keys’: > lib/Bio/DB/Sam.xs:671: warning: pointer targets in passing argument 2 of ‘Perl_newSVpv’ differ in signedness > /usr/lib64/perl5/CORE/proto.h:2210: note: expected ‘const char *’ but argument is of type ‘uint8_t *’ > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Alignment_data’: > lib/Bio/DB/Sam.xs:694: warning: pointer targets in assignment differ in signedness > lib/Bio/DB/Sam.xs:697: warning: pointer targets in passing argument 2 of ‘Perl_newSVpv’ differ in signedness > /usr/lib64/perl5/CORE/proto.h:2210: note: expected ‘const char *’ but argument is of type ‘uint8_t *’ > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Header_view1’: > lib/Bio/DB/Sam.xs:900: warning: implicit declaration of function ‘bam_view1’ > lib/Bio/DB/Sam.c: In function ‘XS_Bio__DB__Bam__Index_coverage’: > lib/Bio/DB/Sam.xs:983: warning: unused variable ‘cov’ > lib/Bio/DB/Sam.xs: In function ‘XS_Bio__DB__Bam__Pileup_is_refskip’: > lib/Bio/DB/Sam.xs:1078: error: ‘bam_pileup1_t’ has no member named ‘is_refskip’ > error building lib/Bio/DB/Sam.o from 'lib/Bio/DB/Sam.c' at /usr/local/share/perl5/ExtUtils/CBuilder/Base.pm line 175.
> LDS/Bio-SamTools-1.27.tar.gz > ./Build -- NOT OK > Running Build test > Can't test without successful make > Running Build install > Make had returned bad status, install seems impossible > Failed during this command: > LDS/Bio-SamTools-1.27.tar.gz : make NO > > Has anyone else seen this problem or have any idea about how to resolve > it? > > thanks > Dan Kortschak > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Sun Feb 27 22:30:14 2011 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 28 Feb 2011 14:00:14 +1030 Subject: [Bioperl-l] Bio::DB::Sam [resolved] In-Reply-To: <05DFC1BC-A13E-4E89-8571-E9163BCB73F3@illinois.edu> References: <1298861570.30299.66.camel@sueno> <05DFC1BC-A13E-4E89-8571-E9163BCB73F3@illinois.edu> Message-ID: <1298863814.30299.82.camel@sueno> Version discordance - it's been a long time since I built this. Thanks Chris. Dan On Sun, 2011-02-27 at 21:08 -0600, Chris Fields wrote: > Check the README; my guess is you need to compile samtools with the -fPIC flag: > > http://cpansearch.perl.org/src/LDS/Bio-SamTools-1.27/README > > Could also be an out-of-date version of samtools itself, the latest Bio-Samtools requires v.0.1.9 or higher. > > chris From awitney at sgul.ac.uk Mon Feb 28 06:06:42 2011 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 28 Feb 2011 11:06:42 +0000 Subject: [Bioperl-l] building NGS pipeline Message-ID: <86D567CE-1735-4D85-B40F-B67823749FB7@sgul.ac.uk> Hi, I'm trying to put together a set of steps to run some analysis on NGS data we have. I have found modules that wrap alignment software such as bowtie/bwa/maq, but are there any packages to calculate RPKMs etc.? What are people using for this?
thanks for any help adam From sdavis2 at mail.nih.gov Mon Feb 28 07:04:32 2011 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 28 Feb 2011 07:04:32 -0500 Subject: [Bioperl-l] building NGS pipeline In-Reply-To: <86D567CE-1735-4D85-B40F-B67823749FB7@sgul.ac.uk> References: <86D567CE-1735-4D85-B40F-B67823749FB7@sgul.ac.uk> Message-ID: On Mon, Feb 28, 2011 at 6:06 AM, Adam Witney wrote: > Hi, > > I'm trying to put together a set of steps to run some analysis on NGS data > we have. I have found modules that wrap alignment software such as > bowtie/bwa/maq, but are there any packages to calculate RPKMs etc.? What are > people using for this? > > Scripture, Cufflinks/Cuffdiff, Bioconductor (GenomicFeatures package, for example), ERANGE (Wold lab), and several others. I don't know of Perl wrappers for these, but they are all command-line applications, generally speaking. This is an interesting site to follow for RNA-seq analysis and applications: http://rna-seqblog.com/ Sean From cjfields at illinois.edu Mon Feb 28 10:03:18 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 28 Feb 2011 09:03:18 -0600 Subject: [Bioperl-l] building NGS pipeline In-Reply-To: References: <86D567CE-1735-4D85-B40F-B67823749FB7@sgul.ac.uk> Message-ID: <0FF71A5D-F5F6-4AAD-9501-4F4911DB40D8@illinois.edu> On Feb 28, 2011, at 6:04 AM, Sean Davis wrote: > On Mon, Feb 28, 2011 at 6:06 AM, Adam Witney wrote: > >> Hi, >> >> I'm trying to put together a set of steps to run some analysis on NGS data >> we have. I have found modules that wrap alignment software such as >> bowtie/bwa/maq, but are there any packages to calculate RPKMs etc.? What are >> people using for this? >> >> > Scripture, Cufflinks/Cuffdiff, Bioconductor (GenomicFeatures package, for > example), ERANGE (Wold lab), and several others. I don't know of Perl > wrappers for these, but they are all command-line applications, generally > speaking.
> > This is an interesting site to follow for RNA-seq analysis and applications: > > http://rna-seqblog.com/ > > Sean BioPerl does have wrappers and interfaces for some packages, particularly Lincoln's Bio::DB::Sam (samtools package) and wrappers for bowtie, tophat, and bwa. Locally, for comparative gene expression analyses we use Bioconductor (many packages to choose from) as well as the command-line tools Sean mentions. chris From wangwl at mail.whu.edu.cn Mon Feb 28 09:57:19 2011 From: wangwl at mail.whu.edu.cn (Wenliang Wang) Date: Mon, 28 Feb 2011 22:57:19 +0800 Subject: [Bioperl-l] Bio::Tools::Run::TribeMCL Message-ID: <000d01cbd757$d03aeb80$70b0c280$@whu.edu.cn> Hello, I'm trying to use TribeMCL to cluster a group of protein sequences with the following code:

use Bio::Tools::Run::TribeMCL;
use Bio::SearchIO;

my $usage = "run_tribe.pl blastfile";
my $blastfile = $ARGV[0];

# Parse the BLAST report given on the command line.
my $sio = Bio::SearchIO->new(-format=>'blast',
                             -file=>$blastfile);

# Wrapper settings, including the local paths to mcl and tribe-matrix.
my @params=('inputtype'=>'searchio',I=>'2.0',
            'mcl'=>'/home/sean/tribemcl/src/shmcl/mcl',
            'matrix'=>'/home/sean/tribemcl/src/contrib/tribe/tribe-matrix');

my $fact = Bio::Tools::Run::TribeMCL->new(@params);

# Run the clustering; the loop below treats the result as a reference to
# an array of clusters, each one an array reference of member IDs.
my $fam = $fact->run($sio);

for (my $i = 0; $i < scalar(@{$fam}); $i++) {
    print "Cluster $i \t ".scalar(@{$fam->[$i]})." members\n";
    foreach my $member (@{$fam->[$i]}){
        print "\t$member\n";
    }
}

But it came out with an error message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Need inputs for running tribe mcl, nothing provided
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Tools::Run::TribeMCL::_setup_input /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/TribeMCL.pm:765
STACK: Bio::Tools::Run::TribeMCL::run /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/TribeMCL.pm:346
STACK: test.pl:16

I'm wondering if you can help me with this. Sorry for the long email. Best regards, Wenliang Wang College of Life Science, Wuhan University, Wuhan, Hubei, China.
email:wangwl at mail.whu.edu.cn MSN:lvxiaopohai at hotmail.com From cjfields at illinois.edu Mon Feb 28 10:43:33 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 28 Feb 2011 09:43:33 -0600 Subject: [Bioperl-l] Bio::Tools::Run::TribeMCL In-Reply-To: <000d01cbd757$d03aeb80$70b0c280$@whu.edu.cn> References: <000d01cbd757$d03aeb80$70b0c280$@whu.edu.cn> Message-ID: Not sure of its status, but I don't think TribeMCL is maintained anymore (maybe we should deprecate it?). You should really look at other tools besides TribeMCL, in particular OrthoMCL, which is essentially a more up-to-date TribeMCL. Also, a recent set of posts mentioned other tools such as blastclust (comes with NCBI's BLAST tools). chris On Feb 28, 2011, at 8:57 AM, Wenliang Wang wrote: > Hello, > > I'm trying to use TribeMCL to cluster a group of protein sequences > with the following code: > > > > use Bio::Tools::Run::TribeMCL; > > use Bio::SearchIO; > > > > my $usage = "run_tribe.pl blastfile"; > > my $blastfile = $ARGV[0]; > > > > my $sio = Bio::SearchIO->new(-format=>'blast', > > -file=>$blastfile); > > > > my @params=('inputtype'=>'searchio',I=>'2.0', > > 'mcl'=>'/home/sean/tribemcl/src/shmcl/mcl', > > 'matrix'=>'/home/sean/tribemcl/src/contrib/tribe/tribe-matrix'); > > > > my $fact = Bio::Tools::Run::TribeMCL->new(@params); > > > > my $fam = $fact->run($sio); > > > > for (my $i = 0; $i < scalar(@{$fam}); $i++) { > > print "Cluster $i \t ".scalar(@{$fam->[$i]})." members\n"; > > foreach my $member (@{$fam->[$i]}){ > > print "\t$member\n"; > > } > > } > > > > But it came out with an error message: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Need inputs for running tribe mcl, nothing provided > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > > STACK: Bio::Tools::Run::TribeMCL::_setup_input /usr/lib/perl5/site_perl/5.8.
> 8/Bio/Tools/Run/TribeMCL.pm:765 > > STACK: Bio::Tools::Run::TribeMCL::run > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/TribeMCL.pm:346 > > STACK: test.pl:16 > > > > > > I'm wondering if you can help me with this. > > > > Sorry for the long email. > > > > > > Best regards, > > > > Wenliang Wang > > College of Life Science,Wuhan University,Wuhan,Hubei,China. > > email:wangwl at mail.whu.edu.cn > > MSN:lvxiaopohai at hotmail.com > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lincoln.stein at gmail.com Mon Feb 28 13:05:44 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 28 Feb 2011 13:05:44 -0500 Subject: [Bioperl-l] Next release? Message-ID: Hi, There are some bug fixes in Bio::DB::SeqFeature that I'd like to release to CPAN. What are the plans for the next CPAN release of BioPerl? Lincoln -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Mon Feb 28 13:16:10 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 28 Feb 2011 12:16:10 -0600 Subject: [Bioperl-l] Next release? In-Reply-To: References: Message-ID: <57651D25-6D5D-40EB-A336-D565076626E5@illinois.edu> I've been holding off for some fixes going on in Bio::Tree, but if it's pressing we can push one out sooner. Next few weeks? chris On Feb 28, 2011, at 12:05 PM, Lincoln Stein wrote: > Hi, > > There are some bug fixes in Bio::DB::SeqFeature that I'd like to release to > CPAN. What are the plans for the next CPAN release of BioPerl? > > Lincoln > > > -- > Lincoln D.
Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lincoln.stein at gmail.com Mon Feb 28 13:19:46 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 28 Feb 2011 13:19:46 -0500 Subject: [Bioperl-l] Next release? In-Reply-To: <57651D25-6D5D-40EB-A336-D565076626E5@illinois.edu> References: <57651D25-6D5D-40EB-A336-D565076626E5@illinois.edu> Message-ID: It would be great to have a CPAN release by the end of March. Is that doable? Lincoln On Mon, Feb 28, 2011 at 1:16 PM, Chris Fields wrote: > I've been holding off for some fixes going on in Bio::Tree, but if it's > pressing we can push one out sooner. Next few weeks? > > chris > > On Feb 28, 2011, at 12:05 PM, Lincoln Stein wrote: > > > Hi, > > > > There are some bug fixes in Bio::DB::SeqFeature that I'd like to release > to > > CPAN. What are the plans for the next CPAN release of BioPerl? > > > > Lincoln > > > > > > -- > > Lincoln D. Stein > > Director, Informatics and Biocomputing Platform > > Ontario Institute for Cancer Research > > 101 College St., Suite 800 > > Toronto, ON, Canada M5G0A3 > > 416 673-8514 > > Assistant: Renata Musa > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Mon Feb 28 13:23:40 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 28 Feb 2011 12:23:40 -0600 Subject: [Bioperl-l] Next release? 
In-Reply-To: References: <57651D25-6D5D-40EB-A336-D565076626E5@illinois.edu> Message-ID: Sure, let's aim for that or sooner. Deadlines are always an incentive :) Let's see how master looks; I have a 1.6.2 branch I am merging things to, I can pull in any changes and push out an initial test release sometime this week. chris On Feb 28, 2011, at 12:19 PM, Lincoln Stein wrote: > It would be great to have a CPAN release by the end of March. Is that doable? > > Lincoln > > On Mon, Feb 28, 2011 at 1:16 PM, Chris Fields wrote: > I've been holding off for some fixes going on in Bio::Tree, but if it's pressing we can push one out sooner. Next few weeks? > > chris > > On Feb 28, 2011, at 12:05 PM, Lincoln Stein wrote: > > > Hi, > > > > There are some bug fixes in Bio::DB::SeqFeature that I'd like to release to > > CPAN. What are the plans for the next CPAN release of BioPerl? > > > > Lincoln > > > > > > -- > > Lincoln D. Stein > > Director, Informatics and Biocomputing Platform > > Ontario Institute for Cancer Research > > 101 College St., Suite 800 > > Toronto, ON, Canada M5G0A3 > > 416 673-8514 > > Assistant: Renata Musa > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa From cjfields at illinois.edu Mon Feb 28 13:59:06 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 28 Feb 2011 12:59:06 -0600 Subject: [Bioperl-l] Plans for the next release (and beyond) Message-ID: <11F5BA8C-D4F4-419B-870B-D4B5180F4D7D@illinois.edu> There is a pressing need to get a 1.6.2 core release out as soon as possible, based on Lincoln's need to get the latest Bio::DB::SeqFeature fixes out to CPAN. 
We had been waiting on a few changes that are currently lingering on branches and could be merged in when necessary (the GMOD hackathon changes primary among them), but there is no pressing need to merge them back in prior to a release. I would also like to avoid, if possible, the 'commit-frenzy' that sometimes occurs prior to a release, particularly for cutting-edge changes.

My suggestion is that, after this release, we actually start working on a few items that will very likely end the 1.6.x release series.

1) Make packaging releases a much less painful process. Tools already exist to do this (Dist::Zilla, ShipIt); we should definitely take advantage of them.

2) Work on de-monolithizing BioPerl, maybe towards a 1.7 or even a 2.0 release. We've long talked about doing so, and I talked about it at the last BOSC meeting; it's about time to actually work on it. This will also address some of the problems Lincoln has been facing with the rapid development cycle of GBrowse v2 vs BioPerl, as was previously experienced with Bio::Graphics, which is a good example of how development of BioPerl-related modules can occur successfully outside the core.

3) I would like to work on moving the HOWTOs and other relevant documentation (Tutorial) back into the distributions, maybe in a particular namespace (Bio::Manual or similar). The reason is simple: maintaining possibly discordant versions of documentation is unsustainable. We could possibly set up a way of converting POD->wiki for online documentation, but I would like the documentation to be tied to the version of the code it comes with, and the only easy way to do so is to package them all together.

Any others? Concerns, etc.?

chris