From w.bryant at ucl.ac.uk Mon Jun 1 04:06:58 2009 From: w.bryant at ucl.ac.uk (Will Bryant) Date: Mon, 01 Jun 2009 09:06:58 +0100 Subject: [Bioperl-l] Extract genomic data from GenBank Message-ID: <4A238C22.9090604@ucl.ac.uk> I'm trying to retrieve the complete GenBank format sequence file for a specified bacterium using get_Seq_by_gi, but I keep getting 'gi does not exist' errors, even when trying the example gi '405830'. The script was running fine September last year, but when I came back to it this week it wasn't working. Am I missing something obvious? In case it's important, I'm using ActivePerl 5.10.0, bioperl 1.5.2_100 Code: #!/usr/bin/perl -w use strict; use Bio::Perl; use Bio::DB::GenBank; my $gb = new Bio::DB::GenBank(-db => 'genome', -format => 'genbank'); my $straincomp = $gb->get_Seq_by_gi('405830'); my $seqout = 0; #my $set_output_file = '$seqout = Bio::SeqIO->new( -format => \'genbank\', -file => \'>c:\\phd\\modelling\\working\\gi'.$ARGV[0].'_data.gb\');'; #print $set_output_file; eval ($set_output_file); $seqout -> write_seq($straincomp); Error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: gi does not exist STACK: Error::throw STACK: Bio::Root::Root::throw c:/perl/site/lib/Bio/Root/Root.pm:359 STACK: Bio::DB::WebDBSeqI::get_Seq_by_gi c:/perl/site/lib/Bio/DB/WebDBSeqI.pm:209 STACK: c:\phd\modelling\perl_scripts\retrieve_genome_data.pl:12 ----------------------------------------------------------- Many thanks, Will Bryant. From David.Messina at sbc.su.se Mon Jun 1 05:04:40 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 1 Jun 2009 11:04:40 +0200 Subject: [Bioperl-l] Extract genomic data from GenBank In-Reply-To: <4A238C22.9090604@ucl.ac.uk> References: <4A238C22.9090604@ucl.ac.uk> Message-ID: <628aabb70906010204y46139e1dy702fd53380adecf7@mail.gmail.com> Hey Will, I think there have been API changes in GenBank's remote query interface that have occurred after 1.5.2_100 of BioPerl was written. 
Try upgrading to BioPerl 1.6 and see if that works for you. (Note that I've only glanced at your code -- I'm assuming that's not the problem since it worked fine for you before.) Dave From fontanez at fas.harvard.edu Mon Jun 1 08:41:06 2009 From: fontanez at fas.harvard.edu (Kristina Fontanez) Date: Mon, 1 Jun 2009 08:41:06 -0400 Subject: [Bioperl-l] problem with bioperl install In-Reply-To: References: <2023E087846042178215CF9EBDE12C75@NewLife> <4A205502.2030701@sendu.me.uk> <024B0302-7885-4005-851D-5D582122ED06@fas.harvard.edu> <4A205D46.4090105@sendu.me.uk> Message-ID: <855163D8-6B40-4DF4-84B6-C14611D1CA42@fas.harvard.edu> Hey everyone- Thanks for all the advice. I reinstalled Xcode tools, installed Fink and downloaded bioperl successfully. It's now working smoothly. Thanks again, Kristina --------------------------------------------------------------- Kristina Fontanez PhD candidate Department of Organismic and Evolutionary Biology Cavanaugh lab Harvard University 16 Divinity Ave. Cambridge, MA 02138 tel: 617-495-1138 fax: 617-496-6933 email: fontanez at fas.harvard.edu On May 29, 2009, at 10:40 PM, Chris Fields wrote: Kristina, You aren't running as superuser: > term dump: > > dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez $ cpan You'll need to run cpan using 'sudo cpan' if installing modules anywhere requiring superuser permissions. chris On May 29, 2009, at 5:10 PM, Sendu Bala wrote: > Kristina Fontanez wrote: >> Hello everyone- >> Sendu - I took your advice but doing Install Bundle::CPAN did not >> take care of the dependencies. It still failed. See attached txt >> file with my terminal output. Does anyone have any idea how this >> might be? > > From reading the output it seems like perhaps you don't have 'make' > or there is something wrong when using it. If you're on a mac you > may need to install the dev tools. Someone else want to jump in here > with advice? 
> > Also, check your CPAN configuration to ensure it is trying to use > the correct make commands. ('o conf' etc.) > > >> If I wanted to wipe all perl from my computer and simply start >> over, how might this be accomplished? > > Don't do that. At least not until you know you have a working make > setup. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Jun 1 10:55:50 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 1 Jun 2009 10:55:50 -0400 Subject: [Bioperl-l] a HOWTO for Tiling Message-ID: <13190185F84E43BDA99993CEB44394C4@NewLife> Hi All Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of B::S::Tiling, use cases, code snippets, design, implementation and algorithm discussions. We're just about ready to port over to core from bioperl-dev; please shout out if this is not a good idea. cheers and thanks for all input-- Mark From cjfields at illinois.edu Mon Jun 1 11:21:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 1 Jun 2009 10:21:30 -0500 Subject: [Bioperl-l] problem with bioperl install In-Reply-To: References: <2023E087846042178215CF9EBDE12C75@NewLife> Message-ID: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu> A autogenerated passthrough Makefile.PL is generated with the distribution: http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.0/Makefile.PL We may remove that in future releases, but it should work regardless (i.e. call Module::Build and Build.PL). I'm pretty convinced that the issue was permissions-based at heart. Note Kristina ran 'cpan' instead of 'sudo cpan' to invoke the shell, so the shell is using current user config instead of su for installation. You need to use 'sudo' to install anything /Library/Perl on Mac (unless you are already 'root', but on recent OS X version logging in as 'root' is turned off). 
I just noticed nothing is mentioned along these lines in the installation docs, so we'll need to update those. chris On May 29, 2009, at 4:08 PM, Mark A. Jensen wrote: > Hi Kristina, > > [Don't forget to reply-all, so the list stays in the loop. Many many > more helpers > there.] > > Apparently cpan can't make the Makefile, but can download and expand > the > library directories, in your .cpan directory (see edited highlights > below). > > Let's appeal to the BioPerl brethren/sestren---answers? > > MAJ > > > term dump: > > dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan > Terminal does not support AddHistory. > > cpan shell -- CPAN exploration and modules installation (v1.7602) > ReadLine support available (try 'install Bundle::CPAN') > > cpan> install Test::Harness > CPAN: Storable loaded ok > Going to read /Users/kristinafontanez/.cpan/Metadata > Database was generated on Fri, 29 May 2009 11:27:00 GMT > Running install for module Test::Harness > Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz > CPAN: Digest::MD5 loaded ok > CPAN: Compress::Zlib loaded ok > Checksum for /Users/kristinafontanez/.cpan/sources/authors/id/A/AN/ > ANDYA/Test-Harness-3.17.tar.gz ok > Scanning cache /Users/kristinafontanez/.cpan/build for sizes > Test-Harness-3.17/ > Test-Harness-3.17/Build.PL > ... > Test-Harness-3.17/xt/perls/sample-tests/ > Test-Harness-3.17/xt/perls/sample-tests/perl_version > Removing previously used /Users/kristinafontanez/.cpan/build/Test- > Harness-3.17 > > CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz > > Checking if your kit is complete... > Looks good > Writing Makefile for Test::Harness > -- NOT OK > Running make test > Can't test without successful make > Running make install > make had returned bad status, install seems impossible > > cpan> install File::HomeDir > ...[more of same]... > > > ----- Original Message ----- From: "Kristina Fontanez" > > To: "Mark A. 
Jensen" > Sent: Friday, May 29, 2009 3:56 PM > Subject: Re: [Bioperl-l] problem with bioperl install > > >> Mr. Jensen- >> >> Thank you for your help but unfortunately the installation of >> Test::Harness etc didn't work. I copied my terminal output and >> attached the file. Any advice on what's still going wrong? >> >> Thanks, >> Kristina >> > > > -------------------------------------------------------------------------------- > > >> >> >> >> >> --------------------------------------------------------------- >> Kristina Fontanez >> PhD candidate >> Department of Organismic and Evolutionary Biology >> Cavanaugh lab >> Harvard University >> 16 Divinity Ave. >> Cambridge, MA 02138 >> >> tel: 617-495-1138 >> fax: 617-496-6933 >> email: fontanez at fas.harvard.edu >> >> >> >> On May 29, 2009, at 3:35 PM, Mark A. Jensen wrote: >> >> The message says you are first updating your CPAN.pm. >> That module needs modules you don't have, so >> >> use cpan to install the dependencies you don't have, viz. >>> Test::Harness >>> File::HomeDir >> >> $ cpan >>> install Test::Harness >> etc. >> Then install CPAN.pm again (or run the Bioperl install again). >> >> Lather, rinse, repeat the install of Bioperl until it completes >> without errors. >> >> ----- Original Message ----- From: "Kristina Fontanez" > > >> To: >> Sent: Friday, May 29, 2009 3:07 PM >> Subject: [Bioperl-l] problem with bioperl install >> >> >>> Hello- >>> >>> I am trying to install bioperl and I ran into some problems. See >>> list below. >>> >>> >>> CPAN.pm: Going to build A/AN/ANDK/CPAN-1.94.tar.gz >>> >>> Checking if your kit is complete... >>> Looks good >>> Warning: prerequisite File::HomeDir 0.69 not found. >>> Warning: prerequisite Test::Harness 2.62 not found. We have 2.56. >>> Writing Makefile for CPAN >>> ---- Unsatisfied dependencies detected during [A/AN/ANDK/ >>> CPAN-1.94.tar.gz] ----- >>> Test::Harness >>> File::HomeDir >>> >>> >>> How can I fix this? 
>>> >>> >>> Thanks, >>> Kristina
From cjfields at illinois.edu Mon Jun 1 12:14:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 1 Jun 2009 11:14:07 -0500 Subject: [Bioperl-l] a HOWTO for Tiling In-Reply-To: <13190185F84E43BDA99993CEB44394C4@NewLife> References: <13190185F84E43BDA99993CEB44394C4@NewLife> Message-ID: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu> I think, as long is it doesn't significantly impact SearchIO performance wise (from reading the HOWTO I can't see how it will), I say commit away. In fact, I consider this a bug fix that should be in the next 1.6 point release. We should add deprecation warnings where needed for 1.7... chris
From dan.bolser at gmail.com Mon Jun 1 12:27:30 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 1 Jun 2009 17:27:30 +0100 Subject: [Bioperl-l] problem with bioperl install In-Reply-To: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu> References: <2023E087846042178215CF9EBDE12C75@NewLife> <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu> Message-ID: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com> 2009/6/1 Chris Fields : ... > for installation. You need to use 'sudo' to install anything /Library/Perl > on Mac (unless you are already 'root', but on recent OS X version logging in ... local::lib is supposed to take care of this. Is this broken on Mac? Building stuff as root is generally considered to be bad. > I just noticed nothing is mentioned along these lines in the installation > docs, so we'll need to update those. I tried to write down a clear 'recipe' for getting things installed (this was actually on the GMod wiki). I really think the install docs could be improved. Sometimes less verbose is better. Dan
From cjfields at illinois.edu Mon Jun 1 13:15:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 1 Jun 2009 12:15:42 -0500 Subject: [Bioperl-l] problem with bioperl install In-Reply-To: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com> References: <2023E087846042178215CF9EBDE12C75@NewLife> <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu> <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com> Message-ID: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu> On Jun 1, 2009, at 11:27 AM, Dan Bolser wrote: > 2009/6/1 Chris Fields : > > ... >> for installation. You need to use 'sudo' to install anything / >> Library/Perl >> on Mac (unless you are already 'root', but on recent OS X version >> logging in > ... > > local::lib is supposed to take care of this. Is this broken on Mac? > Building stuff as root is generally considered to be bad. You can install to a local lib, yes, but cpan needs to be manually configured to do this; I don't think it is automatically configured to do so in OS X, eg. it defaults to /Library/Perl. Frankly, I sidestep the whole issue with my own custom perl installation, but that's me. >> I just noticed nothing is mentioned along these lines in the >> installation >> docs, so we'll need to update those.
> > I tried to write down a clear 'recipe' for getting things installed > (this was actually on the GMod wiki). I really think the install docs > could be improved. Sometimes less verbose is better. > > Dan True, but I would much rather have reasonable instructions that outline most installation issues than ones that aren't detailed enough. My thought is to strip down the INSTALL doc that comes with BioPerl down to the essentials and point to the wiki for the more detailed ones (including problems encountered). It's too hard to maintain both and backport the wiki into plain text. chris
From maj at fortinbras.us Mon Jun 1 15:03:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 1 Jun 2009 15:03:05 -0400 Subject: [Bioperl-l] a HOWTO for Tiling In-Reply-To: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu> References: <13190185F84E43BDA99993CEB44394C4@NewLife> <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu> Message-ID: Thanks, Chris-- Bio::Search::Tiling is now ported to core; the snapshot of the ported version is in bioperl-dev/tags/tiling-port-to-core-060109. Bunch o' tests performed by t/SearchIO/Tiling.t; bunch more if one sets BIOPERL_TILING_EXHAUSTIVE_TESTS . Cry 'Havoc!' and let slip the dogs of war... MAJ
From koenvanderdrift at gmail.com Mon Jun 1 18:22:23 2009 From: koenvanderdrift at gmail.com (Koen van der Drift) Date: Mon, 1 Jun 2009 18:22:23 -0400 Subject: [Bioperl-l] problem with bioperl install In-Reply-To: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu> References: <2023E087846042178215CF9EBDE12C75@NewLife> <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu> <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com> <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu> Message-ID: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com> On Jun 1, 2009, at 1:15 PM, Chris Fields wrote: > My thought is to strip down the INSTALL doc that comes with BioPerl > down to the essentials and point to the wiki for the more detailed > ones (including problems encountered). It's too hard to maintain > both and backport the wiki into plain text. Good idea, please then also update the file PLATFORMS. It has a link to a very outdated website for the installation of bioperl on OS X. And maybe a line + link to the bioperl wiki can be added that recommends the use of fink as an alternative to cpan? cheers, - Koen.
From cjfields at illinois.edu Mon Jun 1 19:27:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 1 Jun 2009 18:27:32 -0500 Subject: [Bioperl-l] problem with bioperl install In-Reply-To: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com> References: <2023E087846042178215CF9EBDE12C75@NewLife> <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu> <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com> <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu> <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com> Message-ID: <98605D05-706B-4ACB-B444-4F0A9CEC879D@illinois.edu> On Jun 1, 2009, at 5:22 PM, Koen van der Drift wrote: > > On Jun 1, 2009, at 1:15 PM, Chris Fields wrote: > >> My thought is to strip down the INSTALL doc that comes with BioPerl >> down to the essentials and point to the wiki for the more detailed >> ones (including problems encountered). It's too hard to maintain >> both and backport the wiki into plain text. > > > Good idea, please then also update the file PLATFORMS. It has a link > to a very outdated website for the installation of bioperl on OS X. > And maybe a line + link to the bioperl wiki can be added that > recommends the use of fink as an alternative to cpan? > > cheers, > > - Koen. Done. I've added a ticket on bugzilla for tracking this so it doesn't get lost: http://bugzilla.open-bio.org/show_bug.cgi?id=2846 chris From shalabh.sharma7 at gmail.com Tue Jun 2 10:44:25 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 2 Jun 2009 10:44:25 -0400 Subject: [Bioperl-l] Refseq Hits Message-ID: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com> Hi All, This is not really a bioperl query, but i am really confused and need some help. I blasted some sequences against refseq database (locally). 
After parsing the blast result what i noticed that some description fields contain two hit names like: hit_name -> gi|71082715|ref|YP_265434.1| Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1002] So besides giving me description for hit_name (HTCC 1062) its also giving me HTCC 1002. I will really appreciate if someone can help me out. Thanks Shalabh _________________________________________________ Shalabh Sharma Scientific Computing Professional Associate Department of Marine Sciences University of Georgia Athens, GA 30602-3636 phone: 706-542-0341 email: ssharmai at uga.edu
From jonathancrabtree at gmail.com Tue Jun 2 11:04:33 2009 From: jonathancrabtree at gmail.com (Jonathan Crabtree) Date: Tue, 2 Jun 2009 11:04:33 -0400 Subject: [Bioperl-l] Refseq Hits In-Reply-To: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com> References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com> Message-ID: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com> Hi Shalabh- I believe RefSeq is a non-redundant database, in which sequence entries with identical sequences are merged and their descriptions are concatenated in the FASTA defline. If you look up the two accession numbers/gi numbers from your search results I think you'll see that both are valid matches because their polypeptide sequences are identical: http://www.ncbi.nlm.nih.gov/protein/71082715 http://www.ncbi.nlm.nih.gov/protein/91762865 You're just getting a single match with two descriptions instead of two matches with one description, but the sequence is the same and so, therefore are the blast alignments. Jonathan
From shalabh.sharma7 at gmail.com Tue Jun 2 11:15:45 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 2 Jun 2009 11:15:45 -0400 Subject: [Bioperl-l] Refseq Hits In-Reply-To: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com> References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com> <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com> Message-ID: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com> Hi Jonathan, Your information is really helpful. Thanks a lot. -Shalabh
From tristan.lefebure at gmail.com Tue Jun 2 12:24:21 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Tue, 2 Jun 2009 12:24:21 -0400 Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Message-ID: <200906021224.21439.tristan.lefebure@gmail.com> On Monday 27 April 2009 05:38:40 Heikki Lehvaslaiho wrote: > I convinced at least myself to the degree that I wrote > the range_convert() method - with plenty of tests. I > mention this now so that no-one else need to start > thinking through all the edge values. > > :) > > I'll contribute it to the code base once there is a > consensus of best way forward. > Heikki, This thread has been quiet for a while, but I don't see anything new in Bio::Seq::Quality. Did we reach a consensus or are you waiting for some more discussion on the subject? (I'm pretty impatient to see bioperl handling both sanger and illumina ranges on the fly!) --Tristan > -Heikki > > 2009/4/27 Heikki Lehvaslaiho : > >> I have tried to summarise this in a central place: > >> http://en.wikipedia.org/wiki/FASTQ_format > > > > Torsten, > > > > Thanks for putting this together. Very helpful. > > > > Do you have a plan of action? Let me propose one for > > BioPerl. It based on following assumptions: > > > > 1. There is multitude of different ways of coding > > quality values out there. 2. Bio::Seq::Quality is > > agnostic of any quality value range rules 3. The > > emerging open standard is the Sanger fastq > > specification 4. Open source programs use the Sanger > > fastq specs > > > > > > From these it follows that: > > > > > > 1. BioPerl should support Sanger fastq standard > > > > 1.1. it already does and there are other SeqIO modules > > for dealing with other non-fastq formats. > > > > 2. BioPerl should offer simple ways of converting > > between quality range rules > > > > 2.1. 
Have a generic method accessible from > > Bio::Seq::Quality with preset versions of the method > > for converting between known variants (Sanger fastq and > > the two Illumina versions) > > > > For example: > > > > range_convert ($from_lower, $from_upper, $to_lower, > > $to_upper, $value) throw if $value < $from_lower or > > $value > $from_upper return $newvalue > > > > range_convert_illumina2fastq(), > > range_convert_fastq2illumina(), > > range_convert_fastq2phred(), > > range_convert_phred2fastq().... > > > > (assuming that illumina 1.3 eq phred) > > > > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert > > Illumina qualities into Sanger fastq on the fly > > > > 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the > > incoming stream of quality value range either > > automatically or be given a keyword parameter > > indicating the range. > > > > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an > > error if it detects a quality value out of range. > > > > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an > > error if it detects a quality value out of range. > > > > 2.2.4. It would be useful but not absolutely necessary > > for Bio::SeqIO::Fastq::write_seq to be able to write > > out in Illumina ranges > > > > > > What do you think? > > > > -Heikki > > > > 2009/4/26 Torsten Seemann : > >>> > This might be a good place to ask the question: > >>> > having looked at the fastq.pm page, is the fastq > >>> > format defined (only) by a "@'" followed by > >>> > >>> a > >>> > >>> > sequence line and a "+" header followed by a > >>> > quality line and the two headers have to agree? Now > >>> > that Illumina is using phred scaling, are 'Sanger' > >>> > and 'Illumina' versions the same? > >>> > >>> No they aren't the same, Illumina still encodes the > >>> ascii as value + 64 and Sanger as value + 33. > >> > >> Illumina have now CHANGED how they calculate the > >> quality value however in the last month or so... 
Their > >> Q range used to be -5..40 mapped to ASCII 64+, but now > >> they produce Q >= 0 and it is unclear if they start at > >> 69 or 64 now... > >> > >> I have tried to summarise this in a central place: > >> > >> http://en.wikipedia.org/wiki/FASTQ_format > >> > >> Corrections welcome! > >> > >> > >> --Torsten Seemann > >> --Victorian Bioinformatics Consortium, Dept. > >> Microbiology, Monash University, AUSTRALIA > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > > cell: +27 (0)714328090 > > Sent from Claremont, WC, South Africa From Russell.Smithies at agresearch.co.nz Tue Jun 2 16:56:26 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 3 Jun 2009 08:56:26 +1200 Subject: [Bioperl-l] Refseq Hits In-Reply-To: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com> References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com> <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com> <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493EB1D18@exchsth.agresearch.co.nz> The identifiers are separated by a Ctrl-A char ("\001") in the original non-redundant fasta header so you should be able to split them up again - assuming BioPerl didn't munge them. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > Sent: Wednesday, 3 June 2009 3:16 a.m. > To: Jonathan Crabtree > Cc: bioperl-l > Subject: Re: [Bioperl-l] Refseq Hits > > Hi Jonathan, Your information is really helpful. Thanks a > lot. 
> > -Shalabh > > > On Tue, Jun 2, 2009 at 11:04 AM, Jonathan Crabtree < > jonathancrabtree at gmail.com> wrote: > > > > > Hi Shalabh- > > > > I believe RefSeq is a non-redundant database, in which sequence entries > > with identical sequences are merged and their descriptions are concatenated > > in the FASTA defline. If you look up the two accession numbers/gi numbers > > from your search results I think you'll see that both are valid matches > > because their polypeptide sequences are identical: > > > > http://www.ncbi.nlm.nih.gov/protein/71082715 > > http://www.ncbi.nlm.nih.gov/protein/91762865 > > > > You're just getting a single match with two descriptions instead of two > > matches with one description, but the sequence is the same and so, therefore > > are the blast alignments. > > > > Jonathan > > > > On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma > > wrote: > > > >> Hi All, > >> This is not really a bioperl query, but i am really confused and > >> need some help. > >> I blasted some sequences against refseq database (locally). After parsing > >> the blast result what i noticed that some description fields contain two > >> hit > >> names like: > >> hit_name -> gi|71082715|ref|YP_265434.1| > >> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique > >> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding > >> protein > >> [Candidatus Pelagibacter ubique HTCC1002] > >> > >> So besides giving me description for hit_name (HTCC 1062) its also giving > >> me > >> HTCC 1002. > >> I will really appreciate if someone can help me out. 
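Russell's suggestion about the Ctrl-A separator can be sketched in plain Perl. This is only a minimal illustration, assuming the merged defline still carries the "\001" characters; the helper name split_defline is made up for this example and is not a BioPerl function. If the separators were lost upstream, the split simply returns a single entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a merged non-redundant
# FASTA defline into one [id, description] pair per original entry.
sub split_defline {
    my ($defline) = @_;
    $defline =~ s/^>//;                       # drop the leading '>' if present
    my @entries;
    for my $part (split /\x01/, $defline) {   # Ctrl-A separates merged entries
        my ($id, $desc) = split /\s+/, $part, 2;
        push @entries, [ $id, defined $desc ? $desc : '' ];
    }
    return @entries;
}

# Example defline built from the two hits quoted in this thread.
my $defline = ">gi|71082715|ref|YP_265434.1| ubiquitin binding protein "
            . "[Candidatus Pelagibacter ubique HTCC1062]\x01"
            . "gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein "
            . "[Candidatus Pelagibacter ubique HTCC1002]";

for my $entry (split_defline($defline)) {
    print join("\t", @$entry), "\n";
}
```

Each returned pair holds the identifier string and its own description, so the two RefSeq entries can be reported separately even though BLAST treats them as one hit.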
> >> > >> Thanks > >> Shalabh > >> _________________________________________________ > >> Shalabh Sharma > >> Scientific Computing Professional Associate > >> Department of Marine Sciences > >> University of Georgia > >> Athens, GA 30602-3636 > >> > >> phone: 706-542-0341 > >> email: ssharmai at uga.edu > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From maj at fortinbras.us Tue Jun 2 17:05:03 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 2 Jun 2009 17:05:03 -0400 Subject: [Bioperl-l] Bio::Search::Tiling Message-ID: All- Bio::Search::Tiling is now in bioperl-live, passes all tests. Thanks, Mark From shalabh.sharma7 at gmail.com Wed Jun 3 13:27:59 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 3 Jun 2009 13:27:59 -0400 Subject: [Bioperl-l] gbf to gff Message-ID: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com> Hi all, I am working on Roseobacters. 
Many times I've converted gbk files from GenBank to GFF format, but now one genome, "Silicibacter lacuscaerulensis", does not have a gbk file; instead it has two gbf files:

https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain

So how can I convert this genome to one GFF file so I can use it in GBrowse?
I would really appreciate it if anyone can help me out.

Thanks

From scott at scottcain.net Wed Jun 3 14:11:54 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 3 Jun 2009 14:11:54 -0400
Subject: [Bioperl-l] gbf to gff
In-Reply-To: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
References: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
Message-ID: <536f21b00906031111l4b02a846o6f281c536b77460d@mail.gmail.com>

Hi Shalabh,

Do you want them combined onto a single reference sequence? I'm guessing this is a circular microbial genome in two segments. Do you know how the coordinates in one GenBank file relate to the other (or are you willing to make something up)?

I imagine the way I would do it would be to convert both files to GFF and then write a quick script to convert the coordinates and reference sequence name (column 1) of one file to be consistent with the other.

Scott

On Wed, Jun 3, 2009 at 1:27 PM, shalabh sharma wrote:
> Hi all,
> I am working on Roseobacters. Many times I've
> converted gbk file from GenBank to gff format but now one genome
> "Silicibacter lacuscaerulensis" does not have a gbk file instead it has two
> gbf files:
>
> https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain
>
> So now how i can convert this genome to one gff file so i can use it in
> gbrowse?
> I would really appreciate if anyone can help me out.
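One way to sketch the coordinate-conversion script Scott describes, in plain Perl. The assumptions are loud: both GenBank files have already been converted to GFF, the second segment is simply placed after the first, and the helper name shift_gff_line, the reference name SL1157 and the offset 3000000 are placeholders for illustration, not values taken from the actual genome.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: move one GFF feature line onto another reference
# sequence by renaming column 1 and adding an offset to start/end (cols 4, 5).
sub shift_gff_line {
    my ($line, $new_seqid, $offset) = @_;
    return $line if $line =~ /^#/;            # leave comments and directives alone
    chomp(my $copy = $line);
    my @col = split /\t/, $copy, -1;
    return $line unless @col >= 9;            # not a feature line, pass through
    $col[0]  = $new_seqid;                    # reference sequence name
    $col[3] += $offset;                       # start
    $col[4] += $offset;                       # end
    return join("\t", @col) . "\n";
}

# Made-up feature line standing in for a record from the second gbf file.
my $line = join("\t", 'segment2', 'GenBank', 'gene', 101, 400,
                '.', '+', '.', 'ID=gene0001') . "\n";
print shift_gff_line($line, 'SL1157', 3_000_000);
```

Concatenating the first GFF file unchanged with the shifted second file would then give a single-reference GFF for GBrowse; whether that layout is biologically meaningful depends on how the two segments really relate.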
> > Thanks > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From alperyilmaz at gmail.com Fri Jun 5 14:50:46 2009 From: alperyilmaz at gmail.com (Alper Yilmaz) Date: Fri, 5 Jun 2009 14:50:46 -0400 Subject: [Bioperl-l] GBroswe2 - feature details Message-ID: Dear all, I have a question about utilizing the tag/value pairs that were used in 9th of GFF. If my 9th column is like this: ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22 How can I use BS_Seq, BS_Color tags, say, in a balloon? If want to print name and sequence of a BindingSite, what do I need to replace question marks below? balloon hover = Motif name: $name, Sequence: ??????? The manual is mentioning that it's possible to use user defined tag/value pairs, but I couldn't figure out how. The manual is mentioning: [feature_type:details] tag1 = formatting rule tag2 = formatting rule tag3 = formatting rule can be used to adjust formatting of a tag, but I don't how this can be used to assign value to a tag? I tried ; [cis-elements:details] bs_seq = $value (I didn't use BS_Seq, since it was mentioned, tags are case-insensitive) OR $bs_seq = $value but, I cannot use $bs_seq in hover link option after doing this. What am I doing wrong? thanks, Alper Yilmaz Post-doctoral Researcher Plant Biotechnology Center The Ohio State University 1060 Carmack Rd Columbus, OH 43210 (614)688-4954 www.grassius.org From cjfields at illinois.edu Fri Jun 5 16:43:04 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Jun 2009 15:43:04 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] Bug in genbank.pm? 
In-Reply-To: <52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu> References: <002b01c9e567$e09b0de0$a1d129a0$@edu> <52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu> Message-ID: (Just so this is going to the correct list) Marcos, I'll look into it. This may have been fixed in between the releases, though. There isn't a PPM available for 1.6 yet (several prereqs were missing at the time of the 1.6 release, such as Graphviz and so on). A bug report is in the queue for this, though, as a reminder. I think those are now available, though, so we should *theoretically* be capable of getting a PPM ready. I say 'theoretically' b/c I don't have easy access to a PC running Windows (I have moved to OS X). I'll see what I can do about that in the next few weeks. In the meantime, if you need it you can download 1.6 or the 'nightly build' version (nightly snapshots of svn code) and add it to PERL5LIB or "use lib 'PATH_TO_BIOPERL';" in your scripts; it should work. Nightly builds: http://bioperl.org/DIST/nightly_builds/ chris On Jun 4, 2009, at 10:17 PM, Barbeitos, Marcos wrote: > OK, I attached the first record for both files. These are GenBank > flat files that were emailed to us and transferred from Macs to PCs, > so I am not sure if the encoding/line terminations got messed up at > some point. I converted the line terminations to Unix and the > encoding to Western European Windows, still, it didn't work. May be > worth it mention that BioEdit did understand the format after I > fixed the encoding. > > The data was erased because my boss is kind of finicky about sharing > information. However, I tested the files attached to this email and > got the same results. > > I am still using Bio-Perl 1.5.2_100 in a PC, PPM has not flagged the > availability of an upgrade from CPAN, are you releasing the PPD as > well? > > Thanks! 
> > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thu 6/4/2009 8:05 PM > To: Barbeitos, Marcos > Cc: bioperl-guts-l at lists.open-bio.org > Subject: Re: [Bioperl-guts-l] Bug in genbank.pm? > > Marcos, > > We need the GenBank file (or the accession) you are attempting to > parse. Also, what version are you using? We have released v. 1.6 on > CPAN, and I intend on releasing 1.6.1 soon. > > chris > > On Jun 4, 2009, at 5:57 PM, Marcos S. Barbeitos wrote: > >> Hello. I am trying to parse the Info from GeneBank flat files using >> Bio::SeqIO. I got two file which are virtually identical and one of >> them >> gets parsed just fine. However, in the case of the other, the >> program >> croaks when trying to parse the features and gives me: >> >> >> >> -------------------- WARNING --------------------- >> >> MSG: Unexpected error in feature table for Skipping feature, >> attempting to >> recover >> >> --------------------------------------------------- >> >> >> >> I noticed that it does that after it reads the entry '/organism' in >> Features. The only difference I can see between the two files is the >> presence of the feature ' /organelle' and of the line BASE COUNT in >> one of >> them, but the error persists even after I remove these lines. Apart >> from >> that, there are the number of white spaces that precede the >> beginning of >> each line. Any ideas? >> >> >> >> Thanks! >> >> >> >> Marcos S. 
Barbeitos
>>
>> Post-Doc Fellow
>>
>> The University of Kansas
>> Department of Ecology and Evolutionary Biology
>> 2041 Haworth Hall
>> 1200 Sunnyside Avenue
>> Lawrence, Kansas 66045
>> p: 785.864.5887
>> f: 785.864.5860
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
>

From Russell.Smithies at agresearch.co.nz Sun Jun 7 16:32:27 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 8 Jun 2009 08:32:27 +1200
Subject: [Bioperl-l] GBroswe2 - feature details
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493F1CA41@exchsth.agresearch.co.nz>

For the first part of your question, you can use a sub to access values in your annotations:

balloon hover = sub {
        my $f = shift;
        my %a = $f->attributes;
        my $name = $f->name;
        my $seq = $a{'BS_Seq'};
        return "Motif name: $name, Sequence: $seq" if defined $seq;
        return "Motif name: $name, No sequence defined";
    }

For the second bit, here's the formatting rules I'm using to create hyperlinks:

[Dbxref:DETAILS]
URL = sub {
        my ($tag,$value)=@_;
        if ($value =~ /NCBI_gi:(.+)/){
            return "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=$1";
        }
        if ($value =~ /NCBI_Gene:(.+)/){
            return "http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=$1";
        }
        return;
    }

And this is what the gff looks like:

BTA10 refseq mRNA 10011147 10176454 0 - . ID=NM_001076052;Name=NM_001076052;Index=1;Alias=HOMER1;Note=homer homolog 1 (Drosophila);Dbxref=NCBI_gi:115496957;Dbxref=NCBI_Gene:535311;
BTA10 refseq mRNA 10241506 10301142 0 + . ID=NM_001046361;Name=NM_001046361;Index=1;Alias=PAPD4,MGC138008;Note=PAP associated domain containing 4;Dbxref=NCBI_gi:114052221;Dbxref=NCBI_Gene:533862;

Hopefully, this will get you going :-)

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Alper Yilmaz
> Sent: Saturday, 6 June 2009 6:51 a.m.
> To: BioPerl List
> Subject: [Bioperl-l] GBroswe2 - feature details
>
> Dear all,
>
> I have a question about utilizing the tag/value pairs that were used
> in 9th of GFF. If my 9th column is like this:
>
> ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22
>
> How can I use BS_Seq, BS_Color tags, say, in a balloon? If want to
> print name and sequence of a BindingSite, what do I need to replace
> question marks below?
>
> balloon hover = Motif name: $name,
> Sequence: ???????
>
>
> The manual is mentioning that it's possible to use user defined
> tag/value pairs, but I couldn't figure out how. The manual is
> mentioning:
> [feature_type:details]
> tag1 = formatting rule
> tag2 = formatting rule
> tag3 = formatting rule
>
> can be used to adjust formatting of a tag, but I don't how this can be
> used to assign value to a tag? I tried ;
> [cis-elements:details]
> bs_seq = $value (I didn't use BS_Seq, since it was
> mentioned, tags are case-insensitive)
> OR
> $bs_seq = $value
>
> but, I cannot use $bs_seq in hover link option after doing this. What
> am I doing wrong?
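To make the tag/value idea concrete outside GBrowse, here is a small plain-Perl sketch of how a GFF3 column 9 string breaks down into tags and values. parse_attributes is a hypothetical helper written for illustration only; inside a GBrowse callback the feature object already exposes the attributes, as in Russell's reply. Values are stored as lists because a tag such as Dbxref can appear more than once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a GFF3 column 9 string into a hash that maps
# each tag to an array reference of its values.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        next unless $pair =~ /\S/;            # skip the empty piece after a trailing ';'
        my ($tag, $value) = split /=/, $pair, 2;
        push @{ $attr{$tag} }, defined $value ? $value : '';
    }
    return %attr;
}

# Column 9 taken from Alper's question.
my %attr = parse_attributes('ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22');
print "Motif name: $attr{Name}[0], Sequence: $attr{BS_Seq}[0]\n";
```

Note that this sketch keeps tag names case-sensitive and does no URL-unescaping, both of which a real GFF3 parser would need to consider.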
> > thanks, > > Alper Yilmaz > Post-doctoral Researcher > Plant Biotechnology Center > The Ohio State University > 1060 Carmack Rd > Columbus, OH 43210 > (614)688-4954 > www.grassius.org > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From bernd.jagla at pasteur.fr Mon Jun 8 12:24:12 2009 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 8 Jun 2009 18:24:12 +0200 Subject: [Bioperl-l] Bio:Das 1.11 installation problem Message-ID: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> Hi, I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN -e 'install Bio::Das' This is perl, v5.8.9 built for darwin-2level (please let me know if you need anything else) I am trying to install Bio::Das 1.11 I get the following error: not ok 3 not ok 4 Can't call method "description" on an undefined value at t/01das.t line 62. 
When going into the sources for 01das.t and printing out $db I get: $VAR1 = \bless( { 'autotypes' => undef, 'default_dsn' => undef, 'autocategories' => undef, 'sockets' => {}, 'aggregators' => [ bless( { 'sub_parts' => [ 'coding_exon' ], 'require_whole_object' => undef, 'main_method' => 'CDS', 'method' => 'alignment' }, 'Bio::DB::GFF::Aggregator' ), bless( { 'sub_parts' => [ 'EST_match' ], 'require_whole_object' => undef, 'main_method' => 'alignment', 'method' => 'alignment' }, 'Bio::DB::GFF::Aggregator' ) ], 'timeout' => undef, 'oldstyle_api' => 1, 'default_server' => 'http://www.wormbase.org/db/seq/das' }, 'Bio::Das' ); @sources is empty And test(3, at sources) fails. Please advise. Thanks, Bernd From lincoln.stein at gmail.com Mon Jun 8 13:00:48 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 8 Jun 2009 13:00:48 -0400 Subject: [Bioperl-l] Bio:Das 1.11 installation problem In-Reply-To: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> Message-ID: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com> Hi, The regression tests require an active Internet connection, as well as the DAS test server being up and running. It may be there was a temporary failure of one of those two. I just tested on my end and the regression tests ran ok, so could you try it again? Lincoln On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla wrote: > Hi, > > > > I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN -e > 'install Bio::Das' > This is perl, v5.8.9 built for darwin-2level > (please let me know if you need anything else) > > > > I am trying to install Bio::Das 1.11 > > > > I get the following error: > > > > not ok 3 > > not ok 4 > > Can't call method "description" on an undefined value at t/01das.t line 62. 
--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From lsbrath at gmail.com Mon Jun 8 16:28:46 2009
From: lsbrath at gmail.com (lsbrath at gmail.com)
Date: Mon, 08 Jun 2009 20:28:46 +0000
Subject: [Bioperl-l] fasta conversion
Message-ID: <000e0cd6aa4cd53993046bdc1675@google.com>

Hello!

I am running into trouble while trying to convert a text file to fasta. It should be simple enough but I am getting a weird error message.

This is my script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
use Bio::SeqIO;


my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa";
my $maid = '13063';

opendir my $dh, "$maid_dir"; # directory to search
my @files = readdir $dh;
#find the _fasta file
for my $f (@files){
    my $fa = $maid_dir."/".$maid."_hu_1kb.fa";
    my $r = $maid_dir."/".$maid."_hu_1kb.txt";
    open (my $in,$r);
    if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta

        print Dumper($f);
        my $hu_1kb = $maid.'_hu_1kb'; #file to convert
        my $in = Bio::SeqIO->new(-file => $r,
                                 -format => 'raw');
        my $out = Bio::SeqIO->new(-file => ">$fa",
                                  -format => 'Fasta');
        while ( my $seq = $in->next_seq()) {
            $out->write_seq($seq);
        }
    }
}

I keep getting the following error message:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 13063
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [13063HU] which does not look healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258
STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210
STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
STACK: Bio::Seq::SeqFactory::create C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119
-----------------------------------------------------------

Anyone out there that can help me solve this?

From kjaja27 at yahoo.com Fri Jun 5 19:42:13 2009
From: kjaja27 at yahoo.com (kayj)
Date: Fri, 5 Jun 2009 16:42:13 -0700 (PDT)
Subject: [Bioperl-l] finding SNPs in a given region
Message-ID: <23897107.post@talk.nabble.com>

Hi All,

Is there a way to find the SNPs in a given region? I have the start and end base-pair positions, and I am looking to download the SNPs in different regions. Is that possible?
This is my first time using bioperl and any help will be greatly appreciated Thanks -- View this message in context: http://www.nabble.com/finding-SNPs-in-a-given-region-tp23897107p23897107.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From kjaja27 at yahoo.com Mon Jun 8 09:49:24 2009 From: kjaja27 at yahoo.com (kayj) Date: Mon, 8 Jun 2009 06:49:24 -0700 (PDT) Subject: [Bioperl-l] How to extract SNPs Message-ID: <23924432.post@talk.nabble.com> Hi All, I have several regions on the genome each is defined with the start and the end base pair position. I am looking into using HapMap http://hapmart.hapmap.org/BioMart/martview to extract the SNPs in these region given a population. I am new to bioperl and any help will be greatly appreciated. -- View this message in context: http://www.nabble.com/How-to-extract-SNPs-tp23924432p23924432.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bernd at pasteur.fr Mon Jun 8 16:31:57 2009 From: bernd at pasteur.fr (bernd at pasteur.fr) Date: Mon, 8 Jun 2009 22:31:57 +0200 (CEST) Subject: [Bioperl-l] Bio:Das 1.11 installation problem In-Reply-To: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com> References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com> Message-ID: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr> I tested the connection with wget and everything works fine. I suspect that our proxy might be the problem but all variables are set correctly (ftp_proxy, http_proxy and many more) I am not sure which environment variable are being used... I am not too familiar with all this and don't know where to look for the right configurations. Thanks, Bernd > Hi, > > The regression tests require an active Internet connection, as well as the > DAS test server being up and running. It may be there was a temporary > failure of one of those two. 
> I just tested on my end and the regression
> tests ran ok, so could you try it again?
>
> Lincoln

From maj at fortinbras.us Mon Jun 8 17:12:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 8 Jun 2009 17:12:03 -0400
Subject: [Bioperl-l] fasta conversion
In-Reply-To: <000e0cd6aa4cd53993046bdc1675@google.com>
References: <000e0cd6aa4cd53993046bdc1675@google.com>
Message-ID: <4737A1AB29FA47AF8FF4913448F5FAA3@NewLife>

You're getting the sequence descriptor rather than the sequence in the return from $in->next_seq. Read up on what the 'raw' format actually entails in the Bio::SeqIO pod.

cheers
MAJ

----- Original Message -----
From:
To:
Sent: Monday, June 08, 2009 4:28 PM
Subject: [Bioperl-l] fasta conversion

> Hello!
>
> I am running into trouble while trying to convert a text file to fasta. It
> should be simple enough but I am getting a weird error message.
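Mark's point, as far as it can be read from the error, is that the 'raw' reader treats each line of the input as sequence, so a file whose first line is a label (here apparently something like 13063HU) fails validation. A minimal plain-Perl sketch of one workaround follows. It assumes, without having seen the actual file, that the first non-blank line is a descriptor and the remaining lines are residues; the function name text_to_fasta is made up for this example and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: treat the first non-blank line as the FASTA ID and
# everything after it as sequence, wrapped at $width columns (default 60).
sub text_to_fasta {
    my ($text, $width) = @_;
    $width ||= 60;
    my @lines = grep { /\S/ } split /\r?\n/, $text;   # skip blanks, tolerate CRLF
    my $id = shift @lines;
    die "no descriptor line found\n" unless defined $id;
    $id =~ s/^\s+|\s+$//g;                            # trim the descriptor
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;                                 # remove stray whitespace
    die "sequence has unexpected characters\n" if $seq =~ /[^A-Za-z*-]/;
    my $fasta = ">$id\n";
    $fasta .= "$1\n" while $seq =~ /(.{1,$width})/g;  # wrap the residues
    return $fasta;
}

# Made-up input in the assumed layout: label first, residues after.
print text_to_fasta("13063HU\nACGTACGTAC\nGGTTAACC\n", 10);
```

If the label and sequence are separated this way before any sequence object is built, the same values could instead be handed to Bio::Seq and written with Bio::SeqIO in 'fasta' format; the plain-Perl version is shown only because it has no dependencies.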
From stefan.kirov at bms.com Mon Jun 8 17:26:17 2009
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Mon, 08 Jun 2009 17:26:17 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com> <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
Message-ID: <4A2D81F9.8060509@bms.com>

bernd at pasteur.fr wrote:

Try to add this line

-proxy => 'http:',

in t/01das.t where the Bio::Das object is created (I think line 41).
Hope this works for you, it did for me.
Stefan

> I tested the connection with wget and everything works fine.
> I suspect that our proxy might be the problem but all variables are set
> correctly (ftp_proxy, http_proxy and many more) I am not sure which
> environment variable are being used...
> I am not too familiar with all this and don't know where to look for the
> right configurations.
>
> Thanks,
>
> Bernd

From bernd.jagla at pasteur.fr Tue Jun 9 03:05:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 9 Jun 2009 09:05:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <4A2D81F9.8060509@bms.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr> <4A2D81F9.8060509@bms.com>
Message-ID: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>

Great, that works!!!

But since I am using Bio::Das within GBrowse I can't/don't want to change those sources.
I tried setting some environment variables but that doesn't seem to work either...
So far I have set the following:

FTP_PROXY=http://...
HTTP_PROXY=http://...
PROXYFTP=http://...
PROXYHTTP=http://...
ftp_proxy=http://...
http_proxy=http://...
PROXY=http://...

Any suggestions are welcome.

Thanks,

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Stefan Kirov
Sent: Monday, June 08, 2009 11:26 PM
To: bernd at pasteur.fr
Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

bernd at pasteur.fr wrote:

Try to add this line

-proxy => 'http:',

in t/01das.t where the Bio::Das object is created (I think line 41).
Hope this works for you, it did for me.
Stefan
From awitney at sgul.ac.uk Tue Jun 9 07:20:35 2009 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 9 Jun 2009 12:20:35 +0100 Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm Message-ID: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk> Hi, I have been experimenting with the Bio::DB::EUtilities module, with help from the Cookbook. But I can't seem to figure out how to get the DNA sequence of a gene; all the examples seem to be fetching protein sequence. How would i go about fetching a sequence using an Entrez GeneID? thanks for any help adam
From Kevin.M.Brown at asu.edu Tue Jun 9 11:25:45 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 9 Jun 2009 08:25:45 -0700 Subject: [Bioperl-l] Bio:Das 1.11 installation problem In-Reply-To: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina> References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com> <19FC487A25B6478FA4DE91B81A1FC52C@zillumina> Message-ID: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu> Dumb question, but are you exporting the variables after you set them? FTP_PROXY=http://... HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY
From cjfields at illinois.edu Tue Jun 9 12:08:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Jun 2009 11:08:46 -0500 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans Message-ID: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> All, I've noticed a few methods in bioperl with names like 'no_Foo' that mean 'number of Foo' (such as SimpleAlign's no_sequences). The problem I foresee are possible ambiguities, particularly with negative boolean checks (eg 'no_Foo' could also mean 'this instance contains no Foo'), something that BioPerl also has with various settings. I suggest we alias these as num_* to disambiguate that.
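A minimal sketch of the aliasing idea, assuming a made-up class My::Align standing in for something like SimpleAlign (the class, its constructor and its internals are hypothetical and are not BioPerl code). The new num_sequences carries the implementation and the old no_sequences simply forwards to it, so existing callers keep working:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an alignment class; NOT BioPerl's SimpleAlign.
package My::Align;

sub new {
    my ($class, @seqs) = @_;
    return bless { seqs => [@seqs] }, $class;
}

# Preferred, unambiguous name: "number of sequences".
sub num_sequences {
    my $self = shift;
    return scalar @{ $self->{seqs} };
}

# Old name kept as a thin alias so existing code does not break.
sub no_sequences {
    my $self = shift;
    return $self->num_sequences(@_);
}

package main;

my $aln = My::Align->new(qw(ACGT ACGA ACGG));
print $aln->num_sequences, "\n";   # prints 3
print $aln->no_sequences,  "\n";   # prints 3, via the alias
```

Forwarding through a method call (rather than a typeglob alias) means a subclass that overrides num_sequences is also picked up by the old name; that is a design choice of this sketch, not something decided in the thread.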
There's no easy way to change already in-place flag setting w/o going through a deprecation cycle, but we can promote using positive booleans where possible (eg 'is_foo' or 'has_foo' instead of 'no_foo'). We can leave the older 'no_*' methods as is for the time being and maybe deprecate them later. If no one has objections I'll add these in as needed. chris
From SMarkel at accelrys.com Tue Jun 9 12:26:08 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Tue, 9 Jun 2009 12:26:08 -0400 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net> Chris, I just checked our code for the Sequence Analysis Collection in Pipeline Pilot. We've got a few places we'd need to make code changes, but we like your suggestion. So, no objections from us. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics
From cjfields at illinois.edu Tue Jun 9 13:03:16 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Jun 2009 12:03:16 -0500 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net> References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net> Message-ID: I don't think it would require code changes right away; for the time being no_* will just alias num_*. We can probably have deprecation warnings activate when we reach a particular version. chris
From maj at fortinbras.us Tue Jun 9 12:32:51 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 9 Jun 2009 12:32:51 -0400 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> Message-ID: <4BA7FB5466B34B59B7C455E1173C1FA7@NewLife> +1, absolutely- MAJ
From hlapp at gmx.net Tue Jun 9 13:18:05 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 9 Jun 2009 13:18:05 -0400 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> Message-ID: Great suggestions, I'm all for it. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
From florent.angly at gmail.com Tue Jun 9 14:41:51 2009 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 09 Jun 2009 11:41:51 -0700 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans In-Reply-To: References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> Message-ID: <4A2EACEF.3090809@gmail.com> Agree! no_* is prone to misunderstandings. Also, some BioPerl code uses nof_*, which I quite like. Florent
From cjfields at illinois.edu Tue Jun 9 14:55:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Jun 2009 13:55:48 -0500 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans In-Reply-To: <4A2EACEF.3090809@gmail.com> References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> Message-ID: We could probably alias nof_* with num_* just for consistency, but leave nof_* as is and not deprecate it (I don't think anyone would confuse nof* with no*). chris
From mauricio at open-bio.org Tue Jun 9 15:33:18 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Tue, 09 Jun 2009 14:33:18 -0500 Subject: [Bioperl-l] Project Help In-Reply-To: <146497.36250.qm@web8407.mail.in.yahoo.com> References: <146497.36250.qm@web8407.mail.in.yahoo.com> Message-ID: <4A2EB8FE.4080402@open-bio.org> Hi Chirag, The OBF applied for the GSoC 2009 but unfortunately we were not accepted. However, other organizations/projects made their way into it and have been kind enough to adopt some of the ideas originally proposed under the OBF's initiative. I'm Cc'ing this to the BioPerl mailing list so the people involved with those projects can give you more details. Regards, Mauricio. chirag matkar wrote: > Hello, > THis is Chirag Matkar wanting to know whether there were any GSOC 2009 projects underway in open Bioinformatics Foundation. > Also as i am myself a perl developer can i can some stipend or internship for building perl modules?. > > Thanking You, > Regards Chirag.
From rmb32 at cornell.edu Tue Jun 9 15:12:54 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 09 Jun 2009 12:12:54 -0700 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans In-Reply-To: References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> Message-ID: <4A2EB436.8020506@cornell.edu> Why not just add deprecation warnings now? Or you could add deprecation warnings now that only print if $Bio::Root::Version::VERSION >= something.
Best to do it while one is thinking about it, I always say. Cause I always forget to do it later. ;-) Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu
From cjfields at illinois.edu Tue Jun 9 16:19:03 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Jun 2009 15:19:03 -0500 Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans In-Reply-To: <4A2EB436.8020506@cornell.edu> References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> <4A2EB436.8020506@cornell.edu> Message-ID: Actually, that's one thing I want to implement within Root, namely the ability to do this: $self->deprecated(-message => 'method Foo is deprecated', -start_ver => $version1, -throw_ver => $version2 ); So it's essentially a noop and invisible up to start_ver (upon where it warns), then throws after, well, throw_ver. I could probably finagle that in w/o destroying things...
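A rough sketch of how such a version-gated deprecated() could behave, written against a made-up My::Root class rather than Bio::Root. Everything here is an assumption for illustration: the class name, the numeric $VERSION, the extra -version argument (added only so the demo can simulate different releases), and the boundary choice that reaching -start_ver already warns and reaching -throw_ver already throws. It is not the patch discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class; NOT Bio::Root::Root.
package My::Root;

our $VERSION = 1.006;    # assumed numeric version, for the demo only

sub new { my $class = shift; return bless {}, $class }

# Silent below -start_ver, warns from -start_ver, dies from -throw_ver.
sub deprecated {
    my ($self, %args) = @_;
    my $msg   = defined $args{-message} ? $args{-message} : 'deprecated';
    my $start = $args{-start_ver};
    my $throw = $args{-throw_ver};
    # -version is a hypothetical override so the gating can be exercised.
    my $ver   = defined $args{-version} ? $args{-version} : $VERSION;

    die  "$msg\n" if defined $throw && $ver >= $throw;
    warn "$msg\n" if defined $start && $ver >= $start;
    return;
}

package main;

my $obj = My::Root->new;
for my $ver (1.006, 1.007, 1.008) {
    my $warned = 0;
    local $SIG{__WARN__} = sub { $warned = 1 };    # capture instead of printing
    my $ok = eval {
        $obj->deprecated(
            -message   => 'method Foo is deprecated',
            -start_ver => 1.007,
            -throw_ver => 1.008,
            -version   => $ver,
        );
        1;
    };
    my $state = !$ok ? 'throws' : $warned ? 'warns' : 'silent';
    print "$ver: $state\n";
}
# prints:
# 1.006: silent
# 1.007: warns
# 1.008: throws
```

Because the call is a no-op below -start_ver, it could sit in a method body on trunk without producing output on an older release branch, which is the behaviour described above.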
chris
From cjfields at illinois.edu Tue Jun 9 16:45:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Jun 2009 15:45:37 -0500 Subject: [Bioperl-l] deprecated(), was Re: use of no_* to mean 'number_of', negative booleans In-Reply-To: References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> <4A2EB436.8020506@cornell.edu> Message-ID: Just to note, this is mainly to allow us devs the opportunity to add these to main trunk w/o having to worry about merges over to the 1.6 branch (where the version is different). We don't want the dep warnings showing up there right away, but maybe in a point release or minor version. chris
From hlapp at gmx.net Tue Jun 9 19:09:26 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 9 Jun 2009 19:09:26 -0400 Subject: [Bioperl-l] Project Help In-Reply-To: <4A2EB8FE.4080402@open-bio.org> References: <146497.36250.qm@web8407.mail.in.yahoo.com> <4A2EB8FE.4080402@open-bio.org> Message-ID: <74C0D011-A5A4-4DF1-93D8-13401A18E29A@gmx.net> Hi Chirag, check out the Bio{Perl,Python,Ruby}-related projects (go to 'Accepted Projects') at http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
From rmb32 at cornell.edu Tue Jun 9 21:13:36 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 09 Jun 2009 18:13:36 -0700 Subject: [Bioperl-l] deprecated(), was Re: use of no_* to mean 'number_of', negative booleans In-Reply-To: References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> <4A2EB436.8020506@cornell.edu> Message-ID: <4A2F08C0.3010609@cornell.edu> Chris Fields wrote: >> Actually, that's one thing I want to implement within Root, namely the >> ability to do this: >> >> $self->deprecated(-message => 'method Foo is deprecated', >> -start_ver => $version1, >> -throw_ver => $version2 >> ); Here's a patch with tests against the svn trunk head. Is this what you had in mind? -- Rob -------------- next part -------------- A non-text attachment was scrubbed...
Name: deprecated.patch Type: text/x-diff Size: 5601 bytes Desc: not available URL: From cjfields at illinois.edu Tue Jun 9 22:54:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Jun 2009 21:54:47 -0500 Subject: [Bioperl-l] deprecated(), was Re: use of no_* to mean 'number_of', negative booleans In-Reply-To: <4A2F08C0.3010609@cornell.edu> References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> <4A2EB436.8020506@cornell.edu> <4A2F08C0.3010609@cornell.edu> Message-ID: <20652B6B-1BF3-477C-9619-4149748E5B9B@illinois.edu> On Jun 9, 2009, at 8:13 PM, Robert Buels wrote: > Chris Fields wrote: >>> Actually, that's one thing I want to implement within Root, namely >>> the ability to do this: >>> >>> $self->deprecated(-message => 'method Foo is deprecated', >>> -start_ver => $version1, >>> -throw_ver => $version2 >>> ); > > Here's a patch with tests against the svn trunk head. Is this what > you had in mind? > > -- > Rob Funny, I had written up almost exactly the same code, just a little rearranged. I've modified mine to follow your use of -warn_version (I also had -throw_version as a synonym of -version, JIC). Also, for the tests I created a temp class in the tests and ran tests off that. Thanks for the patch! chris From maj at fortinbras.us Wed Jun 10 00:10:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 10 Jun 2009 00:10:12 -0400 Subject: [Bioperl-l] announcing bioperl-max, a public AMI Message-ID: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife> Hi All, I've built a public Amazon machine image, loaded with many many goodies, including the most recent (r15747) trunks of - bioperl-live - bioperl-run - bioperl-db/biosql The base machine is a public clean install of Ubuntu 8.03 Hardy/32-bit by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml, emboss, and more are all there (and most even pass bioperl-run tests), and perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo (r1071) and others. 
This is *not* a lean mean fighting machine.

Please give it a try if you're so inclined. Fuller details (including image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.

Ping me if it doesn't work.

Cheers,
Mark

From cjfields at illinois.edu Wed Jun 10 00:36:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 23:36:40 -0500
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>

I'll be trying that out, particularly re: bioperl-run. For bioperl-db do you have mysql or pg?

Heh, I see Moose is installed. Just need svn'd parrot and git updated rakudo and we could do some damage...

chris

From maj at fortinbras.us Wed Jun 10 00:39:36 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:39:36 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife> <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <6A7D85B8037848F090C35A639C84D870@NewLife>

----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc: "BioPerl List"
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI

> I'll be trying that out, particularly re: bioperl-run. For bioperl-db
> do you have mysql or pg?

-both (I'm all about options...)

> Heh, I see Moose is installed. Just need svn'd parrot and git updated
> rakudo and we could do some damage...

bioperl-max-0.1.1, here we come...

> chris

cheers
MAJ

From bernd.jagla at pasteur.fr Wed Jun 10 03:43:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 09:43:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina> <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <7F2215CBC16B48BE8C548BB69E131890@zillumina>

I wrote a small test program to test the environment variables and I have them:

'SSH_CLIENT' => '157.
'FTP_PROXY' => 'http://
'HTTP_PROXY' => 'http://cache.past
'SSH_TTY' => '/dev/ttys002',
'ftp_proxy' => 'http://
'http_proxy' => 'http://

Using the "-proxy" works, without it doesn't. (and yes, I export the variables..)

Thanks for any suggestions.

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
Sent: Tuesday, June 09, 2009 5:26 PM
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
>
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't > want to change > those sources. I tried setting some environment variable but > that doesn't > seem to work either... > So far I have the set the following: > FTP_PROXY=http://... > HTTP_PROXY=http://... > PROXYFTP=http://... > PROXYHTTP=http://... > ftp_proxy=http://... > http_proxy=http://... > PROXY=http://... > > Any suggestions are welcome. > > Thanks, > > Bernd > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Stefan Kirov > Sent: Monday, June 08, 2009 11:26 PM > To: bernd at pasteur.fr > Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem > > bernd at pasteur.fr wrote: > Try to add this line > -proxy => 'http:', > in t/01das.t where the Bio::Das object is created (I think line 41). > Hope this works for you, it did for me. > Stefan > > I tested the connection with wget and everything works fine. > > I suspect that our proxy might be the problem but all > variables are set > > correctly (ftp_proxy, http_proxy and many more) I am not sure which > > environment variable are being used... > > I am not too familiar with all this and don't know where to > look for the > > right configurations. > > > > Thanks, > > > > Bernd > > > > > >> Hi, > >> > >> The regression tests require an active Internet > connection, as well as > the > >> DAS test server being up and running. It may be there was > a temporary > >> failure of one of those two. I just tested on my end and > the regression > >> tests ran ok, so could you try it again? 
> >> > >> Lincoln > >> > >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla > > >> wrote: > >> > >> > >>> Hi, > >>> > >>> > >>> > >>> I am working on a MAC 10.5.7; try to install Bio::Das > using perl -MCPAN > >>> -e > >>> 'install Bio::Das' > >>> This is perl, v5.8.9 built for darwin-2level > >>> (please let me know if you need anything else) > >>> > >>> > >>> > >>> I am trying to install Bio::Das 1.11 > >>> > >>> > >>> > >>> I get the following error: > >>> > >>> > >>> > >>> not ok 3 > >>> > >>> not ok 4 > >>> > >>> Can't call method "description" on an undefined value at > t/01das.t line > >>> 62. > >>> > >>> > >>> > >>> When going into the sources for 01das.t and printing out > $db I get: > >>> > >>> > >>> > >>> $VAR1 = \bless( { > >>> > >>> 'autotypes' => undef, > >>> > >>> 'default_dsn' => undef, > >>> > >>> 'autocategories' => undef, > >>> > >>> 'sockets' => {}, > >>> > >>> 'aggregators' => [ > >>> > >>> bless( { > >>> > >>> 'sub_parts' => [ > >>> > >>> > >>> 'coding_exon' > >>> > >>> ], > >>> > >>> > 'require_whole_object' => > >>> undef, > >>> > >>> > 'main_method' => 'CDS', > >>> > >>> 'method' => > 'alignment' > >>> > >>> }, > >>> 'Bio::DB::GFF::Aggregator' > >>> ), > >>> > >>> bless( { > >>> > >>> 'sub_parts' => [ > >>> > >>> > 'EST_match' > >>> > >>> ], > >>> > >>> > 'require_whole_object' => > >>> undef, > >>> > >>> 'main_method' => > >>> 'alignment', > >>> > >>> 'method' => > 'alignment' > >>> > >>> }, > >>> 'Bio::DB::GFF::Aggregator' ) > >>> > >>> ], > >>> > >>> 'timeout' => undef, > >>> > >>> 'oldstyle_api' => 1, > >>> > >>> 'default_server' => > >>> 'http://www.wormbase.org/db/seq/das' > >>> > >>> }, 'Bio::Das' ); > >>> > >>> > >>> > >>> > >>> > >>> @sources is empty > >>> > >>> And test(3, at sources) fails. > >>> > >>> > >>> > >>> Please advise. 
> >>> > >>> > >>> > >>> Thanks, > >>> > >>> > >>> > >>> Bernd > >>> > >>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> -- > >> Lincoln D. Stein > >> Director, Informatics and Biocomputing Platform > >> Ontario Institute for Cancer Research > >> 101 College St., Suite 800 > >> Toronto, ON, Canada M5G0A3 > >> 416 673-8514 > >> Assistant: Renata Musa > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.jagla at pasteur.fr Wed Jun 10 04:16:08 2009 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Wed, 10 Jun 2009 10:16:08 +0200 Subject: [Bioperl-l] Bio:Das 1.11 installation problem In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu> References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina> <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu> Message-ID: To whom it 
may concern:

I added

    $self->proxy($ENV{'HTTP_PROXY'}) if $ENV{'HTTP_PROXY'};

around line 72, before:

    $self->proxy($proxy) if $proxy;

in Das.pm. This did the trick.

For completeness I also edited Fetch.pm, around line 134:

    $proxy = $ENV{'HTTP_PROXY'} if $ENV{'HTTP_PROXY'};

before:

    my $dest = $proxy || $request->url;

Best,
Bernd

From ron at ron.dk Wed Jun 10 03:35:09 2009
From: ron at ron.dk (Rasmus Ory Nielsen)
Date: Wed, 10 Jun 2009 09:35:09 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebase file.
Message-ID: <4A2F622D.5060500@ron.dk>

Hi,

This is my first time using bioperl for restriction analysis, so please bear with me, if this is a FAQ.

I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the script shown at the bottom of the mail. My bioperl version is bioperl-live nightly from 09-Jun-2009.

The scripts throws an exception - see below. But, if I comment out the '-enzymes' argument, so it uses the built-in collection of enzymes, it works.

My problem is, that I need to use some of the enzymes that are only available in rebase. So how do I get this working?

Thanks for your attention.
Best regards,
Rasmus Ory Nielsen

############################################################
Output from the script:
############################################################
[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------

------------- EXCEPTION -------------
MSG: Bad end parameter (11). End must be less than the total length of sequence (total=7)
STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
STACK toplevel ./restriction_test.pl:30
-------------------------------------
[roni at ksdhcp ~]$

############################################################
Output from the script with the '-enzymes' argument commented out
############################################################
[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------
$VAR1 = [
          {
            'seq' => 'CTCGACCGTTAGCAA',
            'end' => 15,
            'start' => '1'
          },
          {
            'seq' => 'AGCTTTCTACCGTTATCGT',
            'end' => 34,
            'start' => '16'
          }
        ];
[roni at ksdhcp ~]$

############################################################
#!/usr/bin/perl

use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Restriction::IO;
use Bio::Restriction::Analysis;
use Data::Dumper;

# create seq obj
my $seqobj = new Bio::PrimarySeq(
    -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
    -primary_id => 'test',
    -molecule => 'dna'
);

# read rebase file
my $rebase_io = Bio::Restriction::IO->new(
    -file => 'withrefm.906',
    -format => 'withrefm',
);
my $rebase_collection = $rebase_io->read;

# start restriction analysis
my $restriction_analysis = Bio::Restriction::Analysis->new(
    -seq => $seqobj,
    -enzymes => $rebase_collection, # it works with this line commented out
);

# retrieve fragment maps
my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
print Dumper \@fragment_maps;

From awitney at sgul.ac.uk Wed Jun 10 07:19:55 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 12:19:55 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
Message-ID: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>

Hi,

I am going through the EUtilities Cookbook, but the last example (in section 2.3.1) fails with:

Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.

This is with BioPerl 1.6.0, perl v5.8.8

thanks for any help

adam

From hlapp at gmx.net Wed Jun 10 08:08:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 10 Jun 2009 08:08:54 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <4B3BCEA2-DA96-46B5-9BA2-F4EDDACC3A96@gmx.net>

Very cool!

-hilmar

On Jun 10, 2009, at 12:10 AM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.03 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run
> tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others.
This is *not* a lean mean fighting machine. > > Please give it a try if you're so inclined. Fuller details (including > image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max > . > > Ping me if it doesn't work. > > Cheers, > Mark > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Wed Jun 10 08:28:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 10 Jun 2009 07:28:44 -0500 Subject: [Bioperl-l] EUtilities Cookbook example fails In-Reply-To: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk> References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk> Message-ID: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu> I can reproduce that; I'll look into it. chris On Jun 10, 2009, at 6:19 AM, Adam Witney wrote: > Hi, > > I am going through the EUtilities Cookbook, but the last example (in > section 2.3.1) fails with: > > Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/ > site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470. 
>
> This is with BioPerl 1.6.0, perl v5.8.8
>
> thanks for any help
>
> adam

From cjfields at illinois.edu Wed Jun 10 09:20:43 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:20:43 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
Message-ID: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>

EntrezGene doesn't contain the sequence information; I believe it just links to the sequence in a specified nuc record with given coordinates. You can get to it, but it takes a little trickery; in essence you need to use the UID to get the gene summary information, extract that, then grab the sequence record using seqstart, seqend, and seqstrand.

A dump of esummary info for UID 18131, for instance (using $eutil->print_all), gives this info (abbreviated somewhat):

UID :18131
Name :Notch3
Description :Notch gene homolog 3 (Drosophila)
Orgname :Mus musculus
...
GenomicInfo
GenomicInfoType
ChrLoc :17
ChrAccVer :NC_000083.5
ChrStart :32303796
ChrStop :32257837
GeneWeight :23049

The genomic info section gives the accession.version, start, end, and (implicitly) the strand (ChrStop is less than ChrStart).

I have added an example to the cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F

chris

On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:

> Hi,
>
> I have been experimenting with the Bio::DB::EUtilities module, with
> help from the Cookbook. But I can't seem to figure out how to get
> the DNA sequence of a gene; all the examples seem to be fetching
> protein sequence.
>
> How would i go about fetching a sequence using an Entrez GeneID?
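The coordinate handling described in this message (strand implied by ChrStop being smaller than ChrStart) can be sketched in plain Perl without any network access. This is only an illustration: the helper name `range_from_chr_coords` and its optional `$offset` argument are made up for the example, and whether an offset is needed for a given record is an assumption to check against the actual data.

```perl
use strict;
use warnings;

# Turn an esummary-style ChrStart/ChrStop pair into an ordered range plus
# strand. $offset is an assumption: pass 1 if the reported coordinates turn
# out to be zero-based and one-based positions are wanted.
sub range_from_chr_coords {
    my ($chr_start, $chr_stop, $offset) = @_;
    $offset = 0 unless defined $offset;

    # A stop smaller than the start is read as "reverse strand".
    my $strand = $chr_stop < $chr_start ? -1 : 1;
    my ($lo, $hi) = sort { $a <=> $b } ($chr_start, $chr_stop);

    return (
        start  => $lo + $offset,
        end    => $hi + $offset,
        strand => $strand,
    );
}

# Values copied from the Notch3 (UID 18131) dump quoted in this message.
my %range = range_from_chr_coords(32303796, 32257837);
print "NC_000083.5:$range{start}-$range{end} strand $range{strand}\n";
```

The ordered start, end and strand could then be handed to whatever sequence-fetching call is in use; that step is left out here because it depends on the retrieval module and its parameters.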
> > thanks for any help > > adam > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jun 10 09:33:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 10 Jun 2009 08:33:51 -0500 Subject: [Bioperl-l] EUtilities Cookbook example fails In-Reply-To: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk> References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk> <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu> <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk> Message-ID: <10B8484F-AE84-4E0A-964F-0DC964F5156C@illinois.edu> Adam, Okay, fixed that and the previous issue with 'use an undefined value as an ARRAY reference'. The previous issue appears to be due to a change in the XML output from NCBI (it used to give the IDs at one point). Also made the wiki changes for this; didn't take long to find everything. Thanks for pointing that out! If you find any more issues feel free to make the necessary changes on the wiki or point them out if they're in code. chris On Jun 10, 2009, at 8:12 AM, Adam Witney wrote: > Hi Chris, > > not sure if I should start a new thread for this, but it is related > to the EUtilities Cookbook and LinkSet.pm. > > There are several references in the Cookbook to the method > "get_linkname", however this seems to have changed in the recent > version of LinkSet.pm to "get_link_name". 
But one reference to the > old method name still exists in LinkSet.pm, as shown by this patch: > > --- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/ > LinkSet.pm 2009-02-20 12:36:37.000000000 +0000 > +++ /Users/adam/Desktop/LinkSet.pm 2009-06-10 13:58:49.000000000 +0100 > @@ -220,7 +220,7 @@ > =cut > > sub get_link_name { > - return ($_[0]->get_linknames)[0]; > + return ($_[0]->get_link_names)[0]; > } > > =head2 get_submitted_ids > > If i haven't got this all wrong entirely, I could go through and fix > the Cookbook entries if that was useful? > > adam > > > On 10 Jun 2009, at 13:28, Chris Fields wrote: > >> I can reproduce that; I'll look into it. >> >> chris >> >> On Jun 10, 2009, at 6:19 AM, Adam Witney wrote: >> >>> Hi, >>> >>> I am going through the EUtilities Cookbook, but the last example >>> (in section 2.3.1) fails with: >>> >>> Can't use an undefined value as an ARRAY reference at /usr/lib/ >>> perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470. >>> >>> This is with BioPerl 1.6.0, perl v5.8.8 >>> >>> thanks for any help >>> >>> adam >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From awitney at sgul.ac.uk Wed Jun 10 09:12:05 2009 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 10 Jun 2009 14:12:05 +0100 Subject: [Bioperl-l] EUtilities Cookbook example fails In-Reply-To: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu> References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk> <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu> Message-ID: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk> Hi Chris, not sure if I should start a new thread for this, but it is related to the EUtilities Cookbook and LinkSet.pm. 
There are several references in the Cookbook to the method "get_linkname", however this seems to have changed in the recent version of LinkSet.pm to "get_link_name". But one reference to the old method name still exists in LinkSet.pm, as shown by this patch: --- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/ LinkSet.pm 2009-02-20 12:36:37.000000000 +0000 +++ /Users/adam/Desktop/LinkSet.pm 2009-06-10 13:58:49.000000000 +0100 @@ -220,7 +220,7 @@ =cut sub get_link_name { - return ($_[0]->get_linknames)[0]; + return ($_[0]->get_link_names)[0]; } =head2 get_submitted_ids If i haven't got this all wrong entirely, I could go through and fix the Cookbook entries if that was useful? adam On 10 Jun 2009, at 13:28, Chris Fields wrote: > I can reproduce that; I'll look into it. > > chris > > On Jun 10, 2009, at 6:19 AM, Adam Witney wrote: > >> Hi, >> >> I am going through the EUtilities Cookbook, but the last example >> (in section 2.3.1) fails with: >> >> Can't use an undefined value as an ARRAY reference at /usr/lib/ >> perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470. >> >> This is with BioPerl 1.6.0, perl v5.8.8 >> >> thanks for any help >> >> adam >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Wed Jun 10 10:10:21 2009 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 10 Jun 2009 15:10:21 +0100 Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm In-Reply-To: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu> References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk> <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu> Message-ID: Thanks for the pointers Chris. 
The new example on the Cookbook doesn't quite work for me as ChrStart
seems to appear in the DocSum twice, thus
get_contents_by_name('ChrStart') returns a list of two values (which
writes the second ChrStart into $end). Also the $start and $end seem to
be out by 1, so I needed to change it to this:

my ($acc)   = ($docsum->get_contents_by_name('ChrAccVer'));
my ($start) = ($docsum->get_contents_by_name('ChrStart'));
my ($end)   = ($docsum->get_contents_by_name('ChrStop'));

$start += 1;
$end += 1;

Ah, looking at this further there appears to be something going on in
the response from Entrez. Compare these two gene records:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131
(your example below)
http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733
(my gene)

In both cases you can see that ChrStart appears twice, once as part of
the GenomicInfo list and once on its own at the bottom. In my example
above the two ChrStart values match, but in the Notch3 example you
posted the 2nd ChrStart seems to be the same as the ChrStop in the
GenomicInfo list. Do you know if the second ChrStart has a separate
meaning?

I guess in the Cookbook example we would need to make sure that the
get_contents_by_name('ChrStart') picks up the value from the
GenomicInfo list, is this possible?

thanks again

adam


On 10 Jun 2009, at 14:20, Chris Fields wrote:

> EntrezGene doesn't contain the sequence information; I believe it
> just links to the sequence in a specified nuc record with given
> coordinates. You can get to it, but it takes a little trickery; in
> essence you need to use the UID to get the gene summary information,
> extract that, then grab the sequence record using seqstart, seqend,
> and seqstrand.
>
> A dump of esummary info for UID 18131, for instance, (using
> $eutil->print_all) gives this info (abbreviated somewhat):
>
> UID :18131
> Name :Notch3
> Description :Notch gene homolog 3 (Drosophila)
> Orgname :Mus musculus
> ...
> GenomicInfo
>   GenomicInfoType
>     ChrLoc :17
>     ChrAccVer :NC_000083.5
>     ChrStart :32303796
>     ChrStop :32257837
> GeneWeight :23049
>
> The genomic info section gives the accession.version, start, end,
> and (implicitly) the strand (ChrStop is less that ChrStart). I have
> added an example to the cookbook:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>
> chris
>
> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I have been experimenting with the Bio::DB::EUtilities module, with
>> help from the Cookbook. But I can't seem to figure out how to get
>> the DNA sequence of a gene; all the examples seem to be fetching
>> protein sequence.
>>
>> How would i go about fetching a sequence using an Entrez GeneID?
>>
>> thanks for any help
>>
>> adam
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Jun 10 13:56:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 12:56:46 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To:
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk> <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
Message-ID:

Adam,

That's really odd that they do that (both the duplication of ChrStart
and the coordinates being off-by-one, which means they appear to be
0-based). It's possible that the second ChrStart is meant to represent
the actual first base for the gene irrespective of start/end. My
example is on the opposite strand, so the second ChrStart == end.

The fact that they use the same element name is slightly annoying (and
seemingly redundant), but there is a workaround.
We grab only the layered information specifically; in this case we want
everything below 'GenomicInfoType':

GenomicInfo
  GenomicInfoType
    ChrLoc :17
    ChrAccVer :NC_000083.5
    ChrStart :32303796
    ChrStop :32257837

So, we can do this in the DocSum loop (that appears to work for your
example):

############################

for my $docsum ($eutil->next_DocSum) {
    # to ensure we grab the right ChrStart information, we grab the Item above
    # it in the Item hierarchy (visible via print_all from the eutil instance)
    my ($item) = $docsum->get_Items_by_name('GenomicInfoType');

    my %item_data = map {$_ => 0} qw(ChrAccVer ChrStart ChrStop);

    while (my $sub_item = $item->next_subItem) {
        if (exists $item_data{$sub_item->get_name}) {
            $item_data{$sub_item->get_name} = $sub_item->get_content;
        }
    }

    # check to make sure everything is set
    for my $check (qw(ChrAccVer ChrStart ChrStop)) {
        die "$check not set" unless $item_data{$check};
    }

    my $strand = $item_data{ChrStart} > $item_data{ChrStop} ? 2 : 1;
    $fetcher->set_parameters(-id        => $item_data{ChrAccVer},
                             -seq_start => $item_data{ChrStart} + 1,
                             -seq_stop  => $item_data{ChrStop} + 1,
                             -strand    => $strand);
    print $fetcher->get_Response->content;
}

############################

That's to retain compatibility with 1.6; I'll update the wiki. I can
add some common Item container methods to grab information for any
Items contained in the current instance (be it a DocSum or another
Item). I'll add that in bioperl-live.

chris

On Jun 10, 2009, at 9:10 AM, Adam Witney wrote:

> Thanks for the pointers Chris.
>
> The new example on the Cookbook doesn't quite work for me as
> ChrStart seems to appear in the DocSum twice, thus
> get_contents_by_name('ChrStart') returns a list of two values (which
> writes the second ChrStart into $end).
Also the $start and $end seem > to be out by 1, so I needed to change it to this: > > my ($acc) = ($docsum->get_contents_by_name('ChrAccVer')); > my ($start) = ($docsum->get_contents_by_name('ChrStart')); > my ($end) = ($docsum->get_contents_by_name('ChrStop')); > > $start += 1; > $end += 1; > > Ah, looking at this further there appears to be something going on > in the response from Entrez. Compare these two gene records: > > http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131 > (your example below) > http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733 > (my gene) > > In both cases you can see that ChrStart appears twice, once as part > of the GenomicInfo list and once on its own at the bottom. In my > example above the two ChrStart values match, but in the Notch3 > example you posted the 2nd ChrStart seems to be the same as the > ChrStop in the GenomicInfo list. Do you know if the second ChrStart > has a separate meaning? > > I guess in the Cookbook example we would need to make sure that the > get_contents_by_name('ChrStart') picks up the value from the > GenomicInfo list, is this possible? > > thanks again > > adam > > > On 10 Jun 2009, at 14:20, Chris Fields wrote: > >> EntrezGene doesn't contain the sequence information; I believe it >> just links to the sequence in a specified nuc record with given >> coordinates. You can get to it, but it takes a little trickery; in >> essence you need to use the UID to get the gene summary >> information, extract that, then grab the sequence record using >> seqstart, seqend, and seqstrand. >> >> A dump of esummary info for UID 18131, for instance, (using $eutil- >> >print_all) gives this info (abbreviated somewhat): >> >> UID :18131 >> Name :Notch3 >> Description :Notch gene homolog 3 (Drosophila) >> Orgname :Mus musculus >> ... 
>> GenomicInfo >> GenomicInfoType >> ChrLoc :17 >> ChrAccVer :NC_000083.5 >> ChrStart :32303796 >> ChrStop :32257837 >> GeneWeight :23049 >> >> The genomic info section gives the accession.version, start, end, >> and (implicitly) the strand (ChrStop is less that ChrStart). I have >> added an example to the cookbook: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F >> >> chris >> >> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote: >> >>> Hi, >>> >>> I have been experimenting with the Bio::DB::EUtilities module, >>> with help from the Cookbook. But I can't seem to figure out how to >>> get the DNA sequence of a gene; all the examples seem to be >>> fetching protein sequence. >>> >>> How would i go about fetching a sequence using an Entrez GeneID? >>> >>> thanks for any help >>> >>> adam >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Jun 11 07:36:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 11 Jun 2009 07:36:40 -0400 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? Message-ID: <17AD00895AFD43E1A1436D1065092BAC@NewLife> Hi Chris and list- Will documentation for release 1.6 be available in pdoc on doc.bioperl.org? I notice also that autogenerated documentation for bioperl-live doesn't contain new modules (or HIVQuery & Tiling, anyway ;) )-- cheers, Mark From maj at fortinbras.us Thu Jun 11 09:17:25 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Thu, 11 Jun 2009 09:17:25 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>

Rasmus et al-

This looks like a bug. A quick debug shows it's barfing on 'AarI' (as
it cycles through all enzymes apparently creating a global cut map).
AarI has a recognition sequence of

CACCTGC (in $enz->seq->seq)

but a cut site of

CACCTGCNNNN^ (in $enz->seq->site)

The bad parm '11' refers to the end of the cut site sequence, but the
routine B:R:Analysis::_cuts is attempting to split the 7-symbol
recognition sequence, and so throws.

This surprises me. Core, let me know if you want me to take this on, or
if the module author can fix it quicker.

cheers,
Mark

----- Original Message -----
From: "Rasmus Ory Nielsen"
To:
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.

> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear
> with me, if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the
> script shown at the bottom of the mail.
> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>
> The scripts throws an exception - see below. But, if I comment out the
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is, that I need to use some of the enzymes that are only available
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of
> sequence (total=7)
> STACK Bio::PrimarySeq::subseq
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq => $seqobj,
>     -enzymes => $rebase_collection, # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Thu Jun 11 10:19:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Jun 2009 09:19:51 -0500
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.
In-Reply-To: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>
References: <4A2F622D.5060500@ron.dk> <2F52B1CED1374763822BF3AD1D283B3B@NewLife>
Message-ID: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>

Mark,

Feel free to take it up. It's probably a good idea to start a bug
report for tracking if it proves to be thornier to fix than expected.

chris

On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:

> Rasmus et al-
>
> This looks like a bug. A quick debug shows it's barfing on
> 'AarI' (as it cycles through
> all enzymes apparently creating a global cut map). AarI has a
> recognition sequence of
>
> CACCTGC (in $enz->seq->seq)
>
> but a cut site of
>
> CACCTGCNNNN^ (in $enz->seq->site)
>
> The bad parm '11' refers to the end of the cut site sequence, but
> the routine
> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition
> sequence,
> and so throws.
>
> This surprises me.
Core, let me know if you want me to take this on, > or > if the module author can fix it quicker. > > cheers, > Mark > > ----- Original Message ----- From: "Rasmus Ory Nielsen" > To: > Sent: Wednesday, June 10, 2009 3:35 AM > Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when > using rebasefile. > > >> Hi, >> >> This is my first time using bioperl for restriction analysis, so >> please bear with me, if this is a FAQ. >> >> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and >> created the script shown at the bottom of the mail. >> My bioperl version is bioperl-live nightly from 09-Jun-2009. >> >> The scripts throws an exception - see below. But, if I comment out >> the '-enzymes' argument, so it uses the built-in collection of >> enzymes, it works. >> >> My problem is, that I need to use some of the enzymes that are only >> available in rebase. So how do I get this working? >> >> Thanks for your attention. >> >> Best regards, >> Rasmus Ory Nielsen >> >> >> ############################################################ >> Output from the script: >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: Bad end parameter (11). 
End must be less than the total length >> of sequence (total=7) >> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ >> Bio/PrimarySeq.pm:401 >> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ >> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ >> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ >> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ >> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >> STACK toplevel ./restriction_test.pl:30 >> ------------------------------------- >> >> [roni at ksdhcp ~]$ >> >> >> ############################################################ >> Output from the script with the '-enzymes' argument commented out >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> $VAR1 = [ >> { >> 'seq' => 'CTCGACCGTTAGCAA', >> 'end' => 15, >> 'start' => '1' >> }, >> { >> 'seq' => 'AGCTTTCTACCGTTATCGT', >> 'end' => 34, >> 'start' => '16' >> } >> ]; >> [roni at ksdhcp ~]$ >> >> ############################################################ >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use Bio::PrimarySeq; >> use Bio::Restriction::IO; >> use Bio::Restriction::Analysis; >> use Data::Dumper; >> >> # create seq obj >> my $seqobj = new Bio::PrimarySeq( >> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >> -primary_id => 'test', >> -molecule => 'dna' >> ); >> >> # read rebase file >> my $rebase_io = Bio::Restriction::IO->new( >> -file => 'withrefm.906', >> -format => 'withrefm', >> ); >> my $rebase_collection = $rebase_io->read; >> >> # start restriction analysis >> my 
$restriction_analysis = Bio::Restriction::Analysis->new( >> -seq => $seqobj, >> -enzymes => $rebase_collection, # it works with this line >> commented out >> ); >> >> # retrieve fragment maps >> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >> print Dumper \@fragment_maps; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Jun 11 10:26:19 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 11 Jun 2009 10:26:19 -0400 Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> References: <4A2F622D.5060500@ron.dk> <2F52B1CED1374763822BF3AD1D283B3B@NewLife> <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> Message-ID: All-righty-- thanks MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Rasmus Ory Nielsen" ; Sent: Thursday, June 11, 2009 10:19 AM Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. > Mark, > > Feel free to take it up. It's probably a good idea to start a bug report for > tracking if it proves to be thornier to fix than expected. > > chris > > On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote: > >> Rasmus et al- >> >> This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it >> cycles through >> all enzymes apparently creating a global cut map). 
AarI has a recognition >> sequence of >> >> CACCTGC (in $enz->seq->seq) >> >> but a cut site of >> >> CACCTGCNNNN^ (in $enz->seq->site) >> >> The bad parm '11' refers to the end of the cut site sequence, but the >> routine >> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition >> sequence, >> and so throws. >> >> This surprises me. Core, let me know if you want me to take this on, or >> if the module author can fix it quicker. >> >> cheers, >> Mark >> >> ----- Original Message ----- From: "Rasmus Ory Nielsen" >> To: >> Sent: Wednesday, June 10, 2009 3:35 AM >> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using >> rebasefile. >> >> >>> Hi, >>> >>> This is my first time using bioperl for restriction analysis, so please >>> bear with me, if this is a FAQ. >>> >>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created >>> the script shown at the bottom of the mail. >>> My bioperl version is bioperl-live nightly from 09-Jun-2009. >>> >>> The scripts throws an exception - see below. But, if I comment out the >>> '-enzymes' argument, so it uses the built-in collection of enzymes, it >>> works. >>> >>> My problem is, that I need to use some of the enzymes that are only >>> available in rebase. So how do I get this working? >>> >>> Thanks for your attention. >>> >>> Best regards, >>> Rasmus Ory Nielsen >>> >>> >>> ############################################################ >>> Output from the script: >>> ############################################################ >>> >>> [roni at ksdhcp ~]$ ./restriction_test.pl >>> >>> --------------------- WARNING --------------------- >>> MSG: The enzyme name CviKI-1 was changed to CviKI-I >>> --------------------------------------------------- >>> >>> ------------- EXCEPTION ------------- >>> MSG: Bad end parameter (11). 
End must be less than the total length of >>> sequence (total=7) >>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ >>> Bio/PrimarySeq.pm:401 >>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ >>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ >>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ >>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ >>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >>> STACK toplevel ./restriction_test.pl:30 >>> ------------------------------------- >>> >>> [roni at ksdhcp ~]$ >>> >>> >>> ############################################################ >>> Output from the script with the '-enzymes' argument commented out >>> ############################################################ >>> >>> [roni at ksdhcp ~]$ ./restriction_test.pl >>> >>> --------------------- WARNING --------------------- >>> MSG: The enzyme name CviKI-1 was changed to CviKI-I >>> --------------------------------------------------- >>> $VAR1 = [ >>> { >>> 'seq' => 'CTCGACCGTTAGCAA', >>> 'end' => 15, >>> 'start' => '1' >>> }, >>> { >>> 'seq' => 'AGCTTTCTACCGTTATCGT', >>> 'end' => 34, >>> 'start' => '16' >>> } >>> ]; >>> [roni at ksdhcp ~]$ >>> >>> ############################################################ >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> use Bio::PrimarySeq; >>> use Bio::Restriction::IO; >>> use Bio::Restriction::Analysis; >>> use Data::Dumper; >>> >>> # create seq obj >>> my $seqobj = new Bio::PrimarySeq( >>> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >>> -primary_id => 'test', >>> -molecule => 'dna' >>> ); >>> >>> # read rebase file >>> my $rebase_io = Bio::Restriction::IO->new( >>> -file => 'withrefm.906', >>> -format => 'withrefm', >>> ); >>> my $rebase_collection = 
$rebase_io->read; >>> >>> # start restriction analysis >>> my $restriction_analysis = Bio::Restriction::Analysis->new( >>> -seq => $seqobj, >>> -enzymes => $rebase_collection, # it works with this line commented >>> out >>> ); >>> >>> # retrieve fragment maps >>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >>> print Dumper \@fragment_maps; >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From mauricio at open-bio.org Thu Jun 11 12:46:35 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 11 Jun 2009 11:46:35 -0500 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? In-Reply-To: <17AD00895AFD43E1A1436D1065092BAC@NewLife> References: <17AD00895AFD43E1A1436D1065092BAC@NewLife> Message-ID: <4A3134EB.4080702@open-bio.org> Hi Mark, I'll take a look into this sometime between today and tomorrow. Will keep you posted. Thanks for the heads up :) Mauricio. Mark A. Jensen wrote: > Hi Chris and list- > Will documentation for release 1.6 be available in pdoc on doc.bioperl.org? > I notice also that autogenerated documentation for bioperl-live doesn't contain > new modules (or HIVQuery & Tiling, anyway ;) )-- > cheers, Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Thu Jun 11 14:41:26 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 11 Jun 2009 14:41:26 -0400 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? In-Reply-To: <4A3134EB.4080702@open-bio.org> References: <17AD00895AFD43E1A1436D1065092BAC@NewLife> <4A3134EB.4080702@open-bio.org> Message-ID: cheers Mauricio! 
MAJ ----- Original Message ----- From: "Mauricio Herrera Cuadra" To: "Mark A. Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Thursday, June 11, 2009 12:46 PM Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org? > Hi Mark, > > I'll take a look into this sometime between today and tomorrow. Will keep you > posted. Thanks for the heads up :) > > Mauricio. > > > Mark A. Jensen wrote: >> Hi Chris and list- >> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org? >> I notice also that autogenerated documentation for bioperl-live doesn't >> contain >> new modules (or HIVQuery & Tiling, anyway ;) )-- >> cheers, Mark >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From Xianjun.Dong at bccs.uib.no Fri Jun 12 16:38:50 2009 From: Xianjun.Dong at bccs.uib.no (Xianjun Dong) Date: Fri, 12 Jun 2009 22:38:50 +0200 Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph Message-ID: <4A32BCDA.4080605@ii.uib.no> HI, I am not sure this is the right place I can get help. I've suffered by a problem for several days: I want to highlight parts of regions in my track, using a different background color. To do that, I defined a glyph named "background", based on the 'Bio::Graphics::Glyph::generic' module. I override the draw_component() method, by adding code like below: $gd->filledRectangle($left,0,$right,$gd->height, $self->factory->translate_color($color)); # the script is pasted at the end This will draw a rectangle with top=0, bottom=$gd->height. I made the highlight regions into a list of features, and add_track with -glyph=>'background'. (see the following script, test.pl) This really works as I expect, which will add a colored block at background of all tracks in a panel (including the ruler arrow). 
You can see the output image in attached file "test.bioperl1.2.3.png" Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not work. Well, it works, but the highlight part only shrink to a low height, instead of covering all tracks in the panel. I also attached the output here, see the file "test.bioperl1.6.png". I tried to think about the reason, the 'background' module is based on the generic module. What can cause the difference? Is it because $gd->height is different, or the tracks followed with 'background' track can not draw from the first position? Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person solve problem, wise person avoid problem"...) But another problem is coming: Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map() function, which means I have to use some higher version if I want to create web map for my graphics, but then I have to give up using highlight background. OK. It's long enough for my first-time submission here. Hope someone can throw me some clue. Thanks ahead!! 
Xianjun

==================== test.pl =======================

#!/usr/bin/perl

use strict;
use lib "$ENV{HOME}/lib";
use Bio::Graphics;
use Bio::Graphics::Feature;

my $ftr= 'Bio::Graphics::Feature';

# processed_transcript
my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# hightlight
my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');

my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                     -length=>1050,
                                     -start =>0,
                                     -pad_left=>12,
                                     -pad_right=>12);

# the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
$panel->add_track([$trans41,$trans31],
                  -glyph => 'background',
                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
                  -glyph => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL
                  );

print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but not work in Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();

1;

==================== background.pm =======================

package Bio::Graphics::Glyph::background;

use strict;
use base 'Bio::Graphics::Glyph::generic';

sub pad_top{
    return 0;
}

sub draw_component {
    my $self = shift;
    #$self->SUPER::draw_component(@_);
    my ($gd,$dx,$dy) = @_;
    my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);

    # draw an arrow to indicate the direction of transcript
    my $color = $self->option('block_bgcolor') || '#cccccc';
    $gd->filledRectangle($left,0,$right,$gd->height,
                         $self->factory->translate_color($color));
}

1;

--
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL:

From scott at scottcain.net Fri Jun 12 21:29:09 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 12 Jun 2009 21:29:09 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4A32BCDA.4080605@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
Message-ID: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>

Hello Xianjun,

I don't think that approach will work. What you almost certainly need
to do is a postgrid callback that does the drawing of the highlighted
region. For example code of how to do this, take a look at the
make_postgrid_callback subroutine in GBrowse 1.69. The option -postgrid
is a method of Bio::Graphics::Panel.

Scott

On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong wrote:
> HI,
>
> I am not sure this is the right place I can get help.
>
> I've suffered by a problem for several days: I want to highlight parts of
> regions in my track, using a different background color.
To do that, I > defined a glyph named "background", based on the > 'Bio::Graphics::Glyph::generic' module. I override the draw_component() > method, by adding code like below: > > $gd->filledRectangle($left,0,$right,$gd->height, > $self->factory->translate_color($color)); > > # the script is pasted at the end > > This will draw a rectangle with top=0, bottom=$gd->height. I made the > highlight regions into a list of features, and add_track with > -glyph=>'background'. (see the following script, test.pl) This really works > as I expect, which will add a colored block at background of all tracks in a > panel (including the ruler arrow). You can see the output image in attached > file "test.bioperl1.2.3.png" > > Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not > work. Well, it works, but the highlight part only shrink to a low height, > instead of covering all tracks in the panel. I also attached the output > here, see the file "test.bioperl1.6.png". > > I tried to think about the reason, the 'background' module is based on the > generic module. What can cause the difference? Is it because $gd->height is > different, or the tracks followed with 'background' track can not draw from > the first position? > > Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person > solve problem, wise person avoid problem"...) But another problem is coming: > Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map() > function, which means I have to use some higher version if I want to create > web map for my graphics, but then I have to give up using highlight > background. > > OK. It's long enough for my first-time submission here. Hope someone can > throw me some clue. > > Thanks ahead!! 
> > Xianjun > > > ==================== test.pl ======================= > #!/usr/bin/perl > > use strict; > use lib "$ENV{HOME}/lib"; > > use Bio::Graphics; > use Bio::Graphics::Feature; > my $ftr= 'Bio::Graphics::Feature'; > > # processed_transcript > my $trans1 = > $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); > my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); > my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', > -source=>'a'); > my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', > -source=>'a'); > my $trans5 = > $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); > my $trans ?= > $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); > > # hightlight > my $trans31 = > $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', > -source=>'a'); > my $trans41 = > $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', > -source=>'b'); > > my $panel= Bio::Graphics::Panel->new(-width=>1200, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-length=>1050, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-start =>0, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-pad_left=>12, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-pad_right=>12); > > # the following track works as I expected in bioperl 1.2.3, but not in 1.5 > and 1.6 > $panel->add_track([$trans41,$trans31], > ? ? ? ? -glyph ? => 'background', > ? ? ? ? ? ? ? ? -block_bgcolor => sub{return (shift->source eq > 'a')?'#cccccc':'#fffc22'}, > ? ? ? ? ? ? ? ? ); > > $panel->add_track($ftr->new(-start=>100,-end=>1000), > ? ? ? ? ? ? ? ? -glyph=>'arrow', > ? ? ? ? ? ? ? ? -double=>1, > ? ? ? ? ? ? ? ? -tick=>2); > > $panel->add_track($trans, > ? ? ? ? -glyph ? => 'transcript2', # 'transcript2', #process_5utr', > ? ? ? ? ? ? ? ? -fgcolor => 'darkred', > ? ? ? ? ? ? ? ? -bgcolor => 'darkred', > ? ? ? ? ? ? ? ? -title => '$source', > ? ? ? ? ? ? ? ? 
-link => > 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', ?#EnsEMBL > ? ? ? ? ? ? ? ? ); > ?print $panel->png; > > # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl > 1.2.3 > my $map = $panel->create_web_map("image"); > $panel->finished(); > > 1; > > ==================== background.pm ======================= > package Bio::Graphics::Glyph::background; > > use strict; > use base 'Bio::Graphics::Glyph::generic'; > sub pad_top{ > ?return 0; > } > > sub draw_component { > ?my $self = shift; > ?#$self->SUPER::draw_component(@_); > ?my ($gd,$dx,$dy) = @_; > ?my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy); > > ?# draw an arrow to indicate the direction of transcript > ?my $color = $self->option('block_bgcolor') || '#cccccc'; > ?$gd->filledRectangle($left,0,$right,$gd->height, > $self->factory->translate_color($color)); > } > > 1; > > -- > ========================================== > Xianjun Dong > PhD student, Lenhard group > Computational Biology Unit > Bergen Center for Computational Science > University of Bergen > Hoyteknologisenteret, Thormohlensgate 55 > N-5008 Bergen, Norway > E-mail: xianjun.dong at bccs.uib.no > Tel.: +47 555 84022 > Fax : +47 555 84295 > ========================================== > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From scott at scottcain.net Sat Jun 13 09:27:39 2009
From: scott at scottcain.net (Scott Cain)
Date: Sat, 13 Jun 2009 09:27:39 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4A339621.2060702@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no>
Message-ID: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>

Hi Xianjun,

I understand what you want to do, as the current version of gbrowse does this, and it uses bioperl 1.6. Without digging through the code, I can't tell you exactly how this works, and you didn't send your code that uses this callback, so I can't try it either.

One thing that is different between your code and gbrowse is that in gbrowse each of the tracks is actually a separate panel (to allow track dragging), so it is possible that this sort of callback doesn't work for Bio::Graphics any more.

Scott

On Saturday, June 13, 2009, Xianjun Dong wrote:
> Hi, Scott
>
> Thanks for your reply first.
>
> I still have a question. I dug out the code from GBrowse (pasted below). make_postgrid_callback collects all highlight regions and then uses hilite_regions_closure to draw them, with the following GD call:
>
> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                      $panel->translate_color($h_color));
>
> where $bottom=$panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost the same (except for pad_bottom); we can see this in the code at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>
> OK. Anyway, I changed my highlight regions to use $panel->bottom instead of $gd->height. The output is the same when using the library from Bioperl 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png")
>
> OK. I might not have explained my question clearly. My question is: with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I get the image I want (see the attached file "test.bioperl1.2.3.png"), where the highlighted range goes from the roof to the floor. With bioperl 1.5 (or 1.6), I only see the highlighted region in its own track, not across the whole panel. Did I explain it clearly now? You can see the difference between the two images.
>
> [I am not sure the mailing list allows image attachments, so I have also put them at the following links:
> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
> test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]
>
> Can you test it and see the difference, if you have both 1.2.3 and 1.6 on your computer?
>
> I really want to know how this works in bioperl 1.2.3 (even though this might be a bug in that version, or whatever).
>
> Thanks
>
> Xianjun
> =============================================
>
> # this generates the callback for highlighting a region
> sub make_postgrid_callback {
>   my $settings = shift;
>   return unless ref $settings->{h_region};
>
>   my @h_regions = map {
>     my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>     defined($h_ref) && $h_ref eq $settings->{ref}
>       ? [$h_start,$h_end,$h_color||'lightgrey']
>       : ()
>   }
>     @{$settings->{h_region}};
>
>   return unless @h_regions;
>   return hilite_regions_closure(@h_regions);
> }
>
> # this subroutine generates a Bio::Graphics::Panel callback closure
> # suitable for hilighting a region of a panel.
> # The args are a list of [start,end,color]
> sub hilite_regions_closure {
>   my @h_regions = @_;
>
>   return sub {
>     my $gd     = shift;
>     my $panel  = shift;
>     my $left   = $panel->pad_left;
>     my $top    = $panel->top;
>     my $bottom = $panel->bottom;
>     for my $r (@h_regions) {
>       my ($h_start,$h_end,$h_color) = @$r;
>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>       if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>       # assuming top is 0 so as to ignore top padding
>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                            $panel->translate_color($h_color));
>     }
>   };
> }
>
> Scott Cain wrote:
> [...]

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From Xianjun.Dong at bccs.uib.no Sat Jun 13 12:48:16 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 18:48:16 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no> <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
Message-ID: <4A33D850.1020203@ii.uib.no>

Hi, Scott

Before I give up my own solution and switch to GBrowse, I want to bother you once more.

As you suggested, I passed the -postgrid option when creating the panel, so that it calls a function to draw the background. The code below is copied almost verbatim from the online POD of Bio::Graphics::Panel (see http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html ), but it still does not work. Could you have a look? I paste it below.

(BTW, on that POD page the option is given as -postgrid=>\&draw_gap, while the gap-drawing function is named gap_it, not draw_gap. I guess it's a typo, or not?)
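
One thing that may be worth checking in the script that follows, offered as a guess rather than a confirmed diagnosis: in the Bio::Graphics::Panel->new call there is no comma between -pad_right=>12 and -postgrid=>\&gap_it. Perl then reads "12 -postgrid" as a subtraction (the bareword in front of => is quoted, and the string 'postgrid' counts as 0 in arithmetic), so the option name disappears and the code reference is passed as a stray extra argument. The sketch below reproduces only that parsing with plain lists, without BioPerl; has_option and the two array names are made up for the illustration, and gap_it here is just a stand-in for the real callback.

```perl
#!/usr/bin/perl
use strict;

# Stand-in for the real drawing callback; only its presence matters here.
sub gap_it { return 'called' }

# As in the posted script: no comma after 12.
# Perl parses this as (-pad_right => (12 - 'postgrid'), \&gap_it).
my @broken = (-pad_right => 12 -postgrid => \&gap_it);

# With the comma restored, -postgrid survives as an option name.
my @fixed  = (-pad_right => 12, -postgrid => \&gap_it);

# Hypothetical helper: count how often an option name occurs in a list.
sub has_option {
    my ($name, @args) = @_;
    return scalar grep { !ref($_) && $_ eq $name } @args;
}

printf "broken: %d elements, -postgrid present: %s\n",
    scalar(@broken), has_option('-postgrid', @broken) ? 'yes' : 'no';
printf "fixed:  %d elements, -postgrid present: %s\n",
    scalar(@fixed),  has_option('-postgrid', @fixed)  ? 'yes' : 'no';
```

The first list ends up with three elements and no -postgrid key, the second with four. Running the real script with "use warnings" (or perl -w) should surface the same problem as an "isn't numeric" warning. Whether the panel then draws the highlight as wanted under BioPerl 1.6 is a separate question that this sketch cannot answer.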

my $panel = Bio::Graphics::Panel->new(-segment=>$segment,
                                      -grid=>1,
                                      -width=>600,
                                      -postgrid=> \&draw_gap);
sub gap_it {
    my $gd    = shift;
    my $panel = shift;
    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
    my $top                  = $panel->top;
    my $bottom               = $panel->bottom;
    my $gray                 = $panel->translate_color('gray');
    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}

Thanks

Xianjun

-----------------------------------------------
#!/usr/bin/perl

use strict;
use lib "$ENV{HOME}/lib";

use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr= 'Bio::Graphics::Feature';

# processed_transcript
my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# hightlight
my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');

my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                     -length=>1050,
                                     -start =>0,
                                     -pad_left=>12,
                                     -pad_right=>12
                                     -postgrid=>\&gap_it);

sub gap_it {
    my $gd    = shift;
    my $panel = shift;
    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
    my $top                  = $panel->top;
    my $bottom               = $gd->height, #panel->bottom;
    my $gray                 = $panel->translate_color('red');
    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}

# the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
#$panel->add_track([$trans41,$trans31],
#        -glyph   => 'background',
#        -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
#        );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                -glyph=>'arrow',
                -double=>1,
                -tick=>2);

$panel->add_track($trans,
        -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                -fgcolor => 'darkred',
                -bgcolor => 'darkred',
                -title => '$source',
                -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                );
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but not work in Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();

Scott Cain wrote:
> [...]

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

From maj at fortinbras.us Sun
Jun 14 00:35:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 00:35:18 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife> <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: 

All-

I'm finding this is requiring a pretty substantial refactor and rationalization. I have opened a branch at REPOS/bioperl-live/branches/restriction-refactor and am making commits at will there (won't Rob be pleased!). When it appears to be passing tests, I'll let Chris know (on list), and he can decide on its mergeability, and brave users could try it out by downloading Bio/Restriction (deeply) via subversion. My running commentary is at Bug #2855.

MAJ

----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc: ; "Rasmus Ory Nielsen"
Sent: Thursday, June 11, 2009 10:19 AM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.

> Mark,
>
> Feel free to take it up. It's probably a good idea to start a bug report for
> tracking if it proves to be thornier to fix than expected.
>
> chris
>
> On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> Rasmus et al-
>>
>> This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it
>> cycles through
>> all enzymes apparently creating a global cut map). AarI has a recognition
>> sequence of
>>
>> CACCTGC (in $enz->seq->seq)
>>
>> but a cut site of
>>
>> CACCTGCNNNN^ (in $enz->seq->site)
>>
>> The bad parm '11' refers to the end of the cut site sequence, but the
>> routine
>> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition
>> sequence,
>> and so throws.
>>
>> This surprises me. Core, let me know if you want me to take this on, or
>> if the module author can fix it quicker.
>>
>> cheers,
>> Mark
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen"
>> To:
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
>> rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so please
>>> bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created
>>> the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The scripts throws an exception - see below. But, if I comment out the
>>> '-enzymes' argument, so it uses the built-in collection of enzymes, it
>>> works.
>>>
>>> My problem is, that I need to use some of the enzymes that are only
>>> available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total length of
>>> sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/
>>> Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>   {
>>>     'seq' => 'CTCGACCGTTAGCAA',
>>>     'end' => 15,
>>>     'start' => '1'
>>>   },
>>>   {
>>>     'seq' => 'AGCTTTCTACCGTTATCGT',
>>>     'end' => 34,
>>>     'start' => '16'
>>>   }
>>> ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>     -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>     -primary_id => 'test',
>>>     -molecule => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>     -file => 'withrefm.906',
>>>     -format => 'withrefm',
>>> );
>>> my $rebase_collection =
$rebase_io->read; >>> >>> # start restriction analysis >>> my $restriction_analysis = Bio::Restriction::Analysis->new( >>> -seq => $seqobj, >>> -enzymes => $rebase_collection, # it works with this line commented >>> out >>> ); >>> >>> # retrieve fragment maps >>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >>> print Dumper \@fragment_maps; >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Sun Jun 14 21:57:45 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 14 Jun 2009 18:57:45 -0700 Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when usingrebasefile. In-Reply-To: References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife> <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> Message-ID: <4A35AA99.2080305@cornell.edu> Mark A. Jensen wrote: > I'm finding this is requiring a pretty substantial refactor and > rationalization. I have opened a branch at > REPOS/bioperl-live/branches/restriction-refactor > and am making commits at will there (won't Rob be pleased!). Oh Mark, you are so agile! > When it appears to be passing tests, I'll let Chris know (on list), > and he can decide on its mergability, and brave users could try > it out by downloading Bio/Restriction (deeply) via subversion. If it's passing tests but still has bugs, make sure you add tests for the additional bugs you find! Rob From maj at fortinbras.us Sun Jun 14 22:02:37 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Sun, 14 Jun 2009 22:02:37 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exceptionwhen usingrebasefile.
In-Reply-To: <4A35AA99.2080305@cornell.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife> <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> <4A35AA99.2080305@cornell.edu>
Message-ID:

----- Original Message -----
From: "Robert Buels"
To: "BioPerl List"
Sent: Sunday, June 14, 2009 9:57 PM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exceptionwhen usingrebasefile.

> Mark A. Jensen wrote:
>> I'm finding this is requiring a pretty substantial refactor and
>> rationalization. I have opened a branch at
>> REPOS/bioperl-live/branches/restriction-refactor
>> and am making commits at will there (won't Rob be pleased!).
> Oh Mark, you are so agile!

ha!

>
>> When it appears to be passing tests, I'll let Chris know (on list),
>> and he can decide on its mergability, and brave users could try
>> it out by downloading Bio/Restriction (deeply) via subversion.
> If it's passing tests but still has bugs, make sure you add tests for the
> additional bugs you find!

mais, bien sur; plenty new tests coming--
thanks Rob-
MAJ

>
> Rob
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From shalabh.sharma7 at gmail.com Mon Jun 15 16:06:31 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 15 Jun 2009 16:06:31 -0400
Subject: [Bioperl-l] sub sampling
Message-ID: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>

Hi All,

I was just wondering whether there is any module in BioPerl that does subsampling.
I have a file like this:

369859 0477 93
163417 1348 92
228122 0176 88
232792 0050 93
239636 1850 95
300069 0048 96
244108 0046 91
199087 0055 93
206209 0048 96
- - -
- - -

which contains around 100,000 lines, and I want to take a 25% sample from this file.
Is there any way I can do this in BioPerl?

Thanks
Shalabh

From maj at fortinbras.us Mon Jun 15 19:49:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 19:49:58 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was: Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID:

Dear All,

The revamped Bio::Restriction::* in branch

  REPOS/bioperl-live/branches/restriction-refactor

passes all existing tests, including those in t/Restriction.
New tests will be added within the next day or so.
The original bug occurred because only a subset of
the possible rebase withrefm-formatted enzymes were
handled; it choked on freshly-downloaded rebase
files because of this.

The refactored version now handles *all* rebase types,
including those of rebase forms

  XXX^X           [ intrasite cutters, the main types built in to base.pm ]
  XXXX(m/n)       [ right-end extrasite cutters ]
  (s/t)XXXX       [ left-end ditto ]
  (s/t)XXXX(m/n)  [ double-end ditto ],

palindromic and non-palindromic, as well as multisite
enzymes that string together combinations of these
forms. Much rationalization (well, seems rational to me
anyway) and cruft removal in the affected code has also
occurred. itype2.pm has been updated as well, to
conform to the refactoring.

If you're dying to try this now, get a working copy
of the branch like so

  $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
  $ cd bioperl-rr
  $ perl Build.PL
  $ ./Build test
  $ ./Build install

This will only hammer your current installation in the
$SITE_LIB/Bio/Restriction path; I worked only on
a sparse checkout of the necessary files. To revert to your
old install, do

  $ cd $MY_OLD_BIOPERL_WORKINGDIR
  $ ./Build install

[In the possible event that these instructions are in error,
there will be a response on this list in a matter of
milliseconds, so stand by.]
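
The site notation listed earlier in this message (XXX^X, XXXX(m/n), (s/t)XXXX, (s/t)XXXX(m/n)) can be made concrete with a small stand-alone sketch. This is for illustration only and is not code from the branch: parse_site and top_strand_cut are hypothetical helpers, the HindIII-style site A^AGCTT is a standard example, and the bottom-strand offset 8 in the AarI-style string is assumed here (only the top-strand site CACCTGCNNNN^ appears in the thread).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for illustration only; it is NOT part of Bio::Restriction.
# Splits a rebase-style site string into the recognition sequence and any
# cut offsets given in the notation.
sub parse_site {
    my ($site) = @_;
    my %parsed = ( cut => undef, left => undef, right => undef );

    # Leading (s/t): cuts relative to the left end of the recognition sequence.
    if ( $site =~ s{\A\((-?\d+)/(-?\d+)\)}{} ) {
        $parsed{left} = [ $1, $2 ];
    }
    # Trailing (m/n): cuts relative to the right end.
    if ( $site =~ s{\((-?\d+)/(-?\d+)\)\z}{} ) {
        $parsed{right} = [ $1, $2 ];
    }
    # A caret inside the site marks an intrasite cut after that many symbols.
    my $caret = index( $site, '^' );
    if ( $caret >= 0 ) {
        $parsed{cut} = $caret;
        $site =~ s/\^//g;
    }
    $parsed{recog} = $site;
    return \%parsed;
}

# Top-strand cut position, counted from the first symbol of the recognition
# sequence; for extrasite cutters it can lie beyond the end of that sequence.
sub top_strand_cut {
    my ($p) = @_;
    return $p->{cut} if defined $p->{cut};
    return length( $p->{recog} ) + $p->{right}[0] if $p->{right};
    return;    # left-end-only forms are not handled in this sketch
}

for my $site ( 'A^AGCTT', 'CACCTGC(4/8)' ) {
    my $p   = parse_site($site);
    my $cut = top_strand_cut($p);
    printf "%-13s recognition=%s (length %d), top-strand cut after position %d\n",
        $site, $p->{recog}, length( $p->{recog} ), $cut;
}
```

For the second string the sketch reports a cut after position 11 on a 7-symbol recognition sequence, the same mismatch as the "Bad end parameter (11) ... (total=7)" exception reported at the start of this thread: taking a subsequence of the bare recognition sequence up to the cut position cannot work for extrasite cutters.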
Happy coding- Mark ----- Original Message ----- From: "Rasmus Ory Nielsen" To: Sent: Wednesday, June 10, 2009 3:35 AM Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. > Hi, > > This is my first time using bioperl for restriction analysis, so please bear > with me, if this is a FAQ. > > I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the > script shown at the bottom of the mail. > My bioperl version is bioperl-live nightly from 09-Jun-2009. > > The scripts throws an exception - see below. But, if I comment out the > '-enzymes' argument, so it uses the built-in collection of enzymes, it works. > > My problem is, that I need to use some of the enzymes that are only available > in rebase. So how do I get this working? > > Thanks for your attention. > > Best regards, > Rasmus Ory Nielsen > > > ############################################################ > Output from the script: > ############################################################ > > [roni at ksdhcp ~]$ ./restriction_test.pl > > --------------------- WARNING --------------------- > MSG: The enzyme name CviKI-1 was changed to CviKI-I > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: Bad end parameter (11). 
End must be less than the total length of > sequence (total=7) > STACK Bio::PrimarySeq::subseq > /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401 > STACK Bio::Restriction::Analysis::_enzyme_sites > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 > STACK Bio::Restriction::Analysis::_cuts > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 > STACK Bio::Restriction::Analysis::cut > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 > STACK Bio::Restriction::Analysis::fragment_maps > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 > STACK toplevel ./restriction_test.pl:30 > ------------------------------------- > > [roni at ksdhcp ~]$ > > > ############################################################ > Output from the script with the '-enzymes' argument commented out > ############################################################ > > [roni at ksdhcp ~]$ ./restriction_test.pl > > --------------------- WARNING --------------------- > MSG: The enzyme name CviKI-1 was changed to CviKI-I > --------------------------------------------------- > $VAR1 = [ > { > 'seq' => 'CTCGACCGTTAGCAA', > 'end' => 15, > 'start' => '1' > }, > { > 'seq' => 'AGCTTTCTACCGTTATCGT', > 'end' => 34, > 'start' => '16' > } > ]; > [roni at ksdhcp ~]$ > > ############################################################ > > #!/usr/bin/perl > use strict; > use warnings; > use Bio::PrimarySeq; > use Bio::Restriction::IO; > use Bio::Restriction::Analysis; > use Data::Dumper; > > # create seq obj > my $seqobj = new Bio::PrimarySeq( > -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', > -primary_id => 'test', > -molecule => 'dna' > ); > > # read rebase file > my $rebase_io = Bio::Restriction::IO->new( > -file => 'withrefm.906', > -format => 'withrefm', > ); > my $rebase_collection = $rebase_io->read; > > # start restriction analysis > my $restriction_analysis = Bio::Restriction::Analysis->new( > -seq => $seqobj, > -enzymes => 
$rebase_collection, # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From maj at fortinbras.us Mon Jun 15 20:07:21 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 20:07:21 -0400
Subject: [Bioperl-l] sub sampling
In-Reply-To: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
References: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
Message-ID:

Shalabh

If you want to do sampling with replacement this is not bad
(if you trust rand() ):

  # open your file into $my_infile, then
  my @lines = <$my_infile>;
  my $num_samps = 10;
  my $sample_size_pc = 0.25;
  my @samples;
  for (1..$num_samps) {
      # each sample is a list of random line indices (with replacement)
      push @samples, [ map { int( rand(scalar @lines) ) }
                           ( 1..int($sample_size_pc * @lines) ) ];
  }

  # now, do something, fr'instance
  my @sample_pc;
  foreach (@samples) {
      my $pct = 0;
      foreach my $line (@lines[ @$_ ]) {
          my @a = split ' ', $line;
          $pct += $a[2];        # third column
      }
      $pct /= @$_;              # mean of the third column over this sample
      push @sample_pc, $pct;
  }

R's just better for some things, ain't it?
MAJ

----- Original Message -----
From: "shalabh sharma"
To: "bioperl-l"
Sent: Monday, June 15, 2009 4:06 PM
Subject: [Bioperl-l] sub sampling

> Hi All, I was just wondering that is there any module is bioperl
> that do subsampling?
> I have a file like this:
>
> 369859 0477 93
> 163417 1348 92
> 228122 0176 88
> 232792 0050 93
> 239636 1850 95
> 300069 0048 96
> 244108 0046 91
> 199087 0055 93
> 206209 0048 96
> - - -
> - - -
>
> which contain around 100,000 lines and i want to take out a sample of 25%
> from this file. Is there any way i can do this in Bioperl?
> > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Xianjun.Dong at bccs.uib.no Sat Jun 13 08:05:53 2009 From: Xianjun.Dong at bccs.uib.no (Xianjun Dong) Date: Sat, 13 Jun 2009 14:05:53 +0200 Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph In-Reply-To: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> Message-ID: <4A339621.2060702@ii.uib.no> Hi, Scott Thanks for your reply first. I still have question: I dig out the code from GBrowse (which I paste below). Method make_postgrid_callback gets all highlight region and then use hilite_regions_closure function to draw them out, using the following GD function: $gd->filledRectangle($left+$start,0,$left+$end,$bottom, $panel->translate_color($h_color)); where the $bottom=$panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost same (except the pad_bottom), we can see this in the code of http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22 OK. Anyway, I change to use $panel->bottom, instead of $gd->height, for my highlight regions. The output is same, when using the library of Bioperl 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png") OK. I might have not explained my question explicitly. My question is: if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I can get the right image I want (see the attached file "test.bioperl1.2.3.png"), where the highlight range will go from the roof to the floor. While in bioperl 1.5 (or 1.6), I only can see the highlight region in its own track, not the whole panel. OK, did I explain clearly now? you can see the difference of the two images. 
[I am not sure the mailist allow to attach image, otherwise, I put them in the following links: test.bioperl1.6.png: http://translog.genereg.net/test.bioperl1.6.png test.bioperl1.2.3.png: http://translog.genereg.net/test.bioperl1.2.3.png ] You can test it and see the difference if you have both 1.2.3 and 1.6 on your computer? Really want to know how this works in bioperl 1.2.3 (Even though this might be a bug at that version, or whatever) Thanks Xianjun ============================================= # this generates the callback for highlighting a region sub make_postgrid_callback { my $settings = shift; return unless ref $settings->{h_region}; my @h_regions = map { my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/; defined($h_ref) && $h_ref eq $settings->{ref} ? [$h_start,$h_end,$h_color||'lightgrey'] : () } @{$settings->{h_region}}; return unless @h_regions; return hilite_regions_closure(@h_regions); } # this subroutine generates a Bio::Graphics::Panel callback closure # suitable for hilighting a region of a panel. # The args are a list of [start,end,color] sub hilite_regions_closure { my @h_regions = @_; return sub { my $gd = shift; my $panel = shift; my $left = $panel->pad_left; my $top = $panel->top; my $bottom = $panel->bottom; for my $r (@h_regions) { my ($h_start,$h_end,$h_color) = @$r; my ($start,$end) = $panel->location2pixel($h_start,$h_end); if ($end-$start <= 1) { $end++; $start-- } # so that we always see something # assuming top is 0 so as to ignore top padding $gd->filledRectangle($left+$start,0,$left+$end,$bottom, $panel->translate_color($h_color)); } }; } Scott Cain wrote: > Hello Xianjun, > > I don't think that approach will work. What you almost certainly need > to do is a postgrid callback that does the drawing of the highlighted > region. For example code of how to do this, take a look at the > make_postgrid_callback subroutine in GBrowse 1.69. The option > -postgrid is a method of Bio::Graphics::Panel. 
> > Scott > > > > > On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong wrote: > >> HI, >> >> I am not sure this is the right place I can get help. >> >> I've suffered by a problem for several days: I want to highlight parts of >> regions in my track, using a different background color. To do that, I >> defined a glyph named "background", based on the >> 'Bio::Graphics::Glyph::generic' module. I override the draw_component() >> method, by adding code like below: >> >> $gd->filledRectangle($left,0,$right,$gd->height, >> $self->factory->translate_color($color)); >> >> # the script is pasted at the end >> >> This will draw a rectangle with top=0, bottom=$gd->height. I made the >> highlight regions into a list of features, and add_track with >> -glyph=>'background'. (see the following script, test.pl) This really works >> as I expect, which will add a colored block at background of all tracks in a >> panel (including the ruler arrow). You can see the output image in attached >> file "test.bioperl1.2.3.png" >> >> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not >> work. Well, it works, but the highlight part only shrink to a low height, >> instead of covering all tracks in the panel. I also attached the output >> here, see the file "test.bioperl1.6.png". >> >> I tried to think about the reason, the 'background' module is based on the >> generic module. What can cause the difference? Is it because $gd->height is >> different, or the tracks followed with 'background' track can not draw from >> the first position? >> >> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person >> solve problem, wise person avoid problem"...) But another problem is coming: >> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map() >> function, which means I have to use some higher version if I want to create >> web map for my graphics, but then I have to give up using highlight >> background. >> >> OK. 
It's long enough for my first-time submission here. Hope someone can >> throw me some clue. >> >> Thanks ahead!! >> >> Xianjun >> >> >> ==================== test.pl ======================= >> #!/usr/bin/perl >> >> use strict; >> use lib "$ENV{HOME}/lib"; >> >> use Bio::Graphics; >> use Bio::Graphics::Feature; >> my $ftr= 'Bio::Graphics::Feature'; >> >> # processed_transcript >> my $trans1 = >> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); >> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); >> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', >> -source=>'a'); >> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', >> -source=>'a'); >> my $trans5 = >> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); >> my $trans = >> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); >> >> # hightlight >> my $trans31 = >> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', >> -source=>'a'); >> my $trans41 = >> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', >> -source=>'b'); >> >> my $panel= Bio::Graphics::Panel->new(-width=>1200, >> -length=>1050, >> -start =>0, >> -pad_left=>12, >> -pad_right=>12); >> >> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 >> and 1.6 >> $panel->add_track([$trans41,$trans31], >> -glyph => 'background', >> -block_bgcolor => sub{return (shift->source eq >> 'a')?'#cccccc':'#fffc22'}, >> ); >> >> $panel->add_track($ftr->new(-start=>100,-end=>1000), >> -glyph=>'arrow', >> -double=>1, >> -tick=>2); >> >> $panel->add_track($trans, >> -glyph => 'transcript2', # 'transcript2', #process_5utr', >> -fgcolor => 'darkred', >> -bgcolor => 'darkred', >> -title => '$source', >> -link => >> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL >> ); >> print $panel->png; >> >> # the following part works in bioperl 1.5 and 1.6, but not work in 
Bioperl >> 1.2.3 >> my $map = $panel->create_web_map("image"); >> $panel->finished(); >> >> 1; >> >> ==================== background.pm ======================= >> package Bio::Graphics::Glyph::background; >> >> use strict; >> use base 'Bio::Graphics::Glyph::generic'; >> sub pad_top{ >> return 0; >> } >> >> sub draw_component { >> my $self = shift; >> #$self->SUPER::draw_component(@_); >> my ($gd,$dx,$dy) = @_; >> my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy); >> >> # draw an arrow to indicate the direction of transcript >> my $color = $self->option('block_bgcolor') || '#cccccc'; >> $gd->filledRectangle($left,0,$right,$gd->height, >> $self->factory->translate_color($color)); >> } >> >> 1; >> >> -- >> ========================================== >> Xianjun Dong >> PhD student, Lenhard group >> Computational Biology Unit >> Bergen Center for Computational Science >> University of Bergen >> Hoyteknologisenteret, Thormohlensgate 55 >> N-5008 Bergen, Norway >> E-mail: xianjun.dong at bccs.uib.no >> Tel.: +47 555 84022 >> Fax : +47 555 84295 >> ========================================== >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > -- ========================================== Xianjun Dong PhD student, Lenhard group Computational Biology Unit Bergen Center for Computational Science University of Bergen Hoyteknologisenteret, Thormohlensgate 55 N-5008 Bergen, Norway E-mail: xianjun.dong at bccs.uib.no Tel.: +47 555 84022 Fax : +47 555 84295 ========================================== -------------- next part -------------- A non-text attachment was scrubbed... Name: test.bioperl1.2.3.png Type: image/png Size: 2789 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL:

From malcolm.cook at gmail.com Tue Jun 16 04:06:36 2009
From: malcolm.cook at gmail.com (Malcolm Cook)
Date: Tue, 16 Jun 2009 03:06:36 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
Message-ID:

Kevin,

I'm getting struck by this old issue you once coded around.

http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html

Any chance you could share your implementation with a fellow traveller...??

Thanks,

Malcolm Cook
Stowers Institute for Medical Research

From remi.planel at free.fr Tue Jun 16 10:57:27 2009
From: remi.planel at free.fr (Remi Planel)
Date: Tue, 16 Jun 2009 16:57:27 +0200
Subject: [Bioperl-l] Hits Object
Message-ID: <4A37B2D7.70807@free.fr>

Hi all,

Starting from a Bio::Search::Result::ResultI object (obtained after parsing a
blast report), I couldn't find a way to filter some of the associated hsps.
By filter I mean eliminating, for each hit, the hsps I'm not interested in.
Can I modify the Result object directly?

Thanks,

From lsbrath at gmail.com Tue Jun 16 11:42:37 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Tue, 16 Jun 2009 11:42:37 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on and undefined value
Message-ID: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>

Hello,

My method produces an error message stating that it can't call a "next_hit"
method on an undefined value.

sub hu_bl2seq_parser{
    my ($maid, $maid_dir) = @_;

    # Get the report
    my $in = new Bio::SearchIO(-format => 'blast',
                               -file => ">".$maid_dir."\\".$maid."aln_hu.aln",
                               -report_type => 'blastn');
    #open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");
    #my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
    my $result=$in->next_result;
    my($hu_aln,$hu_mismatches);

    # Get info about the first hit
    my $hit = $result->next_hit;
    my $name = $hit->name;

    # get info about the first hsp of the first hit
    my $hsp = $hit->next_hsp;

    # get the alignment object
    my $aln = $hsp->get_aln;
    #my $percent_id = $hsp->percent_identity;
    #my $aln_length = $hsp->length('total');
    my @mismatches = $hsp->seq_inds('query','nomatch');
    my $aln_str="";

    # access the alignment string
    my $strIO=IO::String->new($aln_str);

    # write the string alignio in clustalw format
    my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);

    # now the actual alignment string is accessable for printing or in this case moving to a db table
    $alnio->write_aln($aln);
    $hu_aln=$aln_str;
    $hu_mismatches = scalar @mismatches;
    return($hu_aln, $hu_mismatches);
}

The problem is at "my $hit = $result->next_hit;"

Any help will be appreciated.

LomSpace

From cjfields at illinois.edu Tue Jun 16 14:14:18 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:14:18 -0500
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To:
References: <4A2F622D.5060500@ron.dk>
Message-ID: <9A7FE5B3-29A2-4FAE-AE5A-945064DD8DB6@illinois.edu>

I'll check out the branch sometime today and run tests on it. Thanks
for the hard work Mark!

chris

On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:

> Dear All,
>
> There are tests for the new functionality of Bio::Restriction
> now in t/Restriction on the branch, along with the withrefm.906
> in t/data that revealed the bug in RON's post.
All tests pass without > warnings on my machine (which is bioperl live, perl 5.10.10, > under Vista/cygwin - yes, I still don't have a real computer). > We're ready for a merge on my end. > > Thanks all for your silent assent to these machinations. > cheers > Mark > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Rasmus Ory Nielsen" ; > Sent: Monday, June 15, 2009 7:49 PM > Subject: [Bioperl-l] Bio::Restriction refactor > [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] > > >> Dear All, >> >> The revamped Bio::Restriction::* in branch >> >> REPOS/bioperl-live/branches/restriction-refactor >> >> passes all existing tests, including those in t/Restriction. >> New tests will be added within the next day or so. >> The original bug occurred because only a subset of >> the possible rebase withrefm-formatted enzymes were >> handled; it choked on freshly-downloaded rebase >> files because of this. >> >> The refactored version now handles *all* rebase types, >> including those of rebase forms >> >> XXX^X [ intrasite cutters, the main types >> built in to base.pm] >> XXXX(m/n) [ right-end extrasite cutters ] >> (s/t)XXXX [ left-end ditto ] >> (s/t)XXXX(m/n) [ double-end ditto], >> >> palindromic and non-palindromic, as well as multisite >> enzymes that string together combinations of these >> forms. Much rationalization (well, seems rational to me >> anyway) and cruft removal in the affected code has also >> occurred. itype2.pm has been updated as well, to >> conform to the refactoring. >> >> If you're dying to try this now, get a working copy >> of the branch like so >> >> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/ >> restriction-refactor bioperl-rr >> $ cd bioperl-rr >> $ perl Build.PL >> $ ./Build test >> $ ./Build install >> >> This will only hammer your current installation in the >> $SITE_LIB/Bio/Restriction path; I worked only on >> a sparse checkout of the necessary files. 
To revert to your >> old install, do >> >> $ cd $MY_OLD_BIOPERL_WORKINGDIR >> $ ./Build install >> >> [In the possible event that these instructions are in error, >> there will be a response on this list in a matter of >> milliseconds, so stand by.] >> >> Happy coding- >> Mark >> >> >> >> >> ----- Original Message ----- From: "Rasmus Ory Nielsen" >> To: >> Sent: Wednesday, June 10, 2009 3:35 AM >> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when >> using rebasefile. >> >> >>> Hi, >>> >>> This is my first time using bioperl for restriction analysis, so >>> please bear with me, if this is a FAQ. >>> >>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and >>> created the script shown at the bottom of the mail. >>> My bioperl version is bioperl-live nightly from 09-Jun-2009. >>> >>> The scripts throws an exception - see below. But, if I comment out >>> the '-enzymes' argument, so it uses the built-in collection of >>> enzymes, it works. >>> >>> My problem is, that I need to use some of the enzymes that are >>> only available in rebase. So how do I get this working? >>> >>> Thanks for your attention. >>> >>> Best regards, >>> Rasmus Ory Nielsen >>> >>> >>> ############################################################ >>> Output from the script: >>> ############################################################ >>> >>> [roni at ksdhcp ~]$ ./restriction_test.pl >>> >>> --------------------- WARNING --------------------- >>> MSG: The enzyme name CviKI-1 was changed to CviKI-I >>> --------------------------------------------------- >>> >>> ------------- EXCEPTION ------------- >>> MSG: Bad end parameter (11). 
End must be less than the total >>> length of sequence (total=7) >>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/ >>> 5.10.0/Bio/PrimarySeq.pm:401 >>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ >>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ >>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ >>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ >>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >>> STACK toplevel ./restriction_test.pl:30 >>> ------------------------------------- >>> >>> [roni at ksdhcp ~]$ >>> >>> >>> ############################################################ >>> Output from the script with the '-enzymes' argument commented out >>> ############################################################ >>> >>> [roni at ksdhcp ~]$ ./restriction_test.pl >>> >>> --------------------- WARNING --------------------- >>> MSG: The enzyme name CviKI-1 was changed to CviKI-I >>> --------------------------------------------------- >>> $VAR1 = [ >>> { >>> 'seq' => 'CTCGACCGTTAGCAA', >>> 'end' => 15, >>> 'start' => '1' >>> }, >>> { >>> 'seq' => 'AGCTTTCTACCGTTATCGT', >>> 'end' => 34, >>> 'start' => '16' >>> } >>> ]; >>> [roni at ksdhcp ~]$ >>> >>> ############################################################ >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> use Bio::PrimarySeq; >>> use Bio::Restriction::IO; >>> use Bio::Restriction::Analysis; >>> use Data::Dumper; >>> >>> # create seq obj >>> my $seqobj = new Bio::PrimarySeq( >>> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >>> -primary_id => 'test', >>> -molecule => 'dna' >>> ); >>> >>> # read rebase file >>> my $rebase_io = Bio::Restriction::IO->new( >>> -file => 'withrefm.906', >>> -format => 'withrefm', >>> ); >>> my $rebase_collection = 
$rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>     -seq => $seqobj,
>>>     -enzymes => $rebase_collection, # it works with this line commented out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;

From maj at fortinbras.us Tue Jun 16 13:58:56 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:58:56 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To:
References: <4A2F622D.5060500@ron.dk>
Message-ID:

Dear All,

There are tests for the new functionality of Bio::Restriction
now in t/Restriction on the branch, along with the withrefm.906
in t/data that revealed the bug in RON's post. All tests pass without
warnings on my machine (which is bioperl live, perl 5.10.10,
under Vista/cygwin - yes, I still don't have a real computer).
We're ready for a merge on my end.

Thanks all for your silent assent to these machinations.
cheers
Mark

From maj at fortinbras.us Tue Jun 16 13:51:14 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:51:14 -0400
Subject: [Bioperl-l] Hits Object
In-Reply-To: <4A37B2D7.70807@free.fr>
Message-ID: <3766B1A38606458EB5FA24D24371433D@NewLife>

Remi-
have a look at http://www.bioperl.org/wiki/HOWTO:SearchIO and
maybe http://www.bioperl.org/wiki/Parsing_BLAST_HSPs; perhaps
your questions will be answered there-
cheers,
Mark

From cjfields at illinois.edu Tue Jun 16 14:31:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:31:10 -0500
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To:
References: <4A2F622D.5060500@ron.dk>
Message-ID:

Everything passes on my end (Mac OS X 10.5, perl 5.10.0). +1 on the
merge.

Also (as mentioned some time back w/ Hilmar among others), we can
probably delete this branch seeing as the code will be merged to trunk
(it being a feature branch and all). Worth doing the same for a few
other feature branches as well.

chris

From cjfields at illinois.edu Tue Jun 16 15:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 14:07:44 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To:
References:
Message-ID:

Sounds to me like a BioPerl bug. Do you have some example data
demonstrating the problem?

chris

On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:

> Kevin,
>
> I'm getting struck by this old issue you once coded around.
>
> http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>
> Any chance you could share your implementation with fellow
> traveller...
>
> ??
>
> Thanks,
>
> Malcolm Cook
> Stowers Institute for Medical Research

From maj at fortinbras.us Tue Jun 16 15:32:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 15:32:02 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on and undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <91AC45F45A0F43D292323A711F0D5BDA@NewLife>

lomspace-

this

    my $in = new Bio::SearchIO(-format => 'blast',
                               -file => ">".$maid_dir."\\".$maid."aln_hu.aln",
                               -report_type => 'blastn');

should be

    my $in = new Bio::SearchIO(-format => 'blast',
                               -file => $maid_dir."\\".$maid."aln_hu.aln",
                               -report_type => 'blastn');

if you're reading the file. Then $result will have something in it
when you do $in->next_result

cheers,
MAJ

----- Original Message -----
From: "Mgavi Brathwaite"
To:
Sent: Tuesday, June 16, 2009 11:42 AM
Subject: [Bioperl-l] error message: can't call method "next_hit" on and undefined value

> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.
>
> sub hu_bl2seq_parser{
>     my ($maid, $maid_dir) = @_;
>     # Get the report
>     my $in = new Bio::SearchIO(-format => 'blast',
>                                -file => ">".$maid_dir."\\".$maid."aln_hu.aln",
>                                -report_type => 'blastn');
>     #open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");
>     #my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
>     my $result=$in->next_result;
>     my($hu_aln,$hu_mismatches);
>     # Get info about the first hit
>     my $hit = $result->next_hit;
>     my $name = $hit->name;
>     # get info about the first hsp of the first hit
>     my $hsp = $hit->next_hsp;
>     # get the alignment object
>     my $aln = $hsp->get_aln;
>     #my $percent_id = $hsp->percent_identity;
>     #my $aln_length = $hsp->length('total');
>     my @mismatches = $hsp->seq_inds('query','nomatch');
>     my $aln_str="";
>     # access the alignment string
>     my $strIO=IO::String->new($aln_str);
>     # write the string alignio in clustalw format
>     my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
>     # now the actual alignment string is accessible for printing or in
>     # this case moving to a db table
>     $alnio->write_aln($aln);
>     $hu_aln=$aln_str;
>     $hu_mismatches = scalar @mismatches;
>     return($hu_aln, $hu_mismatches);
> }
>
> The problem is at "my $hit = $result->next_hit;"
> Any help will be appreciated.
> LomSpace

From rmb32 at cornell.edu Tue Jun 16 15:46:40 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 16 Jun 2009 12:46:40 -0700
Subject: [Bioperl-l] error message: can't call method "next_hit" on and undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <4A37F6A0.1080907@cornell.edu>

Mgavi Brathwaite wrote:
> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.
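
For reference, the failure discussed in this thread can be reproduced with core Perl alone. The sketch below is not BioPerl code. It assumes, as the replies in this thread suggest but do not confirm, that a leading '>' in the -file value is treated like the mode prefix of Perl's two-argument open. The names count_lines and make_iterator are hypothetical helpers invented for the illustration; make_iterator stands in for a parser whose next_* methods return undef when nothing is left.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small stand-in "report" to a temporary file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "result 1\nresult 2\n";
close $tmp_fh or die "close failed: $!";

# Hypothetical helper: count the lines that can be read back from a file.
sub count_lines {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot read $file: $!";
    my $n = 0;
    while ( my $line = <$fh> ) {
        $n++;
    }
    close $fh;
    return $n;
}

my $before = count_lines($path);

# Two-argument open with a leading '>' opens the file for WRITING,
# which truncates whatever was in it.
open my $oops, ">$path" or die "cannot open $path: $!";
close $oops;

my $after = count_lines($path);

# Hypothetical stand-in for a parser: each call returns the next item,
# or undef once the items are exhausted.
sub make_iterator {
    my @items = @_;
    return sub { return shift @items };
}

# Guarding every call with a while loop means a method is never
# invoked on an undefined value.
my $next = make_iterator('hit A', 'hit B');
my @seen;
while ( defined( my $item = $next->() ) ) {
    push @seen, $item;
}

print "lines before: $before, lines after: $after, items seen: @seen\n";
```

In the BioPerl case the corresponding changes are the ones given in the replies: pass the path without the '>' prefix when reading a report, and test the return value of next_result and next_hit before calling methods on it.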

Your proximate problem seems to be that you are prepending a '>' to the
filename in your invocation of Bio::SearchIO::new, which I think might
cause it to write to the file instead of reading from it.

But also, you probably want to use next_result and next_hit in while
loops, since they return undef when there are no more hits or hsps to
parse. This is what is causing your "can't call next_hit on undefined
value" error. next_result() returns undef when there are no results to
parse.

by while loops, I mean something like:

    while( my $result = $in->next_result ) {
        while( my $hit = $result->next_hit ) {
            # insert the rest of your operations here
        }
    }

Hope this helps.

Rob

From maj at fortinbras.us Tue Jun 16 16:10:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 16:10:34 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To:
References: <4A2F622D.5060500@ron.dk>
Message-ID: <61179C22E04F479686C7F5CFEC496FB0@NewLife>

Right; will remove branch.
Will go ahead with merge at 21:20 UTC.
cheers
MAJ

From MEC at stowers.org Tue Jun 16 16:13:33 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 16 Jun 2009 15:13:33 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To:
References:
Message-ID:

Chris!

erm, yeah, I do....

... and I will schedule some time to code up a test and add it to
AlignI's suite....

Malcolm

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Chris Fields
> Sent: Tuesday, June 16, 2009 2:08 PM
> To: Malcolm Cook
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Alignment->slice() issue?
>
> Sounds to me like a BioPerl bug. Do you have some example
> data demonstrating the problem?
>
> chris
>
> On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:
>
> > Kevin,
> >
> > I'm getting struck by this old issue you once coded around.
> >
> > http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> >
> > Any chance you could share your implementation with fellow
> > traveller...
> >
> > ??
> >
> > Thanks,
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research

From maj at fortinbras.us Tue Jun 16 22:47:39 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 22:47:39 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To:
References: <4A2F622D.5060500@ron.dk>
Message-ID: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>

Dear All,

The refactored Bio::Restriction::* has been merged to trunk,
with all tests passing. [Anyone got a cigarette?]

cheers,
Mark

----- Original Message -----
From: "Mark A. Jensen"
To: "Rasmus Ory Nielsen" ;
Sent: Monday, June 15, 2009 7:49 PM
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]

> Dear All,
>
> The revamped Bio::Restriction::* in branch
>
> REPOS/bioperl-live/branches/restriction-refactor
>
> passes all existing tests, including those in t/Restriction.
> New tests will be added within the next day or so.
> The original bug occurred because only a subset of
> the possible rebase withrefm-formatted enzymes were
> handled; it choked on freshly-downloaded rebase
> files because of this.
>
> The refactored version now handles *all* rebase types,
> including those of rebase forms
>
>   XXX^X           [ intrasite cutters, the main types
>                     built in to base.pm]
>   XXXX(m/n)       [ right-end extrasite cutters ]
>   (s/t)XXXX       [ left-end ditto ]
>   (s/t)XXXX(m/n)  [ double-end ditto],
>
> palindromic and non-palindromic, as well as multisite
> enzymes that string together combinations of these
> forms.
Much rationalization (well, seems rational to me > anyway) and cruft removal in the affected code has also > occurred. itype2.pm has been updated as well, to > conform to the refactoring. > > If you're dying to try this now, get a working copy > of the branch like so > > $ svn co > svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor > bioperl-rr > $ cd bioperl-rr > $ perl Build.PL > $ ./Build test > $ ./Build install > > This will only hammer your current installation in the > $SITE_LIB/Bio/Restriction path; I worked only on > a sparse checkout of the necessary files. To revert to your > old install, do > > $ cd $MY_OLD_BIOPERL_WORKINGDIR > $ ./Build install > > [In the possible event that these instructions are in error, > there will be a response on this list in a matter of > milliseconds, so stand by.] > > Happy coding- > Mark > > > > > ----- Original Message ----- > From: "Rasmus Ory Nielsen" > To: > Sent: Wednesday, June 10, 2009 3:35 AM > Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using > rebasefile. > > >> Hi, >> >> This is my first time using bioperl for restriction analysis, so please bear >> with me, if this is a FAQ. >> >> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the >> script shown at the bottom of the mail. >> My bioperl version is bioperl-live nightly from 09-Jun-2009. >> >> The scripts throws an exception - see below. But, if I comment out the >> '-enzymes' argument, so it uses the built-in collection of enzymes, it works. >> >> My problem is, that I need to use some of the enzymes that are only available >> in rebase. So how do I get this working? >> >> Thanks for your attention. 
>> >> Best regards, >> Rasmus Ory Nielsen >> >> >> ############################################################ >> Output from the script: >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: Bad end parameter (11). End must be less than the total length of >> sequence (total=7) >> STACK Bio::PrimarySeq::subseq >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401 >> STACK Bio::Restriction::Analysis::_enzyme_sites >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >> STACK Bio::Restriction::Analysis::_cuts >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >> STACK Bio::Restriction::Analysis::cut >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >> STACK Bio::Restriction::Analysis::fragment_maps >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >> STACK toplevel ./restriction_test.pl:30 >> ------------------------------------- >> >> [roni at ksdhcp ~]$ >> >> >> ############################################################ >> Output from the script with the '-enzymes' argument commented out >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> $VAR1 = [ >> { >> 'seq' => 'CTCGACCGTTAGCAA', >> 'end' => 15, >> 'start' => '1' >> }, >> { >> 'seq' => 'AGCTTTCTACCGTTATCGT', >> 'end' => 34, >> 'start' => '16' >> } >> ]; >> [roni at ksdhcp ~]$ >> >> ############################################################ >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use 
Bio::PrimarySeq; >> use Bio::Restriction::IO; >> use Bio::Restriction::Analysis; >> use Data::Dumper; >> >> # create seq obj >> my $seqobj = new Bio::PrimarySeq( >> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >> -primary_id => 'test', >> -molecule => 'dna' >> ); >> >> # read rebase file >> my $rebase_io = Bio::Restriction::IO->new( >> -file => 'withrefm.906', >> -format => 'withrefm', >> ); >> my $rebase_collection = $rebase_io->read; >> >> # start restriction analysis >> my $restriction_analysis = Bio::Restriction::Analysis->new( >> -seq => $seqobj, >> -enzymes => $rebase_collection, # it works with this line commented >> out >> ); >> >> # retrieve fragment maps >> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >> print Dumper \@fragment_maps; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Tue Jun 16 23:21:22 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 17 Jun 2009 15:21:22 +1200 Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] In-Reply-To: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife> References: <4A2F622D.5060500@ron.dk> <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife> Message-ID: <18DF7D20DFEC044098A1062202F5FFF3297FF3E2E4@exchsth.agresearch.co.nz> Cigarettes are post-coitus and pre-firing squad. What you'd be needing is a cigar (proud father) ;-) > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Wednesday, 17 June 2009 2:48 p.m. 
> To: bioperl-l at lists.open-bio.org > Cc: Rasmus Ory Nielsen > Subject: Re: [Bioperl-l] Bio::Restriction refactor > [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] > > Dear All, > > The refactored Bio::Restriction::* has been merged to trunk, with all > tests passing. [Anyone got a cigarette?] > > cheers, > Mark ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From e.stupka at ucl.ac.uk Wed Jun 17 07:29:08 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 12:29:08 +0100 Subject: [Bioperl-l] Next-gen modules Message-ID: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Dear all, after several years of absence I am slowly coming back to Bioperl, and hope to contribute again to its development. One area that I was thinking of starting from, since we are actively involved with it, is to improve Bioperl's support for next-gen sequencing data, tools, etc. Since I am sure I have missed out on a lot of recent developments, do let me know if/what is useful. One example that comes to mind is that the conversion of various formats to/from FASTQ does not seem to be supported. Some code can be found within Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be good if it could make its way into SeqIO? And similarly, potentially, for other next-gen sequence formats? Similarly, there seems to be little in bioperl-run to support tools that have been developed in this area, such as Maq, BowTie, TopHat, etc? Do let me know if there is a past thread on this, or other people actively developing, etc. so that I can find out what priorities are. thanks and best regards to all (old friends and new), Elia --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From maj at fortinbras.us Wed Jun 17 08:19:04 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 17 Jun 2009 08:19:04 -0400 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <4C3D793879C64A5E84C67FE313C86FA4@NewLife> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl ] From biopython at maubp.freeserve.co.uk Wed Jun 17 08:21:17 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Jun 2009 13:21:17 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <320fb6e00906170521m7d997334j321d92fda2da4114@mail.gmail.com> On Wed, Jun 17, 2009 at 12:29 PM, Elia Stupka wrote: > One example that comes to mind is that the conversion of various formats > to/from FASTQ does not seem to be supported. Some code can be found within > Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be > good if it could make its way into SeqIO? And similarly, potentially, for > other next-gen sequence formats?
If you do add FASTQ support to BioPerl's SeqIO (and I think that is a good idea), please could you follow the format names used by Biopython - as this time we got there first ;) I'm asking this as Biopython's SeqIO tries to use the same format names as BioPerl's SeqIO and EMBOSS, see http://biopython.org/wiki/SeqIO Specifically, * "fastq" in Biopython means the original Sanger standard FASTQ files encoding PHRED qualities using an ASCII offset of 33. * "fastq-solexa" in Biopython means the early Solexa/Illumina style FASTQ files which encode Solexa qualities using an ASCII offset of 64. * "fastq-illumina" in Biopython will mean recent Solexa/Illumina style FASTQ files (from pipeline version 1.3+) which encode PHRED qualities using an ASCII offset of 64. This is in the Biopython repository, but hasn't been released yet - so the name "fastq-illumina" isn't set in stone yet. For good quality reads, PHRED and Solexa scores are approximately equal, so the "fastq-solexa" and "fastq-illumina" variants are almost equivalent. > Similarly, there seems to be little in bioperl-run to support tools that > have been developed in this area, such as Maq, BowTie, TopHat, etc? > > Do let me know if there is a past thread on this, or other people actively > developing, etc. so that I can find out what priorities are. Have you seen these recent threads?: http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html Regards, Peter (at Biopython) From maj at fortinbras.us Wed Jun 17 08:02:11 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 17 Jun 2009 08:02:11 -0400 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <92C15E3391F64BAF801754E924122540@NewLife> Elia-- I say a definite +1; in fact, this sounds like it should be a Hot Topic (see http://www.bioperl.org/wiki/Category:Hot_Topics for some others you might have missed in your hiatus...). I will create a page that can be a central point for wish lists, discussion, etc. There has been much discussion of late about FASTQ http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html cheers from a newbie, Mark From cjfields at illinois.edu Wed Jun 17 08:57:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 07:57:52 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: Elia, As Mark indicated, we recently discussed the lack of support for next-gen on list, at least re: fastq. I may be hit with the same thing in a few months' time myself, and I recall Jason and a few others also mentioning the same. Heikki wrote some code for Illumina FASTQ for SeqIO and related modules but I don't believe it has been committed to trunk yet, so maybe he can answer. From prior discussions IIRC the issues were: 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina 1.3) from one another (so maybe some optional validation), and 2) having a way for the Seq object to either 'know' what format is contained, or we use phred score and convert back and forth from that (I think the latter makes more sense). Peter's suggestions also are reasonable, though does biopython have a separate module for each of these variations? Our version (I believe) mainly varied the conversion within Bio::SeqIO::fastq itself based on the fastq variant passed in as a separate named argument.
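The two points above (naming the variant explicitly, and using PHRED as the common score) can be sketched in plain Perl. This is only an illustration under stated assumptions, not the Bio::SeqIO::fastq code: the %VARIANT table and the subs decode_quality and solexa_to_phred are hypothetical names made up for this example. The offsets and score types follow the variant definitions given earlier in this thread (Sanger: PHRED scores, offset 33; early Solexa: Solexa scores, offset 64; Illumina 1.3+: PHRED scores, offset 64), and the Solexa-to-PHRED step uses the usual conversion formula.

```perl
use strict;
use warnings;

# ASCII offset and score type per FASTQ variant. Hypothetical table;
# the keys follow the Biopython format names quoted in this thread.
my %VARIANT = (
    'fastq'          => { offset => 33, scores => 'phred'  },
    'fastq-solexa'   => { offset => 64, scores => 'solexa' },
    'fastq-illumina' => { offset => 64, scores => 'phred'  },
);

# Convert one Solexa score to the equivalent PHRED score:
#   Q_phred = 10 * log10(10**(Q_solexa/10) + 1)
# Perl's log() is the natural log, hence the division by log(10).
sub solexa_to_phred {
    my ($q) = @_;
    return 10 * log(10 ** ($q / 10) + 1) / log(10);
}

# Decode a FASTQ quality string into a list of integer PHRED scores.
# The caller must name the variant; nothing is guessed from the data.
sub decode_quality {
    my ($qual, $variant) = @_;
    my $v = $VARIANT{$variant}
        or die "Unknown FASTQ variant '$variant'\n";
    my @scores = map { ord($_) - $v->{offset} } split //, $qual;
    if ($v->{scores} eq 'solexa') {
        # Round the converted scores to the nearest integer.
        @scores = map { sprintf('%.0f', solexa_to_phred($_)) + 0 } @scores;
    }
    return @scores;
}

# 'I' is ASCII 73 and '9' is ASCII 57; with offset 33 they decode to 40 and 24.
my @phred = decode_quality('II9', 'fastq');
print "@phred\n";    # 40 40 24
```

Keeping the variant as a plain lookup key means one decoder serves all three flavours, which is close in spirit to passing the variant as a named argument; whether the real module should store PHRED scores internally is still the open question here.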
As for the wrappers, we would most certainly welcome them! chris From e.stupka at ucl.ac.uk Wed Jun 17 08:54:22 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 13:54:22 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> Message-ID: Dear Mark, thanks a lot for the pointers. With regards to FASTQ parsing: - My understanding from reading past threads is that we should work on a single format, i.e. FASTQ, and interpret the quality "flavours" as just quality conversions, right? - However, I assume we would still want to support a simple way for the user to say format => 'fastq-solexa' using the nomenclature adopted in BioPython suggested by Peter, right? - I also saw Heikki's "long essay", but have not yet compared it to Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl; I guess we would hope they produce identical outputs, which will be a good check. Finally, I saw Tristan's reply to Heikki's thread, so what is the status quo? Is it moving forward? cheers Elia --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From biopython at maubp.freeserve.co.uk Wed Jun 17 09:25:59 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Jun 2009 14:25:59 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields wrote: > From prior discussions IIRC the issues were: > > 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina > 1.3) from one another (so maybe some optional validation), and Following the python rule of thumb for being explicit, Biopython makes the user specify which FASTQ variant is being used. I don't think you can do anything else.
Any attempted validation would have to be heuristic based on the ASCII characters found, and would risk false positive warnings. > 2) having a way for the Seq object to either 'know' what format is > contained, or we use phred score and convert back and forth from that (I > think the latter makes more sense). I think it could make sense for BioPerl to convert Solexa scores to/from PHRED scores on the fly (especially now that Illumina is abandoning the Solexa score system). Python style tries to avoid implicit conversions, so Biopython doesn't automatically do a conversion from Solexa to PHRED scores on parsing (but will on writing if the requested output format requires this). > Peter's suggestions also are reasonable, though does biopython have a > separate module for each of these variations? Our version (I believe) > mainly varied the conversion within Bio::SeqIO::fastq itself based on the > fastq variant passed in as a separate named argument. Biopython's SeqIO gives the three FASTQ variants their own unique names. This format name is a required argument for parsing/writing (we don't try and guess the file format from the data contents). Internally we have three separate FASTQ parsers/writers although they do share code. Other issues to keep in mind: (3) There should be no warning parsing files where the optional repeated title is missing on the "+" lines (as discussed earlier on the BioPerl list). (4) When writing FASTQ files should BioPerl omit the optional repeated title on the "+" line? Biopython omits this as I understand this to be common practice, and it can make a big difference to file sizes - especially on short read data from Solexa/Illumina. (5) Also test reading and writing files with an optional description (as well as an identifier) on the "@" (and "+") lines. See the NCBI SRA for examples, e.g.
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC (6) Test reading and writing files where the encoded quality string starts with a "@" or a "+" character, e.g. http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html Peter From tristan.lefebure at gmail.com Wed Jun 17 09:27:12 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 17 Jun 2009 09:27:12 -0400 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> Message-ID: <200906170927.13273.tristan.lefebure@gmail.com> Hello, Regarding next-gen sequences and bioperl, following my experience, another issue is bioperl speed. For example, if you want to trim bad quality bases at ends of 1E6 Solexa reads using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well, you've got to be patient (but may be I missed some shortcuts...). A pure perl solution will be between 100 to 1000x faster... Would it be possible to have an ultra-light quality object with few simple methods for next-gen reads? I can contribute some tests if that sounds like an important point. -Tristan On Wednesday 17 June 2009 08:02:11 Mark A. Jensen wrote: > Elia-- > I say a definite +1; in fact, this sounds like it should > be a Hot Topic (see > http://www.bioperl.org/wiki/Category:Hot_Topics for some > others you might have missed in your hiatus...). I will > create a page that can be a central point for wish lists, > discussion, etc. 
>
> There has been much discussion of late about FASTQ
> http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html
>
> cheers from a newbie,
> Mark
>
> ----- Original Message -----
> From: "Elia Stupka"
> To:
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
> > Dear all,
> >
> > after several years of absence I am slowly coming back
> > to Bioperl, and hope to contribute again to its
> > development.
> >
> > One area that I was thinking of starting from, since we
> > are actively involved with it, is to improve Bioperl's
> > support for next-gen sequencing data, tools, etc. Since
> > I am sure I have missed out on a lot of recent
> > developments, do let me know if/what is useful.
> >
> > One example that comes to mind is that the conversion
> > of various formats to/from FASTQ does not seem to be
> > supported. Some code can be found within Li Heng's
> > script: http://maq.sourceforge.net/fq_all2std.pl but
> > it would be good if it could make its way into SeqIO?
> > And similarly, potentially, for other next-gen sequence
> > formats?
> >
> > Similarly, there seems to be little in bioperl-run to
> > support tools that have been developed in this area,
> > such as Maq, BowTie, TopHat, etc?
> >
> > Do let me know if there is a past thread on this, or
> > other people actively developing, etc. so that I can
> > find out what priorities are.
> >
> > thanks and best regards to all (old friends and new),
> >
> > Elia
> >
> > ---
> > Senior Lecturer, Bioinformatics
> > UCL Cancer Institute
> > Paul O' Gorman Building
> > University College London
> > Gower Street
> > WC1E 6BT
> > London
> > UK
> >
> > Office (UCL): +44 207 679 6493
> > Office (ICMS): +44 0207 8822374
> >
> > Mobile: +44 7597 566 194
> > Mobile (Italy): +39 338 8448801
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Wed Jun 17 09:54:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:54:45 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To:
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <320fb6e00906170654m735dc054iaf94fa2f86647002@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:54 PM, Elia Stupka wrote:
>
> Dear Mark,
>
> thanks a lot for the pointers.
>
> With regards to FASTQ parsing:
>
> -my understanding by reading past threads is to work on a single format,
> i.e. FASTQ and to interpret the quality "flavours" as just quality
> conversions, right?
> -However, I assume we would still want to support a simple way for the user
> to say format => 'fastq-solexa' using the nomenclature adopted in BioPython
> suggested by Peter, right?

I think you will need a way for the user to say they have a Solexa, or an
Illumina 1.3+, or an original Sanger standard FASTQ file.

From reading the http://bioperl.org/wiki/HOWTO:SeqIO wiki page, I assumed
BioPerl's SeqIO just had formats (e.g. the "chadoxml" format and the variant
"flybase_chadoxml" format).
Does BioPerl's SeqIO format system have any concept of flavour
that I am not aware of?

> -I also saw Heikki's "long essay", but did not yet compare to Heng Li's code
> at http://maq.sourceforge.net/fq_all2std.pl, I guess we would hope they
> would produce identical outputs, will be a good check.

Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl is a useful guide
(although it doesn't yet cope with the new Illumina 1.3+ variant), but I
don't trust it 100%. See e.g.
http://lists.open-bio.org/pipermail/biopython/2009-June/005208.html
http://lists.open-bio.org/pipermail/biopython/2009-June/005209.html

Peter

From john.marshall at sanger.ac.uk Wed Jun 17 09:28:12 2009
From: john.marshall at sanger.ac.uk (John Marshall)
Date: Wed, 17 Jun 2009 14:28:12 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>

On 17 Jun 2009, at 12:29, Elia Stupka wrote:

> Similarly, there seems to be little in bioperl-run to support tools
> that have been developed in this area, such as Maq, BowTie, TopHat,
> etc?

FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to submit
in the not too distant future. (First it needs some "blah blah"
replaced with actual documentation and a test suite.)

Cheers,

John

[1] http://www.ebi.ac.uk/~zerbino/velvet/

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From Kevin.M.Brown at asu.edu Wed Jun 17 11:41:18 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 17 Jun 2009 08:41:18 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>

Warning: This is very ugly code and makes a few assumptions, such as the
alignment objects are stored in order of their start position. I made
this assumption as that is how I put them into the object to begin with.

# modules the sub below relies on (floor, max, min and the Bio classes)
use POSIX qw(floor);
use List::Util qw(max min);
use Bio::SimpleAlign;
use Bio::LocatableSeq;

=head1 C<slice>

Function to slice up an alignment based on start and end parameters.
The alignment and the variable that receives the new alignment object
are both passed by reference; the sub itself returns 1.

slice(\$alignment, $start, $end, \$new_align)

=cut

sub slice
{
    my ($alignment, $start, $end, $new_align) = @_;

    $$new_align = Bio::SimpleAlign->new;
    print $$alignment->no_sequences() . "\n";

    # the first sequence is always included, cut down to the requested window
    $$new_align->add_seq(
        Bio::LocatableSeq->new(
            -seq => substr(
                $$alignment->get_seq_by_pos(1)->seq(),
                $start - 1, $end - $start + 1
            ),
            -id    => $$alignment->get_seq_by_pos(1)->display_id(),
            -start => max($$alignment->get_seq_by_pos(1)->start - $start + 1, 1),
            -end   => min(
                $$alignment->get_seq_by_pos(1)->end - $start + 1,
                $end - $start + 1
            ),
            -alphabet => 'dna',
            -strand   => $$alignment->get_seq_by_pos(1)->strand()
        )
    );

    # implement a binary search to determine a decent offset into the alignment
    my $probe;

    if ($$alignment->no_sequences() <= 2) {
        $probe = $$alignment->no_sequences();
    }
    else {
        my ($L, $R) = (1, $$alignment->no_sequences());
        while (($R - $L) > 1)
        {
            $probe = floor(($R + $L) / 2);

            # gotta watch this. Had the check backwards and so was never going
            # in the right direction for the search. If I reverse these two
            # variables, then I have to either reverse the conditions or change
            # the > to a <.
            if ($$alignment->get_seq_by_pos($probe)->start() > $start)
            {
                $R = $probe;
            }
            else
            {
                $L = $probe;
            }
        }
    }

    # now go through the results that are after that point
    for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
    {
        my $seq = $$alignment->get_seq_by_pos($i);
        last if ($seq->start() > $end);

        # Only concern ourselves with primers that land inside the desired region
        # other primers will show up in the image maps for each gene.
        if ($seq->start() >= $start && $seq->end() <= $end)
        {

            # values for the substr pullout of a given sequence
            # (computed but not used below: the whole sequence is added)
            my $offset = max($start - $seq->start(), 0);
            my $length = min($end, $seq->end()) - max($start, $seq->start()) + 1;
            $$new_align->add_seq(
                Bio::LocatableSeq->new(
                    -seq      => $seq->seq(),
                    -id       => $seq->display_id(),
                    -start    => max($seq->start - $start + 1, 1),
                    -end      => min($seq->end - $start + 1, $end - $start + 1),
                    -alphabet => 'dna',
                    -strand   => $seq->strand()
                )
            );
        }
    }
    return 1;
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Malcolm Cook
> Sent: Tuesday, June 16, 2009 1:07 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Alignment->slice() issue?
>
> Kevin,
>
> I'm getting struck by this old issue you once coded around.
>
> http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>
> Any chance you could share your implementation with fellow
> traveller...
>
> ??
>
> Thanks,
>
> Malcolm Cook
> Stowers Institute for Medical Research

From maj at fortinbras.us Wed Jun 17 12:47:38 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 12:47:38 -0400
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To:
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
Message-ID: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>

Hi All,

I thought I'd revisit this thread, since in the last couple weeks, I have
used both techniques (bioperl-dev and branch from trunk) to produce
completed projects. My thoughts:

Using bioperl-dev was very nice for creating Bio::Search::Tiling, a new
addition to the core api. There was no pressure to conform to the existing
api there.
In particular, there was no implicit insistence to make things
work through Bio::Search::Utils, and I was free to factor it out. The Tiling
api was definitely unstable until the end, when it was ported to the core.
As I made regular reports to bioperl-l, everything was transparent and up
front, and I received excellent suggestions there (as usual).

For Bio::Restriction, using the branch was just as natural. Here, the
existing structure was well established, and all the work needed to happen
beneath the api. All old t/Restriction tests needed to pass, and additional
ones had to be created for the new functionality. So here, using bioperl-dev
wasn't natural, even though some "experiments" needed to be tried (some
succeeded and some failed, as you can see in the commentary at Bug #2855).
Even though the new code turned out to require substantial effort, the
effort was required to fix a true bug in the working core, and any fixes
needed to work transparently with respect to the users for whom this bug had
not been an issue. Using the branch made it relatively easy to merge quickly
back into the core when done, and an open branch also provides a certain
psychological pressure which is helpful.

Hilmar raised the very good point in the previous discussion that
(essentially) bioperl-dev shouldn't become a sandbox with lots of unfinished
code scraps and derelict stuff that doesn't work. My view is bioperl-dev
will become a sandbox only if we treat it like one. I've filled out the
Bioperl-dev page on the wiki (http://www.bioperl.org/wiki/Bioperl-dev) with
this in mind. Providing some recognition to devs there whose modules become
part of the core may be a better way to ensure that projects that are
started on bioperl-dev actually get finished, than to prescribe beforehand
what kinds of projects may get started. I believe this follows the adage of
liberality on what is accepted, and strictness on what is emitted.

cheers,
MAJ

----- Original Message -----
From: "Hilmar Lapp"
To: "Chase Miller"
Cc: "BioPerl List"
Sent: Thursday, May 21, 2009 4:00 PM
Subject: Re: [Bioperl-l] bioperl-dev or branch?

> Moving this question to the BioPerl list, which is where we need to
> discuss this I think. Can someone refresh my memory on what the
> Bioperl-dev repository is or was meant for? It doesn't seem documented
> on the wiki.
>
> My (admittedly vague) recollection is that bioperl-dev is basically
> for highly experimental changes or functionality.
>
> I'm not clear why everything else shouldn't go either into the main
> trunk or into a branch. If there is a realistic expectation for
> something to be folded into the main trunk sooner or later, what would
> be the reasons for not putting it into a branch of the main
> repository? If we are putting it into a separate repository, we're
> waiving a lot of svn's support for merging and resolving concurrent
> edits.
>
> I would actually go a step further and suggest that even if
> this GSoC project starts out on a branch (which I can see good reasons
> for, such as eliminating fear to disrupt something), there should be a
> plan to move to main trunk before the end of the project. We've had a
> good tradition in BioPerl of developing directly on the main trunk. It
> sometimes leads to occasional disruptions when lots of tests seem to be
> failing, but it also encourages development discipline and makes new
> code melt into the BioPerl code base without requiring any extra
> steps by someone.
>
> Any and all thoughts or comments welcome and appreciated!
>
> -hilmar
>
> On May 21, 2009, at 11:26 AM, Chase Miller wrote:
>
>> This brings me to a question about where I should have my code
>> repository. Originally, I was going to use Bioperl-dev, but it was
>> brought to my attention that that repository does not normally
>> receive daily updates and it might not be the right place for my day
>> to day development.
>> An alternative would be to use something like
>> google code on a daily basis and commit to Bioperl-dev on a weekly
>> basis.
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
> ===========================================================

From cjfields at illinois.edu Wed Jun 17 13:06:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:06:44 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
Message-ID: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>

On Jun 17, 2009, at 8:25 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields wrote:
>>
>> Elia,
>>
>> As Mark indicated, we recently discussed the lack of support for
>> next-gen on list, at least re: fastq. I may be hit with the same
>> thing in a few months time myself, and I recall Jason and a few
>> others also mentioning the same. Heikki wrote some code for Illumina
>> FASTQ for SeqIO and related modules but I don't believe it has been
>> committed to trunk yet, so maybe he can answer.
>>
>> From prior discussions IIRC the issues were:
>>
>> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0,
>> Illumina 1.3) from one another (so maybe some optional validation), and
>
> Following the python rule of thumb for being explicit, Biopython makes
> the user specify which FASTQ variant is being used. I don't think you
> can do anything else. Any attempted validation would have to be
> heuristic based on the ASCII characters found, and would risk false
> positive warnings.

Right; I'm thinking along the same lines. If anything the most we would
allow is some level of validation, so if there were a degree of
uncertainty about the format one could set a validation flag to check
bounds during the parse and warn if they are exceeded.

>> 2) having a way for the Seq object to either 'know' what format is
>> contained, or we use phred score and convert back and forth from
>> that (I think the latter makes more sense).
>
> I think it could make sense for BioPerl to convert Solexa scores to/from
> PHRED scores on the fly (especially now that Illumina is abandoning
> the Solexa score system). Python style tries to avoid implicit
> conversions, so Biopython doesn't automatically do a conversion from
> Solexa to PHRED scores on parsing (but will on writing if the requested
> output format requires this).
>
>> Peter's suggestions also are reasonable, though does biopython have a
>> separate module for each of these variations? Our version (I believe)
>> mainly varied the conversion within Bio::SeqIO::fastq itself based
>> on the fastq variant passed in as a separate named argument.
>
> Biopython's SeqIO gives the three FASTQ variants their own unique
> names. This format name is a required argument for parsing/writing
> (we don't try and guess the file format from the data contents).
> Internally we have three separate FASTQ parsers/writers although they
> do share code.

We could easily do the same if others agree. Actually, if we specified
that shorthand for a variant on a format would be designated as
-format => 'format-variant', I think we could easily hack SeqIO to deal
with that by splitting on '-' and passing everything to the constructor
as (-format => 'format', -variant => 'variant'). Very little repeated
code in this case, just an additional named parameter indicating the
format variant (and the SeqIO class can do the type checking on that
within the constructor).
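
The 'format-variant' shorthand proposed above could be handled roughly
as in the sketch below. This is only an illustration under stated
assumptions: split_format_variant is a hypothetical helper and not an
existing BioPerl function, and the variant names (sanger, solexa,
illumina) are assumed here rather than taken from any released module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn a 'format-variant'
# string into separate -format and -variant named arguments.
sub split_format_variant {
    my ($spec) = @_;

    # Split on the first '-' only, so 'fastq-solexa' gives
    # ('fastq', 'solexa') and a plain 'fastq' has no variant.
    my ($format, $variant) = split /-/, lc($spec), 2;

    my @args = (-format => $format);
    push @args, (-variant => $variant)
        if defined $variant && length $variant;
    return @args;
}

# Assumed variant names, for illustration only. The type checking
# would live in the constructor of the format class.
my %known_variant = map { $_ => 1 } qw(sanger solexa illumina);

for my $spec (qw(fastq fastq-solexa fastq-illumina)) {
    my %arg     = split_format_variant($spec);
    my $variant = $arg{'-variant'};

    die "unknown variant '$variant'\n"
        if defined $variant && !$known_variant{$variant};

    printf "%-15s => format=%s variant=%s\n",
        $spec, $arg{'-format'}, defined $variant ? $variant : '(none)';
}
```

Splitting with a limit of 2 keeps anything after the first '-' together
as the variant, so format names without a '-' pass through unchanged.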

> Other issues to keep in mind:
>
> (3) There should be no warning parsing files where the optional
> repeated title is missing on the "+" lines (as discussed earlier on
> the BioPerl list).

Agreed, though we'll have to check the current fastq parser to see if
that's currently the case. I thought that was fixed but maybe not?

> (4) When writing FASTQ files should BioPerl omit the optional repeated
> title on the "+" line? Biopython omits this as I understand this to be
> common practice, and can make a big difference to file sizes -
> especially on short read data from Solexa/Illumina.

Agreed, particularly if it's commonly encountered.

> (5) Also test reading and writing files with an optional description
> (as well as an identifier) on the "@" (and "+") lines. See the NCBI
> SRA for examples, e.g.
>
> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC

Should be easy enough to implement with a simple regex.

> (6) Test reading and writing files where the encoded quality string
> starts with a "@" or a "+" character, e.g.
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>
> Peter

Mark, getting all that? ;>

chris

From cjfields at illinois.edu Wed Jun 17 13:09:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:09:54 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
Message-ID:

On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed.
> For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but
> maybe I missed some shortcuts...).

The key issues affecting speed in bioperl are contained object
instantiation and inheritance (and between those two, the latter much
more so as it plays a role with contained objects as well as the
container).

http://www.bioperl.org/wiki/Why_BioPerl_is_slow

Moose/Perl6 roles/traits are one way around that issue, but we are a
ways off from getting that running. I think to get that working
decently would be a from-ground-up endeavor (see my past posts on
biomoose/bioperl6).

> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan

The quality objects themselves I don't think are that heavy; I think
the main impediment is inheritance. One could get around that a bit by
using a direct_new method to create a blessed hash directly, then
reimplement methods to lazily create any objects contained on the fly.

chris

From bill at genenformics.com Wed Jun 17 13:03:16 2009
From: bill at genenformics.com (bill at genenformics.com)
Date: Wed, 17 Jun 2009 10:03:16 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
Message-ID: <92dadb76ce7d7b8eeb4644b47ef1a81f.squirrel@mail.dreamhost.com>

Hopefully this is helpful.

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/seqalign/Dense_seg.cpp#L648

Bill at genenformics

From maj at fortinbras.us Wed Jun 17 13:13:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 13:13:23 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
 <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>

I'm on the case! (but maybe not in realtime, today!)

From e.stupka at ucl.ac.uk Wed Jun 17 13:49:38 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 18:49:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To:
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
Message-ID:

I would suggest developing the "standard" version first, then moving
onto potential optimizations. When we went through a similar argument
in Ensembl about 8 years ago we ended up dropping Bio::Root
completely...

If one is truly after performance for these large next-gen projects,
it'd be down to pure piping, shell, and worrying about location and
copying of files, sticking to systems-level as much as possible, and
quite far from Bioperl altogether, so I think it's a whole different
level of optimization issues, probably outside the scope of Bioperl.

Elia

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801

From cjfields at illinois.edu Wed Jun 17 13:52:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:52:49 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
 <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
 <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
Message-ID: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>

I think this is a top priority for a fall BioPerl release, maybe 1.6.2
(I am planning on a summer 1.6.1 release still). Made it into a bug
report for tracking:

http://bugzilla.open-bio.org/show_bug.cgi?id=2857

If no one works on this I may take it up after the 1.6.1 release.

chris

On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote:

> I'm on the case! (but maybe not in realtime, today!)
>>>> Heikki wrote some code for Illumina FASTQ for SeqIO and related >>>> modules but >>>> I don't believe it has been committed to trunk yet, so maybe he >>>> can answer. >>>> >>>> From prior discussions IIRC the issues were: >>>> >>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina >>>> 1.0, Illumina >>>> 1.3) from one another (so maybe some optional validation), and >>> >>> Following the python rule of thumb for being explicit, Biopython >>> makes >>> the user specify which FASTQ variant is being used. I don't think >>> you >>> can do anything else. Any attempted validation would have to be >>> heuristic based on the ASCII characters found, and would risk false >>> positive warnings. >> >> Right; I'm thinking along the same lines. If anything the most we >> would allow is some level of validation, so if there were a degree >> of uncertainty about the format one could set a validation flag to >> check bounds during the parse and warn if they are exceeded. >> >>>> 2) having a way for the Seq object to either 'know' what format is >>>> contained, or we use phred score and convert back and forth from >>>> that (I >>>> think the latter makes more sense). >>> >>> I think it could make sense for BioPerl to convert Solexa scores >>> to/ from >>> PHRED scores on the fly (especially now that Illumina is abandoning >>> the Solexa score system). Python style tries to avoid implicit >>> conversions, >>> so Biopython doesn't automatically do a conversion from Solexa to >>> PHRED scores on parsing (but will on writing if the requested output >>> format requires this). >>> >>>> Peter's suggestions also are reasonable, though does biopython >>>> have a >>>> separate module for each of these variations? Our version (I >>>> believe) >>>> mainly varied the conversion within Bio::SeqIO::fastq itself >>>> based on the >>>> fastq variant passed in as a separate named argument. >>> >>> Biopython's SeqIO gives the three FASTQ variants their own unique >>> names. 
This format name is a required argument for parsing/writing >>> (we don't try and guess the file format from the data contents). >>> Internally >>> we have three separate FASTQ parsers/writers although they do share >>> code. >> >> We could easily do the same if others agree. Actually, if we >> specified that shorthand for a variant on a format would be >> designated as -format => 'format-variant', I think we could easily >> hack SeqIO to deal with that by splitting on '-' and passing >> everything to the constructor as (-format => 'format', -variant => >> 'variant'). Very little repeated code in this case, just an >> additional named parameter indicating the format variant (and the >> SeqIO class can do the type checking on that within the >> constructor). >> >>> Other issues to keep in mind: >>> >>> (3) There should be no warning parsing files where the optional >>> repeated >>> title is missing on the "+" lines (as discussed earlier on the >>> BioPerl list). >> >> Agreed, though we'll have to check the current fastq parser to see >> if that's currently the case. I thought that was fixed but maybe >> not? >> >>> (4) When writing FASTQ files should BioPerl omit the optional >>> repeated >>> title on the "+" line? Biopython omits this as I understand this >>> to be >>> common practice, and can make a big different to file sizes - >>> especially >>> on short read data from Solexa/Illumina. >> >> Agreed, particularly if it's commonly encountered. >> >>> (5) Also test reading and writing files with an optional >>> description (as well >>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA >>> for examples, >>> e.g. >>> >>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC >>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC >> >> Should be easy enough to implement with a simple regex. 
>>
>>> (6) Test reading and writing files where the encoded quality
>>> string starts
>>> with a "@" or a "+" character, e.g.
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>
>>> Peter
>>
>> Mark, getting all that? ;>
>>
>> chris

From e.stupka at ucl.ac.uk Wed Jun 17 14:01:28 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:01:28 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife> <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
Message-ID:

If we reach a consensus on how/who/what, I will be happy to contribute some coding time in the coming days.

Would it be a good starting point to start adding the different formats as named in BioPython, and test support for reading/writing them? I could start playing with that.

regards,

Elia

On 17 Jun 2009, at 18:52, Chris Fields wrote:

> I think this is a top priority for a fall BioPerl release, maybe
> 1.6.2 (I am planning on a summer 1.6.1 release still). Made it into
> a bug report for tracking:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2857
>
> If no one works on this I may take it up after the 1.6.1 release.
>
> chris
>
> On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote:
>
>> I'm on the case! (but maybe not in realtime, today!)
>> >> ----- Original Message ----- From: "Chris Fields" > > >> To: "Peter" >> Cc: "BioPerl List" ; "Elia Stupka" > >; "Heikki Lehvaslaiho" >> Sent: Wednesday, June 17, 2009 1:06 PM >> Subject: Re: [Bioperl-l] Next-gen modules >> >> >>> >>> On Jun 17, 2009, at 8:25 AM, Peter wrote: >>> >>>> On Wed, Jun 17, 2009 at 1:57 PM, Chris >>>> Fields wrote: >>>>> >>>>> Elia, >>>>> >>>>> As Mark indicated, we recently discussed the lack of support >>>>> for next-gen on >>>>> list, at least re: fastq. I may be hit with the same thing in >>>>> a few months >>>>> time myself, and I recall Jason and a few others also >>>>> mentioning the same. >>>>> Heikki wrote some code for Illumina FASTQ for SeqIO and related >>>>> modules but >>>>> I don't believe it has been committed to trunk yet, so maybe he >>>>> can answer. >>>>> >>>>> From prior discussions IIRC the issues were: >>>>> >>>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina >>>>> 1.0, Illumina >>>>> 1.3) from one another (so maybe some optional validation), and >>>> >>>> Following the python rule of thumb for being explicit, Biopython >>>> makes >>>> the user specify which FASTQ variant is being used. I don't think >>>> you >>>> can do anything else. Any attempted validation would have to be >>>> heuristic based on the ASCII characters found, and would risk false >>>> positive warnings. >>> >>> Right; I'm thinking along the same lines. If anything the most >>> we would allow is some level of validation, so if there were a >>> degree of uncertainty about the format one could set a validation >>> flag to check bounds during the parse and warn if they are >>> exceeded. >>> >>>>> 2) having a way for the Seq object to either 'know' what format is >>>>> contained, or we use phred score and convert back and forth >>>>> from that (I >>>>> think the latter makes more sense). 
>>>> >>>> I think it could make sense for BioPerl to convert Solexa scores >>>> to/ from >>>> PHRED scores on the fly (especially now that Illumina is abandoning >>>> the Solexa score system). Python style tries to avoid implicit >>>> conversions, >>>> so Biopython doesn't automatically do a conversion from Solexa to >>>> PHRED scores on parsing (but will on writing if the requested >>>> output >>>> format requires this). >>>> >>>>> Peter's suggestions also are reasonable, though does biopython >>>>> have a >>>>> separate module for each of these variations? Our version (I >>>>> believe) >>>>> mainly varied the conversion within Bio::SeqIO::fastq itself >>>>> based on the >>>>> fastq variant passed in as a separate named argument. >>>> >>>> Biopython's SeqIO gives the three FASTQ variants their own unique >>>> names. This format name is a required argument for parsing/writing >>>> (we don't try and guess the file format from the data contents). >>>> Internally >>>> we have three separate FASTQ parsers/writers although they do share >>>> code. >>> >>> We could easily do the same if others agree. Actually, if we >>> specified that shorthand for a variant on a format would be >>> designated as -format => 'format-variant', I think we could >>> easily hack SeqIO to deal with that by splitting on '-' and >>> passing everything to the constructor as (-format => 'format', - >>> variant => 'variant'). Very little repeated code in this case, >>> just an additional named parameter indicating the format variant >>> (and the SeqIO class can do the type checking on that within the >>> constructor). >>> >>>> Other issues to keep in mind: >>>> >>>> (3) There should be no warning parsing files where the optional >>>> repeated >>>> title is missing on the "+" lines (as discussed earlier on the >>>> BioPerl list). >>> >>> Agreed, though we'll have to check the current fastq parser to see >>> if that's currently the case. I thought that was fixed but maybe >>> not? 
>>> >>>> (4) When writing FASTQ files should BioPerl omit the optional >>>> repeated >>>> title on the "+" line? Biopython omits this as I understand this >>>> to be >>>> common practice, and can make a big different to file sizes - >>>> especially >>>> on short read data from Solexa/Illumina. >>> >>> Agreed, particularly if it's commonly encountered. >>> >>>> (5) Also test reading and writing files with an optional >>>> description (as well >>>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA >>>> for examples, >>>> e.g. >>>> >>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC >>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC >>> >>> Should be easy enough to implement with a simple regex. >>> >>>> (6) Test reading and writing files where the encoded quality >>>> string starts >>>> with a "@" or a "+" character, e.g. >>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html >>>> >>>> Peter >>> >>> Mark, getting all that? 
;>
>>>
>>> chris

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801

From tristan.lefebure at gmail.com Wed Jun 17 14:09:42 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 14:09:42 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To:
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <200906171409.42558.tristan.lefebure@gmail.com>

Thanks both for the light.

That probably means that the place bioperl will take in the handling of the next-gen sequencing raw data (i.e. reads) is very limited, no? (at least until bioperl6). A single GA2 solexa lane generates about 9 million reads, and I would really not call that a big project...

BTW, is there a simple way to see object instantiation and inheritance, as well as time consumption for each, when one calls next_seq() (or any other method)?

-Tristan

On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote:
> I would suggest developing the "standard" version first,
> then moving onto potential optimizations.
>
> When we went through a similar argument in Ensembl about
> 8 years ago we ended up dropping Bio::Root completely...
> > If one is truly after performance for these large > next-gen projects, it'd be down to pure piping, shell, > and worrying about location and copying of files, > sticking to systems-level as much as possible, and quite > far from Bioperl altogether, so I think it's a whole > different level of optimization issues, probably outside > the scope of Bioperl. > > Elia > > On 17 Jun 2009, at 18:09, Chris Fields wrote: > > On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote: > >> Hello, > >> Regarding next-gen sequences and bioperl, following my > >> experience, another issue is bioperl speed. For > >> example, if you want to trim bad quality bases at ends > >> of 1E6 Solexa reads using Bio::SeqIO::fastq and some > >> methods in Bio::Seq::Quality, well, you've got to be > >> patient (but may be I missed some shortcuts...). > > > > The key issues affecting speed in bioperl are contained > > object instantiation and inheritance (and between those > > two, the latter much more so as it plays a role with > > contained objects as well as the container). > > > > http://www.bioperl.org/wiki/Why_BioPerl_is_slow > > > > Moose/Perl6 roles/traits are one way around that issue, > > but we are a ways off from getting that running. I > > think to get that working decently would be a > > from-ground-up endeavor (see my past posts on > > biomoose/bioperl6). > > > >> A pure perl solution will be between 100 to 1000x > >> faster... Would it be possible to have an ultra-light > >> quality object with few simple methods for next-gen > >> reads? > >> > >> I can contribute some tests if that sounds like an > >> important point. > >> > >> -Tristan > > > > The quality objects themselves I don't think are that > > heavy; I think the main impediment is inheritance. One > > could get around that a bit by using a direct_new > > method to create a blessed hash directly, then > > reimplement methods to lazily create any objects > > contained on the fly. 
> >
> > chris

From bix at sendu.me.uk Wed Jun 17 14:20:00 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 19:20:00 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <4A3933D0.4040808@sendu.me.uk>

Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant set of users out there who are dealing with next-gen sequencing and would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at least are probably never going to use BioPerl for the work.

> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the speedup is to not create any Bio::Seq* objects but just return the data directly. At that point it's not taking much advantage of BioPerl. But certainly it could be done...
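
The "return the data directly" approach described above could look roughly like the following. This is only a minimal sketch and not part of BioPerl: the sub names next_fastq_record and trim_ends are made up for illustration, and it assumes four-line records (sequence and quality not wrapped) and Sanger-style quality encoding (ASCII offset 33) unless another offset is passed in. Because records are read by position, a quality string that starts with "@" or "+" is not mistaken for a header.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): read one four-line FASTQ
# record and return it as a plain hash reference, creating no Bio::Seq*
# objects. Assumes sequence and quality are each on a single line.
sub next_fastq_record {
    my ($fh) = @_;
    my $header;
    while (defined($header = <$fh>)) {
        last if $header =~ /\S/;    # skip blank lines between records
    }
    return unless defined $header;

    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);

    die "Malformed FASTQ header: $header\n"
        unless $header =~ /^\@(\S+)\s*(.*)$/;
    my ($id, $desc) = ($1, $2);
    die "Missing '+' line for $id\n" unless $plus =~ /^\+/;
    die "Sequence/quality length mismatch for $id\n"
        unless length($seq) == length($qual);

    return { id => $id, desc => $desc, seq => $seq, qual => $qual };
}

# Hypothetical helper: trim bases with quality below $min_q from both
# ends of a record. $offset is the ASCII offset of the encoding
# (33 is assumed here, i.e. Sanger-style FASTQ).
sub trim_ends {
    my ($rec, $min_q, $offset) = @_;
    $offset = 33 unless defined $offset;

    my @q = map { ord($_) - $offset } split //, $rec->{qual};
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;
    my $len = $end - $start + 1;

    # Return a new hash so the original record is left untouched.
    return {
        %$rec,
        seq  => substr($rec->{seq},  $start, $len),
        qual => substr($rec->{qual}, $start, $len),
    };
}

# Small demonstration on an in-memory record ('#' is Q2, 'I' is Q40
# under the assumed offset of 33).
my $fastq = "\@read1 demo\nACGTACGT\n+\n##IIII#I\n";
open my $fh, '<', \$fastq or die "Cannot open in-memory FASTQ: $!";
while (my $rec = next_fastq_record($fh)) {
    my $trimmed = trim_ends($rec, 20);
    print "\@$trimmed->{id}\n$trimmed->{seq}\n+\n$trimmed->{qual}\n";
}
close $fh;
```

Only the two leading low-quality bases are removed in the demonstration; the low-quality base inside the read is kept, since this sketch trims ends only. Whether something like this belongs inside Bio::SeqIO::fastq (for example as an optional raw-record mode) is a separate design question.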
From e.stupka at ucl.ac.uk Wed Jun 17 14:39:08 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 19:39:08 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <200906171409.42558.tristan.lefebure@gmail.com> Message-ID: <8C661293-DF7D-4262-970A-92AF0015BB04@ucl.ac.uk> We are using bioperl for simple pre and post-processing of data for full Solexa runs, and although it might not be ideal, the scripting with Bioperl is not a major killer. When I was referring to large, heavy pipelines I was thinking of pipelines that deal with many Solexa runs as one project (e.g. 1000 genomes) who really cannot afford any bottleneck in their pipelines, because that affects directly their storage. cheers Elia On 17 Jun 2009, at 19:09, Tristan Lefebure wrote: > Thanks both for the light. > > That probably means that the place bioperl will take in the > handling of the next-gen sequencing raw data (i.e. reads) is > very limited, nope? (at least until bioperl6). A single GA2 > solexa lane generates about 9 million reads, and I would > really not called that a big project... > > BTW, is there a simple way to see object instantiation and > inheritance, as well as time consumption for each, when once > calls next_seq() (or any other method)? > > -Tristan > > On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote: >> I would suggest developing the "standard" version first, >> then moving onto potential optimizations. >> >> When we went through a similar argument in Ensembl about >> 8 years ago we ended up dropping Bio::Root completely... 
>> >> If one is truly after performance for these large >> next-gen projects, it'd be down to pure piping, shell, >> and worrying about location and copying of files, >> sticking to systems-level as much as possible, and quite >> far from Bioperl altogether, so I think it's a whole >> different level of optimization issues, probably outside >> the scope of Bioperl. >> >> Elia >> >> On 17 Jun 2009, at 18:09, Chris Fields wrote: >>> On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote: >>>> Hello, >>>> Regarding next-gen sequences and bioperl, following my >>>> experience, another issue is bioperl speed. For >>>> example, if you want to trim bad quality bases at ends >>>> of 1E6 Solexa reads using Bio::SeqIO::fastq and some >>>> methods in Bio::Seq::Quality, well, you've got to be >>>> patient (but may be I missed some shortcuts...). >>> >>> The key issues affecting speed in bioperl are contained >>> object instantiation and inheritance (and between those >>> two, the latter much more so as it plays a role with >>> contained objects as well as the container). >>> >>> http://www.bioperl.org/wiki/Why_BioPerl_is_slow >>> >>> Moose/Perl6 roles/traits are one way around that issue, >>> but we are a ways off from getting that running. I >>> think to get that working decently would be a >>> from-ground-up endeavor (see my past posts on >>> biomoose/bioperl6). >>> >>>> A pure perl solution will be between 100 to 1000x >>>> faster... Would it be possible to have an ultra-light >>>> quality object with few simple methods for next-gen >>>> reads? >>>> >>>> I can contribute some tests if that sounds like an >>>> important point. >>>> >>>> -Tristan >>> >>> The quality objects themselves I don't think are that >>> heavy; I think the main impediment is inheritance. One >>> could get around that a bit by using a direct_new >>> method to create a blessed hash directly, then >>> reimplement methods to lazily create any objects >>> contained on the fly. 
>>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> --- >> Senior Lecturer, Bioinformatics >> UCL Cancer Institute >> Paul O' Gorman Building >> University College London >> Gower Street >> WC1E 6BT >> London >> UK >> >> Office (UCL): +44 207 679 6493 >> Office (ICMS): +44 0207 8822374 >> >> Mobile: +44 7597 566 194 >> Mobile (Italy): +39 338 8448801 > > --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From cjfields at illinois.edu Wed Jun 17 14:40:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 13:40:05 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <200906171409.42558.tristan.lefebure@gmail.com> Message-ID: <63B608B2-8DE0-4FD1-9E15-339FD226D7AB@illinois.edu> On Jun 17, 2009, at 1:09 PM, Tristan Lefebure wrote: > Thanks both for the light. > > That probably means that the place bioperl will take in the > handling of the next-gen sequencing raw data (i.e. reads) is > very limited, nope? (at least until bioperl6). A single GA2 > solexa lane generates about 9 million reads, and I would > really not called that a big project... I don't think it's impossible. If you parse any very long list of sequences in order it will be very slow, yes, but if they were indexed or loaded into a DB lookups would of course be magnitudes faster. We already have perl-based indexing for fastq (Bio::Index::Fastq), so maybe something could be built on top of that. I haven't looked but we can also wrap other C/C++-based parsers as well. 
BioLib, for instance, has bindings to io_lib, so maybe that could be (ab)used in some way.

> BTW, is there a simple way to see object instantiation and
> inheritance, as well as time consumption for each, when once
> calls next_seq() (or any other method)?
>
> -Tristan

As a simple benchmark, at one point all feature tag information was converted into Bio::Annotations. I reverted that behavior to be simple tag/value again and had a pretty decent bump:

http://www.bioperl.org/wiki/Feature_Annotation_rollback#Simple_Benchmark

Also, I tried reimplementing some parsers as generic 'event'-based driver/handler and they were slightly faster, the key roadblock being instantiation again. If I didn't create Features/Annotations I saw a significant speedup. That's not entirely unexpected, as SeqFeatures also contain Locations (in turn that can contain subLocations) and (until recently) tag-based Bio::Annotation by default. Annotations are collected in an Annotation::Collection and can contain other objects I believe (Ontology terms, etc).

The overall lesson is, if you don't have very heavy objects being created the overhead is actually quite small; it's only when you greedily instantiate everything that you run into problems.

chris

From cjfields at illinois.edu Wed Jun 17 15:05:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:05:03 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To:
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com>
Message-ID:

On Jun 17, 2009, at 12:49 PM, Elia Stupka wrote:

> I would suggest developing the "standard" version first, then moving
> onto potential optimizations.

Yes, agreed.

> When we went through a similar argument in Ensembl about 8 years ago
> we ended up dropping Bio::Root completely...
They (strangely enough) still use it in a few modules and require bioperl 1.2.3, but (in my experience) the latest bioperl works just fine. I asked about that and never got a response. > If one is truly after performance for these large next-gen projects, > it'd be down to pure piping, shell, and worrying about location and > copying of files, sticking to systems-level as much as possible, and > quite far from Bioperl altogether, so I think it's a whole different > level of optimization issues, probably outside the scope of Bioperl. > > Elia In the end I don't think we can run it using perl alone, no, and I believe using BioPerl by itself will not be the optimal solution, but it can probably interface with something that is. chris From e.stupka at ucl.ac.uk Wed Jun 17 15:14:04 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 20:14:04 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk> Message-ID: <9AC2CFC1-D7E7-4B93-9671-65C30E5AA285@ucl.ac.uk> Excellent, I was thinking of working on Maq and BowTie as priorities. Elia On 17 Jun 2009, at 14:28, John Marshall wrote: > On 17 Jun 2009, at 12:29, Elia Stupka wrote: >> Similarly, there seems to be little in bioperl-run to support tools >> that have been developed in this area, such as Maq, BowTie, TopHat, >> etc? > > FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to > submit in the not too distant future. (First it needs some "blah > blah" replaced with actual documentation and a test suite.) 
>
> Cheers,
>
> John
>
> [1] http://www.ebi.ac.uk/~zerbino/velvet/
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome
> Research Limited, a charity registered in England with number 1021457
> and a company registered in England with number 2742969, whose
> registered office is 215 Euston Road, London, NW1 2BE.

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801

From michael.watson at bbsrc.ac.uk Wed Jun 17 15:15:20 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 17 Jun 2009 20:15:20 +0100
Subject: [Bioperl-l] Next-gen modules
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B291F1@iahce2ksrv1.iah.bbsrc.ac.uk>

In answer to your question, yes!

We have 6 Illumina datasets which we have searched against sequence databases using fasta, and I used SearchIO to parse the results. This is where BioPerl comes into its own - wrapped around fast, optimised solutions written in C or Java.

Sure, I could have written something in sed/awk/pure perl/C etc to parse out the information I needed faster, but the SearchIO solution only took a few minutes to parse a huge fasta results file, and for me (and many others, I suspect) a few minutes is not a problem.
________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Sendu Bala Sent: Wed 17/06/2009 7:20 PM To: tristan.lefebure at gmail.com Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Next-gen modules Tristan Lefebure wrote: > Hello, > Regarding next-gen sequences and bioperl, following my > experience, another issue is bioperl speed. For example, if > you want to trim bad quality bases at ends of 1E6 Solexa > reads using Bio::SeqIO::fastq and some methods in > Bio::Seq::Quality, well, you've got to be patient (but may > be I missed some shortcuts...). This is my concern as well. Or, rather, is there actually a significant set of users out there who are dealing with next-gen sequencing and would consider using BioPerl for their work? I'm working with all the 1000-genomes data at the Sanger, and we at least are probably never going to use BioPerl for the work. > A pure perl solution will be between 100 to 1000x faster... > Would it be possible to have an ultra-light quality object > with few simple methods for next-gen reads? The fastq parser itself already seems pretty fast. The way to get the speedup is to not create any Bio::Seq* objects but just return the data directly. At that point it's not taking much advantage of BioPerl. But certainly it could be done... 
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jun 17 15:30:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 14:30:15 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A3933D0.4040808@sendu.me.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> Message-ID: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote: > Tristan Lefebure wrote: >> Hello, >> Regarding next-gen sequences and bioperl, following my experience, >> another issue is bioperl speed. For example, if you want to trim >> bad quality bases at ends of 1E6 Solexa reads using >> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well, >> you've got to be patient (but may be I missed some shortcuts...). > > This is my concern as well. Or, rather, is there actually a > significant set of users out there who are dealing with next-gen > sequencing and would consider using BioPerl for their work? > > I'm working with all the 1000-genomes data at the Sanger, and we at > least are probably never going to use BioPerl for the work. Are you using pure perl or (gasp) something else? ;> Judging by the feedback there are definitely a set of users who would like to integrate nextgen into bioperl somehow, probably to take advantage of other aspects of bioperl. >> A pure perl solution will be between 100 to 1000x faster... Would >> it be possible to have an ultra-light quality object with few >> simple methods for next-gen reads? > > The fastq parser itself already seems pretty fast. The way to get > the speedup is to not create any Bio::Seq* objects but just return > the data directly. At that point it's not taking much advantage of > BioPerl. 
> But certainly it could be done...

I suppose the best way to assess what needs to be done is come up with a set of 'use cases' specifying what users want so we can design around them, otherwise we're shooting in the dark.

I'm personally wondering if this could be done as a sequence database, something similar in theme to Lincoln's SeqFeature::Store, but sequence only, and returns quality objects in a similar manner (ala Storable)? Not sure whether that's feasible, but it appears at least scalable.

chris

From e.stupka at ucl.ac.uk Wed Jun 17 15:37:26 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:37:26 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
Message-ID: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>

Dear all,

I tried to summarize today's discussion with what seems to be the "shaping consensus" on the Wiki page:

http://www.bioperl.org/wiki/Nextgen_in_Bioperl

good night,

Elia

On 17 Jun 2009, at 13:19, Mark A. Jensen wrote:

> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl
> ]
> ----- Original Message ----- From: "Elia Stupka"
> To:
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
>
>> Dear all,
>> after several years of absence I am slowly coming back to Bioperl,
>> and hope to contribute again to its development.
>> One area that I was thinking of starting from, since we are
>> actively involved with it, is to improve BioPerl's support for
>> next-gen sequencing data, tools, etc. Since I am sure I have missed out
>> on a lot of recent developments, do let me know if/what is useful.
>> One example that comes to mind is that the conversion of various
>> formats to/from FASTQ does not seem to be supported.
Some code can >> be found within Li Heng's script: http://maq.sourceforge.net/ >> fq_all2std.pl but it would be good if it could make its way into >> SeqIO? And similarly, potentially, for other next-gen sequence >> formats? >> Similarly, there seems to be little in bioperl-run to support >> tools that have been developed in this area, such as Maq, BowTie, >> TopHat, etc? >> Do let me know if there is a past thread on this, or other people >> actively developing, etc. so that I can find out what priorities are. >> thanks and best regards to all (old friends and new), >> Elia >> --- >> Senior Lecturer, Bioinformatics >> UCL Cancer Institute >> Paul O' Gorman Building >> University College London >> Gower Street >> WC1E 6BT >> London >> UK >> Office (UCL): +44 207 679 6493 >> Office (ICMS): +44 0207 8822374 >> Mobile: +44 7597 566 194 >> Mobile (Italy): +39 338 8448801 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From e.stupka at ucl.ac.uk Wed Jun 17 16:06:35 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 21:06:35 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> Message-ID: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk> Interesting that you mention the database issue. We found that for specific memory/CPU intenstive things we also switch to using dbs. 
For example, after many years of loyal use of disconnected_ranges we
switched to a simple SQL implementation of it, because of the large
performance gains it would give us. Similarly in Ensembl as well as in
the old days of bioperl-db we opted for doing subseq within SQL where
possible.

Some lean way of SQL'izing specific components could be less
"disruptive" than avoiding object creation and provide significant
gains in performance. Could be set as an optional flag, and could use
temporary ad hoc SQL databases?

Still, priority now is to make SeqIO compliant with all those formats,
then we can worry about performance :)

Elia

On 17 Jun 2009, at 20:30, Chris Fields wrote:

> I'm personally wondering if this could be done as a sequence
> database, something similar in theme to Lincoln's SeqFeature::Store,
> but sequence only, and returns quality objects in a similar manner
> (ala Storable)? Not sure whether that's feasible, but it appears
> at least scalable.

From maj at fortinbras.us Wed Jun 17 16:29:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:29:31 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
 <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
Message-ID: <1C89D353AD0B4D219515BF1EAAA1FFB5@NewLife>

Thanks Elia for those wiki notes--
[I would say you received an enthusiastic 'welcome back'!]
cheers,
Mark

----- Original Message -----
From: "Elia Stupka"
To: "Mark A. Jensen"
Cc:
Sent: Wednesday, June 17, 2009 3:37 PM
Subject: Re: [Bioperl-l] Next-gen modules

> I tried to summarize today's discussion with what seems to be the
> "shaping consensus" on the Wiki page:
>
> http://www.bioperl.org/wiki/Nextgen_in_Bioperl

From cjfields at illinois.edu Wed Jun 17 16:35:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 15:35:38 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
 <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>

So, #1 priority is to get fastq up-to-speed, then maybe assess other
options.

Illuminating discussion, thanks Elia!

urgh, excuse unintended bad pun above...

chris

On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:

> Still, priority now is to make SeqIO compliant with all those
> formats, then we can worry about performance :)

From e.stupka at ucl.ac.uk Wed Jun 17 16:36:31 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:36:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
 <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
 <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>

Better than
colorspaced discussions for sure ;)

Elia

On 17 Jun 2009, at 21:35, Chris Fields wrote:

> Illuminating discussion, thanks Elia!
>
> urgh, excuse unintended bad pun above...

From maj at fortinbras.us Wed Jun 17 16:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:54:00 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
 <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
 <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <2B2A7A587B0F488DAA18E80A1BFD671B@NewLife>

unintended! Does that mean your delete key's broke...?

----- Original Message -----
From: "Chris Fields"
To: "Elia Stupka"
Cc: ;
Sent: Wednesday, June 17, 2009 4:35 PM
Subject: Re: [Bioperl-l] Next-gen modules

> Illuminating discussion, thanks Elia!
>
> urgh, excuse unintended bad pun above...

From hartzell at alerce.com Wed Jun 17 16:40:03 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 17 Jun 2009 13:40:03 -0700
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
Message-ID: <19001.21667.127519.462899@already.dhcp.gene.com>

Sendu Bala writes:
 > Tristan Lefebure wrote:
 > > Hello,
 > > Regarding next-gen sequences and bioperl, following my
 > > experience, another issue is bioperl speed. For example, if
 > > you want to trim bad quality bases at ends of 1E6 Solexa
 > > reads using Bio::SeqIO::fastq and some methods in
 > > Bio::Seq::Quality, well, you've got to be patient (but maybe
 > > I missed some shortcuts...).
 >
 > This is my concern as well. Or, rather, is there actually a significant
 > set of users out there who are dealing with next-gen sequencing and
 > would consider using BioPerl for their work?
 >
 > I'm working with all the 1000-genomes data at the Sanger, and we at
 > least are probably never going to use BioPerl for the work.
 > [...]

Is it purely a speed issue, or are there other issues (e.g. stability,
correctness, compatibility) that are contributing to your decision?

What *are* you using?

g.

From bix at sendu.me.uk Wed Jun 17 18:10:57 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:10:57 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <4A3969F1.8080002@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>
>> Tristan Lefebure wrote:
>>> Hello,
>>> Regarding next-gen sequences and bioperl, following my experience,
>>> another issue is bioperl speed.
>>> For example, if you want to trim bad
>>> quality bases at ends of 1E6 Solexa reads using Bio::SeqIO::fastq and
>>> some methods in Bio::Seq::Quality, well, you've got to be patient
>>> (but maybe I missed some shortcuts...).
>>
>> This is my concern as well. Or, rather, is there actually a
>> significant set of users out there who are dealing with next-gen
>> sequencing and would consider using BioPerl for their work?
>>
>> I'm working with all the 1000-genomes data at the Sanger, and we at
>> least are probably never going to use BioPerl for the work.
>
> Are you using pure perl or (gasp) something else? ;>

We use some perl stuff, some C stuff. My own stuff is OO perl, but much
lighter weight than BioPerl. Absolute minimal object creation.

>>> A pure perl solution will be between 100 to 1000x faster... Would it
>>> be possible to have an ultra-light quality object with few simple
>>> methods for next-gen reads?
>>
>> The fastq parser itself already seems pretty fast. The way to get the
>> speedup is to not create any Bio::Seq* objects but just return the
>> data directly. At that point it's not taking much advantage of
>> BioPerl. But certainly it could be done...
>
> I suppose the best way to assess what needs to be done is come up with a
> set of 'use cases' specifying what users want so we can design around
> them, otherwise we're shooting in the dark.

Indeed. Though at least I think we can all agree it would be nice to
have the functionality there even if it's slow. There will always be at
least some use-cases where the run speed doesn't matter.

> I'm personally wondering if this could be done as a sequence database,
> something similar in theme to Lincoln's SeqFeature::Store, but sequence
> only, and returns quality objects in a similar manner (ala Storable)?
> Not sure whether that's feasible, but it appears at least scalable.

I think not. Well, at least SeqFeature::Store doesn't scale. Try
storing millions of features in a database and watch it crawl to
complete unusability. I can't imagine a db scaling to holding hundreds
of TB of data either. I'm also not sure what the benefit is. There are
already high-speed ways of indexing your fastq or bam files.

From bix at sendu.me.uk Wed Jun 17 18:24:50 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:24:50 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <19001.21667.127519.462899@already.dhcp.gene.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <19001.21667.127519.462899@already.dhcp.gene.com>
Message-ID: <4A396D32.5070909@sendu.me.uk>

George Hartzell wrote:
> Sendu Bala writes:
>  > Tristan Lefebure wrote:
>  > > Hello,
>  > > Regarding next-gen sequences and bioperl, following my
>  > > experience, another issue is bioperl speed. For example, if
>  > > you want to trim bad quality bases at ends of 1E6 Solexa
>  > > reads using Bio::SeqIO::fastq and some methods in
>  > > Bio::Seq::Quality, well, you've got to be patient (but maybe
>  > > I missed some shortcuts...).
>  >
>  > This is my concern as well. Or, rather, is there actually a significant
>  > set of users out there who are dealing with next-gen sequencing and
>  > would consider using BioPerl for their work?
>  >
>  > I'm working with all the 1000-genomes data at the Sanger, and we at
>  > least are probably never going to use BioPerl for the work.
>  > [...]
>
> Is it purely a speed issue, or are there other issues (e.g. stability,
> correctness, compatibility) that are contributing to your decision?

Too heavy-weight, too slow, too memory intensive, missing too much
functionality in any case. If I have to write new parsers and wrappers,
I may as well make them fast (which means they don't "fit" into
BioPerl).

> What *are* you using?
There are already great tools written in C that do all the heavy lifting
and the rest is done in perl written for speed and low memory.

From cjfields at illinois.edu Wed Jun 17 18:38:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 17:38:26 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3969F1.8080002@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
 <4A3969F1.8080002@sendu.me.uk>
Message-ID: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>

On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>> Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my
>>>> experience, another issue is bioperl speed. For example, if you
>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using
>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,
>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>
>>> This is my concern as well. Or, rather, is there actually a
>>> significant set of users out there who are dealing with next-gen
>>> sequencing and would consider using BioPerl for their work?
>>>
>>> I'm working with all the 1000-genomes data at the Sanger, and we
>>> at least are probably never going to use BioPerl for the work.
>> Are you using pure perl or (gasp) something else? ;>
>
> We use some perl stuff, some C stuff. My own stuff is OO perl, but
> much lighter weight than BioPerl. Absolute minimal object creation.

Makes sense.

>>>> A pure perl solution will be between 100 to 1000x faster... Would
>>>> it be possible to have an ultra-light quality object with few
>>>> simple methods for next-gen reads?
>>>
>>> The fastq parser itself already seems pretty fast. The way to get
>>> the speedup is to not create any Bio::Seq* objects but just return
>>> the data directly. At that point it's not taking much advantage of
>>> BioPerl. But certainly it could be done...
>> I suppose the best way to assess what needs to be done is come up
>> with a set of 'use cases' specifying what users want so we can
>> design around them, otherwise we're shooting in the dark.
>
> Indeed. Though at least I think we can all agree it would be nice to
> have the functionality there even if it's slow. There will always be
> at least some use-cases where the run speed doesn't matter.

Agreed.

>> I'm personally wondering if this could be done as a sequence
>> database, something similar in theme to Lincoln's
>> SeqFeature::Store, but sequence only, and returns quality objects
>> in a similar manner (ala Storable)? Not sure whether that's
>> feasible, but it appears at least scalable.
>
> I think not. Well, at least SeqFeature::Store doesn't scale. Try
> storing millions of features in a database and watch it crawl to
> complete unusability. I can't imagine a db scaling to holding
> hundreds of TB of data either. I'm also not sure what the benefit
> is. There are already high-speed ways of indexing your fastq or bam
> files.

Interesting that you ran into issues with SF::Store; wonder if object
storage is the limiting factor there, or if it is something else.
Anyone else having this issue?
chris

From cjfields at illinois.edu Wed Jun 17 21:08:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 20:08:55 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A396D32.5070909@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <19001.21667.127519.462899@already.dhcp.gene.com>
 <4A396D32.5070909@sendu.me.uk>
Message-ID: <03A96F40-27CD-4D38-9A4A-04AB4CECC8DE@illinois.edu>

On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my
>> > > experience, another issue is bioperl speed. For example, if
>> > > you want to trim bad quality bases at ends of 1E6 Solexa
>> > > reads using Bio::SeqIO::fastq and some methods in
>> > > Bio::Seq::Quality, well, you've got to be patient (but maybe
>> > > I missed some shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a
>> > significant set of users out there who are dealing with next-gen
>> > sequencing and would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and
>> > we at least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g.
>> stability, correctness, compatibility) that are contributing to
>> your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much
> functionality in any case. If I have to write new parsers and
> wrappers, I may as well make them fast (which means they don't "fit"
> into BioPerl).

That's (unfortunately) true. It may be easy to whip up something that
works, but it probably won't be fast.

>> What *are* you using?
>
> There are already great tools written in C that do all the heavy
> lifting and the rest is done in perl written for speed and low memory.

Like this one?

http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml

I suppose if one were inclined, this could be wrapped with SWIG in BioLib, but would it be worth it (maybe beyond grabbing the file indices)?

chris

From jbarrick at msu.edu Wed Jun 17 23:10:43 2009
From: jbarrick at msu.edu (Jeffrey Barrick)
Date: Wed, 17 Jun 2009 23:10:43 -0400
Subject: [Bioperl-l] svn error
Message-ID: <7C1A481F-275E-4E08-AA1B-036BC708D5E1@msu.edu>

Hi all,

I've been trying to download the latest version of "bioperl-live" through svn as per the instructions at [http://www.bioperl.org/wiki/Using_Subversion] and I keep getting an "svn: Found malformed header in revision file" error when it gets to "bioperl-live/t/RemoteDB/EMBL.t", causing it to stop prematurely.

I also get the error when trying to browse that directory, for example:

http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t/RemoteDB

Any ideas?

Thanks,

--Jeff

From hlapp at gmx.net Wed Jun 17 21:51:16 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 17 Jun 2009 20:51:16 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID:

On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:

> Similarly in Ensembl as well as in the old days of bioperl-db we
> opted for doing subseq within SQL where possible.

BTW Bioperl-db still lazy-loads sequences, and does subseq in SQL, unless you manipulate the sequence, or make it a non-persistent object.
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From bix at sendu.me.uk Thu Jun 18 02:45:17 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 18 Jun 2009 07:45:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <4A3969F1.8080002@sendu.me.uk> <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
Message-ID: <4A39E27D.9040807@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:
>
>>> I'm personally wondering if this could be done as a sequence
>>> database, something similar in theme to Lincoln's SeqFeature::Store,
>>> but sequence only, and returns quality objects in a similar manner
>>> (ala Storable)? Not sure whether that's feasible, but it's appears
>>> at least scalable.
>>
>> I think not. Well, at least SeqFeature::Store doesn't scale. Try
>> storing millions of features in a database and watch it crawl to
>> complete unusability. I can't imagine a db scaling to holding hundreds
>> of TB of data either. I'm also not sure what the benefit is. There are
>> already high-speed ways of indexing your fastq or bam files.
>
> Interesting that you ran into issues with SF::Store; wonder if object
> storage is the limiting factor there, or if it is something else.

Object storage certainly was an issue, which is why I patched it to (optionally) not store objects. That helped a great deal, but ultimately only increased the number of features you could store before it slowed down; it didn't solve the problem completely.
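
The point made earlier in this thread, that the speedup comes from not creating any Bio::Seq* objects and just returning the data directly, can be sketched for the use case Tristan mentioned (trimming bad-quality bases from the ends of reads). This is only a minimal illustration, not BioPerl code: the helper names trim_ends and next_fastq_record are made up for this example, and it assumes simple 4-line FASTQ records with Phred+33 quality encoding (pass another offset for other encodings).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): trim bases below $min_q
# from both ends of one read. Assumes Phred+33 unless $offset is given.
sub trim_ends {
    my ($seq, $qual, $min_q, $offset) = @_;
    $offset = 33 unless defined $offset;
    my @q = map { ord($_) - $offset } split //, $qual;
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;
    my $len = $end - $start + 1;
    return ('', '') if $len <= 0;    # nothing left after trimming
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}

# Hypothetical helper: read one 4-line FASTQ record as plain strings.
# Returns an empty list at end of file.
sub next_fastq_record {
    my ($fh) = @_;
    my $head = <$fh>;
    return unless defined $head;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($head, $seq, $plus, $qual);
    return ($head, $seq, $qual);
}

# Demo on an in-memory "file" so the sketch runs on its own.
my $fastq = "\@read1\nACGTACGT\n+\n##IIII##\n";
open my $fh, '<', \$fastq or die $!;
while (my ($head, $seq, $qual) = next_fastq_record($fh)) {
    my ($tseq, $tqual) = trim_ends($seq, $qual, 20);
    print "$head\n$tseq\n+\n$tqual\n";
}
close $fh;
```

No objects are built per read, which is the trade-off discussed above: less convenience, but far less overhead per record.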
From Xianjun.Dong at bccs.uib.no Thu Jun 18 06:15:47 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Thu, 18 Jun 2009 12:15:47 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4A33D850.1020203@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no> <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com> <4A33D850.1020203@ii.uib.no>
Message-ID: <4A3A13D3.7050208@ii.uib.no>

Hi Scott,

Would you mind having a look at the code (below my signature) to check whether I am using the -postgrid callback correctly? I still cannot get the background for the whole panel.

Thanks

Xianjun

Xianjun Dong wrote:
> Hi, Scott
>
> Before I gave up my own whole solution to use GBrowse, I still want to
> bother you once:
>
> As you suggested, I put -postgrid option when the panel, which will
> call a function to draw the background. The code below is almost
> copied from the online POD of Bio::Graphics::Panel (see
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
> )
>
> But it still does not work. Could you help to have a look? I paste it
> below. (BTW, the above page of POD, the -postgrid=>\&draw_gap, while
> the gap drawing function is gap_it, not draw_gap. I guess it's a typo.
> or not?)
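
A note on the script quoted just below, offered as a possible explanation rather than a confirmed diagnosis: in the Bio::Graphics::Panel->new(...) call there is no comma between -pad_right=>12 and -postgrid=>\&gap_it. Perl would then read "12 -postgrid" as a subtraction, so the -postgrid key never reaches the constructor and the callback would not be registered. The sketch uses plain lists only (no Bio::Graphics is loaded), and $callback is a stand-in for gap_it.

```perl
use strict;
use warnings;

my $callback = sub { return 'callback ran' };    # stand-in for gap_it

# Comma missing after 12: perl evaluates 12 - "postgrid" (numerically 12)
# and warns that "postgrid" isn't numeric; the key -postgrid is lost.
my @broken = (-pad_left => 12, -pad_right => 12 -postgrid => $callback);

# Comma present: -postgrid is passed as its own key.
my @fixed  = (-pad_left => 12, -pad_right => 12, -postgrid => $callback);

for my $case ([broken => \@broken], [fixed => \@fixed]) {
    my ($name, $args) = @$case;
    my %opt;
    {
        no warnings 'misc';    # silence "Odd number of elements" for the broken list
        %opt = @$args;
    }
    printf "%-6s: %d list elements, -postgrid %s\n",
        $name, scalar @$args,
        exists $opt{-postgrid} ? 'is set' : 'is missing';
}
```

Running a script under "use warnings" (the quoted script only has "use strict") is one cheap way to surface this kind of slip.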
>
> THanks
>
> Xianjun
>
> ----------------------------------------------- mytestcode.pl --------------------------
>
> #!/usr/bin/perl
>
> use strict;
> use lib "$ENV{HOME}/lib";
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
> my $ftr= 'Bio::Graphics::Feature';
>
> # processed_transcript
> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>                        -source=>'a');
> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>                        -source=>'a');
> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>
> # hightlight
> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>                         -source=>'a');
> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>                         -source=>'b');
>
> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>                                      -length=>1050,
>                                      -start =>0,
>                                      -pad_left=>12,
>                                      -pad_right=>12
>                                      -postgrid=>\&gap_it);
>
> sub gap_it {
>     my $gd = shift;
>     my $panel = shift;
>     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>     my $top = $panel->top;
>     my $bottom = $gd->height, #panel->bottom;
>     my $gray = $panel->translate_color('red');
>     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
> }
> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
> #$panel->add_track([$trans41,$trans31],
> #    -glyph => 'background',
> #    -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
> # );
>
> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>     -glyph=>'arrow',
>     -double=>1,
>     -tick=>2);
>
> $panel->add_track($trans,
>     -glyph => 'transcript2', # 'transcript2', #process_5utr',
>     -fgcolor => 'darkred',
>     -bgcolor => 'darkred',
>     -title => '$source',
>     -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL
> );
> print $panel->png;
>
> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl 1.2.3
> my $map = $panel->create_web_map("image");
> $panel->finished();
>
>
> Scott Cain wrote:
>> Hi Xianjun,
>>
>> I understand what you want to do, as the current version of gbrowse
>> does this, which uses bioperl 1.6. Without digging through the code,
>> I can't tell you exactly how this works and you didn't send your code
>> that uses this callback, so I can't try it either.
>>
>> One thing that is different between your code and gbrowse is that each
>> of the tracks is actually a seperate panel (to allow track dragging),
>> so it possible that this sort of callback doesn't work for
>> Bio::Graphics any more.
>>
>> Scott
>>
>> On Saturday, June 13, 2009, Xianjun Dong
>> wrote:
>>
>>> Hi, Scott
>>>
>>> Thanks for your reply first.
>>>
>>> I still have question: I dig out the code from GBrowse (which I
>>> paste below). Method make_postgrid_callback gets all highlight
>>> region and then use hilite_regions_closure function to draw them
>>> out, using the following GD function:
>>>
>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                      $panel->translate_color($h_color));
>>>
>>> where the $bottom=$panel->bottom. This is the only difference from
>>> my code, where I use $gd->height. I guess they are almost same
>>> (except the pad_bottom), we can see this in the code of
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>
>>>
>>> OK. Anyway, I change to use $panel->bottom, instead of $gd->height,
>>> for my highlight regions. The output is same, when using the library
>>> of Bioperl 1.6 (or 1.5). You can see the attached image
>>> ("test.bioperl1.6.png")
>>>
>>> OK. I might have not explained my question explicitly.
My question >>> is: if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl >>> 1.2.3), I can get the right image I want (see the attached file >>> "test.bioperl1.2.3.png"), where the highlight range will go from the >>> roof to the floor. While in bioperl 1.5 (or 1.6), I only can see the >>> highlight region in its own track, not the whole panel. OK, did I >>> explain clearly now? you can see the difference of the two images. >>> >>> [I am not sure the mailist allow to attach image, otherwise, I put >>> them in the following links: >>> test.bioperl1.6.png: http://translog.genereg.net/test.bioperl1.6.png >>> test.bioperl1.2.3.png: >>> http://translog.genereg.net/test.bioperl1.2.3.png ] >>> >>> You can test it and see the difference if you have both 1.2.3 and >>> 1.6 on your computer? >>> >>> Really want to know how this works in bioperl 1.2.3 (Even though >>> this might be a bug at that version, or whatever) >>> >>> Thanks >>> >>> Xianjun >>> ============================================= >>> >>> # this generates the callback for highlighting a region >>> sub make_postgrid_callback { >>> my $settings = shift; >>> return unless ref $settings->{h_region}; >>> >>> my @h_regions = map { >>> my ($h_ref,$h_start,$h_end,$h_color) = >>> /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/; >>> defined($h_ref) && $h_ref eq $settings->{ref} >>> ? [$h_start,$h_end,$h_color||'lightgrey'] >>> : () >>> } >>> @{$settings->{h_region}}; >>> >>> return unless @h_regions; >>> return hilite_regions_closure(@h_regions); >>> } >>> >>> # this subroutine generates a Bio::Graphics::Panel callback closure >>> # suitable for hilighting a region of a panel. 
>>> # The args are a list of [start,end,color] >>> sub hilite_regions_closure { >>> my @h_regions = @_; >>> >>> return sub { >>> my $gd = shift; >>> my $panel = shift; >>> my $left = $panel->pad_left; >>> my $top = $panel->top; >>> my $bottom = $panel->bottom; >>> for my $r (@h_regions) { >>> my ($h_start,$h_end,$h_color) = @$r; >>> my ($start,$end) = $panel->location2pixel($h_start,$h_end); >>> if ($end-$start <= 1) { $end++; $start-- } # so that we always >>> see something >>> # assuming top is 0 so as to ignore top padding >>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom, >>> $panel->translate_color($h_color)); >>> } >>> }; >>> } >>> >>> >>> Scott Cain wrote: >>> >>> Hello Xianjun, >>> >>> I don't think that approach will work. What you almost certainly need >>> to do is a postgrid callback that does the drawing of the highlighted >>> region. For example code of how to do this, take a look at the >>> make_postgrid_callback subroutine in GBrowse 1.69. The option >>> -postgrid is a method of Bio::Graphics::Panel. >>> >>> Scott >>> >>> >>> >>> >>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun >>> Dong wrote: >>> >>> >>> HI, >>> >>> I am not sure this is the right place I can get help. >>> >>> I've suffered by a problem for several days: I want to highlight >>> parts of >>> regions in my track, using a different background color. To do that, I >>> defined a glyph named "background", based on the >>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component() >>> method, by adding code like below: >>> >>> $gd->filledRectangle($left,0,$right,$gd->height, >>> $self->factory->translate_color($color)); >>> >>> # the script is pasted at the end >>> >>> This will draw a rectangle with top=0, bottom=$gd->height. I made the >>> highlight regions into a list of features, and add_track with >>> -glyph=>'background'. 
(see the following script, test.pl) This >>> really works >>> as I expect, which will add a colored block at background of all >>> tracks in a >>> panel (including the ruler arrow). You can see the output image in >>> attached >>> file "test.bioperl1.2.3.png" >>> >>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it >>> does not >>> work. Well, it works, but the highlight part only shrink to a low >>> height, >>> instead of covering all tracks in the panel. I also attached the output >>> here, see the file "test.bioperl1.6.png". >>> >>> I tried to think about the reason, the 'background' module is based >>> on the >>> generic module. What can cause the difference? Is it because >>> $gd->height is >>> different, or the tracks followed with 'background' track can not >>> draw from >>> the first position? >>> >>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart >>> person >>> solve problem, wise person avoid problem"...) But another problem is >>> coming: >>> Bio::Graphics in Bioperl 1.2.3 does not support >>> $panel->create_web_map() >>> function, which means I have to use some higher version if I want to >>> create >>> web map for my graphics, but then I have to give up using highlight >>> background. >>> >>> OK. It's long enough for my first-time submission here. Hope someone >>> can >>> throw me some clue. >>> >>> Thanks ahead!! 
>>> >>> Xianjun >>> >>> >>> ==================== test.pl ======================= >>> #!/usr/bin/perl >>> >>> use strict; >>> use lib "$ENV{HOME}/lib"; >>> >>> use Bio::Graphics; >>> use Bio::Graphics::Feature; >>> my $ftr= 'Bio::Graphics::Feature'; >>> >>> # processed_transcript >>> my $trans1 = >>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); >>> my $trans2 = >>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); >>> my $trans3 = >>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', >>> -source=>'a'); >>> my $trans4 = >>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', >>> -source=>'a'); >>> my $trans5 = >>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); >>> my $trans = >>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); >>> >>> # hightlight >>> my $trans31 = >>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', >>> >>> -source=>'a'); >>> my $trans41 = >>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', >>> >>> -source=>'b'); >>> >>> my $panel= Bio::Graphics::Panel->new(-width=>1200, >>> -length=>1050, >>> -start =>0, >>> -pad_left=>12, >>> -pad_right=>12); >>> >>> # the following track works as I expected in bioperl 1.2.3, but not >>> in 1.5 >>> and 1.6 >>> $panel->add_track([$trans41,$trans31], >>> -glyph => 'background', >>> -block_bgcolor => sub{return (shift->source eq >>> 'a')?'#cccccc':'#fffc22'}, >>> ); >>> >>> $panel->add_track($ftr->new(-start=>100,-end=>1000), >>> -glyph=>'arrow', >>> -double=>1, >>> -tick=>2); >>> >>> $panel->add_track($trans, >>> -glyph => 'transcript2', # 'transcript2', #process_5utr', >>> -fgcolor => 'darkred', >>> -bgcolor => 'darkred', >>> -title => '$source', >>> -link => >>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', >>> #EnsEMBL >>> ); >>> print $panel->png; >>> >>> # the following part works in bioperl 1.5 and 1.6, but not work in >>> Bioperl >>> 1.2.3 >>> 
my $map = $panel->create_web_map("image"); >>> $panel->finished(); >>> >>> 1; >>> >>> ==================== background.pm ======================= >>> package Bio::Graphics::Glyph::background; >>> >>> use strict; >>> use base 'Bio::Graphics::Glyph::generic'; >>> sub pad_top{ >>> return 0; >>> } >>> >>> sub draw_component { >>> my $self = shift; >>> #$self->SUPER::draw_component(@_); >>> my ($gd,$dx,$dy) = @_; >>> my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy); >>> >>> # draw an arrow to indicate the direction of transcript >>> my $color = $self->option('block_bgcolor') || '#cccccc'; >>> $gd->filledRectangle($left,0,$right,$gd->height, >>> $self->factory->translate_color($color)); >>> } >>> >>> 1; >>> >>> -- >>> ========================================== >>> Xianjun Dong >>> PhD student, Lenhard group >>> Computational Biology Unit >>> Bergen Center for Computational Science >>> University of Bergen >>> Hoyteknologisenteret, Thormohlensgate 55 >>> N-5008 Bergen, Norway >>> E-mail: xianjun.dong at bccs.uib.no >>> Tel.: +47 555 84022 >>> Fax : +47 555 84295 >>> ========================================== >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> ========================================== >>> Xianjun Dong >>> PhD student, Lenhard group >>> Computational Biology Unit >>> Bergen Center for Computational Science >>> University of Bergen >>> Hoyteknologisenteret, Thormohlensgate 55 >>> N-5008 Bergen, Norway >>> E-mail: xianjun.dong at bccs.uib.no >>> Tel.: +47 555 84022 >>> Fax : +47 555 84295 >>> ========================================== >>> >>> >>> >> >> > -- ========================================== Xianjun Dong PhD student, Lenhard group Computational Biology Unit Bergen Center for Computational Science University of Bergen Hoyteknologisenteret, Thormohlensgate 55 N-5008 Bergen, 
Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

From charles.tilford at bms.com Thu Jun 18 09:38:34 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 09:38:34 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
Message-ID: <4A3A435A.8000505@bms.com>

Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace channels. Can anyone confirm?

Hi all,

I'm using the SCF Bio::SeqIO module to parse trace data out of chromatograms. The SCF files are being produced by phred using the "-cd" parameter. The traces come out great, and the corresponding base calls from the .phd files align with the peaks wonderfully when I visualize them on a rendered trace. However, only the A bases align to the appropriate trace channel, the rest are mixed up. I find that if I do the following re-mapping, the phred base calls match the

SeqIO : Remapped
A     : A
C     : G
G     : T
T     : C

The relevant part of Bio::SeqIO::scf is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9

... which indicates that it expects the pack()ed trace data to be in order ATGC. The base call parsing code is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8

... which is unpacking in order ACGT. As far as I can tell, the relevant official SCF documentation is here:

http://staden.sourceforge.net/manual/formats_unix_4.html

... which indicates that both trace and base order should be ACGT (matching the SeqIO unpack() for bases, but not traces). My empirical channel unscrambling mapping implies order ACTG, which is different from either of the two orders above. The sequence from the SCF file (should be that from original AB1 file, I think) is not perfectly identical to that called by phred, but is very similar (to be expected); that is, I don't need to remap C, G and T to get it to align with the phred data.
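
One way to check a suspected channel mix-up like the one described above, sketched under stated assumptions: at each called peak position, the trace channel with the highest intensity should be the one carrying the called base. The helpers channel_confusion and infer_mapping are hypothetical names for this example, the data layout (a hash of intensity arrays keyed by channel label, plus parallel arrays of called bases and peak indices) is invented, and nothing here uses the Bio::SeqIO::scf API. The demo data are synthetic, built to show a scrambled labelling.

```perl
use strict;
use warnings;

# Hypothetical helper: for every called base, count which labelled
# channel has the highest intensity at that base's peak position.
sub channel_confusion {
    my ($traces, $bases, $peaks) = @_;
    my %count;
    for my $i (0 .. $#$bases) {
        my $pos = $peaks->[$i];
        my ($best) = sort {
            $traces->{$b}[$pos] <=> $traces->{$a}[$pos] || $a cmp $b
        } keys %$traces;
        $count{ uc $bases->[$i] }{$best}++;
    }
    return \%count;
}

# Hypothetical helper: for each called base, pick the channel label
# that was strongest most often.
sub infer_mapping {
    my ($count) = @_;
    my %map;
    for my $base (sort keys %$count) {
        my $row = $count->{$base};
        ($map{$base}) = sort { $row->{$b} <=> $row->{$a} || $a cmp $b } keys %$row;
    }
    return \%map;
}

# Synthetic example: four called bases with peaks at indices 0..3.
# Only the channel labelled A lines up with its own base.
my @bases  = qw(A C G T);
my @peaks  = (0, 1, 2, 3);
my %traces = (
    A => [9, 1, 1, 1],
    C => [1, 1, 9, 1],
    G => [1, 1, 1, 9],
    T => [1, 9, 1, 1],
);

my $map = infer_mapping(channel_confusion(\%traces, \@bases, \@peaks));
printf "called %s -> strongest channel is labelled %s\n", $_, $map->{$_}
    for sort keys %$map;
```

On real data the counts would be noisy, so the full confusion table is worth inspecting rather than only the winning label per base.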
So it looks like the SeqIO module is not mapping the sections of the packed trace data to the appropriate bases. The unpack order is different than the staden documentation ... but so is the order I impose to correct the problem. I am still unclear as to the differences between V2 and V3 of the format. The major difference appears to be coding the trace absolutely (V2) or relatively to prior values (V3); I'd expect if I was using one format and SeqIO was trying to parse the other that I would get garbage out. Running in verbose reports "scf.pm is working with a version 2 scf."

Thoughts on this would be appreciated - can anyone confirm a problem with trace extraction from SCF?

I'm hoping that once I convince our admin to (properly) install staden::read that I can work directly with the ab1 files, but I need to stopgap on SCF for the time being....

-CAT

From cjfields at illinois.edu Thu Jun 18 11:31:08 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 10:31:08 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>

Charles,

The best way to make sure this is addressed is to file a ticket (bug report) on it so we can properly track it. I have a local installation of io_lib and I believe we also have Geneious installed locally (both of which read SCF), so I can work on confirming that. If it stays on the list it may not get answered and a possible bug report will be lost (to possibly bite someone else later).

AFAIK this module doesn't use staden::read but is pure perl. You are more than welcome to try out Bio::SeqIO::staden::read, but I have to warn you that most of us are looking at replacing its functionality at some point with BioLib bindings to io_lib (more stable) and so we don't intend on following up with bug fixes.
Note: there is also Bio::SCF (non-bp):

http://search.cpan.org/~lds/Bio-SCF-1.01/

chris

From MEC at stowers.org Thu Jun 18 11:42:48 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 18 Jun 2009 10:42:48 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID:

Charles,

Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's KB basecaller turned on, is to use Bio::Trace::ABIF

http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm

It works great and includes an implementation of ABI's algorithm, allowing you to (re)compute trace clear ranges using the KB basecaller's quality scores and any windowing/quality parameters.

It's not in the bioperl project but it is an easy install from CPAN.

I am familiar with staden::read installation woes.

Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

#!/usr/bin/env perl
# PURPOSE: extract from AB1 files into fasta format the sequence in
# the 'clear range' defined by 3 parameters. If there is no clear
# range, emit warning and skip the sequence. The fasta 'defline'
# identifier is taken as the sample name. Other useful attributes are
# also embedded into the defline using attribute=value syntax.
# USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1
# NOTE: 20 4 20 is ABI default settings
# EXAMPLE:
# ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta
# AUTHOR: malcolm_cook at stowers-institute.org
use strict;
use warnings;
use Bio::Trace::ABIF;
use Text::Wrap qw(wrap);
$Text::Wrap::columns = 72;    # wrap the sequence
use File::Basename;

# The first three arguments are the trimming parameters; the rest are AB1 files.
# (@ARGV itself cannot be declared with "my", so the files go into @files.)
my ($window_width, $bad_bases_threshold, $quality_threshold, @files) = @ARGV;
my $abif = Bio::Trace::ABIF->new();

sub main {
    foreach my $file (@files) {
        $abif->open_abif($file) or die "error opening $file as ABIF";
        my ($clear_range_start, $clear_range_stop) =
            $abif->clear_range($window_width,
                               $bad_bases_threshold,
                               $quality_threshold);
        my $sample_score =
            $abif->sample_score($window_width,
                                $bad_bases_threshold,
                                $quality_threshold);
        # my $contiguous_read_length = $abif->contiguous_read_length($window_width,
        #                                                            $quality_threshold,
        #                                                            0, # ==> trim_ends
        #                                                           );
        # my $length_of_read = $abif->length_of_read(
        #                                            $window_width,
        #                                            $quality_threshold,
        #                                            # $method
        #                                           );
        my $defline = join "\t",
            $abif->sample_name,
            #basename($file,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
            #$abif->container_identifier . ':' . $abif->well_id, # or this, for container:well_id formatted defline identifiers
            (map { my $method = $_; "$method=" . ($abif->$method() || '') }
                qw(sample_name comment run_name well_id container_identifier sequence_length)),
            #comment
            # sample_tracking_id - don't use this - it is internal to ABI software
            "clear_range_start=$clear_range_start",
            "clear_range_stop=$clear_range_stop",
            "sample_score=$sample_score",
            #"contiguous_read_length=$contiguous_read_length",
            #"length_of_read=$length_of_read",
            ;
        if ($clear_range_start == -1) {
            warn "NO CLEAR RANGE! SKIPPING $file:\n\t$defline";
            $abif->close_abif();
            next;
        }
        my $seq = wrap('', '',
                       substr($abif->sequence,
                              $clear_range_start + 1,
                              ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
        print ">$defline\n$seq\n";
        $abif->close_abif();
    }
}

main();

From carze at som.umaryland.edu Thu Jun 18 13:51:43 2009
From: carze at som.umaryland.edu (Cesar Arze)
Date: Thu, 18 Jun 2009 10:51:43 -0700 (PDT)
Subject: [Bioperl-l] Problems parsing scientific name from a Genbank file
Message-ID: <24095355.post@talk.nabble.com>

Hi all,

I've searched through the mailing list and bug-tracker looking for any indication of this (what I presume to be) bug I have been encountering when parsing certain Genbank files using SeqIO::GenBank but have yet to find anything. I apologize in advance if this is something that has already been addressed.
When parsing these files and extracting the scientific name, it seems that line breaks are causing the lineage info found in the ORGANISM section to be captured as part of the scientific name. An example of this is accession NC_005945:

  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.

"Bacillus cereus" has a line break, which then causes the scientific name to capture "Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus", ending up with "Bacillus anthracis str. Sterne Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus" as the final scientific name.

Not sure if anyone has ever run into this problem but I would very much appreciate any help or direction.

--
View this message in context: http://www.nabble.com/Problems-parsing-scientific-name-from-a-Genbank-file-tp24095355p24095355.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From charles.tilford at bms.com Thu Jun 18 15:59:01 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 15:59:01 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
References: <4A3A435A.8000505@bms.com> <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
Message-ID: <4A3A9C85.4000603@bms.com>

Chris Fields wrote:
> Charles,
>
> The best way to make sure this is addressed is to file a ticket (bug
> report) on it so we can properly track it.

Ok, I'll put that in.

>
> AFAIK this module doesn't use staden::read but is pure perl.

Yes, that's my understanding too. I'm using the SeqIO module because of ongoing hiccups with the staden installation.

> Note: there is also Bio::SCF (non-bp):
>
> http://search.cpan.org/~lds/Bio-SCF-1.01/
>

I have that installed, but have not tried it out yet.

Thanks!
-CAT > chris > > On Jun 18, 2009, at 8:38 AM, Charles Tilford wrote: > > >> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace >> channels. Can anyone confirm? >> >> Hi all, >> >> I'm using the SCF Bio::SeqIO module to parse trace data out of >> chromatograms. The SCF files are being produced by phred using the "- >> cd" parameter. The traces come out great, and the corresponding base >> calls from the .phd files align with the peaks wonderfully when I >> visualize them on a rendered trace. However, only the A bases align >> to the appropriate trace channel, the rest are mixed up. I find that >> if I do the following re-mapping, the phred base calls match the >> >> SeqIO : Remapped >> A : A >> C : G >> G : T >> T : C >> >> The relevant part of Bio::SeqIO::scf is here: >> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9 >> >> ... which indicates that it expects the pack()ed trace data to be in >> order ATGC. The base call parsing code is here: >> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8 >> >> ... which is unpacking in order ACGT. As far as I can tell, the >> relevant official SCF documentation is here: >> >> http://staden.sourceforge.net/manual/formats_unix_4.html >> >> ... which indicates that both trace and base order should be ACGT >> (matching the SeqIO unpack() for bases, but not traces). My >> empirical channel unscrambling mapping implies order ACTG, which is >> different from either of the two orders above. The sequence from the >> SCF file (should be that from original AB1 file, I think) is not >> perfectly identical to that called by phred, but is very similar (to >> be expected); that is, I don't need to remap C, G and T to get it to >> align with the phred data. >> >> So it looks like the SeqIO module is not mapping the sections of the >> packed trace data to the appropriate bases. The unpack order is >> different than the staden documentation ... 
but so is the order I >> impose to correct the problem. I am still unclear as to the >> differences between V2 and V3 of the format. The major difference >> appears to be coding the trace absolutely (V2) or relatively to >> prior values (V3); I'd expect if I was using one format and SeqIO >> was trying to parse the other that I would get garbage out. Running >> in verbose reports "scf.pm is working with a version 2 scf." >> >> Thoughts on this would be appreciated - can anyone confirm a problem >> with trace extraction from SCF? >> >> I'm hoping that once I convince our admin to (properly) install >> staden::read that I can work directly with the ab1 files, but I need >> to stopgap on SCF for the time being.... >> >> -CAT >> > > > > From charles.tilford at bms.com Thu Jun 18 16:02:53 2009 From: charles.tilford at bms.com (Charles Tilford) Date: Thu, 18 Jun 2009 16:02:53 -0400 Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled? In-Reply-To: References: <4A3A435A.8000505@bms.com> Message-ID: <4A3A9D6D.2010106@bms.com> Cook, Malcolm wrote: > Charles, > > Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABIs kb-basecaller turned on, is to use Bio::Trace::ABIF > > http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm > > It works great and includes implementation of ABIs algorithm allowing to (re)compute trace clear ranges using kc-basecallers quality scores and any windowing/quality parameters. > > Its not in the bioperl project but it is an easy install from CPAN. > Thanks - we installed that a few weeks ago, and it was on my list of things to try, but I had not gotten to it yet since I was getting data out of the SCF SeqIO module. Even though the SeqIO::scf data looks ok, the fact that I need to unscramble it makes me nervous... Thanks, too, for the example code. I'll try out the Bio::Trace::ABIF module and see if it works with our files. Thanks, CAT > I am familiar with staden::read installation woes. 
>
> Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
> #!/usr/bin/env perl
>
> # PURPOSE: extract from AB1 files into fasta format the sequence in
> # the 'clear range' defined by 3 parameters. If there is no clear
> # range, emit warning and skip the sequence. The fasta 'defline'
> # identifier is taken as the sample name. Other useful attributes are
> # also embedded into the defline using attribute=value syntax.
>
> # USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1
>
> # NOTE: 20 4 20 is ABI default settings
>
> # EXAMPLE:
> # ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta
>
> # AUTHOR: malcolm_cook at stowers-institute.org
>
> use strict;
> use warnings;
> use Bio::Trace::ABIF;
> use Text::Wrap qw(wrap);
> $Text::Wrap::columns = 72; # wrap the sequence
>
> use File::Basename;
> # Take the three trimming parameters off the front of @ARGV; the
> # remaining arguments are the AB1 files to process. (@ARGV cannot be
> # declared with "my", so splice is used instead.)
> my ($window_width,
>     $bad_bases_threshold,
>     $quality_threshold) = splice(@ARGV, 0, 3);
>
> my $abif = Bio::Trace::ABIF->new();
>
> sub main {
>   foreach (@ARGV) {
>     $abif->open_abif($_) or die "error opening $_ as ABIF";
>     my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
>                                                                    $bad_bases_threshold,
>                                                                    $quality_threshold
>                                                                   );
>     my $sample_score = $abif->sample_score(
>                                            $window_width,
>                                            $bad_bases_threshold,
>                                            $quality_threshold
>                                           );
>     # my $contiguous_read_length = $abif->contiguous_read_length($window_width,
>     #                                                            $quality_threshold,
>     #                                                            0, # ==> trim_ends
>     #                                                           );
>     # my $length_of_read = $abif->length_of_read(
>     #                                            $window_width,
>     #                                            $quality_threshold,
>     #                                            # $method
>     #                                           );
>     my $defline =
>       join "\t",
>       $abif->sample_name,
>       #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
>       #$abif->container_identifier . ':' . $abif->well_id, # or this, for container:well_id formatted defline identifiers
>       (map {my $method = $_;
>             "$method=". ($abif->$method() || '')}
>        qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
>       # sample_tracking_id - don't use this - it is internal to ABI software
>       "clear_range_start=$clear_range_start",
>       "clear_range_stop=$clear_range_stop",
>       "sample_score=$sample_score",
>       #"contiguous_read_length=$contiguous_read_length",
>       #"length_of_read=$length_of_read",
>       ;
>     if ($clear_range_start == -1) {
>       warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
>       next;
>     }
>     my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
>     print ">$defline\n$seq\n";
>     $abif->close_abif();
>
>   }
> }
>
> main ();
>
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Charles Tilford
>> Sent: Thursday, June 18, 2009 8:39 AM
>> To: BioPerl List
>> Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
>>
>> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace
>> channels.
>> Can anyone confirm?
>>
>> Hi all,
>>
>> I'm using the SCF Bio::SeqIO module to parse trace data out
>> of chromatograms. The SCF files are being produced by phred
>> using the "-cd"
>> parameter. The traces come out great, and the corresponding
>> base calls from the .phd files align with the peaks
>> wonderfully when I visualize them on a rendered trace.
>> However, only the A bases align to the appropriate trace
>> channel, the rest are mixed up. I find that if I do the
>> following re-mapping, the phred base calls match the
>>
>> SeqIO : Remapped
>> A : A
>> C : G
>> G : T
>> T : C
>>
>> The relevant part of Bio::SeqIO::scf is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9
>>
>> ...
which indicates that it expects the pack()ed trace data >> to be in order ATGC. The base call parsing code is here: >> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B >> io/SeqIO/scf.html#CODE8 >> >> ... which is unpacking in order ACGT. As far as I can tell, >> the relevant official SCF documentation is here: >> >> http://staden.sourceforge.net/manual/formats_unix_4.html >> >> ... which indicates that both trace and base order should be >> ACGT (matching the SeqIO unpack() for bases, but not traces). >> My empirical channel unscrambling mapping implies order ACTG, >> which is different from either of the two orders above. The >> sequence from the SCF file (should be that from original AB1 >> file, I think) is not perfectly identical to that called by >> phred, but is very similar (to be expected); that is, I don't >> need to remap C, G and T to get it to align with the phred data. >> >> So it looks like the SeqIO module is not mapping the sections >> of the packed trace data to the appropriate bases. The unpack >> order is different than the staden documentation ... but so >> is the order I impose to correct the problem. I am still >> unclear as to the differences between >> V2 and V3 of the format. The major difference appears to be >> coding the trace absolutely (V2) or relatively to prior >> values (V3); I'd expect if I was using one format and SeqIO >> was trying to parse the other that I would get garbage out. >> Running in verbose reports "scf.pm is working with a version 2 scf." >> >> Thoughts on this would be appreciated - can anyone confirm a >> problem with trace extraction from SCF? >> >> I'm hoping that once I convince our admin to (properly) >> install staden::read that I can work directly with the ab1 >> files, but I need to stopgap on SCF for the time being.... 
>> >> -CAT >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> From cjfields at illinois.edu Thu Jun 18 16:27:02 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 18 Jun 2009 15:27:02 -0500 Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled? In-Reply-To: <4A3A9D6D.2010106@bms.com> References: <4A3A435A.8000505@bms.com> <4A3A9D6D.2010106@bms.com> Message-ID: <2A9A3AB7-7773-48F1-993C-A679495D0B95@illinois.edu> On Jun 18, 2009, at 3:02 PM, Charles Tilford wrote: > Cook, Malcolm wrote: >> Charles, >> >> Another possible stopgap that might work for you, if you're working >> with AB1 chromatograms and have ABIs kb-basecaller turned on, is to >> use Bio::Trace::ABIF >> >> http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm >> >> It works great and includes implementation of ABIs algorithm >> allowing to (re)compute trace clear ranges using kc-basecallers >> quality scores and any windowing/quality parameters. >> >> Its not in the bioperl project but it is an easy install from CPAN. >> > Thanks - we installed that a few weeks ago, and it was on my list of > things to try, but I had not gotten to it yet since I was getting > data out of the SCF SeqIO module. Even though the SeqIO::scf data > looks ok, the fact that I need to unscramble it makes me nervous... > Thanks, too, for the example code. I'll try out the Bio::Trace::ABIF > module and see if it works with our files. > > Thanks, > CAT You definitely shouldn't need to unscramble it; my guess is this is a legit bug that just has gone unnoticed. I see that you have filed a ticket on it so we can at least track it. Thanks! 
chris

From scott at scottcain.net Thu Jun 18 23:25:35 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:25:35 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4A3A13D3.7050208@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no> <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com> <4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
Message-ID: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>

Hi Xianjun,

The attached script (which is not too different from yours--I only did a little clean up and made the padding consistent) makes the attached image, which is what I think you want. I'm using bioperl-live.

Scott

On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong wrote:
> Hi, Scott,
>
> Would you mind having a look at the code (below my signature) to see whether I use the
> -postgrid callback correctly?
> I still cannot get the background for the whole panel.
>
> Thanks
>
> Xianjun
>
>
> Xianjun Dong wrote:
>>
>> Hi, Scott
>>
>> Before I give up my own solution entirely and switch to GBrowse, I still want to
>> bother you once more:
>>
>> As you suggested, I passed the -postgrid option when creating the panel, which will call a
>> function to draw the background. The code below is almost copied from the
>> online POD of Bio::Graphics::Panel (see
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>> )
>>
>> But it still does not work. Could you help by having a look? I paste it
>> below. (BTW, on the above POD page, the option is -postgrid=>\&draw_gap, while the gap
>> drawing function is gap_it, not draw_gap. I guess it's a typo. or not?)
>>
>> Thanks
>>
>> Xianjun
>>
>> ----------------------------------------------- mytestcode.pl --------------------------
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
>> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # highlight
>> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
>> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                      -length=>1050,
>>                                      -start =>0,
>>                                      -pad_left=>12,
>>                                      -pad_right=>12
>>                                      -postgrid=>\&gap_it);
>>
>> sub gap_it {
>>    my $gd    = shift;
>>    my $panel = shift;
>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>    my $top                  = $panel->top;
>>    my $bottom               = $gd->height, #panel->bottom;
>>    my $gray                 = $panel->translate_color('red');
>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>> }
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>> #$panel->add_track([$trans41,$trans31],
>> #                  -glyph   => 'background',
>> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>> #                  );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                   -glyph=>'arrow',
>>                   -double=>1,
>>                   -tick=>2);
>>
>> $panel->add_track($trans,
>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                   -fgcolor => 'darkred',
>>                   -bgcolor => 'darkred',
>>                   -title => '$source',
>>                   -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                   );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>>
>> Scott Cain wrote:
>>>
>>> Hi Xianjun,
>>>
>>> I understand what you want to do, as the current version of gbrowse
>>> does this, which uses bioperl 1.6. Without digging through the code,
>>> I can't tell you exactly how this works and you didn't send your code
>>> that uses this callback, so I can't try it either.
>>>
>>> One thing that is different between your code and gbrowse is that each
>>> of the tracks is actually a separate panel (to allow track dragging),
>>> so it is possible that this sort of callback doesn't work for
>>> Bio::Graphics any more.
>>>
>>> Scott
>>>
>>> On Saturday, June 13, 2009, Xianjun Dong wrote:
>>>
>>>>
>>>> Hi, Scott
>>>>
>>>> Thanks for your reply first.
>>>>
>>>> I still have a question: I dug out the code from GBrowse (pasted
>>>> below). The method make_postgrid_callback collects all highlight regions and then uses
>>>> the hilite_regions_closure function to draw them, using the following GD
>>>> call:
>>>>
>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                      $panel->translate_color($h_color));
>>>>
>>>> where $bottom = $panel->bottom. This is the only difference from my
>>>> code, where I use $gd->height. I guess they are almost the same (except for
>>>> pad_bottom), as can be seen in the code at
>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>
>>>> OK. Anyway, I changed to $panel->bottom instead of $gd->height for
>>>> my highlight regions. The output is the same when using the library from Bioperl
>>>> 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png").
>>>>
>>>> OK. I might not have explained my question clearly. My question is:
>>>> using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I
>>>> get the image I want (see the attached file "test.bioperl1.2.3.png"),
>>>> where the highlight region goes from the roof to the floor. In
>>>> bioperl 1.5 (or 1.6), I can only see the highlight region in its own track,
>>>> not across the whole panel. Is that clearer now? You can see the
>>>> difference between the two images.
>>>>
>>>> [I am not sure the mailing list allows image attachments, so I also put them
>>>> at the following links:
>>>> test.bioperl1.6.png:   http://translog.genereg.net/test.bioperl1.6.png
>>>> test.bioperl1.2.3.png: http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>
>>>> If you have both 1.2.3 and 1.6 on your computer, you can test it and
>>>> see the difference.
>>>>
>>>> I really want to know how this works in bioperl 1.2.3 (even though it
>>>> might be a bug in that version, or whatever).
>>>>
>>>> Thanks
>>>>
>>>> Xianjun
>>>> =============================================
>>>>
>>>> # this generates the callback for highlighting a region
>>>> sub make_postgrid_callback {
>>>>  my $settings = shift;
>>>>  return unless ref $settings->{h_region};
>>>>
>>>>  my @h_regions = map {
>>>>   my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>   defined($h_ref) && $h_ref eq $settings->{ref}
>>>>                ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>                : ()
>>>>  }
>>>>   @{$settings->{h_region}};
>>>>
>>>>  return unless @h_regions;
>>>>  return hilite_regions_closure(@h_regions);
>>>> }
>>>>
>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>> # suitable for hilighting a region of a panel.
>>>> # The args are a list of [start,end,color]
>>>> sub hilite_regions_closure {
>>>>  my @h_regions = @_;
>>>>
>>>>  return sub {
>>>>   my $gd     = shift;
>>>>   my $panel  = shift;
>>>>   my $left   = $panel->pad_left;
>>>>   my $top    = $panel->top;
>>>>   my $bottom = $panel->bottom;
>>>>   for my $r (@h_regions) {
>>>>     my ($h_start,$h_end,$h_color) = @$r;
>>>>     my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>     if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>     # assuming top is 0 so as to ignore top padding
>>>>     $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                          $panel->translate_color($h_color));
>>>>   }
>>>>  };
>>>> }
>>>>
>>>>
>>>> Scott Cain wrote:
>>>>
>>>> Hello Xianjun,
>>>>
>>>> I don't think that approach will work. What you almost certainly need
>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>> region. For example code of how to do this, take a look at the
>>>> make_postgrid_callback subroutine in GBrowse 1.69. The option
>>>> -postgrid is a method of Bio::Graphics::Panel.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I am not sure this is the right place to get help.
>>>>
>>>> I've been struggling with a problem for several days: I want to highlight parts
>>>> of regions in my track using a different background color. To do that, I
>>>> defined a glyph named "background", based on the
>>>> 'Bio::Graphics::Glyph::generic' module.
I override the draw_component() >>>> method, by adding code like below: >>>> >>>> $gd->filledRectangle($left,0,$right,$gd->height, >>>> $self->factory->translate_color($color)); >>>> >>>> # the script is pasted at the end >>>> >>>> This will draw a rectangle with top=0, bottom=$gd->height. I made the >>>> highlight regions into a list of features, and add_track with >>>> -glyph=>'background'. (see the following script, test.pl) This really >>>> works >>>> as I expect, which will add a colored block at background of all tracks >>>> in a >>>> panel (including the ruler arrow). You can see the output image in >>>> attached >>>> file "test.bioperl1.2.3.png" >>>> >>>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does >>>> not >>>> work. Well, it works, but the highlight part only shrink to a low >>>> height, >>>> instead of covering all tracks in the panel. I also attached the output >>>> here, see the file "test.bioperl1.6.png". >>>> >>>> I tried to think about the reason, the 'background' module is based on >>>> the >>>> generic module. What can cause the difference? Is it because $gd->height >>>> is >>>> different, or the tracks followed with 'background' track can not draw >>>> from >>>> the first position? >>>> >>>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart >>>> person >>>> solve problem, wise person avoid problem"...) But another problem is >>>> coming: >>>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map() >>>> function, which means I have to use some higher version if I want to >>>> create >>>> web map for my graphics, but then I have to give up using highlight >>>> background. >>>> >>>> OK. It's long enough for my first-time submission here. Hope someone can >>>> throw me some clue. >>>> >>>> Thanks ahead!! 
>>>>
>>>> Xianjun
>>>>
>>>>
>>>> ==================== test.pl =======================
>>>> #!/usr/bin/perl
>>>>
>>>> use strict;
>>>> use lib "$ENV{HOME}/lib";
>>>>
>>>> use Bio::Graphics;
>>>> use Bio::Graphics::Feature;
>>>> my $ftr= 'Bio::Graphics::Feature';
>>>>
>>>> # processed_transcript
>>>> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
>>>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
>>>> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>
>>>> # highlight
>>>> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
>>>> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');
>>>>
>>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>>                                      -length=>1050,
>>>>                                      -start =>0,
>>>>                                      -pad_left=>12,
>>>>                                      -pad_right=>12);
>>>>
>>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>>> $panel->add_track([$trans41,$trans31],
>>>>                   -glyph   => 'background',
>>>>                   -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>>>                   );
>>>>
>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>                   -glyph=>'arrow',
>>>>                   -double=>1,
>>>>                   -tick=>2);
>>>>
>>>> $panel->add_track($trans,
>>>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>                   -fgcolor => 'darkred',
>>>>                   -bgcolor => 'darkred',
>>>>                   -title => '$source',
>>>>                   -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>>                   );
>>>>  print $panel->png;
>>>>
>>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>>> my $map = $panel->create_web_map("image");
>>>> $panel->finished();
>>>>
>>>> 1;
>>>>
>>>> ==================== background.pm =======================
>>>> package Bio::Graphics::Glyph::background;
>>>>
>>>> use strict;
>>>> use base 'Bio::Graphics::Glyph::generic';
>>>> sub pad_top{
>>>>  return 0;
>>>> }
>>>>
>>>> sub draw_component {
>>>>  my $self = shift;
>>>>  #$self->SUPER::draw_component(@_);
>>>>  my ($gd,$dx,$dy) = @_;
>>>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>
>>>>  # fill a rectangle spanning the full image height behind the feature
>>>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>  $gd->filledRectangle($left,0,$right,$gd->height,
>>>>                       $self->factory->translate_color($color));
>>>> }
>>>>
>>>> 1;
>>>>
>>>> --
>>>> ==========================================
>>>> Xianjun Dong
>>>> PhD student, Lenhard group
>>>> Computational Biology Unit
>>>> Bergen Center for Computational Science
>>>> University of Bergen
>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>> N-5008 Bergen, Norway
>>>> E-mail: xianjun.dong at bccs.uib.no
>>>> Tel.: +47 555 84022
>>>> Fax : +47 555 84295
>>>> ==========================================
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>> --
>>>> ==========================================
>>>> Xianjun Dong
>>>> PhD student, Lenhard group
>>>> Computational Biology Unit
Bergen Center for Computational Science >>>> University of Bergen >>>> Hoyteknologisenteret, Thormohlensgate 55 >>>> N-5008 Bergen, Norway >>>> E-mail: xianjun.dong at bccs.uib.no >>>> Tel.: +47 555 84022 >>>> Fax : +47 555 84295 >>>> ========================================== >>>> >>>> >>>> >>> >>> >> > > -- > ========================================== > Xianjun Dong > PhD student, Lenhard group > Computational Biology Unit > Bergen Center for Computational Science > University of Bergen > Hoyteknologisenteret, Thormohlensgate 55 > N-5008 Bergen, Norway > E-mail: xianjun.dong at bccs.uib.no > Tel.: +47 555 84022 > Fax : +47 555 84295 > ========================================== > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- A non-text attachment was scrubbed... Name: postgrid.pl Type: application/x-perl Size: 2140 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: postgrid_highlight.png Type: image/png Size: 7195 bytes Desc: not available URL: From scott at scottcain.net Thu Jun 18 23:30:37 2009 From: scott at scottcain.net (Scott Cain) Date: Thu, 18 Jun 2009 23:30:37 -0400 Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph In-Reply-To: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com> References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no> <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com> <4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no> <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com> Message-ID: <4536f7700906182030n74f4293k60ad04ea62b97476@mail.gmail.com> Actually, to be clear, that's bioperl-live and Bio::Graphics version 1.96 from CPAN. On Thu, Jun 18, 2009 at 11:25 PM, Scott Cain wrote: > Hi Xianjun, > > The attached script (which is not too different from yours--I only did > a little clean up and made the padding consistent) makes the attached > image, which is what I think you want. ?I'm using bioperl-live. > > Scott > > > On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong wrote: >> Hi, Scott, >> >> Do you mind to have a look of the code (below my signature) if I use the >> -postgrid callback correctly? >> I still cannnot get the background for the whole panel. >> >> Thanks >> >> Xianjun >> >> >> Xianjun Dong wrote: >>> >>> Hi, Scott >>> >>> Before I gave up my own whole solution to use GBrowse, I still want to >>> bother you once: >>> >>> As you suggested, I put -postgrid option when the panel, which will call a >>> function to draw the background. The code below is almost copied from the >>> online POD of Bio::Graphics::Panel (see >>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html >>> ) >>> >>> But it still does not work. Could you help to have a look? I paste it >>> below. 
(BTW, the above page of POD, the -postgrid=>\&draw_gap, while the gap >>> drawing function is gap_it, not draw_gap. I guess it's a typo. or not?) >>> >>> THanks >>> >>> Xianjun >>> >>> ----------------------------------------------- mytestcode.pl >>> -------------------------- >>> >>> #!/usr/bin/perl >>> >>> use strict; >>> use lib "$ENV{HOME}/lib"; >>> >>> use Bio::Graphics; >>> use Bio::Graphics::Feature; >>> my $ftr= 'Bio::Graphics::Feature'; >>> >>> # processed_transcript >>> my $trans1 = >>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); >>> my $trans2 = >>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); >>> my $trans3 = >>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', >>> -source=>'a'); >>> my $trans4 = >>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', >>> -source=>'a'); >>> my $trans5 = >>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); >>> my $trans ?= >>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); >>> >>> # hightlight >>> my $trans31 = >>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', >>> -source=>'a'); >>> my $trans41 = >>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', >>> -source=>'b'); >>> >>> my $panel= Bio::Graphics::Panel->new(-width=>1200, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-length=>1050, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-start =>0, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-pad_left=>12, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-pad_right=>12 >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-postgrid=>\&gap_it); >>> >>> sub gap_it { >>> ? ?my $gd ? ?= shift; >>> ? ?my $panel = shift; >>> ? ?my ($gap_start,$gap_end) = $panel->location2pixel(500,600); >>> ? ?my $top ? ? ? ? ? ? ? ? ?= $panel->top; >>> ? ?my $bottom ? ? ? ? ? ? ? = $gd->height, #panel->bottom; >>> ? ?my $gray ? ? ? ? ? ? ? ? = $panel->translate_color('red'); >>> ? 
?$gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray); >>> } >>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 >>> and 1.6 >>> #$panel->add_track([$trans41,$trans31], >>> # ? ? ? ? ?-glyph ? => 'background', >>> # ? ? ? ? ? ? ? ? ?-block_bgcolor => sub{return (shift->source eq >>> 'a')?'#cccccc':'#fffc22'}, >>> # ? ? ? ? ? ? ? ? ?); >>> >>> $panel->add_track($ftr->new(-start=>100,-end=>1000), >>> ? ? ? ? ? ? ? ? -glyph=>'arrow', >>> ? ? ? ? ? ? ? ? -double=>1, >>> ? ? ? ? ? ? ? ? -tick=>2); >>> >>> $panel->add_track($trans, >>> ? ? ? ? -glyph ? => 'transcript2', # 'transcript2', #process_5utr', >>> ? ? ? ? ? ? ? ? -fgcolor => 'darkred', >>> ? ? ? ? ? ? ? ? -bgcolor => 'darkred', >>> ? ? ? ? ? ? ? ? -title => '$source', >>> ? ? ? ? ? ? ? ? -link => >>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', ?#EnsEMBL >>> ? ? ? ? ? ? ? ? ); >>> ?print $panel->png; >>> >>> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl >>> 1.2.3 >>> my $map = $panel->create_web_map("image"); >>> $panel->finished(); >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Scott Cain wrote: >>>> >>>> Hi Xianjun, >>>> >>>> I understand what you want to do, as the current version of gbrowse >>>> does this, which uses bioperl 1.6. ?Without digging through the code, >>>> I can't tell you exactly how this works and you didn't send your code >>>> that uses this callback, so I can't try it either. >>>> >>>> One thing that is different between your code and gbrowse is that each >>>> of the tracks is actually a seperate panel (to allow track dragging), >>>> so it possible that this sort of callback doesn't work for >>>> Bio::Graphics any more. >>>> >>>> Scott >>>> >>>> On Saturday, June 13, 2009, Xianjun Dong >>>> wrote: >>>> >>>>> >>>>> Hi, Scott >>>>> >>>>> Thanks for your reply first. >>>>> >>>>> I still have question: I dig out the code from GBrowse (which I paste >>>>> below). 
The make_postgrid_callback method collects all highlight regions and then uses the
>>>>> hilite_regions_closure function to draw them, using the following GD
>>>>> call:
>>>>>
>>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>>                      $panel->translate_color($h_color));
>>>>>
>>>>> where $bottom=$panel->bottom. This is the only difference from my
>>>>> code, where I use $gd->height. I guess they are almost the same (except for
>>>>> pad_bottom), as we can see in the code at
>>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>>
>>>>> OK. Anyway, I changed my code to use $panel->bottom instead of $gd->height for
>>>>> my highlight regions. The output is the same when using the Bioperl
>>>>> 1.6 (or 1.5) library. You can see the attached image ("test.bioperl1.6.png").
>>>>>
>>>>> OK. I might not have explained my question clearly. My question is:
>>>>> with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I
>>>>> get the image I want (see the attached file "test.bioperl1.2.3.png"),
>>>>> where the highlight region goes from the top to the bottom of the panel. In
>>>>> bioperl 1.5 (or 1.6), I can only see the highlight region in its own track,
>>>>> not across the whole panel. Did I explain it clearly now? You can see the
>>>>> difference between the two images.
>>>>>
>>>>> [I am not sure whether the mailing list allows image attachments, so I also put them
>>>>> at the following links:
>>>>> test.bioperl1.6.png:   http://translog.genereg.net/test.bioperl1.6.png
>>>>> test.bioperl1.2.3.png: http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>>
>>>>> Could you test it and see the difference, if you have both 1.2.3 and 1.6 on
>>>>> your computer?
>>>>>
>>>>> I really want to know how this works in bioperl 1.2.3 (even though this
>>>>> might be a bug in that version, or whatever).
>>>>>
>>>>> Thanks
>>>>>
>>>>> Xianjun
>>>>> =============================================
>>>>>
>>>>> # this generates the callback for highlighting a region
>>>>> sub make_postgrid_callback {
>>>>>  my $settings = shift;
>>>>>  return unless ref $settings->{h_region};
>>>>>
>>>>>  my @h_regions = map {
>>>>>    my ($h_ref,$h_start,$h_end,$h_color) =
>>>>>      /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>>>>                ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>>                : ()
>>>>>  }
>>>>>    @{$settings->{h_region}};
>>>>>
>>>>>  return unless @h_regions;
>>>>>  return hilite_regions_closure(@h_regions);
>>>>> }
>>>>>
>>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>>> # suitable for hilighting a region of a panel.
>>>>> # The args are a list of [start,end,color]
>>>>> sub hilite_regions_closure {
>>>>>  my @h_regions = @_;
>>>>>
>>>>>  return sub {
>>>>>    my $gd     = shift;
>>>>>    my $panel  = shift;
>>>>>    my $left   = $panel->pad_left;
>>>>>    my $top    = $panel->top;
>>>>>    my $bottom = $panel->bottom;
>>>>>    for my $r (@h_regions) {
>>>>>      my ($h_start,$h_end,$h_color) = @$r;
>>>>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>>      # assuming top is 0 so as to ignore top padding
>>>>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>>                           $panel->translate_color($h_color));
>>>>>    }
>>>>>  };
>>>>> }
>>>>>
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>> Hello Xianjun,
>>>>>
>>>>> I don't think that approach will work.  What you almost certainly need
>>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>>> region.
For example code of how to do this, take a look at the
>>>>> make_postgrid_callback subroutine in GBrowse 1.69.  The option
>>>>> -postgrid is a method of Bio::Graphics::Panel.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am not sure this is the right place to get help.
>>>>>
>>>>> I've been struggling with a problem for several days: I want to highlight parts
>>>>> of regions in my track, using a different background color. To do that, I
>>>>> defined a glyph named "background", based on the
>>>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>>>> method by adding code like the following:
>>>>>
>>>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>>>> $self->factory->translate_color($color));
>>>>>
>>>>> # the script is pasted at the end
>>>>>
>>>>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>>>>> highlight regions into a list of features and called add_track with
>>>>> -glyph=>'background' (see the following script, test.pl). This really
>>>>> works as I expect: it adds a colored block in the background of all tracks
>>>>> in a panel (including the ruler arrow). You can see the output image in the
>>>>> attached file "test.bioperl1.2.3.png".
>>>>>
>>>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it does not
>>>>> work. Well, it works, but the highlighted part shrinks to a low
>>>>> height instead of covering all tracks in the panel. I also attached the output
>>>>> here, see the file "test.bioperl1.6.png".
>>>>>
>>>>> I tried to think about the reason. The 'background' module is based on
>>>>> the generic module. What can cause the difference? Is it because $gd->height
>>>>> is different, or because the tracks that follow the 'background' track cannot be drawn
>>>>> from the first position?
>>>>>
>>>>> Well. I can stick to Bioperl 1.2.3 to avoid the problem.
>>>>> ("Smart person solve problem, wise person avoid problem"...) But another problem is
>>>>> coming: Bio::Graphics in Bioperl 1.2.3 does not support the $panel->create_web_map()
>>>>> function, which means I have to use a later version if I want to
>>>>> create a web map for my graphics, but then I have to give up the highlighted
>>>>> background.
>>>>>
>>>>> OK. This is long enough for my first submission here. I hope someone can
>>>>> give me a clue.
>>>>>
>>>>> Thanks ahead!!
>>>>>
>>>>> Xianjun
>>>>>
>>>>>
>>>>> ==================== test.pl =======================
>>>>> #!/usr/bin/perl
>>>>>
>>>>> use strict;
>>>>> use lib "$ENV{HOME}/lib";
>>>>>
>>>>> use Bio::Graphics;
>>>>> use Bio::Graphics::Feature;
>>>>> my $ftr= 'Bio::Graphics::Feature';
>>>>>
>>>>> # processed_transcript
>>>>> my $trans1 =
>>>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>>> my $trans2 =
>>>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>>> my $trans3 =
>>>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>>>> -source=>'a');
>>>>> my $trans4 =
>>>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>>>> -source=>'a');
>>>>> my $trans5 =
>>>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>>> my $trans  =
>>>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>>
>>>>> # hightlight
>>>>> my $trans31 =
>>>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>>>>> -source=>'a');
>>>>> my $trans41 =
>>>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>>>>> -source=>'b');
>>>>>
>>>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>>>                                      -length=>1050,
>>>>>                                      -start =>0,
>>>>>                                      -pad_left=>12,
>>>>>                                      -pad_right=>12);
>>>>>
>>>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>>>> $panel->add_track([$trans41,$trans31],
>>>>>                -glyph   => 'background',
>>>>>                -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>>>>                );
>>>>>
>>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>>                -glyph=>'arrow',
>>>>>                -double=>1,
>>>>>                -tick=>2);
>>>>>
>>>>> $panel->add_track($trans,
>>>>>                -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>>                -fgcolor => 'darkred',
>>>>>                -bgcolor => 'darkred',
>>>>>                -title => '$source',
>>>>>                -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>>>                );
>>>>>  print $panel->png;
>>>>>
>>>>> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl 1.2.3
>>>>> my $map = $panel->create_web_map("image");
>>>>> $panel->finished();
>>>>>
>>>>> 1;
>>>>>
>>>>> ==================== background.pm =======================
>>>>> package Bio::Graphics::Glyph::background;
>>>>>
>>>>> use strict;
>>>>> use base 'Bio::Graphics::Glyph::generic';
>>>>> sub pad_top{
>>>>>  return 0;
>>>>> }
>>>>>
>>>>> sub draw_component {
>>>>>  my $self = shift;
>>>>>  #$self->SUPER::draw_component(@_);
>>>>>  my ($gd,$dx,$dy) = @_;
>>>>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>>
>>>>>  # draw an arrow to indicate the direction of transcript
>>>>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>>  $gd->filledRectangle($left,0,$right,$gd->height,
>>>>> $self->factory->translate_color($color));
>>>>> }
>>>>>
>>>>> 1;
>>>>>
>>>>> --
>>>>> ==========================================
>>>>> Xianjun Dong
>>>>> PhD student, Lenhard group
>>>>> Computational Biology Unit
>>>>> Bergen Center for Computational Science
>>>>> 
University of Bergen
>>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>>> N-5008 Bergen, Norway
>>>>> E-mail: xianjun.dong at bccs.uib.no
>>>>> Tel.: +47 555 84022
>>>>> Fax : +47 555 84295
>>>>> ==========================================
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From roy.chaudhuri at gmail.com Fri Jun 19 06:34:24 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 19 Jun 2009 11:34:24 +0100 Subject: [Bioperl-l] Problems parsing scientific name from a Genbank file In-Reply-To: <24095355.post@talk.nabble.com> References: <24095355.post@talk.nabble.com> Message-ID: <4A3B69B0.8080305@gmail.com> Hi Cesar, I can replicate this using an old Bioperl (version 1.5.2), but it appears to be fixed in version 1.6 and bioperl-live - the scientific_name method returns "Bacillus anthracis str. Sterne". Hope this helps. Roy. Cesar Arze wrote: > Hi all, > I've searched through the mailing list and bug-tracker looking for any > indication of this (what I presume to be) bug I have been encountering when > parsing certain Genbank files using SeqIO::GenBank but have yet to find > anything. I apologize in advance if this is something that has already been > addressed. > > When parsing these files and extracting the scientific name it seems that > line breaks are causing the lineage info found in the ORGANISM section to be > captured as part of the scientific name. An example of this is accession > NC_005945: > > ORGANISM Bacillus anthracis str. Sterne > Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; > Bacillus > cereus group. > > Bacillus cereus has a line break which then causes scientific name to > capture "Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus" > ending up with Bacillus anthracis str. Sterne Bacteria; Firmicutes; > Bacillales; Bacillaceae; Bacillus; Bacillus" as the final scientific name. > > Not sure if anyone has ever ran into this problem but I would very much > appreciate any help or direction. 
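
The wrapped-lineage problem described above can be illustrated with a small pure-Perl sketch. It is not BioPerl's GenBank parser: the subroutine name parse_organism and the rule it applies (the lineage begins at the first continuation line that contains a ';' or ends in '.') are assumptions made for illustration only. With BioPerl itself, the thread's advice stands: upgrade to 1.6 and use the scientific_name method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the ORGANISM block of a GenBank record into the scientific name
# and the lineage. Heuristic (an assumption of this sketch): the lineage
# starts at the first continuation line that contains ';' or ends in '.';
# every continuation line after that also belongs to the lineage, even if
# a lineage term such as "Bacillus cereus group" wraps across lines.
sub parse_organism {
    my ($record) = @_;
    my (@name, @lineage);
    my $in_block = 0;
    for my $line (split /\n/, $record) {
        if ($line =~ /^\s+ORGANISM\s+(.*\S)\s*$/) {
            $in_block = 1;
            push @name, $1;
        }
        elsif ($in_block && $line =~ /^\s+(\S.*?)\s*$/) {
            my $text = $1;
            if (@lineage || $text =~ /;/ || $text =~ /\.$/) {
                push @lineage, $text;
            }
            else {
                push @name, $text;    # a long name wrapped onto a second line
            }
        }
        elsif ($in_block) {
            last;    # an unindented keyword (REFERENCE, ...) ends the block
        }
    }
    my $lineage = join ' ', @lineage;
    $lineage =~ s/\.$//;
    return (join(' ', @name), [ split /;\s*/, $lineage ]);
}

# Sample taken from the NC_005945 excerpt quoted in the thread.
my $record = <<'END';
SOURCE      Bacillus anthracis str. Sterne
  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.
REFERENCE   1
END

my ($name, $lineage) = parse_organism($record);
print "name:    $name\n";
print "lineage: ", join(' > ', @$lineage), "\n";
```

The point of the sketch is only that the line break inside "Bacillus cereus group" must not end the lineage or leak into the name; names that themselves end in a period (for example "Bacillus sp.") would defeat this simple rule.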
From cjfields at illinois.edu Fri Jun 19 16:57:36 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 19 Jun 2009 15:57:36 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk> <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu> <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk> Message-ID: So, to follow up (and make sure we don't have any overlapping tuits) we should probably determine who wants to work on what (i.e. fastq updating, etc). I think it's possible to quickly add in Solexa/ Illumina/Sanger fastq similar to BioPython, just don't want to step on anyone's toes if they are halfway through doing this. chris On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote: > Better than colorspaced discussions for sure ;) > > Elia > > On 17 Jun 2009, at 21:35, Chris Fields wrote: > >> So, #1 priority is to get fastq up-to-speed, then maybe assess >> other options. >> >> Illuminating discussion, thanks Elia! >> >> urgh, excuse unintended bad pun above... >> >> chris >> >> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote: >> >>> Interesting that you mention the database issue. We found that for >>> specific memory/CPU intenstive things we also switch to using dbs. >>> For example, after many years of loyal use of disconnected_ranges >>> we switched to a simple SQL implementation of it, because of the >>> large performance gains it would give us. Similarly in Ensembl as >>> well as in the old days of bioperl-db we opted for doing subseq >>> within SQL where possible. >>> >>> Some lean way of SQL'izing specific components could be less >>> "disruptive" than avoiding object creation and provide significant >>> gains in performance. 
Could be set as an optional flag, and could >>> use temporary ad hoc SQL databases? >>> >>> Still, priority now is to make SeqIO compliant with all those >>> formats, than we can worry about performance :) >>> >>> Elia >>> >>> On 17 Jun 2009, at 20:30, Chris Fields wrote: >>> >>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote: >>>> >>>>> Tristan Lefebure wrote: >>>>>> Hello, >>>>>> Regarding next-gen sequences and bioperl, following my >>>>>> experience, another issue is bioperl speed. For example, if you >>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads >>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, >>>>>> well, you've got to be patient (but may be I missed some >>>>>> shortcuts...). >>>>> >>>>> This is my concern as well. Or, rather, is there actually a >>>>> significant set of users out there who are dealing with next-gen >>>>> sequencing and would consider using BioPerl for their work? >>>>> >>>>> I'm working with all the 1000-genomes data at the Sanger, and we >>>>> at least are probably never going to use BioPerl for the work. >>>> >>>> Are you using pure perl or (gasp) something else? ;> >>>> >>>> Judging by the feedback there are definitely a set of users who >>>> would like to integrate nextgen into bioperl somehow, probably to >>>> take advantage of other aspects of bioperl. >>>> >>>>>> A pure perl solution will be between 100 to 1000x faster... >>>>>> Would it be possible to have an ultra-light quality object with >>>>>> few simple methods for next-gen reads? >>>>> >>>>> The fastq parser itself already seems pretty fast. The way to >>>>> get the speedup is to not create any Bio::Seq* objects but just >>>>> return the data directly. At that point it's not taking much >>>>> advantage of BioPerl. But certainly it could be done... 
>>>> >>>> >>>> I suppose the best way to assess what needs to be done is come up >>>> with a set of 'use cases' specifying what users want so we can >>>> design around them, otherwise we're shooting in the dark. >>>> >>>> I'm personally wondering if this could be done as a sequence >>>> database, something similar in theme to Lincoln's >>>> SeqFeature::Store, but sequence only, and returns quality objects >>>> in a similar manner (ala Storable)? Not sure whether that's >>>> feasible, but it's appears at least scalable. >>>> >>>> chris >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> --- >>> Senior Lecturer, Bioinformatics >>> UCL Cancer Institute >>> Paul O' Gorman Building >>> University College London >>> Gower Street >>> WC1E 6BT >>> London >>> UK >>> >>> Office (UCL): +44 207 679 6493 >>> Office (ICMS): +44 0207 8822374 >>> >>> Mobile: +44 7597 566 194 >>> Mobile (Italy): +39 338 8448801 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > --- > Senior Lecturer, Bioinformatics > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Office (UCL): +44 207 679 6493 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 7597 566 194 > Mobile (Italy): +39 338 8448801 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Sat Jun 20 04:46:31 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 20 Jun 2009 09:46:31 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> References: 
<6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906200146t547a0492r23d5f123e01098e8@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>>> Peter's suggestions also are reasonable, though does biopython have a
>>> separate module for each of these variations?  Our version (I believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
>>> fastq variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try and guess the file format from the data contents).
>> Internally we have three separate FASTQ parsers/writers although
>> they do share code.
>
> We could easily do the same if others agree.  Actually, if we specified that
> shorthand for a variant on a format would be designated as -format =>
> 'format-variant', I think we could easily hack SeqIO to deal with that by
> splitting on '-' and passing everything to the constructor as (-format =>
> 'format', -variant => 'variant').  Very little repeated code in this case,
> just an additional named parameter indicating the format variant (and the
> SeqIO class can do the type checking on that within the constructor).

Yes, when I started using names like "fastq-solexa" I did have in mind a
"main-variant" naming convention, and potentially Biopython may one day
actually use this structure when allocating a Bio.SeqIO job to the
appropriate parser or writer. For now, the Biopython list of formats is
fairly short (and there are relatively few of these sub-formats) so to
keep things simple we just have a flat mapping from the format name
(e.g. "fasta", "fastq", "fastq-solexa") to the parser/writer code.
Peter

From e.stupka at ucl.ac.uk Sat Jun 20 16:12:18 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Sat, 20 Jun 2009 21:12:18 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: 
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk> <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu> <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
Message-ID: 

Hi Chris,

I agree. I have not written a single line of code so far, while Heikki
has some (but has been silent for a while) and you have perhaps some
code ready to roll. I am happy to help where needed, just let me know
what you'd like me to focus on. If you want to go ahead and implement
the fastq stuff discussed I can focus on bioperl-run.

cheers

Elia

On 19 Jun 2009, at 21:57, Chris Fields wrote:

> So, to follow up (and make sure we don't have any overlapping tuits)
> we should probably determine who wants to work on what (i.e. fastq
> updating, etc). I think it's possible to quickly add in Solexa/
> Illumina/Sanger fastq similar to BioPython, just don't want to step
> on anyone's toes if they are halfway through doing this.
>
> chris

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801

From lincoln.stein at gmail.com Sat Jun 20 17:01:43 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Sat, 20 Jun 2009 17:01:43 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <6dce9a0b0906201401j40175dbdscd71360396fe9f7a@mail.gmail.com>

Hi All,

Apropos of this, I am about to release to CPAN a BioPerl interface to
SAM and BAM files. The documentation is still in progress, but you can
get CVS access here:

% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co gbrowse-adaptors/Bio-SamTools

Lincoln

On Wed, Jun 17, 2009 at 7:29 AM, Elia Stupka wrote:

> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve BioPerl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?
> > Similarly, there seems to be little in bioperl-run to support tools that > have been developed in this area, such as Maq, BowTie, TopHat, etc? > > Do let me know if there is a past thread on this, or other people actively > developing, etc. so that I can find out what priorities are. > > thanks and best regards to all (old friends and new), > > Elia > > --- > Senior Lecturer, Bioinformatics > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Office (UCL): +44 207 679 6493 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 7597 566 194 > Mobile (Italy): +39 338 8448801 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From hartzell at alerce.com Mon Jun 22 09:18:20 2009 From: hartzell at alerce.com (George Hartzell) Date: Mon, 22 Jun 2009 06:18:20 -0700 Subject: [Bioperl-l] Anyone at YAPC? Message-ID: <19007.33948.411442.197063@already.dhcp.gene.com> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks. g. From cjfields1 at gmail.com Mon Jun 22 10:05:56 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Mon, 22 Jun 2009 09:05:56 -0500 Subject: [Bioperl-l] changing parameters in Bio::Tools::Run::RemoteBlast In-Reply-To: References: Message-ID: <67ABC7E3-216E-4F5A-B18E-A775A6B4D8F7@gmail.com> Jonas, The best place to send questions is to the mail list (which I've cc'd). If you reply make sure to keep the mail list in the reply-to. There are two ways to set the parameters you want. I'll show you what I consider the best, but I have no way to test it ATM. $factory->submit_parameter($foo => 'bar') is the syntax for setting PUT parameters. 
Sad to see they didn't provide you with the exact PUT parameter names
(as follows):

Max target sequences = 100                   # MAX_NUM_SEQ
Expect threshold = 10                        # EXPECT
Gap Costs = Existence 11 Extension 1         # GAPCOSTS
Compositional adjustments = Conditional compositional score matrix adjustment
                                             # COMPOSITION_BASED_STATISTICS

'Compositional adjustments' is as follows (from command-line blastall):

  -C  Use composition-based score adjustments for blastp or tblastn:
      As first character:
      D or d: default (equivalent to T)
      0 or F or f: no composition-based statistics
      1: Composition-based statistics as in NAR 29:2994-3005, 2001
      2 or T or t: Composition-based score adjustments as in
          Bioinformatics 21:902-911, 2005, conditioned on sequence properties
      3: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, unconditionally
      For programs other than tblastn, must either be absent or be D, F or 0.
      As second character, if first character is equivalent to 1, 2, or 3:

After the factory line and prior to the BLAST call you can add in the
following (completely untested, excuse any possible mistakes) code:

my %put = (
    MAX_NUM_SEQ                  => 100,
    EXPECT                       => 10,
    GAPCOSTS                     => '11 1',
    COMPOSITION_BASED_STATISTICS => 2   # could be 1 as well
);

# call submit_parameter on the RemoteBlast factory object ($factory in your script)
for my $putName (keys %put) {
    $factory->submit_parameter($putName,$put{$putName});
}

chris

On Jun 22, 2009, at 8:14 AM, Jonas Schaer wrote:

> Hi there,
> I hope it's OK to ask you a question about the bio perl module
> Bio::Tools::Run::RemoteBlast.
> My problem is, that I get different results using this perl-skript: > > ####################################################################################################################################################################################### > use Bio::Seq::SeqFactory; > use Bio::Tools::Run::RemoteBlast; > use strict; > my @blast_report; > my $prog = 'blastp'; > my $db = 'nr'; > my $e_val= '1e-10'; > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > #my $input = @_; > my > $ > blast_seq > = > 'MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE > '; > #$v is just to turn on and off the messages > my $v = 1; > my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > 'Bio::PrimarySeq'); > my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => > "$blast_seq"); > my $filename='temp2.out'; > my $r = $factory->submit_blast($seq); > print STDERR "waiting..." if( $v > 0 ); > while ( my @rids = $factory->each_rid ) > { > foreach my $rid ( @rids ) > { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 );
>         }
>         else
>         {
>             my $result = $rc->next_result();
>             $factory->save_output($filename);
>             $factory->remove_rid($rid);
>             print "\nQuery Name: ", $result->query_name(),
>                 "\n";
>             while ( my $hit = $result->next_hit )
>             {
>                 next unless ( $v > 0);
>                 print "\thit name is ", $hit->name, "\n";
>                 while( my $hsp = $hit->next_hsp )
>                 {
>                     print "\t\tscore is ", $hsp->score, "\n";
>                 }
>             }
>         }
>     }
>
>
> }
> @blast_report = get_file_data ($filename);
> return @blast_report;
>
>
> sub get_file_data
> {
>     use strict;
>     my($filename) = @_;
>     use strict;
>     use warnings;
>     # Initialize variables
>     my @filedata = ( );
>     unless( open(GET_FILE_DATA, $filename) )
>     {
>         print STDERR "Cannot open file \"$filename\"\n\n";
>         exit;
>     }
>     @filedata = <GET_FILE_DATA>;
>     close GET_FILE_DATA;
>     print @filedata;
>     return @filedata;
> }
>
> #######################################################################################################################################################################################
>
> ... and the blastp on the ncbi-homepage. The people from NCBI wrote
> me that I have to change some parameters:
> ""
> You need to have the following:
>
>
> Max target sequences = 100
> Expect threshold = 10
> Gap Costs = Existence 11 Extension 1
> Compositional adjustments = Conditional compositional score matrix
> adjustment""
>
> Could you please tell me exactly how to change this parameters
> within my perl-skript? I think I have to use the "put" command, but
> I just cannot find out, how...
>
> Regards and thank you so much in advance :),
>
> Jonas Schaer

From biopython at maubp.freeserve.co.uk Mon Jun 22 10:24:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Jun 2009 15:24:55 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
> Peter wrote:
>> Other issues to keep in mind:
>>
>> (3) There should be no warning parsing files where the optional repeated
>> title is missing on the "+" lines (as discussed earlier on the BioPerl
>> list).
>
> Agreed, though we'll have to check the current fastq parser to see if that's
> currently the case. I thought that was fixed but maybe not?
>
>> (4) When writing FASTQ files should BioPerl omit the optional repeated
>> title on the "+" line? Biopython omits this as I understand this to be
>> common practice, and it can make a big difference to file sizes - especially
>> on short read data from Solexa/Illumina.
>
> Agreed, particularly if it's commonly encountered.
>
>> (5) Also test reading and writing files with an optional description (as
>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>> for examples, e.g.
>>
>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>
> Should be easy enough to implement with a simple regex.
>
>> (6) Test reading and writing files where the encoded quality string starts
>> with a "@" or a "+" character, e.g.
>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>
>> Peter
>
> Mark, getting all that?
;> > > chris Another couple of points that I should have remembered earlier, related to converting between PHRED scores and Solexa scores. On the bright side, with Illumina abandoning the Solexa scores in pipeline 1.3+, these issues will go away with time: (7) If BioPerl will be converting Solexa scores to/from PHRED scores as integers automatically (as discussed earlier), make sure you round to the nearest whole number (don't just truncate with a call to int!). MAQ does this by adding 0.5 before calling int (while in Biopython I just use Python's round function). (8) When asked to write out an old Solexa style FASTQ file, what will you do if given a standard Sanger FASTQ file (or a new Illumina 1.3+ FASTQ file) containing a base with PHRED quality zero? This maps to a Solexa quality of minus infinity... Right now the development version of Biopython will throw an error in this situation, but mapping to the lowest observed Solexa score might be reasonable. Peter From cjfields at illinois.edu Mon Jun 22 09:54:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Jun 2009 08:54:22 -0500 Subject: [Bioperl-l] Anyone at YAPC? In-Reply-To: <19007.33948.411442.197063@already.dhcp.gene.com> References: <19007.33948.411442.197063@already.dhcp.gene.com> Message-ID: I think some of the regular #bioperl folk are there (Jay Hannah, R. Buels, etc). May be worth going on IRC to find everyone. I'm giving serious thought to going next year if I can get enough work done towards a perl6 or Moose-based bioperl. chris On Jun 22, 2009, at 8:18 AM, George Hartzell wrote: > > I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks. > > g. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From vofford at rvc.ac.uk Mon Jun 22 12:10:43 2009 From: vofford at rvc.ac.uk (Offord, Victoria) Date: Mon, 22 Jun 2009 17:10:43 +0100 Subject: [Bioperl-l] Clustalw Message-ID: Hi, Can anyone help and tell me where I am going wrong please J I am getting this error from the following script: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: ClustalW call (clustalw align -infile=/tmp/8PVli9JWEa/L_pxrEtzD1 -output=gcg -matrix=BLOSUM -ktuple=2 -outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such file or directory STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756 STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515 STACK: tester.pl:25 ----------------------------------------------------------- #--------------------------------------------SCRIPT--------------------- --------------------------# #!/usr/bin/perl -w use Bio::Tools::Run::Alignment::Clustalw; $ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9'; use Bio::Seq; my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK"; my $b = "NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP"; my $seq1 = Bio::Seq->new ( -seq => $a, -id => 'real', -desc => 'this is a real Seq'); my $seq2 = Bio::Seq->new ( -seq => $b, -id => 'test', -desc => 'this is a test Seq'); my @seq_array = ($seq1,$seq2); my $seq_array_ref = \@seq_array; my $aln = $factory->align($seq_array_ref); From Kevin.M.Brown at asu.edu Mon Jun 22 12:48:27 2009 From: Kevin.M.Brown at asu.edu (Kevin 
Brown) Date: Mon, 22 Jun 2009 09:48:27 -0700 Subject: [Bioperl-l] Clustalw In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B4060B9BAF@EX02.asurite.ad.asu.edu> Do you have ClustalW installed and in your path? > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Offord, Victoria > Sent: Monday, June 22, 2009 9:11 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Clustalw > > Hi, > > > > Can anyone help and tell me where I am going wrong please J > > I am getting this error from the following script: > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: ClustalW call (clustalw align -infile=/tmp/8PVli9JWEa/L_pxrEtzD1 > -output=gcg -matrix=BLOSUM -ktuple=2 > -outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such > file or directory > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > STACK: Bio::Tools::Run::Alignment::Clustalw::_run > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756 > > STACK: Bio::Tools::Run::Alignment::Clustalw::align > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515 > > STACK: tester.pl:25 > > ----------------------------------------------------------- > > > > > > > > > > #--------------------------------------------SCRIPT----------- > ---------- > --------------------------# > > #!/usr/bin/perl -w > > use Bio::Tools::Run::Alignment::Clustalw; > > $ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9'; > > use Bio::Seq; > > > > my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); > > my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); > > > > my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK"; > > my $b = > "NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP"; > > my $seq1 = Bio::Seq->new ( -seq => $a, > > -id => 'real', > > -desc => 'this is 
a real Seq'); > > my $seq2 = Bio::Seq->new ( -seq => $b, > > -id => 'test', > > -desc => 'this is a test Seq'); > > > > > my @seq_array = ($seq1,$seq2); > > > > my $seq_array_ref = \@seq_array; > > my $aln = $factory->align($seq_array_ref); > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Jun 22 15:20:14 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Jun 2009 14:20:14 -0500 Subject: [Bioperl-l] bioperl-dev or branch? : redux In-Reply-To: <6DF025D32D664F61BC64B49184A2E6DD@NewLife> References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com> <6DF025D32D664F61BC64B49184A2E6DD@NewLife> Message-ID: <4766E259-B184-4552-817E-FBBB3A71A17F@illinois.edu> On Jun 17, 2009, at 11:47 AM, Mark A. Jensen wrote: > Hi All, > I thought I'd revisit this thread, since in the last couple weeks, > have used both techniques (bioperl-dev and branch from trunk) to > produce completed projects. My thoughts: > > Using bioperl-dev was very nice for creating Bio::Search::Tiling, a > new addition to the core api. There was no pressure to conform to the > existing api there. In particular, there was no implicit insistence to > make things work through Bio::Search::Utils, and I was free to factor > it out. The Tiling api was definitely unstable until the end, when it > was ported to the core. As I made regular reports to bioperl-l, > everything was transparent and up front, and I received excellent > suggestions there (as usual). > For Bio::Restriction, using the branch was just as natural. Here, the > existing structure was well established, and all the work needed to > happen beneath the api. All old t/Restriction tests needed to pass, > and additional ones created for the new functionality. 
So here, using > bioperl-dev wasn't natural, even though some "experiments" needed to > be tried (some succeeded and some failed, as you can see in the > commentary at Bug #2855). Even though the new code turned out to > require substantial effort, the effort was required to fix a true bug > in the working core, and any fixes needed to work transparently with > respect to the users for whom this bug had not been an issue. Using > the branch made it relatively easy to merge quickly back into the core > when done, and there is a certain psychological pressure too provided > by an open branch which is helpful. > > Hilmar raised the very good point in the previous discussion that > (essentially) bioperl-dev shouldn't become a sandbox with lots of > unfinished code scraps and derelict stuff that doesn't work. My view > is bioperl-dev will become a sandbox only if we treat it like > one. I've filled out the Bioperl-dev page on the wiki > (http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing > some recognition to devs there whose modules become part of the > core may be a better way to insure that projects that are started on > bioperl-dev actually get finished, than to prescribe beforehand what > kinds of projects may get started. I believe this follows the adage of > liberality on what is accepted, and strictness on what is emitted. > > cheers, MAJ The main reason I wanted a bioperl-dev is for some code or implementations that don't seem to fit on a branch or directly into core, but would definitely be of use. The tendency in the past has been to accept anything that works into core (the 'bazaar' approach). Initially that worked well, but the long-term end result has become potentially unmaintainable code bloat. Committing new code to a branch isn't a great idea either, primarily b/c the code may be lost to the branch if it isn't followed up and remerged into trunk. 
And forcing the code to fit into bioperl (or vice versa, which happened re: Feature Annotation) isn't the best way either. Like Hilmar, though, I don't want dev to become a (sandbox|code dumping ground) either, so I think some additional discussion is warranted if anyone else wants to chime in. chris From mauricio at open-bio.org Mon Jun 22 15:56:33 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Mon, 22 Jun 2009 14:56:33 -0500 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? In-Reply-To: References: <17AD00895AFD43E1A1436D1065092BAC@NewLife> <4A3134EB.4080702@open-bio.org> Message-ID: <4A3FE1F1.40607@open-bio.org> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 release and latest code from bioperl-live. Also added bioperl-dev and bioperl-pise to the list. Cheers, Mauricio. Mark A. Jensen wrote: > cheers Mauricio! MAJ > ----- Original Message ----- From: "Mauricio Herrera Cuadra" > > To: "Mark A. Jensen" > Cc: "Chris Fields" ; "BioPerl List" > > Sent: Thursday, June 11, 2009 12:46 PM > Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org? > > >> Hi Mark, >> >> I'll take a look into this sometime between today and tomorrow. Will >> keep you posted. Thanks for the heads up :) >> >> Mauricio. >> >> >> Mark A. Jensen wrote: >>> Hi Chris and list- >>> Will documentation for release 1.6 be available in pdoc on >>> doc.bioperl.org? 
>>> I notice also that autogenerated documentation for bioperl-live >>> doesn't contain >>> new modules (or HIVQuery & Tiling, anyway ;) )-- >>> cheers, Mark >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > > From cjfields at illinois.edu Mon Jun 22 16:29:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Jun 2009 15:29:46 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> Message-ID: On Jun 22, 2009, at 9:24 AM, Peter wrote: > On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote: >> Peter wrote: >>> Other issues to keep in mind: >>> >>> (3) There should be no warning parsing files where the optional >>> repeated >>> title is missing on the "+" lines (as discussed earlier on the >>> BioPerl >>> list). >> >> Agreed, though we'll have to check the current fastq parser to see >> if that's >> currently the case. I thought that was fixed but maybe not? >> >>> (4) When writing FASTQ files should BioPerl omit the optional >>> repeated >>> title on the "+" line? Biopython omits this as I understand this >>> to be >>> common practice, and can make a big different to file sizes - >>> especially >>> on short read data from Solexa/Illumina. >> >> Agreed, particularly if it's commonly encountered. >> >>> (5) Also test reading and writing files with an optional >>> description (as >>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA >>> for examples, e.g. 
>>> >>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC >>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC >> >> Should be easy enough to implement with a simple regex. >> >>> (6) Test reading and writing files where the encoded quality >>> string starts >>> with a "@" or a "+" character, e.g. >>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html >>> >>> Peter >> >> Mark, getting all that? ;> >> >> chris > > Another couple of points that I should have remembered earlier, > related to converting between PHRED scores and Solexa scores. > On the bright side, with Illumina abandoning the Solexa scores > in pipeline 1.3+, these issues will go away with time: > > (7) If BioPerl will be converting Solexa scores to/from PHRED > scores as integers automatically (as discussed earlier), make > sure you round to the nearest whole number (don't just truncate > with a call to int!). MAQ does this by adding 0.5 before calling > int (while in Biopython I just use Python's round function). That can probably be done with sprintf if needed. It avoids a call to POSIX functions. > (8) When asked to write out an old Solexa style FASTQ file, > what will you do if given a standard Sanger FASTQ file (or a > new Illumina 1.3+ FASTQ file) containing a base with PHRED > quality zero? This maps to a Solexa quality of minus infinity... > Right now the development version of Biopython will throw an > error in this situation, but mapping to the lowest observed > Solexa score might be reasonable. > > Peter Maybe address with a warning followed by assigning to the lowest solexa score? chris From cjfields at illinois.edu Mon Jun 22 16:27:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Jun 2009 15:27:32 -0500 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? 
In-Reply-To: <4A3FE1F1.40607@open-bio.org> References: <17AD00895AFD43E1A1436D1065092BAC@NewLife> <4A3134EB.4080702@open-bio.org> <4A3FE1F1.40607@open-bio.org> Message-ID: np. Thanks Mauricio! chris On Jun 22, 2009, at 2:56 PM, Mauricio Herrera Cuadra wrote: > Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 > release and latest code from bioperl-live. Also added bioperl-dev > and bioperl-pise to the list. > > Cheers, > Mauricio. > > > Mark A. Jensen wrote: >> cheers Mauricio! MAJ >> ----- Original Message ----- From: "Mauricio Herrera Cuadra" > > >> To: "Mark A. Jensen" >> Cc: "Chris Fields" ; "BioPerl List" > > >> Sent: Thursday, June 11, 2009 12:46 PM >> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org? >>> Hi Mark, >>> >>> I'll take a look into this sometime between today and tomorrow. >>> Will keep you posted. Thanks for the heads up :) >>> >>> Mauricio. >>> >>> >>> Mark A. Jensen wrote: >>>> Hi Chris and list- >>>> Will documentation for release 1.6 be available in pdoc on >>>> doc.bioperl.org? >>>> I notice also that autogenerated documentation for bioperl-live >>>> doesn't contain >>>> new modules (or HIVQuery & Tiling, anyway ;) )-- >>>> cheers, Mark >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Jun 22 22:46:58 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 22 Jun 2009 22:46:58 -0400 Subject: [Bioperl-l] announcing bioperl-max, a public AMI In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu> References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife> <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu> Message-ID: <78130116A84C4D989F3BCC217E8C5ACE@NewLife> Done-- fortinbras-public/bioperl-max-0.1.1 is at ami-b55dbbdc; rakudo cloned at 00:44 UTC, parrot @ r39729, bioperl-live @ 15800, nexml @ r1136. cheers! MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Wednesday, June 10, 2009 12:36 AM Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI > I'll be trying that out, particularly re: bioperl-run. For bioperl-db do you > have mysql or pg? > > Heh, I see Moose is installed. Just need svn'd parrot and git updated rakudo > and we could do some damage... > > chris > > On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote: > >> Hi All, >> >> I've built a public Amazon machine image, loaded with many many >> goodies, including the most recent (r15747) trunks of >> - bioperl-live >> - bioperl-run >> - bioperl-db/biosql >> The base machine is a public clean install of Ubuntu 8.03 Hardy/32-bit >> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml, >> emboss, and more are all there (and most even pass bioperl-run tests), and >> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo >> (r1071) and others. This is *not* a lean mean fighting machine. >> >> Please give it a try if you're so inclined. Fuller details (including >> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max . >> >> Ping me if it doesn't work. 
>> >> Cheers, >> Mark >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Mon Jun 22 23:22:48 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 22 Jun 2009 23:22:48 -0400 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? In-Reply-To: <4A3FE1F1.40607@open-bio.org> References: <17AD00895AFD43E1A1436D1065092BAC@NewLife><4A3134EB.4080702@open-bio.org> <4A3FE1F1.40607@open-bio.org> Message-ID: <8B93DCE168434F608620AF17CAF12A9F@NewLife> awesome, MHC- cheers and thanks-MAJ ----- Original Message ----- From: "Mauricio Herrera Cuadra" To: "Mark A. Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Monday, June 22, 2009 3:56 PM Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org? > Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 release > and latest code from bioperl-live. Also added bioperl-dev and bioperl-pise to > the list. > > Cheers, > Mauricio. > > > Mark A. Jensen wrote: >> cheers Mauricio! MAJ >> ----- Original Message ----- From: "Mauricio Herrera Cuadra" >> >> To: "Mark A. Jensen" >> Cc: "Chris Fields" ; "BioPerl List" >> >> Sent: Thursday, June 11, 2009 12:46 PM >> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org? >> >> >>> Hi Mark, >>> >>> I'll take a look into this sometime between today and tomorrow. Will keep >>> you posted. Thanks for the heads up :) >>> >>> Mauricio. >>> >>> >>> Mark A. Jensen wrote: >>>> Hi Chris and list- >>>> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org? 
>>>> I notice also that autogenerated documentation for bioperl-live doesn't >>>> contain >>>> new modules (or HIVQuery & Tiling, anyway ;) )-- >>>> cheers, Mark >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From pmr at ebi.ac.uk Tue Jun 23 07:00:38 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 23 Jun 2009 12:00:38 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> Message-ID: <4A40B5D6.40504@ebi.ac.uk> We just added FASTQ parsing to EMBOSS and faced the same issues. Parsing was easy - find the '@' line, read sequence until the '+' line is reached, then read (seqlen) quality characters ... and check the next line starts with '@' Quality scores are kept as phred values. Phred of 0 means unknown, which in Solexa is -5 (0.75 error rate = could be anything). We assume lower quality scores are from alignments rather than single reads. We gave up on trying to guess the quality score standard and require users to say whether they are sanger, solexa (1.0) or Illumina (1.3) format files. If we only want the sequence then we don't care so we allow "fastq" as a sequence format and ignore the quality scores in that case. We also allow the integer quality score format ... is anyone still using that (it looks horrible to me :-) Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th. Any further tips would be very useful. 
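
The parsing approach described above can be sketched in Perl roughly as follows. This is only an illustration, not the EMBOSS code and not the BioPerl parser. The sub names are made up for this example, the quality decoding assumes the Sanger offset of 33, and the score formulas are the usual definitions Q_phred = -10 log10(p) and Q_solexa = -10 log10(p/(1-p)). Rounding uses sprintf rather than int, since int would truncate -4.77 to -4.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one FASTQ record from an open filehandle.
# Returns (id, sequence, quality string), or an empty list at end of input.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    chomp $header;
    $header =~ /^\@(\S+)/
        or die "Expected a '\@' title line, got: $header\n";
    my $id = $1;

    # The sequence may span several lines; it ends at the '+' line.
    my $seq = '';
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        last if $line =~ /^\+/;
        $seq .= $line;
    }

    # Read quality characters until there are as many as bases, so a
    # quality line starting with '@' or '+' is never taken for a title.
    my $qual = '';
    while ( length($qual) < length($seq) ) {
        my $line = <$fh>;
        die "Truncated quality string for $id\n" unless defined $line;
        chomp $line;
        $qual .= $line;
    }
    die "Quality and sequence lengths differ for $id\n"
        if length($qual) != length($seq);

    return ( $id, $seq, $qual );
}

# Decode a Sanger-style quality string (ASCII offset 33, an assumption here).
sub sanger_to_phred {
    my ($qual) = @_;
    return map { ord($_) - 33 } split //, $qual;
}

sub log10 { return log( $_[0] ) / log(10) }

# Scores as functions of the error probability p (0 < p < 1).
sub phred_from_error  { my ($p)  = @_; return -10 * log10($p) }
sub solexa_from_error { my ($p)  = @_; return -10 * log10( $p / ( 1 - $p ) ) }

# Solexa score to Phred score, derived from the two definitions above.
sub solexa_to_phred   { my ($sq) = @_; return 10 * log10( 10**( $sq / 10 ) + 1 ) }

# Round to the nearest whole number instead of truncating with int().
sub round_nearest { return sprintf( '%.0f', $_[0] ) + 0 }

# Made-up two-record input; both quality lines start with '@' or '+'.
my $data = join "\n", '@read1 demo', 'ACGT', '+', '@II+',
                      '@read2', 'GG', '+read2', '+@', '';
open my $fh, '<', \$data or die "Cannot open in-memory data: $!\n";
while ( my ( $id, $seq, $qual ) = next_fastq_record($fh) ) {
    print join( "\t", $id, $seq, join( ',', sanger_to_phred($qual) ) ), "\n";
}
close $fh;

# Prints "p=0.75: Phred 1, Solexa -5"
printf "p=0.75: Phred %d, Solexa %d\n",
    round_nearest( phred_from_error(0.75) ),
    round_nearest( solexa_from_error(0.75) );
```

Counting quality characters against the sequence length, rather than looking for the next '@', is what keeps records such as the second one above from being split in the wrong place.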
regards, Peter Rice From biopython at maubp.freeserve.co.uk Tue Jun 23 07:29:56 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Jun 2009 12:29:56 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A40B5D6.40504@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> Message-ID: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice wrote: > > We just added FASTQ parsing to EMBOSS and faced the same issues. > I was going to chat to you about this at BOSC, and suggest this be added to EMBOSS - but you are well ahead of me ;) > Parsing was easy - find the '@' line, read sequence until the '+' line > is reached, then read (seqlen) quality characters ... and check the next > line starts with '@' That is basically what I did for Biopython. > Quality scores are kept as phred values. Phred of 0 means unknown, > which in Solexa is -5 (0.75 error rate = could be anything). A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't quite follow your leap that this corresponds to a Solexa quality of -5. Could you clarify? > We assume lower quality scores are from alignments rather than single reads. Did you mean to say "higher quality scores" (i.e. lower probability of error), e.g a PHRED score of 80 which you can get from MAQ doing read mapping or something consensus based. > We gave up on trying to guess the quality score standard and require > users to say whether they are sanger, solexa (1.0) or Illumina (1.3) > format files. If we only want the sequence then we don't care so we allow > "fastq" as a sequence format and ignore the quality scores in that case. What format names have you used? 
Ideally we'd have the same names in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and "fastq-illumina"). > We also allow the integer quality score format ... is anyone still using > that (it looks horrible to me :-) Do you mean the QUAL file format holding PHRED scores? Roche provide tools to turn their SFF files into FASTA and QUAL files, so they are still used. > Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th. > > Any further tips would be very useful. Great. See you at BOSC 2009! Peter (Biopython) From pmr at ebi.ac.uk Tue Jun 23 08:22:33 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 23 Jun 2009 13:22:33 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> Message-ID: <4A40C909.40803@ebi.ac.uk> Peter wrote: > On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice wrote: >> We just added FASTQ parsing to EMBOSS and faced the same issues. >> > > I was going to chat to you about this at BOSC, and suggest this be > added to EMBOSS - but you are well ahead of me ;) Not that well ahead really ... someone asked for it in our BoF at BOSC/ISMB last year so we thought we'd better get it done before this one. it was implemented a couple of days ago :-) >> Parsing was easy - find the '@' line, read sequence until the '+' line >> is reached, then read (seqlen) quality characters ... and check the next >> line starts with '@' > > That is basically what I did for Biopython. > >> Quality scores are kept as phred values. Phred of 0 means unknown, >> which in Solexa is -5 (0.75 error rate = could be anything). 
> > A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't > quite follow your leap that this corresponds to a Solexa quality of -5. Could > you clarify? Phred score is -10 log(p) where p is the probability of error. A phred of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate (3/4 chance that any base you pick is wrong). Solexa scores are -10 log(p/(1-p)) so p=0.75 comes out at -5. This is why Solexa scores can go down to -5 in their fastq format. >> We assume lower quality scores are from alignments rather than single reads. > > Did you mean to say "higher quality scores" (i.e. lower probability of error), > e.g a PHRED score of 80 which you can get from MAQ doing read mapping > or something consensus based. Actually I mean both. Error probabilities below 0.75 for a single base are silly, and error probabilities below 0.0001 make sense only when two or more high quality bases are aligned. >> We gave up on trying to guess the quality score standard and require >> users to say whether they are sanger, solexa (1.0) or Illumina (1.3) >> format files. If we only want the sequence then we don't care so we allow >> "fastq" as a sequence format and ignore the quality scores in that case. > > What format names have you used? Ideally we'd have the same names > in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and > "fastq-illumina"). We don't normally use '-' in our format names so we have fastqsanger, fastqsolexa, fastqillumina and fastqint. None of these have been tried on users as yet. The '-' names look nice though. We can consider introducing them. Do you have a full list of format names (sequence, feature, alignment, etc.) we can try to conform to? >> We also allow the integer quality score format ... is anyone still using >> that (it looks horrible to me :-) > > Do you mean the QUAL file format holding PHRED scores? Roche provide > tools to turn their SFF files into FASTA and QUAL files, so they are still used. 
Probably ... unless there is a Solexa version too. regards, Peter From rmb32 at cornell.edu Tue Jun 23 10:28:08 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 23 Jun 2009 07:28:08 -0700 Subject: [Bioperl-l] Anyone at YAPC? In-Reply-To: References: <19007.33948.411442.197063@already.dhcp.gene.com> Message-ID: <4A40E678.8010709@cornell.edu> Yep, YAPC is great! This is my first one. I saw a guy walking around here with a nametag that I thought said "Mark Jensen". MAJ, are you here? Rob Chris Fields wrote: > I think some of the regular #bioperl folk are there (Jay Hannah, R. > Buels, etc). May be worth going on IRC to find everyone. > > I'm giving serious thought to going next year if I can get enough work > done towards a perl6 or Moose-based bioperl. > > chris > > On Jun 22, 2009, at 8:18 AM, George Hartzell wrote: > >> >> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks. >> >> g. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From maj at fortinbras.us Tue Jun 23 11:54:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 23 Jun 2009 11:54:24 -0400 Subject: [Bioperl-l] Anyone at YAPC? In-Reply-To: <4A40E678.8010709@cornell.edu> References: <19007.33948.411442.197063@already.dhcp.gene.com> <4A40E678.8010709@cornell.edu> Message-ID: I think there are about 75000 of us; that one ain't me, I'm afraid. Maybe next year! 
cheers MAJ ----- Original Message ----- From: "Robert Buels" To: "bioperl-l List" Sent: Tuesday, June 23, 2009 10:28 AM Subject: Re: [Bioperl-l] Anyone at YAPC? > Yep, YAPC is great! This is my first one. I saw a guy walking around here > with a nametag that I thought said "Mark Jensen". MAJ, are you here? > > Rob > > Chris Fields wrote: >> I think some of the regular #bioperl folk are there (Jay Hannah, R. Buels, >> etc). May be worth going on IRC to find everyone. >> >> I'm giving serious thought to going next year if I can get enough work done >> towards a perl6 or Moose-based bioperl. >> >> chris >> >> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote: >> >>> >>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks. >>> >>> g. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue Jun 23 16:34:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 23 Jun 2009 15:34:48 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A40C909.40803@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> 
<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> Message-ID: <21116F70-93A3-4539-9BE2-61C838BA730E@illinois.edu> On Jun 23, 2009, at 7:22 AM, Peter Rice wrote: > Peter wrote: > ... >>> Parsing was easy - find the '@' line, read sequence until the '+' >>> line >>> is reached, then read (seqlen) quality characters ... and check >>> the next >>> line starts with '@' >> >> That is basically what I did for Biopython. This is now what bioperl will do (at least when I commit changes today or tomorrow). > ... >>> We gave up on trying to guess the quality score standard and require >>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3) >>> format files. If we only want the sequence then we don't care so >>> we allow >>> "fastq" as a sequence format and ignore the quality scores in that >>> case. >> >> What format names have you used? Ideally we'd have the same names >> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and >> "fastq-illumina"). > > We don't normally use '-' in our format names so we have fastqsanger, > fastqsolexa, fastqillumina and fastqint. None of these have been tried > on users as yet. > > The '-' names look nice though. We can consider introducing them. Do > you > have a full list of format names (sequence, feature, alignment, > etc.) we > can try to conform to? We (bioperl) are using biopython's convention of format-variant, or at least that's how I'm coding it up. With SeqIO it's fairly easy to check for the format variant prior to loading the class and pass it in as a second named parameter. I have actually thought of adding in fastqint as an option (it would be fairly easy to do). 
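For readers following the parsing discussion: a minimal Python sketch of the approach described above (find the '@' line, read sequence until the '+' line, then read as many quality characters as there are bases), together with the "format-variant" name split. This is only an illustration under those assumptions, not the EMBOSS, Biopython or BioPerl code; the names parse_fastq and split_format are made up here.

```python
import io


def split_format(name):
    """Split 'fastq-solexa' into ('fastq', 'solexa'); plain 'fastq' has no variant."""
    fmt, _, variant = name.partition("-")
    return fmt, (variant or None)


def _next_nonblank(handle):
    """Return the next non-blank line, or '' at end of file."""
    line = handle.readline()
    while line and not line.strip():
        line = handle.readline()
    return line


def parse_fastq(handle):
    """Yield (identifier, sequence, quality_string) tuples from a FASTQ handle."""
    line = _next_nonblank(handle)
    while line:
        if not line.startswith("@"):
            raise ValueError("expected '@' header, got %r" % line)
        identifier = line[1:].rstrip()

        # The sequence may wrap over several lines; it ends at the '+' line.
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = handle.readline()
        if not line:
            raise ValueError("no '+' line for record %s" % identifier)
        sequence = "".join(seq_parts)

        # A quality line can itself start with '@', so count characters
        # instead of scanning for the next header.
        qual_parts = []
        qual_len = 0
        while qual_len < len(sequence):
            line = handle.readline()
            if not line:
                raise ValueError("truncated quality for record %s" % identifier)
            chunk = line.strip()
            qual_parts.append(chunk)
            qual_len += len(chunk)
        if qual_len != len(sequence):
            raise ValueError("quality length mismatch for record %s" % identifier)

        yield identifier, sequence, "".join(qual_parts)

        # The loop head checks that whatever follows starts with '@'.
        line = _next_nonblank(handle)
```

Counting quality characters rather than looking for the next '@' is the point of the "read (seqlen) quality characters" step: '@' is a legal quality character, so a wrapped quality line may begin with it.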
chris From cjfields at illinois.edu Tue Jun 23 17:04:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 23 Jun 2009 16:04:25 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> Message-ID: <49A4AD93-69FB-406E-8FFB-99C74A457402@illinois.edu> Just so we're on the same page data-wise, would there be a common set of fastq data files to use for tests? I am using some from SRA (which is all converted to Sanger). Just need a few small ones for older solexa and newer illumina. chris On Jun 23, 2009, at 6:29 AM, Peter wrote: > On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice wrote: > >> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July >> 15th. >> >> Any further tips would be very useful. > > Great. See you at BOSC 2009! 
> > Peter > (Biopython) From biopython at maubp.freeserve.co.uk Tue Jun 23 17:39:48 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Jun 2009 22:39:48 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A40C909.40803@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> Message-ID: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice wrote: > Peter wrote: >> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice wrote: >>> We just added FASTQ parsing to EMBOSS and faced the same issues. >>> >> >> I was going to chat to you about this at BOSC, and suggest this be >> added to EMBOSS - but you are well ahead of me ;) > > Not that well ahead really ... someone asked for it in our BoF at > BOSC/ISMB last year so we thought we'd better get it done before this > one. it was implemented a couple of days ago :-) > Well, ahead of my asking! >>> Quality scores are kept as phred values. Phred of 0 means unknown, >>> which in Solexa is -5 (0.75 error rate = could be anything). >> >> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't >> quite follow your leap that this corresponds to a Solexa quality of -5. Could >> you clarify? > > Phred score is -10 log(p) where p is the probability of error. A phred > of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate > (3/4 chance that any base you pick is wrong). > > Solexa scores are -10 log(p/(1-p)) so p=0.75 comes out at -5. This is > why Solexa scores can go down to -5 in their fastq format. > >>> We assume lower quality scores are from alignments rather than >>> single reads. 
>> >> Did you mean to say "higher quality scores" (i.e. lower probability of error), >> e.g a PHRED score of 80 which you can get from MAQ doing read mapping >> or something consensus based. > > Actually I mean both. Error probabilities below 0.75 for a single base > are silly, and error probabilities below 0.0001 make sense only when two > or more high quality bases are aligned. I see what you mean - a probability of error of 0.75 matches that for a random base call, obvious when you put it like that. Of course, there is this nasty little thought at the back of my mind that sooner or later someone will use FASTQ files for proteins (e.g. from some mass-spec protein sequencing). A probability less than that (e.g. 0) is actually worse than random and could be considered to mean "we're pretty sure this isn't the stated letter". But that would be silly, as you say. >>> We gave up on trying to guess the quality score standard and require >>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3) >>> format files. If we only want the sequence then we don't care so we allow >>> "fastq" as a sequence format and ignore the quality scores in that case. >> >> What format names have you used? Ideally we'd have the same names >> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and >> "fastq-illumina"). > > We don't normally use '-' in our format names so we have fastqsanger, > fastqsolexa, fastqillumina and fastqint. None of these have been tried > on users as yet. > > The '-' names look nice though. We can consider introducing them. Do you > have a full list of format names (sequence, feature, alignment, etc.) we > can try to conform to? See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO Getting EMBOSS to conform should be trivial - in general when picking a format name for Biopython's SeqIO or AlignIO (and we have avoided multiple aliases with one exception) we have tried to use anything shared by BioPerl and EMBOSS.
The FASTQ variants are unusual in that Biopython got to invent some names. In future where would be a good place to discuss these kinds of cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc). >>> We also allow the integer quality score format ... is anyone still >>> using that (it looks horrible to me :-) >> >> Do you mean the QUAL file format holding PHRED scores? >> Roche provide tools to turn their SFF files into FASTA and >> QUAL files, so they are still used. > > Probably ... unless there is a Solexa version too. We may be talking at cross purposes here, this is QUAL format: http://www.bioperl.org/wiki/Qual_sequence_format Peter From pmr at ebi.ac.uk Wed Jun 24 07:48:23 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 24 Jun 2009 12:48:23 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> Message-ID: <4A421287.4000203@ebi.ac.uk> Peter wrote: > On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice wrote: >> The '-' names look nice though. We can consider introducing them. Do you >> have a full list of format names (sequence, feature, alignment, etc.) we >> can try to conform to? > > See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO Thanks. I'll take a look at those. > Getting EMBOSS to conforming should be trivial - in general when > picking a format name for Biopython's SeqIO or AlignIO (and we > have avoided multiple aliases with one exception) we have tried to > use anything shared by BioPerl and EMBOSS. 
The FASTQ variants > are unusual in that Biopython got to invent some names. > > In future where would be a good place to discuss these kinds of > cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc). I was planning to suggest a get-together at BOSC in Stockholm so we can identify common cross-platform issues. I'm sure there are many ways we can conform with naming and interfaces and perhaps even share code. >>>> We also allow the integer quality score format ... is anyone still >>>> using that (it looks horrible to me :-) >>> Do you mean the QUAL file format holding PHRED scores? >>> Roche provide tools to turn their SFF files into FASTA and >>> QUAL files, so they are still used. >> Probably ... unless there is a Solexa version too. > > We may be talking at cross purposes here, this is QUAL format: > http://www.bioperl.org/wiki/Qual_sequence_format Yes that is different. We'll worry about separate QUAL files later (we already find separate GFF files a pain for features) and still with the "fastqint" format name. regards, Peter From biopython at maubp.freeserve.co.uk Wed Jun 24 10:56:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Jun 2009 15:56:13 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A421287.4000203@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> <4A421287.4000203@ebi.ac.uk> Message-ID: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com> On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice wrote: > > I was planning to suggest a get-together at BOSC in Stockholm so we can > identify common cross-platform issues. 
I'm sure there are many ways we > can conform with naming and interfaces and perhaps even share code. > That would be a good idea - but while there are quite a few Biopython people at BOSC this year, I don't know if there will be many from BioPerl (there isn't a BioPerl update talk scheduled). >>>>> We also allow the integer quality score format ... is anyone still >>>>> using that (it looks horrible to me :-) >>>> Do you mean the QUAL file format holding PHRED scores? >>>> Roche provide tools to turn their SFF files into FASTA and >>>> QUAL files, so they are still used. >>> Probably ... unless there is a Solexa version too. >> >> We may be talking at cross purposes here, this is QUAL format: >> http://www.bioperl.org/wiki/Qual_sequence_format > > Yes that is different. We'll worry about separate QUAL files later (we > already find separate GFF files a pain for features) and still with the > "fastqint" format name. So when you say "fastqint" are you talking about something else? Could you show us an example record in this format? Peter [I need to remember to proof read my evening emails more carefully] From vecchi.b at gmail.com Wed Jun 24 12:13:02 2009 From: vecchi.b at gmail.com (Bruno Vecchi) Date: Wed, 24 Jun 2009 13:13:02 -0300 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> Message-ID: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> Jay asked me to forward this to the list, since he sometimes has problems getting his mails delivered. Feel free to suggest topics for the bioperl hackathon to take place tomorrow and on friday! Bruno. From: Jay Hannah Date: June 24, 2009 11:55:42 AM EDT To: Bioperl Subject: Hackathon tomorrow (I think) Hola, So a few of us here at YAPC might try to be productive tomorrow (and Friday?). I don't know if we have any commit bits attending. 
Feel free to suggest things: http://yapc10.org/yn2009/wiki?node=BioPerl Or point me to list(s) of things. Perhaps we'll try to help out in Bugzilla. Come yell at me (us?) in IRC: http://www.bioperl.org/wiki/Irc Thanks, Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From cjfields at illinois.edu Wed Jun 24 12:22:57 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Jun 2009 11:22:57 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> <4A421287.4000203@ebi.ac.uk> <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com> Message-ID: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu> On Jun 24, 2009, at 9:56 AM, Peter wrote: > On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice wrote: >> >> I was planning to suggest a get-together at BOSC in Stockholm so we >> can >> identify common cross-platform issues. I'm sure there are many ways >> we >> can conform with naming and interfaces and perhaps even share code. >> > > That would be a good idea - but while there are quite a few Biopython > people at BOSC this year, I don't know if there will be many from > BioPerl > (there isn't a BioPerl update talk scheduled). Most of us are caught up with other work, though I will likely be able to dedicate more time to it in the next few months. Also doesn't help that my travel stipend doesn't start until Aug. 1. >>>>>> We also allow the integer quality score format ...
is anyone >>>>>> still >>>>>> using that (it looks horrible to me :-) >>>>> Do you mean the QUAL file format holding PHRED scores? >>>>> Roche provide tools to turn their SFF files into FASTA and >>>>> QUAL files, so they are still used. >>>> Probably ... unless there is a Solexa version too. >>> >>> We may be talking at cross purposes here, this is QUAL format: >>> http://www.bioperl.org/wiki/Qual_sequence_format >> >> Yes that is different. We'll worry about separate QUAL files later >> (we >> already find separate GFF files a pain for features) and still with >> the >> "fastqint" format name. > > So when you say "fastqint" are you talking about something else? > Could you show us an example record in this format? > > Peter > [I need to remember to proof read my evening emails more carefully] The same as fastq, except the ASCII quality is converted to actual score: @4_1_912_360 AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC +4_1_912_360 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40 40 40 40 40 26 40 40 14 39 40 40 @4_1_54_483 TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT +4_1_54_483 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40 40 40 40 40 40 16 40 40 5 40 40 chris From cjfields at illinois.edu Wed Jun 24 12:26:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Jun 2009 11:26:22 -0500 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> Message-ID: 1) Any help towards bugzilla fixes would be most welcome. 2) Better GFF3 integration 3) Typed but lightweight seqfeatures 4) Bio::Moose? I can dedicate more time to the latter two in about a month, but I'll be tied up until then. Let me know if anyone needs collab on biomoose on github; Mark Jensen's already added. 
chris On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote: > Jay asked me to forward this to the list, since he sometimes has > problems > getting his mails delivered. > Feel free to suggest topics for the bioperl hackathon to take place > tomorrow > and on friday! > > Bruno. > > > From: Jay Hannah > Date: June 24, 2009 11:55:42 AM EDT > To: Bioperl > Subject: Hackathon tomorrow (I think) > > Hola, > > So a few of us here at YAPC might try to be productive tomorrow (and > Friday?). > > I don't know if we have any commit bits attending. > > Feel free to suggest things: > > http://yapc10.org/yn2009/wiki?node=BioPerl > > Or point me to list(s) of things. Perhaps we'll try to help out in > Bugzilla. > > Come yell at me (us?) in IRC: > > http://www.bioperl.org/wiki/Irc > > Thanks, > > Jay Hannah > http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Wed Jun 24 12:27:39 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Jun 2009 17:27:39 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> <4A421287.4000203@ebi.ac.uk> <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com> <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu> Message-ID: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com> On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields wrote: >> So when you say "fastqint" are you talking about something else? >> Could you show us an example record in this format? 
>> >> Peter > > The same as fastq, except the ASCII quality is converted to actual score: > > @4_1_912_360 > AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC > +4_1_912_360 > 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40 > 40 40 40 40 26 40 40 14 39 40 40 > @4_1_54_483 > TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT > +4_1_54_483 > 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40 > 40 40 40 40 40 16 40 40 5 40 40 OK - and who uses this "Integer FASTQ" files? Peter From vecchi.b at gmail.com Wed Jun 24 12:40:50 2009 From: vecchi.b at gmail.com (Bruno Vecchi) Date: Wed, 24 Jun 2009 13:40:50 -0300 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: <1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com> References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> <1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com> Message-ID: <1a0c1b750906240940t7c0003f9hf10eb30c0d85a5ce@mail.gmail.com> > > Is there a todo list for biomoose? I'd be glad to hack in, but I'm afraid > to step into someone else's work or to do things without general agreement. > It would be nice to have directions for small sized chunks of work to do. > In any case, count me in! > > 2009/6/24 Chris Fields > > 1) Any help towards bugzilla fixes would be most welcome. >> 2) Better GFF3 integration >> 3) Typed but lightweight seqfeatures >> 4) Bio::Moose? >> >> I can dedicate more time to the latter two in about a month, but I'll be >> tied up until then. Let me know if anyone needs collab on biomoose on >> github; Mark Jensen's already added. >> >> chris >> >> >> On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote: >> >> Jay asked me to forward this to the list, since he sometimes has problems >>> getting his mails delivered. >>> Feel free to suggest topics for the bioperl hackathon to take place >>> tomorrow >>> and on friday! 
>>> >>> Bruno. >>> >>> >>> From: Jay Hannah >>> Date: June 24, 2009 11:55:42 AM EDT >>> To: Bioperl >>> Subject: Hackathon tomorrow (I think) >>> >>> Hola, >>> >>> So a few of us here at YAPC might try to be productive tomorrow (and >>> Friday?). >>> >>> I don't know if we have any commit bits attending. >>> >>> Feel free to suggest things: >>> >>> http://yapc10.org/yn2009/wiki?node=BioPerl >>> >>> Or point me to list(s) of things. Perhaps we'll try to help out in >>> Bugzilla. >>> >>> Come yell at me (us?) in IRC: >>> >>> http://www.bioperl.org/wiki/Irc >>> >>> Thanks, >>> >>> Jay Hannah >>> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From jay at jays.net Wed Jun 24 12:44:51 2009 From: jay at jays.net (Jay Hannah) Date: Wed, 24 Jun 2009 12:44:51 -0400 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> Message-ID: On Jun 24, 2009, at 12:26 PM, Chris Fields wrote: > Let me know if anyone needs collab on biomoose on github; Mark > Jensen's already added. Anything on github should be trivial, even with no perms -- we can just fork and then send you (whoever) pull requests. github++ :) > 1) Any help towards bugzilla fixes would be most welcome. I don't know how to make any progress in bugzilla if no one has a commit bit...? > 2) Better GFF3 integration > 3) Typed but lightweight seqfeatures Are there bugzilla tickets (or somewhere) describing those? I wonder if anyone can help me get out of sporadic MailMan purgatory... 
Thanks, j From cjfields at illinois.edu Wed Jun 24 12:54:06 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Jun 2009 11:54:06 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> <4A421287.4000203@ebi.ac.uk> <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com> <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu> <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com> Message-ID: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu> On Jun 24, 2009, at 11:27 AM, Peter wrote: > On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields > wrote: >>> So when you say "fastqint" are you talking about something else? >>> Could you show us an example record in this format? >>> >>> Peter >> >> The same as fastq, except the ASCII quality is converted to actual >> score: >> >> @4_1_912_360 >> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC >> +4_1_912_360 >> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 >> 40 40 40 >> 40 40 40 40 26 40 40 14 39 40 40 >> @4_1_54_483 >> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT >> +4_1_54_483 >> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 >> 40 28 40 >> 40 40 40 40 40 16 40 40 5 40 40 > > OK - and who uses this "Integer FASTQ" files? 
> > Peter Not sure, but it is covered by MAQ via the conversion script (as FASTQ-int): http://maq.sourceforge.net/fq_all2std.pl chris From jay at jays.net Wed Jun 24 11:55:42 2009 From: jay at jays.net (Jay Hannah) Date: Wed, 24 Jun 2009 11:55:42 -0400 Subject: [Bioperl-l] Hackathon tomorrow (I think) Message-ID: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> Hola, So a few of us here at YAPC might try to be productive tomorrow (and Friday?). I don't know if we have any commit bits attending. Feel free to suggest things: http://yapc10.org/yn2009/wiki?node=BioPerl Or point me to list(s) of things. Perhaps we'll try to help out in Bugzilla. Come yell at me (us?) in IRC: http://www.bioperl.org/wiki/Irc Thanks, Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From bernd.web at gmail.com Wed Jun 24 13:11:51 2009 From: bernd.web at gmail.com (Bernd Web) Date: Wed, 24 Jun 2009 19:11:51 +0200 Subject: [Bioperl-l] Bioperl_scripts Message-ID: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com> Hi, The bioperl scripts section at http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short examples. However, quite a number of scripts cannot be found anymore and return errors: For example for the first link (scripts/install_bioperl_scripts.pl) Filesystem has no item: File not found: revision 15800, path '/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at /usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245 Also all scripts in the Bio::Graphics section cannot be found. Is the http://www.bioperl.org/wiki/Bioperl_scripts page still supported?
Regards, Bernd From cjfields at illinois.edu Wed Jun 24 16:57:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Jun 2009 15:57:51 -0500 Subject: [Bioperl-l] Bioperl_scripts In-Reply-To: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com> References: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com> Message-ID: <5AF99205-F977-45A1-B4AF-C3858A5727FD@illinois.edu> On Jun 24, 2009, at 12:11 PM, Bernd Web wrote: > Hi, > > The bioperl scripts section at > http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short > examples. > However, it quite a number of scripts cannot be found anymore and > return errors: > > For example for the first link (scripts/install_bioperl_scripts.pl) > Filesystem has no item: File not found: revision 15800, path > '/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at > /usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245 > > Also all scripts in the Bio::Graphics section cannot be found. > Is the http://www.bioperl.org/wiki/Bioperl_scripts page still > supported? > > Regards, > Bernd Re: Bio::Graphics, all modules and related scripts have been moved to a separate repo and CPAN release (latest): http://search.cpan.org/~lds/Bio-Graphics-1.96/ Beyond that I would consider all scripts and the wiki page supported. It's best to file this to bugzilla as a documentation issue so we fix it and don't forget about it amongst the flurry of email.
chris From cjfields at illinois.edu Wed Jun 24 17:10:34 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Jun 2009 16:10:34 -0500 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> Message-ID: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote: > On Jun 24, 2009, at 12:26 PM, Chris Fields wrote: >> Let me know if anyone needs collab on biomoose on github; Mark >> Jensen's already added. > > Anything on github should be trivial, even with no perms -- we can > just fork and then send you (whoever) pull requests. github++ :) > >> 1) Any help towards bugzilla fixes would be most welcome. > > I don't know how to make any progress in bugzilla if no one has a > commit bit...? For some reason I thought you had a commit bit; we can add you in if needed. Anyway, patches are most definitely welcome ;> >> 2) Better GFF3 integration >> 3) Typed but lightweight seqfeatures > > Are there bugzilla tickets (or somewhere) describing those? No as the issues are more complex than one single bug, but we do have something to help track for the time being: http://www.bioperl.org/wiki/GFF_Refactor http://www.bioperl.org/wiki/Align_Refactor I'll probably file TODOs during the process for those refactors. The easiest to tackle would be probably be Align/LocatableSeq refactors. > I wonder if anyone can help me get out of sporadic MailMan > purgatory... > > Thanks, > > j -c PS - Don't feel constrained by the above. There are many many areas to contribute to. 
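On the Phred versus Solexa question that keeps coming up in the Next-gen modules thread, here is a small Python sketch of the two definitions given earlier (Phred = -10 log10(p), Solexa = -10 log10(p/(1-p))) and the conversion that follows from them. The function names are hypothetical, and the ASCII offsets in decode_quality (33 for Sanger, 64 for Solexa and Illumina 1.3) are the commonly documented ones, an assumption rather than something stated in this thread.

```python
import math


def phred_from_error(p):
    """Phred quality for error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


def solexa_from_error(p):
    """Solexa quality for error probability p (0 < p < 1)."""
    return -10.0 * math.log10(p / (1.0 - p))


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the Phred score with the same error probability."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


# Assumed (ASCII offset, scores-are-Solexa) per variant; not taken from this thread.
VARIANTS = {
    "sanger": (33, False),
    "solexa": (64, True),
    "illumina": (64, False),
}


def decode_quality(quality, variant):
    """Return Phred scores (floats) for an ASCII quality string."""
    offset, is_solexa = VARIANTS[variant]
    scores = [ord(ch) - offset for ch in quality]
    if is_solexa:
        return [solexa_to_phred(q) for q in scores]
    return [float(q) for q in scores]
```

With p = 0.75, solexa_from_error gives about -4.77, which rounds to the -5 mentioned earlier as the Solexa floor.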
From pmr at ebi.ac.uk Wed Jun 24 18:44:33 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 24 Jun 2009 23:44:33 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> <4A421287.4000203@ebi.ac.uk> <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com> <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu> <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com> <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu> Message-ID: <4A42AC51.3090809@ebi.ac.uk> Chris Fields wrote: > Not sure, but it is covered by MAQ via the conversion script (as > FASTQ-int): Are the scores phred or Solexa? Peter Rice From adlai at refenestration.com Wed Jun 24 22:08:31 2009 From: adlai at refenestration.com (Adlai Burman) Date: Thu, 25 Jun 2009 04:08:31 +0200 Subject: [Bioperl-l] Extreme newbie question. Message-ID: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> I have been trying to install BioPerl for a while now and after pummeling my hard drive (Mac OS 10.5 intel) with several attempts at Fink installation, a >cpan installation and removing my .cpan folder I am still at square 0. I do not want to do anymore damage to my computer, yet I really need a working install (especially to interface with remote DBs like GenBank. Can anyone give me some advice here? 
After each attempt, I have tried to run perldoc bptutorial.pl and tried test scripts with "use Bio::Perl" in the headers and I just receive error messages like the following: Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/lib/perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level /Library/Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) at trsh.pl line 1. I have been working from the O'Reilly book Mastering Perl for Bioinformatics and the INSTALL file and have scoured around the BioPerl website and am still stuck. Thanks in advance, Adlai From kpclancy at hotmail.com Wed Jun 24 22:31:17 2009 From: kpclancy at hotmail.com (Kevin Clancy) Date: Wed, 24 Jun 2009 20:31:17 -0600 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu> References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu> Message-ID: Is there an intention to have a hackathon at ISMB this weekend? I know there is a 2-day BOSC. kevin > From: cjfields at illinois.edu > To: jay at jays.net > Date: Wed, 24 Jun 2009 16:10:34 -0500 > CC: vecchi.b at gmail.com; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think) > > > On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote: > > > On Jun 24, 2009, at 12:26 PM, Chris Fields wrote: > >> Let me know if anyone needs collab on biomoose on github; Mark > >> Jensen's already added.
> > > > Anything on github should be trivial, even with no perms -- we can > > just fork and then send you (whoever) pull requests. github++ :) > > > >> 1) Any help towards bugzilla fixes would be most welcome. > > > > I don't know how to make any progress in bugzilla if no one has a > > commit bit...? > > For some reason I thought you had a commit bit; we can add you in if > needed. Anyway, patches are most definitely welcome ;> > > >> 2) Better GFF3 integration > >> 3) Typed but lightweight seqfeatures > > > > Are there bugzilla tickets (or somewhere) describing those? > > No as the issues are more complex than one single bug, but we do have > something to help track for the time being: > > http://www.bioperl.org/wiki/GFF_Refactor > http://www.bioperl.org/wiki/Align_Refactor > > I'll probably file TODOs during the process for those refactors. The > easiest to tackle would be probably be Align/LocatableSeq refactors. > > > I wonder if anyone can help me get out of sporadic MailMan > > purgatory... > > > > Thanks, > > > > j > > -c > > PS - Don't feel constrained by the above. There are many many areas > to contribute to. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jun 24 23:54:28 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Jun 2009 22:54:28 -0500 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu> Message-ID: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu> I have no idea; I don't think there are many bioperl devs attending this year unfortunately. Any meetings in the next year where we could set up a bioperl hackathon? 
I will likely be available to attend if it's stateside... chris On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote: > > is there an intention to have a hackathon at ISMB this weekend - I > know there is a 2 day BOSC > kevin > >> From: cjfields at illinois.edu >> To: jay at jays.net >> Date: Wed, 24 Jun 2009 16:10:34 -0500 >> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think) >> >> >> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote: >> >>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote: >>>> Let me know if anyone needs collab on biomoose on github; Mark >>>> Jensen's already added. >>> >>> Anything on github should be trivial, even with no perms -- we can >>> just fork and then send you (whoever) pull requests. github++ :) >>> >>>> 1) Any help towards bugzilla fixes would be most welcome. >>> >>> I don't know how to make any progress in bugzilla if no one has a >>> commit bit...? >> >> For some reason I thought you had a commit bit; we can add you in if >> needed. Anyway, patches are most definitely welcome ;> >> >>>> 2) Better GFF3 integration >>>> 3) Typed but lightweight seqfeatures >>> >>> Are there bugzilla tickets (or somewhere) describing those? >> >> No as the issues are more complex than one single bug, but we do have >> something to help track for the time being: >> >> http://www.bioperl.org/wiki/GFF_Refactor >> http://www.bioperl.org/wiki/Align_Refactor >> >> I'll probably file TODOs during the process for those refactors. The >> easiest to tackle would be probably be Align/LocatableSeq refactors. >> >>> I wonder if anyone can help me get out of sporadic MailMan >>> purgatory... >>> >>> Thanks, >>> >>> j >> >> -c >> >> PS - Don't feel constrained by the above. There are many many areas >> to contribute to. 
From cjfields at illinois.edu Thu Jun 25 10:00:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 25 Jun 2009 09:00:47 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A42AC51.3090809@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> <4A421287.4000203@ebi.ac.uk> <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com> <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu> <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com> <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu> <4A42AC51.3090809@ebi.ac.uk> Message-ID: On Jun 24, 2009, at 5:44 PM, Peter Rice wrote: > Chris Fields wrote: >> Not sure, but it is covered by MAQ via the conversion script (as >> FASTQ-int): > > Are the scores phred or Solexa? > > Peter Rice Not sure actually. The perl script I linked to looks like it converts using the same scale as solexa (illumina 1.0). chris From chmille4 at gmail.com Thu Jun 25 10:46:26 2009 From: chmille4 at gmail.com (Chase Miller) Date: Thu, 25 Jun 2009 10:46:26 -0400 Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature Message-ID: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com> Hi all, Quick question I came across while writing the Bio::Nexml module. I'm trying to link taxon data to a Bio::LocatableSeq object inside a Bio::SimpleAlign object.
Bio::SimpleAlign has the ability to add SeqFeatures, but according to this HowTo ( http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is considered to refer to a portion of a sequence, whereas something like taxon data would refer to the entire sequence and should be handled as an annotation. However, as far as I can tell Bio::LocatableSeq does not support annotation objects. What would be the best way to relate taxon data to a single sequence inside an alignment? Thanks, Chase From Kevin.M.Brown at asu.edu Thu Jun 25 11:21:02 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 25 Jun 2009 08:21:02 -0700 Subject: [Bioperl-l] Extreme newbie question. In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> Message-ID: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink That error suggests that the install fails and you need to figure out why from the install error messages. I suspect you aren't doing the install as root, but as a normal user who lacks the needed permissions to change files in certain directories. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Adlai Burman > Sent: Wednesday, June 24, 2009 7:09 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Extreme newbie question. > > I have been trying to install BioPerl for a while now and after > pummeling my hard drive (Mac OS 10.5 intel) with several attempts at > Fink installation, a >cpan installation and removing my .cpan > folder I > am still at square 0. I do not want to do anymore damage to my > computer, yet I really need a working install (especially to > interface > with remote DBs like GenBank. Can anyone give me some advice here? 
> After each attempt, I have tried to run perldoc bptutorial.pl and > tried test scripts with "use Bio::Perl" in the headers and I just > receive error mesages like the following: > > Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/lib/ > perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level > /Library/ > Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread- > multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin- > thread-multi-2level /Library/Perl/5.8.8 /Library/Perl > /Network/Library/ > Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 / > Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread- > multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 / > Library/Perl/5.8.1 .) at trsh.pl line 1. > > I have been working from the OReilly book astering Perl for > Bioinformatics and the INSTALL file and have scoured around the > BioPerl website and am still stuck. > > Thanks in advance, > > Adlai > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Thu Jun 25 12:39:22 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 25 Jun 2009 18:39:22 +0200 Subject: [Bioperl-l] Extreme newbie question. In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> Message-ID: <628aabb70906250939l7d1116d0sec9efa2c16235c75@mail.gmail.com> Hi Adlai, Did the Bioperl tests run successfully? Did you get the impression that the installation was successful? If not, what are the errors you see during the install process? I ask because the error you included in your message is not necessarily indicative of a failed installation (it could just be a path issue). 
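A quick way to tell a failed install from a plain path problem is to look for Bio/Perl.pm in each @INC directory directly. The following is only a minimal sketch of that check; the helper name find_module is made up for illustration and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl function): return every directory in
# the given search list that contains the file for the named module.
sub find_module {
    my ($module, @dirs) = @_;

    # Turn a name like Bio::Perl into a relative path like Bio/Perl.pm.
    my $rel = File::Spec->catfile(split /::/, $module) . '.pm';

    # @INC may hold code references (import hooks); skip those.
    my @plain_dirs = grep { !ref } @dirs;

    return grep {
        my $candidate = File::Spec->catfile($_, $rel);
        -e $candidate;
    } @plain_dirs;
}

my @hits = find_module('Bio::Perl', @INC);
if (@hits) {
    print "Bio::Perl found in: $_\n" for @hits;
}
else {
    print "Bio::Perl is not in any \@INC directory; searched:\n";
    print "  $_\n" for grep { !ref } @INC;
}
```

If the module turns up in a directory that is not listed, the install itself is probably fine and that directory only needs to be added to the search path, for example with "use lib" in the script.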
By the way, as I think is indicated somewhere in the installation instructions, you don't actually need to install Bioperl to use most of its functionality. Simply having the Bio/ directory in your PERL5LIB path is enough. Dave From cjfields at illinois.edu Thu Jun 25 13:02:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 25 Jun 2009 12:02:48 -0500 Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature In-Reply-To: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com> References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com> Message-ID: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu> On Jun 25, 2009, at 9:46 AM, Chase Miller wrote: > Hi all, > > Quick question I came across while writing the Bio::Nexml module. > > I'm trying to link taxon data to a Bio::LocatableSeq object inside a > Bio::SimpleAlign object. Bio::SimpleAlign has the ability to add > SeqFeatures, but according to this HowTo ( > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is > considered to refer to a portion of a sequence, whereas something > like taxon > data would refer to the entire sequence and should be handled as an > annotation. However, as far as I can tell Bio::LocatableSeq does not > support > annotation objects. > What would be the best way to relate taxon data to a single sequence > inside > an alignment? > > Thanks, > Chase From working with feature/annotation-rich alignment formats such as stockholm I found this is one of the areas for Align that needs some rethinking. One way to work around this w/o major refactoring is to have a full-length SeqFeature (pointing to the proper LocatableSeq) that stores the Bio::Annotation. I don't necessarily like that approach as a long-term solution, though, as it's a little hacky and indirect, but it might get you started (just mark it as TODO so we can catch it at some point). 
For a long-term solution I don't think the answer is as simple as making LocatableSeq Bio::AnnotatableI; that would not be congruent with the PrimarySeq implementation (which is not AnnotatableI). LocatableSeq is supposed to represent a simple PrimarySeq that can be mapped to other sequences via start/end/strand, and thus inherits from both Bio::PrimarySeq (note lack of 'I') and RangeI. Three options: 1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI. 2) Bio::PrimarySeq can be AnnotatableI (Bio::Seq would delegate to the PrimarySeq AnnotationCollection). 3) All AnnotationI need to be linked back to the PrimarySeqI somehow e.g. features. I personally think option #2 is easiest, as this means anything that is-a PrimarySeq is also AnnotatableI, and it might not break past scripts. Not sure how this would affect overall performance though. chris From me at miguel.weapps.com Thu Jun 25 10:09:29 2009 From: me at miguel.weapps.com (Luis Miguel Rodríguez) Date: Thu, 25 Jun 2009 16:09:29 +0200 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu> References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu> <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu> Message-ID: <94da4c880906250709j7b2cb78dk77710bd43e20fd42@mail.gmail.com> Dear all, Is there a way to run muscle silently via Bio::Tools::Run::Alignment::Muscle? Cheers, -- Luis M.
Rodriguez-R [http://bioinf.uniandes.edu.co/~miguel/] --------------------------------- Unidad de Bioinformática del Laboratorio de Micología y Fitopatología Universidad de Los Andes, Colombia [http://bioinf.uniandes.edu.co] + 57 1 3394949 ext 2619 lmrodriguezr at gmail.com me at miguel.weapps.com From chmille4 at gmail.com Thu Jun 25 13:57:25 2009 From: chmille4 at gmail.com (Chase Miller) Date: Thu, 25 Jun 2009 13:57:25 -0400 Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature In-Reply-To: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu> References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com> <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu> Message-ID: <991fb8210906251057i25bbe511r84f5d1319f191421@mail.gmail.com> Ok, I'll use the full length SeqFeature for now and mark it with a TODO. Thanks for the help. Chase On Thu, Jun 25, 2009 at 1:02 PM, Chris Fields wrote: > On Jun 25, 2009, at 9:46 AM, Chase Miller wrote: > > Hi all, >> >> Quick question I came across while writing the Bio::Nexml module. >> >> I'm trying to link taxon data to a Bio::LocatableSeq object inside a >> Bio::SimpleAlign object. Bio::SimpleAlign has the ability to add >> SeqFeatures, but according to this HowTo ( >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is >> considered to refer to a portion of a sequence, whereas something like >> taxon >> data would refer to the entire sequence and should be handled as an >> annotation. However, as far as I can tell Bio::LocatableSeq does not >> support >> annotation objects. >> What would be the best way to relate taxon data to a single sequence >> inside >> an alignment? >> >> Thanks, >> Chase >> > > From working with feature/annotation-rich alignment formats such as > stockholm I found this is one of the areas for Align that needs some > rethinking.
One way to work around this w/o major refactoring is to have a > full-length SeqFeature (pointing to the proper LocatableSeq) that stores the > Bio::Annotation. I don't necessarily like that approach as a long-term > solution, though, as it's a little hacky and indirect, but it might get you > started (just mark it as TODO so we can catch it at some point). > > For a long-term solution I don't think the answer is as simple as making > LocatableSeq Bio::AnnotatableI; that would not be congruent with the > PrimarySeq implementation (which is not AnnotatableI). LocatableSeq is > supposed to represent a simple PrimarySeq that can be mapped to other > sequences via start/end/strand, and thus inherits from both Bio::PrimarySeq > (note lack of 'I') and RangeI. > > Three options: > 1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and > Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI. > 2) Bio::PrimarySeq can be AnnotatableI (Bio::Seq would delegate to the > PrimarySeq AnnotationCollection). > 3) All AnnotationI need to be linked back to the PrimarySeqI somehow e.g. > features. > > I personally think option #2 is easiest, as this means anything that is-a > PrimarySeq is also AnnotatableI, and it might not break past scripts. Not > sure how this would affect overall performance though. > > chris > From Kevin.M.Brown at asu.edu Thu Jun 25 14:54:19 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 25 Jun 2009 11:54:19 -0700 Subject: [Bioperl-l] Extreme newbie question. In-Reply-To: <93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com> References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu> <93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com> Message-ID: <1A4207F8295607498283FE9E93B775B4060BA08F@EX02.asurite.ad.asu.edu> Please keep your replies on the list. 
> -----Original Message----- > From: Adlai Burman [mailto:adlai at refenestration.com] > Sent: Thursday, June 25, 2009 11:39 AM > To: Kevin Brown > Subject: Re: [Bioperl-l] Extreme newbie question. > > Thanks, Kevin. > I did install everything using sudo. I will try again and pay > attention to the error log. I hope I did not introduce any conflicts > or weird path problems. > > Adlai > On Jun 25, 2009, at 5:21 PM, Kevin Brown wrote: > > > http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink > > > > That error suggests that the install fails and you need to > figure out > > why from the install error messages. I suspect you aren't doing the > > install as root, but as a normal user who lacks the needed > permissions > > to change files in certain directories. > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org > >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > >> Adlai Burman > >> Sent: Wednesday, June 24, 2009 7:09 PM > >> To: bioperl-l at lists.open-bio.org > >> Subject: [Bioperl-l] Extreme newbie question. > >> > >> I have been trying to install BioPerl for a while now and after > >> pummeling my hard drive (Mac OS 10.5 intel) with several > attempts at > >> Fink installation, a >cpan installation and removing my .cpan > >> folder I > >> am still at square 0. I do not want to do anymore damage to my > >> computer, yet I really need a working install (especially to > >> interface > >> with remote DBs like GenBank. Can anyone give me some advice here? 
> >> After each attempt, I have tried to run perldoc bptutorial.pl and > >> tried test scripts with "use Bio::Perl" in the headers and I just > >> receive error mesages like the following: > >> > >> Can't locate Bio/Perl.pm in @INC (@INC contains: > /home/users/dag/lib/ > >> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level > >> /Library/ > >> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread- > >> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin- > >> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl > >> /Network/Library/ > >> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 / > >> Network/Library/Perl > /System/Library/Perl/Extras/5.8.8/darwin-thread- > >> multi-2level /System/Library/Perl/Extras/5.8.8 > /Library/Perl/5.8.6 / > >> Library/Perl/5.8.1 .) at trsh.pl line 1. > >> > >> I have been working from the OReilly book astering Perl for > >> Bioinformatics and the INSTALL file and have scoured around the > >> BioPerl website and am still stuck. > >> > >> Thanks in advance, > >> > >> Adlai > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > From adlai at refenestration.com Thu Jun 25 14:59:10 2009 From: adlai at refenestration.com (Adlai Burman) Date: Thu, 25 Jun 2009 20:59:10 +0200 Subject: [Bioperl-l] Extreme newbie question. In-Reply-To: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu> References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu> Message-ID: Hey again, I'm right into trying to install again and I now get a new error: Client not fully configured, please proceed with configuring. o conf init urllist any ideas? 
Adlai On Jun 25, 2009, at 5:21 PM, Kevin Brown wrote: > http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink > > That error suggests that the install fails and you need to figure out > why from the install error messages. I suspect you aren't doing the > install as root, but as a normal user who lacks the needed permissions > to change files in certain directories. > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Adlai Burman >> Sent: Wednesday, June 24, 2009 7:09 PM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Extreme newbie question. >> >> I have been trying to install BioPerl for a while now and after >> pummeling my hard drive (Mac OS 10.5 intel) with several attempts at >> Fink installation, a >cpan installation and removing my .cpan >> folder I >> am still at square 0. I do not want to do anymore damage to my >> computer, yet I really need a working install (especially to >> interface >> with remote DBs like GenBank. Can anyone give me some advice here? >> After each attempt, I have tried to run perldoc bptutorial.pl and >> tried test scripts with "use Bio::Perl" in the headers and I just >> receive error mesages like the following: >> >> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/lib/ >> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level >> /Library/ >> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread- >> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin- >> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl >> /Network/Library/ >> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 / >> Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread- >> multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 / >> Library/Perl/5.8.1 .) at trsh.pl line 1. 
>> >> I have been working from the OReilly book astering Perl for >> Bioinformatics and the INSTALL file and have scoured around the >> BioPerl website and am still stuck. >> >> Thanks in advance, >> >> Adlai >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From cjfields at illinois.edu Thu Jun 25 16:07:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 25 Jun 2009 15:07:44 -0500 Subject: [Bioperl-l] Extreme newbie question. In-Reply-To: References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu> Message-ID: That would mean, within the cpan shell, type 'o conf init urllist' (again, requires sudo). chris On Jun 25, 2009, at 1:59 PM, Adlai Burman wrote: > Hey again, I'm right into trying to install again and I now get a > new error: > > Client not fully configured, please proceed with configuring. > o conf init urllist > > any ideas? > > Adlai > > On Jun 25, 2009, at 5:21 PM, Kevin Brown wrote: > >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix >> >> Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink >> >> That error suggests that the install fails and you need to figure out >> why from the install error messages. I suspect you aren't doing the >> install as root, but as a normal user who lacks the needed >> permissions >> to change files in certain directories. >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Adlai Burman >>> Sent: Wednesday, June 24, 2009 7:09 PM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Extreme newbie question. 
>>> >>> I have been trying to install BioPerl for a while now and after >>> pummeling my hard drive (Mac OS 10.5 intel) with several attempts at >>> Fink installation, a >cpan installation and removing my .cpan >>> folder I >>> am still at square 0. I do not want to do anymore damage to my >>> computer, yet I really need a working install (especially to >>> interface >>> with remote DBs like GenBank. Can anyone give me some advice here? >>> After each attempt, I have tried to run perldoc bptutorial.pl and >>> tried test scripts with "use Bio::Perl" in the headers and I just >>> receive error mesages like the following: >>> >>> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/ >>> lib/ >>> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level >>> /Library/ >>> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread- >>> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin- >>> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl >>> /Network/Library/ >>> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 / >>> Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin- >>> thread- >>> multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 / >>> Library/Perl/5.8.1 .) at trsh.pl line 1. >>> >>> I have been working from the OReilly book astering Perl for >>> Bioinformatics and the INSTALL file and have scoured around the >>> BioPerl website and am still stuck. 
>>> >>> Thanks in advance, >>> >>> Adlai >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Thu Jun 25 16:19:07 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 25 Jun 2009 21:19:07 +0100 Subject: [Bioperl-l] Extreme newbie question. In-Reply-To: References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com> <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu> Message-ID: <4A43DBBB.2050109@sendu.me.uk> Adlai Burman wrote: > Hey again, I'm right into trying to install again and I now get a new > error: > > Client not fully configured, please proceed with configuring. > o conf init urllist Run cpan and do as it says. From cjm at berkeleybop.org Thu Jun 25 20:32:05 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Thu, 25 Jun 2009 17:32:05 -0700 Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF Message-ID: I've written a module Bio::FeatureIO::seqont_owl, which generates Sequence Ontology compliant RDF/OWL. This will allow for example loading of GFF into triplestores and inference using OWL reasoners. - It's experimental, fairly incomplete, and subject to change - Relies on an experimental extension of SO - Probably of interest to a minority of bp users - It's not yet fully documented (but there will be a paper) - It doesn't introduce any additional dependencies (all done via XML::Writer, which is already a dependency) - Doesn't otherwise impinge on existing code I'd like to get this under source control. Is the appropriate place for this: - HEAD - a branch - bioperl-dev - a separate repository ? Cheers Chris From maj at fortinbras.us Thu Jun 25 21:08:43 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 25 Jun 2009 21:08:43 -0400 Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF In-Reply-To: References: Message-ID: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife> This sounds very Dev to me. Also cool. MAJ ----- Original Message ----- From: "Chris Mungall" To: "BioPerl List" Sent: Thursday, June 25, 2009 8:32 PM Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF > > I've written a module Bio::FeatureIO::seqont_owl, which generates Sequence > Ontology compliant RDF/OWL. This will allow for example loading of GFF into > triplestores and inference using OWL reasoners. > > - It's experimental, fairly incomplete, and subject to change > - Relies on an experimental extension of SO > - Probably of interest to a minority of bp users > - It's not yet fully documented (but there will be a paper) > - It doesn't introduce any additional dependencies (all done via XML::Writer, > which is already a dependency) > - Doesn't otherwise impinge on existing code > > I'd like to get this under source control. Is the appropriate place for this: > > - HEAD > - a branch > - bioperl-dev > - a separate repository > > ? > > Cheers > Chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Jun 25 21:35:06 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 25 Jun 2009 20:35:06 -0500 Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF In-Reply-To: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife> References: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife> Message-ID: <12F203C3-689B-423E-9691-86EB1D500A7D@illinois.edu> I agree. Just to note, FeatureIO (even though it's in core) will be operated on at some future point to be simplified (and likely will move away from Bio::SF::Annotated). chris On Jun 25, 2009, at 8:08 PM, Mark A. Jensen wrote: > This sounds very Dev to me. 
Also cool. > MAJ > ----- Original Message ----- From: "Chris Mungall" > > To: "BioPerl List" > Sent: Thursday, June 25, 2009 8:32 PM > Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF > > >> >> I've written a module Bio::FeatureIO::seqont_owl, which generates >> Sequence Ontology compliant RDF/OWL. This will allow for example >> loading of GFF into triplestores and inference using OWL reasoners. >> >> - It's experimental, fairly incomplete, and subject to change >> - Relies on an experimental extension of SO >> - Probably of interest to a minority of bp users >> - It's not yet fully documented (but there will be a paper) >> - It doesn't introduce any additional dependencies (all done via >> XML::Writer, which is already a dependency) >> - Doesn't otherwise impinge on existing code >> >> I'd like to get this under source control. Is the appropriate >> place for this: >> >> - HEAD >> - a branch >> - bioperl-dev >> - a separate repository >> >> ? >> >> Cheers >> Chris >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Fri Jun 26 00:27:55 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 25 Jun 2009 21:27:55 -0700 Subject: [Bioperl-l] BioPerl hackathon, hooray! Message-ID: <4A444E4B.2000808@cornell.edu> I'm pleased to announce a thoroughly climactic conclusion to the YAPC::NA 2009 BioPerl hackathon. Between Jay Hannah (jhannah) and myself (rbuels), plus #bioperl virtual participant Bruno Vecchi (brunov), we SMASHED the HECK out of 6 bugs in the BioPerl Bugzilla. Many thanks to the participants, let's do it again next year! 
Rob From jay at jays.net Fri Jun 26 00:54:31 2009 From: jay at jays.net (Jay Hannah) Date: Fri, 26 Jun 2009 00:54:31 -0400 Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray! In-Reply-To: <4A444E4B.2000808@cornell.edu> References: <4A444E4B.2000808@cornell.edu> Message-ID: On Jun 26, 2009, at 12:27 AM, Robert Buels wrote: > I'm pleased to announce a thoroughly climactic conclusion to the > YAPC::NA 2009 BioPerl hackathon. Feel free to check our work: http://github.com/rbuels/bioperl-live :) j http://www.bioperl.org/wiki/User:Jhannah From rahall2 at ualr.edu Fri Jun 26 02:28:05 2009 From: rahall2 at ualr.edu (Roger Hall) Date: Fri, 26 Jun 2009 01:28:05 -0500 Subject: [Bioperl-l] Random nucleotide string generator? Message-ID: All, Is there a random generator for creating nucleotides (of length l with composition frequencies a, c, g, and t) in there somewhere? I noticed a thread about it from 2000 and nothing since (searching for "random sequence"). If not - what should the namespace be for such a module should it be undone and desirable? TIA! Roger From David.Messina at sbc.su.se Fri Jun 26 06:15:04 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Jun 2009 12:15:04 +0200 Subject: [Bioperl-l] Random nucleotide string generator? In-Reply-To: References: Message-ID: <628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com> The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on this post from Neil Saunders' blog: http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/ You can also do this outside of BioPerl using shuffle from Sean Eddy's SQUID package, available here: [ SQUID ftp site ] If not - what should the namespace be for such a module should it be undone > and desirable? Perhaps add it to Bio::SeqUtils? 
Dave

From David.Messina at sbc.su.se Fri Jun 26 07:37:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 13:37:44 +0200
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: References: <4A444E4B.2000808@cornell.edu>
Message-ID: <628aabb70906260437r18fc7543oc05761241fe810ff@mail.gmail.com>

Awesome, great work guys! Thanks so much.

Dave

From David.Messina at sbc.su.se Fri Jun 26 08:58:20 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 14:58:20 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
References: <628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com> <1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
Message-ID: <628aabb70906260558k585f6700ycef271e7f26dd1a3@mail.gmail.com>

[Forwarding Bruno's reply.... -Dave]

---------- Forwarded message ----------
From: Bruno Vecchi
Date: Fri, Jun 26, 2009 at 14:44
Subject: Re: [Bioperl-l] Random nucleotide string generator?
To: Dave Messina

Here's a little script that I used for a somewhat related task. It produces a randomized version of an input sequence (thus keeping the original's composition). Maybe you could adjust it to your needs; if you provide an input sequence with the desired length and composition, you should get what you want.

#!perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Bio::Seq;    # loaded explicitly; Bio::Seq->new is called below
use Bio::SeqIO;

# Usage: shuffle.pl <sequence file> <number of shuffled copies>
my ($seqfile, $number) = @ARGV;

my $in = Bio::SeqIO->new(-file => $seqfile);     # input format is guessed from the file
my $fh = Bio::SeqIO->newFh(-format => 'fasta');  # FASTA output goes to STDOUT

# Only the first sequence in the input file is used.
my $seq   = $in->next_seq;
my @chars = split '', $seq->seq;

for my $i (1 .. $number) {
    @chars = shuffle @chars;
    my $new_seq = Bio::Seq->new(-id => $i, -seq => join '', @chars);
    print $fh $new_seq;
}

You can use it like this from the command line (assuming you want 20 output sequences):

shuffle.pl input_sequence.fasta 20 > random_sequences.fasta

Bruno.

2009/6/26 Dave Messina > The Bioperl solution piggybacks on EMBOSS.
See Chris Fields' comment on > this > post from Neil Saunders' blog: > > http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/ > > > You can also do this outside of BioPerl using shuffle from Sean Eddy's > SQUID > package, available here: > [ SQUID ftp site ] > > > > If not - what should the namespace be for such a module should it be undone > > and desirable? > > > Perhaps add it to Bio::SeqUtils? > > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From budd at embl-heidelberg.de Fri Jun 26 04:30:12 2009 From: budd at embl-heidelberg.de (Aidan Budd) Date: Fri, 26 Jun 2009 10:30:12 +0200 (CEST) Subject: [Bioperl-l] Random nucleotide string generator? In-Reply-To: Message-ID: a non-bioperl option would be to use something external like seq-gen or similar - tools designed for outputing "random" sequences simulated over a tree - one could simply sample a single simulated sequence at random from the output alignment On Fri, 26 Jun 2009, Roger Hall wrote: > All, > Is there a random generator for creating nucleotides (of length l with > composition frequencies a, c, g, and t) in there somewhere? > > I noticed a thread about it from 2000 and nothing since (searching for "random sequence"). > > If not - what should the namespace be for such a module should it be undone and desirable? > > TIA! > > Roger > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ---------------------------------------------------------------------- Aidan Budd tel:+49 (0)6221 387 8530 EMBL - European Molecular Biology Laboratory fax:+49 (0)6221 387 8517 Meyerhofstr. 
1, 69117 Heidelberg, Germany http://www.embl-heidelberg.de/~budd/ http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html From me at miguel.weapps.com Fri Jun 26 04:52:46 2009 From: me at miguel.weapps.com (Luis Miguel Rodríguez) Date: Fri, 26 Jun 2009 10:52:46 +0200 Subject: [Bioperl-l] Random nucleotide string generator? In-Reply-To: References: Message-ID: <94da4c880906260152k3a764951u6ea8a6fdfa3b7f2c@mail.gmail.com> Dear all, dear Roger, I'm not sure if there is such a generator (I think so). Anyway, if you flag it as "undone and desirable", please take into account the possibility of extending the generator to dinucleotides, which is particularly useful when working with the secondary structure of RNA molecules. Cheers, On Fri, Jun 26, 2009 at 8:28 AM, Roger Hall wrote: > All, > > Is there a random generator for creating nucleotides (of length l with > composition frequencies a, c, g, and t) in there somewhere? > > I noticed a thread about it from 2000 and nothing since (searching for > "random sequence"). > > If not - what should the namespace be for such a module should it be undone > and desirable? > > TIA! > > Roger > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Luis M. Rodriguez-R [http://bioinf.uniandes.edu.co/~miguel/] --------------------------------- Unidad de Bioinformática del Laboratorio de Micología y Fitopatología Universidad de Los Andes, Colombia [http://bioinf.uniandes.edu.co] + 57 1 3394949 ext 2619 lmrodriguezr at gmail.com me at miguel.weapps.com From pri2darshini at gmail.com Fri Jun 26 06:18:55 2009 From: pri2darshini at gmail.com (priya darshini) Date: Fri, 26 Jun 2009 15:48:55 +0530 Subject: [Bioperl-l] bioperl installation Message-ID: <7c569a160906260318t5611fdd8nd536ae5139f5b1d4@mail.gmail.com> Respected Sir, I am K.Lakshmi priya Darshini. My specialization is M.Sc bioinformatics.
I am interested in learning bioperl. My operating system is Windows Vista. I have followed the steps to install bioperl as given by your team in the bioperl tutorial. But I am getting the error message "Begin failed". Sir, please help me to continue with my installation. I am using version 5.10 of Perl. Waiting for your reply. Thanking you. Regards, Lakshmi Priya Darshini. From Jonathan.Moore at warwick.ac.uk Fri Jun 26 05:55:54 2009 From: Jonathan.Moore at warwick.ac.uk (Moore, Jonathan) Date: Fri, 26 Jun 2009 10:55:54 +0100 Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO Message-ID: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML files at the TAIR FTP site. I've tried SeqIO with both the tigr and tigrxml formats, but both are giving errors in 1.6.0. Does anyone have advice on whether it's likely to be doable, or should I wait until the .gb files are available? Jay Moore From fungazid at yahoo.com Fri Jun 26 07:59:06 2009 From: fungazid at yahoo.com (Fungazid) Date: Fri, 26 Jun 2009 04:59:06 -0700 (PDT) Subject: [Bioperl-l] Bio::Assembly::IO Message-ID: <57633.49243.qm@web65505.mail.ac4.yahoo.com> Hello, I received an ACE file containing a newbler assembly of 454 cDNA reads, and a corresponding phd.ball file. I was able to view and manipulate the contigs in this assembly using Consed on Linux. Consed required ~1.5GB RAM, and the assembly was loaded within ~2 min. I would like to parse the assembly within my code (preferably in Perl, but not necessarily), to fetch all read sequences for each contig, nucleotide quality, alignment to consensus, etc. I am trying to use Bio::Assembly::IO, but it eats more than my entire RAM (3GB) and is extremely slow (~1 hour before it crashes). Do you have any ideas?
In addition, are you aware of any other non-visual parsers of the ACE assembly format for Perl or other languages? Many thanks, funazid From cjfields at illinois.edu Fri Jun 26 13:00:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Jun 2009 12:00:41 -0500 Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk> References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk> Message-ID: If there are errors this should be submitted as a bug. You should attach example data to the report after submitting it (i.e., don't copy and paste the data into the text box). http://www.bioperl.org/wiki/Bugs chris On Jun 26, 2009, at 4:55 AM, Moore, Jonathan wrote: > I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML > files at the TAIR FTP site. > > I've tried SeqIO with both tigr and tigrxml formats but both are > giving errors in 1.6.0. Has anyone advice on whether it's likely to > be doable, or should I wait til the .gb files are available? > > Jay Moore > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From plantboy at gmail.com Fri Jun 26 14:46:35 2009 From: plantboy at gmail.com (cody h) Date: Fri, 26 Jun 2009 11:46:35 -0700 Subject: [Bioperl-l] test suite failing on mac os x 10.5 Message-ID: <320708320906261146v2e799c82mc1b921218fc233c5@mail.gmail.com> Hi, I'm trying to install bioperl-db 1.5.2 on an Intel Mac running OS X 10.5.7. The Build.PL file executes fine, but the test suite fails dramatically, returning the error "No database selected" for many of the tests. All the error calls seem to originate from line 852 in BasePersistenceAdaptor.pm. I took a look at the code but I could not figure out why it wasn't working. I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql server.
The dependencies all seem to be working, but I haven't used them enough to completely verify this, so that could be part of the problem. I don't know which ones to check though. Does anyone have any idea why I might be getting these "No database selected" errors?

Here is a sample of the error messages given by the ./Build test command (note: this same error is generated by 15 of the 16 test files):

t/12ontology.t .... 1/738
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: No database selected
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/12ontology.t:44
-----------------------------------------------------------
t/12ontology.t ....
Dubious, test returned 255 (wstat 65280, 0xff00) From maj at fortinbras.us Fri Jun 26 14:50:02 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Jun 2009 14:50:02 -0400 Subject: [Bioperl-l] Fw: Inquiry about a prog written by [MAJ] Message-ID: <0581B2DAE8514F418127D54407384905@NewLife> Thought this should be archived to the list. MAJ ----- Original Message ----- From: Mark A. Jensen To: Ross KK Leung Sent: Thursday, June 25, 2009 8:46 AM Subject: Re: Inquiry about a prog written by you Hi Ross- Yes, you can specify the recombinants, as "A/C/G[subtype]" in the query string. Unfortunately, the 10000 record limit is imposed by the Los Alamos site that my program accesses. You might be able to work around this if you're willing to write your own script using the BioPerl modules that are the basis for hivq.PLS -- by using the modules to perform multiple queries, and collecting the entire set of sequences over that series of queries. You might look at the documentation for the modules for ideas; try looking at http://www.bioperl.org/wiki/Module:Bio::DB::HIV and http://www.bioperl.org/wiki/Module:Bio::DB::Query::HIVQuery . best regards- Mark ----- Original Message ----- From: Ross KK Leung To: maj at fortinbras.us Sent: Thursday, June 25, 2009 6:09 AM Subject: Inquiry about a prog written by you Dear Mark A. Jensen, A Google search returns your program (http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/DB-HIV/hivq.PLS). I wonder whether the program is able to search recombinants (e.g. B incl. recombinants) and retrieve more than 50000 records. This limitation is a bottleneck of the web-based search.
Thanks for your advice, Ross From rmb32 at cornell.edu Fri Jun 26 17:06:06 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Jun 2009 14:06:06 -0700 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> Message-ID: <4A45383E.40207@cornell.edu> Reposting to bioperl list. This is a really giant opportunity to expose some of the best technologists in the world to what we do in bioinformatics, and possibly to entice some of them to help us the heck out! ;-) Rob On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote: > I am the Columbus.PM YAPC::2010 conference coordinator and I would > like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State > University. Can you offer any lecturer recommendations and could I > fill an entire multi day thread with BioPerl lectures? I would also > like to "entice" MJD to come to YAPC with the use of BioPerl. > > Thanks for your thoughts. > > Heath Bair > (Candybar) -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cain.cshl at gmail.com Fri Jun 26 17:12:37 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 26 Jun 2009 17:12:37 -0400 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <4A45383E.40207@cornell.edu> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> Message-ID: Cool--Columbus is just down the road. I could give a talk (or even multiple talks) on a variety of GMOD topics (which I consider BioPerl related, since so much of what we do depends on BioPerl). Scott On Jun 26, 2009, at 5:06 PM, Robert Buels wrote: > Reposting to bioperl list. 
> > This is a really giant opportunity to expose some of the best > technologists in the world to what we do in bioinformatics, and > possibly to entice some of them to help us the heck out! ;-) > > Rob > > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote: >> I am the Columbus.PM YAPC::2010 conference coordinator and I would >> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State >> University. Can you offer any lecturer recommendations and could I >> fill an entire multi day thread with BioPerl lectures? I would >> also like to "entice" MJD to come to YAPC with the use of BioPerl. >> >> Thanks for your thoughts. >> >> Heath Bair >> (Candybar) > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Fri Jun 26 17:49:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Jun 2009 16:49:39 -0500 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <4A45383E.40207@cornell.edu> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> Message-ID: <642C6C93-8FCD-4463-8A39-E15832F8714C@illinois.edu> Well, if it's in Columbus I'll be there (I can make a drive out of it). In short, we should probably get something going, yes. Lots of things we can talk about, inc. bioperl6, Bio::Moose, etc. chris On Jun 26, 2009, at 4:06 PM, Robert Buels wrote: > Reposting to bioperl list. 
> > This is a really giant opportunity to expose some of the best > technologists in the world to what we do in bioinformatics, and > possibly to entice some of them to help us the heck out! ;-) > > Rob > > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote: >> I am the Columbus.PM YAPC::2010 conference coordinator and I would >> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State >> University. Can you offer any lecturer recommendations and could I >> fill an entire multi day thread with BioPerl lectures? I would >> also like to "entice" MJD to come to YAPC with the use of BioPerl. >> >> Thanks for your thoughts. >> >> Heath Bair >> (Candybar) > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Fri Jun 26 23:59:10 2009 From: hartzell at alerce.com (George Hartzell) Date: Fri, 26 Jun 2009 20:59:10 -0700 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <4A45383E.40207@cornell.edu> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> Message-ID: <19013.39182.97468.604560@already.dhcp.gene.com> This does seem like a great opportunity. I think you/the-community could put together at least a day, and maybe more, of Bio and Perl stuff. I think that it's important to range beyond the stuff that's in the BioPerl namespace and pull in something from the Gene Ontology project, the Ensembl project[s], maybe libbio, etc.... g. Robert Buels writes: > Reposting to bioperl list. > > This is a really giant opportunity to expose some of the best > technologists in the world to what we do in bioinformatics, and possibly > to entice some of them to help us the heck out!
;-) > > Rob > > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote: > > I am the Columbus.PM YAPC::2010 conference coordinator and I would > > like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State > > University. Can you offer any lecturer recommendations and could I > > fill an entire multi day thread with BioPerl lectures? I would also > > like to "entice" MJD to come to YAPC with the use of BioPerl. > > > > Thanks for your thoughts. > > > > Heath Bair > > (Candybar) > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at illinois.edu Sat Jun 27 00:28:14 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Jun 2009 23:28:14 -0500 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <19013.39182.97468.604560@already.dhcp.gene.com> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> <19013.39182.97468.604560@already.dhcp.gene.com> Message-ID: Agree (and should add GMOD/GBrowse to that as well). chris On Jun 26, 2009, at 10:59 PM, George Hartzell wrote: > > This does seem like a great opportunity. I think you/the-community > could put together at least a day, and maybe more, of Bio and Perl > stuff. I think that it's important to range beyond the stuff that's > in the BioPerl namespace and pull in something from the Gene Ontology > project, the Ensembl project[s], maybe libbio, etc.... > > g. > > Robert Buels writes: >> Reposting to bioperl list. >> >> This is a really giant opportunity to expose some of the best >> technologists in the world to what we do in bioinformatics, and >> possibly >> to entice some of them to help us the heck out!
;-) >> >> Rob >> >> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote: >>> I am the Columbus.PM YAPC::2010 conference coordinator and I would >>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State >>> University. Can you offer any lecturer recommendations and could I >>> fill an entire multi day thread with BioPerl lectures? I would also >>> like to "entice" MJD to come to YAPC with the use of BioPerl. >>> >>> Thanks for your thoughts. >>> >>> Heath Bair >>> (Candybar) >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sat Jun 27 00:56:41 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 27 Jun 2009 00:56:41 -0400 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <4A45383E.40207@cornell.edu> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> Message-ID: I think BioPerl has enough to talk about to have its own conference, which would coincide with its 15th anniversary in 2010. That may put the kibosh on the original intent of the inviter, which ultimately is to get The Dominus to bite (and more power to her, I say. My programming style is forever changed, and I haven't even finished The Book). If someone organizes it, I'll bring the chips and dip. MAJ ----- Original Message ----- From: "Robert Buels" To: "BioPerl List" Cc: Sent: Friday, June 26, 2009 5:06 PM Subject: Re: [Bioperl-l] BioPerl at YAPC::2010 > Reposting to bioperl list. 
> > This is a really giant opportunity to expose some of the best > technologists in the world to what we do in bioinformatics, and possibly > to entice some of them to help us the heck out! ;-) > > Rob > > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote: >> I am the Columbus.PM YAPC::2010 conference coordinator and I would >> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State >> University. Can you offer any lecturer recommendations and could I >> fill an entire multi day thread with BioPerl lectures? I would also >> like to "entice" MJD to come to YAPC with the use of BioPerl. >> >> Thanks for your thoughts. >> >> Heath Bair >> (Candybar) > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Sat Jun 27 01:30:34 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 27 Jun 2009 01:30:34 -0400 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net><4A45383E.40207@cornell.edu> Message-ID: [...to *him*, that is...pardon] ----- Original Message ----- From: "Mark A. Jensen" To: "Robert Buels" ; "BioPerl List" Sent: Saturday, June 27, 2009 12:56 AM Subject: Re: [Bioperl-l] BioPerl at YAPC::2010 >I think BioPerl has enough to talk about to have its own conference, which >would coincide with its 15th anniversary in 2010. That may put the kibosh on >the original intent of the inviter, which ultimately is to get The Dominus to >bite (and more power to her, I say. My programming style is forever changed, >and I haven't even finished > The Book). > If someone organizes it, I'll bring the chips and dip. 
> MAJ > ----- Original Message ----- > From: "Robert Buels" > To: "BioPerl List" > Cc: > Sent: Friday, June 26, 2009 5:06 PM > Subject: Re: [Bioperl-l] BioPerl at YAPC::2010 > > >> Reposting to bioperl list. >> >> This is a really giant opportunity to expose some of the best technologists >> in the world to what we do in bioinformatics, and possibly to entice some of >> them to help us the heck out! ;-) >> >> Rob >> >> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote: >>> I am the Columbus.PM YAPC::2010 conference coordinator and I would like to >>> have a "BioPerl" thread at YAPC::NA::2010 at Ohio State University. Can you >>> offer any lecturer recommendations and could I fill an entire multi day >>> thread with BioPerl lectures? I would also like to "entice" MJD to come to >>> YAPC with the use of BioPerl. >>> >>> Thanks for your thoughts. >>> >>> Heath Bair >>> (Candybar) >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From kpclancy at hotmail.com Sat Jun 27 06:04:20 2009 From: kpclancy at hotmail.com (Kevin Clancy) Date: Sat, 27 Jun 2009 04:04:20 -0600 Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think) In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu> References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net> <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net> <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com> <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu> <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu> 
Message-ID: I think ismb will be in Boston in 2010 (feels odd just typing that...) maybe that is enough of a running start to set something up. kevin > CC: jay at jays.net; vecchi.b at gmail.com; bioperl-l at bioperl.org > From: cjfields at illinois.edu > To: kpclancy at hotmail.com > Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think) > Date: Wed, 24 Jun 2009 22:54:28 -0500 > > I have no idea; I don't think there are many bioperl devs attending > this year unfortunately. Any meetings in the next year where we could > set up a bioperl hackathon? I will likely be available to attend if > it's stateside... > > chris > > On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote: > > > > > is there an intention to have a hackathon at ISMB this weekend - I > > know there is a 2 day BOSC > > kevin > > > >> From: cjfields at illinois.edu > >> To: jay at jays.net > >> Date: Wed, 24 Jun 2009 16:10:34 -0500 > >> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org > >> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think) > >> > >> > >> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote: > >> > >>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote: > >>>> Let me know if anyone needs collab on biomoose on github; Mark > >>>> Jensen's already added. > >>> > >>> Anything on github should be trivial, even with no perms -- we can > >>> just fork and then send you (whoever) pull requests. github++ :) > >>> > >>>> 1) Any help towards bugzilla fixes would be most welcome. > >>> > >>> I don't know how to make any progress in bugzilla if no one has a > >>> commit bit...? > >> > >> For some reason I thought you had a commit bit; we can add you in if > >> needed. Anyway, patches are most definitely welcome ;> > >> > >>>> 2) Better GFF3 integration > >>>> 3) Typed but lightweight seqfeatures > >>> > >>> Are there bugzilla tickets (or somewhere) describing those? 
> >> > >> No as the issues are more complex than one single bug, but we do have > >> something to help track for the time being: > >> > >> http://www.bioperl.org/wiki/GFF_Refactor > >> http://www.bioperl.org/wiki/Align_Refactor > >> > >> I'll probably file TODOs during the process for those refactors. The > >> easiest to tackle would be probably be Align/LocatableSeq refactors. > >> > >>> I wonder if anyone can help me get out of sporadic MailMan > >>> purgatory... > >>> > >>> Thanks, > >>> > >>> j > >> > >> -c > >> > >> PS - Don't feel constrained by the above. There are many many areas > >> to contribute to. > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hartzell at alerce.com Sat Jun 27 13:08:10 2009 From: hartzell at alerce.com (George Hartzell) Date: Sat, 27 Jun 2009 10:08:10 -0700 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> Message-ID: <19014.20986.867646.940277@already.dhcp.gene.com> I had an eye-opening time at YAPC, and I think that it would be very powerful to have many members of the Bio & Perl community rubbing elbows with the folks leading (and following, for that matter) the "Modern Perl" movement (in the broader sense, not _just_ chromatic): Moose, DBIx::Class, Dist::Zilla, KiokoDB, etc.... I think that it would help pull BioPerl and the others towards powerful mainstream technologies and expose many of us to new people, tricks, and tools. Having us off on our own, or mingling with ISMB'ers, doesn't really stir the pot. g. Mark A. 
Jensen writes: > I think BioPerl has enough to talk about to have its own conference, > which would coincide with its 15th anniversary in 2010. That may > put the kibosh on the original intent of the inviter, which ultimately is > to get The Dominus to bite (and more power to her, I say. My > programming style is forever changed, and I haven't even finished > The Book). > > If someone organizes it, I'll bring the chips and dip. > MAJ > ----- Original Message ----- > From: "Robert Buels" > To: "BioPerl List" > Cc: > Sent: Friday, June 26, 2009 5:06 PM > Subject: Re: [Bioperl-l] BioPerl at YAPC::2010 > > > > Reposting to bioperl list. > > > > This is a really giant opportunity to expose some of the best > > technologists in the world to what we do in bioinformatics, and possibly > > to entice some of them to help us the heck out! ;-) > > > > Rob > > > > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote: > >> I am the Columbus.PM YAPC::2010 conference coordinator and I would > >> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State > >> University. Can you offer any lecturer recommendations and could I > >> fill an entire multi day thread with BioPerl lectures? I would also > >> like to "entice" MJD to come to YAPC with the use of BioPerl. > >> > >> Thanks for your thoughts. 
> >> > >> Heath Bair > >> (Candybar) > > > > -- > > Robert Buels > > Bioinformatics Analyst, Sol Genomics Network > > Boyce Thompson Institute for Plant Research > > Tower Rd > > Ithaca, NY 14853 > > Tel: 503-889-8539 > > rmb32 at cornell.edu > > http://www.sgn.cornell.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From richard.harrison at edinburgh.ac.uk Mon Jun 29 18:43:54 2009 From: richard.harrison at edinburgh.ac.uk (Richard Harrison) Date: Mon, 29 Jun 2009 23:43:54 +0100 Subject: [Bioperl-l] PopGen Message-ID: <5FBB6056-386D-42E3-8236-1FEB8F5BE520@edinburgh.ac.uk> Dear all, I am having trouble with the PopGen modules and I was wondering if anyone had any ideas. I am working with polymorphism data. I am trying to identify the derived vs ancestral allele between two species. I have been modifying the modules a bit to include different site models etc. Here is where I fall over: Within aln_to_population I can create a modified Genotype object to include details of the ancestral allele (see at end of this post). However, the problem that I have hit upon is that aln_to_population returns a population object, filled with IndividualI objects. In other words, it takes my array of GenotypeI objects and converts them into IndividualI objects, wrapped in a single Population object. This means that the information in the GenotypeI object about the ancestral/ derived states is lost. How can I overcome this? 
Thanks,
Richard

###excerpt from aln_to_population

$inds[$i]->add_Genotype(Bio::PopGen::Genotype->new
                        (-marker_name   => $nm,
                         -individual_id => $inds[$i]->unique_id,
                         -alleles       => [$genotypes[$i]],
                         -outgroup      => $outgroup[0]));

###excerpt from Genotypes.pm

sub new {
    my($class,@args) = @_;
    my $self = $class->SUPER::new(@args);
    my ($name,$desc,$type,$uid,$af,$og) =
        $self->_rearrange([qw(NAME DESCRIPTION TYPE UNIQUE_ID
                              ALLELE_FREQ OUTGROUP)],@args);
    $self->{'_allele_freqs'} = {};
    $self->{'_outgroup_name'} = {};
    if( ! defined $uid ) {
        $uid = $UniqueCounter++;
    }
    if( defined $name) {
        $self->name($name);
    } else {
        $self->throw("Must provide a name when initializing a Marker");
    }
    defined $desc && $self->description($desc);
    defined $type && $self->type($type);
    $self->outgroup_name($og);
    $self->unique_id($uid);
    return $self;
}

=head2 og

 Title   : name
 Usage   : my $name = $marker->og();
 Function: Get the name of the outgroup
 Returns : string representing the name of the marker
 Args    : [optional] name

=cut

sub outgroup_name{
    my $self = shift;
    return $self->{'_outgroup_name'} = shift if @_;
    return $self->{'_outgroup_name'};
}

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From jason at bioperl.org Tue Jun 30 01:03:08 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 29 Jun 2009 22:03:08 -0700
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID:

There are several flavors of TIGR XML for rice, Arabidopsis, and other
projects. Unfortunately I don't know which one the current tigrxml
parser tracks, but one can compare the test files in t/data to the
downloaded versions to see what is currently supported.

Usually the gbk will be more consistently parseable, but we can try to
work it out if it is a sensible transformation.
On Jun 26, 2009, at 2:55 AM, Moore, Jonathan wrote: > I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML > files at the TAIR FTP site. > > I've tried SeqIO with both tigr and tigrxml formats but both are > giving errors in 1.6.0. Has anyone advice on whether it's likely to > be doable, or should I wait til the .gb files are available? > > Jay Moore > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From paola.bisignano at gmail.com Tue Jun 30 05:12:49 2009 From: paola.bisignano at gmail.com (Paola Bisignano) Date: Tue, 30 Jun 2009 11:12:49 +0200 Subject: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25 In-Reply-To: References: Message-ID: Hi, I need a little help, to parse a file, but I tried to search some modules of bioperl, but there are a lot, and I don't know how to start, I find moduls for all db, for different web site, but not for my favorite PDBsum....so I parsed a lot of thing on my own, even if I was new in learning perl....but now I'm waiting for help...because I need to parse a FASTA file, resulted from aligned sequences...I need to extract the aligned sequences, only for the pdb in my lista.... my fasta file is like: Query: /ebi/research/thornton/tmp/sas307986/seq.fasta 1>>>Sequence 3e7e:A - 333 aa Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib 17840403 residues in 79353 sequences opt E() < 20 286 0:=== 22 1 0:= one = represents 135 library sequences 24 1 0:= 26 0 2:* 28 21 18:* 30 36 109:* 32 237 421:== * 34 956 1140:========* 36 1924 2342:=============== * 38 3591 3871:=========================== * 40 4904 5400:===================================== * 42 6750 6600:================================================*= 44 7145 7281:=====================================================* 46 8047 7416:======================================================*===== ......... 
>>2np8:A (159 aa) initn: 125 init1: 72 opt: 136 Z-score: 168.6 bits: 38.5 E(): 0.011 Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa overlap (59-204:13-153) 10 20 30 40 50 60 Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG :: 2np8:A QWALEDFEIGRPLG 10 70 80 90 100 110 Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH .: :..:: : ....::.: :: :. . . :: .. .. ..: ....:. 2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG-- 20 30 40 50 60 70 120 130 140 150 160 170 Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII :.... :. : ::. .. .. :. . .. .. . :. ..: 2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI 80 90 100 110 120 180 190 200 210 220 230 Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN : ::::.:..:: ::: : . :.: :. 2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR 130 140 150 240 250 260 270 280 290 Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP 300 310 320 330 Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC >>2ojg:A (337 aa) initn: 85 init1: 53 opt: 140 Z-score: 168.1 bits: 39.5 E(): 0.012 Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa overlap (46-252:1-204) 10 20 30 40 50 60 Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG :..: . . . .. : 2ojg:A FDVGPRYTNLSYI-G 10 70 80 90 100 110 Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN :::...: : .: .: . ..: .:.: : ....: ....: ... 2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI 20 30 40 50 60 120 130 140 150 160 170 Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI .... . ..: :... .::: . . . . : ...: .. .:. .. 2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV 70 80 90 100 110 120 180 190 200 210 220 230 Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML .: :.::.:..:.. . : . :.: . . . ..: : .. 
: :: 2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML 130 140 150 160 170 180 240 250 260 270 280 290 Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN ..: .. .:: ..:. . :: 2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA 190 200 210 220 230 240 300 310 320 330 Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC 2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS 250 260 270 280 290 300 2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG 310 320 330 >>2oji:A (344 aa) initn: 85 init1: 53 opt: 140 Z-score: 168.0 bits: 39.5 E(): 0.012 Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa overlap (46-252:5-208) 10 20 30 40 50 60 Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG :..: . . . .. : 2oji:A RGQVFDVGPRYTNLSYI-G 10 70 80 90 100 110 Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN :::...: : .: .: . ..: .:.: : ....: ....: ... 2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI 20 30 40 50 60 70 120 130 140 150 160 170 Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI .... . ..: :... .::: . . . . : ...: .. .:. .. 2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV 80 90 100 110 120 130 180 190 200 210 220 230 Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML .: :.::.:..:.. . : . :.: . . . ..: : .. : :: 2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML 140 150 160 170 180 240 250 260 270 280 290 Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN ..: .. .:: ..:. . :: 2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA 190 200 210 220 230 240 300 310 320 330 Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC 2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS 250 260 270 280 290 300 2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY 310 320 330 340 ....... 
I show a part of the file...if I want for example only that two alignment? are there moduls to parse...because I've tried to parse whit regex but....without results :-(.... If anyone has suggestion for muduls or anything else, I'll be very happy to learn thanks Paola From giles.weaver at googlemail.com Tue Jun 30 07:28:25 2009 From: giles.weaver at googlemail.com (Giles Weaver) Date: Tue, 30 Jun 2009 12:28:25 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> Message-ID: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> I'm developing a transcriptomics database for use with next-gen data, and have found processing the raw data to be a big hurdle. I'm a bit late in responding to this thread, so most issues have already been discussed. One thing that hasn't been mentioned is removal of adapters from raw Illumina sequence. This is a PITA, and I'm not aware of any well developed and documented open source software for removal of adapters (and poor quality sequence) from Illumina reads. My current Illumina sequence processing pipeline is an unholy mix of biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting the Illumina fastq to Sanger fastq, bioperl to read the quality values, pure perl to trim the poor quality sequence from each read, and bioperl with emboss to remove the adapter sequence. I'm aware that the pipeline contains bugs and would like to simplify it, but at least it does work... Ideally I'd like to replace as much of the pipeline as possible with bioperl/bioperl-run, but this isn't currently possible due to both a lack of features and poor performance. I'm sure the features will come with time, but the performance is more of a concern to me. I wonder if Bio::Moose might be used to alleviate some of the performance issues? 
Might next-gen modules be an ideal guinea pig for Bio::Moose?

For my purposes the tools that I would love to see supported in
bioperl/bioperl-run are:

   - next-gen sequence quality parsing (to output phred scores)
   - sequence quality based trimming
   - sequencing adapter removal
   - filtering based on sequence complexity (repeats, entropy etc)
   - bioperl-run modules for bowtie etc.

Obviously all of these need to be fast!
I'd love to muck in, but I doubt I'll contribute much before
Bio::Moose/bioperl6, as the (bio)perl object system gives me nightmares!

Regarding trimming bad quality bases (see comments from Tristan Lefebure)
from Solexa/Illumina reads, I did find a mixed pure/bioperl solution to be
much faster than a primarily bioperl based implementation. I found
Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. My
current code trims ~1300 sequences/second, including unzipping the raw data
and converting it to sanger fastq with biopython. Processing an entire
sequencing run with the whole pipeline takes in the region of 6-12h.

Hope this looooong post was of interest to someone!

Giles

2009/6/17 Tristan Lefebure

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).
>
> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan
>

From manchunjohn-ma at uiowa.edu Tue Jun 30 12:17:08 2009
From: manchunjohn-ma at uiowa.edu (John M.C.
Ma) Date: Tue, 30 Jun 2009 11:17:08 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RepeatMasker crashes perl Message-ID: <5486b2980906300917m20e8cd06sbaee207aed3a27c9@mail.gmail.com> Hi everyone, (OS: OpenSuSE 11.1, Versions: Perl:v5.10.0-i586-linux-thread-multi, Bioperl: 1.6.0-cpan, Bioperl-run: 1.6.1-cpan, Ensembl: Ver 54-cvs) This is the first time I use Bio::Tools::Run::RepeatMasker, and it came with a strange crash that I can't think of a reason. I would rather think it's my problem? My code involved pulling a sequence from Ensembl-variation, put it into a PrimarySeq Object and run RepeatMasker on it: use strict; use warnings; use Bio::SeqIO; use Bio::PrimarySeq; use Bio::Tools::Run::RepeatMasker; use Bio::EnsEMBL::Registry; use Bio::EnsEMBL::Variation::Variation; [snips most Ensembl code as the sequence itself looks OK] my $ref_allele=$snp_obj->five_prime_flanking_seq.${$snp_obj->get_all_Alleles}[0]->allele.$snp_obj->three_prime_flanking_seq; my $mask_seq=Bio::PrimarySeq->new (-seq=>$ref_allele); my $rmasker_handle=Bio::Tools::Run::RepeatMasker->new(-species=>'rat',-noisy=>"1"); my @masked_features=$rmasker_handle->run($mask_seq); my $masked_seq=$rmasker_handle->run; And when I let the wrapper run, perl crashed with these warnings: --------------------- WARNING --------------------- MSG: RepeatMasker didn't find any repetitive sequences --------------------------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Could not open /tmp/EWLAmIVymd/wByClB8iqr.masked: No such file or directory STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.10.0/Bio/Root/IO.pm:310 STACK: Bio::SeqIO::_initialize /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:450 STACK: Bio::SeqIO::fasta::_initialize /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO/fasta.pm:81 STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:347 STACK: 
Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:373 STACK: Bio::Tools::Run::RepeatMasker::_run /usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:320 STACK: Bio::Tools::Run::RepeatMasker::run /usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:260 STACK: main::SeqList /home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:40 STACK: /home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:63 ----------------------------------------------------------- What could happen? Cheers, John Ma, University of Iowa From cjfields at illinois.edu Tue Jun 30 13:46:27 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 30 Jun 2009 12:46:27 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> Message-ID: <6723B5A0-9A21-4851-BD88-0BA3CC107439@illinois.edu> On Jun 30, 2009, at 6:28 AM, Giles Weaver wrote: > I'm developing a transcriptomics database for use with next-gen > data, and > have found processing the raw data to be a big hurdle. > > I'm a bit late in responding to this thread, so most issues have > already > been discussed. One thing that hasn't been mentioned is removal of > adapters > from raw Illumina sequence. This is a PITA, and I'm not aware of any > well > developed and documented open source software for removal of > adapters (and > poor quality sequence) from Illumina reads. > > My current Illumina sequence processing pipeline is an unholy mix of > biopython, bioperl, pure perl, emboss and bowtie. Biopython for > converting > the Illumina fastq to Sanger fastq, bioperl to read the quality > values, pure > perl to trim the poor quality sequence from each read, and bioperl > with > emboss to remove the adapter sequence. 
I'm aware that the pipeline > contains > bugs and would like to simplify it, but at least it does work... My local bioperl is working with FASTQ parsing of Sanger and Illumina (but not solexa yet). I'll commit what I have today, and we should be able to add in solexa soon. We'll also need to add in write_seq support. > Ideally I'd like to replace as much of the pipeline as possible with > bioperl/bioperl-run, but this isn't currently possible due to both a > lack of > features and poor performance. I'm sure the features will come with > time, > but the performance is more of a concern to me. I wonder if > Bio::Moose might > be used to alleviate some of the performance issues? Might next-gen > modules > be an ideal guinea pig for Bio::Moose? We should get FASTQ working in core first then optimize on speed (as Elia previously pointed out). We can do that within the actual SeqIO parser using a few simple tricks. For instance my local Bio::SeqIO::fastq has a reconfigured next_seq to call an iterator that returns raw processed data as a simple hash ref; users have access to that method, so if one wanted they could retrieve the raw data directly, or pass it through a filter that only creates seq instances one wants on the fly (that would be where your quality checks, adaptor modification, etc. fit in). In the end it might be to wrap a C/C++-based solution for speed. As mentioned previously a C-based parser exists from Sanger Centre that we could incorporate in some fashion, but I would like if it were able to report back file position for fast indexing. The code is fairly simple so it should be too hard to incorporate that in somehow. Just so there is no confusion, Bio::Moose is an attempt to both lay out plans for perl6 and deal with inheritance issues within bioperl now. It's still in very early development and may not see a release until Dec. at the very earliest, it will be an alpha release then, and likely won't have every major class represented at that point. 
It's also not intended to be backwards-compatible with bioperl core. It may help, but that's not an absolute certainty. As for bioperl6, it will be pre-alpha until perl6 spec reaches a stable draft and we have an active implementation. > For my purposes the tools that would love to see supported in > bioperl/bioperl-run are: > > - next-gen sequence quality parsing (to output phred scores) > - sequence quality based trimming > - sequencing adapter removal > - filtering based on sequence complexity (repeats, entropy etc) > - bioperl-run modules for bowtie etc. > > Obviously all of these need to be fast! > I'd love to muck in, but I doubt I'll contribute much before > Bio::Moose/bioperl6, as the (bio)perl object system gives me > nightmares! One can only read a file so fast (even with a highly optimized C/C++ based parser), but I don't think that will be the limiting factor as much as object instantiation. > Regarding trimming bad quality bases (see comments from Tristan > Lefebure) > from Solexa/Illumina reads, I did find a mixed pure/bioperl solution > to be > much faster than a primarily bioperl based implementation. I found > Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. > My > current code trims ~1300 sequences/second, including unzipping the > raw data > and converting it to sanger fastq with biopython. Processing an entire > sequencing run with the whole pipeline takes in the region of 6-12h. Right, hence coming up with a 'pre-filter' for raw data (hash refs) prior to object instantiation to speed things up. This will be a bit easier with Bio::Moose as we can introspect attributes via the meta class, but this will be a while yet. > Hope this looooong post was of interest to someone! > > Giles It's always good to hear about such issues and what one expects. 
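
To make the 'pre-filter' idea a bit more concrete, here is a rough pure-Perl sketch that needs no BioPerl classes at all. It is only an illustration under stated assumptions: next_raw_record and trim_3prime are hypothetical helper names, not BioPerl methods; every record is assumed to be exactly four lines; and the quality offset is supplied by the caller (33 for Sanger FASTQ, 64 for Illumina 1.3+ FASTQ; Solexa-scaled scores would need a further conversion that is not shown).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read one four-line FASTQ record
# and return it as a plain hash ref, or nothing at end of file.
# Assumes unwrapped records: header, sequence, '+' line, quality string.
sub next_raw_record {
    my ($fh, $offset) = @_;
    my $header = <$fh>;
    return unless defined $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);
    die "bad FASTQ header: $header\n" unless $header =~ s/^\@//;
    die "sequence/quality length mismatch in $header\n"
        unless length($seq) == length($qual);
    # Decode each quality character to an integer score.
    my @scores = map { ord($_) - $offset } split //, $qual;
    return { id => $header, seq => $seq, qual => \@scores };
}

# Hypothetical helper: drop bases from the 3' end while their score is
# below $min_q. Returns a new hash ref; the input record is left untouched.
sub trim_3prime {
    my ($rec, $min_q) = @_;
    my $keep = @{ $rec->{qual} };
    $keep-- while $keep > 0 && $rec->{qual}[$keep - 1] < $min_q;
    return {
        id   => $rec->{id},
        seq  => substr($rec->{seq}, 0, $keep),
        qual => [ @{ $rec->{qual} }[0 .. $keep - 1] ],
    };
}

# Two made-up Sanger-encoded reads held in memory, for demonstration only.
my $fastq = join "\n",
    '@read1', 'ACGTACGT', '+', 'IIIIII##',
    '@read2', 'ACGT',     '+', '####',
    '';
open my $fh, '<', \$fastq or die "cannot open in-memory file: $!";

my @kept;
while (my $rec = next_raw_record($fh, 33)) {
    my $trimmed = trim_3prime($rec, 20);
    next if length($trimmed->{seq}) < 5;   # discard reads that end up too short
    push @kept, $trimmed;                  # only these would become objects
}
close $fh;

print "$_->{id}\t$_->{seq}\n" for @kept;
```

Only the records that survive the filter would then be handed to a sequence factory, so the cost of object creation is paid for the reads one actually wants.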
chris From cjfields at illinois.edu Tue Jun 30 17:58:57 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 30 Jun 2009 16:58:57 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A42AC51.3090809@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com> <4A40B5D6.40504@ebi.ac.uk> <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com> <4A40C909.40803@ebi.ac.uk> <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com> <4A421287.4000203@ebi.ac.uk> <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com> <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu> <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com> <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu> <4A42AC51.3090809@ebi.ac.uk> Message-ID: All, I have committed the first run at adding Illumina/Solexa parsing for FASTQ along with tests. It's very possible the quality scores are off, particularly for Solexa (Illumina 1.0), so test away and let me know if anything pops up (should be a quick fix). Along with that is a small commit to Bio::SeqIO so that we can add format variants (see below for an example). write_seq/write_qual/write_fastq will likely not work as expected as I haven't touched them; they are to be tackled next. For faster parsing I have also added a next_dataset method that returns a hash reference to the parsed data instead of an object; this hash includes quality scores. 
This method is called by next_seq and the relevant data is passed in to
the sequence factory directly; one could do something like the following
to filter sequences as needed:

    use Modern::Perl;
    use Bio::SeqIO;
    use Bio::Seq::SeqFactory;

    my $file = shift;

    # same as (-format => 'fastq', -variant => 'illumina')
    my $in = Bio::SeqIO->new(-file   => $file,
                             -format => 'fastq-illumina');

    my $factory = Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::Quality');

    while (my $data = $in->next_dataset) {
        next if seq_is_crap($data);
        my $seq = $factory->create(%$data);
    }

    sub seq_is_crap {
        # filter here
    }

chris

From maj at fortinbras.us Tue Jun 30 21:41:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 30 Jun 2009 21:41:16 -0400
Subject: [Bioperl-l] Parsing a FASTA file (Was: Bioperl-l Digest, Vol 74, Issue 25)
In-Reply-To:
References:
Message-ID: <9D386274308C4DF98E38918477801541@NewLife>

Hi Paola,

You want to try Bio::SearchIO, I think. It's not quite clear what you
want to do, but here's an example of what you can do: Get all
high-scoring pairs ( the mini-alignments ) involving the database
sequence called "2ojg:A"--

    use Bio::SearchIO;
    my $io = Bio::SearchIO->new(-format=>'fasta', -file=>'yourfile.fasta');
    my $result = $io->next_result;
    my @desired_hsps;
    while ( my $hit = $result->next_hit ) {
        push @desired_hsps, grep { $_->subject->seq_id =~ /2ojg:A/ } $hit->hsps;
    }
    # now all your desired hsps are in the array @desired_hsps;
    # you can get Bio::SimpleAlign objects from them all, for example:
    my @aligns = map { $_->get_aln } @desired_hsps;
    #...and lots of other things...

Look at
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
and
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
for a nice introduction to the Bio::SearchIO system by its authors.
They use a blast output as an example, but everything applies to fasta
output as well.

You didn't waste your time writing regexps, by the way. For a Perl
student, that kind of work is like money in the bank.
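
If all you need is the raw text of particular hits, the regexp route can be sketched roughly as below. This is only an illustration, not a replacement for Bio::SearchIO: hit_blocks is a hypothetical helper name, the sample report is cut down from the lines quoted in this thread, and the only assumption about the format is that every hit begins with a line starting with ">>" followed by its identifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: gather the raw text of each wanted hit, keyed by the
# identifier that follows ">>" at the start of a line. The query line
# ("1>>>Sequence ...") does not start with ">>", so it is never a hit header.
# Limitation of this sketch: anything that follows the last hit in the file
# (run statistics, for example) stays attached to that hit if it is wanted.
sub hit_blocks {
    my ($fh, %wanted) = @_;
    my (%block, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^>>([^>\s]\S*)/) {
            # A new hit starts here; remember it only if it is on the list.
            $current = $wanted{$1} ? $1 : undef;
        }
        $block{$current} .= $line if defined $current;
    }
    return \%block;
}

# Cut-down sample held in memory, for demonstration only.
my $report = <<'END';
1>>>Sequence 3e7e:A - 333 aa
>>2np8:A (159 aa)
 initn: 125 init1:  72 opt: 136
>>2ojg:A (337 aa)
 initn:  85 init1:  53 opt: 140
>>2oji:A (344 aa)
 initn:  85 init1:  53 opt: 140
END

my %wanted = map { $_ => 1 } qw(2ojg:A 2oji:A);
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my $blocks = hit_blocks($fh, %wanted);
close $fh;

print $blocks->{$_} for sort keys %$blocks;
```

For a real file, open it by name instead of using the in-memory string, and fill %wanted from your own list of PDB identifiers.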
cheers, Mark ----- Original Message ----- From: "Paola Bisignano" To: Sent: Tuesday, June 30, 2009 5:12 AM Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25 > Hi, > I need a little help, to parse a file, but I tried to search some > modules of bioperl, but there are a lot, and I don't know how to > start, I find moduls for all db, for different web site, but not for > my favorite PDBsum....so I parsed a lot of thing on my own, even if I > was new in learning perl....but now I'm waiting for help...because I > need to parse a FASTA file, resulted from aligned sequences...I need > to extract the aligned sequences, only for the pdb in my lista.... > > > my fasta file is like: > > Query: /ebi/research/thornton/tmp/sas307986/seq.fasta > 1>>>Sequence 3e7e:A - 333 aa > Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib > 17840403 residues in 79353 sequences > > opt E() > < 20 286 0:=== > 22 1 0:= one = represents 135 library sequences > 24 1 0:= > 26 0 2:* > 28 21 18:* > 30 36 109:* > 32 237 421:== * > 34 956 1140:========* > 36 1924 2342:=============== * > 38 3591 3871:=========================== * > 40 4904 5400:===================================== * > 42 6750 6600:================================================*= > 44 7145 7281:=====================================================* > 46 8047 7416:======================================================*===== > ......... > >>>2np8:A (159 aa) > initn: 125 init1: 72 opt: 136 Z-score: 168.6 bits: 38.5 E(): 0.011 > Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa > overlap (59-204:13-153) > > 10 20 30 40 50 60 > Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG > :: > 2np8:A QWALEDFEIGRPLG > 10 > > 70 80 90 100 110 > Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH > .: :..:: : ....::.: :: :. . . :: .. .. ..: ....:. 
> 2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG-- > 20 30 40 50 60 70 > > 120 130 140 150 160 170 > Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII > :.... :. : ::. .. .. :. . .. .. . :. ..: > 2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI > 80 90 100 110 120 > > 180 190 200 210 220 230 > Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN > : ::::.:..:: ::: : . :.: :. > 2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR > 130 140 150 > > 240 250 260 270 280 290 > Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP > > 300 310 320 330 > Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC > >>>2ojg:A (337 aa) > initn: 85 init1: 53 opt: 140 Z-score: 168.1 bits: 39.5 E(): 0.012 > Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa > overlap (46-252:1-204) > > 10 20 30 40 50 60 > Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG > :..: . . . .. : > 2ojg:A FDVGPRYTNLSYI-G > 10 > > 70 80 90 100 110 > Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN > :::...: : .: .: . ..: .:.: : ....: ....: ... > 2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI > 20 30 40 50 60 > > 120 130 140 150 160 170 > Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI > .... . ..: :... .::: . . . . : ...: .. .:. .. > 2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV > 70 80 90 100 110 120 > > 180 190 200 210 220 230 > Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML > .: :.::.:..:.. . : . :.: . . . ..: : .. : :: > 2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML > 130 140 150 160 170 180 > > 240 250 260 270 280 290 > Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN > ..: .. .:: ..:. . 
:: > 2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA > 190 200 210 220 230 240 > > 300 310 320 330 > Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC > > 2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS > 250 260 270 280 290 300 > > 2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG > 310 320 330 > >>>2oji:A (344 aa) > initn: 85 init1: 53 opt: 140 Z-score: 168.0 bits: 39.5 E(): 0.012 > Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa > overlap (46-252:5-208) > > 10 20 30 40 50 60 > Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG > :..: . . . .. : > 2oji:A RGQVFDVGPRYTNLSYI-G > 10 > > 70 80 90 100 110 > Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN > :::...: : .: .: . ..: .:.: : ....: ....: ... > 2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI > 20 30 40 50 60 70 > > 120 130 140 150 160 170 > Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI > .... . ..: :... .::: . . . . : ...: .. .:. .. > 2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV > 80 90 100 110 120 130 > > 180 190 200 210 220 230 > Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML > .: :.::.:..:.. . : . :.: . . . ..: : .. : :: > 2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML > 140 150 160 170 180 > > 240 250 260 270 280 290 > Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN > ..: .. .:: ..:. . :: > 2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA > 190 200 210 220 230 240 > > 300 310 320 330 > Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC > > 2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS > 250 260 270 280 290 300 > > 2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY > 310 320 330 340 > > ....... > I show a part of the file...if I want for example only that two > alignment? 
> Are there modules to parse this? I've tried to parse it
> with regexes, but without results :-(
> If anyone has a suggestion for modules or anything else, I'll be very
> happy to learn.
> Thanks,
> Paola

From cjfields at illinois.edu Tue Jun 30 23:48:11 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 22:48:11 -0500
Subject: [Bioperl-l] FASTQ output
Message-ID:

I am working on FASTQ output and noticed a real oddity. Bio::SeqIO::fastq
has three write_* methods, and oddly its write_seq() writes FASTA, not
FASTQ. write_qual() writes Qual format:

http://www.bioperl.org/wiki/Qual_sequence_format

and write_fastq() writes FASTQ. Now, maybe it's just me, but I think an
implementation of write_seq() for a specific format should output that
format and not something else entirely unexpected.

Also, is there a reason for duplicating the qual and FASTA output code
within Bio::SeqIO::fastq, i.e. should we call Bio::SeqIO::fasta/qual
instead?

I would consider the write_seq() issue a bug; the others are really just
maintenance issues. Anyone have problems with me changing that up a bit?

chris

From Jonas_Schaer at gmx.de Sun Jun 28 06:15:18 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Sun, 28 Jun 2009 12:15:18 +0200
Subject: [Bioperl-l] different results with remote-blast skript
Message-ID:

Hi again :)
Please, I only have this little question: why do I get different results
with my RemoteBlast Perl script than on the NCBI BLAST homepage?
I am using blastp, the query is an amino acid sequence (I get different
results with any sequence, with differences not only in the number of hits
but even in e-values, scores etc...), and the database is 'nr'.
PLEASE help me, thank you in advance,
Jonas

PS: my script:
################################################################################
use Bio::Seq::SeqFactory;
use Bio::Tools::Run::RemoteBlast;
use strict;

my @blast_report;
my $prog = 'blastp';
my $db = 'nr';
my $e_val= '1e-10';
#my $e_val= '10';
my @params = ( '-prog' => $prog,
               '-data' => $db,
               '-expect' => $e_val,
               '-readmethod' => 'SearchIO' );
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
$Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
my $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
#$v is just to turn on and off the messages
my $v = 1;
my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
my $filename='temp2.out';
my $r = $factory->submit_blast($seq);
print STDERR "waiting..." if( $v > 0 );
while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
        } else {
            my $result = $rc->next_result();
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}
@blast_report = get_file_data ($filename);
return @blast_report;
##################################################################################

From stevey_mac2k2 at hotmail.com Sun Jun 28 06:53:04 2009
From: stevey_mac2k2 at hotmail.com (stephenmcgowan1)
Date: Sun, 28 Jun 2009 03:53:04 -0700 (PDT)
Subject: [Bioperl-l] Installing Bioperl on Mac OS X 10.5.7
Message-ID: <24240541.post@talk.nabble.com>

Hi,

I'm new to the Mac way of working and programming, as well as to the UNIX
(Terminal) environment.

I will describe in as much detail as I can what I have done so far to
install BioPerl, and try to describe what my problem is.

First of all, I downloaded and extracted BioPerl-1.6.0 and BioPerl-db-1.6.0
from the site. I have these two folders saved in a folder on my OS X desktop
called "ExerciseTwo".

After doing this, I open up Terminal and change to the BioPerl-1.6.0 folder.
I then run:

perl Build.PL (I have also tried sudo perl Build.pl)

I then run

./Build test (again, I also tried this as sudo ./Build test)

After running the build test, I receive this feedback:

Failed Test                        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/AlignIO/AlignIO.t                 255 65280    28   42 150.00%  8-28
t/AlignIO/arp.t                     255 65280    48   92 191.67%  3-48
t/Annotation/Annotation.t           255 65280   159   83  52.20%  9 117 119-159
t/ClusterIO/SequenceFamily.t        255 65280    19   34 178.95%  3-19
t/LocalDB/Flat.t                    255 65280    24   20  83.33%  15-24
t/LocalDB/Index.t                   255 65280    64   66 103.12%  32-64
t/RemoteDB/BioFetch.t               255 65280    36    2   5.56%  36
t/RemoteDB/DB.t                       3   768   113   59  52.21%  83-113
t/RemoteDB/EUtilities.t               1   256   309    1   0.32%  307
t/SeqIO/Handler.t                   255 65280   550 1098 199.64%  2-550
t/SeqIO/chaos.t                       1   256     8    1  12.50%  1
t/SeqIO/swiss.t                     255 65280   240  479 199.58%  1-240
t/SeqTools/GuessSeqFormat.t           1   256    49    2   4.08%  25 50
t/Tools/Analysis/Protein/ELM.t      255 65280    15   22 146.67%  5-15
t/Tools/Analysis/Protein/Scansite   255 65280    14   20 142.86%  5-14
t/Tools/Run/WrapperBase.t             1   256    27    1   3.70%  20
44 tests and 250 subtests skipped.
Failed 16/318 test scripts, 94.97% okay. 1015/15518 subtests failed, 93.46% okay

Going off this, I then decide to run the install:

./Build install

This is a segment of the output I get back in Terminal after the install:

Manifying blib/script/bp_pairwise_kaks.pl -> blib/bindoc/bp_pairwise_kaks.pl.1
Manifying blib/script/bp_seqret.pl -> blib/bindoc/bp_seqret.pl.1
Manifying blib/script/bp_seq_length.pl -> blib/bindoc/bp_seq_length.pl.1
Manifying blib/script/bp_query_entrez_taxa.pl -> blib/bindoc/bp_query_entrez_taxa.pl.1
Manifying blib/script/bp_load_gff.pl -> blib/bindoc/bp_load_gff.pl.1
Manifying blib/script/bp_fastam9_to_table.pl -> blib/bindoc/bp_fastam9_to_table.pl.1
Manifying blib/script/bp_process_wormbase.pl -> blib/bindoc/bp_process_wormbase.pl.1
Manifying blib/script/bp_nrdb.pl -> blib/bindoc/bp_nrdb.pl.1
Manifying blib/script/bp_composite_LD.pl -> blib/bindoc/bp_composite_LD.pl.1
Manifying blib/script/bp_classify_hits_kingdom.pl -> blib/bindoc/bp_classify_hits_kingdom.pl.1
Manifying blib/script/bp_blast2tree.pl -> blib/bindoc/bp_blast2tree.pl.1
Manifying blib/script/bp_heterogeneity_test.pl -> blib/bindoc/bp_heterogeneity_test.pl.1
Manifying blib/script/bp_generate_histogram.pl -> blib/bindoc/bp_generate_histogram.pl.1
Manifying blib/script/bp_process_gadfly.pl -> blib/bindoc/bp_process_gadfly.pl.1
mkdir /usr/local/share: Permission denied at /System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Now, these bp_ files such as bp_nrdb.pl should be installed somewhere on my
system? But I'm not sure whether the install has worked and whether these
files were saved to the newly made directory, given this error:

mkdir /usr/local/share: Permission denied at /System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Is there something wrong with my install? I think /usr/local/share should be
created and then all of these bp_ files should go into this folder.

Is there anything that I'm doing wrong here?

Thanks,
Stephen.
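
The "mkdir /usr/local/share: Permission denied" line in the report above suggests that the install step ran without write access to /usr/local, so the bp_*.pl scripts were probably not all copied. A minimal sketch of one way to pick the install command follows. It assumes a POSIX shell, and choose_install_cmd is a hypothetical helper name introduced here for illustration; it is not part of BioPerl or Module::Build.

```shell
#!/bin/sh
# Sketch only: choose_install_cmd is a hypothetical helper, not a BioPerl tool.
# It prints the install command that matches the permissions on a target
# directory such as /usr/local/share.

choose_install_cmd() {
    dir=$1
    # The target may not exist yet; climb to the nearest existing ancestor,
    # because that is the directory mkdir would need to write into.
    while [ ! -e "$dir" ]; do
        dir=$(dirname "$dir")
    done
    if [ -w "$dir" ]; then
        echo "./Build install"
    else
        echo "sudo ./Build install"
    fi
}

# Example: the directory named in the error message.
choose_install_cmd /usr/local/share

# Alternative that avoids sudo by installing under the home directory
# (run inside the BioPerl-1.6.0 folder; the ~/perl5 location is an assumption):
#   perl Build.PL --install_base "$HOME/perl5"
#   ./Build install
#   export PERL5LIB="$HOME/perl5/lib/perl5"
```

The commented alternative relies on Module::Build's --install_base option, which places modules under lib/perl5 and scripts under bin inside the chosen directory. It needs no superuser rights, but PERL5LIB (and PATH, for the bp_ scripts) must then point at that location.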