From alan.bridge at isb-sib.ch Sun Dec 2 13:29:48 2007
From: alan.bridge at isb-sib.ch (Alan Bridge)
Date: Sun, 02 Dec 2007 19:29:48 +0100
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
Message-ID: <4752F99C.9050504@isb-sib.ch>

Hello,

I was just wondering if, when performing a RemoteBlast, it would be
possible to specify the entire UniProt database (i.e. Swiss-Prot +
TrEMBL), or even just TrEMBL.

It seems that currently, you can only specify Swiss-Prot (the annotated
portion of UniProt, which is much smaller than its automatically
annotated counterpart, TrEMBL). Any hints on how to expand the search
space to include TrEMBL would be really appreciated.

Regards, Alan Bridge

my $prog = 'blastp';
my $db = 'swissprot'; # use TrEMBL ?
my $e_val= '1e-10';

my @params = ( '-prog' => $prog,
               '-data' => $db,
               '-expect' => $e_val,
               '-readmethod' => 'SearchIO' );

--
Alan Bridge PhD
Swiss-Prot annotator
Swiss Institute of Bioinformatics (SIB)
1, rue Michel Servet
CH-1211 Geneva 4
Switzerland

Tel: (+41 22) 379 58 90
Fax: (+41 22) 379 58 58

http://www.expasy.org/

From avilella at gmail.com Mon Dec 3 06:39:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 3 Dec 2007 11:39:59 +0000
Subject: [Bioperl-l] Query about SLAC.pm module
In-Reply-To:
References:
Message-ID: <358f4d650712030339w2f3de057ge5614e60a3f6658c@mail.gmail.com>

[CCing to the bioperl ml]

Sorry, there were some bits left in the pod header referring to PAML
objects that aren't quite true. I've now updated the PODs.

The Hyphy executions return hashes:

If you run the SLAC test in t/Hyphy.t you will see that the $results
are something like:

DB<3> x 2 $results
0  HASH(0x8df3110)
   'E[NS Sites]' => ARRAY(0x8e6cff4)
   'E[S Sites]' => ARRAY(0x8e6ceb0)
   'Observed NS Changes' => ARRAY(0x8e7b380)
   'Observed S Changes' => ARRAY(0x8e7b344)
   'Observed S. Prop.' => ARRAY(0x8e6d018)
   'P{S geq. observed}' => ARRAY(0x8e6d360)
   'P{S leq. observed}' => ARRAY(0x8e6d33c)
   'P{S}' => ARRAY(0x8e6d03c)
   'Scaled dN-dS' => ARRAY(0x8e6d384)
   'dN' => ARRAY(0x8e6d084)
   'dN-dS' => ARRAY(0x8e6d0a8)
   'dS' => ARRAY(0x8e6d060)
DB<4> x $rc

which correspond to the csv file that hyphy produces.

Cheers,

Albert.

On Dec 3, 2007 10:04 AM, Johan Nilsson wrote:
>
> Dear Dr. Vilella,
>
> Please allow me to introduce myself. My name is Johan Nilsson and I am a
> postdoctoral researcher in bioinformatics.
>
> I was planning to perform a large-scale analysis for positively selected
> protein coding genes using any appropriate method from the Hyphy package,
> and I thought your bioperl wrappers 'SLAC.pm', 'FEL.pm' etc. should be very
> useful for this.
>
> If I interpreted the documents of e.g. the SLAC module correctly, running
> $slac->run($aln,$tree) will return a
> Bio::Tools::Phylo::PAML object. However, when I try to retrieve any results
> from the obtained hashref (running my script on the test files provided
> with bioperl ...t/hyphy1.tree and ...t/hyphy1.fasta), the script complains
> that it is not blessed (e.g. 'Can't call method "get_seqs" on unblessed
> reference').
>
> I am fairly new to bioperl, so please apologise if this question was a
> stupid one :)
>
> Thanks in advance!
>
> Yours Sincerely
> /Johan
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> Södertörns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>
>

From cjfields at uiuc.edu Mon Dec 3 09:04:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 08:04:06 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
In-Reply-To: <4752F99C.9050504@isb-sib.ch>
References: <4752F99C.9050504@isb-sib.ch>
Message-ID:

You are limited to the databases hosted on the NCBI server, so it's
really up to them; RemoteBlast is an interface to NCBI's WebBlast
using URLAPI.
A list of current databases can be found here:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Dec 2, 2007, at 12:29 PM, Alan Bridge wrote:

> Hello,
>
> I was just wondering if, when performing a RemoteBlast, it would be
> possible to specify the entire UniProt database (i.e. Swiss-Prot +
> TrEMBL), or even just TrEMBL.
>
> It seems that currently, you can only specify Swiss-Prot (the
> annotated portion of UniProt, which is much smaller than its
> automatically annotated counterpart, TrEMBL). Any hints on how to
> expand the search space to include TrEMBL would be really appreciated.
>
> Regards, Alan Bridge
>
> my $prog = 'blastp';
> my $db = 'swissprot'; # use TrEMBL ?
> my $e_val= '1e-10';
>
> my @params = ( '-prog' => $prog, '-data' => $db, '-expect'
> => $e_val, '-readmethod' => 'SearchIO' );
>
> --
> Alan Bridge PhD
> Swiss-Prot annotator
> Swiss Institute of Bioinformatics (SIB)
> 1, rue Michel Servet
> CH-1211 Geneva 4
> Switzerland
>
> Tel: (+41 22) 379 58 90
> Fax: (+41 22) 379 58 58
>
> http://www.expasy.org/

From bioperl at boekhoff.info Mon Dec 3 14:14:24 2007
From: bioperl at boekhoff.info (Sven Boekhoff)
Date: Mon, 03 Dec 2007 20:14:24 +0100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST reload
Message-ID: <47545590.1000703@boekhoff.info>

Hi!

I just started working with Perl and BioPerl. I'm quite impressed by
what can be easily done with this module. Today I found that my second
CPU is not used, but the first one runs at 100%. I tried to include the
"-a" parameter, but I was not successful:

my @params = (
    -database => 'my_db',
    -a => '2',
    -outfile => 'blast1.out'
);

How do I have to use it?

Second question: In my Perl script I start BLAST searches in a loop.
Every time BLAST has finished its search, the memory is cleared and
BLAST is started again. I think most of the time is used to reload the
database. Is it somehow possible to keep the database loaded (e.g.
by starting a second search) or is BLAST reloaded anyway?

Thanks for your help!

Sven
www.boekhoff.info

From bix at sendu.me.uk Mon Dec 3 19:05:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 00:05:23 +0000
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <475499C3.20801@sendu.me.uk>

Sven Boekhoff wrote:
> Hi!
> I just started working with Perl and BioPerl. I'm quite impressed by what
> can be easily done with this module. Today I found that my second CPU
> is not used, but the first one runs at 100%. I tried to include the
> "-a" parameter, but I was not successful:
>
> my @params = (
>     -database => 'my_db',
>     -a => '2',
>     -outfile => 'blast1.out'
> );
>
> How do I have to use it?

This should work in the CVS version of StandAloneBlast. In other
versions, perhaps try using $object->a(2);

> Second question: In my Perl script I start BLAST searches in a loop.
> Every time BLAST has finished its search, the memory is cleared and BLAST
> is started again. I think most of the time is used to reload the
> database. Is it somehow possible to keep the database loaded (e.g. by
> starting a second search) or is BLAST reloaded anyway?

I hope someone will correct me for being wrong, but I think you'd have
to do that with a 2-way pipe. StandAloneBlast only uses output to a file
and input from that file, finishing with the executable in between. I've
thought about improving it with a 2-way pipe, but never got around to
it, being apprehensive about stability on all platforms.

The more obvious solution, which may be possible depending on exactly
what you're doing, is to avoid the loop and just supply Blast all your
input in one go.
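
The "all your input in one go" suggestion above can be sketched roughly
as follows. This is only a minimal sketch under several assumptions: a
BioPerl version whose StandAloneBlast accepts dash-prefixed parameters,
a working local blastall, and an already formatted protein database.
The sub name run_batch_blast, the script name batch_blast.pl, the input
file queries.fasta and the choice of 'blastp' are illustrative and do
not come from the thread; 'my_db', 'blast1.out' and the a(2) call do.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

# Run a single BLAST job over all query sequences, rather than one job
# per sequence inside a loop. All three arguments are caller-supplied.
sub run_batch_blast {
    my ($fasta, $db, $out) = @_;

    # Collect every query as a Bio::Seq object.
    my $in = Bio::SeqIO->new(-file => $fasta, -format => 'fasta');
    my @queries;
    while (my $seq = $in->next_seq) {
        push @queries, $seq;
    }

    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        -program  => 'blastp',    # assumption: protein queries and database
        -database => $db,
        -outfile  => $out,
    );
    $factory->a(2);    # request two CPUs, as suggested in the reply above

    # An array reference hands all queries to one blastall run.
    return $factory->blastall(\@queries);
}

# Hypothetical invocation:
#   perl batch_blast.pl queries.fasta my_db blast1.out
if (@ARGV == 3) {
    my $report = run_batch_blast(@ARGV);
    while (my $result = $report->next_result) {
        printf "%s: %d hits\n", $result->query_name, $result->num_hits;
    }
}
```

With this layout the database is read once for the whole batch, which
is the point of avoiding the loop; whether it helps in practice depends
on how many queries there are and how large the database is.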
From Russell.Smithies at agresearch.co.nz Mon Dec 3 19:49:21 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 13:49:21 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID:

Hi all,

I'm trying to read .ace files but keep getting an error that I don't
know the cause of.
Really basic example code:

#!/usr/local/bin/perl -w

use lib "/data/home/smithiesr/bioperl-live";
use Bio::Assembly::IO;
use Data::Dumper;

$ace = "CLP0001001240-cE15_20030319.ace";

$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
$assembly = $io->next_assembly;

foreach $contig ($assembly->all_contigs) {
  print Dumper $contig;
}

Gives this error:
[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
Can't call method "get_consensus_sequence" on an undefined value
at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
line 42.

Which relates to this bit in ace.pm:
# Loading contig qualities... (Base Quality field)
/^BQ/ && do {
    my $consensus = $contigOBJ->get_consensus_sequence()->seq();

Is this caused by a dud ace file or a problem with
Bio::Assembly::IO::ace or is the Contig object not getting created?
Any ideas?

Thanx,

Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at uiuc.edu Mon Dec 3 21:15:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 3 Dec 2007 20:15:58 -0600 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> Message-ID: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> This seems similar to the 'too many open filehandles issue' documented here: http://bugzilla.open-bio.org/show_bug.cgi?id=2320 It unfortunately is due to having an open DB_File for every contig, and is a problem with the Bio::Assembly implementation that isn't easily fixed. Changing the open filehandle limit using ulimit is the only known fix: ulimit -n 10000 chris On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote: > Hi all, > > It' trying to read .ace files but keep getting an error that I don't > know the cause of. > Really basic example code: > > #!/usr/local/bin/perl -w > > use lib "/data/home/smithiesr/bioperl-live"; > use Bio::Assembly::IO; > use Data::Dumper; > > $ace = "CLP0001001240-cE15_20030319.ace"; > > $io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace"); > $assembly = $io->next_assembly; > > foreach $contig ($assembly->all_contigs) { > print Dumper $contig; > } > > Gives this error; > [smithiesr at impala ace_phrap]$ perl bp_read_ace.pl > Can't call method "get_consensus_sequence" on an undefined value > at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170, > line 42. > > Which relates to this bit in ace.pm: > # Loading contig qualities... 
(Base Quality field) > /^BQ/ && do { > my $consensus = $contigOBJ->get_consensus_sequence()->seq(); > > Is this caused by a dud ace file or a problem with > Bio::Assembly::IO:ace > or is the Contig object not getting created? > Any ideas? > > Thanx, > > Russell Smithies > > Bioinformatics Software Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From florent.angly at gmail.com Mon Dec 3 21:25:24 2007 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 03 Dec 2007 18:25:24 -0800 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> Message-ID: <4754BA94.7090600@gmail.com> Would this issue cause an excessive memory usage? 
Because I was getting a high memory usage when parsing some TIGR Assembler files and was wondering if the tigr parser was responsible for that or the parent assembly IO module. I'd definitely be interested in a fix of the Bio::Assembly implementation if it's the assembly IO module's fault.... Florent Chris Fields wrote: > This seems similar to the 'too many open filehandles issue' documented > here: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2320 > > It unfortunately is due to having an open DB_File for every contig, > and is a problem with the Bio::Assembly implementation that isn't > easily fixed. Changing the open filehandle limit using ulimit is the > only known fix: > > ulimit -n 10000 > > chris > > On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote: > > >> Hi all, >> >> It' trying to read .ace files but keep getting an error that I don't >> know the cause of. >> Really basic example code: >> >> #!/usr/local/bin/perl -w >> >> use lib "/data/home/smithiesr/bioperl-live"; >> use Bio::Assembly::IO; >> use Data::Dumper; >> >> $ace = "CLP0001001240-cE15_20030319.ace"; >> >> $io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace"); >> $assembly = $io->next_assembly; >> >> foreach $contig ($assembly->all_contigs) { >> print Dumper $contig; >> } >> >> Gives this error; >> [smithiesr at impala ace_phrap]$ perl bp_read_ace.pl >> Can't call method "get_consensus_sequence" on an undefined value >> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170, >> line 42. >> >> Which relates to this bit in ace.pm: >> # Loading contig qualities... (Base Quality field) >> /^BQ/ && do { >> my $consensus = $contigOBJ->get_consensus_sequence()->seq(); >> >> Is this caused by a dud ace file or a problem with >> Bio::Assembly::IO:ace >> or is the Contig object not getting created? >> Any ideas? 
>> >> Thanx, >> >> Russell Smithies >> >> Bioinformatics Software Developer >> T +64 3 489 9085 >> E russell.smithies at agresearch.co.nz >> >> Invermay Research Centre >> Puddle Alley, >> Mosgiel, >> New Zealand >> T +64 3 489 3809 >> F +64 3 489 9174 >> www.agresearch.co.nz >> >> = >> ====================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use of, >> or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> = >> ====================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Mon Dec 3 21:32:43 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 4 Dec 2007 15:32:43 +1300 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> Message-ID: Thanx Chris, I'm only writing a simple .ace viewer to display assembled contigs in a Bio::Graphics::Panel so I'll parse the coords from the .ace files "manually". Unless anyone else has a better idea ? (and some example code ;-) Russell > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, 4 December 2007 3:16 p.m. > To: Smithies, Russell > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::Assembly::IO problems reading .ace files > > This seems similar to the 'too many open filehandles issue' documented > here: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2320 > > It unfortunately is due to having an open DB_File for every contig, > and is a problem with the Bio::Assembly implementation that isn't > easily fixed. Changing the open filehandle limit using ulimit is the > only known fix: > > ulimit -n 10000 > > chris > > On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote: > > > Hi all, > > > > It' trying to read .ace files but keep getting an error that I don't > > know the cause of. 
> > Really basic example code: > > > > #!/usr/local/bin/perl -w > > > > use lib "/data/home/smithiesr/bioperl-live"; > > use Bio::Assembly::IO; > > use Data::Dumper; > > > > $ace = "CLP0001001240-cE15_20030319.ace"; > > > > $io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace"); > > $assembly = $io->next_assembly; > > > > foreach $contig ($assembly->all_contigs) { > > print Dumper $contig; > > } > > > > Gives this error; > > [smithiesr at impala ace_phrap]$ perl bp_read_ace.pl > > Can't call method "get_consensus_sequence" on an undefined value > > at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170, > > line 42. > > > > Which relates to this bit in ace.pm: > > # Loading contig qualities... (Base Quality field) > > /^BQ/ && do { > > my $consensus = $contigOBJ->get_consensus_sequence()->seq(); > > > > Is this caused by a dud ace file or a problem with > > Bio::Assembly::IO:ace > > or is the Contig object not getting created? > > Any ideas? > > > > Thanx, > > > > Russell Smithies > > > > Bioinformatics Software Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > = > > > ============================================================= > ========= > > Attention: The information contained in this message and/or > > attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or > > privileged > > material. Any review, retransmission, dissemination or other use of, > > or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by > > AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. 
> > = > > > ============================================================= > ========= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at uiuc.edu Tue Dec 4 00:10:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 3 Dec 2007 23:10:57 -0600 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: <4754BA94.7090600@gmail.com> References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> <4754BA94.7090600@gmail.com> Message-ID: <4F867A88-C0DC-4DF7-9F47-C38712920183@uiuc.edu> Yes, it's possible this would cause memory issues as each Bio::Assembly::Contig instance would have a Bio::SeqFeature::Collection attached (each Collection having a tied DB hash, which would be an open filehandle), So if you had over 1000 contigs open at any one time (in a parsed scaffold, for instance) you would have 1000 open file handles. Not very efficient. 
My thought was to have each Bio::Assembly::Scaffold instance carry a single Bio::SeqFeature::CollectionI (it could be a Bio::SeqFeature::Collection, Bio::DB::SeqFeature::Store, or any other CollectionI, whatever's easiest). Each Contig would be passed (and store) a reference to the Scaffold SF::Collection and pull features from there; just haven't had time to mess with it. I don't think anyone's tackling it, so feel free to code away! chris On Dec 3, 2007, at 8:25 PM, Florent Angly wrote: > Would this issue cause an excessive memory usage? Because I was > getting a high memory usage when parsing some TIGR Assembler files > and was wondering if the tigr parser was responsible for that or the > parent assembly IO module. > I'd definitely be interested in a fix of the Bio::Assembly > implementation if it's the assembly IO module's fault.... > Florent > > Chris Fields wrote: >> This seems similar to the 'too many open filehandles issue' >> documented here: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2320 >> >> It unfortunately is due to having an open DB_File for every >> contig, and is a problem with the Bio::Assembly implementation >> that isn't easily fixed. Changing the open filehandle limit using >> ulimit is the only known fix: >> >> ulimit -n 10000 >> >> chris >> >> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote: >> >> >>> Hi all, >>> >>> It' trying to read .ace files but keep getting an error that I don't >>> know the cause of. 
>>> Really basic example code: >>> >>> #!/usr/local/bin/perl -w >>> >>> use lib "/data/home/smithiesr/bioperl-live"; >>> use Bio::Assembly::IO; >>> use Data::Dumper; >>> >>> $ace = "CLP0001001240-cE15_20030319.ace"; >>> >>> $io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace"); >>> $assembly = $io->next_assembly; >>> >>> foreach $contig ($assembly->all_contigs) { >>> print Dumper $contig; >>> } >>> >>> Gives this error; >>> [smithiesr at impala ace_phrap]$ perl bp_read_ace.pl >>> Can't call method "get_consensus_sequence" on an undefined value >>> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line >>> 170, >>> line 42. >>> >>> Which relates to this bit in ace.pm: >>> # Loading contig qualities... (Base Quality field) >>> /^BQ/ && do { >>> my $consensus = $contigOBJ->get_consensus_sequence()->seq(); >>> >>> Is this caused by a dud ace file or a problem with >>> Bio::Assembly::IO:ace >>> or is the Contig object not getting created? >>> Any ideas? >>> >>> Thanx, >>> >>> Russell Smithies >>> >>> Bioinformatics Software Developer >>> T +64 3 489 9085 >>> E russell.smithies at agresearch.co.nz >>> >>> Invermay Research Centre >>> Puddle Alley, >>> Mosgiel, >>> New Zealand >>> T +64 3 489 3809 >>> F +64 3 489 9174 >>> www.agresearch.co.nz >>> >>> = >>> = >>> = >>> ==================================================================== >>> Attention: The information contained in this message and/or >>> attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or >>> privileged >>> material. Any review, retransmission, dissemination or other use >>> of, or >>> taking of any action in reliance upon, this information by persons >>> or >>> entities other than the intended recipients is prohibited by >>> AgResearch >>> Limited. If you have received this message in error, please notify >>> the >>> sender immediately. 
>>> = >>> = >>> = >>> ==================================================================== >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Dec 4 00:20:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 3 Dec 2007 23:20:07 -0600 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> Message-ID: The ulimit fix usually works but if this is for Gbrowse it probably isn't prudent. It would be nice to get Bio::Assembly working as an Bio::AlignI; it would be easier to manipulate for display. Here's a script I wrote up as an example: http://www.bioperl.org/wiki/HOWTO_Discussion:Graphics chris On Dec 3, 2007, at 8:32 PM, Smithies, Russell wrote: > Thanx Chris, > I'm only writing a simple .ace viewer to display assembled contigs > in a > Bio::Graphics::Panel so I'll parse the coords from the .ace files > "manually". > Unless anyone else has a better idea ? 
> (and some example code ;-)
>
> Russell

From avilella at gmail.com Tue Dec 4 06:51:05 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 11:51:05 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
Message-ID: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>

Hi all,

There is a new wrapper in bioperl-run for SLR:

http://www.bioperl.org/wiki/SLR

Right now, output parsing is very simple, and I have only tested it on
my Linux machine.

Can someone with a Mac give it a try?

Update your bioperl-run to cvs head, then:

# try the installer, SLR is option 6
perl scripts/bioperl_application_installer.PLS

# then try to run the tests (should take about a minute)
perl t/SLR.t

Any comments on the code would be appreciated,

Thanks in advance,

Cheers,

Albert.

From captainrave at hotmail.com Tue Dec 4 06:04:57 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 03:04:57 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
Message-ID: <14148723.post@talk.nabble.com>

Help. I'm very new to perl and bioperl. Basically I need to extract the
location of each CDS in a genbank entry (e.g. 103..120) and export them
to an output file as a list. How would I do this?

Your help would be much appreciated!
--
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14148723
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
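
One way to approach the CDS question above is sketched below. It is a
minimal sketch, not a tested pipeline: it assumes BioPerl is installed
and that the input is a GenBank flat file. The sub name cds_locations,
the script name cds_list.pl and the file names in the usage comment are
made up for illustration; the Bio::SeqIO, primary_tag, location and
to_FTstring calls are standard BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return the location string of every CDS feature in a GenBank file,
# e.g. "103..120" or "join(1..50,80..120)" for split locations.
sub cds_locations {
    my ($in_file) = @_;
    my $seqio = Bio::SeqIO->new(-file => $in_file, -format => 'genbank');
    my @locations;
    while (my $seq = $seqio->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            push @locations, $feat->location->to_FTstring;
        }
    }
    return @locations;
}

# Hypothetical invocation:
#   perl cds_list.pl entry.gb cds_locations.txt
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $out, '>', $out_file or die "Cannot write $out_file: $!\n";
    print {$out} "$_\n" for cds_locations($in_file);
    close $out;
} else {
    print "Usage: perl $0 <genbank_file> <output_file>\n";
}
```

If separate numbers are preferred over the feature-table string,
$feat->location->start and $feat->location->end give the outer
coordinates of each CDS.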
From michael.watson at bbsrc.ac.uk Tue Dec 4 09:48:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 14:48:27 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14148723.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>

From the SeqIO HOWTO:

#!/bin/perl

use strict;
use Bio::SeqIO;

my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

From the Feature HOWTO:

for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print " tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print " value: ", $value, "\n";
        }
    }
}

Surely you could have found that yourself? ;0

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 11:05
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] extracting CDS location from Genbank

Help. I'm very new to perl and bioperl. Basically I need to extract the
location of each CDS in a genbank entry e.g.103...120 and export them to
an output file as a list. How would I do this?

Your help would be much appreciated!
--
View this message in context:
http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14148723
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From captainrave at hotmail.com Tue Dec 4 10:07:19 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:07:19 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152264.post@talk.nabble.com>

Yes but actually implementing it is another story.

I get an error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: file argument provided, but with an undefined value
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
STACK: test3.pl:7
-----------------------------------------------------------

Basically because I don't understand the code well enough. For example,
how do I tell it which input file to read? I know this might sound
stupid, but I don't understand the Biowiki very well!
--
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152264
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
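
The "file argument provided, but with an undefined value" message is
most likely what happens when the script is started without a file name:
a bare `shift` at file scope takes the next command-line argument, and
with none given $file stays undefined. Below is a minimal sketch of
checking the argument before handing it to Bio::SeqIO. The sub name
input_file_from_args and the file name sequence.gb are hypothetical;
test3.pl is the script name shown in the stack trace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the input file named in the argument list, or die with a hint.
sub input_file_from_args {
    my @args = @_;
    die "Usage: perl $0 <genbank_file>\n" unless @args;
    my $file = $args[0];
    die "Cannot read file '$file'\n" unless -r $file;
    return $file;
}

# Passing @ARGV explicitly does the same job as a bare `shift` at file
# scope, but lets the script complain clearly when nothing was given.
my $file = eval { input_file_from_args(@ARGV) };
if (defined $file) {
    print "Would read sequences from $file\n";
    # The HOWTO code would continue from here, for example:
    # my $seqio_object = Bio::SeqIO->new(-file => $file);
} else {
    print $@;
}
```

The script would then be run as `perl test3.pl sequence.gb`, with the
GenBank file in the current folder or given with its full path.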
From michael.watson at bbsrc.ac.uk Tue Dec 4 10:21:34 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:21:34 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>

Post the script that produces that error, and your file's location

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 15:07
To: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] extracting CDS location from Genbank

Yes but actually implementing it is another story.

I get an error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: file argument provided, but with an undefined value
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
STACK: test3.pl:7
-----------------------------------------------------------

Basically because I dont understand the code well enough. For example,
how do I tell it which input file to read? I know this might sound
stupid, but I dont understand the Biowiki very well!
--
View this message in context:
http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152264
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From bix at sendu.me.uk Tue Dec 4 10:39:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 15:39:31 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com>
Message-ID: <475574B3.8050700@sendu.me.uk>

Captainrave wrote:
> Yes but actually implementing it is another story.
>
> I get an error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------

The best way to get help is to give us your script and the error message, and the command you used to run your script. The less you know, the more you should give us (i.e. don't edit anything out).

From captainrave at hotmail.com Tue Dec 4 10:41:37 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:41:37 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152907.post@talk.nabble.com>

#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print " tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print " value: ", $value, "\n";
        }
    }
}

exit;

The file is in the same folder. But how do I tell it to use this file?

michael watson (IAH-C) wrote:
> Post the script that produces that error, and your file's location

From michael.watson at bbsrc.ac.uk Tue Dec 4 10:53:22 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:53:22 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk><14152264.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk> <14152907.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F77A@iahce2ksrv1.iah.bbsrc.ac.uk>

Same script as below, but try:

my $file = 'C:\path\to\my\filename.gbk';

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 15:42
To: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] extracting CDS location from Genbank

#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print " tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print " value: ", $value, "\n";
        }
    }
}

exit;

The file is on the same folder. But how do I tell it to use this file?

From cjfields at uiuc.edu Tue Dec 4 11:20:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Dec 2007 10:20:34 -0600
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk> <14152907.post@talk.nabble.com>
Message-ID:

The 'my $file = shift;' is a Perl idiom. The built-in 'shift', used implicitly in this way, operates on @ARGV (the command-line arguments); the file would then be passed as the first argument when running the script:

get_features.pl myfile.gb

This should work for any OS.

Personally, I use something like the following to indicate how the script is used in case a file is never entered:

my $USAGE = <<END_USE;
Perl script to grab features from a GenBank file and print to a table
END_USE

my $file = shift || die $USAGE;

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Tue Dec 4 11:22:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 16:22:12 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk> <14152907.post@talk.nabble.com>
Message-ID: <47557EB4.10003@sendu.me.uk>

Captainrave wrote:
> #!/bin/perl
> my $file = shift; # get the file name, somehow
>
> The file is on the same folder. But how do I tell it to use this file?

http://stein.cshl.org/genome_informatics/perl_intro/command_line.html

Basically, when you run your script, add the name of the file to your command line.

me% perl myscript.pl myfile

By saying 'my $file = shift' inside myscript.pl, the variable $file now contains the filename 'myfile'.
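
A minimal sketch of the command-line idiom described above, using core Perl only. The helper name `input_file` and the usage wording are made up for illustration and are not part of BioPerl. It checks that a file name was actually supplied before anything tries to open it, which is the situation behind the "file argument provided, but with an undefined value" exception earlier in this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the first
# argument as the input file name, or die with a short message.
sub input_file {
    my @args = @_;
    my $file = shift @args;    # same idiom as 'my $file = shift;' on @ARGV
    die "Usage: $0 <genbank_file>\n" unless defined $file && length $file;
    die "Cannot read file '$file'\n"  unless -r $file;
    return $file;
}

# Report the problem here instead of handing an undefined value on to
# something like Bio::SeqIO->new(-file => $file).
my $file = eval { input_file(@ARGV) };
if (defined $file) {
    print "Input file: $file\n";
}
else {
    warn $@;
}
```

Run as `perl myscript.pl myfile.gb`, this should print the file name; run with no argument, it should print the usage line rather than failing later inside the library.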

You could also have hardcoded the filename:

my $file = 'myfile';

Anyway, you're going to run into lots of these issues, and they're beyond the scope of this mailing list. For basic Perl problems seek help via www.perl.org. When you have a BioPerl-specific question, don't hesitate to post here.

From jason at bioperl.org Tue Dec 4 12:16:30 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 09:16:30 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
In-Reply-To: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
Message-ID: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>

Excellent - thanks for this! I'm giving it a whirl on Linux and the SLR.t test is currently taking more than 30 minutes to run -- is it possible to cook up an example that is going to finish in a more reasonable amount of time?

Also - I would prefer if the default exe could be 'Slr' rather than Slr_Linux_static - it seems like it is possible for users to install it this way. Similarly, whether Slr_osx or Slr is the default name, is it too big of a deal to expect the user to rename it?

I'll give it a whirl on OSX later, but it might be easier if the test runs shorter.

Thanks!
-jason

On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:
> Hi all,
>
> There is a new wrapper in bioperl-run for SLR:
>
> http://www.bioperl.org/wiki/SLR
>
> Right now, output parsing is very simple, and I have only tested it on
> my linux machine.
> Can someone with a Mac give it a try?
>
> update your bioperl-run to cvs head, then:
>
> # try the installer, SLR is option 6
> perl scripts/bioperl_application_installer.PLS
> # then try to run the tests (should take about a minute)
> perl t/SLR.t
>
> Any comments on the code would be appreciated,
>
> Thanks in advance,
>
> Cheers,
>
> Albert.

From florent.angly at gmail.com Tue Dec 4 13:17:08 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 04 Dec 2007 10:17:08 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::TigrAssembler
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <475599A4.1040500@gmail.com>

Hi all,

I pushed a new module into bioperl-run CVS a few days ago. It's called Bio::Tools::Run::TigrAssembler. It is a wrapper for TIGR Assembler, an open-source program that assembles DNA sequences. Input is a list of sequence objects and output is assembly objects... easy enough...

Let me know if you experience problems with it.

Florent

From jason at bioperl.org Tue Dec 4 13:51:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 10:51:34 -0800
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST reload
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <8273f6c20712041051k2bfe36efgb2ae40550df9341@mail.gmail.com>

You can pass in an array reference of sequences instead of a single sequence object and the module will build a multi-FASTA database. You can also pass in a filename instead of a Sequence object and the file can be an already built multi-FASTA database.
This is described in the documentation:
http://search.cpan.org/~birney/bioperl-1.4/Bio/Tools/Run/StandAloneBlast.pm#blastall

You can also just run BLAST without the StandAloneBlast part, as I do, and just build your multi-FASTA file ahead of time with SeqIO and do:

# wublast
my $cmd = "blastp -i MULTIFASTA -d DATABASE --cpus 2 |";
# or NCBI blast
# my $cmd = "blastall -a 2 -i MULTIFASTA -p blastp -d DATABASE |";
my $fh;
open($fh, $cmd) or die "Cannot run '$cmd': $!";
my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

The advantage of StandAloneBlast, in theory, is that it takes care of the temporary file creation (sequence files) and cleanup. Personally I find I want easier access to my programs that are simple command lines like this. You can do similar things with SSEARCH or FASTA searching too.

-jason

On Dec 3, 2007 4:05 PM, Sendu Bala wrote:
> Sven Boekhoff wrote:
> > HI!
> > I just started working with Perl and BioPerl. I'm quite impressed what
> > can be easily done with this module. Today I found that my second CPU
> > is not used, but the first one runs at 100%. I tried to include the
> > "-a"-parameter, but I was not successful:
> >
> > my @params = (
> >     -database => 'my_db',
> >     -a => '2',
> >     -outfile => 'blast1.out'
> > );
> >
> > How do I have to use it?
>
> This should work in the CVS version of StandAloneBlast. In other
> versions, perhaps try using $object->a(2);
>
> > Second question: In my perlscript I start BLAST-searches in a loop.
> > Everytime BLAST has finished its search, the memory is cleared and BLAST
> > is started again. I think most of the time is used to reload the
> > database. Is it somehow possible to keep the database loaded (e.g. by
> > starting a second search) or is BLAST reloaded anyway?
>
> I hope someone will correct me for being wrong, but I think you'd have
> to do that with a 2-way pipe. StandAloneBlast only uses output to a file
> and input from that file, finishing with the executable in between. I've
> thought about improving it with a 2-way pipe, but never got around to
> it, being apprehensive about stability on all platforms.
>
> The more obvious solution, which may be possible depending on exactly
> what you're doing, is to avoid the loop and just supply Blast all your
> input in one go.

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From stefan.kirov at bms.com Tue Dec 4 14:25:21 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 14:25:21 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To:
References:
Message-ID: <4755A9A1.2040608@bms.com>

Jason Stajich wrote:
> PAML4 breaks our PAML parser right now because the order of things in
> the result file has changed. Now sequences precede the information
> about the version or the program run. This means that
> $result->get_seqs() fails because we don't parse the sequences.
>
> We'll see what we can do, but as usual with supporting 3rd party
> programs it is brittle when file formats change. Th
>
> -jason
>
> --
> Jason Stajich
> jason at bioperl.org

Jason,
I saw a commit after this post on codeml, but not on PAML.pm - I assume this is not fixed, am I correct?
Thanks!
Stefan

From avilella at gmail.com Tue Dec 4 15:34:38 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:34:38 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>

hmmm, 30 minutes is quite a lot... it takes much less for me:

avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
1..7
ok 1 - use Bio::Root::IO;
ok 2 - use Bio::Tools::Run::Phylo::SLR;
ok 3 - use Bio::AlignIO;
ok 4 - use Bio::TreeIO;
ok 5
ok 6
ok 7

real 0m21.517s
user 0m20.717s
sys 0m0.100s

From avilella at gmail.com Tue Dec 4 15:39:26 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:39:26 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
In-Reply-To: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org> <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
Message-ID: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>

oh, I forgot to mention: SLR uses the lapack and blas libraries if installed, which makes it a lot faster (according to the author)... maybe that's the reason...

From jason at bioperl.org Tue Dec 4 16:43:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:43:03 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
In-Reply-To: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org> <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com> <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
Message-ID: <2CF76A38-5A9E-4A4E-8C9F-29EDD732BDDF@bioperl.org>

My own icc compiled version seemed to have caused the problem. whoops. fixed that.

-jason

From stefan.kirov at bms.com Tue Dec 4 16:51:51 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 16:51:51 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To:
References: <4755A9A1.2040608@bms.com>
Message-ID: <4755CBF7.5010709@bms.com>

Jason Stajich wrote:
> should be fixed.
>
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now. Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files

Yes, this is the version I have and in some cases the sequences do not get parsed. I have missed this commit. I will try to assemble a test case and send it. Cannot promise when, but I will try to do it tomorrow. My gut feeling so far is that the parser works whenever there are gaps in the alignment, otherwise it does not. PAML surely has a very peculiar format.
Thanks again!
Stefan

From jason at bioperl.org Tue Dec 4 16:36:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:36:09 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4755A9A1.2040608@bms.com>
References: <4755A9A1.2040608@bms.com>
Message-ID:

should be fixed.

$ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
revision 1.56
date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14
Parsing PAML4 and PAML3.15 should work now. Dealing with variable order for the sequences and summary results in the top of the MLC files

On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
> Jason,
> I saw a commit after this post on codeml, but not on PAML.pm- I assume
> this is not fixed, am I correct?
> Thanks!
> Stefan From johan.nilsson at sh.se Wed Dec 5 06:35:58 2007 From: johan.nilsson at sh.se (Johan Nilsson) Date: Wed, 5 Dec 2007 12:35:58 +0100 Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm" Message-ID: Hello, I have a bunch of multiple sequence alignments of protein coding genes, which I would like to analyse with the SLAC method of the HyPhy package. I tried using the SLAC.pm module in bioperl-run, but I could not get it to work properly. Basically, for each MSA file, I create the Bio::Tree::Tree and Bio::SimpleAlign objects ($tree and $aln, respectively) required as arguments to SLAC, and call the method with: "($rc,$result) = $slac->run($aln,$tree)" in a loop procedure in my script. When I choose not to save the tmp files (the default option in SLAC.pm), the program complains that it cannot find the file "$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA (which works fine). Apparently, it looks for the wrapper.bf file in the first tmp dir created, which is deleted in the end of the first SLAC call. If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')), all calls to SLAC give returncode 1, and no error message is received. However, when I look at the resulting $result hashref, it turns out that all results are for the FIRST alignment read. I've made sure there is nothing strange with my loop procedure, and I checked that the tree and alignment objects look OK for each MSA. Apparently, it does create new "results.tsv" files in the tmp directory after each run, but it is identical each time it's created. Also, it only creates ONE tmp directory, no matter how many times SLAC is executed (I would imagine it was supposed to save each result in separate tmp dirs?) Thus, it seems to me like the errors occur because something goes wrong in the creation of temporary files. Have I done something wrong here, or have any other of you experienced the same problem? Best regards /Johan -- Johan Nilsson, Ph.D. 
School of Life Sciences
Södertörns University College
S-141 89 Huddinge, Sweden
E-mail: johan.nilsson at sh.se
Phone: +46 8 608 47 05, +46 70 456 10 51

From bernd.web at gmail.com Wed Dec 5 08:10:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 5 Dec 2007 14:10:04 +0100
Subject: [Bioperl-l] SimpleAlign is_flush
Message-ID: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>

Hi,

SimpleAlign has an is_flush:
Function : Tells you whether the alignment is flush, i.e. all of the
same length
Returns : 1 or 0

I noticed that a file with multiple fasta sequences with different
lengths has an is_flush value of 1. Printing the "alignment" shows
that sequences are appended with "-" so that they are all the same
length. Does this mean that is_flush for alignments read in via
AlignIO is indeed always true and thus, as such, not so useful?

(using bioperl version: 1.005002102)

Regards,
Bernd

From cjfields at uiuc.edu Wed Dec 5 08:53:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 07:53:59 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
References: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
Message-ID: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>

Yes; it's a convenient way to make sure all seqs have the same length
(including gaps). Nice for checking when adding new seqs to an
alignment or building new parsers.

chris

On Dec 5, 2007, at 7:10 AM, Bernd Web wrote:

> Hi,
>
> SimpleAlign has an is_flush:
> Function : Tells you whether the alignment is flush, i.e. all of the
> same length
> Returns : 1 or 0
>
> I noticed that a file with multiple fasta sequences with different
> lengths has an is_flush value of 1. Printing the "alignment" shows
> that sequences are appended with "-" so that they are all the same
> length. Does this mean that is_flush for alignments read in via
> AlignIO is indeed always true and thus, as such, not so useful?
>
> (using bioperl version: 1.005002102)
>
>
> Regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From captainrave at hotmail.com Wed Dec 5 07:37:02 2007
From: captainrave at hotmail.com (Captainrave)
Date: Wed, 5 Dec 2007 04:37:02 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <475574B3.8050700@sendu.me.uk>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> <475574B3.8050700@sendu.me.uk>
Message-ID: <14170499.post@talk.nabble.com>

Thanks, it works great now. Do any of you know if there is a tag to pull
out the CDS location, i.e. the values such as 132...145 etc.? Those are all
I need. Also, is there any way to stop it reporting tag and value and
literally JUST output the value? Thanks!!!
--
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14170499
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
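
A minimal sketch for the question above, not a tested answer from the thread. It assumes the record sits in a local GenBank file (sequence.gb is only a placeholder name) and uses 'product' as an example tag, which is also an assumption. In BioPerl the coordinates of a feature are not stored as a tag; they live in the feature's location object, so something along these lines may do what is asked:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Placeholder input: pass your own GenBank file as the first argument.
my $file = @ARGV ? $ARGV[0] : 'sequence.gb';

my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        # Only look at CDS features.
        next unless $feat->primary_tag eq 'CDS';

        # The location as written in the feature table; a split CDS
        # comes back as a join(...) string.
        print $feat->location->to_FTstring, "\n";

        # Plain numeric bounds, if that is all that is needed.
        print $feat->start, "\t", $feat->end, "\n";

        # Values only, without the tag name ('product' is just an example).
        if ($feat->has_tag('product')) {
            print join("\t", $feat->get_tag_values('product')), "\n";
        }
    }
}
```

get_tag_values returns only the values, so no tag name is printed. For a CDS split over several exons, start and end give the outer bounds, while the to_FTstring output keeps the individual segments.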
From stefan.kirov at bms.com Wed Dec 5 09:24:20 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:24:20 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To:
References: <4755A9A1.2040608@bms.com>
Message-ID: <4756B494.7020100@bms.com>

Jason,
When there is a gapless alignment we have a differently formatted output
from codeml:
kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc

seed used = 492211105
3 141

ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA

And parsing this fails...
The next one has gaps and works fine:

kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc

seed used = 492252697

Before deleting alignment gaps
2 162

ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC CTT GGT TCA GGA GGT CAG TTC CTG
ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT CCT GGT ACA GGA AAC AAG CTT CTG

I will send both whole files as an attachment with another mail (I do
not know if these are going to pass through).
My guess is that the whole _parse_summary method has to be re-worked as
there is no tag to look for before the sequences start. Ugly.
I am not sure what else could become broken if I try to fix it, so I will leave it to you. Stefan > should be fixed. > > $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm > revision 1.56 > date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14 > Parsing PAML4 and PAML3.15 should work now. Dealing with variable > order for the sequences and summary results in > the top of the MLC files > > On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: > >> Jason Stajich wrote: >>> PAML4 breaks our PAML parser right now because the order of things in >>> the result file has changed. Now sequences precede the information >>> about the version or the program run. This means that $result- >>>> get_seqs() fails because we don't parse the sequences. >>> >>> We'll see what we can do, but as usual with supporting 3rd party >>> programs it is brittle when file formats change. Th >>> >>> -jason >>> >>> -- >>> Jason Stajich >>> jason at bioperl.org >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> Jason, >> I saw a commit after this post on codeml, but not on PAML.pm- I assume >> this is not fixed, am I correct? >> Thanks! >> Stefan > > From stefan.kirov at bms.com Wed Dec 5 09:35:23 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 05 Dec 2007 09:35:23 -0500 Subject: [Bioperl-l] PAML/Codeml parsing In-Reply-To: <4756B494.7020100@bms.com> References: <4755A9A1.2040608@bms.com> <4756B494.7020100@bms.com> Message-ID: <4756B72B.6000103@bms.com> Here are the files. 
Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc.tar.gz
Type: application/x-gzip
Size: 3237 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071205/bd77cde1/attachment.gz

From aaron.j.mackey at gsk.com Wed Dec 5 09:56:31 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 5 Dec 2007 09:56:31 -0500
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>
Message-ID:

Well, if you use AlignIO::fasta to read in a multi-fasta file of
*unaligned* sequences, AlignIO::fasta makes the assumption that all of
your sequences are aligned, and pads the ends of shorter sequences with
gap characters (essentially, enforcing a rather silly, yet valid
alignment). The fact that is_flush() then returns 1 is secondary.

If you just want to read in an array of unaligned sequences, use
SeqIO::fasta instead. It doesn't really make much sense to use AlignIO
for sequences that are not aligned ... conversely, if you *do* have
aligned sequences in a multi-fasta file, then it does make sense to use
AlignIO, and it also makes sense for AlignIO::fasta to end-pad sequences
with gaps as necessary to get a fully valid, flush multiple sequence
alignment matrix.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 12/05/2007 08:53:59 AM:

> Yes; it's a convenient way to make sure all seqs have the same length
> (including gaps). Nice for checking when adding new seqs to an
> alignment or building new parsers.
>
> chris
>
> On Dec 5, 2007, at 7:10 AM, Bernd Web wrote:
>
> > Hi,
> >
> > SimpleAlign has an is_flush:
> > Function : Tells you whether the alignment is flush, i.e. all of the
> > same length
> > Returns : 1 or 0
> >
> > I noticed that a file with multiple fasta sequences with different
> > lengths has an is_flush value of 1. Printing the "alignment" shows
> > that sequences are appended with "-" so that the all are the same
> > length.
Does this mean that is_flush for alignments read in via > > AlignIO is indeed always true and thus as such a so useful ? > > > > (using bioperl version: 1.005002102) > > > > > > Regards, > > Bernd > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Dec 5 11:22:01 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 5 Dec 2007 10:22:01 -0600 Subject: [Bioperl-l] SimpleAlign is_flush In-Reply-To: References: Message-ID: That's true. I assumed Bernd's seqs were aligned. chris On Dec 5, 2007, at 8:56 AM, aaron.j.mackey at gsk.com wrote: > Well, if you use AlignIO::fasta to read in a multi-fasta file of > *unaligned* sequences, AlignIO::fasta makes the assumption that all of > your sequences are aligned, and pads the ends of shorter sequences > with > gap characters (essentially, enforcing a rather silly, yet valid > alignment). The fact that is_flush() then returns 1 is secondary. > > If you just want to read in an array of unaligned sequences, use > SeqIO::fasta instead. It doesn't really make much sense to use > AlignIO > for sequences that are not aligned ... conversely, if you *do* have > aligned sequences in a multi-fasta file, then it does make sense to > use > AlignIO, and it also makes sense for AlignIO::fasta to end-pad > sequences > with gaps as necessary to get a fully valid, flush multiple sequence > alignment matrix. 
> > -Aaron > > bioperl-l-bounces at lists.open-bio.org wrote on 12/05/2007 08:53:59 AM: > >> Yes; it's a convenient way to make sure all seqs have the same length >> (including gaps). Nice for checking when adding new seqs to an >> alignment or building new parsers. >> >> chris >> >> On Dec 5, 2007, at 7:10 AM, Bernd Web wrote: >> >>> Hi, >>> >>> SimpleAlign has an is_flush: >>> Function : Tells you whether the alignment is flush, i.e. all of >>> the >>> same length >>> Returns : 1 or 0 >>> >>> I noticed that a file with multiple fasta sequences with different >>> lengths has an is_flush value of 1. Printing the "alignment" shows >>> that sequences are appended with "-" so that the all are the same >>> length. Does this mean that is_flush for alignments read in via >>> AlignIO is indeed always true and thus as such a so useful ? >>> >>> (using bioperl version: 1.005002102) >>> >>> >>> Regards, >>> Bernd >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Wed Dec 5 14:56:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 14:56:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <4755A9A1.2040608@bms.com> <4756B494.7020100@bms.com>
Message-ID: <4757027F.407@bms.com>

Here is a patch that seems to be working and does not break the existing
tests:

--- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm 2007-12-05 10:16:53.120720000 -0500
+++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm 2007-12-05 14:46:31.436278000 -0500
@@ -419,7 +419,10 @@
     # CODONML (in paml 3.12 February 2002) <<-- what we want to see!
 
     my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | YN00 )x;
+    my $line;
+    $self->{'_already_parsed_seqs'}=$self->{'_already_parsed_seqs'}?1:0;
     while ($_ = $self->_readline) {
+        $line++;
         if ( m/^($SEQTYPES) \s+ # seqtype: CODONML, AAML, BASEML, CODON2AAML, YN00, etc
              (?: \(in \s+ ([^\)]+?) \s* \) \s* )? # version: "paml 3.12 February 2002"; not present < 3.1 or YN00
              (\S+) \s* # tree filename
@@ -436,8 +439,11 @@
         } elsif (m/^Data set \d$/) {
             $self->{'_summary'} = {};
             $self->{'_summary'}->{'multidata'}++;
-        } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
-            my ($phylip_header) = $self->_readline;
+        }
+        elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
+            my ($phylip_header) = $self->_readline;
+            $self->_parse_seqs;
+        } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) {#No gap
             $self->_parse_seqs;
         }
     }
@@ -681,7 +687,6 @@
 }
 
 sub _parse_seqs {
-
     # this should in fact be packed into a Bio::SimpleAlign object instead of
     # an array but we'll stay with this for now
     my ($self) = @_;


What this does is trigger sequence parsing if the /Before.../ pattern is
not seen until line 4.
Since phylip_header seems to be doing nothing one could completely eliminate the first seq parse elsif (even though counting lines is not a good thing). Since I am not aware of all consequences of changing the sequence parsing and I have no idea how extensive the tests are, I am not committing anything, but feel free to use that if you wish. Stefan Stefan Kirov wrote: > Jason, > When there is a gapless alignment we have a differently formatted output > from codeml: > kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc > > seed used = 492211105 > 3 141 > > ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC > CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG > CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA > ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC > CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG > CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA > ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC > CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG > CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA > > And parsing this fails... > The next one has gaps and works fine: > > kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc > > seed used = 492252697 > > Before deleting alignment gaps > 2 162 > > ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA > AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC > AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC > CTT GGT TCA GGA GGT CAG TTC CTG > ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA > AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT > AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT > CCT GGT ACA GGA AAC AAG CTT CTG > > I will send both whole files as an attachment with another mail (I do > not know if these are going to pass through). 
> My guess is that the whole _parse_summary method has to be re-worked as > there is no tag to look for before the sequences start. Ugly. > I am not sure what else could become broken if I try to fix it, so I > will leave it to you. > Stefan > >> should be fixed. >> >> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm >> revision 1.56 >> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14 >> Parsing PAML4 and PAML3.15 should work now. Dealing with variable >> order for the sequences and summary results in >> the top of the MLC files >> >> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: >> >> >>> Jason Stajich wrote: >>> >>>> PAML4 breaks our PAML parser right now because the order of things in >>>> the result file has changed. Now sequences precede the information >>>> about the version or the program run. This means that $result- >>>> >>>>> get_seqs() fails because we don't parse the sequences. >>>>> >>>> We'll see what we can do, but as usual with supporting 3rd party >>>> programs it is brittle when file formats change. Th >>>> >>>> -jason >>>> >>>> -- >>>> Jason Stajich >>>> jason at bioperl.org >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> Jason, >>> I saw a commit after this post on codeml, but not on PAML.pm- I assume >>> this is not fixed, am I correct? >>> Thanks! 
>>> Stefan >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Wed Dec 5 15:01:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 5 Dec 2007 12:01:29 -0800 Subject: [Bioperl-l] PAML/Codeml parsing In-Reply-To: <4757027F.407@bms.com> References: <4755A9A1.2040608@bms.com> <4756B494.7020100@bms.com> <4757027F.407@bms.com> Message-ID: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org> sounds good - can you - make it as a bug with the patch and sample files in bugzilla - commit changes and I'll test as well thanks, -j On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote: > Here is a patch that seems to be working and does not break the > existing > tests: > > --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm 2007-12-05 > 10:16:53.120720000 -0500 > +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm > 2007-12-05 14:46:31.436278000 -0500 > @@ -419,7 +419,10 @@ > # CODONML (in paml 3.12 February 2002) <<-- what we want to see! > > my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | > YN00 )x; > + my $line; > + $self->{'_already_parsed_seqs'}=$self-> > {'_already_parsed_seqs'}?1:0; > while ($_ = $self->_readline) { > + $line++; > if ( m/^($SEQTYPES) \s+ # seqtype: > CODONML, > AAML, BASEML, CODON2AAML, YN00, etc > (?: \(in \s+ ([^\)]+?) \s* \) \s* )? 
# version: "paml > 3.12 February 2002"; not present < 3.1 or YN00 > (\S+) \s* # tree filename > @@ -436,8 +439,11 @@ > } elsif (m/^Data set \d$/) { > $self->{'_summary'} = {}; > $self->{'_summary'}->{'multidata'}++; > - } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) { > - my ($phylip_header) = $self->_readline; > + } > + elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap > + my ($phylip_header) = $self->_readline; > + $self->_parse_seqs; > + } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) > {#No gap > $self->_parse_seqs; > } > } > @@ -681,7 +687,6 @@ > } > > sub _parse_seqs { > - > # this should in fact be packed into a Bio::SimpleAlign object > instead of > # an array but we'll stay with this for now > my ($self) = @_; > > > What this does is trigger sequence parsing if the /Before.../ > pattern is > not seen until line 4. Since phylip_header seems to be doing > nothing one > could completely eliminate the first seq parse elsif (even though > counting lines is not a good thing). > Since I am not aware of all consequences of changing the sequence > parsing and I have no idea how extensive the tests are, I am not > committing anything, but feel free to use that if you wish. 
> Stefan > > Stefan Kirov wrote: >> Jason, >> When there is a gapless alignment we have a differently formatted >> output >> from codeml: >> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc >> >> seed used = 492211105 >> 3 141 >> >> ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC >> ACC CAC >> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC >> AGT CTG >> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC >> ACC CTC ATA >> ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC >> ACC CAC >> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC >> AGC CTG >> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC >> ACC CTC ATA >> ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC >> ACC CAC >> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC >> AGC ATG >> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC >> ACC CTC ATA >> >> And parsing this fails... >> The next one has gaps and works fine: >> >> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc >> >> seed used = 492252697 >> >> Before deleting alignment gaps >> 2 162 >> >> ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG >> GCA GAA >> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA >> CCG AAC >> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA >> GAT CTC >> CTT GGT TCA GGA GGT CAG TTC CTG >> ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA >> GCA GAA >> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC >> CCA ACT >> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- >> --- ATT >> CCT GGT ACA GGA AAC AAG CTT CTG >> >> I will send both whole files as an attachment with another mail (I do >> not know if these are going to pass through). >> My guess is that the whole _parse_summary method has to be re- >> worked as >> there is no tag to look for before the sequences start. Ugly. >> I am not sure what else could become broken if I try to fix it, so I >> will leave it to you. 
>> Stefan >> >>> should be fixed. >>> >>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm >>> revision 1.56 >>> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: >>> +21 -14 >>> Parsing PAML4 and PAML3.15 should work now. Dealing with variable >>> order for the sequences and summary results in >>> the top of the MLC files >>> >>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: >>> >>> >>>> Jason Stajich wrote: >>>> >>>>> PAML4 breaks our PAML parser right now because the order of >>>>> things in >>>>> the result file has changed. Now sequences precede the >>>>> information >>>>> about the version or the program run. This means that $result- >>>>> >>>>>> get_seqs() fails because we don't parse the sequences. >>>>>> >>>>> We'll see what we can do, but as usual with supporting 3rd party >>>>> programs it is brittle when file formats change. Th >>>>> >>>>> -jason >>>>> >>>>> -- >>>>> Jason Stajich >>>>> jason at bioperl.org >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> Jason, >>>> I saw a commit after this post on codeml, but not on PAML.pm- I >>>> assume >>>> this is not fixed, am I correct? >>>> Thanks! >>>> Stefan >>>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From stefan.kirov at bms.com Wed Dec 5 15:33:47 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 05 Dec 2007 15:33:47 -0500 Subject: [Bioperl-l] PAML/Codeml parsing In-Reply-To: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org> References: <4755A9A1.2040608@bms.com> <4756B494.7020100@bms.com> <4757027F.407@bms.com> <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org> Message-ID: <47570B2B.5090602@bms.com> Done. 
Jason Stajich wrote: > sounds good - can you > - make it as a bug with the patch and sample files in bugzilla > - commit changes and I'll test as well > > thanks, > -j > > On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote: > > >> Here is a patch that seems to be working and does not break the >> existing >> tests: >> >> --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm 2007-12-05 >> 10:16:53.120720000 -0500 >> +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm >> 2007-12-05 14:46:31.436278000 -0500 >> @@ -419,7 +419,10 @@ >> # CODONML (in paml 3.12 February 2002) <<-- what we want to see! >> >> my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | >> YN00 )x; >> + my $line; >> + $self->{'_already_parsed_seqs'}=$self-> >> {'_already_parsed_seqs'}?1:0; >> while ($_ = $self->_readline) { >> + $line++; >> if ( m/^($SEQTYPES) \s+ # seqtype: >> CODONML, >> AAML, BASEML, CODON2AAML, YN00, etc >> (?: \(in \s+ ([^\)]+?) \s* \) \s* )? # version: "paml >> 3.12 February 2002"; not present < 3.1 or YN00 >> (\S+) \s* # tree filename >> @@ -436,8 +439,11 @@ >> } elsif (m/^Data set \d$/) { >> $self->{'_summary'} = {}; >> $self->{'_summary'}->{'multidata'}++; >> - } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) { >> - my ($phylip_header) = $self->_readline; >> + } >> + elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap >> + my ($phylip_header) = $self->_readline; >> + $self->_parse_seqs; >> + } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) >> {#No gap >> $self->_parse_seqs; >> } >> } >> @@ -681,7 +687,6 @@ >> } >> >> sub _parse_seqs { >> - >> # this should in fact be packed into a Bio::SimpleAlign object >> instead of >> # an array but we'll stay with this for now >> my ($self) = @_; >> >> >> What this does is trigger sequence parsing if the /Before.../ >> pattern is >> not seen until line 4. 
Since phylip_header seems to be doing >> nothing one >> could completely eliminate the first seq parse elsif (even though >> counting lines is not a good thing). >> Since I am not aware of all consequences of changing the sequence >> parsing and I have no idea how extensive the tests are, I am not >> committing anything, but feel free to use that if you wish. >> Stefan >> >> Stefan Kirov wrote: >> >>> Jason, >>> When there is a gapless alignment we have a differently formatted >>> output >>> from codeml: >>> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc >>> >>> seed used = 492211105 >>> 3 141 >>> >>> ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC >>> ACC CAC >>> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC >>> AGT CTG >>> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC >>> ACC CTC ATA >>> ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC >>> ACC CAC >>> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC >>> AGC CTG >>> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC >>> ACC CTC ATA >>> ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC >>> ACC CAC >>> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC >>> AGC ATG >>> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC >>> ACC CTC ATA >>> >>> And parsing this fails... 
>>> The next one has gaps and works fine: >>> >>> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc >>> >>> seed used = 492252697 >>> >>> Before deleting alignment gaps >>> 2 162 >>> >>> ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG >>> GCA GAA >>> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA >>> CCG AAC >>> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA >>> GAT CTC >>> CTT GGT TCA GGA GGT CAG TTC CTG >>> ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA >>> GCA GAA >>> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC >>> CCA ACT >>> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- >>> --- ATT >>> CCT GGT ACA GGA AAC AAG CTT CTG >>> >>> I will send both whole files as an attachment with another mail (I do >>> not know if these are going to pass through). >>> My guess is that the whole _parse_summary method has to be re- >>> worked as >>> there is no tag to look for before the sequences start. Ugly. >>> I am not sure what else could become broken if I try to fix it, so I >>> will leave it to you. >>> Stefan >>> >>> >>>> should be fixed. >>>> >>>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm >>>> revision 1.56 >>>> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: >>>> +21 -14 >>>> Parsing PAML4 and PAML3.15 should work now. Dealing with variable >>>> order for the sequences and summary results in >>>> the top of the MLC files >>>> >>>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: >>>> >>>> >>>> >>>>> Jason Stajich wrote: >>>>> >>>>> >>>>>> PAML4 breaks our PAML parser right now because the order of >>>>>> things in >>>>>> the result file has changed. Now sequences precede the >>>>>> information >>>>>> about the version or the program run. This means that $result- >>>>>> >>>>>> >>>>>>> get_seqs() fails because we don't parse the sequences. >>>>>>> >>>>>>> >>>>>> We'll see what we can do, but as usual with supporting 3rd party >>>>>> programs it is brittle when file formats change. 
Th
>>>>>>
>>>>>> -jason
>>>>>>
>>>>>> --
>>>>>> Jason Stajich
>>>>>> jason at bioperl.org
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>> Jason,
>>>>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>>>>> this is not fixed, am I correct?
>>>>> Thanks!
>>>>> Stefan
>>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From bernd.web at gmail.com Thu Dec 6 09:58:31 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 6 Dec 2007 15:58:31 +0100
Subject: [Bioperl-l] graphics - Panel
Message-ID: <716af09c0712060658t5504b377ob2d46adb85754284@mail.gmail.com>

Hi,

For map, $segstart is available. This holds the leftmost start of the
feature (the left end of $ref displayed in the detailed view).
However, is it also accessible from track coderefs?
I'd like to access it in add_track, like

    -bgcolor => sub {
        my $feature = shift;
        my $start = $feature->segstart;
        # .... do something with the segstart
    },

I realize I can add a -tag which holds the leftmost start of my segmented
feature, and then get it out from $feature, but I wonder if $segstart can
also be accessed in the coderef somehow. Does anyone know?

Best regards,
Bernd

From georose at gmail.com Thu Dec 6 10:28:24 2007
From: georose at gmail.com (geo rose)
Date: Thu, 6 Dec 2007 08:28:24 -0700
Subject: [Bioperl-l] getting sequences from external databank
Message-ID: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>

Hi Bioperl,

In the past, I have been able to retrieve sequences from an external
databank, but my scripts are not working anymore.
I am afraid that I may have broken my Bioperl installation while updating my
Fedora7 machine with yum update.

Below is an example of what happens.

The script is from
http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/node2.html and
it works.
(I used it on an older machine with Bioperl and MacOS Tiger)

__________________________________________________________________________________
#!/usr/bin/perl -w

use Bio::SeqIO;
use Bio::DB::GenBank;

$genBank = new Bio::DB::GenBank;  # This object knows how to talk to GenBank

my $seq = $genBank->get_Seq_by_acc('AF060485');  # get a record by accession

my $seqOut = new Bio::SeqIO(-format => 'genbank');

$seqOut->write_seq($seq);

_________________________________________________________________________________________
This is the error I get
_________________________________________________________________________________________

[home at home Desktop]# perl final-seq-db-test1.pl
Bio::SeqIO: genbank cannot be found
Exception
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed to load module Bio::SeqIO::genbank. Weak references are not
implemented in the version of perl at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425.

STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Root::Root::_load_module
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427
STACK: Bio::SeqIO::_load_format_module
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------

For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc AF060485 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------
[home at home Desktop]# Use of uninitialized value in concatenation (.) or
string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/Util.pm
line 30.

[home at home Desktop]#

________________________________________________________________________________________

Before I mess things up further I thought I'd ask:
Can I fix this problem by reinstalling some part of Bioperl or Perl?

Thanks,

George

From barry.moore at genetics.utah.edu Thu Dec 6 12:56:50 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 6 Dec 2007 10:56:50 -0700
Subject: [Bioperl-l] getting sequences from external databank
In-Reply-To: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
References: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
Message-ID:

George,

This is a hideous little bug in Red Hat/Fedora installations of perl.
It's happened to me a couple of times on upgrades, but it's always fixed with

perl -MCPAN -e shell
force install Scalar::Util

http://www.perlmonks.org/?node_id=460411

Barry

On Dec 6, 2007, at 8:28 AM, geo rose wrote:

> Hi Bioperl,
>
> In the past, I have been able to retrieve sequences from an external
> databank, but my scripts are not working anymore.
> I am afraid that I may have broken my Bioperl installation while
> updating my
> Fedora7 machine with yum update.
>
> Below is an example of what happens.
>
> The script is from
> http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/node2.html and
> it works.
> (I used it on an older machine with Bioperl and MacOS Tiger)
>
> ______________________________________________________________________
> ____________
> #!/usr/bin/perl -w
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> $genBank = new Bio::DB::GenBank;  # This object knows how to talk to GenBank
>
> my $seq = $genBank->get_Seq_by_acc('AF060485');  # get a record by accession
>
>
> my $seqOut = new Bio::SeqIO(-format => 'genbank');
>
> $seqOut->write_seq($seq);
>
>
> ______________________________________________________________________
> ___________________
> This is the error I get
> ______________________________________________________________________
> ___________________
>
> [home at home Desktop]# perl final-seq-db-test1.pl
> Bio::SeqIO: genbank cannot be found
> Exception
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Failed to load module Bio::SeqIO::genbank.
Weak references are > not > implemented in the version of perl at > /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91 > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172. > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425. > > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::Root::Root::_load_module > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427 > STACK: Bio::SeqIO::_load_format_module > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555 > STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172 > STACK: final-seq-db-test1.pl:8 > ----------------------------------------------------------- > > For more information about the SeqIO system please see the SeqIO docs. > This includes ways of checking for formats at compile time, not run > time > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: acc AF060485 does not exist > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173 > STACK: final-seq-db-test1.pl:8 > ----------------------------------------------------------- > [home at home Desktop]# Use of uninitialized value in concatenation > (.) 
or > string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/ > Util.pm > line 30. > > [home at home Desktop]# > > > ______________________________________________________________________ > __________________ > > > Before I mess things up further I thought I'd ask: > Can I fix this problem by reinstalling some part of Bioperl or Perl? > > Thanks, > > George > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Thu Dec 6 18:58:02 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 7 Dec 2007 10:58:02 +1100 Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST reload In-Reply-To: <47545590.1000703@boekhoff.info> References: <47545590.1000703@boekhoff.info> Message-ID: Sven, > I just started working with Perl and BioPerl. I'm quite impressed what > can be easily done with this module. Today I found that my second CPU > ist not used, but the first one run's at 100%. I tried to include the > "-a"-parameter, but I was not successful: My experience agrees with you, in that "-a" does not seem to work with the pre-compiled BLAST binaries you get from NCBI on a multi-core system. I'm not sure why, as "ldd blastall" shows it links against "/lib64/tls/libpthread.so.0". Any others have any ideas? -- --Torsten Seemann --Victorian Bioinformatics Consortium, Monash University --Tel +61 3 9905 9010 From lzhtom at hotmail.com Thu Dec 6 23:25:42 2007 From: lzhtom at hotmail.com (zhihuali) Date: Fri, 7 Dec 2007 04:25:42 +0000 Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db? 
Message-ID:

Hi netters,

I've installed BioSQL and bioperl-db, and successfully created and stored
a persistent object:

use strict;
use warnings;
use Bio::Seq;
use Bio::DB::BioDB;

my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user     => 'annoymous',
                                -dbname   => 'bioseqdb');

my $seqobj = Bio::Seq->new(-accession_number => "test",
                           -id               => "test1",
                           -seq              => "AGCTAGCT",
                           -version          => 1);
my $dbobj = $dbadp->create_persistent($seqobj);
$dbobj->create;
$dbobj->commit;

This was successful, because I found the corresponding rows in the bioseqdb tables.

Now I want to retrieve the object back from the database. There isn't much
documentation available, and I've tried find_by_unique_key/primary_key, but
every attempt failed. Maybe I didn't use them correctly. Could anyone give me
an example of how to retrieve the stored Bio::Seq object?

Thanks a lot!

Zhihua Li

From Marc.Logghe at ablynx.com Fri Dec 7 03:33:17 2007
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Dec 2007 09:33:17 +0100
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To:
Message-ID: <03C512635899144083CADB0EE222018901216FA5@alpaca.lan.ablynx.com>

Hi,

Hilmar's BOSC presentation is a very good place to start. Have a look at
http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf
Slide 18 for instance.

Regards,
Marc

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of zhihuali
> Sent: Friday, 7 December 2007 5:26
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
> > > Hi netters, > > I've installed BioSQL and bioperl-db, and successfully created and stored > a persistent object: > > use strict;use warnings;use Bio::Seq;use Bio::DB::BioDB; > my $dbadp=Bio::DB::BioDB->new(-database=>'biosql', > -user=>'annoymous', -dbname=>'bioseqdb'); > > my $seqobj=Bio::Seq->new(-accession_number=>"test", - > id=>"test1", -seq=>"AGCTAGCT", - > version=>1);my $dbobj=$dbadp->create_persistent($seqobj);$dbobj- > >create;$dbobj->commit; > > It's successful because I found corresponding rows in the bioseqdb tables. > > Now I want to retrieve the object back from the database. There's not much > documents available and I've tried find_by_unique_key/primary_key but all > failed. Maybe I didn't use them correctly. Could anyone give me an example > as how to retrieve the stored Bio::Seq object? > > Thanks a lot! > > Zhihua Li > _________________________________________________________________ > ?? Live Search ?????????????? > http://www.live.com/?searchOnly=true From avilella at gmail.com Fri Dec 7 05:32:43 2007 From: avilella at gmail.com (Albert Vilella) Date: Fri, 7 Dec 2007 10:32:43 +0000 Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm" In-Reply-To: References: Message-ID: <358f4d650712070232s3d9ed27xf1c5f17e2985bd90@mail.gmail.com> Hi Johan, It would be great if you could upload an example reproducible case: http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl Maybe simply doing a tar.gz of the directory with the sample files and the script, and a simple explanation on how to run it. If you have any special "env" vars regarding tmp files, could you specify those as well? Thanks, Albert. On Dec 5, 2007 11:35 AM, Johan Nilsson wrote: > > Hello, > > I have a bunch of multiple sequence alignments of protein coding genes, > which I would like to analyse with the SLAC method of the HyPhy package. I > tried using the SLAC.pm module in bioperl-run, but I could not get it to > work properly. 
> > Basically, for each MSA file, I create the Bio::Tree::Tree and > Bio::SimpleAlign objects ($tree and $aln, respectively) required as > arguments to SLAC, and call the method with: "($rc,$result) = > $slac->run($aln,$tree)" in a loop procedure in my script. > > When I choose not to save the tmp files (the default option in SLAC.pm), > the program complains that it cannot find the file > "$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA > (which works fine). Apparently, it looks for the wrapper.bf file in the > first tmp dir created, which is deleted in the end of the first SLAC call. > > If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')), > all calls to SLAC give returncode 1, and no error message is received. > However, when I look at the resulting $result hashref, it turns out that > all results are for the FIRST alignment read. I've made sure there is > nothing strange with my loop procedure, and I checked that the tree and > alignment objects look OK for each MSA. Apparently, it does create new > "results.tsv" files in the tmp directory after each run, but it is > identical each time it's created. Also, it only creates ONE tmp directory, > no matter how many times SLAC is executed (I would imagine it was supposed > to save each result in separate tmp dirs?) > > Thus, it seems to me like the errors occur because something goes wrong in > the creation of temporary files. Have I done something wrong here, or have > any other of you experienced the same problem? > > Best regards > /Johan > > > -- > Johan Nilsson, Ph.D. 
> School of Life Sciences > S?dert?rns University College > S-141 89 Huddinge, Sweden > E-mail: johan.nilsson at sh.se > Phone: +46 8 608 47 05, +46 70 456 10 51 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From J.Hane at murdoch.edu.au Mon Dec 10 02:31:17 2007 From: J.Hane at murdoch.edu.au (James Hane) Date: Mon, 10 Dec 2007 16:31:17 +0900 Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32 In-Reply-To: References: Message-ID: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au> I've been trying to compile some bioperl based scripts for win32 using perl2exe which have worked out really well - except I've noticed I cannot compile Align::IO, Bio::Location::Simple or Bio::Location::Atomic despite requiring perl2exe to include them. Anyone have any suggestions how to get these to compile? From Kevin.M.Brown at asu.edu Mon Dec 10 10:34:35 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 10 Dec 2007 08:34:35 -0700 Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32 In-Reply-To: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au> References: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au> Message-ID: <1A4207F8295607498283FE9E93B775B4041D0B82@EX02.asurite.ad.asu.edu> I use PAR to create exe's for windows users and it works fine with bioperl. 
> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of James Hane > Sent: Monday, December 10, 2007 12:31 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32 > > I've been trying to compile some bioperl based scripts for win32 using > perl2exe which have worked out really well - except I've noticed I > cannot compile Align::IO, Bio::Location::Simple or > Bio::Location::Atomic > despite requiring perl2exe to include them. Anyone have any > suggestions > how to get these to compile? > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Kevin.M.Brown at asu.edu Mon Dec 10 13:23:01 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 10 Dec 2007 11:23:01 -0700 Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoidBLAST reload In-Reply-To: References: <47545590.1000703@boekhoff.info> Message-ID: <1A4207F8295607498283FE9E93B775B4041D0CAD@EX02.asurite.ad.asu.edu> I use the -a option with blast all the time and it works, even on multicore systems. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Torsten Seemann > Sent: Thursday, December 06, 2007 4:58 PM > To: Sven Boekhoff > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] [StandAloneBLAST] Use more than one > CPU + avoidBLAST reload > > Sven, > > > I just started working with Perl and BioPerl. I'm quite > impressed what > > can be easily done with this module. Today I found that my > second CPU > > ist not used, but the first one run's at 100%. 
I tried to
> include the
> > "-a"-parameter, but I was not successful:
>
> My experience agrees with you, in that "-a" does not seem to work with
> the pre-compiled BLAST binaries you get from NCBI on a multi-core
> system.
>
> I'm not sure why, as "ldd blastall" shows it links against
> "/lib64/tls/libpthread.so.0".
>
> Any others have any ideas?
>
> --
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> --Tel +61 3 9905 9010
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From nadav.denekamp at gmail.com Wed Dec 12 08:29:18 2007
From: nadav.denekamp at gmail.com (Nadav Y. Denekamp)
Date: Wed, 12 Dec 2007 15:29:18 +0200
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of idenifiers
Message-ID: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>

Hello,

I am trying to retrieve a list of sequences from an indexed flat FASTA file.
I tried to use the script bp_fetch.pl, but I could only retrieve one sequence
for one identifier. I am looking for a way to provide a list of accession
numbers to a script and to retrieve the sequences. I don't have much
experience with perl, so I apologize if this question is very basic.
thanks - Nadav

------------------------------------------------------------------------------------------------------------
Nadav Y. Denekamp, Ph.D.,
Israel Oceanographic and Limnological Research,
National Institute for Oceanography
Tel-Shikmona, Haifa, 31080.
Tel: 972-4-8565259
Fax: 972-4-8511911
mobile: 972-50-2167318
Skype: nadavden
Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;

Visit the "Sleeping Beauty"
website: http://www.gmm.gu.se/SB

From biojoiner at gmail.com Wed Dec 12 08:06:42 2007
From: biojoiner at gmail.com (=?GB2312?B?s8y35Q==?=)
Date: Wed, 12 Dec 2007 21:06:42 +0800
Subject: [Bioperl-l] problem_About_Bioperl_Installation
Message-ID:

Dear Admin:

I have a computer that has no network access, but I want to have
bioperl installed on it.
All the installation methods I found need a network connection to CPAN to
fetch the required packages. Is there a complete installation package that I
can install on a network-isolated computer, or some other way to solve the
problem?
I look forward to your kind answer.
Thanks very much!

--

============================================================
Feng Cheng

Division of HapMap Project
Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
Beijing Airport Industrial Zone B-6, Beijing, 101318, China
Tel: +86-10-80481102/1176
E-mail: chengf at genomics.org.cn
http://www.big.ac.cn/
============================================================

From avilella at gmail.com Wed Dec 12 09:50:16 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 12 Dec 2007 14:50:16 +0000
Subject: [Bioperl-l] problem_About_Bioperl_Installation
In-Reply-To:
References:
Message-ID: <358f4d650712120650u2ef40089ofe27725ea8497dd7@mail.gmail.com>

You can also download the tar.gz packages from the bioperl.org website,
and copy them to the computer. Then unpack the tar.gz files, and update your
PERL5LIB env var.

On Dec 12, 2007 1:06 PM, Feng Cheng wrote:
> Dear Admin:
>
> I have a computer which out of network service, but wanted to have
> bioperl installed in it.
> I found the installation method all need net to link CPAN to get the > pakage needed, so is there some complete installation program for me to > install it in a net-isolated computer, or some other method to solve the > problom? > Wait for your kindful answer. > Thanks very much! > > -- > > ============================================================ > ???? > > ??????????????????????????HapMap?? > ??????????????????????B??6???? > ??????+86-10-80481102/1176 > E-mail: chengf at genomics.org.cn > http://www.big.ac.cn/ > > *********************************************************************************************** > Feng Cheng > > Division of HapMap Project > Beijing Institute of Genomics, Chinese Academy of Sciences (CAS) > Beijing Airport Industrial Zone B-6, Beijing, 101318, China > Tel: +86-10-80481102/1176 > E-mail: chengf at genomics.org.cn > http://www.big.ac.cn/ > ============================================================ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Dec 12 10:22:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 12 Dec 2007 09:22:45 -0600 Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of idenifiers In-Reply-To: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2> References: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2> Message-ID: If you use Bio::Index::Fasta (which is what bp_index.pl uses for FASTA files) then you can write up your own script. 
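
A minimal sketch of such a script, assuming the index file was already built (for instance with bp_index.pl) and that the identifiers sit one per line in a plain text file. The script name fetch_ids.pl, the helper read_ids and the list-file layout are illustrative assumptions, not part of BioPerl; only the Bio::Index::Fasta and Bio::SeqIO calls come from the module documentation.

```perl
#!/usr/bin/perl
# fetch_ids.pl (hypothetical name): print every sequence whose ID is listed
# in a text file, using a Bio::Index::Fasta index built beforehand.
use strict;
use warnings;

# Read identifiers from a filehandle, one per line.
# Blank lines and lines starting with '#' are skipped (an assumed convention).
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace and newline
        next if $line eq '' || $line =~ /^#/;
        push @ids, $line;
    }
    return @ids;
}

sub main {
    my ($index_file, $id_file) = @_;
    die "Usage: $0 index_file id_list.txt\n" unless defined $id_file;

    # Loaded here so that read_ids() can be reused without BioPerl installed.
    require Bio::Index::Fasta;
    require Bio::SeqIO;

    open my $fh, '<', $id_file or die "Cannot open $id_file: $!";
    my @ids = read_ids($fh);
    close $fh;

    my $inx = Bio::Index::Fasta->new(-filename => $index_file);
    my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);

    for my $id (@ids) {
        my $seq = $inx->fetch($id);
        # Guard in case nothing comes back for an ID missing from the index.
        if ($seq) { $out->write_seq($seq) }
        else      { warn "No sequence found for '$id'\n" }
    }
}

main(@ARGV) if @ARGV;
```

It would be run along the lines of "perl fetch_ids.pl my.index ids.txt > selected.fasta", where both file names are placeholders.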

From 'perldoc Bio::Index::Fasta':

    # Once the index is made it can be accessed, either in the
    # same script or a different one

    use Bio::Index::Fasta;
    use Bio::SeqIO;   # needed for the output stream below
    use strict;

    my $Index_File_Name = shift;
    my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name);
    my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);

    foreach my $id (@ARGV) {
        my $seq = $inx->fetch($id); # Returns Bio::Seq object
        $out->write_seq($seq);
    }

    # or, alternatively
    my $id;
    my $seq = $inx->get_Seq_by_id($id); # identical to fetch()

....

chris

On Dec 12, 2007, at 7:29 AM, Nadav Y. Denekamp wrote:

> Hello,
>
> I am trying to retrieve a list of sequences from an indexed flat
> FASTA file. I tried to use the script bp_fetch.pl but I could only
> retrieve one sequence for one identifier. I am looking for a way to
> provide a list of accession numbers to a script and to retrieve the
> sequences. I don't have much experience with perl so I apologize if
> this question is very basic
> thanks - Nadav
>
>
> ------------------------------------------------------------------------------------------------------------
> Nadav Y. Denekamp, Ph.D.,
> Israel Oceanographic and Limnological Research,
> National Institute for Oceanography
> Tel-Shikmona, Haifa, 31080.
> Tel: 972-4-8565259
> Fax: 972-4-8511911
> mobile: 972-50-2167318
> Skype: nadavden
> Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;
>
> Visit the "Sleeping Beauty" website:
> http://www.gmm.gu.se/SB
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From karchana at ibab.ac.in Thu Dec 13 22:56:14 2007
From: karchana at ibab.ac.in (Information_details)
Date: Thu, 13 Dec 2007 19:56:14 -0800 (PST)
Subject: [Bioperl-l] How to get the contents?
Message-ID: <14329679.post@talk.nabble.com> Hi, I am new to bioperl. I am using module Bio::SeqIO; I have genbank file. http://www.nabble.com/file/p14329679/seq.gb seq.gb In this file i have to match gene tag and get all its contents. which function i have to use? The gene portion look like this gene 1..485 /gene="PRM1" /note="Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA" /db_xref="GeneID:5619" /db_xref="HGNC:9447" i have to match gene tag and get its contents? [CODE] $seq=$seqobj->next_seq(); foreach $feat ($seq->get_all_SeqFeatures()) { if($feat->primary_tag eq "mRNA") { foreach $tag ($feat->get_all_tags()) { if($tag eq "gene") { #here i have to retrieve the information like this. 1..485 /gene="PRM1" } } } [/CODE] How do i do that? with regards Archana -- View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From mike.thon at gmail.com Fri Dec 14 12:41:44 2007 From: mike.thon at gmail.com (Michael Thon) Date: Fri, 14 Dec 2007 18:41:44 +0100 Subject: [Bioperl-l] How to get the contents? In-Reply-To: <14329679.post@talk.nabble.com> References: <14329679.post@talk.nabble.com> Message-ID: <9F93893E-182A-4A5F-B27C-089521CAA355@gmail.com> Hi Information_details, a.k.a. Archana :) "1", and "485" can be retrieved with something like: $feat->start(); $feat->end(); if you want start and end of each exon then you need: my $location = $feat->location(); which returns a Bio::LocationI object. I think the 'gene' tag is a tag-value pair that can be retrieved with: my @values = $feat->get_tag_values("gene"); -Mike On Dec 14, 2007, at 4:56 AM, Information_details wrote: > > Hi, > > I am new to bioperl. > > I am using module Bio::SeqIO; > > I have genbank file. 
http://www.nabble.com/file/p14329679/seq.gb > seq.gb > > In this file i have to match gene tag and get all its contents. > > which function i have to use? > > The gene portion look like this > > gene 1..485 > /gene="PRM1" > /note="Derived by automated computational analysis > using > gene prediction method: BestRefseq. Supporting > evidence > includes similarity to: 1 mRNA" > /db_xref="GeneID:5619" > /db_xref="HGNC:9447" > > i have to match gene tag and get its contents? > > [CODE] > $seq=$seqobj->next_seq(); > > foreach $feat ($seq->get_all_SeqFeatures()) > { > if($feat->primary_tag eq "mRNA") > { > foreach $tag ($feat->get_all_tags()) > { > if($tag eq "gene") > { > #here i have to retrieve the information > like > this. > 1..485 > /gene="PRM1" > } > } > } > [/CODE] > How do i do that? > > with regards > Archana > > > > > -- > View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Sat Dec 15 10:15:00 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 15 Dec 2007 09:15:00 -0600 Subject: [Bioperl-l] [ANNOUNCEMENT] CVS freeze Message-ID: <9FE0873D-E009-42E6-B37A-32584655ED06@uiuc.edu> All, We are in the midst of switching over BioPerl from CVS to SVN. We are tentatively freezing the bioperl CVS repository Dec. 19 in order to prepare for the switch. At that time we plan on building and setting up the SVN repository, running some remedial tests (commit messages, etc), then announcing the switch on the list. Soon after we will try getting a sync'ed read-only CVS set up for legacy purposes. If anyone has any commits to add to the repository we suggest making them as soon as possible. 

chris

From margots at mail.nih.gov Tue Dec 18 10:00:11 2007
From: margots at mail.nih.gov (Margot Sunshine)
Date: Tue, 18 Dec 2007 15:00:11 +0000 (UTC)
Subject: [Bioperl-l] bio-perl cvs freeze
Message-ID:

Hi,

I have been trying to check out bio-perl from cvs since yesterday afternoon
(Dec 17). My request just hangs. I can log in but I cannot check out anything.
My reading of your posting of the planned switch from CVS to SVN seemed to
indicate that this was not to take place until tomorrow. Help!

Thanks,
Margot Sunshine

From ste.ghi at libero.it Tue Dec 18 13:04:21 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Tue, 18 Dec 2007 19:04:21 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID:

Dear all,
I'm facing a really annoying problem with large file handling.
I wrote a script (below) which should take sequences from an EMBL-formatted
file and write them out in a customized FASTA format. The script works, but
since the input file is rather big (5.6 GB unzipped, 987 MB zipped), after a
while all the physical and virtual memory of my workstation (4 GB RAM) is
filled and the script is killed...
I really don't know how to avoid this huge memory usage, and now I'm
wondering whether this is the right approach.
Please help me!
Best wishes,
Stefano

#################
#!/usr/bin/perl -w

use strict;

use warnings;

use Fcntl;
use Cwd;

use Bio::SeqIO;

my $infile = $ARGV[0];
my $outfile = "$ARGV[0].fasta";
my $organism;
my $count;
my $path = cwd()."/$outfile";

print "Working dir is: ".cwd().".\nCreating file: $path\n";

my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -format => 'EMBL');

while ( my $seq = $in->next_seq() ) {
    sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
    my $id = $seq->accession_number();
    my $desc = $seq->desc(); chop $desc;
    my $species = $seq->species->binomial();
    my $subspecies = $seq->species->sub_species();
    if ($seq->species->sub_species()) {chop $subspecies; $organism = $species." ".$subspecies;}
    else {$organism = $species;}
    my $sequence = $seq->seq();
    print TO ">$id $desc [$organism]\n$sequence\n";
    $count++;
    warn $@ if $@;
    close TO;
}

print "Done!\n\t$count sequences have been treated. The file $ARGV[0].fasta is ready.\n";

From jason at bioperl.org Tue Dec 18 13:22:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:22:07 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To:
References:
Message-ID: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>

Margot -

The code freeze won't affect the anonymous cvs, and we'll likely keep
anonymous CVS as is (and maybe even figure out how to keep it updated with
the SVN) since external tools depend on it and have published CVS
instructions.

I was able to do an anonymous checkout fine on my machine just now -- if the
problem persists please send a message to support at open-bio.org and the
support volunteers will track it from there.

-jason

On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:

> Hi,
>
> I have been trying to checkout bio-perl from cvs since yesterday
> afternoon
> (Dec 17). My request just hangs. I can login but I cannot checkout
> anything.
> My reading of your posting of the planned switch from CVS to SVN
> seemed to
> indicate that this was not to take place until tomorrow. Help!
>
> Thanks,
> Margot Sunshine
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Tue Dec 18 13:31:39 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:31:39 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To:
References:
Message-ID: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>

It's not exactly clear to me why you aren't using Bio::SeqIO to write the
sequences back out in FASTA format, or why you are re-opening the output
file each time. Did you look at the examples that show how to convert file
formats?
http://bioperl.org/wiki/HOWTO:SeqIO You can set the description with $seq->description($newdescription); and the ID with $seq->display_id($newid); before writing. It isn't clear to me from your code why it would be leaking memory and causing a problem - is it possible that you have a huge sequence in the EMBL file? -jason On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote: > Dear all, > I'm facing with a really annoying problem regarding large files > handling. > I wrote a script (below) which should keep sequences from an embl > formatted file and write out the sequences in a customized fasta > format. The script works, but since the input file is rather big > 5.6 GB unzipped (987 MB zipped), after a while all the physical and > virtual memories of my workstation (4GB RAM) are filled and the > script is killed... > I really don't know how to avoid this huge memory usage...and now > I'm wondering if this is the right approach.... > Please help me! > Best wishes, > Stefano > > > > ################# > #!/usr/bin/perl -w > > use strict; > > use warnings; > > use Fcntl; > use Cwd; > > use Bio::SeqIO; > > my $infile = $ARGV[0]; > my $outfile = "$ARGV[0].fasta"; > my $organism; > my $count; > my $path = cwd()."/$outfile"; > > print "Working dir is: ".cwd().".\nCreating file: $path\n"; > > my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - > format => 'EMBL'); > > while ( my $seq = $in->next_seq() ) { > sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT); > my $id = $seq->accession_number(); > my $desc = $seq->desc(); chop $desc; > my $species = $seq->species->binomial(); > my $subspecies = $seq->species->sub_species(); > if ($seq->species->sub_species()) {chop $subspecies; $organism = > $species." ".$subspecies;} > else {$organism = $species;} > my $sequence = $seq->seq(); > print TO ">$id $desc [$organism]\n$sequence\n"; > $count++; > warn $@ if $@; > close TO; > } > > print "Done!\n\t$count sequences have been treated. 
The file $ARGV > [0].fasta is ready.\n"; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cain.cshl at gmail.com Tue Dec 18 14:04:11 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 18 Dec 2007 14:04:11 -0500 Subject: [Bioperl-l] bio-perl cvs freeze In-Reply-To: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org> References: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org> Message-ID: <1198004651.11000.19.camel@frissell> Hi Jason and all, Does the fact that cvs is sticking around (read only) mean that viewcvs (the web interface) will stick around too? I was thinking about modifying the GBrowse net installer to use the 'automatic' tarball of bioperl-live to download and install via nmake on Windows since it doesn't have cvs support built in. Also, with cvs sticking around, I don't need to rewrite the installer to use svn (yeah!). Thanks, Scott On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote: > Margot - > The code freeze won't affect the the anonymous cvs, and we'll likely > keep anonymous CVS as is (and maybe even figure out how to keep it > updated with the SVN) since external tools depend on it and have > published CVS instructions. > > I was able to do an anonymous checkout fine on my machine just now -- > if the problem persists please send a message to support at open-bio.org > and the support volunteers will track it from there. > > -jason > On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote: > > > Hi, > > > > I have been trying to checkout bio-perl from cvs since yesterday > > afternoon > > (Dec 17). My request just hangs. I can login but I cannot checkout > > anything. > > My reading of your posting of the planned switch from CVS to SVN > > seemed to > > indicate that this was not to take place until tomorrow. Help! 
> > > > Thanks, > > Margot Sunshine > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From jason at bioperl.org Tue Dec 18 14:20:11 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 18 Dec 2007 11:20:11 -0800 Subject: [Bioperl-l] bio-perl cvs freeze In-Reply-To: <1198004651.11000.19.camel@frissell> References: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org> <1198004651.11000.19.camel@frissell> Message-ID: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org> On Dec 18, 2007, at 11:04 AM, Scott Cain wrote: > Hi Jason and all, > > Does the fact that cvs is sticking around (read only) mean that > viewcvs > (the web interface) will stick around too? I was thinking about > modifying the GBrowse net installer to use the 'automatic' tarball of > bioperl-live to download and install via nmake on Windows since it > doesn't have cvs support built in. Also, with cvs sticking around, I > don't need to rewrite the installer to use svn (yeah!). > Hey Scott - Perhaps, there may be better tools with SVN anyways, we could also just instantiate a script that tarballed the already auto-updated code here (i think it syncs every hour): http://bioperl.org/SRC/ We'll still playing around with this and I can't guarantee that we'll get the SVN commits back to CVS to work. 
-jason > Thanks, > Scott > > On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote: >> Margot - >> The code freeze won't affect the the anonymous cvs, and we'll likely >> keep anonymous CVS as is (and maybe even figure out how to keep it >> updated with the SVN) since external tools depend on it and have >> published CVS instructions. >> >> I was able to do an anonymous checkout fine on my machine just now -- >> if the problem persists please send a message to support at open-bio.org >> and the support volunteers will track it from there. >> >> -jason >> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote: >> >>> Hi, >>> >>> I have been trying to checkout bio-perl from cvs since yesterday >>> afternoon >>> (Dec 17). My request just hangs. I can login but I cannot checkout >>> anything. >>> My reading of your posting of the planned switch from CVS to SVN >>> seemed to >>> indicate that this was not to take place until tomorrow. Help! >>> >>> Thanks, >>> Margot Sunshine >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD From cain.cshl at gmail.com Tue Dec 18 14:31:23 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 18 Dec 2007 14:31:23 -0500 Subject: [Bioperl-l] bio-perl cvs freeze In-Reply-To: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org> References: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org> <1198004651.11000.19.camel@frissell> <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org> Message-ID: <1198006283.11000.20.camel@frissell> Cool. 
For the moment, I'll just wait and see what happens :-) Thanks, Scott On Tue, 2007-12-18 at 11:20 -0800, Jason Stajich wrote: > On Dec 18, 2007, at 11:04 AM, Scott Cain wrote: > > > Hi Jason and all, > > > > Does the fact that cvs is sticking around (read only) mean that > > viewcvs > > (the web interface) will stick around too? I was thinking about > > modifying the GBrowse net installer to use the 'automatic' tarball of > > bioperl-live to download and install via nmake on Windows since it > > doesn't have cvs support built in. Also, with cvs sticking around, I > > don't need to rewrite the installer to use svn (yeah!). > > > Hey Scott - > > Perhaps, there may be better tools with SVN anyways, we could also > just instantiate a script that tarballed the already auto-updated > code here (i think it syncs every hour): > http://bioperl.org/SRC/ > > We'll still playing around with this and I can't guarantee that we'll > get the SVN commits back to CVS to work. > > -jason > > Thanks, > > Scott > > > > On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote: > >> Margot - > >> The code freeze won't affect the the anonymous cvs, and we'll likely > >> keep anonymous CVS as is (and maybe even figure out how to keep it > >> updated with the SVN) since external tools depend on it and have > >> published CVS instructions. > >> > >> I was able to do an anonymous checkout fine on my machine just now -- > >> if the problem persists please send a message to support at open-bio.org > >> and the support volunteers will track it from there. > >> > >> -jason > >> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote: > >> > >>> Hi, > >>> > >>> I have been trying to checkout bio-perl from cvs since yesterday > >>> afternoon > >>> (Dec 17). My request just hangs. I can login but I cannot checkout > >>> anything. > >>> My reading of your posting of the planned switch from CVS to SVN > >>> seemed to > >>> indicate that this was not to take place until tomorrow. Help! 
> >>> > >>> Thanks, > >>> Margot Sunshine > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > > ---------------------------------------------------------------------- > > -- > > Scott Cain, Ph. D. > > cain at cshl.edu > > GMOD > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From avilella at gmail.com Tue Dec 18 15:33:43 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 18 Dec 2007 20:33:43 +0000 Subject: [Bioperl-l] dealing with large files In-Reply-To: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org> References: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org> Message-ID: <358f4d650712181233q2a1627c3v6fb4e3e20b9f6c78@mail.gmail.com> There is a Bio::SeqIO "largefasta" object that will use the hard-disk for very large fasta files. On Dec 18, 2007 6:31 PM, Jason Stajich wrote: > Not exactly clear why you aren't using Bio::SeqIO to write the > sequence back out in FASTA format and why you are re-opening the file > each time? > > Did you look at the examples that show how to convert file formats? > http://bioperl.org/wiki/HOWTO:SeqIO > > You can set the description with > $seq->description($newdescription); > and the ID with > $seq->display_id($newid); > before writing. 
> > It isn't clear to me from your code why it would be leaking memory > and causing a problem - is it possible that you have a huge sequence > in the EMBL file? > > -jason > > On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote: > > > Dear all, > > I'm facing with a really annoying problem regarding large files > > handling. > > I wrote a script (below) which should keep sequences from an embl > > formatted file and write out the sequences in a customized fasta > > format. The script works, but since the input file is rather big > > 5.6 GB unzipped (987 MB zipped), after a while all the physical and > > virtual memories of my workstation (4GB RAM) are filled and the > > script is killed... > > I really don't know how to avoid this huge memory usage...and now > > I'm wondering if this is the right approach.... > > Please help me! > > Best wishes, > > Stefano > > > > > > > > ################# > > #!/usr/bin/perl -w > > > > use strict; > > > > use warnings; > > > > use Fcntl; > > use Cwd; > > > > use Bio::SeqIO; > > > > my $infile = $ARGV[0]; > > my $outfile = "$ARGV[0].fasta"; > > my $organism; > > my $count; > > my $path = cwd()."/$outfile"; > > > > print "Working dir is: ".cwd().".\nCreating file: $path\n"; > > > > my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - > > format => 'EMBL'); > > > > while ( my $seq = $in->next_seq() ) { > > sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT); > > my $id = $seq->accession_number(); > > my $desc = $seq->desc(); chop $desc; > > my $species = $seq->species->binomial(); > > my $subspecies = $seq->species->sub_species(); > > if ($seq->species->sub_species()) {chop $subspecies; $organism = > > $species." ".$subspecies;} > > else {$organism = $species;} > > my $sequence = $seq->seq(); > > print TO ">$id $desc [$organism]\n$sequence\n"; > > $count++; > > warn $@ if $@; > > close TO; > > } > > > > print "Done!\n\t$count sequences have been treated. 
The file $ARGV > > [0].fasta is ready.\n"; > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Dec 18 21:29:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 18 Dec 2007 20:29:19 -0600 Subject: [Bioperl-l] perl 5.10 released Message-ID: The next major perl release, perl 5.10, has officially been released: http://use.perl.org/article.pl?sid=07/12/18/195247 I'll try testing BioPerl with perl 5.10 and any relevant modules when I can; this may have to wait until after SVN migration. If there are any interested parties who want to test BioPerl compatibility with perl 5.10, feel free to post your results! chris From David.Messina at sbc.su.se Wed Dec 19 11:44:06 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 19 Dec 2007 10:44:06 -0600 Subject: [Bioperl-l] perl 5.10 released In-Reply-To: References: Message-ID: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com> Hi everyone, Perl 5.10 builds fine and passes all tests on my PB G4 running OS X 10.5.1. Piece o' cake. Here are the results of testing BioPerl on this virgin install: I downloaded the latest CVS tarball. I did 'perl Build.PL', which used CPAN to install a bunch of dependencies. I then did 'Build test'. For the most part everything was fine. - Bio::Biblio::IO::medlinexml throws an exception because XML::Parser isn't installed. - RNA_SearchIO fails a few tests. - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception because Graph::Directed isn't installed. - Spidey fails one test. And of course without the optional dependencies installed, many tests were skipped.
I'll now go back and install the optional dependencies and do the network tests, but it looks like for the most part we play nice with the new Perl. Dave From ste.ghi at libero.it Wed Dec 19 11:45:15 2007 From: ste.ghi at libero.it (Stefano Ghignone) Date: Wed, 19 Dec 2007 17:45:15 +0100 Subject: [Bioperl-l] dealing with large files Message-ID: > Not exactly clear why you aren't using Bio::SeqIO to write the > sequence back out in FASTA format and why you are re-opening the file > each time? It was to avoid keeping the output file open the whole time... > Did you look at the examples that show how to convert file formats? > http://bioperl.org/wiki/HOWTO:SeqIO Yes, I did...but I didn't realize how to set a customized description... > You can set the description with > $seq->description($newdescription); > and the ID with > $seq->display_id($newid); > before writing. Thanks for the hint. Anyway, even with the simple code given there to convert EMBL to FASTA format, the results are the same...I remind you that I'm using a huge input file, uniprot_trembl_bacteria.dat.gz...it contains 13101418 sequences! > It isn't clear to me from your code why it would be leaking memory > and causing a problem - is it possible that you have a huge sequence > in the EMBL file? > -jason In the end, I succeeded in the format conversion using this command: gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};' (Thanks to Riccardo Percudani). It's not BioPerl...but it works! My best wishes, Stefano > On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote: > > > Dear all, > > I'm facing with a really annoying problem regarding large files > > handling. > > I wrote a script (below) which should keep sequences from an embl > > formatted file and write out the sequences in a customized fasta > > format.
The script works, but since the input file is rather big > > 5.6 GB unzipped (987 MB zipped), after a while all the physical and > > virtual memories of my workstation (4GB RAM) are filled and the > > script is killed... > > I really don't know how to avoid this huge memory usage...and now > > I'm wondering if this is the right approach.... > > Please help me! > > Best wishes, > > Stefano > > > > > > > > ################# > > #!/usr/bin/perl -w > > > > use strict; > > > > use warnings; > > > > use Fcntl; > > use Cwd; > > > > use Bio::SeqIO; > > > > my $infile = $ARGV[0]; > > my $outfile = "$ARGV[0].fasta"; > > my $organism; > > my $count; > > my $path = cwd()."/$outfile"; > > > > print "Working dir is: ".cwd().".\nCreating file: $path\n"; > > > > my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - > > format => 'EMBL'); > > > > while ( my $seq = $in->next_seq() ) { > > sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT); > > my $id = $seq->accession_number(); > > my $desc = $seq->desc(); chop $desc; > > my $species = $seq->species->binomial(); > > my $subspecies = $seq->species->sub_species(); > > if ($seq->species->sub_species()) {chop $subspecies; $organism = > > $species." ".$subspecies;} > > else {$organism = $species;} > > my $sequence = $seq->seq(); > > print TO ">$id $desc [$organism]\n$sequence\n"; > > $count++; > > warn $@ if $@; > > close TO; > > } > > > > print "Done!\n\t$count sequences have been treated. 
The file $ARGV > > [0].fasta is ready.\n"; > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Wed Dec 19 12:17:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 19 Dec 2007 11:17:28 -0600 Subject: [Bioperl-l] dealing with large files In-Reply-To: References: Message-ID: On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote: >> Not exactly clear why you aren't using Bio::SeqIO to write the >> sequence back out in FASTA format and why you are re-opening the file >> each time? > It was to avoid tho keep the out file always opened... > >> Did you look at the examples that show how to convert file formats? >> http://bioperl.org/wiki/HOWTO:SeqIO > yes I did...but I didn't realized how to set a customized > description... > >> You can set the description with >> $seq->description($newdescription); >> and the ID with >> $seq->display_id($newid); >> before writing. > Thanks for the hint. Anyway, just using the simple code reported to > convert embl to fasta format, the results are the same...I remember > you that I'm using a huge input file: the > uniprot_trembl_bacteria.dat.gz...it contains 13101418 sequences! > >> It isn't clear to me from your code why it would be leaking memory >> and causing a problem - is it possible that you have a huge sequence >> in the EMBL file? >> -jason > > At the end, I succeeded in the format conversion using this command: > > gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if > (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if > (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};' > > (Thanks to Riccardo Percudani). It's not bioperl...but it works! > > My best wishes, > Stefano As this shows, sometimes BioPerl isn't always the best answer (I know, blasphemy...). 
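For reference, the one-liner quoted above can be expanded into a short stand-alone script. This is only a sketch, not code that anyone in this thread actually ran: the subroutine name flatfile_to_fasta is made up for illustration, it assumes every entry has its AC, DE and OS lines before the SQ line and ends with //, and it keeps only the first accession of each entry. Like the one-liner, it does not use BioPerl and prints each sequence line as soon as it has been read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream an EMBL/UniProt-style flat file to FASTA, one line at a time.
# Only the header fields of the current entry are held in memory.
# (flatfile_to_fasta is a hypothetical name, not a BioPerl function.)
sub flatfile_to_fasta {
    my ($in, $out) = @_;
    my ($acc, @desc, @org);
    my $in_seq = 0;   # true while we are between the SQ line and //
    my $count  = 0;

    while (my $line = <$in>) {
        chomp $line;
        if ($in_seq) {
            if ($line =~ m{^//}) {
                # End of entry: forget everything about it.
                $in_seq = 0;
                undef $acc;
                @desc = ();
                @org  = ();
            }
            else {
                # Drop spaces (and position counts, if present).
                $line =~ s/[^A-Za-z]//g;
                print {$out} "$line\n" if length $line;
            }
            next;
        }
        if ($line =~ /^AC\s+(\S+?);/) {
            $acc = $1 unless defined $acc;   # first accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) {
            push @desc, $1;
        }
        elsif ($line =~ /^OS\s+(.*)/) {
            push @org, $1;
        }
        elsif ($line =~ /^SQ\s/) {
            my $desc = join ' ', @desc;
            my $org  = join ' ', @org;
            s/\.\s*$// for $desc, $org;      # strip the trailing period
            my $id = defined $acc ? $acc : 'unknown';
            print {$out} ">$id $desc [$org]\n";
            $in_seq = 1;
            $count++;
        }
    }
    return $count;
}

# Usage: perl flat2fasta.pl input.dat.gz > output.fasta
if (@ARGV) {
    open my $in, '-|', 'gunzip', '-c', $ARGV[0]
        or die "Cannot run gunzip on $ARGV[0]: $!";
    my $n = flatfile_to_fasta($in, \*STDOUT);
    close $in;
    warn "$n sequences written\n";
}
```

The script name flat2fasta.pl in the usage comment is hypothetical as well. Memory use should stay roughly constant regardless of the input size, since no sequence is ever assembled into a single string or object.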
As Jason suggested it's quite likely there are large sequence records causing your problems when using BioPerl. The one- liner works b/c it doesn't retain data (sequence, annotation, etc) in memory as Bio::Seq object; it's a direct conversion. It would be nice to code up a lazy sequence object and related parsers; maybe for the next dev release. chris From cjfields at uiuc.edu Wed Dec 19 12:08:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 19 Dec 2007 11:08:31 -0600 Subject: [Bioperl-l] perl 5.10 released In-Reply-To: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com> References: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com> Message-ID: On Dec 19, 2007, at 10:44 AM, Dave Messina wrote: > Hi everyone, > > > Perl 5.10 builds fine and passes all tests on my PB G4 running OS X > 10.5.1. Piece o' cake. > > Here are results of testing BioPerl on this virgin install: > > I downloaded the latest CVS tarball. I did 'perl Build.PL', which > used CPAN to install a bunch of dependencies. I then did 'Build > test'. For the most part everything was fine. > > - Bio::Biblio::IO::medlinexml throws an exception because > XML::Parser isn't installed. XML::Parser used to be shipped with a number of perl distros even though it isn't core. We should add a require to these. > - RNA_SearchIO fails a few tests. These are very likely from recent commits I made re:GenericHSP and use of bits(), raw_score(), etc. (the fails look like missing/switched vals with these method tests). I'll fix these post-svn migration, but I don't think these are related to 5.10. > - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception > because Graph::Directed isn't installed. Odd, that should be caught out before tests are run. Needs to be fixed, but one would think this would fail as well under 5.8. > - Spidey fails one test. Passes for me. Is it dependency-related? > And of course without the optional dependencies installed, many > tests were skipped. 
> > I'll now go back and install the optional dependencies and do the > network tests, but it looks like for the most part we play nice with > the new Perl. > > Dave Not sure, but it seems a bit faster. Maybe it's just me but it would be nice to see some benchmarks comparing perl 5.8 vs 5.10. I agree, it was a very fast and easy install. I'll start a page on the wiki for test fails using perl 5.10. I'm seeing a few fails; I'm getting the following with everything installed (including DBD::mysql, DBI, etc) using perl 5.10, Mac OS X 10.5.1 (note Test::Harness now gives TODO's, so some of these are actually passing). Note the entrezgene.t and DB.t fails; I looked into these and I think they are related to the odd 'pseudohashes are deprecated' warnings we were getting in perl 5.8 tests, so there may be something legitimately buggy. Test Summary Report ------------------- t/Annotation.t (Wstat: 0 Tests: 112 Failed: 0) TODO passed: 96 t/BioGraphics.t (Wstat: 256 Tests: 35 Failed: 1) Failed test number(s): 4 Non-zero exit status: 1 t/DB.t (Wstat: 65280 Tests: 106 Failed: 0) Non-zero exit status: 255 Parse errors: Bad plan. You planned 116 tests but ran 106. t/DBCUTG.t (Wstat: 1024 Tests: 33 Failed: 4) Failed test number(s): 29-31, 33 Non-zero exit status: 4 t/RNA_SearchIO.t (Wstat: 2048 Tests: 496 Failed: 8) Failed test number(s): 291, 338, 372-374, 395, 455, 486 Non-zero exit status: 8 t/entrezgene.t (Wstat: 65280 Tests: 648 Failed: 0) Non-zero exit status: 255 Parse errors: Bad plan. You planned 1422 tests but ran 648. Files=255, Tests=15066, 435 wallclock secs ( 3.15 usr 1.72 sys + 124.87 cusr 13.29 csys = 143.03 CPU) Result: FAIL Failed 5/255 test programs. 13/15066 subtests failed. 
chris From David.Messina at sbc.su.se Wed Dec 19 12:49:32 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 19 Dec 2007 11:49:32 -0600 Subject: [Bioperl-l] perl 5.10 released In-Reply-To: References: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com> Message-ID: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com> > > XML::Parser used to be shipped with a number of perl distros even > though it isn't core. We should add a require to these. Agreed. > - RNA_SearchIO fails a few tests. > > These are very likely from recent commits I made re:GenericHSP and use > of bits(), raw_score(), etc. (the fails look like missing/switched > vals with these method tests). I'll fix these post-svn migration, but > I don't think these are related to 5.10. Agreed -- I doubt this is 5.10-specific. > - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception > > because Graph::Directed isn't installed. > > Odd, that should be caught out before tests are run. Needs to be > fixed, but one would think this would fail as well under 5.8. Yep, and in a minute here I'll test it under 5.8. > > - Spidey fails one test. > > Passes for me. Is it dependency-related? I don't think so, but I guess we'll see once I finish installing the dependencies. Here's what I got: t/Spidey........................ok 1/26 Can't call method "sub_SeqFeature" on an undefined value at t/Spidey.t line 24, line 170. # Looks like you planned 26 tests but only ran 3. # Looks like your test died just after 3. t/Spidey........................dubious Test returned status 255 (wstat 65280, 0xff00) DIED. 
FAILED tests 4-26 Failed 23/26 tests, 11.54% okay Dave From cjfields at uiuc.edu Wed Dec 19 14:19:10 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 19 Dec 2007 13:19:10 -0600 Subject: [Bioperl-l] perl 5.10 released In-Reply-To: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com> References: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com> <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com> Message-ID: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu> Just updated from CVS and reran tests, Spidey.t is failing now. This may be from a recent commit: http://lists.open-bio.org/pipermail/bioperl-guts-l/2007-December/026854.html I'm updating the following page on the wiki for tracking. There are a few more we should look into at some point: http://www.bioperl.org/w/index.php?title=Bioperl_and_Perl_5.10 chris On Dec 19, 2007, at 11:49 AM, Dave Messina wrote: >> >> XML::Parser used to be shipped with a number of perl distros even >> though it isn't core. We should add a require to these. > > > Agreed. > > >> - RNA_SearchIO fails a few tests. >> >> These are very likely from recent commits I made re:GenericHSP and >> use >> of bits(), raw_score(), etc. (the fails look like missing/switched >> vals with these method tests). I'll fix these post-svn migration, >> but >> I don't think these are related to 5.10. > > > Agreed -- I doubt this is 5.10-specific. > > >> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception >>> because Graph::Directed isn't installed. >> >> Odd, that should be caught out before tests are run. Needs to be >> fixed, but one would think this would fail as well under 5.8. > > > Yep, and in a minute here I'll test it under 5.8. > > > > >>> - Spidey fails one test. >> >> Passes for me. Is it dependency-related? > > > I don't think so, but I guess we'll see once I finish installing the > dependencies. 
Here's what I got: > > t/Spidey........................ok 1/26 Can't call method > "sub_SeqFeature" > on an undefined value at t/Spidey.t line 24, line 170. > # Looks like you planned 26 tests but only ran 3. > # Looks like your test died just after 3. > t/Spidey........................dubious > > Test returned status 255 (wstat 65280, 0xff00) > DIED. FAILED tests 4-26 > Failed 23/26 tests, 11.54% okay > > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From David.Messina at sbc.su.se Wed Dec 19 18:42:14 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 19 Dec 2007 17:42:14 -0600 Subject: [Bioperl-l] perl 5.10 released In-Reply-To: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu> References: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com> <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com> <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu> Message-ID: Hi Chris and everyone, With most of the optional dependencies installed, I'm seeing essentially the same test failures, including the CODE ref thingy. I've noted this on the new Wiki page you created. According to Data::Dumper's documentation, Data::Dumper cheats with CODE references. If a code reference is encountered in the structure being processed (and if you haven't set the Deparse flag), an anonymous subroutine that contains the string '"DUMMY"' will be inserted in its place, and a warning will be printed if Purity is set. You can eval the result, but bear in mind that the anonymous sub that gets created is just a placeholder. Someday, perl will have a switch to cache-on-demand the string representation of a compiled piece of code, I hope.
If you have prior knowledge of all the code refs that your data structures are likely to have, you can use the Seen method to pre-seed the internal reference table and make the dumped output point to them, instead. See EXAMPLES above.

So it's not BioPerl per se, but we can probably work around it.

>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>> because Graph::Directed isn't installed.
>>>
>>> Odd, that should be caught out before tests are run. Needs to be
>>> fixed, but one would think this would fail as well under 5.8.
>>
>>
>> Yep, and in a minute here I'll test it under 5.8.

Strangely, the Ontology tests properly get skipped under 5.8.

Dave

From ki.baik at roche.com Wed Dec 19 19:58:42 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 19 Dec 2007 16:58:42 -0800
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>

Hello,

I'm interested in parsing the output of the CAP contig assembly program into a format that is more manageable. The CAP output is shown below:

              .    :    .    :    .    :    .    :    .    :    .    :
Seq1+     CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC
          ____________________________________________________________
consensus CTGGATGGGTTAATTTACTCCCATAAGAGAGCAGAAATCCTGGATCTCTGGATATATCAC

              .    :    .    :    .    :    .    :    .    :    .    :
Seq1+     ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC
Seq2+     ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC
          ____________________________________________________________
consensus ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACACCGGGACCAGGACCTAGATTCC

              .    :    .    :    .    :    .    :    .    :    .    :
Seq1+     CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC
Seq2+     CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC
          ____________________________________________________________
consensus CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCAGCAGAAGAGGCAGAGAGAC

              .    :    .    :    .    :    .    :    .    :    .    :
Seq1+     TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG
Seq2+     TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC
          ____________________________________________________________
consensus TGGGTAATACAAATGAAGATGCTAGTCTTCTACATCCAGCTTGTAATCATGGAGCTGAGG

I would like to maintain the alignment with their base positions for each sequence. A fasta format retaining the alignment position is ideal such as below:

>Seq1+
CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC
ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC
CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC
TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG
>Seq2+
------------------------------------------------------------
ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC
CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC
TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC--------

Does anyone have any experience doing this?

Regards,
KB

From cjfields at uiuc.edu Wed Dec 19 20:41:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 19:41:51 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To:
References: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com> <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com> <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
Message-ID: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>

On Dec 19, 2007, at 5:42 PM, Dave Messina wrote:

> Hi Chris and everyone,
>
> With most of the optional dependencies installed, I'm seeing
> essentially the same test failures, including the CODE ref thingy.
> I've noted this on the new Wiki page you created.
>
> According to Data::Dumper's documentation,
> Data::Dumper cheats with CODE references.
If a code reference is > encountered in the structure being processed (and if you haven't set > theDeparse flag), an anonymous subroutine that contains the string > '"DUMMY"' will be inserted in its place, and a warning will be > printed if Purity is set. You can eval the result, but bear in mind > that the anonymous sub that gets created is just a placeholder. > Someday, perl will have a switch to cache-on-demand the string > representation of a compiled piece of code, I hope. If you have > prior knowledge of all the code refs that your data structures are > likely to have, you can use the Seen method to pre-seed the internal > reference table and make the dumped output point to them, instead. > See EXAMPLES above. > > > So it's not BioPerl per se, but we can probably work around it. May be something in Module::Build or Build.PL that needs tweaking. It looks like EntrezGene parsing is broken for now using perl 5.10; the 'pseudohash' warnings with perl 5.8 were indicating something was amiss but we could never place it. Any fixes will have to wait until after svn migration. Not sure what's going on with the others fails just yet. >>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception >>>>> because Graph::Directed isn't installed. >>>> >>>> Odd, that should be caught out before tests are run. Needs to be >>>> fixed, but one would think this would fail as well under 5.8. >>> >>> >>> Yep, and in a minute here I'll test it under 5.8. > > > Strangely, the Ontology tests properly get skipped under 5.8. > > Dave May be worth looking into. Have you added it to the wiki? 
chris From David.Messina at sbc.su.se Wed Dec 19 23:52:16 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 19 Dec 2007 22:52:16 -0600 Subject: [Bioperl-l] perl 5.10 released In-Reply-To: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu> References: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com> <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com> <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu> <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu> Message-ID: <628aabb70712192052p5d9afe3bvf4fa1da872f56355@mail.gmail.com> > > May be something in Module::Build or Build.PL that needs tweaking. I took a quick look-see and I'm pretty sure it's Module::Build. Specifically, Module::Build::Base::write_config(), where there are three calls with coderefs as parameters to _write_data() to match the three coderef errors we are seeing at the end of 'perl Build.PL'. _write_data() in turn calls Module::Build::Dumper::_data_dump() and uses some ugly Data::Dumper voodoo to serialize. I don't understand the voodoo well enough to explain why this appears only with Perl 5.10, though; it sure looks like it should have with 5.8, too. > Strangely, the Ontology tests properly get skipped under 5.8. > > May be worth looking into. Have you added it to the wiki? Uhhh, yeah...of course! (just now) Should be a simple fix after the post-svn thaw. 
Dave From David.Messina at sbc.su.se Thu Dec 20 00:39:41 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 19 Dec 2007 23:39:41 -0600 Subject: [Bioperl-l] Parsing CAP3 output to Fasta In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com> References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com> Message-ID: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com> Hi Ki, Hopefully someone who (unlike me) uses these modules regularly will chime in, but in the meantime, here are some ideas: The Bio::AssemblyIO module can read and write ace files, which CAP3 can produce as output. I don't think there is an explicit means to dump to a multi-fasta file like you want. But you could probably write a Bio::AssemblyIO::Fasta class which could write the multi-Fasta format you want. Then you could use Bio::AssemblyIO objects to read in ace files from CAP3 and write out to multi-fasta. Look at Bio::AssemblyIO::* Bio::Assembly::ScaffoldI Bio::Assembly::Contig Bio::LocatableSeq Bio::AlignIO Assemblies are made of scaffolds, scaffolds are made of contigs, and contigs are made of sequences which can be manipulated like any old seq in BioPerl. Bio::AlignIO can read and write multiple sequence alignments and multi-fastas, so that should help you to get from AssemblyIO to your desired output format. Hope this helps, Dave From mike.thon at gmail.com Thu Dec 20 00:59:06 2007 From: mike.thon at gmail.com (Michael Thon) Date: Thu, 20 Dec 2007 06:59:06 +0100 Subject: [Bioperl-l] dealing with large files In-Reply-To: References: Message-ID: On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote: > my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - > format => 'EMBL'); This is just for the sake of curiosity, since you already found a solution to your problem, but I wonder how perl will handle a file opened this way. Will it try to suck the whole thing into ram in one go? 
Mike From cjfields at uiuc.edu Thu Dec 20 00:54:36 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 19 Dec 2007 23:54:36 -0600 Subject: [Bioperl-l] Parsing CAP3 output to Fasta In-Reply-To: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com> References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com> <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com> Message-ID: On Dec 19, 2007, at 11:39 PM, Dave Messina wrote: > Hi Ki, > > Hopefully someone who (unlike me) uses these modules regularly will > chime > in, but in the meantime, here are some ideas: > > The Bio::AssemblyIO module can read and write ace files, which CAP3 > can > produce as output. I don't think there is an explicit means to dump > to a > multi-fasta file like you want. > > But you could probably write a Bio::AssemblyIO::Fasta class which > could > write the multi-Fasta format you want. Then you could use > Bio::AssemblyIO > objects to read in ace files from CAP3 and write out to multi-fasta. > > Look at > > Bio::AssemblyIO::* > Bio::Assembly::ScaffoldI > Bio::Assembly::Contig > Bio::LocatableSeq > Bio::AlignIO > > Assemblies are made of scaffolds, scaffolds are made of contigs, and > contigs > are made of sequences which can be manipulated like any old seq in > BioPerl. > Bio::AlignIO can read and write multiple sequence alignments and > multi-fastas, so that should help you to get from AssemblyIO to your > desired > output format. > > > > Hope this helps, > Dave What would help is to make Bio::Assembly::Contig implement Bio::AlignI correctly, or make it a subclass of Bio::SimpleAlign. That way one could read in Scaffolds in via Bio::Assembly::IO and write out Contigs through Bio::AlignIO directly. In theory that should work but IIRC it doesn't. 
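[Editor's sketch, not part of the original message.] Until Contig and AlignIO work together that way, the gap-padded multi-FASTA Ki asked for can also be built directly from the CAP3 text report. What follows is a rough plain-Perl sketch, not BioPerl code. cap3_to_padded is a made-up helper, and it assumes that read names end in a + or - strand sign, that every block ends in a "consensus" line, and that the report's original column layout (leading spaces included) is intact.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl API: turn CAP3-style alignment blocks
# into one gap-padded row per read, all rows having the same length.
sub cap3_to_padded {
    my ($text) = @_;
    my (%row, @order, @block);
    my $done = 0;                          # alignment columns emitted so far
    for my $line (split /\n/, $text) {
        if ($line =~ /^(consensus\s+)(\S+)/) {
            # The consensus line closes a block and tells us where the
            # sequence column starts and how wide this block is.
            my ($col, $len) = (length $1, length $2);
            my %piece;
            for my $read (@block) {
                my ($name) = $read =~ /^(\S+)/;
                my $seq = length($read) > $col ? substr($read, $col) : '';
                $seq =~ s/\s+$//;          # drop trailing whitespace
                $seq =~ tr/ /-/;           # leading offset becomes gaps
                $seq .= '-' x ($len - length $seq);
                $piece{$name} = $seq;
                push @order, $name unless exists $row{$name};
                $row{$name} = '-' x $done unless exists $row{$name};
            }
            # Reads absent from this block are padded with gaps.
            $row{$_} .= exists $piece{$_} ? $piece{$_} : '-' x $len for @order;
            $done += $len;
            @block = ();
        }
        elsif ($line =~ /^\S+[+-]\s/) {    # a read line such as "Seq1+ ..."
            push @block, $line;
        }
    }
    return map { [ $_, $row{$_} ] } @order;
}

# Tiny made-up report: Seq2+ is missing from the first block and ends early.
my $cap = join "\n",
    '              .    :',
    'Seq1+     ACGTACGTAC',
    '          __________',
    'consensus ACGTACGTAC',
    '',
    '              .    :',
    'Seq1+     GG--TTAACC',
    'Seq2+     GGAATTAA',
    '          __________',
    'consensus GGAATTAACC',
    '';

for my $rec (cap3_to_padded($cap)) {
    print ">$rec->[0]\n", join("\n", unpack('(A60)*', $rec->[1])), "\n";
}
```

The resulting file is an ordinary aligned FASTA, so it should be readable by anything that accepts that format.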
chris

From jason at bioperl.org Thu Dec 20 02:13:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Dec 2007 23:13:55 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To:
References:
Message-ID: <02EC6D6D-F807-492F-B125-9FE0393B1FD9@bioperl.org>

It gets buffered via the OS -- Bio::Root::IO calls next_line iteratively, but eventually the whole sequence object will get put into RAM as it is built up.
zcat or bzcat can also be used for gzipped and bzipped files respectively; I like to use this where I want to keep the disk space footprint down.

Because we treat data input usually as from a stream, ignoring whether it is in a file or not, we have to have a more flexible structure to really handle this, although I'd argue the data really belongs in a database when it is too big for memory.
More compact Feature/Location objects would probably also help here. I would not be surprised if the memory requirement has more to do with the number of features than length of the sequence - human chrom 1 can fit into memory just fine on most machines with 2GB of RAM.

But it would require someone taking an interest in some re-architecting here.

-jason

On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:

>
> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>
>> my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -
>> format => 'EMBL');
>
> This is just for the sake of curiosity, since you already found a
> solution to your problem, but I wonder how perl will handle a file
> opened this way. Will it try to suck the whole thing into ram in
> one go?
> > Mike > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ste.ghi at libero.it Thu Dec 20 08:57:54 2007 From: ste.ghi at libero.it (Stefano Ghignone) Date: Thu, 20 Dec 2007 14:57:54 +0100 Subject: [Bioperl-l] dealing with large files Message-ID: I was wandering if, working with so big FILE, should be better first index the database, than query it formatting the sequences as one want... > It gets buffered via the OS -- Bio::Root::IO calls next_line > iteratively, but eventually the whole sequence object will get put > into RAM as it is built up. > zcat or bzcat can also be used for gzipped and bzipped files > respectively, I like to use this where I want to disk space footprint > down. > > Because we treat data input usually as from a stream ignoring whether > it is in a file or not, we have to have a more flexible structure to > really handle this, although I'd argue the data really belongs in a > database when it is too big for memory. > More compact Feature/Location objects would probably also help here. > I would not be surprised if the memory requirement has more to do > with the number of features than length of the sequence - human chrom > 1 can fit into memory just fine on most machines with 2GB of RAM. > > But it would require someone taking an interest in some re- > architecting here. > > -jason > > On Dec 19, 2007, at 9:59 PM, Michael Thon wrote: > > > > > On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote: > > > >> my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - > >> format => 'EMBL'); > > > > This is just for the sake of curiosity, since you already found a > > solution to your problem, but I wonder how perl will handle a file > > opened this way. Will it try to suck the whole thing into ram in > > one go? 
> > > > Mike > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From amackey at pcbi.upenn.edu Thu Dec 20 10:32:19 2007 From: amackey at pcbi.upenn.edu (Aaron Mackey) Date: Thu, 20 Dec 2007 10:32:19 -0500 Subject: [Bioperl-l] BioPerl and NHX tree In-Reply-To: <476A7736.109@toulouse.inra.fr> References: <476A7736.109@toulouse.inra.fr> Message-ID: <24c96eca0712200732q20523c1co1075c15d056ff634@mail.gmail.com> The NHX writer will only add the [&&NHX] block when there are tags to be written. Your code reads in a Newick tree without tags, and then writes it back out without adding any new tags. So yes, you need to 1) read the Newick tree, 2) traverse the tree, calling $node->nhx_tag({T => $taxon_id}) for each node with each corresponding $taxon_id, and then 3) write out the NHX tree. -Aaron On Dec 20, 2007 9:07 AM, Laurence Amilhat wrote: > Dear Mr MacKey, > > > I am pretty new in Tree parsing and writing with BioPerl. > I am trying to convert a Newick tree file to a NHX tree file with adding > the Taxid for the node in the NHX tree file. > > I saw the module Bio::Tree::NodeNHX, but very few examples... > > I don't know where do i need to start, I tried the easy way with > Bio::TreeIO, > but the resulting tree doesn't have the [&&NHX] in the internal node, > and I don't know how to add the tag [&&NHX:T=xxxx] on the node, > Do I need to use the nhx_tag method to do this? > > Maybe you have an example that use NHX tag in tree node, that might be > very helpfull for me to get to understand how it works... > > > Have a nice holidays, > > > Best regards, > > > Laurence Amilhat. 
> > > > > This is the simple code that I use to convert a tree from newick to nhx: > > use Bio::TreeIO; > use Getopt::Long; > my $tree_file; > my $outfile; > > GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile); > > my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file"); > my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile"); > > while (my $tree= $treeio->next_tree) > { > $treeout->write_tree($tree); > } > > -- > ==================================================================== > = Laurence Amilhat INRA Toulouse 31326 Castanet-Tolosan = > = Tel: 33 5 61 28 53 34 Email: laurence.amilhat at toulouse.inra.fr = > ==================================================================== > > > > From cjfields at uiuc.edu Thu Dec 20 11:14:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 20 Dec 2007 10:14:55 -0600 Subject: [Bioperl-l] dealing with large files In-Reply-To: References: Message-ID: As Jason mentioned, it may be the number of features in the record if the record itself is huge (i.e. human chromosome-sized, full metagenome, etc). If (my) memory serves correctly the mem. footprint for a perl object is ~10x the actual data, give or take (it depends on the complexity of the object itself). In cases like this indexing may not fix the problem, unless you have an object which retains the file position of the data instead of the data itself; I don't think we have this object type in BioPerl. The only way I can think of to fix this would be (as Jason also suggested) lightweight objects, or something like the lazy sequence object ala the SwissKnife suite (which only bring what you want into memory). Related to that, I have been testing something like that, which uses iterators to pass in chunks of data from a stream to handlers to build a sequence object. Wouldn't be too hard to reconfigure that to return file positions as well. Maybe for the 1.7 release... 
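[Editor's sketch, not part of the original message.] The "file position instead of the data" idea can be tried out without any BioPerl at all. A minimal plain-Perl sketch follows, assuming FASTA input for brevity (the thread itself is about EMBL records) and a seekable file rather than a gunzip pipe; index_offsets and fetch_record are hypothetical helpers, not existing BioPerl classes.

```perl
use strict;
use warnings;

# Hypothetical helper: remember the byte offset at which each record starts,
# keyed by the first word of the header, instead of storing the record.
sub index_offsets {
    my ($fh) = @_;
    my %offset;
    my $pos = tell $fh;
    while (my $line = <$fh>) {
        $offset{$1} = $pos if $line =~ /^>(\S+)/;
        $pos = tell $fh;               # start of the next line
    }
    return \%offset;
}

# Hypothetical helper: pull one record back into memory only when asked.
sub fetch_record {
    my ($fh, $offset) = @_;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $record = <$fh>;                # header line
    while (my $line = <$fh>) {
        last if $line =~ /^>/;         # next record begins
        $record .= $line;
    }
    return $record;
}

# In-memory file so the sketch runs on its own; a real file handle opened
# with open(my $fh, '<', $path) behaves the same way.
my $fasta = ">a desc\nACGT\nAC\n>b\nGGGG\n";
open my $fh, '<', \$fasta or die "open failed: $!";
my $index = index_offsets($fh);
print fetch_record($fh, $index->{b});
```

The index holds one number per record, so its size depends on the number of records rather than on their length or feature count.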
chris On Dec 20, 2007, at 7:57 AM, Stefano Ghignone wrote: > I was wandering if, working with so big FILE, should be better first > index the database, than query it formatting the sequences as one > want... > >> It gets buffered via the OS -- Bio::Root::IO calls next_line >> iteratively, but eventually the whole sequence object will get put >> into RAM as it is built up. >> zcat or bzcat can also be used for gzipped and bzipped files >> respectively, I like to use this where I want to disk space footprint >> down. >> >> Because we treat data input usually as from a stream ignoring whether >> it is in a file or not, we have to have a more flexible structure to >> really handle this, although I'd argue the data really belongs in a >> database when it is too big for memory. >> More compact Feature/Location objects would probably also help here. >> I would not be surprised if the memory requirement has more to do >> with the number of features than length of the sequence - human chrom >> 1 can fit into memory just fine on most machines with 2GB of RAM. >> >> But it would require someone taking an interest in some re- >> architecting here. >> >> -jason >> >> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote: >> >>> >>> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote: >>> >>>> my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - >>>> format => 'EMBL'); >>> >>> This is just for the sake of curiosity, since you already found a >>> solution to your problem, but I wonder how perl will handle a file >>> opened this way. Will it try to suck the whole thing into ram in >>> one go? 
>>> >>> Mike >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From David.Messina at sbc.su.se Thu Dec 20 11:26:17 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 20 Dec 2007 10:26:17 -0600 Subject: [Bioperl-l] dealing with large files In-Reply-To: References: Message-ID: <628aabb70712200826p36d3d451wdcd901f555bc210a@mail.gmail.com> On 12/20/07, Stefano Ghignone wrote: > > I was wandering if, working with so big FILE, should be better first index > the database, than query it formatting the sequences as one want... > Agreed, but only if you want to randomly access sequences within the file. I believe the original poster intends to do something with every sequence in the big file, in which case streaming the file is likely to be much faster. Dave From akarger at CGR.Harvard.edu Thu Dec 20 11:48:58 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Thu, 20 Dec 2007 11:48:58 -0500 Subject: [Bioperl-l] dealing with large files In-Reply-To: References: Message-ID: > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote: > > > At the end, I succeeded in the format conversion using this command: > > > > gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if > > (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if > > (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};' > > > > (Thanks to Riccardo Percudani). It's not bioperl...but it works! 
> > > As this shows, sometimes BioPerl isn't always the best answer > (I know, > blasphemy...). As Jason suggested it's quite likely there are large > sequence records causing your problems when using BioPerl. The one- > liner works b/c it doesn't retain data (sequence, annotation, > etc) in > memory as Bio::Seq object; it's a direct conversion. > > It would be nice to code up a lazy sequence object and related > parsers; maybe for the next dev release. Yes! Also, BLAST parsing. Blasting the proteome against the genome makes for rather large result files. Right now, if you want to delete queries that hit, say, more than 1000 times, you still need to wait for Bioperl to create objects and sub-objects for every single hit. Sadly, this example isn't hypothetical. I'm going to solve it with something like: perl -wne 'BEGIN {$/="TBLASTN"} print if length($_) < $some_big_value' big_blast > filtered_blast (Not that I'm volunteering to help with the parser writing, so I should stop complaining.) -Amir From bix at sendu.me.uk Thu Dec 20 12:06:28 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 20 Dec 2007 17:06:28 +0000 Subject: [Bioperl-l] dealing with large files In-Reply-To: References: Message-ID: <476AA114.2060201@sendu.me.uk> Chris Fields wrote: > The only way I can think of to fix this would be (as Jason also > suggested) lightweight objects, or something like the lazy sequence > object ala the SwissKnife suite (which only bring what you want into > memory). > > Related to that, I have been testing something like that, which uses > iterators to pass in chunks of data from a stream to handlers to build a > sequence object. Wouldn't be too hard to reconfigure that to return > file positions as well. Maybe for the 1.7 release... Bio::PullParserI is your friend. 
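[Editor's sketch, not part of the original messages.] For comparison with the pull-parser route, Amir's record-separator one-liner from earlier in this thread can be spelled out as a small script. This is only a sketch under assumptions: filter_reports is a made-up name, the input is assumed to begin with the marker word, and the marker is assumed never to occur inside a report body. Unlike the one-liner, it strips the marker and re-attaches it so that each header stays with its own report.

```perl
use strict;
use warnings;

# Hypothetical helper: copy concatenated BLAST text reports from $in to $out,
# dropping every per-query report longer than $max_len characters.
sub filter_reports {
    my ($in, $out, $max_len, $marker) = @_;
    $marker = 'TBLASTN' unless defined $marker;
    local $/ = $marker;                # read up to the start of the next report
    my $kept = 0;
    while (my $chunk = <$in>) {
        chomp $chunk;                  # chomp removes the current $/, i.e. the marker
        next unless length $chunk;     # nothing precedes the first marker
        next if length($chunk) > $max_len;
        print {$out} $marker, $chunk;  # put the marker back in front
        $kept++;
    }
    return $kept;
}

# Demo on an in-memory report; for real files one would pass \*STDIN and
# \*STDOUT and redirect on the command line.
my $report = "TBLASTN q1\nhit\nTBLASTN q2\n" . ("hit\n" x 50) . "TBLASTN q3\nhit\n";
open my $in, '<', \$report or die "open failed: $!";
my $kept = filter_reports($in, \*STDOUT, 40);
print STDERR "kept $kept reports\n";
```

Only one report is held in memory at a time, which is the same reason the one-liner copes with files that the object-building parser struggles with.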
From bix at sendu.me.uk Thu Dec 20 13:48:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 20 Dec 2007 18:48:29 +0000 Subject: [Bioperl-l] dealing with large files In-Reply-To: References: Message-ID: <476AB8FD.8090108@sendu.me.uk> Amir Karger wrote: >> It would be nice to code up a lazy sequence object and related >> parsers; maybe for the next dev release. > > Yes! > > Also, BLAST parsing. Blasting the proteome against the genome makes for > rather large result files. This has already been done. Use Bio::SearchIO::blast_pull. In a situation like yours I dropped run time from 20223s to 951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40% less). From akarger at CGR.Harvard.edu Thu Dec 20 13:52:51 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Thu, 20 Dec 2007 13:52:51 -0500 Subject: [Bioperl-l] dealing with large files In-Reply-To: <476AB8FD.8090108@sendu.me.uk> References: <476AB8FD.8090108@sendu.me.uk> Message-ID: > Amir Karger wrote: > >> It would be nice to code up a lazy sequence object and related > >> parsers; maybe for the next dev release. > > > > Also, BLAST parsing. Blasting the proteome against the > genome makes for > > rather large result files. > > This has already been done. Use Bio::SearchIO::blast_pull. In a > situation like yours I dropped run time from 20223s to > 951s (~20x faster) and memory usage from over 8GB to less > than 5GB (~40% > less). Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I can put in my own perl lib for this, or does it require large bunches of new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2 here, but I don't see our whole center using CVS Bioperl. 
-Amir From cjfields at uiuc.edu Thu Dec 20 15:27:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 20 Dec 2007 14:27:45 -0600 Subject: [Bioperl-l] dealing with large files In-Reply-To: <476AA114.2060201@sendu.me.uk> References: <476AA114.2060201@sendu.me.uk> Message-ID: <29E190AB-8A6C-4F1C-BDD1-6034CFFEEFFF@uiuc.edu> On Dec 20, 2007, at 11:06 AM, Sendu Bala wrote: > Chris Fields wrote: >> The only way I can think of to fix this would be (as Jason also >> suggested) lightweight objects, or something like the lazy sequence >> object ala the SwissKnife suite (which only bring what you want >> into memory). >> Related to that, I have been testing something like that, which >> uses iterators to pass in chunks of data from a stream to handlers >> to build a sequence object. Wouldn't be too hard to reconfigure >> that to return file positions as well. Maybe for the 1.7 release... > > Bio::PullParserI is your friend. I'm looking into that, yes. I'm thinking of something like a generic lazy sequence class with an embedded Handler/PullParser object which processes stuff on the fly. Oh, when I have a bit more time... chris From cjfields at uiuc.edu Thu Dec 20 15:39:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 20 Dec 2007 14:39:48 -0600 Subject: [Bioperl-l] dealing with large files In-Reply-To: References: <476AB8FD.8090108@sendu.me.uk> Message-ID: <2EC6A1C2-FBC9-45F6-AD1B-040E29FAFA28@uiuc.edu> On Dec 20, 2007, at 12:52 PM, Amir Karger wrote: >> Amir Karger wrote: >>>> It would be nice to code up a lazy sequence object and related >>>> parsers; maybe for the next dev release. >>> >>> Also, BLAST parsing. Blasting the proteome against the >> genome makes for >>> rather large result files. >> >> This has already been done. Use Bio::SearchIO::blast_pull. In a >> situation like yours I dropped run time from 20223s to >> 951s (~20x faster) and memory usage from over 8GB to less >> than 5GB (~40% >> less). > > Not in 1.5.1. Is it in 1.5.2 or just in cvs? 
Is there a single file I
> can put in my own perl lib for this, or does it require large
> bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.
>
> -Amir

It's in CVS.

Just to note: there have been a lot of changes between 1.5.1 and 1.5.2, and probably as many from 1.5.2 to now. We are cleaning up some code introduced prior to the 1.5 release and working on other fixes and code docs, with the final aim to be a new 1.6; I'm hoping that release will have routine point releases for bug fixes. Of course that'll have to wait until after SVN migration!

There have been a few discussions on the list about speeding up parsing using lightweight/featherweight objects or even straight hashes (for instance, Jason has a lightweight seqfeature implementation committed on a branch which is quite fast, and Sendu's Bio::SearchIO PullParser implementations). My feeling is that will be part of the next dev release, along with GFF3 integration and code cleanup.

chris

From bix at sendu.me.uk Thu Dec 20 18:29:30 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 23:29:30 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To:
References: <476AB8FD.8090108@sendu.me.uk>
Message-ID: <476AFADA.20604@sendu.me.uk>

Amir Karger wrote:
>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related
>>>> parsers; maybe for the next dev release.
>>> Also, BLAST parsing. Blasting the proteome against the
>>> genome makes for rather large result files.
>> This has already been done. Use Bio::SearchIO::blast_pull. In a
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less
>> than 5GB (~40% less).
>
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large bunches of
> new code? (I'm guessing the latter.)
We're about to upgrade to 1.5.2 > here, but I don't see our whole center using CVS Bioperl. blast_pull is only in CVS (and needs a whole bunch of associated modules to work), though 1.5.2 also contains significant improvements to SearchIO generally which should provide you with significant speed improvements during blast parsing with the normal Bio::SearchIO::blast.
From bug-bioperl at rt.cpan.org Fri Dec 21 07:07:39 2007 From: bug-bioperl at rt.cpan.org (Brandi Cantarel via RT) Date: Fri, 21 Dec 2007 07:07:39 -0500 Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr> References: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr> Message-ID: Fri Dec 21 07:07:30 2007: Request 31796 was acted upon. Transaction: Ticket created by brandi.cantarel at afmb.univ-mrs.fr Queue: bioperl Subject: SeqIO Broken in: (no value) Severity: (no value) Owner: Nobody Requestors: brandi.cantarel at afmb.univ-mrs.fr Status: new Ticket I might have found a bug in SeqIO in bioperl. Well it is actually a memory leak. When I try to load large file, I can step through the first 10K or so sequences (using next_seq) but then it just hangs..... If this bug is fixed please let me know. Brandi Cantarel From bug-bioperl at rt.cpan.org Fri Dec 21 08:57:20 2007 From: bug-bioperl at rt.cpan.org (Sendu Bala via RT) Date: Fri, 21 Dec 2007 08:57:20 -0500 Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr> References: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr> Message-ID: Queue: bioperl Ticket On Fri Dec 21 07:07:30 2007, brandi.cantarel at afmb.univ-mrs.fr wrote: > I might have found a bug in SeqIO in bioperl. Well it is actually a > memory leak. When I try to load large file, I can step through the > first 10K or so sequences (using next_seq) but then it just hangs..... > > If this bug is fixed please let me know. Please use http://bugzilla.bioperl.org/ to tell us about this bug. After creating a bug report you'll be able to attach the script in which you encounter the problem, which we need to diagnose this issue.
From susantoroy at gmail.com Sat Dec 22 07:06:42 2007 From: susantoroy at gmail.com (Susanta Roy) Date: Sat, 22 Dec 2007 17:36:42 +0530 Subject: [Bioperl-l] Enquiry about bioperl project Message-ID: <236a58340712220406m3d3f9884h8f7b5e58bdfb356@mail.gmail.com> Dear Sir, Most humbly I have to state that I am Susanta Roy, 25 years and I have done my masters in bioinformatics. I have more than nine months of work experience as Associate Technical Content Developer. I have also worked in the journal "Bioinformatics India" (The first bioinformatics journal of India, now "Bioinformatics Trends"). My work with previous employer was highly appreciated. This year I have founded Bioexplore, a bioinformatics KPO (Knowledge Process Outsourcing) due to lack of bioinformatics jobs in India. Our services include 1. Bioinformatics data mining / programming 2. HR solution 3. Technical writing solution 4. E-learning 5. Abstracing & indexing 6. Business promotion solution I want to inquire if you can give me a project. -- Looking forward to your reply. Kind Regards Mr. Susanta Roy, MS Bioinformatics Founder Director Bioexplore C-5, Hazipark Market Dimapur, Nagaland - 797112 India + 91 - 9811517324 (Mobile) susanta.roy at bioexplore.co.in susantoroy at gmail.com