From anjan.purkayastha at gmail.com Mon Mar 3 12:31:11 2008 From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA) Date: Mon, 3 Mar 2008 12:31:11 -0500 Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS Message-ID: hi i am tried to use the perl wrappers for EMBOSS with: use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/"; use Bio::Factory::EMBOSS; however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl directory mentioned above. so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the attached error message. any ideas on what i need to do to make this work? all advice will be appreciated. tia, anjan -- ANJAN PURKAYASTHA, PhD. Senior Computational Biologist ========================== 1101 King Street, Suite 310, Alexandria, VA 22314. 703.518.8040 (office) 703.740.6939 (mobile) email: anjan at vbi.vt.edu; anjan.purkayastha at gmail.com http://www.vbi.vt.edu ========================== -------------- next part -------------- A non-text attachment was scrubbed... Name: emboss_install_error_message.rtf Type: application/rtf Size: 123212 bytes Desc: not available URL: From cjfields at uiuc.edu Mon Mar 3 13:54:06 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 3 Mar 2008 12:54:06 -0600 Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS In-Reply-To: References: Message-ID: You'll need to install bioperl-run. Bio::Factory::EMBOSS is in bioperl-run, not the main bioperl distribution (aka bioperl-core). chris On Mar 3, 2008, at 11:31 AM, ANJAN PURKAYASTHA wrote: > hi > i am tried to use the perl wrappers for EMBOSS with: > > use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/"; > use Bio::Factory::EMBOSS; > > however it seems that Bio::Factory::EMBOSS cannot be found in the > bioperl > directory mentioned above. > > so i tried to install Bio::Factory::EMBOSS from the cpan website. i > got the > attached error message. > > any ideas on what i need to do to make this work? > all advice will be appreciated. 
> > tia, > > anjan > > > -- > ANJAN PURKAYASTHA, PhD. > Senior Computational Biologist > ========================== > > 1101 King Street, Suite 310, > Alexandria, VA 22314. > 703.518.8040 (office) > 703.740.6939 (mobile) > > email: > anjan at vbi.vt.edu; > anjan.purkayastha at gmail.com > > http://www.vbi.vt.edu > > ========================== > < > emboss_install_error_message > .rtf>_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From David.Messina at sbc.su.se Mon Mar 3 14:34:20 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 3 Mar 2008 20:34:20 +0100 Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS In-Reply-To: References: Message-ID: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com> Hi Anjan, Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but rather part of bioperl-run. For some reason CPAN went for the old (1.4) version of bioperl-run rather than the current 1.5.2. And indeed, I seem to run into the same problem: cpan> d /bioperl/ Distribution BIRNEY/bioperl-1.2.1.tar.gz Distribution BIRNEY/bioperl-1.2.2.tar.gz Distribution BIRNEY/bioperl-1.2.3.tar.gz Distribution BIRNEY/bioperl-1.2.tar.gz Distribution BIRNEY/bioperl-1.4.tar.gz Distribution BIRNEY/bioperl-db-0.1.tar.gz Distribution BIRNEY/bioperl-ext-1.4.tar.gz Distribution BIRNEY/bioperl-gui-0.7.tar.gz Distribution BIRNEY/bioperl-run-1.2.2.tar.gz Distribution BIRNEY/bioperl-run-1.4.tar.gz Distribution BOZO/Fry-Lib-BioPerl-0.15.tar.gz Distribution CRAFFI/Bundle-BioPerl-2.1.8.tar.gz 12 items found but when I ask in a different way the right distributions show up. [Sendu, any idea what's going on here?] 
cpan> ls SENDU 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz 320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz 99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz 942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz So try doing cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz Or if CPAN refuses to cooperate, you can grab it from here: http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release Dave From arareko at campus.iztacala.unam.mx Mon Mar 3 14:25:14 2008 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 03 Mar 2008 13:25:14 -0600 Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS In-Reply-To: References: Message-ID: <47CC509A.10306@campus.iztacala.unam.mx> Hi Anjan, It looks like you are using the latest BioPerl developer release (bioperl-1.5.2_102) from CPAN, to have Bio::Factory::EMBOSS available then you should try installing the latest BioPerl-run as well (bioperl-run-1.5.2_100). After you install it, you'll have to modify your 'use lib' pragma for your script to work as you expect: use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/"; use Bio::Factory::EMBOSS; Hope this helps. Regards, Mauricio. ANJAN PURKAYASTHA wrote: > hi > i am tried to use the perl wrappers for EMBOSS with: > > use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/"; > use Bio::Factory::EMBOSS; > > however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl > directory mentioned above. > > so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the > attached error message. > > any ideas on what i need to do to make this work? > all advice will be appreciated. 
>
> tia,
>
> anjan

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Mon Mar 3 15:05:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 14:05:16 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
References: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
Message-ID: <43EC247B-EC01-483D-82B1-D861590A141A@uiuc.edu>

On Mar 3, 2008, at 1:34 PM, Dave Messina wrote:

> Hi Anjan,
>
> Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
> rather part of bioperl-run. For some reason CPAN went for the old (1.4)
> version of bioperl-run rather than the current 1.5.2.
>
> And indeed, I seem to run into the same problem:
> cpan> d /bioperl/
>
> Distribution BIRNEY/bioperl-1.2.1.tar.gz
> Distribution BIRNEY/bioperl-1.2.2.tar.gz
> Distribution BIRNEY/bioperl-1.2.3.tar.gz
> Distribution BIRNEY/bioperl-1.2.tar.gz
> Distribution BIRNEY/bioperl-1.4.tar.gz
> Distribution BIRNEY/bioperl-db-0.1.tar.gz
> Distribution BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution BIRNEY/bioperl-run-1.4.tar.gz
> Distribution BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
> 12 items found
>
> but when I ask in a different way the right distributions show up.
> [Sendu, any idea what's going on here?]

It's marked as a developer release, which I think requires a full path
(as you have below) and not just the package name.
chris > cpan> ls > SENDU > 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz > 320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz > 99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz > 942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz > > So try doing > > cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz > > Or if CPAN refuses to cooperate, you can grab it from here: > http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release > > > Dave From anjan.purkayastha at gmail.com Mon Mar 3 14:57:33 2008 From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA) Date: Mon, 3 Mar 2008 14:57:33 -0500 Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS In-Reply-To: <47CC509A.10306@campus.iztacala.unam.mx> References: <47CC509A.10306@campus.iztacala.unam.mx> Message-ID: guys, thanks! i got bioperl-run to work. next question, let's say i want to run the palindrome program in emboss using the bioperl wrapper. now, palindrome takes in a list of parameter values- these are fed into emboss as a key-value hash. where do i find the correct names of the keys to create the input hash? tia. anjan On Mon, Mar 3, 2008 at 2:25 PM, Mauricio Herrera Cuadra < arareko at campus.iztacala.unam.mx> wrote: > Hi Anjan, > > It looks like you are using the latest BioPerl developer release > (bioperl-1.5.2_102) from CPAN, to have Bio::Factory::EMBOSS available > then you should try installing the latest BioPerl-run as well > (bioperl-run-1.5.2_100). After you install it, you'll have to modify > your 'use lib' pragma for your script to work as you expect: > > use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/"; > use Bio::Factory::EMBOSS; > > Hope this helps. > > Regards, > Mauricio. 
>
>
> ANJAN PURKAYASTHA wrote:
> > hi
> > i am tried to use the perl wrappers for EMBOSS with:
> >
> > use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> > use Bio::Factory::EMBOSS;
> >
> > however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
> > directory mentioned above.
> >
> > so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
> > attached error message.
> >
> > any ideas on what i need to do to make this work?
> > all advice will be appreciated.
> >
> > tia,
> >
> > anjan
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM

--
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email: anjan at vbi.vt.edu;
       anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================

From Daniel.Gerlach at medecine.unige.ch Tue Mar 4 03:48:15 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 09:48:15 +0100
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not implemented in the version of perl"
Message-ID: <47CD0CCF.4060306@medecine.unige.ch>

Hello,

Trying to run Bio::TreeIO by this command:

perl -e 'use Bio::TreeIO'

I get the following error:

Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76.
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
Compilation failed in require at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a
recent version of bioperl around 5 months ago. Any suggestions of why
this module can't be loaded correctly?

Greetings,

Daniel

From bix at sendu.me.uk Tue Mar 4 06:55:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 11:55:32 +0000
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not implemented in the version of perl"
In-Reply-To: <47CD0CCF.4060306@medecine.unige.ch>
References: <47CD0CCF.4060306@medecine.unige.ch>
Message-ID: <47CD38B4.1070200@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
>
> Trying to run Bio::TreeIO by this command:
>
> perl -e 'use Bio::TreeIO'
>
> I get the following error:
>
> Weak references are not implemented in the version of perl
> [...]
> I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a
> recent version of bioperl around 5 months ago. Any suggestions of why
> this module can't be loaded correctly?

Redhat/Fedora apparently has Perl issues. First try installing the
latest version of Scalar::Util yourself:

perl -MCPAN -e shell
force install Scalar::Util

If that doesn't work, you'll have to download and compile Perl yourself
from source (don't use Fedora's installation system).
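
A quick way to see whether a given perl has a usable weaken(), before or
after reinstalling Scalar::Util, is a short check like the one below. This
is only a sketch added for illustration and is not part of the original
reply: the helper name weaken_available is made up, and the script assumes
nothing beyond Scalar::Util itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util ();   # import nothing, so a missing weaken() cannot abort the load

# Hypothetical helper: returns 1 if weaken() exists and really weakens a
# reference, 0 otherwise.
sub weaken_available {
    my $ok = eval {
        my $target = {};
        my $copy   = $target;
        Scalar::Util::weaken($copy);          # dies if weaken() is not implemented
        Scalar::Util::isweak($copy) ? 1 : 0;  # confirm the copy is now a weak ref
    };
    return $ok ? 1 : 0;
}

# Show which Scalar::Util was picked up, then the result of the check.
print "Scalar::Util version: ", (Scalar::Util->VERSION || 'unknown'), "\n";
print "Loaded from: $INC{'Scalar/Util.pm'}\n";
print weaken_available()
    ? "weaken() works\n"
    : "weaken() is not usable with this perl/Scalar::Util\n";
```

If the check reports that weaken() is not usable, a Scalar::Util without
weak-reference support is probably the one being loaded, which is the
situation the forced reinstall above is meant to address.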
From apapanicolaou at ice.mpg.de Tue Mar 4 07:03:27 2008 From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou) Date: Tue, 04 Mar 2008 13:03:27 +0100 Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm Message-ID: <47CD3A8F.9050902@ice.mpg.de> hello all, 1) I was wondering if you would you know what this error means and had time to help... Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287 line 287 is else { $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin"; } this is the header # $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $ # # BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8, bioperl: tried with both 1.5.2_102 from cvs and checked out svn version today) use Bio::SearchIO::Writer::GbrowseGFF; use Bio::SearchIO; if ($program eq "blastn"){ #my $out_gff = new Bio::SearchIO(-writer => $writer_gff, my $out_gff = new Bio::SearchIO(-output_format => 'GbrowseGFF', -output_cigar => 1, -output_signif => 1, -file => ">$infile.$query.blast.gff"); #my $out_gff_whole = new Bio::SearchIO(-writer => $writer_gff, my $out_gff_whole = new Bio::SearchIO(-output_format => 'GbrowseGFF', -output_cigar => 1, -output_signif => 1, -file => ">>$infile.blast.gff"); $out_gff->write_result($result); $out_gff_whole->write_result($result); } Where $result is a blast result... The aim is to parse a multi-query blast report and split it into different queries and make another file with all the queries. I'm sure i'm forgetting something but I can't figure what... The GFF file is produced, but I do get the error above... 2) Finally, there is a small bug but I don't think it comes from this module? The id attribute is printed out e.g iD=match_sequence31 with iD wrongly capitalised... 
many thanks for your time
alexie

--
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561

From apapanicolaou at ice.mpg.de Tue Mar 4 07:04:16 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:04:16 +0100
Subject: [Bioperl-l] Gbrowse.pm followup
Message-ID: <47CD3AC0.4080801@ice.mpg.de>

Oh the iD bug is fixed in the svn developer branch.

ta
a

--
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561

From cjfields at uiuc.edu Tue Mar 4 08:16:04 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 07:16:04 -0600
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <4A68AA28-E508-4257-86E1-393CA9B74082@uiuc.edu>

I have run into a number of problems with the GbrowseGFF module myself
(I think I committed the ID fix, actually). It works but needs revision
and needs better conformity with GFF3.

You can post (1) as a bug and we'll look into it when we can. It's
possible (depending on how extensive the fix is) this may have to wait
until 1.7.

chris

On Mar 4, 2008, at 6:03 AM, Alexie Papanicolaou wrote:

> hello all,
>
> 1) I was wondering if you would you know what this error means and
> had time to help...
>
> Use of uninitialized value in concatenation (.) or string at /usr/
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
> else {
> $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
> }
>
> this is the header
> # $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
> #
> # BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm
>
>
> this is how I call it...
( 2.6.18-6-amd64, x86_64, perl, v5.8.8, > bioperl: tried with both 1.5.2_102 from cvs and checked out svn > version today) > > use Bio::SearchIO::Writer::GbrowseGFF; > use Bio::SearchIO; > if ($program eq "blastn"){ > #my $out_gff = new Bio::SearchIO(-writer => $writer_gff, > my $out_gff = new Bio::SearchIO(-output_format => 'GbrowseGFF', > -output_cigar => 1, > -output_signif => 1, > -file => ">$infile.$query.blast.gff"); > #my $out_gff_whole = new Bio::SearchIO(-writer => $writer_gff, > my $out_gff_whole = new Bio::SearchIO(-output_format => 'GbrowseGFF', > -output_cigar => 1, > -output_signif => 1, > -file => ">>$infile.blast.gff"); > $out_gff->write_result($result); > $out_gff_whole->write_result($result); > } > > > > Where $result is a blast result... > > The aim is to parse a multi-query blast report and split it into > different queries and make another file with all the queries. I'm > sure i'm forgetting something but I can't figure what... > > The GFF file is produced, but I do get the error above... > > 2) Finally, there is a small bug but I don't think it comes from > this module? The id attribute is printed out e.g iD=match_sequence31 > with iD wrongly capitalised... > > many thanks for your time > alexie > > -- > -- > Alexie Papanicolaou > Entomology > Max Planck Institute for Chemical Ecology > Hans Knoell Str 8 > Jena 07745 > Germany > Email apapanicolaou at ice.mpg.de > Tel +493641571561 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Daniel.Gerlach at medecine.unige.ch Tue Mar 4 07:35:03 2008 From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach) Date: Tue, 04 Mar 2008 13:35:03 +0100 Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an alignment References: <200502151616.j1FGGnKr023827@portal.open-bio.org> Message-ID: <47CD41F7.2000401@medecine.unige.ch> Hello, Is it possible to remove only columns containing e.g. more than 75% gaps from an alignment? I was thinking at $aln2 = $aln->remove_gaps('-'[,$all_gaps_columns]) This would allow me to remove all gaps or gap-only columns but not using a threshold. Greetings, Daniel From Daniel.Gerlach at medecine.unige.ch Tue Mar 4 08:46:33 2008 From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach) Date: Tue, 04 Mar 2008 14:46:33 +0100 Subject: [Bioperl-l] branch length score - total length of the spanning subtree Message-ID: <47CD52B9.5060906@medecine.unige.ch> Hello, I would like to use bioperl to calculate a branch length score for a given set of nodes and a tree. I know how to get the total branch length by using $tree->total_branch_length, but how could I get the length of the subtree spanning some given nodes which are dispersed over the whole tree (a subset of nodes from the tree which are not monophyletic)? Greetings, Daniel From bix at sendu.me.uk Tue Mar 4 09:37:53 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 04 Mar 2008 14:37:53 +0000 Subject: [Bioperl-l] branch length score - total length of the spanning subtree In-Reply-To: <47CD52B9.5060906@medecine.unige.ch> References: <47CD52B9.5060906@medecine.unige.ch> Message-ID: <47CD5EC1.2020103@sendu.me.uk> Daniel Gerlach wrote: > Hello, > > I would like to use bioperl to calculate a branch length score for a > given set of nodes and a tree. 
I know how to get the total branch length > by using $tree->total_branch_length, but how could I get the length of > the subtree spanning some given nodes which are dispersed over the whole > tree (a subset of nodes from the tree which are not monophyletic)? One 'cheat' way of doing it might be to use splice(-keep_ids => \@node_ids) or similar, then run total_branch_length() on that. No idea if it will actually give you the right answer though. Let us know! :) From bix at sendu.me.uk Tue Mar 4 09:26:10 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 04 Mar 2008 14:26:10 +0000 Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an alignment In-Reply-To: <47CD41F7.2000401@medecine.unige.ch> References: <200502151616.j1FGGnKr023827@portal.open-bio.org> <47CD41F7.2000401@medecine.unige.ch> Message-ID: <47CD5C02.8060306@sendu.me.uk> Daniel Gerlach wrote: > Hello, > > Is it possible to remove only columns containing e.g. more than 75% gaps > from an alignment? I was thinking at > > $aln2 = $aln->remove_gaps('-'[,$all_gaps_columns]) > > This would allow me to remove all gaps or gap-only columns but not using > a threshold. Well, you can use gap_col_matrix() to decide which columns you don't want, and then use remove_columns(). From hlapp at gmx.net Tue Mar 4 10:24:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 4 Mar 2008 10:24:13 -0500 Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm In-Reply-To: <47CD3A8F.9050902@ice.mpg.de> References: <47CD3A8F.9050902@ice.mpg.de> Message-ID: <87808BE4-B6A3-4C7F-A6DC-42ED2686375B@gmx.net> On Mar 4, 2008, at 7:03 AM, Alexie Papanicolaou wrote: > Use of uninitialized value in concatenation (.) or string at /usr/ > local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287 > > line 287 is > else { > $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin"; > } Note that this is a warning, not an error. 
However, if none of $prefix, $seqname, $qpmax, $qpmin can be undefined (or be equal to an empty string, which they will default to if undefined) at this position, then there is a problem (and it is before the above line). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Mar 4 11:02:02 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 4 Mar 2008 11:02:02 -0500 Subject: [Bioperl-l] branch length score - total length of the spanning subtree In-Reply-To: <47CD5EC1.2020103@sendu.me.uk> References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk> Message-ID: On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote: > Daniel Gerlach wrote: >> Hello, >> I would like to use bioperl to calculate a branch length score for >> a given set of nodes and a tree. I know how to get the total >> branch length by using $tree->total_branch_length, but how could I >> get the length of the subtree spanning some given nodes which are >> dispersed over the whole tree (a subset of nodes from the tree >> which are not monophyletic)? > > One 'cheat' way of doing it might be to use splice(-keep_ids => > \@node_ids) or similar, then run total_branch_length() on that. No > idea if it will actually give you the right answer though. Let us > know! :) Related to that, will contract_linear_paths() actually do the right thing and adjust branch lengths if it removes internal nodes with outdegree 1? Rutger - does Bio::Phylo handle this correctly? 
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Daniel.Gerlach at medecine.unige.ch Tue Mar 4 11:12:53 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 17:12:53 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <47CD7505.5080105@medecine.unige.ch>

Hello,

Thanks for the quick answer. I tried:

use Bio::TreeIO;
my $treeio = Bio::TreeIO->new(-format => 'newick', -fh => \*DATA);
my $tree = $treeio->next_tree;
print $tree->total_branch_length,"\n";
$tree->splice(-keep_id => [A,B,E]);
print $tree->total_branch_length,"\n";
__DATA__
(((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);

Which gives me the message "MSG: After splicing, the original root was
removed but there are multiple candidates for the new root!" however the
root E was not removed.

If I do it the complementary way by splicing out all unwanted nodes -
splice(-remove_id => [C,D]) - I get what I want:

34
25

Greetings,

Daniel

Sendu Bala wrote:
> Daniel Gerlach wrote:
>> Hello,
>>
>> I would like to use bioperl to calculate a branch length score for a
>> given set of nodes and a tree. I know how to get the total branch
>> length by using $tree->total_branch_length, but how could I get the
>> length of the subtree spanning some given nodes which are dispersed
>> over the whole tree (a subset of nodes from the tree which are not
>> monophyletic)?
>
> One 'cheat' way of doing it might be to use splice(-keep_ids =>
> \@node_ids) or similar, then run total_branch_length() on that. No idea
> if it will actually give you the right answer though. Let us know!
:)

From bix at sendu.me.uk Tue Mar 4 11:37:47 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 16:37:47 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning subtree
In-Reply-To:
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <47CD7ADB.6050808@sendu.me.uk>

Hilmar Lapp wrote:
>
> On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:
>
>> Daniel Gerlach wrote:
>>> Hello,
>>> I would like to use bioperl to calculate a branch length score for a
>>> given set of nodes and a tree. I know how to get the total branch
>>> length by using $tree->total_branch_length, but how could I get the
>>> length of the subtree spanning some given nodes which are dispersed
>>> over the whole tree (a subset of nodes from the tree which are not
>>> monophyletic)?
>>
>> One 'cheat' way of doing it might be to use splice(-keep_ids =>
>> \@node_ids) or similar, then run total_branch_length() on that. No
>> idea if it will actually give you the right answer though. Let us
>> know! :)
>
> Related to that, will contract_linear_paths() actually do the right
> thing and adjust branch lengths if it removes internal nodes with
> outdegree 1?

I think ultimately it boils down to remove_Descendent() being called as
appropriate which does the branch length alteration. From a glance I
can't answer your question with certainty, but it 'should' do the right
thing.

It needs to be tested; when I implemented these things I was only
concerned with tree topology, not branch lengths or anything else.

From David.Messina at sbc.su.se Tue Mar 4 15:47:06 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 4 Mar 2008 21:47:06 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To:
References: <47CC509A.10306@campus.iztacala.unam.mx>
Message-ID: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>

> where do i find the
> correct names of the keys to create the input hash?
I've never used this module, but from a quick look at the code it appears to pass on any parameters to palindrome. I'm guessing you've already done this, but have you tried using the parameter names and values that palindrome itself asks for? Dave From cjfields at uiuc.edu Tue Mar 4 16:34:21 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 4 Mar 2008 15:34:21 -0600 Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl Message-ID: I don't know what the current status is for OBDA, but we have several bugs listed for Bio::DB::Flat which need someone versed in OBDA to look at them (they are all interrelated): http://bugzilla.open-bio.org/show_bug.cgi?id=2336 http://bugzilla.open-bio.org/show_bug.cgi?id=2337 http://bugzilla.open-bio.org/show_bug.cgi?id=2338 http://bugzilla.open-bio.org/show_bug.cgi?id=2339 If anyone has any input I would greatly appreciate it. I have been trying to stomp as many bugs as possible so we can work on a new release. chris From bosborne11 at verizon.net Tue Mar 4 16:42:05 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 04 Mar 2008 16:42:05 -0500 Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl In-Reply-To: References: Message-ID: Chris, I'll take a look at them this weekend. Brian O. On Mar 4, 2008, at 4:34 PM, Chris Fields wrote: > I don't know what the current status is for OBDA, but we have > several bugs listed for Bio::DB::Flat which need someone versed in > OBDA to look at them (they are all interrelated): > > http://bugzilla.open-bio.org/show_bug.cgi?id=2336 > http://bugzilla.open-bio.org/show_bug.cgi?id=2337 > http://bugzilla.open-bio.org/show_bug.cgi?id=2338 > http://bugzilla.open-bio.org/show_bug.cgi?id=2339 > > If anyone has any input I would greatly appreciate it. I have been > trying to stomp as many bugs as possible so we can work on a new > release. 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From anjan.purkayastha at gmail.com Tue Mar 4 18:52:09 2008 From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA) Date: Tue, 4 Mar 2008 18:52:09 -0500 Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS In-Reply-To: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com> References: <47CC509A.10306@campus.iztacala.unam.mx> <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com> Message-ID: guys, thanks for all your inputs. i went to the following site: http://www.koders.com/perl/fid5F28A3DDD453F0DB4995B7DDF304B02DBBACE0A0.aspx?s=calculate they have the key names for most of the emboss programs. thanks, anjan On Tue, Mar 4, 2008 at 3:47 PM, Dave Messina wrote: > > where do i find the > > correct names of the keys to create the input hash? > > > > I've never used this module, but from a quick look at the code it appears > to pass on any parameters to palindrome. > > I'm guessing you've already done this, but have you tried using the > parameter names and values that palindrome itself asks for? > > > Dave > > -- ANJAN PURKAYASTHA, PhD. Senior Computational Biologist ========================== 1101 King Street, Suite 310, Alexandria, VA 22314. 703.518.8040 (office) 703.740.6939 (mobile) email: anjan at vbi.vt.edu; anjan.purkayastha at gmail.com http://www.vbi.vt.edu ========================== From staffa at niehs.nih.gov Wed Mar 5 18:43:30 2008 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS)) Date: Wed, 05 Mar 2008 18:43:30 -0500 Subject: [Bioperl-l] SeqIO Message-ID: So the Howto says that Bio::SeqIO will read almost any known format including GCG. So I create a GCG file with Seqlab and try to printout its sequence as a string. 
( I did guess at the way to get the sequence string:

#!/usr/bin/perl -w
use strict;
$| = 1;
use Bio::SeqIO;
my $number_of_files = @ARGV;
if(!$number_of_files){print "no files entered\n";exit;}
foreach my $file (@ARGV){
    my $seqio_object = Bio::SeqIO->new(-file => $file);
    my $seq_object = $seqio_object->next_seq;
    my $sequence = $seq_object->seq;
    print "$sequence\n";
    my $status = &windowscore($sequence);
}

But what it returned was the entire contents of the file with no format
decoding. Have I been deluded?

NewDNALength:810March5,200818:26Type:NCheck:3368..1TGTTCGAATTCCGTGCGGTCCACCT
CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGGCGAAGGT
T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGCGGCTGCT
GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGTGCAGAGC
GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGGGCCAGCG
GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAGTCCCCTG
GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG451GGCAG
AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGGAGACATC
AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGGCCGCCC6
01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCTTCATGCG
CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCGCAGCCGC
TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCAGGG

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W.
Reter (reter at niehs.nih.gov) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From cjfields at uiuc.edu Wed Mar 5 21:22:53 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 5 Mar 2008 20:22:53 -0600 Subject: [Bioperl-l] SeqIO In-Reply-To: References: Message-ID: I thought GCG format changed somewhere along the way but maybe I'm wrong? Regardless, you'll have to post this as a bug (along with an example file). Also, kind of odd that the sequence data wasn't checked... chris On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote: > So the Howto says that Bio::SeqIO will read almost any known format > including GCG. > So I create a GCG file with Seqlab and try to printout its sequence > as a > string. ( I did guess at the way to get the sequence string: > > #!/usr/bin/perl -w > use strict; > $| = 1; > use Bio::SeqIO; > my $number_of_files = @ARGV; > if(!$number_of_files){print "no files entered\n";exit:} > foreach my $file (@ARGV){ > my $seqio_object = Bio::SeqIO->new(-file => $file); > my $seq_object = $seqio_object->next_seq; > my $sequence = $seq_object->seq; > print "$sequence\n"; > my $status = &windowscore($sequence); > } > > But what it returned was the entire contents of the file with no > format > decoding. Have I been deluded?
Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Wed Mar 5 21:33:48 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 5 Mar 2008 18:33:48 -0800 Subject: [Bioperl-l] SeqIO In-Reply-To: References: Message-ID: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org> probably you should try specifying the format explicitly first- as in (-format => 'gcg') -j On Mar 5, 2008, at 6:22 PM, Chris Fields wrote: > I thought GCG format changed somewhere along the way but I maybe > I'm wrong? Regardless, you'll have to post this as a bug (along > with an example file). > > Also, kind of odd that the sequence data wasn't checked... > > chris > > On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote: > >> So the Howto says that Bio::SeqIO will read almost any known format >> including GCG. >> So I create a GCG file with Seqlab and try to printout its >> sequence as a >> string. ( I did guess at the way to get the sequence string: >> >> #!/usr/bin/perl -w >> use strict; >> $| = 1; >> use Bio::SeqIO; >> my $number_of_files = @ARGV; >> if(!$number_of_files){print "no files entered\n";exit:} >> foreach my $file (@ARGV){ >> my $seqio_object = Bio::SeqIO->new(-file => $file); >> my $seq_object = $seqio_object->next_seq; >> my $sequence = $seq_object->seq; >> print "$sequence\n"; >> my $status = &windowscore($sequence); >> } >> >> But what it returned was the entire contents of the file with no >> format >> decoding. Have I been deluded? 
> Christopher Fields > Postdoctoral Researcher > Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Wed Mar 5 21:01:07 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 05 Mar 2008 21:01:07 -0500 Subject: [Bioperl-l] SeqIO In-Reply-To: References: Message-ID: <19DC527F-3D34-4F3E-9B4C-D2C6011A2C8F@verizon.net> Nick, Take a look at the GCG files that are used in the SeqIO tests: bioperl-live//t/data/test.gcg bioperl-live//t/data/test_badlf.gcg Does the file that you created have a format like the format in those files? I'm guessing you're going to say 'yes', from the looks of your output. Brian O. On Mar 5, 2008, at 6:43 PM, Staffa, Nick (NIH/NIEHS) wrote: > So the Howto says that Bio::SeqIO will read almost any known format > including GCG. > So I create a GCG file with Seqlab and try to printout its sequence > as a > string. ( I did guess at the way to get the sequence string: > > #!/usr/bin/perl -w > use strict; > $| = 1; > use Bio::SeqIO; > my $number_of_files = @ARGV; > if(!$number_of_files){print "no files entered\n";exit:} > foreach my $file (@ARGV){ > my $seqio_object = Bio::SeqIO->new(-file => $file); > my $seq_object = $seqio_object->next_seq; > my $sequence = $seq_object->seq; > print "$sequence\n"; > my $status = &windowscore($sequence); > } > > But what it returned was the entire contents of the file with no > format > decoding. Have I been deluded? 

From staffa at niehs.nih.gov Wed Mar 5 22:09:11 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 22:09:11 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>
Message-ID:

Verily,
One interpretation of the docs might be: will read any format if the format is specified.
I was hoping that I could write a program that one needn't specify format.
It'd be more user-friendly and useful.
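
One way to avoid asking the user for a format is to sniff the file before handing it on. What follows is only a rough sketch in plain Perl, not anything taken from BioPerl: sniff_format() and gcg_sequence() are made-up helper names, the header pattern (a line with Length:, Type: and Check: that ends in "..") is an assumption about single-sequence GCG files, and the record is a shortened, hand-made one modelled on the file in this thread, so its Length and Check values are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 'gcg' if one of the first 20 lines looks like a GCG header
# line, otherwise undef. The pattern is an assumption about the format.
sub sniff_format {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    my $limit = @lines < 20 ? $#lines : 19;
    for my $line (@lines[0 .. $limit]) {
        return 'gcg'
            if $line =~ /Length:\s*\d+.*Type:\s*\S+.*Check:\s*\d+\s*\.\.\s*$/;
    }
    return;
}

# Return the residues of a GCG record: everything after the line that
# ends in "..", with position numbers and whitespace removed.
sub gcg_sequence {
    my ($text) = @_;
    my ($header, $body) = split /\.\.[ \t]*\n/, $text, 2;
    return unless defined $body;
    $body =~ s/[\s\d]+//g;
    return $body;
}

# Shortened, hand-made record; Length and Check are placeholders and
# the checksum has not been recomputed.
my $record = <<'END';
!!NA_SEQUENCE 1.0
NewDNA  Length: 50  March 5, 2008 18:26  Type: N  Check: 3368  ..

       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
END

my $format = sniff_format($record);
if (defined $format && $format eq 'gcg') {
    my $seq = gcg_sequence($record);
    print length($seq), " residues: $seq\n";
}
else {
    print "format not recognised; ask the user for one\n";
}
```

In a real program the value returned by the sniffer would more likely be passed on as -format => 'gcg' than used to parse the file by hand; the hand parser is here only to show where the embedded position numbers come from.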
On 3/5/08 9:33 PM, "Jason Stajich" wrote: > probably you should try specifying the format explicitly first- as in > (-format => 'gcg') > > -j > On Mar 5, 2008, at 6:22 PM, Chris Fields wrote: > >> I thought GCG format changed somewhere along the way but I maybe >> I'm wrong? Regardless, you'll have to post this as a bug (along >> with an example file). >> >> Also, kind of odd that the sequence data wasn't checked... >> >> chris >> >> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote: >> >>> So the Howto says that Bio::SeqIO will read almost any known format >>> including GCG. >>> So I create a GCG file with Seqlab and try to printout its >>> sequence as a >>> string. ( I did guess at the way to get the sequence string: >>> >>> #!/usr/bin/perl -w >>> use strict; >>> $| = 1; >>> use Bio::SeqIO; >>> my $number_of_files = @ARGV; >>> if(!$number_of_files){print "no files entered\n";exit:} >>> foreach my $file (@ARGV){ >>> my $seqio_object = Bio::SeqIO->new(-file => $file); >>> my $seq_object = $seqio_object->next_seq; >>> my $sequence = $seq_object->seq; >>> print "$sequence\n"; >>> my $status = &windowscore($sequence); >>> } >>> >>> But what it returned was the entire contents of the file with no >>> format >>> decoding. Have I been deluded? 
>> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr.
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Mar 5 22:44:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 5 Mar 2008 21:44:14 -0600 Subject: [Bioperl-l] SeqIO In-Reply-To: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net> References: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net> Message-ID: <9146DF9D-C0D6-4F18-9B7E-7BB42FCE0737@uiuc.edu> Heh, good one! Though Jason may have worked out the issue (not indicating the format explicitly). Would be worth looking at the tested files. As for dinosaurs, well I can't talk ... chris On Mar 5, 2008, at 8:49 PM, Brian Osborne wrote: > Chris, > > Many many years ago, when dinosaurs roamed the earth, only about > half of the formats had their own tests. A primitive being saw this > and created simple tests for all the 'missing' formats. His thought > probably was 'this is better than nothing'. In fact this being > assumed that GCG was an outdated and unused format, even as long ago > as that time was. > > The origins of so much of what we now know as 'Bioperl' are > frequently mysterious, or incomprehensible to modern day humans... > > Brian O. > > On Mar 5, 2008, at 9:22 PM, Chris Fields wrote: > >> Also, kind of odd that the sequence data wasn't checked... From bosborne11 at verizon.net Wed Mar 5 21:49:26 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 05 Mar 2008 21:49:26 -0500 Subject: [Bioperl-l] SeqIO In-Reply-To: References: Message-ID: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net> Chris, Many many years ago, when dinosaurs roamed the earth, only about half of the formats had their own tests. A primitive being saw this and created simple tests for all the 'missing' formats. His thought probably was 'this is better than nothing'. 
In fact this being assumed that GCG was an outdated and unused format, even as long ago as that time was. The origins of so much of what we now know as 'Bioperl' are frequently mysterious, or incomprehensible to modern day humans... Brian O. On Mar 5, 2008, at 9:22 PM, Chris Fields wrote: > Also, kind of odd that the sequence data wasn't checked... From cjfields at uiuc.edu Wed Mar 5 22:54:15 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 5 Mar 2008 21:54:15 -0600 Subject: [Bioperl-l] SeqIO In-Reply-To: References: Message-ID: <67C6AE9D-3934-4717-A97A-4C31DB4F7E33@uiuc.edu> You can leave off the format, but you must append the correct file extension for the parser to determine the correct format ('.gcg' for GCG, for example). There is also Bio::Tools::GuessSeqFormat though it doesn't cover all formats. chris On Mar 5, 2008, at 9:09 PM, Staffa, Nick (NIH/NIEHS) wrote: > Verily, > One interpretation of the docs might be: will read any format if the > format > is specified. > I was hoping that I could write a program that one needn't specify > format. > It'd be more user-friendly and useful. > > > On 3/5/08 9:33 PM, "Jason Stajich" wrote: > >> probably you should try specifying the format explicitly first- as in >> (-format => 'gcg') >> >> -j >> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote: >> >>> I thought GCG format changed somewhere along the way but I maybe >>> I'm wrong? Regardless, you'll have to post this as a bug (along >>> with an example file). >>> >>> Also, kind of odd that the sequence data wasn't checked... >>> >>> chris >>> >>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote: >>> >>>> So the Howto says that Bio::SeqIO will read almost any known format >>>> including GCG. >>>> So I create a GCG file with Seqlab and try to printout its >>>> sequence as a >>>> string. 
( I did guess at the way to get the sequence string: >>>> >>>> #!/usr/bin/perl -w >>>> use strict; >>>> $| = 1; >>>> use Bio::SeqIO; >>>> my $number_of_files = @ARGV; >>>> if(!$number_of_files){print "no files entered\n";exit:} >>>> foreach my $file (@ARGV){ >>>> my $seqio_object = Bio::SeqIO->new(-file => $file); >>>> my $seq_object = $seqio_object->next_seq; >>>> my $sequence = $seq_object->seq; >>>> print "$sequence\n"; >>>> my $status = &windowscore($sequence); >>>> } >>>> >>>> But what it returned was the entire contents of the file with no >>>> format >>>> decoding. Have I been deluded? >>>> >>>> NewDNALength:810March5,200818:26Type:NCheck: >>>> 3368..1TGTTCGAATTCCGTGCGGTCCACCT >>>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGG >>>> CGAAGGT >>>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGC >>>> GGCTGCT >>>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGT >>>> GCAGAGC >>>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGG >>>> GCCAGCG >>>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAG >>>> TCCCCTG >>>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG4 >>>> 51GGCAG >>>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGG >>>> AGACATC >>>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGG >>>> CCGCCC6 >>>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCT >>>> TCATGCG >>>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCG >>>> CAGCCGC >>>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCA >>>> GGG >>>> >>>> >>>> >>>> Nick Staffa >>>> Telephone: 919-316-4569 (NIEHS: 6-4569) >>>> Scientific Computing Support Group >>>> NIEHS Information Technology Support Services Contract >>>> (Science Task Monitor: Roy W. 
Reter (reter at niehs.nih.gov)
>>>> National Institute of Environmental Health Sciences
>>>> National Institutes of Health
>>>> Research Triangle Park, North Carolina
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ewijaya at gmail.com Thu Mar 6 03:16:25 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 6 Mar 2008 16:16:25 +0800
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
Message-ID: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>

Dear experts,

Is there any? The TRANSFAC text file contains entries like this.
In particular, we wish to capture the PWM for each transcription factor.

Regards,
Edward

__BEGIN__
VV  TRANSFAC MATRIX TABLE, Release 11.1 - licensed - 2007-03-31, (C) Biobase GmbH
XX
//
AC  M00001
XX
ID  V$MYOD_01
XX
DT  19.10.1992 (created); ewi.
DT  22.10.1997 (updated); dbo.
CO  Copyright (C), Biobase GmbH.
XX
NA  MyoD
XX
DE  myoblast determination gene product
XX
BF  T00526; MyoD; Species: mouse, Mus musculus.
BF  T09177; MyoD; Species: mouse, Mus musculus.
XX
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
....etc....
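
For a record laid out like the one above, the count matrix can probably be pulled out with plain Perl. The following is a minimal sketch, not a BioPerl module: parse_transfac_matrices() is a made-up name, the code assumes the two-letter line codes seen in the sample (AC, ID, a P0 header naming the columns, numbered rows, // between records), and it ignores every other field. The frequencies it prints are simple row-normalised counts with no pseudocounts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse TRANSFAC matrix records from a string. Returns a list of hashes:
#   { ac => ..., id => ..., counts => [ { A => n, C => n, G => n, T => n }, ... ] }
sub parse_transfac_matrices {
    my ($text) = @_;
    my (@records, %rec, @bases);
    for my $line (split /\n/, $text) {
        if ($line =~ m{^//}) {                      # record separator
            push @records, {%rec} if $rec{counts};
            %rec   = ();
            @bases = ();
        }
        elsif ($line =~ /^AC\s+(\S+)/) { $rec{ac} = $1 }
        elsif ($line =~ /^ID\s+(\S+)/) { $rec{id} = $1 }
        elsif ($line =~ /^P[0O]\s+(.+)/) {          # column header: A C G T
            @bases = split ' ', $1;
            $rec{counts} = [];
        }
        elsif (@bases && $line =~ /^\d+\s+(.+)/) {  # one matrix row
            my @fields = split ' ', $1;
            my %row;
            @row{@bases} = @fields[0 .. $#bases];   # drop the consensus letter
            push @{ $rec{counts} }, \%row;
        }
    }
    push @records, {%rec} if $rec{counts};          # last record may lack //
    return @records;
}

# The MyoD record quoted in this thread, trimmed to the fields used here.
my $data = <<'END';
VV  TRANSFAC MATRIX TABLE, Release 11.1 - licensed - 2007-03-31, (C) Biobase GmbH
XX
//
AC  M00001
XX
ID  V$MYOD_01
XX
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
END

my ($matrix) = parse_transfac_matrices($data);
print "$matrix->{ac} $matrix->{id}\n";
my $pos = 0;
for my $row (@{ $matrix->{counts} }) {
    $pos++;
    my $total = 0;
    $total += $row->{$_} for qw(A C G T);
    # Relative frequencies only; a real PWM would usually add pseudocounts.
    my @freq = map { $total ? $row->{$_} / $total : 0 } qw(A C G T);
    printf "%2d  %s\n", $pos, join '  ', map { sprintf '%.2f', $_ } @freq;
}
```

Keeping the rows as hashes keyed by base means the code does not depend on the column order in the P0 line, at the cost of a little memory.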

From watashi at post.com Thu Mar 6 07:06:42 2008
From: watashi at post.com (Masa Masa)
Date: Thu, 6 Mar 2008 07:06:42 -0500
Subject: [Bioperl-l] failure of add_seqfeature
Message-ID: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>

Dear experts,

Would anybody know why the following code generates this error?

------------- EXCEPTION -------------
MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained within parent feature, and expansion is not valid
STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
STACK toplevel test.pl:118
--------------------------------------
15616 15693
79568 83016

=================

use Bio::Graphics;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;

my $bsg = 'Bio::SeqFeature::Generic';

my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], -display_name=>'U');

for (my $i=0; $i < @from; $i++) {
    print "$from[$i] $to[$i]\n";
    $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to[$i]));
    if ($i > 10) {
        exit;
    }
}

From heikki at sanbi.ac.za Thu Mar 6 07:20:03 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 6 Mar 2008 14:20:03 +0200
Subject: [Bioperl-l] SeqIO
In-Reply-To:
References:
Message-ID: <200803061420.04123.heikki@sanbi.ac.za>

Nick,

This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:

/Length: .*Type: .*Check: .*\.\.$/

It is the second line in a GCG file. If the first line matches some other format's regex, this will not be evaluated.

Let us know,

-Heikki

On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
> Verily,
> One interpretation of the docs might be: will read any format if the format
> is specified.
> I was hoping that I could write a program that one needn't specify format.
> It'd be more user-friendly and useful.
> > On 3/5/08 9:33 PM, "Jason Stajich" wrote: > > probably you should try specifying the format explicitly first- as in > > (-format => 'gcg') > > > > -j > > > > On Mar 5, 2008, at 6:22 PM, Chris Fields wrote: > >> I thought GCG format changed somewhere along the way but I maybe > >> I'm wrong? Regardless, you'll have to post this as a bug (along > >> with an example file). > >> > >> Also, kind of odd that the sequence data wasn't checked... > >> > >> chris > >> > >> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote: > >>> So the Howto says that Bio::SeqIO will read almost any known format > >>> including GCG. > >>> So I create a GCG file with Seqlab and try to printout its > >>> sequence as a > >>> string. ( I did guess at the way to get the sequence string: > >>> > >>> #!/usr/bin/perl -w > >>> use strict; > >>> $| = 1; > >>> use Bio::SeqIO; > >>> my $number_of_files = @ARGV; > >>> if(!$number_of_files){print "no files entered\n";exit:} > >>> foreach my $file (@ARGV){ > >>> my $seqio_object = Bio::SeqIO->new(-file => $file); > >>> my $seq_object = $seqio_object->next_seq; > >>> my $sequence = $seq_object->seq; > >>> print "$sequence\n"; > >>> my $status = &windowscore($sequence); > >>> } > >>> > >>> But what it returned was the entire contents of the file with no > >>> format > >>> decoding. Have I been deluded? 
> >> Christopher Fields > >> Postdoctoral Researcher > >> Lab of Dr.
Robert Switzer > >> Dept of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From bix at sendu.me.uk Thu Mar 6 08:07:21 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 06 Mar 2008 13:07:21 +0000 Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database In-Reply-To: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com> References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com> Message-ID: <47CFEC89.1000705@sendu.me.uk> Edward Wijaya wrote: > Dear experts, > > Is there any? The TRANSFAC text file which contain entry like this. > Especially we wich to capture the PWM for each of the Transcription > factor. Yes; I've written a module to do this, I just haven't committed it yet because certain things aren't quite right in terms of the API. But to just grab the PWM it should work fine. If you want I can email you the modules. 
From sdavis2 at mail.nih.gov Thu Mar 6 08:40:25 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 6 Mar 2008 08:40:25 -0500 Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database In-Reply-To: <47CFEC89.1000705@sendu.me.uk> References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com> <47CFEC89.1000705@sendu.me.uk> Message-ID: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala wrote: > Edward Wijaya wrote: > > Dear experts, > > > > Is there any? The TRANSFAC text file which contain entry like this. > > Especially we wich to capture the PWM for each of the Transcription > > factor. > > Yes; I've written a module to do this, I just haven't committed it yet > because certain things aren't quite right in terms of the API. But to > just grab the PWM it should work fine. If you want I can email you the > modules. I believe there are a set of non-bioperl modules called TFBS. See here (although I'm not sure this is the most up-to-date site): http://tfbs.genereg.net/ Sean From David.Messina at sbc.su.se Thu Mar 6 09:55:24 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 6 Mar 2008 15:55:24 +0100 Subject: [Bioperl-l] failure of add_seqfeature In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com> References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com> Message-ID: <628aabb70803060655k5245296etf5ee2f31755230d3@mail.gmail.com> Hi Masa, Could you give us a little more information? A complete test case (the code you included doesn't run because for example the @from array doesn't exist) and input file would be helpful, as well as the version of BioPerl you are using. 
Dave

From staffa at niehs.nih.gov Thu Mar 6 10:23:34 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 10:23:34 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <200803061420.04123.heikki@sanbi.ac.za>
Message-ID:

Here's the scoop:
When I use Jason's suggestion (-format => 'gcg'), my program works without complaint on the original file, which looks like:

!!NA_SEQUENCE 1.0
NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 ..
1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
etc.

BUT if I remove the first line to test Bio::Tools::GuessSeqFormat (which should be retro-gcg format (before version 11?)), my program runs, but there IS a complaint:

Use of uninitialized value in scalar chomp at /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, line 1.

BUT if I remove (-format => 'gcg'), I get no complaint, but the sequence returned still has its numbers embedded. This affects my calculations.

Thanks, at least I know what my options are.

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" wrote:

>
> Nick,
>
> This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:
>
> /Length: .*Type: .*Check: .*\.\.$/
>
> It is the second line in GCG file. If first line matches to some other format
> regex, this will not not be evaluated.
>
> Let us know,
>
> -Heikki
>
> On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
>> Verily,
>> One interpretation of the docs might be: will read any format if the format
>> is specified.
>> I was hoping that I could write a program that one needn't specify format.
>> It'd be more user-friendly and useful.
>> >> On 3/5/08 9:33 PM, "Jason Stajich" wrote: >>> probably you should try specifying the format explicitly first- as in >>> (-format => 'gcg') >>> >>> -j >>> >>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote: >>>> I thought GCG format changed somewhere along the way but I maybe >>>> I'm wrong? Regardless, you'll have to post this as a bug (along >>>> with an example file). >>>> >>>> Also, kind of odd that the sequence data wasn't checked... >>>> >>>> chris >>>> >>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote: >>>>> So the Howto says that Bio::SeqIO will read almost any known format >>>>> including GCG. >>>>> So I create a GCG file with Seqlab and try to printout its >>>>> sequence as a >>>>> string. ( I did guess at the way to get the sequence string: >>>>> >>>>> #!/usr/bin/perl -w >>>>> use strict; >>>>> $| = 1; >>>>> use Bio::SeqIO; >>>>> my $number_of_files = @ARGV; >>>>> if(!$number_of_files){print "no files entered\n";exit:} >>>>> foreach my $file (@ARGV){ >>>>> my $seqio_object = Bio::SeqIO->new(-file => $file); >>>>> my $seq_object = $seqio_object->next_seq; >>>>> my $sequence = $seq_object->seq; >>>>> print "$sequence\n"; >>>>> my $status = &windowscore($sequence); >>>>> } >>>>> >>>>> But what it returned was the entire contents of the file with no >>>>> format >>>>> decoding. Have I been deluded? 
>>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr.
Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hlapp at gmx.net Thu Mar 6 10:26:52 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Mar 2008 10:26:52 -0500 Subject: [Bioperl-l] failure of add_seqfeature In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com> References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com> Message-ID: <6BD917FC-803E-471B-A0C4-219286E53C47@gmx.net> It seems you are adding subfeatures with a location that is not within their parent feature location. If that's indeed what you want to do, add the 'EXPAND' argument. Excerpted from the POD of Bio::SeqFeature::Generic: Usage : $feat->add_SeqFeature($subfeat); $feat->add_SeqFeature($subfeat,'EXPAND') Function: adds a SeqFeature into the subSeqFeature array. with no 'EXPAND' qualifer, subfeat will be tested as to whether it lies inside the parent, and throw an exception if not. 
If EXPAND is used, the parent's start/end/strand will be adjusted so that it grows to accommodate the new subFeature

On Mar 6, 2008, at 7:06 AM, Masa Masa wrote:

> Dear experts,
>
> Would anybody know why the following codes generate an error of:
>
>
> ------------- EXCEPTION -------------
> MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained within parent feature, and expansion is not valid
> STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
> STACK toplevel test.pl:118
>
> --------------------------------------
> 15616 15693
> 79568 83016
>
> =================
>
>
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
> use Bio::SeqIO;
>
>
> my $bsg = 'Bio::SeqFeature::Generic';
>
> my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], -display_name=>'U');
>
> for (my $i=0; $i < @from; $i++) {
> print "$from[$i] $to[$i]\n";
> $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to[$i]));
> if ($i > 10) {
> exit;
> }
> }
>
> --
> Want an e-mail address like mine?
> Get a free e-mail account today at www.mail.com!
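A minimal sketch of the loop quoted above, reworked along the lines of the 'EXPAND' suggestion. The coordinate arrays are hypothetical (the first two pairs echo the values printed in the quoted output, the third is made up), and the parent is built from the first pair on the assumption that this is what the original script effectively did, since $i was not yet set when the parent was created. It is an illustration only, not a confirmed fix for the poster's script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Hypothetical coordinates for illustration only.
my @from = (15616, 79568, 90000);
my @to   = (15693, 83016, 90500);

# Create the parent once, from explicitly chosen coordinates.
my $parent = Bio::SeqFeature::Generic->new(
    -start        => $from[0],
    -end          => $to[0],
    -display_name => 'U',
);

for my $i (0 .. $#from) {
    my $sub = Bio::SeqFeature::Generic->new(
        -start => $from[$i],
        -end   => $to[$i],
    );
    # With 'EXPAND' the parent grows to cover the subfeature
    # instead of throwing the "not contained" exception.
    $parent->add_SeqFeature($sub, 'EXPAND');
}

my @subs = $parent->get_SeqFeatures;
printf "parent spans %d..%d with %d subfeatures\n",
    $parent->start, $parent->end, scalar @subs;
```

Whether growing the parent is appropriate depends on the data; if the subfeatures are really meant to sit inside a fixed parent, the coordinates rather than the call may be what needs checking.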
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bix at sendu.me.uk Thu Mar 6 10:41:49 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 06 Mar 2008 15:41:49 +0000 Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database In-Reply-To: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com> References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com> <47CFEC89.1000705@sendu.me.uk> <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com> Message-ID: <47D010BD.4000801@sendu.me.uk> Sean Davis wrote: > On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala wrote: >> Edward Wijaya wrote: >> > Dear experts, >> > >> > Is there any? The TRANSFAC text file which contain entry like this. >> > Especially we wich to capture the PWM for each of the Transcription >> > factor. >> >> Yes; I've written a module to do this, I just haven't committed it yet >> because certain things aren't quite right in terms of the API. But to >> just grab the PWM it should work fine. If you want I can email you the >> modules. > > I believe there are a set of non-bioperl modules called TFBS. See > here (although I'm not sure this is the most up-to-date site): > > http://tfbs.genereg.net/ I believe it's out of date enough to not work on the latest Transfac data, though I haven't used tried to confirm. At any rate, the Transfac (Pro) database is pretty strange and complicated, and the TFBS modules certainly don't let you access everything in the way you might want or expect. From cain.cshl at gmail.com Thu Mar 6 11:43:35 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 06 Mar 2008 11:43:35 -0500 Subject: [Bioperl-l] anonymous cvs? 
Message-ID: <1204821815.6689.7.camel@frissell> Hi All, So now that the transition to svn is complete (and I like it), should anonymous cvs still be working? I believe there was discussion about keeping it going via mirroring, and I hope that is the case. It will make life a little easier for people who want to do automated installs of GBrowse and would like to use the installer script to get bioperl via anon cvs. If anon cvs is no longer available, does anyone have suggestions for the best route to take for getting command line svn on Windows? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cain.cshl at gmail.com Thu Mar 6 11:48:08 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 06 Mar 2008 11:48:08 -0500 Subject: [Bioperl-l] anonymous cvs? In-Reply-To: <1204821815.6689.7.camel@frissell> References: <1204821815.6689.7.camel@frissell> Message-ID: <1204822088.6689.8.camel@frissell> I should have mentioned that I tried it and it is not currently working: $ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl checkout bioperl-live can't create temporary directory /tmp/cvs-serv32067 No space left on device On Thu, 2008-03-06 at 11:43 -0500, Scott Cain wrote: > Hi All, > > So now that the transition to svn is complete (and I like it), should > anonymous cvs still be working? I believe there was discussion about > keeping it going via mirroring, and I hope that is the case. It will > make life a little easier for people who want to do automated installs > of GBrowse and would like to use the installer script to get bioperl via > anon cvs. If anon cvs is no longer available, does anyone have > suggestions for the best route to take for getting command line svn on > Windows? 
>
> Thanks,
> Scott
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From Marc.Logghe at ablynx.com Thu Mar 6 11:22:10 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 6 Mar 2008 17:22:10 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To:
Message-ID: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>

Hi Nick,

I don't think you should leave out the -format option. You have to leave it in, but the format should be provided by the B::T::GuessSeqFormat object. Something like:

#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;
$| = 1;   # unbuffered output
my $number_of_files = @ARGV;
if (!$number_of_files) { print "no files entered\n"; exit; }
foreach my $file (@ARGV) {
    # let the guesser inspect the file, then pass its answer to SeqIO
    my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
    my $seqio_object = Bio::SeqIO->new(-file => $guesser->file,
                                       -format => $guesser->guess);
    my $seq_object = $seqio_object->next_seq;
    my $sequence = $seq_object->seq;
    print "$sequence\n";
}

HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
> Sent: Thursday, 6 March 2008 16:24
> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] SeqIO
>
> Here's the scoop:
> When I use Jason's suggestion, (-format => 'gcg'),
> My program works without complaint on the original file that looks like:
> !!NA_SEQUENCE 1.0
> NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 ..
>
> 1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
> et c.
> > BUT if I remove the first line to test Bio::Tools::GuessSeqFormat, > (which should be retro-gcg format (before version 11?)), > my program runs, but there IS a complaint: > Use of uninitialized value in scalar chomp at > /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, line 1. > BUT > If I remove (-format => 'gcg'), I get no complaint, but the sequence > returned still has its numbers imbedded. This effects my calculations. > > Thanks, at least i know what my options are. > > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > > > > > > > > > On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" wrote: > > > > > Nick, > > > > This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg > file: > > > > /Length: .*Type: .*Check: .*\.\.$/ > > > > It is the second line in GCG file. If first line matches to some other > format > > regex, this will not not be evaluated. > > > > Let us know, > > > > -Heikki > > > > On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote: > >> Verily, > >> One interpretation of the docs might be: will read any format if the > format > >> is specified. > >> I was hoping that I could write a program that one needn't specify > format. > >> It'd be more user-friendly and useful. > >> > >> On 3/5/08 9:33 PM, "Jason Stajich" wrote: > >>> probably you should try specifying the format explicitly first- as in > >>> (-format => 'gcg') > >>> > >>> -j > >>> > >>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote: > >>>> I thought GCG format changed somewhere along the way but I maybe > >>>> I'm wrong? Regardless, you'll have to post this as a bug (along > >>>> with an example file). 
> >>>> > >>>> Also, kind of odd that the sequence data wasn't checked... > >>>> > >>>> chris > >>>> > >>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote: > >>>>> So the Howto says that Bio::SeqIO will read almost any known format > >>>>> including GCG. > >>>>> So I create a GCG file with Seqlab and try to printout its > >>>>> sequence as a > >>>>> string. ( I did guess at the way to get the sequence string: > >>>>> > >>>>> #!/usr/bin/perl -w > >>>>> use strict; > >>>>> $| = 1; > >>>>> use Bio::SeqIO; > >>>>> my $number_of_files = @ARGV; > >>>>> if(!$number_of_files){print "no files entered\n";exit:} > >>>>> foreach my $file (@ARGV){ > >>>>> my $seqio_object = Bio::SeqIO->new(-file => $file); > >>>>> my $seq_object = $seqio_object->next_seq; > >>>>> my $sequence = $seq_object->seq; > >>>>> print "$sequence\n"; > >>>>> my $status = &windowscore($sequence); > >>>>> } > >>>>> > >>>>> But what it returned was the entire contents of the file with no > >>>>> format > >>>>> decoding. Have I been deluded? 
> >>>>> > >>>>> NewDNALength:810March5,200818:26Type:NCheck: > >>>>> 3368..1TGTTCGAATTCCGTGCGGTCCACCT > >>>>> > CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGG > >>>>> CGAAGGT > >>>>> > T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGC > >>>>> GGCTGCT > >>>>> > GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGT > >>>>> GCAGAGC > >>>>> > GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGG > >>>>> GCCAGCG > >>>>> > GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAG > >>>>> TCCCCTG > >>>>> > GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG4 > >>>>> 51GGCAG > >>>>> > AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGG > >>>>> AGACATC > >>>>> > AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGG > >>>>> CCGCCC6 > >>>>> > 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCT > >>>>> TCATGCG > >>>>> > CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCG > >>>>> CAGCCGC > >>>>> > TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCA > >>>>> GGG > >>>>> > >>>>> > >>>>> > >>>>> Nick Staffa > >>>>> Telephone: 919-316-4569 (NIEHS: 6-4569) > >>>>> Scientific Computing Support Group > >>>>> NIEHS Information Technology Support Services Contract > >>>>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov) > >>>>> National Institute of Environmental Health Sciences > >>>>> National Institutes of Health > >>>>> Research Triangle Park, North Carolina > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> Christopher Fields > >>>> Postdoctoral Researcher > >>>> Lab of Dr. 
Robert Switzer > >>>> Dept of Biochemistry > >>>> University of Illinois Urbana-Champaign > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From stefan.kirov at bms.com Thu Mar 6 10:51:25 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 06 Mar 2008 10:51:25 -0500 Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database In-Reply-To: <47D010BD.4000801@sendu.me.uk> References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com> <47CFEC89.1000705@sendu.me.uk> <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com> <47D010BD.4000801@sendu.me.uk> Message-ID: <47D012FD.7090600@bms.com> Sendu Bala wrote: > Sean Davis wrote: >> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala wrote: >>> Edward Wijaya wrote: >>> > Dear experts, >>> > >>> > Is there any? The TRANSFAC text file which contain entry like this. >>> > Especially we wich to capture the PWM for each of the Transcription >>> > factor. >>> >>> Yes; I've written a module to do this, I just haven't committed it yet >>> because certain things aren't quite right in terms of the API. But to >>> just grab the PWM it should work fine. If you want I can email you the >>> modules. >> >> I believe there are a set of non-bioperl modules called TFBS. See >> here (although I'm not sure this is the most up-to-date site): >> >> http://tfbs.genereg.net/ > > I believe it's out of date enough to not work on the latest Transfac > data, though I haven't used tried to confirm. 
> > At any rate, the Transfac (Pro) database is pretty strange and > complicated, and the TFBS modules certainly don't let you access > everything in the way you might want or expect. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > Also be careful: there is a difference between PFM and PWM. Getting PWM through most programs I have encountered will assume random distribution (0.25 per each position in the background), unless you specify your own. This could be something you may be comfortable with, but you definitely should be aware of. From jay at jays.net Thu Mar 6 12:03:51 2008 From: jay at jays.net (Jay Hannah) Date: Thu, 06 Mar 2008 11:03:51 -0600 Subject: [Bioperl-l] anonymous cvs? In-Reply-To: <1204821815.6689.7.camel@frissell> References: <1204821815.6689.7.camel@frissell> Message-ID: <47D023F7.4000803@jays.net> Scott Cain wrote: > It will make life a little easier for people who want to do automated installs > of GBrowse and would like to use the installer script to get bioperl via > anon cvs. Those installer scripts can't use anon SVN instead? > If anon cvs is no longer available, does anyone have > suggestions for the best route to take for getting command line svn on > Windows? > At $work our Windows guys use GUIs for both CVS (repo dead this summer) and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And there isn't an SVN equivalent? j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From whs at ebi.ac.uk Thu Mar 6 12:08:51 2008 From: whs at ebi.ac.uk (William Spooner) Date: Thu, 6 Mar 2008 17:08:51 +0000 Subject: [Bioperl-l] anonymous cvs? In-Reply-To: <1204821815.6689.7.camel@frissell> References: <1204821815.6689.7.camel@frissell> Message-ID: <07E3119E-0354-4E93-9980-3CB2B26DF2BE@ebi.ac.uk> This will be important for Ensembl as well. As far as I know all of their install docs refer to BioPerl's anonymous CVS. 
On 6 Mar 2008, at 16:43, Scott Cain wrote:

> Hi All,
>
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working? I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case. It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs. If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>
> Thanks,
> Scott
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

---
William Spooner
Visiting Scientist
whs at ebi.ac.uk

From MEC at stowers-institute.org Thu Mar 6 11:58:57 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 6 Mar 2008 10:58:57 -0600
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com> <47CFEC89.1000705@sendu.me.uk> <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com> <47D010BD.4000801@sendu.me.uk>
Message-ID:

We use TFBS all the time against data coming from a recent local install of TRANSFAC(r) Professional 11.1 (2007-03-31); the most recent is 11.4 (2007-12-14).

TFBS::* has the nice advantage that you can interoperate Transfac PWMs with other (say, Jaspar) matrices and/or simple consensus sequence patterns; and it COULD be fairly easily extended to allow interoperation with other sources, say cisRED. "One interface to rule them all" - bwa ha ha.
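As an aside on the PFM versus PWM point raised earlier in this thread, the effect of the background assumption can be sketched in plain Perl. This does not use TFBS or any BioPerl module; the function name pfm_to_pwm, the pseudocount scheme, and the example counts are all assumptions chosen for illustration, and the log-odds formula is one common convention rather than what any particular program is known to do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a position frequency matrix (raw counts per base and column)
# into a log2-odds position weight matrix.
# Hypothetical helper: background defaults to 0.25 per base.
sub pfm_to_pwm {
    my ($pfm, %opt) = @_;
    my $bg = $opt{background}
          || { A => 0.25, C => 0.25, G => 0.25, T => 0.25 };
    # A pseudocount above zero avoids taking the log of zero.
    my $pseudo = defined $opt{pseudocount} ? $opt{pseudocount} : 1;
    my @bases  = qw(A C G T);
    my $width  = scalar @{ $pfm->{A} };
    my %pwm;
    for my $col (0 .. $width - 1) {
        my $total = 0;
        $total += $pfm->{$_}[$col] for @bases;
        for my $base (@bases) {
            # Smoothed probability of this base in this column
            my $p = ($pfm->{$base}[$col] + $pseudo * $bg->{$base})
                  / ($total + $pseudo);
            $pwm{$base}[$col] = log($p / $bg->{$base}) / log(2);
        }
    }
    return \%pwm;
}

# Made-up counts: two strongly conserved columns, one uninformative column.
my %pfm = (
    A => [8, 0, 2],
    C => [0, 8, 2],
    G => [0, 0, 2],
    T => [0, 0, 2],
);

my $uniform = pfm_to_pwm(\%pfm);
my $gc_rich = pfm_to_pwm(\%pfm,
    background => { A => 0.1, C => 0.4, G => 0.4, T => 0.1 });

printf "A in column 3: uniform background %.3f, GC-rich background %.3f\n",
    $uniform->{A}[2], $gc_rich->{A}[2];
```

The same counts give different weights under the two backgrounds, which is why it is worth knowing which background a given tool assumes before comparing scores.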
However, if you DO have locally installed Transfac (Pro) ($$), and want to use just it, then you should know that you can also call their `match` routines from the unix command line (though this is not documented to my knowledge). I can supply my cheat sheet or otherwise advise if desired. Also, if you go this way, I've written the requisite TFMatchOut2GFF to convert TRANSFAC match's output to GFF, if it suits your purpose, which I could release if asked.

If you want to use TFBS::**, I have written a command-line wrapper for the TFBS perl modules that might give you a leg up. I could release it too, if useful.

But I agree: if I recall, TFBS::* were dropped from ongoing active development due to issues with data access policies. And I think that they no longer work with remotely hosted Transfac. They did a few years ago. I think I tested a while ago and found that they do not.

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, March 06, 2008 9:42 AM
> To: Sean Davis
> Cc: bioperl-l at lists.open-bio.org; Edward Wijaya
> Subject: Re: [Bioperl-l] BioPerl Module to Parse Transfac
> Flat File Database
>
> Sean Davis wrote:
> > On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala wrote:
> >> Edward Wijaya wrote:
> >> > Dear experts,
> >> >
> >> > Is there any? The TRANSFAC text file which contain
> entry like this.
> >> > Especially we wich to capture the PWM for each of the
> >> Transcription
> factor.
> >>
> >> Yes; I've written a module to do this, I just haven't
> committed it
> >> yet because certain things aren't quite right in terms of
> the API.
> >> But to just grab the PWM it should work fine. If you want I can
> >> email you the modules.
> >
> > I believe there are a set of non-bioperl modules called TFBS.
See > > here (although I'm not sure this is the most up-to-date site): > > > > http://tfbs.genereg.net/ > > I believe it's out of date enough to not work on the latest > Transfac data, though I haven't used tried to confirm. > > At any rate, the Transfac (Pro) database is pretty strange > and complicated, and the TFBS modules certainly don't let you > access everything in the way you might want or expect. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Mar 6 12:10:35 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Mar 2008 11:10:35 -0600 Subject: [Bioperl-l] anonymous cvs? In-Reply-To: <1204821815.6689.7.camel@frissell> References: <1204821815.6689.7.camel@frissell> Message-ID: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu> BioPerl CVS is no longer being updated; you have to use Subversion to grab the latest (we have anon. svn set up for this). We discussed syncing svn commits over to cvs but found it way too problematic and decided to make a clean break. The best option I can think of as a replacement (so everyone isn't dependent on installing svn to get Gbrowse and bioperl-live) is to get a cron job set up which drops a bioperl-live archive into bioperl.org/ DIST or bioperl.org/SRC. We have already talked about doing this for nightly builds from svn main trunk; we can probably set that up on our end. Would that be feasible as a fallback in case svn isn't present? The subversion project page has information on Windows versions: http://subversion.tigris.org/project_packages.html chris On Mar 6, 2008, at 10:43 AM, Scott Cain wrote: > Hi All, > > So now that the transition to svn is complete (and I like it), should > anonymous cvs still be working? I believe there was discussion about > keeping it going via mirroring, and I hope that is the case. 
It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs. If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>
> Thanks,
> Scott
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cain.cshl at gmail.com Thu Mar 6 12:22:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:22:29 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
References: <1204821815.6689.7.camel@frissell> <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
Message-ID: <1204824149.6689.14.camel@frissell>

Hi Chris,

I think a nightly generated tarball would be sufficient for my use. We used anon cvs to get the latest bioperl and then threw it away once it was installed, so a tarball is just as good, if not better, since users wouldn't need to install svn. Not needing to install svn is a good thing for all my users, since I think many distributions do not supply it by default.

Thanks,
Scott

On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote:
> BioPerl CVS is no longer being updated; you have to use Subversion to
> grab the latest (we have anon. svn set up for this). We discussed
> syncing svn commits over to cvs but found it way too problematic and
> decided to make a clean break.
> > The best option I can think of as a replacement (so everyone isn't > dependent on installing svn to get Gbrowse and bioperl-live) is to get > a cron job set up which drops a bioperl-live archive into bioperl.org/ > DIST or bioperl.org/SRC. We have already talked about doing this for > nightly builds from svn main trunk; we can probably set that up on our > end. Would that be feasible as a fallback in case svn isn't present? > > The subversion project page has information on Windows versions: > > http://subversion.tigris.org/project_packages.html > > chris > > On Mar 6, 2008, at 10:43 AM, Scott Cain wrote: > > > Hi All, > > > > So now that the transition to svn is complete (and I like it), should > > anonymous cvs still be working? I believe there was discussion about > > keeping it going via mirroring, and I hope that is the case. It will > > make life a little easier for people who want to do automated installs > > of GBrowse and would like to use the installer script to get bioperl > > via > > anon cvs. If anon cvs is no longer available, does anyone have > > suggestions for the best route to take for getting command line svn on > > Windows? > > > > Thanks, > > Scott > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. cain.cshl at gmail.com > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cain.cshl at gmail.com Thu Mar 6 12:28:13 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 06 Mar 2008 12:28:13 -0500 Subject: [Bioperl-l] anonymous cvs? In-Reply-To: <47D023F7.4000803@jays.net> References: <1204821815.6689.7.camel@frissell> <47D023F7.4000803@jays.net> Message-ID: <1204824493.6689.19.camel@frissell> Hi Jay, It could use anon svn, though svn is considerably less ubiquitous, so it effectively adds another prerequisite. For cvs, the GUI WinCVS provides command line cvs as well. I was wondering if there was an easy to install equivalent for svn, though it may be moot for me if the powers that be will provide a nightly tarball :-) Scott On Thu, 2008-03-06 at 11:03 -0600, Jay Hannah wrote: > Scott Cain wrote: > > It will make life a little easier for people who want to do automated installs > > of GBrowse and would like to use the installer script to get bioperl via > > anon cvs. > > Those installer scripts can't use anon SVN instead? > > > If anon cvs is no longer available, does anyone have > > suggestions for the best route to take for getting command line svn on > > Windows? > > > > At $work our Windows guys use GUIs for both CVS (repo dead this summer) > and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And > there isn't an SVN equivalent? > > j > http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjfields at uiuc.edu Thu Mar 6 12:28:36 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Mar 2008 11:28:36 -0600 Subject: [Bioperl-l] anonymous cvs? In-Reply-To: <1204824149.6689.14.camel@frissell> References: <1204821815.6689.7.camel@frissell> <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu> <1204824149.6689.14.camel@frissell> Message-ID: I'm working on the nightly build script now and will post back when everything is set up. chris On Mar 6, 2008, at 11:22 AM, Scott Cain wrote: > Hi Chris, > > I think a nightly generated tarball would be sufficient for my use. > We > used anon cvs to get the lastest bioperl and then threw it away once > it > was installed, so a tarball is just as good,if not better, since users > wouldn't need to install svn. Not needing to install svn is good > thing > for all my users, since I think many distributions do not supply it by > default. > > Thanks, > Scott > > > > On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote: >> BioPerl CVS is no longer being updated; you have to use Subversion to >> grab the latest (we have anon. svn set up for this). We discussed >> syncing svn commits over to cvs but found it way too problematic and >> decided to make a clean break. >> >> The best option I can think of as a replacement (so everyone isn't >> dependent on installing svn to get Gbrowse and bioperl-live) is to >> get >> a cron job set up which drops a bioperl-live archive into >> bioperl.org/ >> DIST or bioperl.org/SRC. We have already talked about doing this for >> nightly builds from svn main trunk; we can probably set that up on >> our >> end. Would that be feasible as a fallback in case svn isn't present? 
>> >> The subversion project page has information on Windows versions: >> >> http://subversion.tigris.org/project_packages.html >> >> chris >> >> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote: >> >>> Hi All, >>> >>> So now that the transition to svn is complete (and I like it), >>> should >>> anonymous cvs still be working? I believe there was discussion >>> about >>> keeping it going via mirroring, and I hope that is the case. It >>> will >>> make life a little easier for people who want to do automated >>> installs >>> of GBrowse and would like to use the installer script to get bioperl >>> via >>> anon cvs. If anon cvs is no longer available, does anyone have >>> suggestions for the best route to take for getting command line >>> svn on >>> Windows? >>> >>> Thanks, >>> Scott >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. cain.cshl at gmail.com >>> GMOD Coordinator (http://www.gmod.org/) >>> 216-392-3087 >>> Cold Spring Harbor Laboratory >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Mar 6 15:38:22 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Mar 2008 14:38:22 -0600 Subject: [Bioperl-l] anonymous cvs? 
In-Reply-To: References: <1204821815.6689.7.camel@frissell> <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu> <1204824149.6689.14.camel@frissell> Message-ID: <2F746C5B-902C-4510-AEA3-2C46D4F51E7A@uiuc.edu> Okay, I have set up nightly builds for bioperl-live, db, network, and run here: http://www.bioperl.org/DIST/nightly_builds/ ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds At the moment this is running via a crontab off a script in my portal account, retrieving everything via anon. svn and bundling it up into zip and tarball archives. I would like to set it up to grab everything off dev but I don't want to mess with my ssh setup, so if anyone has ideas there... The script also adds a CHANGELOG file (last 10 commits) and removes the .svn directories prior to bundling. The archive name has the subversion revision number and date included; md5 checksums are in the SIGNATURES file. I'll check on it again tomorrow to make sure cron ran it. We can probably set up automated PPM builds as well; might be worth testing down the road (we need a way to set defaults for Build args prior to getting that running). chris On Mar 6, 2008, at 11:28 AM, Chris Fields wrote: > I'm working on the nightly build script now and will post back when > everything is set up. > > chris > > On Mar 6, 2008, at 11:22 AM, Scott Cain wrote: > >> Hi Chris, >> >> I think a nightly generated tarball would be sufficient for my >> use. We >> used anon cvs to get the lastest bioperl and then threw it away >> once it >> was installed, so a tarball is just as good,if not better, since >> users >> wouldn't need to install svn. Not needing to install svn is good >> thing >> for all my users, since I think many distributions do not supply it >> by >> default. >> >> Thanks, >> Scott >> >> >> >> On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote: >>> BioPerl CVS is no longer being updated; you have to use Subversion >>> to >>> grab the latest (we have anon. svn set up for this). 
We discussed >>> syncing svn commits over to cvs but found it way too problematic and >>> decided to make a clean break. >>> >>> The best option I can think of as a replacement (so everyone isn't >>> dependent on installing svn to get Gbrowse and bioperl-live) is to >>> get >>> a cron job set up which drops a bioperl-live archive into >>> bioperl.org/ >>> DIST or bioperl.org/SRC. We have already talked about doing this >>> for >>> nightly builds from svn main trunk; we can probably set that up on >>> our >>> end. Would that be feasible as a fallback in case svn isn't >>> present? >>> >>> The subversion project page has information on Windows versions: >>> >>> http://subversion.tigris.org/project_packages.html >>> >>> chris >>> >>> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote: >>> >>>> Hi All, >>>> >>>> So now that the transition to svn is complete (and I like it), >>>> should >>>> anonymous cvs still be working? I believe there was discussion >>>> about >>>> keeping it going via mirroring, and I hope that is the case. It >>>> will >>>> make life a little easier for people who want to do automated >>>> installs >>>> of GBrowse and would like to use the installer script to get >>>> bioperl >>>> via >>>> anon cvs. If anon cvs is no longer available, does anyone have >>>> suggestions for the best route to take for getting command line >>>> svn on >>>> Windows? >>>> >>>> Thanks, >>>> Scott >>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. cain.cshl at gmail.com >>>> GMOD Coordinator (http://www.gmod.org/) >>>> 216-392-3087 >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Mar 6 16:48:37 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Mar 2008 15:48:37 -0600 Subject: [Bioperl-l] Nightly build archives now available Message-ID: We now have nightly bundled archives for bioperl-live, bioperl-db, bioperl-run, and bioperl-network running; these will be updated ~ 1:00 am every night. http://www.bioperl.org/DIST/nightly_builds/ ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds The archives are date-stamped and also have the Subversion revision, just in case one wanted to ensure they get the correct version for the bug fix. They also contain a CHANGELOG file for the last 10 revisions (if there are any). These are currently derived off the anon. svn repository. chris From David.Messina at sbc.su.se Thu Mar 6 18:50:04 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 7 Mar 2008 00:50:04 +0100 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: Message-ID: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> Very slick and well-thought-out, Chris -- nice job! 
Dave From hlapp at gmx.net Thu Mar 6 19:06:41 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Mar 2008 19:06:41 -0500 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: Message-ID: Awesome - thanks for doing this, Chris! -hilmar On Mar 6, 2008, at 4:48 PM, Chris Fields wrote: > We now have nightly bundled archives for bioperl-live, bioperl-db, > bioperl-run, and bioperl-network running; these will be updated ~ > 1:00 am every night. > > http://www.bioperl.org/DIST/nightly_builds/ > ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds > > The archives are date-stamped and also have the Subversion > revision, just in case one wanted to ensure they get the correct > version for the bug fix. They also contain a CHANGELOG file for > the last 10 revisions (if there are any). These are currently > derived off the anon. svn repository. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From staffa at niehs.nih.gov Thu Mar 6 18:27:31 2008 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS)) Date: Thu, 06 Mar 2008 18:27:31 -0500 Subject: [Bioperl-l] SeqIO In-Reply-To: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com> Message-ID: Thanks I really appreciate all the interest given and help generated. that sure sounds like a great idea, but i think Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself. Is there a substitute? It works great with >> !!NA_SEQUENCE 1.0 >> NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 .. >> >> 1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT >> et c. 
as seen in: gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more guesser guesses gcg TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCA GAGCCTCGGGA et c. (yes, I added my $file_type = $guesser->guess; print "guesser guesses $file_type\n"; ) BUT when applied to a genbank sequence passed thru the Seqlab editor and turned into GCG, to wit: !!NA_SEQUENCE 1.0 LOCUS HSPGK2G 1911 bp DNA PRI 12-SEP-1993 DEFINITION Human testis-specific PGK-2 gene for phosphoglycerate kinase (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3). ACCESSION X05246 Y00261 ... ... BASE COUNT 583 a 367 c 442 g 519 t ORIGIN HSPGK2G Length: 1911 August 24, 1998 10:56 Type: N Check: 4156 .. 1 GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC et c. It thinks it is a flawed PIR: gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more guesser guesses pir ------------- EXCEPTION ------------- MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0 LOCUS HSPGK2G 1911 bp DNA PRI 12-SEP-1993 Must look at why guesser is thinking PIR. On 3/6/08 11:22 AM, "Marc Logghe" wrote: > Hi Nick, > I don't think you should leave out the -format option. You have to leave > it in but the format should be provided by the B::T::GuessSeqFormat > object. 
> Something like: > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Tools::GuessSeqFormat; > > $| = 1; > my $number_of_files = @ARGV; > if(!$number_of_files){print "no files entered\n";exit;} > foreach my $file (@ARGV){ > my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file); > my $seqio_object = Bio::SeqIO->new(-file => $guesser->file, -format => > $guesser->guess); > my $seq_object = $seqio_object->next_seq; > my $sequence = $seq_object->seq; > print "$sequence\n"; > } > > HTH, > Marc > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS) >> Sent: Thursday 6 March 2008 16:24 >> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org >> Cc: Chris Fields >> Subject: Re: [Bioperl-l] SeqIO >> >> Here's the scoop: >> When I use Jason's suggestion, (-format => 'gcg'), >> My program works without complaint on the original file that looks > like: >> !!NA_SEQUENCE 1.0 >> NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 .. >> >> 1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT >> et c. >> >> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat, >> (which should be retro-gcg format (before version 11?)), >> my program runs, but there IS a complaint: >> Use of uninitialized value in scalar chomp at >> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, line > 1. >> BUT >> If I remove (-format => 'gcg'), I get no complaint, but the sequence >> returned still has its numbers embedded. This affects my calculations. >> >> Thanks, at least i know what my options are. >> >> >> >> Nick Staffa >> Telephone: 919-316-4569 (NIEHS: 6-4569) >> Scientific Computing Support Group >> NIEHS Information Technology Support Services Contract >> (Science Task Monitor: Roy W.
Reter (reter at niehs.nih.gov) >> National Institute of Environmental Health Sciences >> National Institutes of Health >> Research Triangle Park, North Carolina > From cjfields at uiuc.edu Thu Mar 6 23:32:39 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Mar 2008 22:32:39 -0600 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> Message-ID: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> I would like to get automated PPM builds set up as well but I think we have to rework some Build.PL stuff to get that going. The next thing is to set up a regular script to check test/POD coverage. chris On Mar 6, 2008, at 5:50 PM, Dave Messina wrote: > Very slick and well-thought-out, Chris -- nice job! > > > Dave From Marc.Logghe at ablynx.com Fri Mar 7 04:04:35 2008 From: Marc.Logghe at ablynx.com (Marc Logghe) Date: Fri, 7 Mar 2008 10:04:35 +0100 Subject: [Bioperl-l] SeqIO In-Reply-To: Message-ID: <03C512635899144083CADB0EE22201890172A938@alpaca.lan.ablynx.com> Ahh, my reply did not make much sense when I took a new look. I was the one who learnt something here :-) Did not know that Bio::SeqIO was already using B::T::GuessSeqFormat under the hood. Learnt as well that you have to be careful with the filename extension because this seems to have precedence. Regards, Marc > -----Original Message----- > From: Staffa, Nick (NIH/NIEHS) [mailto:staffa at niehs.nih.gov] > Sent: Friday 7 March 2008 0:28 > To: Marc Logghe; Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org > Cc: Chris Fields > Subject: Re: [Bioperl-l] SeqIO > > Thanks > I really appreciate all the interest given and help generated. > that sure sounds like a great idea, but i think > Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself. > Is there a substitute?
> It works great with > >> !!NA_SEQUENCE 1.0 > >> NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 .. > >> > >> 1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT > >> et c. > > as seen in: > gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more > guesser guesses gcg > TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAG GG > CA > GAGCCTCGGGA et c. > (yes, I added > my $file_type = $guesser->guess; > print "guesser guesses $file_type\n"; > ) > > BUT > when applied to a genbank sequence passed thru the Seqlab editor and > turned > into GCG, to wit: > !!NA_SEQUENCE 1.0 > LOCUS HSPGK2G 1911 bp DNA PRI 12-SEP-1993 > DEFINITION Human testis-specific PGK-2 gene for phosphoglycerate kinase > (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3). > ACCESSION X05246 Y00261 > ... > ... > BASE COUNT 583 a 367 c 442 g 519 t > ORIGIN > > HSPGK2G Length: 1911 August 24, 1998 10:56 Type: N Check: 4156 .. > > 1 GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC > et c. > > It thinks it is a flawed PIR: > > gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more > guesser guesses pir > > ------------- EXCEPTION ------------- > MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0 > LOCUS HSPGK2G 1911 bp DNA PRI 12-SEP-1993 > > > Must look at why guesser is thinking PIR. > > > > > On 3/6/08 11:22 AM, "Marc Logghe" wrote: > > > Hi Nick, > > I don't think you should leave out the -format option. You have to leave > > it in but the format should be provided by the B::T::GuessSeqFormat > > object. 
> > Something like: > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Tools::GuessSeqFormat; > > > > $| = 1; > > my $number_of_files = @ARGV; > > if(!$number_of_files){print "no files entered\n";exit;} > > foreach my $file (@ARGV){ > > my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file); > > my $seqio_object = Bio::SeqIO->new(-file => $guesser->file, -format => > > $guesser->guess); > > my $seq_object = $seqio_object->next_seq; > > my $sequence = $seq_object->seq; > > print "$sequence\n"; > > } > > > > HTH, > > Marc > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS) > >> Sent: Thursday 6 March 2008 16:24 > >> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org > >> Cc: Chris Fields > >> Subject: Re: [Bioperl-l] SeqIO > >> > >> Here's the scoop: > >> When I use Jason's suggestion, (-format => 'gcg'), > >> My program works without complaint on the original file that looks > > like: > >> !!NA_SEQUENCE 1.0 > >> NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 .. > >> > >> 1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT > >> et c. > >> > >> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat, > >> (which should be retro-gcg format (before version 11?)), > >> my program runs, but there IS a complaint: > >> Use of uninitialized value in scalar chomp at > >> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, line > > 1. > >> BUT > >> If I remove (-format => 'gcg'), I get no complaint, but the sequence > >> returned still has its numbers embedded. This affects my calculations. > >> > >> Thanks, at least i know what my options are. > >> > >> > >> > >> Nick Staffa > >> Telephone: 919-316-4569 (NIEHS: 6-4569) > >> Scientific Computing Support Group > >> NIEHS Information Technology Support Services Contract > >> (Science Task Monitor: Roy W.
Reter (reter at niehs.nih.gov) > >> National Institute of Environmental Health Sciences > >> National Institutes of Health > >> Research Triangle Park, North Carolina > > From bix at sendu.me.uk Fri Mar 7 05:32:01 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 07 Mar 2008 10:32:01 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> Message-ID: <47D119A1.10408@sendu.me.uk> Chris Fields wrote: > I would like to get automated PPM builds set up as well but I think we > have to rework some Build.PL stuff to get that going. What's the hold-up on that front? From heikki at sanbi.ac.za Fri Mar 7 06:09:25 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 7 Mar 2008 13:09:25 +0200 Subject: [Bioperl-l] BioSQL V1.0.0 released Message-ID: <200803071309.25294.heikki@sanbi.ac.za> BIOSQL V1.0.0 RELEASED http://news.open-bio.org/archives/2008_03.html#000094 Congratulations, Hilmar! -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From cjfields at uiuc.edu Fri Mar 7 08:53:50 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Mar 2008 07:53:50 -0600 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <47D119A1.10408@sendu.me.uk> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> Message-ID: I haven't tried it out yet, to tell the truth. 
The worry I have is prompting during the build process for database tests, networking, etc. I have looked for it, but couldn't determine whether we have a way to run 'perl Build.PL' and bypass prompts with passed arguments. The only one I could find was 'network', for network tests. Scott Cain and I have corresponded about this before, i.e. it would be nice to have boolean flags for each prompt (prereqs, database tests, scripts, network, etc). For nightly PPMs I would forego tests and include scripts. chris On Mar 7, 2008, at 4:32 AM, Sendu Bala wrote: > Chris Fields wrote: >> I would like to get automated PPM builds set up as well but I think >> we have to rework some Build.PL stuff to get that going. > > What's the hold-up on that front? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Mar 7 08:22:27 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Mar 2008 07:22:27 -0600 Subject: [Bioperl-l] BioSQL V1.0.0 released In-Reply-To: <200803071309.25294.heikki@sanbi.ac.za> References: <200803071309.25294.heikki@sanbi.ac.za> Message-ID: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu> Same here. Great news! chris On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote: > BIOSQL V1.0.0 RELEASED > http://news.open-bio.org/archives/2008_03.html#000094 > > > Congratulations, Hilmar! 
> > -Heikki > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Mar 7 09:10:08 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 07 Mar 2008 14:10:08 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> Message-ID: <47D14CC0.8000104@sendu.me.uk> Chris Fields wrote: > I haven't tried it out yet, to tell the truth. The worry I have is > prompting during the build process for database tests, networking, etc. > > I have looked for it, but couldn't determine whether we have a way to > run 'perl Build.PL' and bypass prompts with passed arguments. The only > one I could find was 'network', for network tests. > > Scott Cain and I have corresponded about this before, i.e. it would be > nice to have boolean flags for each prompt (prereqs, database tests, > scripts, network, etc). For nightly PPMs I would forego tests and > include scripts. I don't quite understand how you're making the nightlys right now, but you should be using the dist actions: http://www.bioperl.org/wiki/Making_a_BioPerl_release Ie. 
One time (and one time only): perl Build.PL (it doesn't matter how you answer the questions) Then every night: ./Build dist ./Build ppmdist You then upload the resulting .tar.gz and .zip files. Only if Build.PL or ModuleBuildBioperl are updated might you need to: ./Build realclean perl Build.PL again. But this should be a rare event and even more rarely would it be /required/ (probably never). From bix at sendu.me.uk Fri Mar 7 09:19:36 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 07 Mar 2008 14:19:36 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <47D14CC0.8000104@sendu.me.uk> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> Message-ID: <47D14EF8.5090107@sendu.me.uk> Sendu Bala wrote: > Chris Fields wrote: >> I haven't tried it out yet, to tell the truth. The worry I have is >> prompting during the build process for database tests, networking, etc. >> >> I have looked for it, but couldn't determine whether we have a way to >> run 'perl Build.PL' and bypass prompts with passed arguments. The >> only one I could find was 'network', for network tests. >> >> Scott Cain and I have corresponded about this before, i.e. it would be >> nice to have boolean flags for each prompt (prereqs, database tests, >> scripts, network, etc). For nightly PPMs I would forego tests and >> include scripts. > > I don't quite understand how you're making the nightlys right now, but > you should be using the dist actions: > > http://www.bioperl.org/wiki/Making_a_BioPerl_release > > Ie. > > One time (and one time only): > perl Build.PL (it doesn't matter how you answer the questions) > > Then every night: > ./Build dist > ./Build ppmdist > > You then upload the resulting .tar.gz and .zip files. 
Ah, having uploaded the various archives you'll have to manually delete them before running the dist action the next night, otherwise dist will ask you if you want to overwrite them. Apart from that, dist asks no questions. From cjfields at uiuc.edu Fri Mar 7 09:28:36 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Mar 2008 08:28:36 -0600 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <47D14CC0.8000104@sendu.me.uk> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> Message-ID: On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote: > Chris Fields wrote: >> I haven't tried it out yet, to tell the truth. The worry I have is >> prompting during the build process for database tests, networking, >> etc. >> I have looked for it, but couldn't determine whether we have a way >> to run 'perl Build.PL' and bypass prompts with passed arguments. >> The only one I could find was 'network', for network tests. >> Scott Cain and I have corresponded about this before, i.e. it would >> be nice to have boolean flags for each prompt (prereqs, database >> tests, scripts, network, etc). For nightly PPMs I would forego >> tests and include scripts. > > I don't quite understand how you're making the nightlys right now, > but you should be using the dist actions: > > http://www.bioperl.org/wiki/Making_a_BioPerl_release > > Ie. > > One time (and one time only): > perl Build.PL (it doesn't matter how you answer the questions) > > Then every night: > ./Build dist > ./Build ppmdist > > You then upload the resulting .tar.gz and .zip files. > > > Only if Build.PL or ModuleBuildBioperl are updated might you need to: > ./Build realclean > perl Build.PL > again. But this should be a rare event and even more rarely would it > be /required/ (probably never).
I'm not making a distribution; the archives are merely cleaned up svn checkouts (no .svn directories). This is essentially what the net_install script would get when installing GBrowse using the 'dev' option, except you don't need to install Subversion to get updates. Also, at this point we don't have an analogous 'Download tarball' setting for browsable svn either, so this is a suitable alternative. Again, I don't want to deal with prompts while running a cron job (this is a bash script), particularly if I can't guarantee the number of prompts or the prompting order won't change down the line. If we can set up a way around that using passed args to Build.PL then it would make life much easier and we could automate 'Build dist', 'Build ppmdist', 'Build testcover', etc. chris From bix at sendu.me.uk Fri Mar 7 09:54:41 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 07 Mar 2008 14:54:41 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> Message-ID: <47D15731.2050000@sendu.me.uk> Chris Fields wrote: > On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote: >> One time (and one time only): >> perl Build.PL (it doesn't matter how you answer the questions) >> >> Then every night: >> ./Build dist >> ./Build ppmdist >> >> You then upload the resulting .tar.gz and .zip files. >> >> >> Only if Build.PL or ModuleBuildBioperl are updated might you need to: >> ./Build realclean >> perl Build.PL >> again. But this should be a rare event and even more rarely would it >> be /required/ (probably never). > > I'm not making a distribution; the archives are merely cleaned up svn > checkouts (no .svn directories). This is essentially what the > net_install script would get when installing GBrowse using the 'dev' > option, except you don't need to install Subversion to get updates. 
> Also, at this point we don't have an analogous 'Download tarball' > setting for browsable svn either, so this is a suitable alternative. The dist action does what you want. I did a diff on the most recent nightly build and the .tar.gz produced by the dist action of a checkout of revision 14603: $ diff -r bioperl-1.5.2_100 bioperl-live diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/chaos.pm 2c2 < # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $ --- > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $ Only in bioperl-live/Bio/Tools: WebBlat.pm Only in bioperl-live: CHANGELOG Only in bioperl-1.5.2_100: MANIFEST Only in bioperl-1.5.2_100: META.yml diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL 1,31c1,30 < # Note: this file was auto-generated by Module::Build::Compat version 0.03 [snip] --- > #!/usr/bin/perl -w > > # This is a stub that simply tells you to use Build.PL instead [snip] Only in bioperl-live: bioperl.lisp diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/maintenance/cvs2cl_by_file.pl 29c29 < ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $ --- > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $ I don't know what's going on with the date differences, but for a file found in a folder called '/DIST/nightly_builds/', you want the MANIFEST and META.yml files. You also want the Compat version of Build.PL since we haven't yet moved to forcing people to use Build.PL. './Build dist' does the right thing. > Again, I don't want to deal with prompts while running a cron job (this > is a bash script), particularly if I can't guarantee the number of > prompts or the prompting order won't change down the line. If we can > set up a way around that using passed args to Build.PL then it would > make life much easier and we could automate 'Build dist', 'Build > ppmdist', 'Build testcover', etc. Again, you only need to run 'perl Build.PL' once and answer the questions only once. 
Then you can svn update and run the actions with no more questions to answer. This isn't a problem that needs to be solved. It is /supposed/ to be this way. It's ready to use! Please make use of it; it's one of the (many) reasons I moved Bioperl over to Build.PL in the first place. From cjfields at uiuc.edu Fri Mar 7 10:29:11 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Mar 2008 09:29:11 -0600 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <47D15731.2050000@sendu.me.uk> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> <47D15731.2050000@sendu.me.uk> Message-ID: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote: >>> One time (and one time only): >>> perl Build.PL (it doesn't matter how you answer the questions) >>> >>> Then every night: >>> ./Build dist >>> ./Build ppmdist >>> >>> You then upload the resulting .tar.gz and .zip files. >>> >>> >>> Only if Build.PL or ModuleBuildBioperl are updated might you need >>> to: >>> ./Build realclean >>> perl Build.PL >>> again. But this should be a rare event and even more rarely would >>> it be /required/ (probably never). >> I'm not making a distribution; the archives are merely cleaned up >> svn checkouts (no .svn directories). This is essentially what the >> net_install script would get when installing GBrowse using the >> 'dev' option, except you don't need to install Subversion to get >> updates. Also, at this point we don't have an analogous 'Download >> tarball' setting for browsable svn either, so this is a suitable >> alternative. > > The dist action does what you want. 
I did a diff on the most recent > nightly build and the .tar.gz produced by the dist action of a > checkout of revision 14603: > > $ diff -r bioperl-1.5.2_100 bioperl-live > diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/ > chaos.pm > 2c2 > < # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $ > --- > > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $ > Only in bioperl-live/Bio/Tools: WebBlat.pm > Only in bioperl-live: CHANGELOG > Only in bioperl-1.5.2_100: MANIFEST > Only in bioperl-1.5.2_100: META.yml > diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL > 1,31c1,30 > < # Note: this file was auto-generated by Module::Build::Compat > version 0.03 > [snip] > --- > > #!/usr/bin/perl -w > > > > # This is a stub that simply tells you to use Build.PL instead > [snip] > Only in bioperl-live: bioperl.lisp > diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/ > maintenance/cvs2cl_by_file.pl > 29c29 > < ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $ > --- > > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $ > > I don't know what's going on with the date differences, but for a > file found in a folder called '/DIST/nightly_builds/', you want the > MANIFEST and META.yml files. You also want the Compat version of > Build.PL since we haven't yet moved to forcing people to use Build.PL. > > './Build dist' does the right thing. > > >> Again, I don't want to deal with prompts while running a cron job >> (this is a bash script), particularly if I can't guarantee the >> number of prompts or the prompting order won't change down the >> line. If we can set up a way around that using passed args to >> Build.PL then it would make life much easier and we could automate >> 'Build dist', 'Build ppmdist', 'Build testcover', etc. > > Again, you only need to run 'perl Build.PL' once and answer the > questions only once. Then you can svn update and run the actions > with no more questions to answer. 
This isn't a problem that needs to > be solved. It is /supposed/ to be this way. It's ready to use! > Please make use of it; it's one of the (many) reasons I moved > Bioperl over to Build.PL in the first place. Then set it up the way you want. I give up. chris From bix at sendu.me.uk Fri Mar 7 10:43:44 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 07 Mar 2008 15:43:44 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> <47D15731.2050000@sendu.me.uk> <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu> Message-ID: <47D162B0.5070402@sendu.me.uk> Chris Fields wrote: > > On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote: > >> Again, you only need to run 'perl Build.PL' once and answer the >> questions only once. Then you can svn update and run the actions with >> no more questions to answer. This isn't a problem that needs to be >> solved. It is /supposed/ to be this way. It's ready to use! Please >> make use of it; it's one of the (many) reasons I moved Bioperl over to >> Build.PL in the first place. > > Then set it up the way you want. I give up. I really don't understand that response. I have merely informed you how Build.PL and the actions work, since you didn't know. I have informed you it already does what you want in terms of automation; there's nothing to wait for, no more work to do. I have requested you use it, since there is little value in duplicating code and effort. Now that you have the information, you can make an informed choice as to how to proceed, based on your needs. If you have good reasons for sticking with your current nightly build process, by all means stick with them. 
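For anyone who wants to script the sequence discussed in this thread (one manual 'perl Build.PL', then 'svn update', './Build dist' and './Build ppmdist' each night), the following is a minimal shell sketch, not the script actually used to produce the nightly archives. It assumes 'perl Build.PL' has already been run once by hand in the checkout. The function names `run` and `nightly_dist`, the `DRY_RUN` switch, and the archive glob patterns are illustrative assumptions; stale archives are removed first because the dist action asks before overwriting existing ones.

```shell
#!/bin/sh
# Sketch of a nightly dist run. Assumes `perl Build.PL` was run once by hand
# in the checkout, so ./Build already exists and asks no further questions.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
# (hypothetical helper, only here so the sequence can be previewed safely)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# nightly_dist DIR: refresh the svn checkout in DIR and rebuild the archives.
# (hypothetical name; DIR is wherever the checkout lives)
nightly_dist() {
    (
        # Work in a subshell so the caller's working directory is untouched.
        cd "$1" || exit 1
        # Remove last night's archives so the dist action has nothing to
        # ask about overwriting. The patterns are an assumption.
        run rm -f ./*.tar.gz ./*.zip
        # Stop at the first step that fails.
        run svn update &&
        run ./Build dist &&
        run ./Build ppmdist
    )
}
```

Setting DRY_RUN=1 before calling nightly_dist with the checkout directory prints the four commands instead of running them, which is a cheap way to check the sequence before putting it in a crontab. Uploading the resulting archives is left out because the thread does not say how that step is done.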
Mainly I just wanted to make clear (as a general point for anyone interested) that the questions asked by Build.PL aren't an issue or obstacle in terms of automating builds or tests. From cjfields at uiuc.edu Fri Mar 7 11:15:31 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Mar 2008 10:15:31 -0600 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <47D162B0.5070402@sendu.me.uk> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> <47D15731.2050000@sendu.me.uk> <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu> <47D162B0.5070402@sendu.me.uk> Message-ID: On Mar 7, 2008, at 9:43 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote: >> >>> Again, you only need to run 'perl Build.PL' once and answer the >>> questions only once. Then you can svn update and run the actions >>> with no more questions to answer. This isn't a problem that needs >>> to be solved. It is /supposed/ to be this way. It's ready to use! >>> Please make use of it; it's one of the (many) reasons I moved >>> Bioperl over to Build.PL in the first place. >> Then set it up the way you want. I give up. > > I really don't understand that response. I have merely informed you > how Build.PL and the actions work, since you didn't know. I have > informed you it already does what you want in terms of automation; > there's nothing to wait for, no more work to do. I have requested > you use it, since there is little value in duplicating code and > effort. > > Now that you have the information, you can make an informed choice > as to how to proceed, based on your needs. If you have good reasons > for sticking with your current nightly build process, by all means > stick with them. 
> > Mainly I just wanted to make clear (as a general point for anyone > interested) that the questions asked by Build.PL aren't an issue or > obstacle in terms of automating builds or tests. It doesn't come across that way; it comes off as pretty condescending. And please don't assume I lack experience with how Module::Build works (I have used 'Build ppmdist' and 'Build testcover' quite a few times recently, and the next item on my agenda is to fix the various issues with Build.PL and database checking, which you already know). So my response is pretty simple; if you feel the need to use 'Build.PL' to make nightlies, then by all means set it up. I find it much harder to work with the current Build process in an automated way using a bash script, so I work around it. If it makes you happier we can switch the directory over to 'nightly_checkouts', but I think that's just mincing semantics. Okay, it's pretty obvious we're not on the same page here. I'll go through it carefully so you understand the problem: 1) I am running a 'svn co' on anon. svn for the various distros to a temp directory. This is done using a bash script. If I attempt to change into the distribution directory and run 'perl Build.PL' from the bash script, I immediately run into permissions issues and several odd things: Checking prerequisites... - ERROR: Bio::Root::Version is not installed (I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok Going to read /root/.cpan/Metadata Database was generated on Tue, 05 Feb 2008 11:30:54 GMT Warning: You are not allowed to write into directory "/root/.cpan/ sources/authors". I'll continue, but if you encounter problems, they may be due to insufficient permissions. 
CPAN: LWP::UserAgent loaded ok Fetching with LWP: ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/authors/01mailrc.txt.gz-8678': Permission denied] Fetching with Net::FTP: ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz Cannot open Local file /root/.cpan/sources/authors/01mailrc.txt.gz: Permission denied .... 2) I suspect, even if I worked around permissions and set up the job as root or admin and worked out why it can't find 'Bio::Root::Version' (?!?), this would still be a terrific pain in the *** to deal with as the Build.PL process is expecting answers for each and every prompt, and the process differs for each distribution. Yes, I could set something up to deal with that in the script. No, I will not do that as any additions or changes to prompts could break/hang the script or (worse) silently change what the archive contains. Hence my indication that passing flags to 'perl Build.PL' would be a nice way to work around that. For that I haven't heard a response, so I assume that functionality isn't there (or am I assuming incorrectly?). So, from where I stand, even if using Build.PL is the /proper/ way to do it, it doesn't work as expected using an automated process (i.e. cron). Make sense? chris From matthewehodges at gmail.com Fri Mar 7 11:16:47 2008 From: matthewehodges at gmail.com (Matt) Date: Fri, 7 Mar 2008 16:16:47 +0000 (UTC) Subject: [Bioperl-l] Reciprocal blast Message-ID: Dear experts, I want to do a best reciprocal blastp of a FASTA protein dataset against the protein models of various species, also in FASTA format. The aim is to have an output showing presence/absence. I think this is possible to do using Perl, but I'm very much a beginner so any help in this would be greatly appreciated.
Thanks Matt From bix at sendu.me.uk Fri Mar 7 12:34:17 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 07 Mar 2008 17:34:17 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> <47D15731.2050000@sendu.me.uk> <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu> <47D162B0.5070402@sendu.me.uk> Message-ID: <47D17C99.9050009@sendu.me.uk> Chris Fields wrote: > 1) I am running a 'svn co' on anon. svn for the various distros to a > temp directory. Is it important that you do a fresh co every night? Why not do a co once and then do a 'svn update' every night? This is the crux of the problems: if you choose to simply update, then you only have to get 'perl Build.PL' to work once. > If I attempt to change into the distribution directory and run 'perl Build.PL' from the > bash script, I immediately run into permissions issues and several odd > things: > > Checking prerequisites... > - ERROR: Bio::Root::Version is not installed > (I think you ran Build.PL directly, so will use CPAN to install > prerequisites on demand) > CPAN: Storable loaded ok > Going to read /root/.cpan/Metadata > Database was generated on Tue, 05 Feb 2008 11:30:54 GMT > Warning: You are not allowed to write into directory > "/root/.cpan/sources/authors". [snip] I'm assuming this is on portal? The CPAN setup for users is a little broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm $CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/" Then you can run and configure cpan correctly and install Bundle::CPAN. Some of the zlib stuff failed to install for me, but that doesn't seem to matter. Of course, I guess it makes sense for root to just install all of Bioperl's prereqs anyway, so that testing can be automated in the future. Anyway, once you have cpan happy 'perl Build.PL' will run fine. 
Answer 'n' to everything and then your cron job just has to call './Build dist'. > 2) I suspect, even if I worked around permissions and set up the job as > root or admin and worked out why it can't find 'Bio::Root::Version' > (?!?), this would still be a terrific pain in the *** to deal with as > the Build.PL process is expecting answers for each and every prompt, and > the process differs for each distribution. You won't be running Build.PL in the cron job. > passing flags to 'perl Build.PL' would be a nice way to work around > that. For that I haven't heard a response, so I assume that > functionality isn't there (or am I assuming incorrectly?). It isn't AFAIK, but my point is that it doesn't need to be (for this particular use-case at least). > So, from where I stand, even if using Build.PL is the /proper/ way to do > it, it doesn't work as expected using an automated process (i.e. cron). > Make sense? Only if you can't run 'svn update' instead of 'svn co' each night. From cjfields at uiuc.edu Fri Mar 7 13:00:52 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Mar 2008 12:00:52 -0600 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <47D17C99.9050009@sendu.me.uk> References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> <47D15731.2050000@sendu.me.uk> <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu> <47D162B0.5070402@sendu.me.uk> <47D17C99.9050009@sendu.me.uk> Message-ID: On Mar 7, 2008, at 11:34 AM, Sendu Bala wrote: > Chris Fields wrote: >> 1) I am running a 'svn co' on anon. svn for the various distros to >> a temp directory. > > Is it important that you do a fresh co every night? Why not do a co > once and then do a 'svn update' every night? This is the crux of > the problems: if you choose to simply update, then you only have to > get 'perl Build.PL' to work once. 
Unless you update Build.PL (which will happen as the distributions grow). Then you need to rerun 'perl Build.PL'. It seems safer to run that each time with a 'pass-through' flag for automated builds. >> If I attempt to change into the distribution directory and run >> 'perl Build.PL' from the bash script, I immediately run into >> permissions issues and several odd things: >> Checking prerequisites... >> - ERROR: Bio::Root::Version is not installed >> (I think you ran Build.PL directly, so will use CPAN to install >> prerequisites on demand) >> CPAN: Storable loaded ok >> Going to read /root/.cpan/Metadata >> Database was generated on Tue, 05 Feb 2008 11:30:54 GMT >> Warning: You are not allowed to write into directory "/root/.cpan/ >> sources/authors". > [snip] > > I'm assuming this is on portal? The CPAN setup for users is a little > broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm > > $CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/" > > Then you can run and configure cpan correctly and install > Bundle::CPAN. Some of the zlib stuff failed to install for me, but > that doesn't seem to matter. > > Of course, I guess it makes sense for root to just install all of > Bioperl's prereqs anyway, so that testing can be automated in the > future. > > Anyway, once you have cpan happy 'perl Build.PL' will run fine. > Answer 'n' to everything and then your cron job just has to call './ > Build dist'. I agree about setting up the prereqs. I could also (as mentioned before) set this up as root. However, if we go this route we need to have 'perl Build.PL' included in the process in order to ensure a clean build process each time and to prevent the script from breaking whenever someone decides to change Build.PL. 
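A possible middle ground between the two positions above, sketched under assumptions: keep a single working copy, but let the nightly script notice when Build.PL itself has changed since the previous run, so it knows when 'perl Build.PL' has to be rerun. The function name build_pl_changed and the stamp-file path are hypothetical (they are not part of Bioperl or Module::Build); the only tool relied on is cksum.

```bash
# Hypothetical helper: returns 0 if Build.PL differs from what was seen on
# the previous run (or if there was no previous run), 1 if it is unchanged.
# $1 = working copy directory, $2 = stamp file (both are assumed paths).
build_pl_changed() {
    local wc_dir=$1 stamp=$2 new_sum old_sum=""
    # Checksum of the current Build.PL; give up if the file is unreadable.
    new_sum=$(cksum < "$wc_dir/Build.PL") || return 2
    # Checksum recorded by the previous run, if there was one.
    if [ -f "$stamp" ]; then
        old_sum=$(cat "$stamp")
    fi
    # Record the current checksum for the next run.
    printf '%s\n' "$new_sum" > "$stamp"
    [ "$new_sum" != "$old_sum" ]
}
```

A cron script could call this after 'svn update' and rerun 'perl Build.PL' only when it reports a change; how the prompts get answered in that case is a separate question.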
>> 2) I suspect, even if I worked around permissions and set up the >> job as root or admin and worked out why it can't find >> 'Bio::Root::Version' (?!?), this would still be a terrific pain in >> the *** to deal with as the Build.PL process is expecting answers >> for each and every prompt, and the process differs for each >> distribution. > > You won't be running Build.PL in the cron job. See above. I don't want to set up something automated which can't be maintained in the long term. >> passing flags to 'perl Build.PL' would be a nice way to work around >> that. For that I haven't heard a response, so I assume that >> functionality isn't there (or am I assuming incorrectly?). > > It isn't AFAIK, but my point is that it doesn't need to be (for this > particular use-case at least). See above. There are very good reasons to allow this (and the functionality has been requested before, particularly from the GMOD crowd). If I can pass in a single flag (for instance, --defaults, which just uses the default arg for each prompt) then it would make it /much/ easier. >> So, from where I stand, even if using Build.PL is the /proper/ way >> to do it, it doesn't work as expected using an automated process >> (i.e. cron). Make sense? > > Only if you can't run 'svn update' instead of 'svn co' each night. I think a single co with updates is feasible (I can do that with the current setup; just run the initial co, copy the directory over to a temp copy, then go about my business). I'll leave the nightly build setup as is for now and work on getting Build.PL working (something we need anyway for Devel::Cover and Pod::Coverage work). 
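The single-checkout workflow discussed above can be sketched roughly as follows. This is only an illustration, assuming 'perl Build.PL' has already been run once by hand in the working copy; the function name nightly_dist and both directory arguments are hypothetical, and the real nightly script may well differ.

```bash
# Hypothetical nightly job: refresh an existing working copy and build the
# distribution archive. Assumes 'perl Build.PL' was run once beforehand, so
# './Build' already exists and no prompts are expected at this stage.
# $1 = working copy directory
# $2 = directory that receives the archive (use an absolute path, since the
#      function changes into the working copy first)
nightly_dist() (
    cd "$1" || exit 1
    # Pull in the latest changes without ever waiting for user input.
    svn update --non-interactive || exit 1
    # Build the .tar.gz; it is left in the top level of the working copy.
    ./Build dist || exit 1
    mv ./*.tar.gz "$2"/
)
```

The body runs in a subshell, so the caller's working directory is left alone and a failure in any step simply makes the function return non-zero.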
chris From David.Messina at sbc.su.se Fri Mar 7 13:14:38 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 7 Mar 2008 19:14:38 +0100 Subject: [Bioperl-l] Reciprocal blast In-Reply-To: References: Message-ID: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com> Hey Matt, Your question is a little beyond the scope of this mailing list. I don't know what your bioinformatics background is, but in my experience it's best to get started hands-on, either in a class or with someone you can sit down and work through it with. You'll have a million questions, and a mailing list isn't really suitable for that. That being said, I would run the blasts on the command-line, parse out the best hits with BioPerl, and then use hashes to identify mutual best hits. Briefly, you have two datasets A & B. Format each dataset into a blast database using xdformat or formatdb. Run two blasts, one with A as query and B as database and then one with B as query and A as database. The two output files, each containing multiple Blast reports, can then be processed with Bio::SearchIO to extract the best hit for each protein. Read this tutorial for help with that: http://www.bioperl.org/wiki/HOWTO:SearchIO Once you get the best hit for each protein, then you can use Perl to find every instance where two proteins, one from each set, are each other's best hit. One way would be to create two hashes, one for each set, with query proteins as keys and best hits as values, and then step through to find the reciprocal bests. Dave From jay at jays.net Fri Mar 7 13:51:35 2008 From: jay at jays.net (Jay Hannah) Date: Fri, 07 Mar 2008 12:51:35 -0600 Subject: [Bioperl-l] Reciprocal blast In-Reply-To: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com> References: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com> Message-ID: <47D18EB7.3060906@jays.net> Dave Messina wrote: > Your question is a little beyond the scope of this mailing list. 
I don't > know what your bioinformatics background is, but in my experience it's best > to get started hands-on, either in a class or with someone you can sit down > and work through it with. You'll have a million questions, and a mailing > list isn't really suitable for that. > > That being said, I would run the blasts on the command-line, parse out the > best hits with BioPerl, and then use hashes to identify mutual best hits. > Hi Matt, If you're a glutton for punishment and want to see a ball of Perl that automates and tracks stats across my version of "reciprocal blasts" (mine is called cross_blast()), help yourself: svn checkout svn://vc.jays.net/seqlab seqlab I abandoned my maiden voyage into bioinformatics, called "SeqLab," as a stand-alone entity when the subsequent thousand tasks I worked turned out to be unrelated to all the software I had built so far. My naive grand unification vision for all of bioinformatics didn't quite work out as I had planned. -laugh- Nowadays I just cherry-pick solutions out of its guts on demand. :) I'm happy to field any questions you have about that code, if it helps you any. Cheers, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From jay at jays.net Fri Mar 7 14:43:03 2008 From: jay at jays.net (Jay Hannah) Date: Fri, 07 Mar 2008 13:43:03 -0600 Subject: [Bioperl-l] Reciprocal blast In-Reply-To: <47D18EB7.3060906@jays.net> References: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com> <47D18EB7.3060906@jays.net> Message-ID: <47D19AC7.1060907@jays.net> Jay Hannah wrote: > I'm happy to field any questions you have about that code, if it helps > you any. 
I created a wiki page since I stopped paying the bill on the "seqlab.net" domain: :) http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29 Cheers, j From cain.cshl at gmail.com Fri Mar 7 15:17:29 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 07 Mar 2008 15:17:29 -0500 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: Message-ID: <1204921049.6467.9.camel@frissell> Hi Chris, Thanks much for this. I have one observation though: both the http and ftp directories are empty except for a log file :-/ Also, I saw that you mentioned the 'accept the defaults' option I asked about in January. I did implement that on Build.PL at the exact time that the transition from cvs to svn was happening, so I never got committed back. Hopefully I still have it :-) I'll look around and commit it when I find it. Scott On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote: > We now have nightly bundled archives for bioperl-live, bioperl-db, > bioperl-run, and bioperl-network running; these will be updated ~ 1:00 > am every night. > > http://www.bioperl.org/DIST/nightly_builds/ > ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds > > The archives are date-stamped and also have the Subversion revision, > just in case one wanted to ensure they get the correct version for the > bug fix. They also contain a CHANGELOG file for the last 10 revisions > (if there are any). These are currently derived off the anon. svn > repository. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjfields at uiuc.edu Fri Mar 7 15:25:01 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Mar 2008 14:25:01 -0600 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <1204921049.6467.9.camel@frissell> References: <1204921049.6467.9.camel@frissell> Message-ID: I was testing a few things earlier using 'Build dist' which tanked the old archives. I reran the script manually so everything should be up now. If you have the default setting implemented for Build.PL that would be great. There is a lingering minor issue with Data::Dumper error output via perl 5.10, but beyond that it should be fine. chris On Mar 7, 2008, at 2:17 PM, Scott Cain wrote: > Hi Chris, > > Thanks much for this. I have one observation though: both the http > and > ftp directories are empty except for a log file :-/ > > Also, I saw that you mentioned the 'accept the defaults' option I > asked > about in January. I did implement that on Build.PL at the exact time > that the transition from cvs to svn was happening, so I never got > committed back. Hopefully I still have it :-) I'll look around and > commit it when I find it. > > Scott > > On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote: >> We now have nightly bundled archives for bioperl-live, bioperl-db, >> bioperl-run, and bioperl-network running; these will be updated ~ >> 1:00 >> am every night. >> >> http://www.bioperl.org/DIST/nightly_builds/ >> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds >> >> The archives are date-stamped and also have the Subversion revision, >> just in case one wanted to ensure they get the correct version for >> the >> bug fix. They also contain a CHANGELOG file for the last 10 >> revisions >> (if there are any). These are currently derived off the anon. svn >> repository. 
>> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From N.Haigh at sheffield.ac.uk Fri Mar 7 16:01:43 2008 From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 7 Mar 2008 21:01:43 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> <47D15731.2050000@sendu.me.uk> <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu> <47D162B0.5070402@sendu.me.uk> <47D17C99.9050009@sendu.me.uk> Message-ID: <1204923703.47d1ad37a614a@webmail.shef.ac.uk> Quoting Chris Fields : -- snip -- > > I'll leave the nightly build setup as is for now and work on getting > Build.PL working (something we need anyway for Devel::Cover and > Pod::Coverage work). > One of the test metrics of Devel::Cover is Pod::Coverage ....no need to have a seperate Pod::Coverage test :o) Nath From cain.cshl at gmail.com Fri Mar 7 17:25:53 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 07 Mar 2008 17:25:53 -0500 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: <1204921049.6467.9.camel@frissell> Message-ID: <1204928753.6467.19.camel@frissell> OK, I added my 'accept the defaults' option. Use it like this: perl Build.PL --accept 1 Scott On Fri, 2008-03-07 at 14:25 -0600, Chris Fields wrote: > I was testing a few things earlier using 'Build dist' which tanked the > old archives. 
I reran the script manually so everything should be up > now. > > If you have the default setting implemented for Build.PL that would be > great. There is a lingering minor issue with Data::Dumper error > output via perl 5.10, but beyond that it should be fine. > > chris > > On Mar 7, 2008, at 2:17 PM, Scott Cain wrote: > > > Hi Chris, > > > > Thanks much for this. I have one observation though: both the http > > and > > ftp directories are empty except for a log file :-/ > > > > Also, I saw that you mentioned the 'accept the defaults' option I > > asked > > about in January. I did implement that on Build.PL at the exact time > > that the transition from cvs to svn was happening, so I never got > > committed back. Hopefully I still have it :-) I'll look around and > > commit it when I find it. > > > > Scott > > > > On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote: > >> We now have nightly bundled archives for bioperl-live, bioperl-db, > >> bioperl-run, and bioperl-network running; these will be updated ~ > >> 1:00 > >> am every night. > >> > >> http://www.bioperl.org/DIST/nightly_builds/ > >> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds > >> > >> The archives are date-stamped and also have the Subversion revision, > >> just in case one wanted to ensure they get the correct version for > >> the > >> bug fix. They also contain a CHANGELOG file for the last 10 > >> revisions > >> (if there are any). These are currently derived off the anon. svn > >> repository. > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. cain at cshl.edu > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From n.haigh at sheffield.ac.uk Sat Mar 8 07:55:39 2008 From: n.haigh at sheffield.ac.uk (Nathan S Haigh) Date: Sat, 08 Mar 2008 12:55:39 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: References: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com> <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu> <47D119A1.10408@sendu.me.uk> <47D14CC0.8000104@sendu.me.uk> <47D15731.2050000@sendu.me.uk> <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu> <47D162B0.5070402@sendu.me.uk> Message-ID: <47D28CCB.50507@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chris Fields wrote: - -- snip -- > 1) I am running a 'svn co' on anon. svn for the various distros to a > temp directory. This is done using a bash script. If I attempt to > change into the distribution directory and run 'perl Build.PL' from the > bash script, I immediately run into permissions issues and several odd > things: > - -- snip -- Hi Chris, Do you need to do any svn commands after the checkout? If not, you can do "svn export" instead: http://svnbook.red-bean.com/en/1.0/re10.html This basically recursively gets the URL specified without the .svn dirs. However, you then won't be able to run any svn commands on it, as it won't be a working copy....save bandwidth and possible post processing to delete all the .svn dirs. 
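As a rough illustration of the suggestion, assuming a placeholder repository URL (the real anonymous svn URL is not given in this message) and a hypothetical wrapper name:

```bash
# Hypothetical wrapper: fetch a clean tree with no .svn directories.
# $1 = repository URL (placeholder), $2 = target directory. The target should
# not already exist, since 'svn export' refuses to overwrite it unless
# --force is given.
export_clean_tree() {
    svn export --quiet "$1" "$2"
}
```

The exported tree is not a working copy, so this only suits a job that never needs 'svn update' or other svn commands afterwards.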
Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH0ozL9gTv6QYzVL4RAkvcAJ9eSosx3+YWfbBg/KT6+HZrbweGSgCguLCe ZYtTxSi5q6iiR+sVGDQEZ68= =uFNP -----END PGP SIGNATURE----- From nm249 at cornell.edu Sat Mar 8 11:48:44 2008 From: nm249 at cornell.edu (Naama Menda) Date: Sat, 08 Mar 2008 11:48:44 -0500 Subject: [Bioperl-l] Bio::Ontology::OntologyI Message-ID: Hi Hilmar, I have a loading script that uses Bio::Ontology::OntologyI for parsing obo files and loading terms into chado schema. I'm trying to find all relationship types, and it seems that the parser looks at the distinct relationship types used by the terms in the file, but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ). This is important for storing the relationships in the right context , for example all relationships types defined by Sequence Ontology should be stored in the chado schema using the SO cv_id, while other relationship types, not defined as Typedef in the obo file, should be stored using the 'relationship' cv_id. Without a way to parse Typedefs, I also cannot use Bio::Ontology for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo). Is there another function in Bio::Ontology that handles Typedefs? If not can one be added? Thanks! -Naama Menda From bix at sendu.me.uk Sat Mar 8 18:30:40 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 08 Mar 2008 23:30:40 +0000 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <1204928753.6467.19.camel@frissell> References: <1204921049.6467.9.camel@frissell> <1204928753.6467.19.camel@frissell> Message-ID: <47D321A0.9010209@sendu.me.uk> Scott Cain wrote: > OK, I added my 'accept the defaults' option. Use it like this: > > perl Build.PL --accept 1 Thanks for that Scott, but can you revert and have another go at that commit, because you ended up wiping out the recent commits by Chris and myself. 
Also, rather than individually alter the Bioperl-specific methods like choose_scripts(), is there perhaps a cleaner way to catch every prompt, perhaps by overriding prompt() itself? Other questions may get added in the future, and some existing questions aren't immediately obvious, so it would be nice to be sure an automated process like a cron job /never/ gets asked a question. From hlapp at gmx.net Sun Mar 9 17:37:01 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 9 Mar 2008 17:37:01 -0400 Subject: [Bioperl-l] Bio::Ontology::OntologyI In-Reply-To: <47D2C36C.2020802@cornell.edu> References: <47D2C36C.2020802@cornell.edu> Message-ID: Naama - it is the OntologyIO::obo parser that omits the typedefs. Parsing rather than skipping those could be added to the code; I also once started and almost completed a project to integrate the go-perl .obo parser into the Bio::OntologyIO framework, but the final touches fell victim to moving jobs and the ensuing upheaval. If all you need to do is parse a .obo-formatted ontology and traverse it in some way, go-perl might have all you need. If you need more than that, could you elaborate? -hilmar On Mar 8, 2008, at 11:48 AM, Naama Menda wrote: > Hi Hilmar, > > I have a loading script that uses Bio::Ontology::OntologyI for > parsing obo files and loading terms into chado schema. > I'm trying to find all relationship types, and it seems that the > parser looks at the distinct relationship types used by the terms > in the file, > but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ). > This is important for storing the relationships in the right > context , for example all relationships types defined by Sequence > Ontology should be stored in > the chado schema using the SO cv_id, while other relationship > types, not defined as Typedef in the obo file, should be stored > using the 'relationship' cv_id.
> Without a way to parse Typedefs, I also cannot use Bio::Ontology > for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo). > > Is there another function in Bio::Ontology that handles Typedefs? > If not can one be added? > > Thanks! > -Naama Menda -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From naama.menda at gmail.com Sun Mar 9 21:34:05 2008 From: naama.menda at gmail.com (Naama Menda) Date: Sun, 9 Mar 2008 21:34:05 -0400 Subject: [Bioperl-l] Bio::Ontology::OntologyI Message-ID: <48F99F4E-F17B-4000-8460-9F2CB9E0D75A@gmail.com> My main problem is that go-perl does not handle updates, so if I want to update GO I need an empty schema. We find it more complicated to re-load our annotations than to update cvterms and their related data. Our loading script compares an existing load of an ontology to the obo file and updates/inserts/deletes accordingly. We are now in the process of committing this code to GMOD, and thought this would be a good opportunity for adding the Typedef parsing option. Thanks, -Naama On Sun, Mar 9, 2008 at 5:37 PM, Hilmar Lapp wrote: Naama - it is the OntologyIO::obo parser that omits the typedefs. Parsing rather than skipping those could be added to the code; I also once started and almost completed a project to integrate the go-perl .obo parser into the Bio::OntologyIO framework, but the final touches fell victim to moving jobs and the ensuing upheaval. If all you need to do is parse a .obo-formatted ontology and traverse it in some way, go-perl might have all you need. If you need more than that, could you elaborate? -hilmar On Mar 8, 2008, at 11:48 AM, Naama Menda wrote: > Hi Hilmar, > > I have a loading script that uses Bio::Ontology::OntologyI for > parsing obo files and loading terms into chado schema.
> I'm trying to find all relationship types, and it seems that the > parser looks at the distinct relationship types used by the terms > in the file, > but not at the ' [Typedef] ' fields (I used 'get_predicate_terms ()' ). > This is important for storing the relationships in the right > context , for example all relationships types defined by Sequence > Ontology should be stored in > the chado schema using the SO cv_id, while other relationship > types, not defined as Typedef in the obo file, should be stored > using the 'relationship' cv_id. > Without a way to parse Typedefs, I also cannot use Bio::Ontology > for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo). > > Is there another function in Bio::Ontology that handles Typedefs? > If not can one be added? > > Thanks! > -Naama Menda -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Mar 9 22:13:15 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 9 Mar 2008 22:13:15 -0400 Subject: [Bioperl-l] Bio::Ontology::OntologyI In-Reply-To: References: <47D2C36C.2020802@cornell.edu> Message-ID: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote: > My main problem is that go-perl does not handle updates, so if I > want to update GO I need an empty schema. We find it more > complicated to re-load our annotations than to update cvterms and > their related data. > Our loading script compares an existing load of an ontology to the > obo file and updates/insets/deletes accordingly. load_ontology.pl in bioperl-db should have all this functionality, though of course that doesn't give you the typedef support (yet). 
> > We are now in the process of committing this code to GMOD Cool - obviously load_ontology.pl doesn't work off of Chado but instead uses BioSQL as the schema (though the ontology model is *very* similar between the two). BTW please keep the Bioperl list in the loop, others may have insight too or be interested in the information. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Mar 9 22:43:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 9 Mar 2008 22:43:13 -0400 Subject: [Bioperl-l] Bio::Ontology::OntologyI In-Reply-To: References: <47D2C36C.2020802@cornell.edu> <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net> Message-ID: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote: > > On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp wrote: > > On Mar 9, 2008, at 9:34 PM, Naama Menda wrote: > >> My main problem is that go-perl does not handle updates, so if I >> want to update GO I need an empty schema. We find it more >> complicated to re-load our annotations than to update cvterms and >> their related data. >> Our loading script compares an existing load of an ontology to the >> obo file and updates/insets/deletes accordingly. > > load_ontology.pl in bioperl-db should have all this functionality, > though of course that doesn't give you the typedef support (yet). > > Will you add this support to obo.pm? I had a look at it and it > seems easy to implement. > Will there be a patch? Or in the next Bioperl release? If you have ideas for how to implement this we'd be thrilled if you can provide a patch. Most changes in BioPerl happen because and by people who have an itch to scratch. Seems like this one is right down your alley? 
I'd in principle be interested in doing this too but can't give any promises as to when I might have time (unless I need it myself :) > > >> >> We are now in the process of committing this code to GMOD > > Cool - obviously load_ontology.pl doesn't work off of Chado but > instead uses BioSQL as the schema (though the ontology model is > *very* similar between the two). > > We store ontologies in Chado, and that was the reason for writing > a new loader. Looking at it it seems you wrote a whole new language binding? Did you find it too difficult to build on one of the existing ones (which use Class::DBI if I recall correctly, though Scott will have the details here) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From naama.menda at gmail.com Sun Mar 9 22:26:04 2008 From: naama.menda at gmail.com (Naama Menda) Date: Sun, 9 Mar 2008 22:26:04 -0400 Subject: [Bioperl-l] Bio::Ontology::OntologyI In-Reply-To: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net> References: <47D2C36C.2020802@cornell.edu> <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net> Message-ID: On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp wrote: > > On Mar 9, 2008, at 9:34 PM, Naama Menda wrote: > > My main problem is that go-perl does not handle updates, so if I want to > update GO I need an empty schema. We find it more complicated to re-load our > annotations than to update cvterms and their related data. > Our loading script compares an existing load of an ontology to the obo > file and updates/insets/deletes accordingly. > > > load_ontology.pl in bioperl-db should have all this functionality, though > of course that doesn't give you the typedef support (yet). > Will you add this support to obo.pm? I had a look at it and it seems easy to implement. Will there be a patch? Or in the next Bioperl release? 
> > > We are now in the process of committing this code to GMOD > > > Cool - obviously load_ontology.pl doesn't work off of Chado but instead > uses BioSQL as the schema (though the ontology model is *very* similar > between the two). > We store ontologies in Chado, and that was the reason for writing a new loader. > > BTW please keep the Bioperl list in the loop, others may have insight too > or be interested in the information. > > -hilmar > > -- > Thanks! -Naama > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > From akarger at CGR.Harvard.edu Mon Mar 10 09:33:49 2008 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 10 Mar 2008 09:33:49 -0400 Subject: [Bioperl-l] Reciprocal blast References: Message-ID: <72AF30DC2881964CB911FD08E57157E7367BD5@lsdiv-msxbe-001.nucleus.harvard.edu> There's a cut & paste protocol for Reciprocal best hit blast at http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/Protocols/Sequences.html Let me know if you need to tweak things. -Amir Karger > -----Original Message----- > From: Matt [mailto:matthewehodges at gmail.com] > Sent: Friday, March 07, 2008 11:17 AM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Reciprocal blast > > Dear experts, > > I want to do a best reciprocal blastp of a fasta protein > dataset against the > protein models of various species also in fasta format. The > aim is to have an > output showing presence/not presence. I think this is > possible to do using > perl, but i'm very much a beginner so any help in this would > be greatly > appreciated.
> Thanks > Matt > > > From Daniel.Gerlach at medecine.unige.ch Mon Mar 10 12:13:39 2008 From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach) Date: Mon, 10 Mar 2008 17:13:39 +0100 Subject: [Bioperl-l] Bio::TreeIO - tree object to string Message-ID: <47D55E33.8060205@medecine.unige.ch> Dear all, This is a very basic question. I have a tree object in $tree and want to save its newick representation in a variable as a string: my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick'); $out->write_tree($tree); print $tree_string; Unfortunately this does not work and he prints out the newick tree on stdout plus the message "Use of uninitialized value in print at ...". He also prints out the tree on the stdout if I remove the line "print $tree_string". The variable $tree_string seems to be empty. D. From naama.menda at gmail.com Mon Mar 10 11:09:12 2008 From: naama.menda at gmail.com (Naama Menda) Date: Mon, 10 Mar 2008 11:09:12 -0400 Subject: [Bioperl-l] Bio::Ontology::OntologyI In-Reply-To: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net> References: <47D2C36C.2020802@cornell.edu> <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net> <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net> Message-ID: On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp wrote: > > On Mar 9, 2008, at 10:26 PM, Naama Menda wrote: > > > On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp wrote: > > > > > On Mar 9, 2008, at 9:34 PM, Naama Menda wrote: > > > > My main problem is that go-perl does not handle updates, so if I want to > > update GO I need an empty schema. We find it more complicated to re-load our > > annotations than to update cvterms and their related data. > > Our loading script compares an existing load of an ontology to the obo > > file and updates/insets/deletes accordingly. > > > > > > load_ontology.pl in bioperl-db should have all this functionality, > > though of course that doesn't give you the typedef support (yet). > > > > Will you add this support to obo.pm? 
I had a look at it and it seems easy > to implement. > Will there be a patch? Or in the next Bioperl release? > > > If you have ideas for how to implement this we'd be thrilled if you can > provide a patch. > > Most changes in BioPerl happen because and by people who have an itch to > scratch. Seems like this one is right down your alley? > > I'd in principle be interested in doing this too but can't give any > promises as to when I might have time (unless I need it myself :) > I'll try to provide a patch for this. I'll let you know how it goes.. > > > > > > > We are now in the process of committing this code to GMOD > > > > > > Cool - obviously load_ontology.pl doesn't work off of Chado but instead > > uses BioSQL as the schema (though the ontology model is *very* similar > > between the two). > > > > We store ontologies in Chado, and that was the reason for writing a new > loader. > > > Looking at it it seems you wrote a whole new language binding? Did you > find it too difficult to build on one of the existing ones (which use > Class::DBI if I recall correctly, though Scott will have the details here) > We already had most of the classes. We use these for other code at SGN, since it's all OO perl. All we needed to do is to add some methods and accessors to our Chado classes, and write a loading script that stores the ontology from the db and the ontology from the file in hashrefs, compare the 2 and insert/update accordingly. Our main concern was for updating pre-loaded ontologies (for new ontology files GMOD's make-ontologies works great!) 
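The comparison step described above (ontology from the database and ontology from the file held in hashrefs, then compared) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SGN loader: the hashref layout (accession => term name), the sample accessions, and the helper name diff_ontology are all made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl, GMOD or the SGN code):
# compare terms already stored in the database with terms parsed from
# an obo file. Both arguments are hashrefs of accession => term name.
sub diff_ontology {
    my ($db_terms, $file_terms) = @_;
    my (@insert, @update, @delete);

    for my $acc (sort keys %$file_terms) {
        if (!exists $db_terms->{$acc}) {
            push @insert, $acc;    # only in the file: new term
        }
        elsif ($db_terms->{$acc} ne $file_terms->{$acc}) {
            push @update, $acc;    # in both, but the name changed
        }
    }
    for my $acc (sort keys %$db_terms) {
        # only in the database: candidate for deletion
        push @delete, $acc unless exists $file_terms->{$acc};
    }
    return { insert => \@insert, update => \@update, delete => \@delete };
}

# Made-up sample data, for illustration only.
my $db   = { 'X:0001' => 'old name', 'X:0002' => 'kept', 'X:0003' => 'gone' };
my $file = { 'X:0001' => 'new name', 'X:0002' => 'kept', 'X:0004' => 'added' };

my $diff = diff_ontology($db, $file);
print "$_: @{ $diff->{$_} }\n" for qw(insert update delete);
```

A real loader would compare more than the name (definitions, synonyms, relationships) and would probably mark terms obsolete rather than delete them outright; the sketch only shows the shape of the three-way split.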
> -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > -Naama From clarsen at vecna.com Mon Mar 10 11:56:55 2008 From: clarsen at vecna.com (Christopher Larsen) Date: Mon, 10 Mar 2008 11:56:55 -0400 (EDT) Subject: [Bioperl-l] Reciprocal best blast hits / Orthology Message-ID: <49819.64.47.82.110.1205164615.squirrel@mail.vecna.com> Matt, Dave, Regarding reciprocal best blast hit, yes -- it's beyond the list and here's how/where to go. It seems what you are looking for is actually an Ortholog search. If so there is more to it than reciprocity and ranking--other groups are using phylo trees and bootstrap values etc. Perhaps check out the perl written up by David Roos and Chris Stockert's work: OrthoMCL. Their group is quite helpful as well. http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi The perl install goes local and will help you to identify a homolog that should have the same enzyme function or cellular role. Importantly it tends to the idea of paralogy and pseudogenes as well so you don't step into a pit. The site explains more. The point is, you're on the right track, but there's a group that's been through what you are doing and can supply you with a working implementation that's very robust and uses BioPerl modules already, so you don't have to scratch up some code. Also you can check out INPARANOID for the same reasons. Having just been through this, I'm just trying to lead you to where we went. Right now we point OrthoMCL at a whole folder of proteomes (*.faa) and it groups them accordingly. Brian O: Don't know if this folds well into your MCL wiki page or not, apologies. If the group wants some post-processing code that shows the presence/absence of proteins in any one group perhaps we can help too as there are a few things written that take the raw output directly.
Cheers, Chris L ========================= Message: 5 Dear experts, I want to do a best reciprocal blastp of a fasta protein dataset against the protein models of various species also in fasta format. The aim is o have an output showing presence/not presence. I think this is possible to do using perl, but i'm very much a beginner so any help in this would be greatly appreciated. Thanks Matt -- Christopher Larsen, Ph.D. Senior Scientist Research Grants Manager Vecna Technologies 5004 Lehigh Ave College Park, MD 20740 240-737-1625 From Kevin.M.Brown at asu.edu Mon Mar 10 12:17:11 2008 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 10 Mar 2008 09:17:11 -0700 Subject: [Bioperl-l] Bio::TreeIO - tree object to string In-Reply-To: <47D55E33.8060205@medecine.unige.ch> References: <47D55E33.8060205@medecine.unige.ch> Message-ID: <1A4207F8295607498283FE9E93B775B4048A0825@EX02.asurite.ad.asu.edu> You need to either pass in a FileHandle or a path to an output file else you are going to see the behavior you are getting. open my $tree_string, ">TreeFile.txt"; my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick'); OR my $out = new Bio::TreeIO(-file => "TreeFile.txt", -format => 'newick'); > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Daniel Gerlach > Sent: Monday, March 10, 2008 9:14 AM > To: bioperl-l at portal.open-bio.org > Subject: [Bioperl-l] Bio::TreeIO - tree object to string > > Dear all, > > This is a very basic question. I have a tree object in $tree > and want to > save its newick representation in a variable as a string: > > my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick'); > $out->write_tree($tree); > print $tree_string; > > Unfortunately this does not work and he prints out the newick tree on > stdout plus the message "Use of uninitialized value in print > at ...". 
He > also prints out the tree on the stdout if I remove the line "print > $tree_string". The variable $tree_string seems to be empty. > > D. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Mon Mar 10 12:51:59 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 10 Mar 2008 16:51:59 +0000 Subject: [Bioperl-l] Bio::TreeIO - tree object to string In-Reply-To: <47D55E33.8060205@medecine.unige.ch> References: <47D55E33.8060205@medecine.unige.ch> Message-ID: <47D5672F.6000709@sendu.me.uk> Daniel Gerlach wrote: > Dear all, > > This is a very basic question. I have a tree object in $tree and want to > save its newick representation in a variable as a string: > > my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick'); > $out->write_tree($tree); > print $tree_string; > > Unfortunately this does not work and he prints out the newick tree on > stdout plus the message "Use of uninitialized value in print at ...". He > also prints out the tree on the stdout if I remove the line "print > $tree_string". The variable $tree_string seems to be empty. The -fh argument is supposed to be a file handle, not a string. You can use whatever standard Perl method you like for attaching a filehandle to a scalar. E.g.

my $tree_string = '';
open(my $fake_fh, "+<", \$tree_string);
my $out = new Bio::TreeIO(-fh => $fake_fh, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Alternatively, my $tree_string = $tree->simplify_to_leaves_string() might give you what you want. From stephan.rosecker at ish.de Mon Mar 10 12:27:57 2008 From: stephan.rosecker at ish.de (stephan.rosecker) Date: Mon, 10 Mar 2008 17:27:57 +0100 Subject: [Bioperl-l] how to get unigene-cluster with bio-db Message-ID: Dear list, I try to understand how to fetch unigene-cluster with help of bio-db and a local biosql-db, but without success.
I have transferred "Hs.data" with help of "bp_load_seqdatabase.pl".

#!/usr/local/bin/perl -w

use strict;
use Bio::DB::BioDB;
use Bio::DB::Query::BioQuery;

my $db = Bio::DB::BioDB->new(
    -database => 'biosql',
    -user => 'postgres',
    -pass => 'foo',
    -dbname => 'bioseqdb',
    -host => 'foo.bar',
    -port => 5435, # optional
    -driver => 'Pg'
);

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections(
    ["Bio::PrimarySeqI c::subject",
     "Bio::PrimarySeqI p::object",
     "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
$query->where(["p.accession_number = 'Hs.2'"]);

my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
my $qres = $adp->find_by_query($query);
my $qres2 = $adp2->find_by_query($query);

while(my $pseq = $qres->next_object()) {
    print $pseq->accession_number,"\n";
}
while(my $pseq = $qres2->next_object()) {
    print $pseq->accession_number,"\n";
}

Maybe this way is wrong. Hope you can help me. stephan From hlapp at gmx.net Mon Mar 10 22:53:47 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 10 Mar 2008 22:53:47 -0400 Subject: [Bioperl-l] how to get unigene-cluster with bio-db In-Reply-To: References: Message-ID: Stephan - what is the result that you are getting? Do you receive an error? Or simply nothing? (BTW note that the object adaptor that you let the query execute will determine what kind of object you get in return. Hence, I'd expect your $qres2 to return Bio::ClusterI compliant objects, not Bio::PrimarySeqI ones. That is probably not at the root of the problem here, though.) -hilmar On Mar 10, 2008, at 12:27 PM, stephan.rosecker wrote: > Dear list, > > I try to understand how to fetch unigene-cluster with help of > bio-db and a local biosql-db, but without success. > I have transferred "Hs.data" with help of "bp_load_seqdatabase.pl".
> > #!/usr/local/bin/perl -w > > use strict; > use Bio::DB::BioDB; > use Bio::DB::Query::BioQuery; > > my $db = Bio::DB::BioDB->new( > -database => 'biosql', > -user => 'postgres', > -pass => 'foo', > -dbname => 'bioseqdb', > -host => 'foo.bar', > -port => 5435, # optional > -driver => 'Pg' > ); > > my $query = Bio::DB::Query::BioQuery->new(); > > $query->datacollections( > ["Bio::PrimarySeqI c::subject", > "Bio::PrimarySeqI p::object", > "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]); > $query->where(["p.accession_number = 'Hs.2'"]); > > my $adp = $db->get_object_adaptor('Bio::PrimarySeqI'); > my $adp2 = $db->get_object_adaptor('Bio::ClusterI'); > my $qres = $adp->find_by_query($query); > my $qres2 = $adp2->find_by_query($query); > > while(my $pseq = $qres->next_object()) { > print $pseq->accession_number,"\n"; > } > while(my $pseq = $qres2->next_object()) { > print $pseq->accession_number,"\n"; > } > > Maybe this way is wrong. > Hope you can help me. > > stephan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon Mar 10 23:17:01 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 10 Mar 2008 23:17:01 -0400 Subject: [Bioperl-l] Bio::Ontology::OntologyI In-Reply-To: References: <47D2C36C.2020802@cornell.edu> <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net> <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net> Message-ID: On Mar 10, 2008, at 11:09 AM, Naama Menda wrote: > > > On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp wrote: > > On Mar 9, 2008, at 10:26 PM, Naama Menda wrote: >> >> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp wrote: >> >> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote: >> >>> My main problem is that go-perl does not handle updates,
so if I >>> want to update GO I need an empty schema. We find it more >>> complicated to re-load our annotations than to update cvterms and >>> their related data. >>> Our loading script compares an existing load of an ontology to >>> the obo file and updates/insets/deletes accordingly. >> >> load_ontology.pl in bioperl-db should have all this functionality, >> though of course that doesn't give you the typedef support (yet). >> >> Will you add this support to obo.pm? I had a look at it and it >> seems easy to implement. >> Will there be a patch? Or in the next Bioperl release? > > If you have ideas for how to implement this we'd be thrilled if you > can provide a patch. > > Most changes in BioPerl happen because and by people who have an > itch to scratch. Seems like this one is right down your alley? > > I'd in principle be interested in doing this too but can't give any > promises as to when I might have time (unless I need it myself :) > > I'll try to provide a patch for this. I'll let you know how it goes.. That'd be awesome! Don't hesitate to let us know if you hit bumps. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From stephan.rosecker at ish.de Tue Mar 11 07:08:08 2008 From: stephan.rosecker at ish.de (stephan.rosecker) Date: Tue, 11 Mar 2008 12:08:08 +0100 Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl Message-ID: Dear list, I have started the "bp_load_seqdatabase.pl" script from the "bioperl-db-1.5.2_100" package with the unigene "Hs.data". It runs on a 7 processot machine with 15GB ram. The DBMS is postgres on a similar machine. BioSQL core schema is v1.0.0.. The job runs since friday. ./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb --dbuser foo --dbpass bar --driver Pg --format ClusterIO::unigene ../ncbi/Hs.data Is it normal that it takes so long? What are your experiences? 
best regards stephan From diriano at uni-potsdam.de Tue Mar 11 07:55:54 2008 From: diriano at uni-potsdam.de (Diego Mauricio Riaño Pachón) Date: Tue, 11 Mar 2008 12:55:54 +0100 Subject: [Bioperl-l] problem with SearchIO and writer Message-ID: <47D6734A.5060103@uni-potsdam.de> Dear all, I have a small problem parsing a BLAST report with SearchIO and using TextResultWriter. I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' => "$blastFile");
##then I extract each individual report as
while (my $result = $searchio->next_result){
    my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
    my $blastresult=$writertxt->to_string($result);
    while (my $hit = $result->next_hit){
        print $hit->name."\n";
    }
}

From diriano at uni-potsdam.de Tue Mar 11 09:31:35 2008 From: diriano at uni-potsdam.de (diriano at uni-potsdam.de) Date: Tue, 11 Mar 2008 14:31:35 +0100 Subject: [Bioperl-l] problem with SearchIO and writer Message-ID: <1205242295.47d689b7186ad@webmail.uni-potsdam.de> Dear all, I have a small problem parsing a BLAST report with SearchIO and using TextResultWriter. I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' => "$blastFile");
##then I extract each individual report as
while (my $result = $searchio->next_result){
    my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
    my $blastresult=$writertxt->to_string($result);
    while (my $hit = $result->next_hit){
        print $hit->name."\n";
    }
}

-- Diego Mauricio Riaño-Pachón Biologist Institute of Biology and Biochemistry University of Potsdam Karl-Liebknecht-Str.
24-25 Haus 20 14476 Golm Germany Tel:0331/977-2809 http://www.geocities.com/dmrp.geo/ From diriano at uni-potsdam.de Tue Mar 11 10:25:19 2008 From: diriano at uni-potsdam.de (diriano at uni-potsdam.de) Date: Tue, 11 Mar 2008 15:25:19 +0100 Subject: [Bioperl-l] problem with SearchIO and writer Message-ID: <1205245519.47d6964fcaa48@webmail.uni-potsdam.de> Dear all, Please excuse my previous e-mail, it was incomplete, here it is again: I have a small problem parsing a BLAST report with SearchIO and using TextResultWriter. I have a large file with several BLAST results, I instantiate SearchIO as: my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' => "$blastFile"); ##then I extract each individual report as while (my $result = $searchio->next_result){ my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new(); my $blastresult=$writertxt->to_string($result); while (my $hit = $result->next_hit){ print $hit->name."\n"; #I will do further processing of the HSPs } } But I do not get any output. It works if I comment the lines referencing the $writertxt. The problem is that I need to extract the whole report to later insert it ($blastresult) into a database. But I also need to process each hit and hsp. Any idea how can I accomplish this? Any help will be greatly appreciated. Have a nice day, Diego -- Diego Mauricio Riano Pachon Biologist Institute of Biology and Biochemistry University of Potsdam Karl-Liebknecht-Str. 24-25 Haus 20 14476 Golm Germany Tel:0331/977-2809 http://www.geocities.com/dmrp.geo/ From sac at bioperl.org Tue Mar 11 15:04:48 2008 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 11 Mar 2008 12:04:48 -0700 Subject: [Bioperl-l] BioSQL V1.0.0 released In-Reply-To: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu> References: <200803071309.25294.heikki@sanbi.ac.za> <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu> Message-ID: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com> Ditto. Thanks for biting the bullet, Hilmar. 
BTW, I put together a little compendium/review of various bioinformatics data models a few months ago, where I mention BioSQL among others, but I never really announced it: http://biodatamodel.org/ I thought about wikifying it to get the community involved in maintaining it, but haven't gotten around to it yet. Feedback is welcome. Cheers, Steve On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields wrote: > Same here. Great news! > > chris > > On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote: > > > BIOSQL V1.0.0 RELEASED > > http://news.open-bio.org/archives/2008_03.html#000094 > > > > > > Congratulations, Hilmar! > > > > -Heikki > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cain.cshl at gmail.com Tue Mar 11 15:29:41 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 11 Mar 2008 15:29:41 -0400 Subject: [Bioperl-l] Nightly build archives now available In-Reply-To: <47D321A0.9010209@sendu.me.uk> References: <1204921049.6467.9.camel@frissell> <1204928753.6467.19.camel@frissell> <47D321A0.9010209@sendu.me.uk> Message-ID: <1205263781.6220.37.camel@frissell> Hi Sendu, Sorry about that; I diffed the version I had with what was in svn, but apparently didn't look closely at the results. Do you happen to know the best way of reverting with svn? After it gets reverted one way or the other, I agree that overriding prompt to include another argument is a good way to go. That way the value of $accept can be passed to it and it just does the right thing, regardless of when/where it is getting called. I'll do that. Scott On Sat, 2008-03-08 at 23:30 +0000, Sendu Bala wrote: > Scott Cain wrote: > > OK, I added my 'accept the defaults' option. Use it like this: > > > > perl Build.PL --accept 1 > > Thanks for that Scott, but can you revert and have another go at that > commit, because you ended up wiping out the recent commits by Chris and > myself. > > Also, rather than individually alter the Bioperl-specific methods like > choose_scripts(), is there perhaps a cleaner way to catch every prompt, > perhaps by overriding prompt() itself? Other questions may get added in > the future, and some existing questions aren't immediately obvious, so > it would be nice to be sure an automated process like a cron job /never/ > gets asked a question. -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From hlapp at gmx.net Tue Mar 11 17:34:30 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 11 Mar 2008 17:34:30 -0400 Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl In-Reply-To: References: Message-ID: It won't be fast, as it will create about ~6 Mln bioentries in your database. However, it running since Friday sounds on the high end. The first step I recommend doing when running into this kind of situation is checking the CPU load that the script generates, compared to the load generated by the database server. If the script's CPU load is significantly less than ~10% then it is likely that your database is too slow. There are various possible reasons why it may be too slow, ranging from limited resources, to grossly suboptimal configuration. If your database is running on the same 15GB server then resources should not be an issue (assuming that you don't have a totally antiquated CPU there). You might still want to check the PostgreSQL config file, though. What I would suspect though is that you didn't VACUUM the database before and/or during the load. That will make the indexes used for lookup increasingly slow as a large amount of data accumulates. Does this ring a bell? -hilmar On Mar 11, 2008, at 7:08 AM, stephan.rosecker wrote: > Dear list, > > I have started the "bp_load_seqdatabase.pl" script from the > "bioperl-db-1.5.2_100" package with the unigene > "Hs.data". It runs on a 7 processot machine with 15GB ram. The DBMS > is postgres on a similar machine. > BioSQL core schema is v1.0.0.. > > The job runs since friday. > > ./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb -- > dbuser foo --dbpass bar --driver Pg --format ClusterIO::unigene ../ > ncbi/Hs.data > > Is it normal that it takes so long? > What are your experiences? 
> > best regards > stephan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From vuhlhorn at ramapo.edu Wed Mar 12 14:51:50 2008 From: vuhlhorn at ramapo.edu (Victoria Lyn Uhlhorn) Date: Wed, 12 Mar 2008 14:51:50 -0400 (EDT) Subject: [Bioperl-l] HOWTO:Trees module Message-ID: <20080312145150.ASS93757@msg-1.mail.ramapo.edu> The following script:

#!/usr/bin/perl -w
use CGI ':standard';
use Bio::Perl;
use Bio::Align::ProteinStatistics;
use Bio::Tree::DistanceFactory;
use Bio::TreeIO;

print header;
print start_html(-bgcolor=>"pink", -title=>('Phylogenetic Tree'), -style=>{-src=>$style}, -class=>Ltitle), p(), 'Tree';
print start_form, hr;

my $alnio= Bio::AlignIO->new(-file => '/Users/glitterchix4u/Sites/CGI-bin/HepatitisSerineProt.clustalw', -format => 'clustalw');
my $profactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
my $stats = Bio::Align::ProteinStatistics->new;
my $treeout = Bio::TreeIO->new(-format => 'newick');
my $tree;
while(my $aln = $alnio->next_aln) {
    my $mat = $stats->distance(-method => 'Kimura', -align => $aln);
    $tree = $profactory->make_tree($mat);
    $treeout->write_tree($tree);
}
#$treeout->print_tree($tree);
print "Tree is: ", $tree->size;
print end_form;
print end_html;

How do I print the tree? I'm having a hard time printing the tree out.
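One way to get printed output into a string is Perl's in-memory filehandle, i.e. open() on a reference to a scalar (core Perl since 5.8). The sketch below is only an illustration: it uses a plain print with a made-up newick string as a stand-in for the tree writer so that it runs without BioPerl installed. Handing such a handle to Bio::TreeIO through -fh is the intended use, but that part is not exercised here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open a write handle onto a scalar instead of onto a file.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

# Stand-in for a writer object: whatever is printed to $fh ends up in
# $buffer. The newick string is made up for illustration.
print {$fh} "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);\n";
close($fh);

# The text is now an ordinary Perl string and can be printed or stored.
print "Captured: $buffer";
```

In a CGI script the captured string could then be placed wherever it is needed in the page, rather than appearing wherever the writer happened to send it.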
From bix at sendu.me.uk Wed Mar 12 19:20:01 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 12 Mar 2008 23:20:01 +0000 Subject: [Bioperl-l] HOWTO:Trees module In-Reply-To: <20080312145150.ASS93757@msg-1.mail.ramapo.edu> References: <20080312145150.ASS93757@msg-1.mail.ramapo.edu> Message-ID: <47D86521.1010705@sendu.me.uk> Victoria Lyn Uhlhorn wrote: > my $treeout = Bio::TreeIO->new(-format => 'newick'); > $treeout->write_tree($tree); > How do I print the tree? I'm having a hard time printing the tree out. Your TreeIO will write its trees to the file or filehandle you give it. But you haven't given it one. Give it one and write_tree() will then cause the tree to be 'printed' there. If you want the trees stored in a string so you can print() them, there are ways to open a filehandle onto a scalar variable. From hlapp at gmx.net Thu Mar 13 18:51:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 13 Mar 2008 18:51:13 -0400 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon id In-Reply-To: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> Message-ID: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> (this is more of a bioperl question than a biosql one) The load_ncbi_taxonomy.pl script is designed to update the taxon tables in a non-disruptive way, and if there weren't many changes shouldn't actually take that long (except that recalculating the nested set values may take a couple of minutes). Bioperl-db will store the taxon information it finds in the Bio::Species object if it can't locate the taxon by lookup, and will not raise an error. The problem with this is that it relies on the Bio::SeqIO parser to have gotten the species and lineage information correct, which is sometimes a wrong assumption for exotic species.
Most often the error will not manifest itself at the time of storing the erroneously parsed information, but when it is re-retrieved and used to populate a Bio::Species object.

For the SymAtlas project we had this situation (new species in sequence updates that the last NCBI taxonomy update hadn't yet brought in) quite regularly. I wrote a SQL script that would fix those 'haphazard' additions such that load_ncbi_taxonomy would update them to their correct values come the next NCBI taxonomy update. I can send you the script (it would be for the Oracle version), but I'm not sure this is a widely viable strategy.

-hilmar

On Mar 13, 2008, at 11:06 AM, Peter wrote:
> Dear list,
>
> One of the unresolved issues with Biopython's BioSQL interface is
> dealing with the NCBI taxon ID when loading sequences into the
> database.
>
> As I understand it, ideally before loading any sequences, the user
> will have loaded in the entire NCBI taxonomy using the
> load_ncbi_taxonomy.pl script, as I described here:
> http://biopython.org/wiki/BioSQL#NCBI_Taxonomy
>
> When a new sequence is added to the database with a known taxon id,
> there is no problem. But happens if its a recently sequenced organism
> which isn't defined yet in the BioSQL taxonomy tables? Could/should
> the user re-run load_ncbi_taxonomy.pl, and then load in their new
> sequence?
>
> Right now in Biopython due what appears to have been intended as a
> short term hack, we simple don't record the taxon id at all (!), and I
> would like to fix this (bug 2422).
> http://bugzilla.open-bio.org/show_bug.cgi?id=2422
>
> How do BioPerl et al deal with this issue? Do they try and update the
> taxonomy tables using the available information in the new record's
> annotation (i.e. the new taxon id and the species name)? Do they
> lookup the NCBI taxonomy definition via the internet? Do they throw
> an error and halt?
> > Thanks, > > Peter > (Biopython) > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Mar 13 19:41:43 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 13 Mar 2008 19:41:43 -0400 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon id In-Reply-To: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> Message-ID: On Mar 13, 2008, at 7:13 PM, Peter wrote: > On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp wrote: >> [...] >> The load_ncbi_taxonomy.pl script is designed to update the taxon >> tables in a non-disruptive way, and if there weren't many changes >> shouldn't actually take that long (except that recalculating the >> nested set values may take a couple of minutes). > > Do you think when faced with a novel taxon id, Biopython/BioPerl/... > could write some minimal taxonomy entry (without any guess work based > on the species name), in order to record the sequence's taxon This is what Bioperl-db does. There isn't any guesswork. If Bio::Species has lineage information it will also insert the lineage information, though. > - and then running an improved load_ncbi_taxonomy.pl at a later > date would > sort out the proper taxonomy? If I remember correctly, the script makes (and hence expects) the primary key and the NCBI taxonomy ID to be identical. If your loading procedure can achieve that already then load_ncbi_taxonomy.pl should pick them up and fix them. 
You can try that by loading the taxonomy through the script, then arbitrarily choose a taxon, create a stub bioentry for it and set its taxon_id foreign key to the chosen taxon, change its taxon_name.name to some bogus value (for the 'scientific name' class, for example) (and feel free to change the left_id and right_id values in taxon too), and rerun the script. It should fix the change you made, and your bioentry should still point to the same taxon (because its primary key did not change, and did not get deleted either; otherwise the bioentry would now have a null value in the foreign key). The Bioperl-db way of storing things does not give control over primary key assignment to Bioperl-db, so the database will assign it. > [...] >> For the SymAtlas project we had this situation (new species in >> sequence updates that the last NCBI taxonomy update hadn't yet >> brought in) quite regularly. I wrote a SQL script would fix those >> 'haphazard' additions such that load_ncbi_taxonomy would update them >> to their correct values come the next NCBI taxonomy update. I can >> send you the script (it would be for the Oracle version), but I'm >> not >> sure this is a widely viable strategy. > > So this wasn't integrated with load_ncbi_taxonomy.pl at all? No, but now that you say it I don't see any reason why I couldn't. Maybe that's just what I should do. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Thu Mar 13 19:13:32 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 13 Mar 2008 23:13:32 +0000 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon id In-Reply-To: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> Message-ID: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp wrote: > (this is more of a bioperl question than a biosql one) Well, yes and no. And I'm not subscribed to the Bioperl list, nor the BioJava one, nor the BioRuby one. > The load_ncbi_taxonomy.pl script is designed to update the taxon > tables in a non-disruptive way, and if there weren't many changes > shouldn't actually take that long (except that recalculating the > nested set values may take a couple of minutes). Do you think when faced with a novel taxon id, Biopython/BioPerl/... could write some minimal taxonomy entry (without any guess work based on the species name), in order to record the sequence's taxon - and then running an improved load_ncbi_taxonomy.pl at a later date would sort out the proper taxonomy? > Bioperl-db will store the taxon information it finds in the > Bio::Species object if it can't locate the taxon by lookup, and will > not raise an error. The problem with this is that it relies on the > Bio::SeqIO parser to have gotten the species and lineage information > correct, which is sometimes a wrong assumption for exotic species. > Most often the error will not manifest itself at the time of storing > the erroneously parsed information, but when it is re-retrieved and > used to populate a Bio::Species object. 
This is what I would like to avoid with Biopython. > For the SymAtlas project we had this situation (new species in > sequence updates that the last NCBI taxonomy update hadn't yet > brought in) quite regularly. I wrote a SQL script would fix those > 'haphazard' additions such that load_ncbi_taxonomy would update them > to their correct values come the next NCBI taxonomy update. I can > send you the script (it would be for the Oracle version), but I'm not > sure this is a widely viable strategy. So this wasn't integrated with load_ncbi_taxonomy.pl at all? Peter From hlapp at gmx.net Fri Mar 14 00:00:40 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 14 Mar 2008 00:00:40 -0400 Subject: [Bioperl-l] bioperl basics In-Reply-To: <20080313.195825.6855.0@webmail20.vgs.untd.com> References: <20080313.195825.6855.0@webmail20.vgs.untd.com> Message-ID: <0A104B1F-315F-418C-A6DA-84FF04CC438C@gmx.net> John - you want to send this to the BioPerl mailing list at bioperl-l at lists.open-bio.org. Your problem really is a Perl problem though, not BioPerl. The most likely cause is that you don't have the Cache::FileCache module installed, so that's what I would do. The answer to your question for how to change @INC is using -I on the command line, 'use lib' in your script, or set the PERL5LIB environment variable. -hilmar On Mar 14, 2008, at 1:58 AM, mrphysh at juno.com wrote: > I am a molecular biologist studying bioinformatics from a Perl > background and making progress. I am realizing that without > tapping into the existing infrastructure, I will be writing code > for ever. Bioperl is the path for me. I am moving forward. > > the error I encounter is > > can't locate Cache/FileCache in @INC (@INC contains /etc/perl/ /usr/ > locaql/lib/perl/5.8.8 .....) and so forth. > > I found the files in a home directory. I must have told the > install to put them there...? > > > anyway: How do I edit this environmental variable..... @INC. I > cannot find anything in my book. 
> > thanks > john brigham > > > I will be writing code for years and need to tap into the > _____________________________________________________________ > Need cash? Click to get an emergency loan, bad credit ok > http://thirdpartyoffers.juno.com/TGL2121/fc/ > Ioyw6i3mKmyQsg01zMPK1Qa0178ZfajwTEBgEXdzlmb9zLLZc8pLOU/ > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From markjschreiber at gmail.com Fri Mar 14 09:48:38 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Fri, 14 Mar 2008 21:48:38 +0800 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon id In-Reply-To: References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> Message-ID: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com> >From memory BioJava will add it if it is not already in there. If the taxid can be found then the system connects you with whatever is in that taxid, it doesn't overwrite it. This has two curious side effects. Because the details associated with a taxid sometimes change (eg common name changes a lot) you can get connected to an outdated version (if your record is newer than your NCBI taxonomy) or you can get connected with a version that is newer than your record which means when you round-trip you don't get complete identity. For compatibility across the projects some kind of consensus would be good. - Mark On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp wrote: > > > On Mar 13, 2008, at 7:13 PM, Peter wrote: > > > On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp wrote: > >> [...] 
> > >> The load_ncbi_taxonomy.pl script is designed to update the taxon > >> tables in a non-disruptive way, and if there weren't many changes > >> shouldn't actually take that long (except that recalculating the > >> nested set values may take a couple of minutes). > > > > Do you think when faced with a novel taxon id, Biopython/BioPerl/... > > could write some minimal taxonomy entry (without any guess work based > > on the species name), in order to record the sequence's taxon > > This is what Bioperl-db does. There isn't any guesswork. If > Bio::Species has lineage information it will also insert the lineage > information, though. > > > > - and then running an improved load_ncbi_taxonomy.pl at a later > > date would > > sort out the proper taxonomy? > > If I remember correctly, the script makes (and hence expects) the > primary key and the NCBI taxonomy ID to be identical. If your loading > procedure can achieve that already then load_ncbi_taxonomy.pl should > pick them up and fix them. You can try that by loading the taxonomy > through the script, then arbitrarily choose a taxon, create a stub > bioentry for it and set its taxon_id foreign key to the chosen > taxon, change its taxon_name.name to some bogus value (for the > 'scientific name' class, for example) (and feel free to change the > left_id and right_id values in taxon too), and rerun the script. It > should fix the change you made, and your bioentry should still point > to the same taxon (because its primary key did not change, and did > not get deleted either; otherwise the bioentry would now have a null > value in the foreign key). > > The Bioperl-db way of storing things does not give control over > primary key assignment to Bioperl-db, so the database will assign it. > > > [...] > > >> For the SymAtlas project we had this situation (new species in > >> sequence updates that the last NCBI taxonomy update hadn't yet > >> brought in) quite regularly. 
I wrote a SQL script would fix those > >> 'haphazard' additions such that load_ncbi_taxonomy would update them > >> to their correct values come the next NCBI taxonomy update. I can > >> send you the script (it would be for the Oracle version), but I'm > >> not > >> sure this is a widely viable strategy. > > > > So this wasn't integrated with load_ncbi_taxonomy.pl at all? > > No, but now that you say it I don't see any reason why I couldn't. > Maybe that's just what I should do. > > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > > > > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > From cjfields at uiuc.edu Fri Mar 14 10:31:09 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 14 Mar 2008 09:31:09 -0500 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon id In-Reply-To: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com> References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com> Message-ID: The counter to that perspective (using new sequences with old tax info) would be to regularly update NCBI taxonomy, particularly in circumstances prior to adding new sequences. Hilmar mentioned that once tax is loaded it doesn't take as long to update, so you could set up a cron job to update regularly. I remember someone mentioning weekly or monthly updates on the list quite a while ago, but I'm unsure how often NCBI updates tax information (i.e. with every release, monthly, weekly, etc). 
I can see instances popping up where you used the an up-to-date taxonomy but a new sequence contains a tax ID not present. I think bioperl-db handles these but I'm not sure what other Bio* do. chris On Mar 14, 2008, at 8:48 AM, Mark Schreiber wrote: >> From memory BioJava will add it if it is not already in there. If the > taxid can be found then the system connects you with whatever is in > that taxid, it doesn't overwrite it. > > This has two curious side effects. Because the details associated with > a taxid sometimes change (eg common name changes a lot) you can get > connected to an outdated version (if your record is newer than your > NCBI taxonomy) or you can get connected with a version that is newer > than your record which means when you round-trip you don't get > complete identity. > > For compatibility across the projects some kind of consensus would > be good. > > - Mark > On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp wrote: >> >> >> On Mar 13, 2008, at 7:13 PM, Peter wrote: >> >>> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp wrote: >>>> [...] >> >>>> The load_ncbi_taxonomy.pl script is designed to update the taxon >>>> tables in a non-disruptive way, and if there weren't many changes >>>> shouldn't actually take that long (except that recalculating the >>>> nested set values may take a couple of minutes). >>> >>> Do you think when faced with a novel taxon id, Biopython/BioPerl/... >>> could write some minimal taxonomy entry (without any guess work >>> based >>> on the species name), in order to record the sequence's taxon >> >> This is what Bioperl-db does. There isn't any guesswork. If >> Bio::Species has lineage information it will also insert the lineage >> information, though. >> >> >>> - and then running an improved load_ncbi_taxonomy.pl at a later >>> date would >>> sort out the proper taxonomy? >> >> If I remember correctly, the script makes (and hence expects) the >> primary key and the NCBI taxonomy ID to be identical. 
If your loading >> procedure can achieve that already then load_ncbi_taxonomy.pl should >> pick them up and fix them. You can try that by loading the taxonomy >> through the script, then arbitrarily choose a taxon, create a stub >> bioentry for it and set its taxon_id foreign key to the chosen >> taxon, change its taxon_name.name to some bogus value (for the >> 'scientific name' class, for example) (and feel free to change the >> left_id and right_id values in taxon too), and rerun the script. It >> should fix the change you made, and your bioentry should still point >> to the same taxon (because its primary key did not change, and did >> not get deleted either; otherwise the bioentry would now have a null >> value in the foreign key). >> >> The Bioperl-db way of storing things does not give control over >> primary key assignment to Bioperl-db, so the database will assign it. >> >>> [...] >> >>>> For the SymAtlas project we had this situation (new species in >>>> sequence updates that the last NCBI taxonomy update hadn't yet >>>> brought in) quite regularly. I wrote a SQL script would fix those >>>> 'haphazard' additions such that load_ncbi_taxonomy would update >>>> them >>>> to their correct values come the next NCBI taxonomy update. I can >>>> send you the script (it would be for the Oracle version), but I'm >>>> not >>>> sure this is a widely viable strategy. >>> >>> So this wasn't integrated with load_ncbi_taxonomy.pl at all? >> >> No, but now that you say it I don't see any reason why I couldn't. >> Maybe that's just what I should do. 
>> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> >> >> >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From markjschreiber at gmail.com Fri Mar 14 20:56:37 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sat, 15 Mar 2008 08:56:37 +0800 Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon id In-Reply-To: References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com> <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net> <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com> <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com> Message-ID: <93b45ca50803141756m3d7f022cnb57bd39f37270682@mail.gmail.com> I agree. A regular update would be best. Of course if your BioSQL db is limited to one or a few organisms you can just keep a fragment of the db. - Mark On Fri, Mar 14, 2008 at 10:31 PM, Chris Fields wrote: > The counter to that perspective (using new sequences with old tax > info) would be to regularly update NCBI taxonomy, particularly in > circumstances prior to adding new sequences. Hilmar mentioned that > once tax is loaded it doesn't take as long to update, so you could set > up a cron job to update regularly. > > I remember someone mentioning weekly or monthly updates on the list > quite a while ago, but I'm unsure how often NCBI updates tax > information (i.e. with every release, monthly, weekly, etc). 
I can > see instances popping up where you used the an up-to-date taxonomy but > a new sequence contains a tax ID not present. I think bioperl-db > handles these but I'm not sure what other Bio* do. > > chris > > On Mar 14, 2008, at 8:48 AM, Mark Schreiber wrote: > > >> From memory BioJava will add it if it is not already in there. If the > > taxid can be found then the system connects you with whatever is in > > that taxid, it doesn't overwrite it. > > > > This has two curious side effects. Because the details associated with > > a taxid sometimes change (eg common name changes a lot) you can get > > connected to an outdated version (if your record is newer than your > > NCBI taxonomy) or you can get connected with a version that is newer > > than your record which means when you round-trip you don't get > > complete identity. > > > > For compatibility across the projects some kind of consensus would > > be good. > > > > - Mark > > On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp wrote: > >> > >> > >> On Mar 13, 2008, at 7:13 PM, Peter wrote: > >> > >>> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp wrote: > >>>> [...] > >> > >>>> The load_ncbi_taxonomy.pl script is designed to update the taxon > >>>> tables in a non-disruptive way, and if there weren't many changes > >>>> shouldn't actually take that long (except that recalculating the > >>>> nested set values may take a couple of minutes). > >>> > >>> Do you think when faced with a novel taxon id, Biopython/BioPerl/... > >>> could write some minimal taxonomy entry (without any guess work > >>> based > >>> on the species name), in order to record the sequence's taxon > >> > >> This is what Bioperl-db does. There isn't any guesswork. If > >> Bio::Species has lineage information it will also insert the lineage > >> information, though. > >> > >> > >>> - and then running an improved load_ncbi_taxonomy.pl at a later > >>> date would > >>> sort out the proper taxonomy? 
> >> > >> If I remember correctly, the script makes (and hence expects) the > >> primary key and the NCBI taxonomy ID to be identical. If your loading > >> procedure can achieve that already then load_ncbi_taxonomy.pl should > >> pick them up and fix them. You can try that by loading the taxonomy > >> through the script, then arbitrarily choose a taxon, create a stub > >> bioentry for it and set its taxon_id foreign key to the chosen > >> taxon, change its taxon_name.name to some bogus value (for the > >> 'scientific name' class, for example) (and feel free to change the > >> left_id and right_id values in taxon too), and rerun the script. It > >> should fix the change you made, and your bioentry should still point > >> to the same taxon (because its primary key did not change, and did > >> not get deleted either; otherwise the bioentry would now have a null > >> value in the foreign key). > >> > >> The Bioperl-db way of storing things does not give control over > >> primary key assignment to Bioperl-db, so the database will assign it. > >> > >>> [...] > >> > >>>> For the SymAtlas project we had this situation (new species in > >>>> sequence updates that the last NCBI taxonomy update hadn't yet > >>>> brought in) quite regularly. I wrote a SQL script would fix those > >>>> 'haphazard' additions such that load_ncbi_taxonomy would update > >>>> them > >>>> to their correct values come the next NCBI taxonomy update. I can > >>>> send you the script (it would be for the Oracle version), but I'm > >>>> not > >>>> sure this is a widely viable strategy. > >>> > >>> So this wasn't integrated with load_ncbi_taxonomy.pl at all? > >> > >> No, but now that you say it I don't see any reason why I couldn't. > >> Maybe that's just what I should do. 
> >> > >> -hilmar > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > >> > >> _______________________________________________ > >> > >> > >> > >> BioSQL-l mailing list > >> BioSQL-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biosql-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From charles-listes+bioperl at plessy.org Mon Mar 17 00:13:11 2008 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Mon, 17 Mar 2008 13:13:11 +0900 Subject: [Bioperl-l] Are all recommended modules equally important ? Message-ID: <20080317041311.GA3784@kunpuu.plessy.org> Dear Bioperl developpers, In the Debian Project, we distribute packages for Bioperl and need to express their dependancy to other Perl modules with "Depends", "Recommends" and "Suggests" levels. For the moment, everything that is listed in the "recommends" hash of Build.PL is "Recommended" by our Debian package. This means that they will be installed by default when installing Bioperl, but that users can force their removal if needed. Being "Recommended" also means in Debian that if the recommended module is not available, then the Debian bioperl package will not reach our internal quality criteria for being part of our stable release. Therefore I would like to know if you think that some of the modules recommeded by Bioperl through the "recommends" hash of Build.PL are less important than others, i.e. that we can just "Suggest" them in our dependancy system. "Suggested" packages are not installed by default. 
The complete definition of the meaning of "Depends", "Recommends" and "Suggests" for Debian packages can be found in section 7.2 of the Debian policy: http://www.debian.org/doc/debian-policy/ch-relationships.html

Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers using 1.5.2 in its next stable release. We welcome your comments on this as well.

The Debian package for Bioperl 1.4: http://packages.debian.org/lenny/bioperl
and for Bioperl 1.5.2: http://packages.debian.org/sid/bioperl

(A copy of this email has been sent to the mailing list of the Debian-Med project).

Have a nice day,

--
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan

From David.Messina at sbc.su.se Mon Mar 17 11:38:28 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 17 Mar 2008 16:38:28 +0100 Subject: [Bioperl-l] Are all recommended modules equally important ? In-Reply-To: <20080317041311.GA3784@kunpuu.plessy.org> References: <20080317041311.GA3784@kunpuu.plessy.org> Message-ID: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>

Hi Charles,

Thanks for your note.

All of the BioPerl 'recommended' modules involve optional functionality, so I would think all of them would map to 'suggested' under Debian so they won't be installed by default.

For everyone else, this is the list of recommended modules he's talking about:

Ace
Class::AutoClass
Clone
Convert::Binary::C
Data::Stag::XMLWriter
GD
GD::SVG
Graph
HTML::Entities
HTML::Parser
HTTP::Request::Common
LWP::UserAgent
PostScript::TextBlock
Set::Scalar
SOAP::Lite
Spreadsheet::ParseExcel
Storable
SVG
SVG::Graph
Text::Shellwords
URI::Escape
XML::DOM::XPath
XML::Parser
XML::Parser::PerlSAX
XML::SAX
XML::SAX::Writer
XML::Twig
XML::Writer

> Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers
> using 1.5.2 in its next stable release. We welcome your comments on this
> as well.
>
I think the consensus here would be that 1.5.2 is the appropriate version of Bioperl to use in the next stable release of Debian. Although we've started to work toward Bioperl 1.6, that release will be at least a few months off, and 1.4, while technically our most recent 'stable' release, is waaay out of date.

Dave

From mrphysh at juno.com Mon Mar 17 18:27:21 2008 From: mrphysh at juno.com (mrphysh at juno.com) Date: Mon, 17 Mar 2008 22:27:21 GMT Subject: [Bioperl-l] bioperl email list Message-ID: <20080317.162721.27257.1@webmail19.vgs.untd.com>

Hello bioperl people.

I am a Perl programmer/molecular biologist/nice guy. I am wandering around within the bioinformatics arena and making progress. I am realizing that I will be writing code forever unless I can tap into the existing infrastructure. For me that appears to be bioperl. I would like to be part of the bioperl community.

I subscribed to the bioperl list and got a return email, but have never received an email. Did I do something wrong? Could you look into this please?

The truth is: these objects are blowing me away and I need help.

John S. Brigham
13810 Braun Drive
Golden, Colorado 80401
303-216-0994
mrphysh2juno.com

From hlapp at gmx.net Mon Mar 17 23:44:11 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 17 Mar 2008 23:44:11 -0400 Subject: [Bioperl-l] Are all recommended modules equally important ? In-Reply-To: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com> References: <20080317041311.GA3784@kunpuu.plessy.org> <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com> Message-ID:

On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:
> Hi Charles,
>
> Thanks for your note.
> > All of the BioPerl 'recommended' modules involve optional > functionality, so > I would think all of them would map to 'suggested' under Debian so > they > won't be installed by default. I would probably elevate LWP to 'recommended.' Other than that I agree. -hilmar > > For everyone else, this is the list of recommended modules he's > talking > about: > Ace > Class::AutoClass > Clone > Convert::Binary::C > Data::Stag::XMLWriter > GD > GD::SVG > Graph > HTML::Entities > HTML::Parser > HTTP::Request::Common > LWP::UserAgent > PostScript::TextBlock > Set::Scalar > SOAP::Lite > Spreadsheet::ParseExcel > Storable > SVG > SVG::Graph > Text::Shellwords > URI::Escape > XML::DOM::XPath > XML::Parser > XML::Parser::PerlSAX > XML::SAX > XML::SAX::Writer > XML::Twig > XML::Writer > > > > Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers >> using 1.5.2 in its next stable release. We welcome your comments >> on this >> as well. >> > > I think the consensus here would be that 1.5.2 is the appropriate > version of > Bioperl to use in the next stable release of Debian. Although we've > started > to work toward Bioperl 1.6, that release will be at least a few > months off, > and 1.4, while technically our most recent 'stable' release, is > waaay out of > date. > > > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bix at sendu.me.uk Tue Mar 18 05:29:10 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 18 Mar 2008 09:29:10 +0000 Subject: [Bioperl-l] Are all recommended modules equally important ? 
In-Reply-To: References: <20080317041311.GA3784@kunpuu.plessy.org> <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com> Message-ID: <47DF8B66.8020509@sendu.me.uk> Hilmar Lapp wrote: > On Mar 17, 2008, at 11:38 AM, Dave Messina wrote: > >> Hi Charles, >> >> Thanks for your note. >> >> All of the BioPerl 'recommended' modules involve optional >> functionality, so >> I would think all of them would map to 'suggested' under Debian so they >> won't be installed by default. > > I would probably elevate LWP to 'recommended.' Other than that I agree. I looked at the most used external modules. Used 6 times or more: Data::Dumper => used 55 times Carp => used 51 times IO::String => used 25 times Symbol => used 19 times File::Spec => used 17 times HTTP::Request::Common => used 17 times POSIX => used 12 times DB_File => used 11 times Fcntl => used 11 times IO::File => used 11 times Exporter => used 10 times File::Temp => used 9 times Dumpvalue => used 8 times LWP::UserAgent => used 8 times Scalar::Util => used 8 times URI::Escape => used 8 times File::Basename => used 6 times File::Path => used 6 times XML::Writer => used 6 times I can never remember how to figure out which of those is included with perl 5.6.1. Except maybe XML::Writer, if we do want to promote anything to recommended, I suppose it would be those above. I also agree with everything Dave said; if it's easier everything can be 'suggested'. (I reckon most if not all of the Data::Dumper and Carp usages should be removed) From David.Messina at sbc.su.se Tue Mar 18 10:30:02 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 Mar 2008 15:30:02 +0100 Subject: [Bioperl-l] bioperl email list In-Reply-To: <20080317.162721.27257.1@webmail19.vgs.untd.com> References: <20080317.162721.27257.1@webmail19.vgs.untd.com> Message-ID: <628aabb70803180730g2aae7ae0u18a9e6a360c725fa@mail.gmail.com> Hi John, Welcome to BioPerl! 
> I subscribed to the bioperl list and got a return email, but have never
> received an email.

As I'm sure you know, most of the time when email doesn't show up, it's
because it's been filtered as spam. I'm assuming you looked for this
already, though.

If that's not it, then I suggest trying to log in to the mailing list
server here:

http://bioperl.org/mailman/listinfo/bioperl-l

Log in by entering your email address (the one you subscribed with) in
the last field on that page and clicking the "Unsubscribe or edit
options" button.

On the next page that comes up, type your password in the first field on
the page. If for some reason you haven't been subscribed to the list
properly, then you will get an error here.

Otherwise, you will be taken to your membership configuration page.
There you can verify, among other options, that mail delivery is enabled.

> The truth is: these object are blowing me away and I need help.

BioPerl does have a bit of a learning curve, but fortunately there are
some good tutorials that should help you to get started. If you haven't
already, visit the HOWTO section of bioperl.org. Check out the one on
BioPerl for beginners, and then you might follow up with the SeqIO and
SearchIO HOWTOs which cover how to read and write sequences and sequence
alignment program output.

Also, there's lots of great example code in the examples folder of the
BioPerl distribution. I find looking at how other people use BioPerl is
very helpful in understanding what objects are used for what.

Finally, I'll plug the BioPerl Deobfuscator, which is a class browser
for BioPerl and available at:

http://bioperl.org/cgi-bin/deob_interface.cgi

BioPerl classes tend to have multiple levels of inheritance, and the
Deobfuscator lets you see all of the methods available to objects of a
given class.
Dave

From bix at sendu.me.uk Tue Mar 18 11:32:25 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 15:32:25 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To:
References:
Message-ID: <47DFE089.1070304@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
>> Or is the split intended to be 'core' == "anything and everything
>> that was in 1.4", '????' == "everything else"? In which case,
>> what's a good name for "modules created after 1.4"? 'crust'? ;)
>
> Nah, "icing".
>
> a module "use" map might be very useful to help identify "core" vs.
> other layers of mantle/crust/icing.
>
> http://www.perlmonks.org/?node_id=87329
> http://search.cpan.org/src/NEILB/pmusage-1.2/

Thanks for those. Neither could quite cope with BioPerl, but I've munged
them together and hacked up 'module_usage.pl' which I've just committed
to the maintenance directory of bioperl-live.

module_usage.pl ../Bio

Produces:
*warning, may crash your browser; download it and view in a dedicated
image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg
http://bix.sendu.me.uk/files/module_usage.txt

First I considered what modules each BioPerl package (aka class, module)
'uses' (what modules does it load via 'use', 'require' or inherit from
via 'use base', excluding external (non-BioPerl) modules), then grouped
together packages that have identical usage. The graph shows all the
groups with more than one member as nodes and edges from them pointing
to the individual packages that they use. The set of those individual
packages pointed to by groups also have edges showing their
use-relationship to other members of the set (only). Members of the set
are also shaded in red. The saturation of the shade indicates how many
packages use that package (so dark red packages are used a lot).

(I had to simplify in this way because otherwise GraphViz bailed on me.
If anyone can come up with nicer simplification/visualisation systems,
please do!
It's important to note that there is lots of information loss in my scheme, so you can't rely on the graph alone.) Getting to the question on how to decide what is 'core' and on what basis to split things up, first consider the darker red packages. Next consider how many groups point to it. Finally consider the membership of those groups: are they all highly related, or are they from different 'parts' of BioPerl? For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups pointing to it, but all the members of those groups are Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or Bio::Graphics?) could be split off cleanly if desired and not kept in core. Bio::SimpleAlign, on the other hand, whilst not being quite as dark a red, has 7 attached groups with members from Bio::AlignIO, Bio::Search and Bio::Tools. You could easily argue it is more fundamental to BioPerl and should be in core. In turn, the things that Bio::SimpleAlign points to would also have to be in core. I haven't done any full analysis along these lines and leave as an exercise for the interested reader for now ;) Chris Fields wrote: > http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules > > I'm pretty flexible on any of that; it's a proposal only and I think > some of it may be wrongheaded, but hey, I'm willing to take a few > rotten tomatoes. The key issue is we should try to work out what we > mean by 'core' or the core library. I have a rather extreme view of > it as being the bare essentials without external, non-perl core > dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI > and required modules for those classes) but I'm sure others would > lump in parsers, DB functionality, etc. I basically suggest placing > those (and any stable but potentially non-core code) in a > 'bioperl-main', with any unstable or untested code going into a > 'bioperl-unstable'. 
My thoughts are along these lines:
# I agree that core should have no external dependencies
# I agree that it might mostly be interfaces
# It should represent a framework with all the interfaces (that have
  stable APIs), directory structure and base classes that everything
  else relies on
# It might not do much useful bioinformatics, but provides just about
  everything needed for a dev to create a new module that does

> In essence, bioperl-main would require core and resemble a stable
> release; bioperl-unstable would require bioperl-main (and core) and
> resemble a dev release. Not sure how versioning would go or if this
> is a viable option at all, but it's worth discussing.

# I agree that this 3-way split seems reasonable
# bioperl-main would consist primarily of the 'leaves' of the module
  tree, mostly parsers and the like which, whilst 'stable' and tested
  should still be split away from core because the data sources they
  parse could change format slightly
# bioperl-unstable, better bioperl-bleed, would feature brand-new
  stuff, be it new parsers for totally new formats, new APIs that do
  something not thought of before etc. When they are complete, bug-free
  and have stood the test of time they get moved into bioperl-main.
  (It is not a place for all new commits; bug fixes to something in
  bioperl-main would be committed to bioperl-main)
# The current splits (bioperl-run, bioperl-network etc.) do not get
  their own core and bleed variant.
Anything they need for core functionality would enter the single bioperl-core, anything new would enter the single bioperl-bleed, and anything stable would be in their own bioperl-[package] Discuss :) From snoze.pa at gmail.com Tue Mar 18 14:27:47 2008 From: snoze.pa at gmail.com (snoze pa) Date: Tue, 18 Mar 2008 13:27:47 -0500 Subject: [Bioperl-l] BioSQL V1.0.0 released In-Reply-To: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com> References: <200803071309.25294.heikki@sanbi.ac.za> <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu> <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com> Message-ID: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com> Thanks hilmar. I am still wondering if my old problem was fixed. It is related to when NR databases mixes files from different databases. On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz wrote: > Ditto. Thanks for biting the bullet, Hilmar. > > BTW, I put together a little compendium/review of various bioinformatics > data models a few months ago, where I mention BioSQL among others, but I > never really announced it: > > http://biodatamodel.org/ > > It thought about wikifying it to get the community involved in maintaining > it, but haven't gotten around to it yet. > > Feedback is welcome. > > Cheers, > Steve > > On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields wrote: > > > Same here. Great news! > > > > chris > > > > On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote: > > > > > BIOSQL V1.0.0 RELEASED > > > http://news.open-bio.org/archives/2008_03.html#000094 > > > > > > > > > Congratulations, Hilmar! 
> > > > > > -Heikki > > > > > > -- > > > ______ _/ _/_____________________________________________________ > > > _/ _/ > > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > > _/ _/ _/ University of Western Cape, South Africa > > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > > ___ _/_/_/_/_/________________________________________________________ > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From darin.london at duke.edu Tue Mar 18 14:16:58 2008 From: darin.london at duke.edu (darin.london at duke.edu) Date: Tue, 18 Mar 2008 13:16:58 -0500 Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions Message-ID: <200803181816.m2IIGwOL007248@tenero.duhs.duke.edu> BOSC 2008 Call for Abstracts The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008). 
The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. Many Open Source bioinformatics packages are widely used by the research community across many application areas and form a cornerstone in enabling research in the genomic and post-genomic era. Open source bioinformatics software has facilitated rapid innovation and dissemination of new computational methods as well as informatics infrastructure. Since the work of the Open Source Bioinformatics Community represents some of the most cutting edge of Bioinformatics in general, the overall theme for the conference this year is "Tackling Hard Problems with Emerging Technologies". Topics under this umbrella include cyberinfrastructure, grid computing and workflow management and discovery, and visualization. We will also have a series of update talks about the main Open Source Bioinformatics Software suites. One of the hallmarks of BOSC is the coming together of the open source developer community in one location. A face-to-face meeting of this community creates synergy where participants can work together to create use cases, prototype working code, or run bootcamps for developers from other projects as short, informal, and hands-on tutorials in new software packages and emerging technologies. In short, BOSC is not just a conference for presentations of completed work, but is a dynamic meeting where collaborative work gets done. This year, BOSC is accepting abstract submissions on the conference theme "Tackling Hard Problems with Emerging Technologies". The conference theme reflects that there are new technologies emerging on both the scientific front (new sequencing technologies, etc.) 
and the IT front (workflows, mashup/web 2.0, improvements in all of the
major programming languages, etc.), which may allow the open source
community to solve problems that were previously intractable.

Abstracts may be submitted for the following topics.

1. Cyberinfrastructure - We are interested in presentations on topics
   dealing with the development of infrastructure on the web to
   facilitate software and data re-use (mashups, or traditional),
   interoperability and inter-process communication, system/service
   discovery, and data movement and modeling in distributed systems.
   This may include peer-to-peer systems of data transfer, Web Services,
   various flavors of data representation (SOAP, JSON, XML, others), and
   technologies commonly referred to under the Web 2.0 paradigm (e.g.
   folksonomies/tagging, user-based content generation, content feeds,
   and Social Networking).

2. Grid Computing and Workflow Management and Discovery - We
   particularly invite talks that report progress in making workflow
   systems easier to use and on how to do distributed-collaborative
   research, e.g. workflows that encompass the coordination of systems
   running in different parts of the world.

3. Visualization - Visualization is a maturing area of open source
   software development. We particularly invite talks that demonstrate
   innovative visualization systems in the context of workflows.

4. Open Source Software - Speakers will present talks on the use,
   development, or philosophy of open source software in bioinformatics.

5. Bio* Open Source Project Updates - We invite abstracts from the
   representatives of the open source projects sponsored by or
   affiliated to the O|B|F (see Projects).

Please consult the official BOSC 2008 website at
http://www.open-bio.org/wiki/Upcoming_BOSC_conference for all updates
and extra information.

Submission Process:

All abstracts must be submitted through our Open Conference Systems site
(http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a
full paper. The small Abstract text should be a summary, while the
longer abstract should provide more details, including the open-source
license requirement details. Full-length abstracts are limited to one
page with one inch (2.5 cm) margins on the top, sides, and bottom. The
full-length abstract should include the title, authors, and
affiliations. We prefer your abstract to be in PDF format, although
plain t

Important Dates:

May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

From hlapp at gmx.net Tue Mar 18 15:07:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 18 Mar 2008 15:07:54 -0400
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
 <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
 <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
 <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
Message-ID: <0B1635C3-0BD4-449C-9C52-FD8D07E9D669@gmx.net>

Can you point me to the bug report or a mailing list thread?

-hilmar

On Mar 18, 2008, at 2:27 PM, snoze pa wrote:

> Thanks hilmar. I am still wondering if my old problem was fixed. It is
> related to when NR databases mixes files from different databases.
>
> On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz
> wrote:
>
>> Ditto. Thanks for biting the bullet, Hilmar.
>>
>> BTW, I put together a little compendium/review of various
>> bioinformatics
>> data models a few months ago, where I mention BioSQL among others,
>> but I
>> never really announced it:
>>
>> http://biodatamodel.org/
>>
>> It thought about wikifying it to get the community involved in
>> maintaining
>> it, but haven't gotten around to it yet.
>> >> Feedback is welcome. >> >> Cheers, >> Steve >> >> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields >> wrote: >> >>> Same here. Great news! >>> >>> chris >>> >>> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote: >>> >>>> BIOSQL V1.0.0 RELEASED >>>> http://news.open-bio.org/archives/2008_03.html#000094 >>>> >>>> >>>> Congratulations, Hilmar! >>>> >>>> -Heikki >>>> >>>> -- >>>> ______ _/ _/ >>>> _____________________________________________________ >>>> _/ _/ >>>> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >>>> _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho >>>> _/ _/ _/ SANBI, South African National Bioinformatics >>>> Institute >>>> _/ _/ _/ University of Western Cape, South Africa >>>> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >>>> ___ _/_/_/_/_/ >>>> ________________________________________________________ >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From snoze.pa at gmail.com Tue Mar 18 16:33:08 2008 From: snoze.pa at gmail.com (snoze pa) Date: Tue, 18 Mar 2008 15:33:08 -0500 Subject: [Bioperl-l] NCBI taxonomy database Message-ID: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com> Dear Users, How can i use NCBI taxonomy database in bioperl? any suggestions!!! thanks in advance s From aaron.j.mackey at gsk.com Tue Mar 18 12:23:41 2008 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Tue, 18 Mar 2008 12:23:41 -0400 Subject: [Bioperl-l] Priorities for a bioperl-1.6 release In-Reply-To: <47DFE089.1070304@sendu.me.uk> Message-ID: Very cool. I can envision this being printed as a laminated poster to put up next to the periodic table of Perl Elements ( http://www.ozonehouse.com/mark/blog/code/PeriodicTable.html) One GraphViz trick you could try would be to group Bio::X::* (nodes and your collection groups sharing common Bio::X:: prefixes) together as subgraphs; that should quickly show you which edges go outside of the various "domains", and which are entirely self contained. you could also try to distinguish "use base" relationships (i.e. inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. 
"require Bio::X" wrapped in an eval (optional use if available) by various edge colorings -- this might help to further break things up if we can guess at the intended "use" of any Bio::X by Bio::Y. -Aaron "Sendu Bala" wrote on 03/18/2008 11:32:25 AM: > aaron.j.mackey at gsk.com wrote: > >> Or is the split intended to be 'core' == "anything and everything > >> that was in 1.4", '????' == "everything else"? In which case, > >> what's a good name for "modules created after 1.4"? 'crust'? ;) > > > > Nah, "icing". > > > > a module "use" map might be very useful to help identify "core" vs. > > other layers of mantle/crust/icing. > > > > http://www.perlmonks.org/?node_id=87329 > > http://search.cpan.org/src/NEILB/pmusage-1.2/ > > Thanks for those. Neither could quite cope with BioPerl, but I've munged > them together and hacked up 'module_usage.pl' which I've just committed > to the maintenance directory of bioperl-live. > > module_usage.pl ../Bio > > Produces: > *warning, may crash your browser; download it and view in a dedicated > image viewer* > http://bix.sendu.me.uk/files/module_usage.jpeg > http://bix.sendu.me.uk/files/module_usage.txt > > First I considered what modules each BioPerl package (aka class, module) > 'uses' (what modules does it load via 'use', 'require' or inherit from > via 'use base', excluding external (non-BioPerl) modules), then grouped > together packages that have identical usage. The graph shows all the > groups with more than one member as nodes and edges from them pointing > to the individual packages that they use. The set of those individual > packages pointed to by groups also have edges showing their > use-relationship to other members of the set (only). Members of the set > are also shaded in red. The saturation of the shade indicates how many > packages use that package (so dark red packages are used a lot). > > (I had to simplify in this way because otherwise GraphViz bailed on me. 
> If anyone can come with nicer simplification/visualisation systems, > please do! It's important to note that there is lots of information loss > in my scheme, so you can't rely on the graph alone.) > > Getting to the question on how to decide what is 'core' and on what > basis to split things up, first consider the darker red packages. Next > consider how many groups point to it. Finally consider the membership of > those groups: are they all highly related, or are they from different > 'parts' of BioPerl? > > For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups > pointing to it, but all the members of those groups are > Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or > Bio::Graphics?) could be split off cleanly if desired and not kept in > core. Bio::SimpleAlign, on the other hand, whilst not being quite as > dark a red, has 7 attached groups with members from Bio::AlignIO, > Bio::Search and Bio::Tools. You could easily argue it is more > fundamental to BioPerl and should be in core. In turn, the things that > Bio::SimpleAlign points to would also have to be in core. > > I haven't done any full analysis along these lines and leave as an > exercise for the interested reader for now ;) > > > Chris Fields wrote: > > http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules > > > > I'm pretty flexible on any of that; it's a proposal only and I think > > some of it may be wrongheaded, but hey, I'm willing to take a few > > rotten tomatoes. The key issue is we should try to work out what we > > mean by 'core' or the core library. I have a rather extreme view of > > it as being the bare essentials without external, non-perl core > > dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI > > and required modules for those classes) but I'm sure others would > > lump in parsers, DB functionality, etc. 
I basically suggest placing > > those (and any stable but potentially non-core code) in a > > 'bioperl-main', with any unstable or untested code going into a > > 'bioperl-unstable'. > > My thoughts are along these lines: > # I agree that core should have no external dependencies > # I agree that it might mostly be interfaces > # It should represent a framework with all the interfaces (that have > stable APIs), directory structure and base classes that everything > else relies on > # It might not do much useful bioinformatics, but provides just about > everything needed for a dev to create a new module that does > > > > In essence, bioperl-main would require core and resemble a stable > > release; bioperl-unstable would require bioperl-main (and core) and > > resemble a dev release. Not sure how versioning would go or if this > > is a viable option at all, but it's worth discussing. > > # I agree that this 3-way split seems reasonable > # bioperl-main would consist primarily of the 'leaves' of the module > tree, mostly parsers and the like which, whilst 'stable' and tested > should still be split away from core because the data sources they > parse could change format slightly > # bioperl-unstable, better bioperl-bleed, would feature brand-new > stuff, be it new parsers for totally new formats, new APIs that do > something not thought of before etc. When they are complete, bug-free > and have stood the test of time they get moved into bioperl-main. > (It is not a place for all new commits; bug fixes to something in > bioperl-main would be committed to bioperl-main) > # The current splits (bioperl-run, bioperl-network etc.) do not get > their own core and bleed variant. 
Anything they need for core
> functionality would enter the single bioperl-core, anything new
> would enter the single bioperl-bleed, and anything stable would
> be in their own bioperl-[package]
>
> Discuss :)
>

From David.Messina at sbc.su.se Tue Mar 18 17:23:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 22:23:18 +0100
Subject: [Bioperl-l] NCBI taxonomy database
In-Reply-To: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
References: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
Message-ID: <628aabb70803181423g305db155r9d66c114f38c64b6@mail.gmail.com>

Hi snoze,

I think you will want to take a look at the docs for the Bio::Taxon
module

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Taxon.html

and these scripts:

scripts/taxa/local_taxonomydb_query.PLS
    Script that accesses a local taxonomy database and retrieves species
    or TaxonIDs.

scripts/taxa/query_entrez_taxa.PLS
    Demonstrate how to retrieve the NCBI TaxonID for a given species.
    Also retrieve TaxonID for a given accession number.

scripts/taxa/taxid4species.PLS
    Retrieve the NCBI TaxonID for a given species.

Dave

From alexl at users.sourceforge.net Wed Mar 19 04:32:38 2008
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 19 Mar 2008 01:32:38 -0700
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk> (Sendu Bala's message of "Tue\, 18 Mar 2008 15\:32\:25 +0000")
References: <47DFE089.1070304@sendu.me.uk>
Message-ID: <42eja7azbt.fsf@allele2.eebweb.arizona.edu>

>>>>> "SB" == Sendu Bala writes:

[...]
SB> # I agree that this 3-way split seems reasonable # bioperl-main
SB> would consist primarily of the 'leaves' of the module tree, mostly
SB> parsers and the like which, whilst 'stable' and tested should
SB> still be split away from core because the data sources they parse
SB> could change format slightly # bioperl-unstable, better
SB> bioperl-bleed, would feature brand-new stuff, be it new parsers
SB> for totally new formats, new APIs that do something not thought of
SB> before etc. When they are complete, bug-free and have stood the
SB> test of time they get moved into bioperl-main. (It is not a place
SB> for all new commits; bug fixes to something in bioperl-main would
SB> be committed to bioperl-main) # The current splits (bioperl-run,
SB> bioperl-network etc.) do not get their own core and bleed
SB> variant. Anything they need for core functionality would enter the
SB> single bioperl-core, anything new would enter the single
SB> bioperl-bleed, and anything stable would be in their own
SB> bioperl-[package]

SB> Discuss :)

While on the subject of how to split up the bioperl package, spare a
thought for upstream package maintainers. The Fedora package for the
bioperl "core" that I now maintain is currently a single package which
makes it easy to get reviewed, included in the distribution and
updated/maintained. (bioperl-run is a separate package).

While I agree that bioperl is now perhaps a little too monolithic, I
think splitting it up in a too fine-grained manner like CPAN might go
too far the other way. For Fedora, each package would then need to be
reviewed and updated separately. Similar issues might apply for other
distros (such as Debian/Ubuntu).

I think something similar to the three-way split proposed sounds like a
good compromise, so long as a "basic" user of Bioperl can install most
of the functionality in the current "bioperl" package in (at most) 2-3
packages.
One model to look at might be the gstreamer model which has a "core"
(gstreamer) and "gstreamer-plugins-base", "gstreamer-plugins-good",
"gstreamer-plugins-bad" and "gstreamer-plugins-ugly" modules for
plugins, see:

http://gstreamer.net/

Alex

From charles-listes+bioperl at plessy.org Wed Mar 19 06:01:59 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 19 Mar 2008 19:01:59 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <47DF8B66.8020509@sendu.me.uk>
References: <20080317041311.GA3784@kunpuu.plessy.org>
 <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
 <47DF8B66.8020509@sendu.me.uk>
Message-ID: <20080319100159.GD29304@kunpuu.plessy.org>

On Tue, Mar 18, 2008 at 09:29:10AM +0000, Sendu Bala wrote:
>
> I looked at the most used external modules. Used 6 times or more:
>
> Data::Dumper => used 55 times
> Carp => used 51 times
> IO::String => used 25 times
> Symbol => used 19 times
> File::Spec => used 17 times
> HTTP::Request::Common => used 17 times
> POSIX => used 12 times
> DB_File => used 11 times
> Fcntl => used 11 times
> IO::File => used 11 times
> Exporter => used 10 times
> File::Temp => used 9 times
> Dumpvalue => used 8 times
> LWP::UserAgent => used 8 times
> Scalar::Util => used 8 times
> URI::Escape => used 8 times
> File::Basename => used 6 times
> File::Path => used 6 times
> XML::Writer => used 6 times

Dear Sendu,

thanks a lot for this analysis!

We will downgrade all modules except those you listed to the priority
'Suggested'. In terms of the Debian package, this means keeping only
libio-string-perl, libwww-perl, liburi-perl and libxml-writer-perl in
our 'Recommends' field, as the others are provided by our perl package
itself.
Thanks a lot for the advice,

--
Charles Plessy
Debian-Med packaging team
Wakō, Saitama, Japan

From bix at sendu.me.uk Wed Mar 19 09:27:11 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 19 Mar 2008 13:27:11 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To:
References:
Message-ID: <47E114AF.8030801@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
> One GraphViz trick you could try would be to group Bio::X::* (nodes and
> your collection groups sharing common Bio::X:: prefixes) together as
> subgraphs; that should quickly show you which edges go outside of the
> various "domains", and which are entirely self contained.

Not quite sure if I used the 'trick' you were thinking of, but I now
'cluster' them as you describe. It's no longer quite as attractively
proportioned, but I suppose it's more useful :)

> you could also try to distinguish "use base" relationships (i.e.
> inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require
> Bio::X" wrapped in an eval (optional use if available) by various edge
> colorings -- this might help to further break things up if we can guess at
> the intended "use" of any Bio::X by Bio::Y.

I haven't distinguished the eval require cases, but now edges are green
for inheritance and blue for use/require.

I updated the jpeg:
*warning, may crash your browser; download it and view in a dedicated
image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg

If someone wants to mess with the script so it will output a sane ps
file for conversion to pdf, please do so. I can't figure out how to get
it to work correctly.

From Jorge.DUARTE at biogemma.com Wed Mar 19 11:32:44 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:32:44 +0100
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
Message-ID:

Dear Bioperl-users,

could someone give me a hint on how to find SNPs in alignments using
bioperl objects ?
I found several modules capable of representing Sequence Variations,
but could not understand how to go from an "Align" object to a
"Variation" object.

Any help would be much appreciated,

Thanks,

Jorge.

---
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com

From avilella at gmail.com Wed Mar 19 12:59:47 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 19 Mar 2008 16:59:47 +0000
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
In-Reply-To: 
References: 
Message-ID: <358f4d650803190959h744f33f5ha345897565a071b0@mail.gmail.com>

Hi Jorge, (good to see an ex-EBI in the bioperl-ml :-) )

You can use the method aln_to_population in Bio::PopGen::Utilities:

    my $pop = Bio::PopGen::Utilities->aln_to_population($aln);

http://www.bioperl.org/wiki/HOWTO:PopGen#Allele_data_from_Alignments_using_Bio::AlignIO_and_Bio::PopGen::Utilities

Cheers,

Albert.

On Wed, Mar 19, 2008 at 3:32 PM, wrote:
> Dear Bioperl-users,
>
> could someone give me a hint on how to find SNPs in alignments using
> bioperl objects ?
>
> I found several modules capable of representing Sequence Variations,
> but could not understand how to go from an "Align" object to a "Variation"
> object.
>
> Any help would be much appreciated,
>
> Thanks,
>
> Jorge.
>
> ---
> Jorge Duarte
> Bioinformatics Software Engineer
> BIOGEMMA
> Z.I.
Du Brézet
> 8, Rue des Frères Lumière
> 63028 CLERMONT FERRAND Cedex 2
> FRANCE
> Tel : +33 (0)4 73 39 60 73
> Fax : +33 (0)4 73 39 60 71
> E-mail : jorge.duarte at biogemma.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From Jorge.DUARTE at biogemma.com Wed Mar 19 11:22:57 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:22:57 +0100
Subject: [Bioperl-l] Using Bioperl book
Message-ID: 

Hello,

I just found something on Amazon about a book, "Using Bioperl", listed
as published on the 1st of March 2008 but no longer available.

Does anyone know how to get it ?

Many thanks,

Jorge.

---
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com

*****************************************************************
For any support request, please include BIOGEMMA_BioInfo_Service
or bioinfo at biogemma.com among the recipients at first contact
*****************************************************************

From jason at bioperl.org Wed Mar 19 13:54:16 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Mar 2008 10:54:16 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: 
References: 
Message-ID: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>

It's probably more than 6 months out. We still haven't finished writing
it, as life and work continue to intrude on book writing.

-jason

On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:
> Hello,
>
> i just found on amazon something about a book "Using Bioperl",
> published
> on the 1st of March 2008 but which is no more available.
>
> Does anyone know how to get it ?
>
> Many thanks,
>
> Jorge.
> > --- > Jorge Duarte > Bioinformatics Software Engineer > BIOGEMMA > Z.I. Du Br?zet > 8, Rue des Fr?res Lumi?re > 63028 CLERMONT FERRAND Cedex 2 > FRANCE > Tel : +33 (0)4 73 39 60 73 > Fax : +33 (0)4 73 39 60 71 > E-mail : jorge.duarte at biogemma.com > > ***************************************************************** > Pour toute demande de support merci d'inclure > BIOGEMMA_BioInfo_Service ou bioinfo at biogemma.com > dans les destinataires lors du premier contact > ***************************************************************** > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From smarkel at accelrys.com Wed Mar 19 13:53:53 2008 From: smarkel at accelrys.com (Scott Markel) Date: Wed, 19 Mar 2008 10:53:53 -0700 Subject: [Bioperl-l] Using Bioperl book In-Reply-To: References: Message-ID: Jorge, This is a book that Jason Stajich, Ewan Birney, and I are writing. We're behind. So it's not that the book is no longer available, but that it's not yet available. Hopefully later this year or early in 2009. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics bioperl-l-bounces at lists.open-bio.org wrote on 19.03.2008 08:22:57: > Hello, > > i just found on amazon something about a book "Using Bioperl", published > on the 1st of March 2008 but which is no more available. > > Does anyone know how to get it ? > > Many thanks, > > Jorge. > > --- > Jorge Duarte > Bioinformatics Software Engineer > BIOGEMMA > Z.I. 
Du Br?zet > 8, Rue des Fr?res Lumi?re > 63028 CLERMONT FERRAND Cedex 2 > FRANCE > Tel : +33 (0)4 73 39 60 73 > Fax : +33 (0)4 73 39 60 71 > E-mail : jorge.duarte at biogemma.com > > ***************************************************************** > Pour toute demande de support merci d'inclure > BIOGEMMA_BioInfo_Service ou bioinfo at biogemma.com > dans les destinataires lors du premier contact > ***************************************************************** > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From me at hongyu.org Thu Mar 20 14:54:53 2008 From: me at hongyu.org (Hongyu Zhang) Date: Thu, 20 Mar 2008 11:54:53 -0700 (PDT) Subject: [Bioperl-l] Bio::DB::GenBank module Message-ID: <501654.68882.qm@web51412.mail.re2.yahoo.com> Dear all, It seems that some of the important methods in Bio::DB::GenBank module was discontinued right now, such as get_Seq_by_acc(). The corresponding methods have empty content underneath its names. How come? Best, Hongyu Zhang, Ph.D. Ceres Inc., Thousand Oaks, CA Cell: 805-405-5394 Fax: 866-447-8750 From joseph.fass at gmail.com Thu Mar 20 18:10:33 2008 From: joseph.fass at gmail.com (Joseph Fass) Date: Thu, 20 Mar 2008 15:10:33 -0700 Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or Bio::Seq::SeqWithQuality? 
Message-ID: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>

I've written code to trim a certain number of bases (and, possibly,
associated qualities) from fasta (or fastq) format sequences, using:

    $seq->seq($seq->subseq($a+1,$len-$b));

and, if it's fastq:

    $seq->qual($seq->subqual($a+1,$len-$b));

where:

    $len = $seq->length; # defined before changing $seq->seq
    $a is the number of bases to trim off the beginning of the sequence
    $b is the number of bases to trim off the end of the sequence

The code works for sequences, but for qualities I get a trimmed series
of quality characters that is the correct length and is at the correct
position, but has a number of characters (equal to $a) at the *end* of
the series changed to '!' ... i.e.:

    @fake header 1
    tcggacaatatatat
    +
    fjasfiojeq%!@%@

becomes:

    @fake header 1 trimmed by 4 at beginning and 3 at end
    acaatata
    +fake header 1 trimmed by 4 at beginning and 3 at end
    fioj!!!!

Since the relevant section of code is short, I'll post it:

    my $in = Bio::SeqIO->new(-file => "<$opt_i", -format => $format);
    my $out = Bio::SeqIO->new(-file=> ">$opt_o", -format => $format);
    my $seq_length;
    while (my $seq = $in->next_seq()) {
        $seq->desc($seq->desc()." trimmed by $opt_b at beginning and $opt_e at end");
        $seq_length = $seq->length;
        $seq->seq($seq->subseq($opt_b+1,$seq_length-$opt_e));
        if ($format eq 'fastq') { # if fastq, trim qualities then write out in fastq format
            $seq->qual($seq->subqual($opt_b+1,$seq_length-$opt_e));
            $out->write_fastq($seq);
        }
        else {$out->write_seq($seq);} # just write out sequence in fasta format
    }

Why should the same process work for ->seq and ->subseq but not ->qual
and ->subqual? Please enlighten me ...
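
For comparison, here is a minimal sketch of the intended trimming done on the raw sequence and quality strings, without any BioPerl objects. It only illustrates what the expected output for the sample record would be; `trim_fastq_record` is a hypothetical helper, not a BioPerl method, and it says nothing about where the problem in the object-based code lies.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove $from_start characters
# from the beginning and $from_end characters from the end of both the
# sequence string and its parallel quality string.
sub trim_fastq_record {
    my ($seq, $qual, $from_start, $from_end) = @_;
    die "sequence and quality lengths differ\n"
        unless length($seq) == length($qual);
    my $keep = length($seq) - $from_start - $from_end;
    die "nothing left after trimming\n" if $keep <= 0;
    # substr offsets are 0-based, so skipping $from_start characters
    # means starting at offset $from_start.
    return (substr($seq,  $from_start, $keep),
            substr($qual, $from_start, $keep));
}

# The sample record from the message, trimmed by 4 at the start and 3 at the end.
my ($seq, $qual) = trim_fastq_record('tcggacaatatatat', 'fjasfiojeq%!@%@', 4, 3);
print "$seq\n$qual\n";   # prints acaatata, then fiojeq%!
```

With the sample record this keeps positions 5 to 12 of both strings, so the qualities come out as `fiojeq%!` rather than the `fioj!!!!` reported above.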
-- Joseph Fass jnfass -at- gmail.com (personal) || joseph.fass -at- gmail.com(professional) 970.227.5928 (c) || 530.752.2698 (w) From hlapp at gmx.net Thu Mar 20 18:49:41 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 20 Mar 2008 18:49:41 -0400 Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to integer any longer In-Reply-To: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl> References: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl> Message-ID: <0F80B40B-0232-4367-8433-992588B6E71B@gmx.net> Hi Erik, thanks for the report. Given the error message, it looks more like the integer (which in reality is a string) can't be automatically converted to a string. That would be equally interesting, though. DBI I thought used to bind all parameters as string by default, but maybe that has changed? The parameter values are indeed all bound generically (and the query is created dynamically too), and I'm leaving it up to the DBD drivers to do the "Right Thing". I could obviously force everything into type string, but that is likely to have it's own repercussions on various RDBMSs. So could you file this as a bug report on bugzilla.open-bio.org (category bioperl-db, this is actually not a BioSQL problem), and run the following test on your 8.3 instance (which minor version actually?): CREATE TABLE t1 (a varchar(10), b text, c integer); SELECT * from t1 WHERE a = 1; SELECT * from t1 WHERE b = 1; SELECT * from t1 WHERE c = '1'; INSERT INTO t1 (a,b,c) VALUES ('a','b',1); SELECT * from t1 WHERE a = 1; SELECT * from t1 WHERE b = 1; SELECT * from t1 WHERE c = '1'; SELECT * from t1 WHERE a = 1::text; SELECT * from t1 WHERE b = 1::text; SELECT * from t1 WHERE c = integer '1'; DROP TABLE t1; These work all fine on my 8.1.4 instance. -hilmar On Mar 20, 2008, at 3:24 PM, Erik wrote: > Hi, > > (latest BioSQL, bioperl-db, and bioperl-live installed.) 
> > Postgres 8.3 will not auto-cast text (='character > varying') to integer any longer, which causes test > t/16odba.t to fail: > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: error while executing query in > Bio::DB::BioSQL::SeqAdaptor::find_by_query: ERROR: > operator does not exist: character varying = integer > LINE 1: ...eq.taxon_id FROM bioentry seq WHERE > seq.identifier = 5456929 > > It seems likely to cause many similar statements to fail; > how should this be solved? > > I tried to fix it but I couldn't find the place where the > statement/clauses are put together. > > > Thanks, > > Erik Rijkers > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From er at xs4all.nl Thu Mar 20 19:30:03 2008 From: er at xs4all.nl (Erik) Date: Fri, 21 Mar 2008 00:30:03 +0100 (CET) Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to integer any longer Message-ID: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl> On Thu, March 20, 2008 23:49, Hilmar Lapp wrote: > Hi Erik, thanks for the report. Given the error message, > it looks > more like the integer (which in reality is a string) can't > be automatically converted to a string. 
you are right, of course :)

Here is the postgres 8.3.1 result of your sql statements:

    CREATE TABLE t1 (a varchar(10), b text, c integer);

    SELECT * from t1 WHERE a = 1;            -- fails in 8.3.1
    SELECT * from t1 WHERE b = 1;            -- fails in 8.3.1
    SELECT * from t1 WHERE c = '1';          -- ok

    INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

    SELECT * from t1 WHERE a = 1;            -- fails in 8.3.1
    SELECT * from t1 WHERE b = 1;            -- fails in 8.3.1
    SELECT * from t1 WHERE c = '1';          -- ok

    SELECT * from t1 WHERE a = 1::text;      -- ok
    SELECT * from t1 WHERE b = 1::text;      -- ok
    SELECT * from t1 WHERE c = integer '1';  -- ok

The failure is always (virtually) the same:

    ERROR: operator does not exist: character varying = integer
    LINE 1: SELECT * from t1 WHERE a = 1;
                                     ^
    HINT: No operator matches the given name and argument type(s).
    You might need to add explicit type casts.

Then there is the cast function: for instance, I can let the test in
t/16odba.t proceed faultlessly with

    $seq = $biodb->get_Seq_by_id( "cast(5456929 as text)" );

I am also doubtful/curious as to how this would affect the various
loading scripts which I was going to use - I want to set up a GBrowse
with human/mouse/flybase sequence annotation to show ChipSeq data
against. But one thing at a time, I guess...

> So could you file this as a bug report on bugzilla.open-bio.org
> (category bioperl-db, this is actually not a BioSQL problem),

I'll make an entry in bugzilla/bioperl-db.

Thanks for your quick reply!

Erik Rijkers

From David.Messina at sbc.su.se Thu Mar 20 19:39:49 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 00:39:49 +0100
Subject: [Bioperl-l] Bio::DB::GenBank module
In-Reply-To: <501654.68882.qm@web51412.mail.re2.yahoo.com>
References: <501654.68882.qm@web51412.mail.re2.yahoo.com>
Message-ID: <628aabb70803201639y33df19a6ib83967c33dd90b7f@mail.gmail.com>

Hi Hongyu,

Those methods are inherited. get_Seq_by_acc(), for example, comes from
Bio::DB::WebDBSeqI.
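
One way to check this from Perl itself is to ask which package a method's code was compiled in. The sketch below uses the core `B` module on two toy classes; `My::WebDB`, `My::GenBank` and `defining_package` are made-up names for illustration only, standing in for a parent class and a subclass that inherits from it.

```perl
use strict;
use warnings;
use B ();

# Toy parent class that actually defines the method.
package My::WebDB;
sub new            { my ($class) = @_; return bless({}, $class) }
sub get_Seq_by_acc { my ($self, $acc) = @_; return "fetched $acc" }

# Toy subclass: no get_Seq_by_acc of its own, it only inherits.
package My::GenBank;
our @ISA = ('My::WebDB');

package main;

# Return the name of the package in which the code behind
# $class->$method was compiled, or undef if the class has no such method.
sub defining_package {
    my ($class, $method) = @_;
    my $code = $class->can($method) or return;
    return B::svref_2object($code)->GV->STASH->NAME;
}

my $pkg = defining_package('My::GenBank', 'get_Seq_by_acc');
print "$pkg\n";   # prints My::WebDB
```

With BioPerl installed, the same helper could presumably be called as `defining_package('Bio::DB::GenBank', 'get_Seq_by_acc')` after loading `Bio::DB::GenBank`; going by the message above, it would be expected to report Bio::DB::WebDBSeqI.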
The BioPerl Deobfuscator is one way to see where the methods a given class has are actually coded. Here's the Deobfuscator view of Bio::DB::GenBank. Dave From hlapp at gmx.net Thu Mar 20 20:34:42 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 20 Mar 2008 20:34:42 -0400 Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any longer In-Reply-To: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl> References: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl> Message-ID: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net> On Mar 20, 2008, at 7:30 PM, Erik wrote: > Here is the postgres 8.3.1 result of your sql statements: > > CREATE TABLE t1 (a varchar(10), b text, c integer); > > SELECT * from t1 WHERE a = 1; -- fails in 8.3.1 > SELECT * from t1 WHERE b = 1; -- fails in 8.3.1 > SELECT * from t1 WHERE c = '1'; -- ok > > [...] > The failure is always (virtually) the same: > ERROR: operator does not exist: character varying = integer > LINE 1: SELECT * from t1 WHERE a = 1; > ^ > HINT: No operator matches the given name and argument > type(s). You might need to add explicit type casts. So it's indeed the backend that changed behavior. It's actually documented as I see now: http://www.postgresql.org/docs/8.3/static/release-8-3.html scroll to section E.2.2. Migration to Version 8.3, E.2.2.1. General, and the first item there: Non-character data types are no longer automatically cast to TEXT (Peter, Tom) Previously, if a non-character value was supplied to an operator or function that requires text input, it was automatically cast to text, for most (though not all) built-in data types. This no longer happens: an explicit cast to text is now required for all non- character-string types. I can see the arguments there but this will prevent upgrading to 8.3 for many many applications, and the comments from the Pg developers ('fix your SQL to use casts') that I've seen there on the mailing lists are just not helpful. 
Fixing SQL for many legacy applications is just not an option. In the
case of Bioperl-db it's very non-trivial, because all of a sudden we
would be changing from a hands-off and let-the-driver-figure-it-out
approach to forcing types everywhere.

So I think at this point with this change I have to declare Bioperl-db
officially incompatible with PostgreSQL 8.3+ until we've found a
solution to this, which is too bad because it seems 8.3 has some really
nice performance features added.

One possible solution might be to create a CAST in the database (namely
the one that was taken away, restoring behavior to pre-8.3). Another
possibility is to move the parameter binding method into the driver
adaptor, which would then delegate to the DBI method but would be
overridden for the PostgreSQL adaptor to force all bindings to type
string.

Which leads me back to the surprise observation that the parameter was
bound as an integer in the first place, when DBD::Pg used to bind
everything as string unless you told it otherwise. Which DBD::Pg
version is it that you are using? I would suspect (or hope) that maybe
there will soon be an update release of DBD::Pg that fixes this problem
by going back to binding everything as string by default (and as the
tests show, PostgreSQL will still convert strings to integer if
necessary).

Depending on what I (or can someone else update us on this?) find out
about the DBD::Pg plans, I'll probably start looking into moving the
parameter binding into the driver adaptors. Though it does feel
pathetic that this is now also not transparent between drivers.
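
The second option (binding through the driver adaptor, with a PostgreSQL-specific override) could be sketched roughly as below. This is only an illustration of the delegation idea under stated assumptions: the class names `My::DriverAdaptor` and `My::DriverAdaptor::Pg` are hypothetical and are not the Bioperl-db classes, the statement handle is a mock that merely records calls, and the constant is defined locally with the numeric value DBI uses for `SQL_VARCHAR`. Whether forcing string bindings would actually resolve the 8.3 behavior is not established in this thread.

```perl
use strict;
use warnings;

use constant SQL_VARCHAR => 12;   # same numeric value DBI exports as SQL_VARCHAR

# Mock statement handle: records bind_param calls instead of talking to a
# database. It stands in for a DBI $sth, whose bind_param accepts an
# optional third argument giving the data type.
package Mock::Sth;
sub new { my ($class) = @_; return bless({ binds => [] }, $class) }
sub bind_param {
    my ($self, @args) = @_;
    push @{ $self->{binds} }, [@args];
    return 1;
}

# Hypothetical generic driver adaptor: passes the value on untyped and
# leaves the decision to the driver.
package My::DriverAdaptor;
sub new { my ($class) = @_; return bless({}, $class) }
sub bind_param {
    my ($self, $sth, $index, $value) = @_;
    return $sth->bind_param($index, $value);
}

# Hypothetical PostgreSQL adaptor: same interface, but every value is
# bound explicitly as a string type.
package My::DriverAdaptor::Pg;
our @ISA = ('My::DriverAdaptor');
sub bind_param {
    my ($self, $sth, $index, $value) = @_;
    return $sth->bind_param($index, $value, main::SQL_VARCHAR());
}

package main;

my $sth = Mock::Sth->new;
My::DriverAdaptor->new->bind_param($sth, 1, 5456929);       # untyped
My::DriverAdaptor::Pg->new->bind_param($sth, 1, 5456929);   # forced to string
printf "%d args, then %d args\n", map { scalar @$_ } @{ $sth->{binds} };
```

The calling code stays the same for every driver; only the adaptor chosen for the connection decides whether a type is passed along.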
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From er at xs4all.nl Thu Mar 20 20:51:43 2008 From: er at xs4all.nl (Erik) Date: Fri, 21 Mar 2008 01:51:43 +0100 (CET) Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any longer Message-ID: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl> On Fri, March 21, 2008 01:34, Hilmar Lapp wrote: > > So I think at this point with this change I have to > declare Bioperl- > db officially incompatible with PostgreSQL 8.3+ until > we've found a > solution to this, which is too bad because it seems 8.3 > has some > really nice performance features added. Pg 8.3 is indeed very noticably faster, and it has other excellent new features like full text indexing. (This also makes that downgrading is not really an option) > Which DBD::Pg version is it that you are using? DBD::Pg 2.3.0 Thanks, Erik Rijkers From hlapp at gmx.net Thu Mar 20 21:36:50 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 20 Mar 2008 21:36:50 -0400 Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any longer In-Reply-To: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl> References: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl> Message-ID: <071CB899-AB3E-40B8-9477-82AE98DB88B1@gmx.net> On Mar 20, 2008, at 8:51 PM, Erik wrote: > On Fri, March 21, 2008 01:34, Hilmar Lapp wrote: >> >> So I think at this point with this change I have to declare >> Bioperl-db officially incompatible with PostgreSQL 8.3+ until >> we've found a solution to this, which is too bad because it seems >> 8.3 has some really nice performance features added. > > Pg 8.3 is indeed very noticably faster, and it has other > excellent new features like full text indexing. (This also > makes that downgrading is not really an option) Right, I saw that too. 
It is, however, just migrated from what was a contrib module before, so
downgrading and using the contrib module is an option. Furthermore,
folding these new features together with a behavior change that is
backwards incompatible was a choice the PostgreSQL people made, not us.

We also aren't doing poor typing that deserves fixing; we're just not
doing any typing by treating everything as a string. This is the Perl
paradigm. At this point it's actually unclear to me how this new
behavior is compatible with untyped scripting languages unless you know
the type of each column that you're binding a value for, because if you
actually force typecasts to string for everything you get an error if
an integer is indeed what's needed. I'm wondering what I'm missing.

-hilmar

BTW what does the following query yield on your 8.3.1 database:

    select s.typname as source, t.typname as target,
           f.proname as function, c.castcontext
    from pg_cast c, pg_type s, pg_type t, pg_proc f
    where c.castsource = s.oid
    and c.casttarget = t.oid
    and c.castfunc = f.oid
    and t.typname = 'text';

On my 8.1.4 database I get:

       source    | target | function | castcontext
    -------------+--------+----------+-------------
     bpchar      | text   | text     | i
     char        | text   | text     | i
     name        | text   | text     | i
     int8        | text   | text     | i
     int2        | text   | text     | i
     int4        | text   | text     | i
     oid         | text   | text     | i
     float4      | text   | text     | i
     float8      | text   | text     | i
     macaddr     | text   | text     | e
     cidr        | text   | text     | e
     inet        | text   | text     | e
     date        | text   | text     | i
     time        | text   | text     | i
     timestamp   | text   | text     | i
     timestamptz | text   | text     | i
     interval    | text   | text     | i
     timetz      | text   | text     | i
     numeric     | text   | text     | i
    (19 rows)

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From greg at turnstep.com Thu Mar 20 22:41:10 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Fri, 21 Mar 2008 02:41:10 -0000
Subject: [Bioperl-l]
[BioSQL-l] postgres 8.3 will not cast text to integer any longer In-Reply-To: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net> Message-ID: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > Which leads me back to the surprise observation that the parameter > was bound as an integer in the first place, when DBD::Pg used to bind > everything as string unless you told it otherwise. Which DBD::Pg > version is it that you are using? I would suspect (or hope) that > maybe there is soon an update release of DBD::Pg that fixes this > problem by going back to binding everything as string by default (and > as the tests show PostgreSQL will still convert strings to integer if > necessary). > > Depending on what I (or can someone else update us on this?) find out > for the DBD::Pg plans, I'll probably start looking into moving the > parameter binding into the driver adapters. Though it does feel > pathetic that this is now also not transparent between drivers. What you are probably looking for is already there, namely: $dbh->{pg_server_prepare} = 0; There's good reasons for the casting enforcement in 8.3, although I've been a sharp critic of the change, and certainly of the suddeness of it. Another solution to consider is adding the casts back in: http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html (the March 4th entry) - -- Greg Sabino Mullane greg at turnstep.com PGP Key: 0x14964AC8 200803202237 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAkfjIBYACgkQvJuQZxSWSsiamwCdEbNrC4F4oU7AGHrbHAm1YNXG HbUAoIRJtGW4brvMKklxZYG6pusbcTqf =Zawx -----END PGP SIGNATURE----- From David.Messina at sbc.su.se Fri Mar 21 04:36:16 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 21 Mar 2008 09:36:16 +0100 Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or Bio::Seq::SeqWithQuality? 
In-Reply-To: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com> References: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com> Message-ID: <628aabb70803210136p11de495p26d0ffaebbc3370e@mail.gmail.com> Hi Joseph, This looks like a bug; I saw the same thing here. Could you please submit this to the bug tracker along with your test code? Thanks, Dave From hlapp at gmx.net Fri Mar 21 08:52:39 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 21 Mar 2008 08:52:39 -0400 Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to integer any longer In-Reply-To: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com> References: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com> Message-ID: Hi Greg - thanks for your email, it's very helpful. On Mar 20, 2008, at 10:41 PM, Greg Sabino Mullane wrote: >> >> Depending on what I (or can someone else update us on this?) find out >> for the DBD::Pg plans, I'll probably start looking into moving the >> parameter binding into the driver adapters. Though it does feel >> pathetic that this is now also not transparent between drivers. > > What you are probably looking for is already there, namely: > > $dbh->{pg_server_prepare} = 0; So disabling server-side prepares will leave values quoted? Having server-side prepares would be very useful though, especially for Bioperl-db with its many lookup queries that all use similar parameter values. > > There's good reasons for the casting enforcement in 8.3 I do understand that, but it's also a sharp contrast to other RDBMSs that doesn't it make it easier for people to choose Pg when they should, and doesn't help writing cross-platform database applications either. > although I've been a sharp critic of the change, and certainly of > the suddeness > of it. Another solution to consider is adding the casts back in: > > http://people.planetpostgresql.org/peter/index.php?/archives/ > 2008/03.html > (the March 4th entry) Thanks for this, that helps a lot. 
Do you have links to some of the key threads showing what rationale went into the decision? (Or should I just search for your name?) I'd like to read up on that first before pouring more oil into the fire. I suspect that many of those who made the decision are never faced with needing to write cross-RDBMS code. Also, I wonder why this wasn't made a configurable option so it can be disabled by a simple config file change (such as the move away from automatic OID columns). But obviously this is the wrong list for discussing this (though Bioperl-db *is* one of those pieces of software that must be cross-RDBMS). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From baucom at msg.ucsf.edu Fri Mar 21 16:13:00 2008 From: baucom at msg.ucsf.edu (Albion Baucom) Date: Fri, 21 Mar 2008 13:13:00 -0700 Subject: [Bioperl-l] SearchIO Performance Message-ID: Hi. I am pretty new to BioPerl, and have a question about performance with regard to Blast (nucleotide) file parsing. My Blast result files usually have close to 100 or more sequence hits. Each sequence is about 1400 nucleotides long. After profiling code I wrote, I find that calling the next_result() function after creating a search object takes substantially longer than non-OO, quick and dirty code I am using to parse the same Blast files. What is substantially longer? Well, the existing code takes about 0.25 seconds, and the BioPerl call takes about 4.5 seconds. I find that to be a dramatic difference, and that kind of time difference becomes significant when I have to parse 30 Blast files in a row. I understand that SearchIO is parsing the entire file and storing it all for easy retrieval later, and maybe this time penalty is what I have to pay for that convenience and organization. I am just wondering if there is anything other than writing custom code based on BioPerl to speed this up. 
Something I might not be aware of that I can do ahead of time, or during parsing, to limit what is parsed, or facilitate the parsing process. For instance, is there a way to "look ahead" and simply parse alignments that meet a specific expectancy cutoff? I confess I have not read the documentation thoroughly (although obviously enough to make it do what I want), but am certainly willing to do so if someone can point me in the right direction. Thanks Albion From jason at bioperl.org Fri Mar 21 17:40:00 2008 From: jason at bioperl.org (Jason Stajich) Date: Fri, 21 Mar 2008 14:40:00 -0700 Subject: [Bioperl-l] SearchIO Performance In-Reply-To: References: Message-ID: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org> On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote: > Hi. I am pretty new to BioPerl, and have a question about > performance with regard to Blast (nucleotide) file parsing. My > Blast result files usually have close to 100 or more sequence hits. > Each sequence is about 1400 nucleotides long. > > After profiling code I wrote, I find that calling the next_result() > function after creating a search object takes substantially longer > than non-OO, quick and dirty code I am using to parse the same > Blast files. > > What is substantially longer? Well, the existing code takes about > 0.25 seconds, and the BioPerl call takes about 4.5 seconds. I find > that to be a dramatic difference, and that kind of time difference > becomes significant when I have to parse 30 Blast files in a row. I > understand that SearchIO is parsing the entire file and storing it > all for easy retrieval later, and maybe this time penalty is what I > have to pay for that convenience and organization. > > I am just wondering if there is anything other than writing custom > code based on BioPerl to speed this up. Something I might not be > aware of that I can do ahead of time, or during parsing, to limit > what is parsed, or facilitate the parsing process. 
For instance, is > there a way to "look ahead" and simply parse alignments that meet a > specific expectancy cutoff? > > I confess I have not read the documentation thoroughly (although > obviously enough to make it do what I want), but am certainly > willing to do so if someone can point me in the right direction. > We are quite aware of the speed issues. This is discussed on the wiki in brief detail. http://bioperl.org/wiki/Why_BioPerl_is_slow It boils down to the object creation not the parsing (relatively speaking). It takes a while because we're creating a lot of objects under the hood for each alignment. Sendu has written a pull parser that doesn't require creation of all the objects until the user requests them. As I've said in the past, if someone wrote SearchIO event-listener that created lightweight objects (or just hashes) instead this would also provide a substantial speedup. In the fall I did some experimentation with array-based instead of hash-based feature objects got a pretty decent speedup as well, but just haven't had any time to roll out a more substantial prototyping. For the inner-loops of things it may make sense to substitute a less-flexible but super-fast object. I always advocate thinking about what your needs are - if you just want start/stop of alignments, you can grab this out of a blast format table with the -m9 (NCBI) or --mformat =3 (WUBLAST) and you can write a fast parser that uses 'split'. 
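
As a rough illustration of that last suggestion, here is a minimal sketch of a split-based parser for tab-separated BLAST output. It assumes the standard 12-column NCBI tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score) and skips the `#` comment lines that the commented variant adds. `parse_blast_table`, the field names and the sample rows are all made up for this example.

```perl
use strict;
use warnings;

# Column names for the 12-column tabular layout (the names are my own labels).
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Parse tabular BLAST lines into plain hashes, keeping only hits whose
# e-value is at or below $max_evalue. No objects are created.
sub parse_blast_table {
    my ($lines, $max_evalue) = @_;
    my @hits;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        next if $copy =~ /^#/ or $copy !~ /\S/;   # comment or blank line
        my @cols = split /\t/, $copy;
        next unless @cols == @FIELDS;             # not a 12-column hit line
        my %hit;
        @hit{@FIELDS} = @cols;
        push @hits, \%hit if $hit{evalue} <= $max_evalue;
    }
    return \@hits;
}

# Made-up sample data: one comment line and two hits.
my @sample = (
    "# BLASTN (made-up sample data)\n",
    join("\t", qw(q1 s1 98.50 1400 21 0 1 1400 5 1404 0.0 2500)) . "\n",
    join("\t", qw(q1 s2 81.20 300 50 6 10 309 77 370 2e-04 95.3)) . "\n",
);

my $hits = parse_blast_table(\@sample, 1e-10);
printf "%s\t%d-%d\n", @{$_}{qw(subject q_start q_end)} for @$hits;   # only s1 passes the cutoff
```

For a real file, the lines could be read with something like `open my $fh, '<', $file or die $!;` and passed in as `[<$fh>]`. The e-value cutoff is applied while reading, which is one way to get the "only alignments below a threshold" behavior asked about earlier in the thread.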
> Thanks > > Albion > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From er at xs4all.nl Fri Mar 21 17:43:47 2008 From: er at xs4all.nl (Erik) Date: Fri, 21 Mar 2008 22:43:47 +0100 (CET) Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl / swissprot Message-ID: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl> Hi, PostgreSQL 8.3.1 DBD::Pg 2.3.0 perl 5.8.8 (The following error may have to do with the 8.3 problems that I reported yesterday (bug 2472) - I don't know) I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without problem. Then I ran scripts/biosql/load_seqdatabase.pl as: perl scripts/biosql/load_seqdatabase.pl \ -driver Pg \ -dbuser xxxxxxx \ -dbname bioseqdb \ -namespace swissprot \ -format swiss \ /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat It took two hours to load 26504 records (7%) of uniprot_sprot.dat (is it expected to be so slow?), then failed with: Could not store Q2UXW0: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: create: object (Bio::Species) failed to insert or to be found by unique key STACK: Error::throw STACK: Bio::Root::Root::throw /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206 STACK: Bio::DB::Persistent::PersistentObject::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK: Bio::DB::Persistent::PersistentObject::store 
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271 STACK: scripts/biosql/load_seqdatabase.pl:630 ----------------------------------------------------------- I don't know if this is directly related to the 8.3 casting problems I reported yesterday (bug 2472), or a separate Bio::Species issue. regards, Erik Rijkers From bix at sendu.me.uk Fri Mar 21 19:17:59 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 21 Mar 2008 23:17:59 +0000 Subject: [Bioperl-l] SearchIO Performance In-Reply-To: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org> References: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org> Message-ID: <47E44227.3050002@sendu.me.uk> Jason Stajich wrote: > > On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote: > >> Hi. I am pretty new to BioPerl, and have a question about performance >> with regard to Blast (nucleotide) file parsing. [...] >> What is substantially longer? Well, the existing code takes about 0.25 >> seconds, and the BioPerl call takes about 4.5 seconds. I find that to >> be a dramatic difference, and that kind of time difference becomes >> significant when I have to parse 30 Blast files in a row. I understand >> that SearchIO is parsing the entire file and storing it all for easy >> retrieval later, and maybe this time penalty is what I have to pay for >> that convenience and organization. [...] > Sendu has written a pull parser that > doesn't require creation of all the objects until the user requests them. > As I've said in the past, if someone wrote SearchIO event-listener that > created lightweight objects (or just hashes) instead this would also > provide a substantial speedup. Yeah, you'll need BioPerl 1.5.2 (or the latest from svn) and to set the format to 'blast_pull'. Depending on the circumstances, and with thoughtful usage, you can see orders-of-magnitude speedups.
http://doc.bioperl.org/bioperl-live/Bio/SearchIO/blast_pull.html The only disadvantage compared to the normal parser is that the pull parser currently only supports NCBI BLASTN and BLASTP. From hlapp at gmx.net Sat Mar 22 14:18:45 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Mar 2008 14:18:45 -0400 Subject: [Bioperl-l] Call for Student Applications - NESCent participates in the Google Summer of Code In-Reply-To: <0025B440-EF1E-4632-9DB4-B98489BF3550@duke.edu> Message-ID: <5AC4F213-8D88-41C6-B380-59B2EF7831F0@gmx.net> Hi all - just wanted to draw your attention to our Google Summer of Code participation this year. One of the projects deals directly with BioPerl, another one builds on BioSQL (and could be implemented taking advantage of BioPerl or Bio::Phylo, or Biojava). Cheers, -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== Phyloinformatics Summer of Code 2008 http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008 *** Please disseminate this announcement widely to appropriate students at your institution *** The National Evolutionary Synthesis Center (NESCent: http://www.nescent.org/) is participating in 2008 for the second year as a mentoring organization in the Google Summer of Code (http://code.google.com/soc). Through this program, Google provides undergraduate, masters, and PhD students with a unique opportunity to obtain hands-on experience writing and extending open-source software under the mentorship of experienced developers from around the world. Our goal in participating is to train future researchers and developers to not only have awareness and understanding of the value of open-source and collaboratively developed software, but also to gain the programming and remote collaboration skills needed to successfully contribute to such projects.
Students will receive a stipend from Google, and may work from their home, or home institution, for the duration of the 3 month program. Students will each have one or more dedicated mentors with expertise in phylogenetic methods and open-source software development. NESCent is particularly targeting students interested in both evolutionary biology and software development. Project ideas (see URL below) range from visualizing phylogenetic data in R, to development of a Mesquite module, web-services for phylogenetic data providers or geophylogeny mashups, implementing phyloXML support, navigating databases of networks, topology queries for PhyloCode registries, to phylogenetic tree mining in a MapReduce framework, and more. The project ideas are flexible and many can be adjusted in scope to match the skills of the student. If the program sounds interesting to you but you are unsure whether you have the necessary skills, please email the mentors at the address below. We will work with you to find a project that fits your interests and skills. INQUIRIES: Email any questions, including self-proposed project ideas, to phylosoc {at} nescent {dot} org. TO APPLY: Apply on-line at the Google Summer of Code website (http://code.google.com/soc/2008), where you will also find GSoC program rules and eligibility requirements. The 1-week application period for students opens on Monday March 24th and runs through Monday, March 31st, 2008. 
Hilmar Lapp and Todd Vision US National Evolutionary Synthesis Center ===== URLs: ===== 2008 NESCent Phyloinformatics Summer of Code: http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008 Eligibility requirements: http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_eligibility Stipends: http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_administrivia To sign up for quarterly NESCent newsletters with announcements about upcoming programs at the Center: http://www.nescent.org/about/contact.php From hlapp at gmx.net Sat Mar 22 15:30:07 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Mar 2008 15:30:07 -0400 Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl> References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl> Message-ID: <14191FB6-A8DF-4F60-9FF7-FDAA8F3974B8@gmx.net> Hi Erik, I suspect that's a separate Bio::Species issue. If you query your BioSQL database for the existence of the taxon: SELECT * FROM taxon WHERE ncbi_taxon_id = 326939; do you get a result? If not, then for some reason the taxon wasn't yet included in the NCBI taxonomy that you loaded. If yes, then somehow BioPerl didn't properly parse out the taxon ID from the record. There should have been another message preceding the error below; could you post that too? Otherwise, can you rerun with --printerror as a command line argument? Note also that you can always specify --safe to go past any loading error. In fact that's what I recommend doing unless you want to debug why a particular record doesn't load. BTW I would recommend that you restore the CASTs that were removed in Pg 8.3; otherwise you may hit random issues in Bioperl-db whenever a parameter value for a string-type column happens to be a number.
(taxon.ncbi_taxon_id is of type integer) See http://people.planetpostgresql.org/peter/index.php?/archives/18-Readding-implicit-casts-in-PostgreSQL-8.3.html as per Greg's email. -hilmar On Mar 21, 2008, at 5:43 PM, Erik wrote: > Hi, > > PostgreSQL 8.3.1 > DBD::Pg 2.3.0 > perl 5.8.8 > > (The following error may have to do with the 8.3 problems > that I reported yesterday (bug 2472) - I don't know) > > I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without > problem. > > Then I ran scripts/biosql/load_seqdatabase.pl as: > > perl scripts/biosql/load_seqdatabase.pl \ > -driver Pg \ > -dbuser xxxxxxx \ > -dbname bioseqdb \ > -namespace swissprot \ > -format swiss \ > /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/ > knowledgebase/complete/uniprot_sprot.dat > > It took two hours to load 26504 records (7%) of > uniprot_sprot.dat (is it expected to be so slow?), then > failed with: > > Could not store Q2UXW0: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: create: object (Bio::Species) failed to insert or to > be found by unique key > STACK: Error::throw > STACK: Bio::Root::Root::throw > /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create > /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:206 > STACK: Bio::DB::Persistent::PersistentObject::create > /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ > PersistentObject.pm:244 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create > /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:169 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store > /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:251 > STACK: Bio::DB::Persistent::PersistentObject::store > /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ > PersistentObject.pm:271 > STACK: scripts/biosql/load_seqdatabase.pl:630 >
----------------------------------------------------------- > > > I don't know if this is directly related to the 8.3 > casting problems I reported yesterday (bug 2472), or a > separate Bio::Species issue > > > regards, > > Erik Rijkers > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Mar 22 16:01:51 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Mar 2008 16:01:51 -0400 Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl / swissprot In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl> References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl> Message-ID: <69D3EA33-810B-40EA-8687-752FA1A34FBF@gmx.net> Forgot to respond to this: On Mar 21, 2008, at 5:43 PM, Erik wrote: > It took two hours to load 26504 records (7%) of uniprot_sprot.dat > (is it expected to be so slow?) The last time I used to load those regularly it was a bit faster (~ 5 seqs/s) but it is in a ballpark that wouldn't raise a red flag for me. BTW you can make it print statistics using the --logchunk N option, where N is the number of seqs after which you want the current count and the #recs/s printed. You may get it to be faster if you tune the database (e.g., make sure there is enough memory for index reorganization, transaction log and tablespace datafile are on separate disks, etc; fiddling with the query optimizer has probably little effect as almost all queries are simple lookups or inserts). That all said, the strength of load_seqdatabase.pl isn't speed. It doesn't make use of any bulk upload optimizations, and therefore the initial load of a very large database will take its time. 
The power is more in subsequent updates where you can configure what you want to happen, and during which the database is never in an inconsistent state, so it can run in the background. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From er at xs4all.nl Sat Mar 22 16:34:14 2008 From: er at xs4all.nl (Erik) Date: Sat, 22 Mar 2008 21:34:14 +0100 (CET) Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot Message-ID: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl> On Sat, March 22, 2008 20:30, Hilmar Lapp wrote: > SELECT * FROM taxon WHERE ncbi_taxon_id = 326939; No, I don't seem to have that particular id, although I ran the ncbi load script yesterday just before the sprot. Btw, in the meantime I figured out that it was a parsing error choking on an unexpected period. You asked for preceding errors, but there were none. I have now restarted the same uniprot_sprot.dat load with --safe, which if I understand you correctly will just skip any non-parsable records. And wrt the postgres 8.3 casting: I only added the first cast from Peter Eisentraut's list: CREATE FUNCTION pg_catalog.text(integer) RETURNS text STRICT IMMUTABLE LANGUAGE SQL AS 'SELECT textin(int4out($1));'; --added 20080322 CREATE CAST (integer AS text) WITH FUNCTION pg_catalog.text(integer) AS IMPLICIT; --added 20080322 I hope eventually a more durable solution will be found - I fear this reinstalling of old casting functionality will generate unexpected problems of its own. But it seems a good intermediary solution; with it, the previously failing t/16odba.t succeeds...
Thank you, Erik Rijkers From hlapp at gmx.net Sat Mar 22 17:16:18 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Mar 2008 17:16:18 -0400 Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot In-Reply-To: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl> References: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl> Message-ID: <3C253027-5A2B-4C0C-9DF6-A0DA84CC96A8@gmx.net> On Mar 22, 2008, at 4:34 PM, Erik wrote: > On Sat, March 22, 2008 20:30, Hilmar Lapp wrote: > >> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939; > > No, I don't seem to have that particular id, although I > ran the ncbi load script yesterday just before the sprot. Odd. It's on the NCBI taxonomy browser. Maybe it was just added the other day? > Btw, in the meantime I figured out that it was a parsing > error choking on an unexpected period. Do you want to report that to the BioPerl category on bugzilla.open-bio.org? > > You asked for preceding errors, but there were none. > > I have now restarted the same uniprot_sprot.dat load with > --safe, which if I understand you correctly will just skip > any non-parsable records. And all records that cause some other database error when inserting. Note that if you didn't erase the previously loaded records, you will either need to choose a new namespace, or, better, use the --lookup and --noupdate flags. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mrphysh at juno.com Sat Mar 22 18:39:22 2008 From: mrphysh at juno.com (mrphysh at juno.com) Date: Sat, 22 Mar 2008 22:39:22 GMT Subject: [Bioperl-l] these objects are pretty cool Message-ID: <20080322.163922.21808.1@webmail01.vgs.untd.com> I am starting to understand how to use the objects. I am the sort who wants to understand how things work, at least on some level.
I think my understanding would be increased with knowledge of the actual contents of the object. My book says they are hashes and that makes sense: field-value.......field-value.......field-value But as far as I can tell they cannot be taken apart like a regular hash. How can I print out the contents? This makes sense to me: xxxxxxxxxxxxxxxx use Bio::SeqIO; #these objects were made for file input...conversion...file output while ( my $seq = $out->next_seq() ) {print "$seq\n"; } #or..how about this? foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; } xxxxxxxxxxxxxx this gives back: Bio::Seq::RichSeq=HASH(0x860dcdc) Bio::Seq=HASH(0x85f5a20) Is there some way to slice up the object and look at the parts? John From jason at bioperl.org Sat Mar 22 18:46:46 2008 From: jason at bioperl.org (Jason Stajich) Date: Sat, 22 Mar 2008 15:46:46 -0700 Subject: [Bioperl-l] these objects are pretty cool In-Reply-To: <20080322.163922.21808.1@webmail01.vgs.untd.com> References: <20080322.163922.21808.1@webmail01.vgs.untd.com> Message-ID: It's got methods that you need to call to get the data. Did you try looking at any of the HOWTOs? They discuss this sort of thing. http://bioperl.org/wiki/HOWTOs -jason On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote: > > I am starting to understand how to use the objects. > > I am the sort who wants to understand how things work, at least on > some level. I think my understanding would be increased with > knowledge of the actual contents of the object. My book says they > are hashes and that makes sense: field-value.......field- > value.......field-value > > But as far as I can tell they cannot be taken apart like a regular > hash. > > How can I print out the contents?
this makes sense to me: > xxxxxxxxxxxxxxxx > use Bio::SeqIO; > > #these objects were made for file input...conversion...file output > > while ( my $seq = $out->next_seq() ) {print "$seq\n"; } > > #or..how about this? > > foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; } > xxxxxxxxxxxxxx > this gives back: > > Bio::Seq::RichSeq=HASH(0x860dcdc) > Bio::Seq=HASH(0x85f5a20) > > Is there some way to slice up the object and look at the parts? > > John From robfsouza at gmail.com Sat Mar 22 19:11:49 2008 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Sat, 22 Mar 2008 20:11:49 -0300 Subject: [Bioperl-l] these objects are pretty cool In-Reply-To: References: <20080322.163922.21808.1@webmail01.vgs.untd.com> Message-ID: Hi John, The methods are usually designed to provide a simple means of accessing the object's internal data. They are intended to hide the internal data structure, so that you rely on the documented class methods instead of needing to understand it. If the documentation does not satisfy you and you still want to take a look at the data structure, try dumping the object with code like use Bio::SeqIO; use Data::Dumper; while ( my $seq = $out->next_seq() ) { print Dumper($seq),"\n"; } and check out chapters four and five of Programming Perl. Best, Robson PS: watch out for lots of printed output... 2008/3/22, Jason Stajich : > it's got methods that you need to call to get the data. Did you try > looking at any of the howtos - they discuss this sort of thing.
From er at xs4all.nl Sat Mar 22 19:36:13 2008 From: er at xs4all.nl (Erik) Date: Sun, 23 Mar 2008 00:36:13 +0100 (CET) Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot Message-ID: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl> Hi Hilmar, > either need to choose a new namespace, or, better, use the > --lookup and --noupdate flags. scripts/biosql/load_seqdatabase.pl is now churning along to load uniprot_sprot.dat. I'll try to gather up the rejected records for further inspection / parser improvement. The next thing is performance, it's really intolerably slow, and I don't think the database is the bottleneck - isn't it more likely bioperl object heaviness? I get continuous near 100% load for 1 cpu (this machine has 2 cpus). I could give it 10 or more processors; I am thinking I could cut up the input into 10 (or more) chunks. Is there anything specific in bioperl/biosql that knows how to use multiple cores?
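A rough sketch of the "cut up the input into chunks" idea, in plain Perl. It relies only on the fact that Swiss-Prot flat-file entries end with a line containing just "//". The helper name split_records is made up here, and whether several load_seqdatabase.pl processes can safely load such chunks in parallel is an open assumption that this thread does not confirm.

```perl
use strict;
use warnings;

# Copy a flat file to several output handles, switching to the next handle
# (round-robin) after each entry terminator line ("//").
# Returns the number of complete entries seen.
# split_records is an illustrative helper, not part of BioPerl or BioSQL.
sub split_records {
    my ($in_fh, @out_fhs) = @_;
    my $count = 0;
    while (my $line = <$in_fh>) {
        print { $out_fhs[ $count % @out_fhs ] } $line;
        $count++ if $line =~ m{^//\s*$};   # end of the current entry
    }
    return $count;
}

# Usage with in-memory handles and three tiny made-up entries:
my $sample = "ID   A\nSQ   x\n//\nID   B\n//\nID   C\n//\n";
open my $in, '<', \$sample or die "cannot open sample: $!";
my @bufs = ('', '');
my @outs = map { open my $fh, '>', \$bufs[$_] or die $!; $fh } 0 .. 1;
my $n = split_records($in, @outs);
close $_ for @outs;
print "$n entries; chunk sizes in bytes: ",
      join(', ', map { length } @bufs), "\n";
```

With real files, the in-memory handles would be replaced by handles opened on the input file and on one output file per chunk.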
thank you very much for your help Erik Rijkers From hlapp at gmx.net Sat Mar 22 21:40:55 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Mar 2008 21:40:55 -0400 Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot In-Reply-To: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl> References: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl> Message-ID: <1B55060F-534D-4732-B428-4863DD098654@gmx.net> On Mar 22, 2008, at 7:36 PM, Erik wrote: > The next thing is performance, it's really intolerably > slow, and I don't think the database is the bottleneck - > isn't it more likely bioperl object heaviness? I get > continuous near 100% load for 1 cpu (this machine has 2 > cpus). Is the database on the same machine? If yes, and a significant fraction (~30-50% or even more) of the load is generated by the perl script, rather than almost everything coming from the postmaster, then indeed the database is not the bottleneck. Of course, the bioperl object creation overhead takes a toll too. I would be surprised though if BioPerl can't parse more than 3.6 records/s on a modern CPU; you can convince yourself of that though by writing a simple script along the lines of the following (taking the input file as its first argument) and seeing how fast that goes: use Bio::SeqIO; my $seqio = Bio::SeqIO->new(-file => $ARGV[0], -format => 'swiss'); my $n = 0; while (my $seq = $seqio->next_seq) { $n++; print STDERR "$n sequences parsed\n" if $n % 5000 == 0; } But maybe load_seqdatabase.pl or even BioSQL or BioPerl aren't suitable for your use-case?
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sun Mar 23 10:09:56 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 23 Mar 2008 09:09:56 -0500 Subject: [Bioperl-l] Using Bioperl book In-Reply-To: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org> References: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org> Message-ID: <4C401D4F-064C-43F9-A37C-14FA65A96657@uiuc.edu> Maybe something to discuss at BOSC? chris On Mar 19, 2008, at 12:54 PM, Jason Stajich wrote: > it's probably more than 6 months out. We still haven't finished > writing it as life and work continues to intrude on book writing. > > -jason > On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote: > >> Hello, >> >> i just found on amazon something about a book "Using Bioperl", >> published >> on the 1st of March 2008 but which is no more available. >> >> Does anyone know how to get it ? >> >> Many thanks, >> >> Jorge. >> >> --- >> Jorge Duarte >> Bioinformatics Software Engineer >> BIOGEMMA >> Z.I. 
Du Brézet >> 8, Rue des Frères Lumière >> 63028 CLERMONT FERRAND Cedex 2 >> FRANCE >> Tel : +33 (0)4 73 39 60 73 >> Fax : +33 (0)4 73 39 60 71 >> E-mail : jorge.duarte at biogemma.com >> >> ***************************************************************** >> Pour toute demande de support merci d'inclure >> BIOGEMMA_BioInfo_Service ou bioinfo at biogemma.com >> dans les destinataires lors du premier contact >> ***************************************************************** >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Mar 23 10:17:56 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 23 Mar 2008 09:17:56 -0500 Subject: [Bioperl-l] Priorities for a bioperl-1.6 release In-Reply-To: <47DFE089.1070304@sendu.me.uk> References: <47DFE089.1070304@sendu.me.uk> Message-ID: On Mar 18, 2008, at 10:32 AM, Sendu Bala wrote: > aaron.j.mackey at gsk.com wrote: >>> Or is the split intended to be 'core' == "anything and everything >>> that was in 1.4", '????' == "everything else"? In which case, >>> what's a good name for "modules created after 1.4"? 'crust'? ;) >> Nah, "icing". >> a module "use" map might be very useful to help identify "core" vs. >> other layers of mantle/crust/icing. >> http://www.perlmonks.org/?node_id=87329 http://search.cpan.org/src/NEILB/pmusage-1.2/ > > Thanks for those. Neither could quite cope with BioPerl, but I've > munged > them together and hacked up 'module_usage.pl' which I've just > committed > to the maintenance directory of bioperl-live.
> > module_usage.pl ../Bio > > Produces: > *warning, may crash your browser; download it and view in a dedicated > image viewer* > http://bix.sendu.me.uk/files/module_usage.jpeg > http://bix.sendu.me.uk/files/module_usage.txt > > ... > > I haven't done any full analysis along these lines and leave as an > exercise for the interested reader for now ;) I'm coming into this late (just got back) but I agree, this would be very useful. Your updates based on Aaron's comments help quite a bit. > Chris Fields wrote: >> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules >> I'm pretty flexible on any of that; it's a proposal only and I think >> some of it may be wrongheaded, but hey, I'm willing to take a few >> rotten tomatoes. The key issue is we should try to work out what we >> mean by 'core' or the core library. I have a rather extreme view of >> it as being the bare essentials without external, non-perl core >> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI >> and required modules for those classes) but I'm sure others would >> lump in parsers, DB functionality, etc. I basically suggest placing >> those (and any stable but potentially non-core code) in a >> 'bioperl-main', with any unstable or untested code going into a >> 'bioperl-unstable'. > > My thoughts are along these lines: > # I agree that core should have no external dependencies > # I agree that it might mostly be interfaces > # It should represent a framework with all the interfaces (that have > stable APIs), directory structure and base classes that everything > else relies on > # It might not do much useful bioinformatics, but provides just about > everything needed for a dev to create a new module that does Yes, that's essentially the idea. >> In essence, bioperl-main would require core and resemble a stable >> release; bioperl-unstable would require bioperl-main (and core) and >> resemble a dev release. 
Not sure how versioning would go or if this >> is a viable option at all, but it's worth discussing. > > # I agree that this 3-way split seems reasonable > # bioperl-main would consist primarily of the 'leaves' of the module > tree, mostly parsers and the like which, whilst 'stable' and tested > should still be split away from core because the data sources they > parse could change format slightly > # bioperl-unstable, better bioperl-bleed, would feature brand-new > stuff, be it new parsers for totally new formats, new APIs that do > something not thought of before etc. When they are complete, bug-free > and have stood the test of time they get moved into bioperl-main. > (It is not a place for all new commits; bug fixes to something in > bioperl-main would be committed to bioperl-main) > # The current splits (bioperl-run, bioperl-network etc.) do not get > their own core and bleed variant. Anything they need for core > functionality would enter the single bioperl-core, anything new > would enter the single bioperl-bleed, and anything stable would > be in their own bioperl-[package] > > Discuss :) We can work on updating the plan via the wiki as well as the mail list. I find it easier to track; we can always link back to the mail list when needed. http://www.bioperl.org/wiki/Proposed_1.6_core_modules http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules chris From er at xs4all.nl Sun Mar 23 14:16:05 2008 From: er at xs4all.nl (Erik) Date: Sun, 23 Mar 2008 19:16:05 +0100 (CET) Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot Message-ID: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl> On Sun, March 23, 2008 02:40, Hilmar Lapp wrote: > But maybe load_seqdatabase.pl or even BioSQL or BioPerl > aren't suitable for your use-case? well, that may turn out to be the case, but I'm not quite deterred yet. 
I am in a situation like many others, I think: microarray, mass spec, and chipseq (Solexa) data all need annotation, and while it is easy to retrieve some useful records from public data sources (entrez, ensembl & biomart, etc.), it is not so easy to have such high atomicity in the locally stored annotation data that fine-grained filtering and sorting on a sql level becomes possible. I hope the bioperl parsers, together with the biosql schema, will give SQL access to all or most data bits. And I understand GBrowse can run on top of BioSQL/Pg too, albeit somewhat preliminary; this is another usage I will need. btw, should not all those references to postgres 7.3 be upgraded to something newer, like 8.2.7 (maybe not yet 8.3 heh)? 7.3 is not supported anymore by the pg project. Sprot loaded in 20 hours. Only 170 were rejected - not too bad. Thanks, Erik Rijkers From hlapp at gmx.net Sun Mar 23 15:22:46 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Mar 2008 15:22:46 -0400 Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot In-Reply-To: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl> References: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl> Message-ID: On Mar 23, 2008, at 2:16 PM, Erik wrote: > On Sun, March 23, 2008 02:40, Hilmar Lapp wrote: >> But maybe load_seqdatabase.pl or even BioSQL or BioPerl >> aren't suitable for your use-case? > > well, that may turn out to be the case, but I'm not quite > deterred yet. > > I am in a situation like many others, I think: microarray, > mass spec, and chipseq (Solexa) data all need > annotation, and while it is easy to retrieve some useful > records from public data sources (entrez, ensembl & > biomart, etc.), it is not so easy to have such high > atomicity in the locally stored annotation data that > fine-grained filtering and sorting on a sql level becomes > possible. I hope the bioperl parsers, together with the > biosql schema, will give SQL access to all or most data > bits.
If you mean annotation by data bits then yes, it should be fairly normalized (possibly more normalized than you want, in fact). Also, using BioSQL as the sequence and sequence annotation model add-on to some other database holding your lab data is what many others have used it for too. > > And I understand GBrowse can run on top of BioSQL/Pg too, > albeit somewhat preliminary; this is another usage I will > need. It can, though keep in mind that that's not the use-case it (BioSQL) was built for. If you need to have rapid access to genome intervals with 10s of thousands of features and their annotation, you'll have to start thinking about a more de-normalized data store to run this off of, such as populating a native GBrowse GFF store. > > btw, should not all those references to postgres 7.3 be > upgraded to something newer, like 8.2.7 (maybe not yet 8.3 > heh) ? 7.3 is not supported anymore by the pg project. Oops, indeed. Where are they? > > Sprot loaded in 20 hours. Only 170 were rejected - not too > bad. That's great. Would be nice if you can provide some rough summary as to why they were rejected (if that's obvious), such as taxon errors, or other errors. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From sac at bioperl.org Sun Mar 23 18:20:43 2008 From: sac at bioperl.org (Steve Chervitz) Date: Sun, 23 Mar 2008 15:20:43 -0700 Subject: [Bioperl-l] HitTableWriter error In-Reply-To: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk> References: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk> Message-ID: <8f200b4c0803231520o5082f9f5mf7be8cd061faa98f@mail.gmail.com> Hi Owen Sorry, I don't have time to look into this right now, but two thoughts: 1) The ResultTableWriter is intended to convert standard blast reports into a tabular format.
So if you already have tabular results, you are sort of using it "off label", but in principle, it should work. 2) The iteration method is only available to BlastHit objects, since it is only relevant to PSI-blast results. The fact that you got this error when working with blast results indicates that the parser did not generate the correct object type for your hits, using the GenericHit when it should have used BlastHit. This is just a hunch and would be worth following up on. Can you submit this as a bug report? Thanks, Steve On Thu, Mar 20, 2008 at 4:56 AM, Owen Lancaster wrote: > Hello > > I hope you don't mind me emailing you but I have come across a problem > when trying to use HitTableWriter. The error can be seen below - the > input for the script is the BLAST tabular output (specified with the - > m 8 option) from a blastn search. > > If you have any idea what the problem might be I would much appreciate > it! Hope you can help... > > Thanks > > Owen > > > Using default column map. > > ------------- EXCEPTION ------------- > MSG: Trouble in ResultTableWriter::_set_row_data_func() eval: Can't > locate object method "iteration" via package > "Bio::Search::Hit::GenericHit" at (eval 97) line 1, line 2. 
> > > > STACK Bio::SearchIO::Writer::ResultTableWriter::__ANON__ /Library/Perl/ > 5.8.8/Bio/SearchIO/Writer/ResultTableWriter.pm:328 > STACK Bio::SearchIO::Writer::HitTableWriter::to_string /Library/Perl/ > 5.8.8/Bio/SearchIO/Writer/HitTableWriter.pm:268 > STACK Bio::SearchIO::write_result /Library/Perl/5.8.8/Bio/SearchIO.pm: > 331 > STACK Bio::SearchIO::blast::write_result /Library/Perl/5.8.8/Bio/ > SearchIO/blast.pm:2208 > STACK toplevel ./generate_discordant_tails.pl:62 > > -------------------------------------- > > From greg at turnstep.com Sun Mar 23 20:42:36 2008 From: greg at turnstep.com (Greg Sabino Mullane) Date: Mon, 24 Mar 2008 00:42:36 -0000 Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to integer any longer In-Reply-To: Message-ID: <4ab14dcc59d7566b55ba87027055e9fd@biglumber.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 >> Depending on what I (or can someone else update us on this?) find out >> for the DBD::Pg plans, I'll probably start looking into moving the >> parameter binding into the driver adapters. Though it does feel >> pathetic that this is now also not transparent between drivers. > > What you are probably looking for is already there, namely: > > $dbh->{pg_server_prepare} = 0; > So disabling server-side prepares will leave values quoted? Having > server-side prepares would be very useful though, especially for > Bioperl-db with its many lookup queries that all use similar > parameter values. Yes, it forces DBD::Pg to do the quoting itself, which basically means that everything is shipped to the server as a single SQL string, and no placeholders are used. In the grand scheme of things, the speed difference is not large for most queries. Certainly one way would be to turn this on for 8.3 and above, and slowly migrate the queries/schema over time. 
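
A minimal sketch of the "turn this on for 8.3 and above" idea, assuming a DBD::Pg database handle, on which pg_server_prepare and pg_server_version are handle attributes. The helper name and the 80300 cutoff are illustrative assumptions, not part of Bioperl-db or DBD::Pg:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl-db or DBD::Pg).
# $dbh is expected to be a DBD::Pg database handle; only its hash-style
# attributes are touched, so no query is issued here.
sub disable_server_prepare_if_needed {
    my ($dbh) = @_;

    # pg_server_version is an integer such as 80205 (8.2.5) or 80300 (8.3.0).
    my $version = $dbh->{pg_server_version} || 0;

    # Assumption: only 8.3 and later need DBD::Pg to do the quoting itself.
    if ($version >= 80300) {
        $dbh->{pg_server_prepare} = 0;
    }
    return $dbh->{pg_server_prepare};
}
```

One would call this once, right after DBI->connect() returns the handle, so that older servers keep the benefit of server-side prepares.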

>> There's good reasons for the casting enforcement in 8.3

> I do understand that, but it's also a sharp contrast to other RDBMSs
> that doesn't make it easier for people to choose Pg when they
> should, and doesn't help writing cross-platform database applications
> either.

I'm not overly familiar with how other databases treat this, but I've
heard DB2 can be a stickler about this too. I've not dug into the
bioperl code in a while, to be honest, so I'm not sure what sort of
queries we're talking about. Certainly long-term the code and schema
should move away from implicit casting. Maybe a better short-term
solution is adding the more obvious casts (e.g. text<->int) back in.

> Do you have links to some of the key threads showing what rationale
> went into the decision? (Or should I just search for your name?) I'd
> like to read up on that first before pouring more oil into the fire.
> I suspect that many of those who made the decision are never faced
> with needing to write cross-RDBMS code.
>
> Also, I wonder why this wasn't made a configurable option so it can
> be disabled by a simple config file change (such as the move away
> from automatic OID columns). But obviously this is the wrong list for
> discussing this (though Bioperl-db *is* one of those pieces of
> software that must be cross-RDBMS).

I did ask about that, and was told it would not have been easy to do
so. But I agree, a phasing-in period (heck, even a warning) would have
been nice. Feel free to pour some oil on the fire, I think this is one
of many apps that has been affected. (I've run across two other major
cross-DB apps (Interchange and MediaWiki) that are struggling with the
same pain. I managed to painfully fix the latter, but the former is way
too complex to tackle at the moment).

I could not find the thread(s?)
I weighed in on, but you can find some relevant discussions by googling "strict-typing benefits grokbase" - -- Greg Sabino Mullane greg at turnstep.com PGP Key: 0x14964AC8 200803232039 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAkfm+NAACgkQvJuQZxSWSsi4ogCdGNWvCJIzXxb+YKzdm6wwxQMv p3AAnizkWXoo/rvxv4KVdC8tD0vF87k3 =dNYi -----END PGP SIGNATURE----- From er at xs4all.nl Sun Mar 23 20:45:49 2008 From: er at xs4all.nl (Erik) Date: Mon, 24 Mar 2008 01:45:49 +0100 (CET) Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot Message-ID: <19067.156.83.1.157.1206319549.squirrel@webmail.xs4all.nl> On Sun, March 23, 2008 20:22, Hilmar Lapp wrote: > > On Mar 23, 2008, at 2:16 PM, Erik wrote: >> Sprot loaded in 20 hours. Only 170 were rejected - not >> too bad. > > That's great. Would be nice if you can provide some rough > summary as to why they were rejected (if that's obvious), such as taxon errors, > or other errors. see http://bugzilla.open-bio.org/show_bug.cgi?id=2474 So I think one easy improvement will be to enlarge that varchar(40) column, dbxref.accession. See the following: select dbname , accession , length(accession) from dbxref where accession ~ 'Cyc' order by length(accession) desc limit 100 patch attached. (which will probably get bug 2389 resolved) It seems to me bioentry.accession (maybe identifier too?) needs a similar enlargement. thanks, Erikjan -------------- next part -------------- A non-text attachment was scrubbed... Name: biosqldb-pl.sql.diff Type: application/octet-stream Size: 535 bytes Desc: not available URL: From Russell.Smithies at agresearch.co.nz Wed Mar 26 22:13:16 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 27 Mar 2008 15:13:16 +1300 Subject: [Bioperl-l] Bioinformatician wanted Message-ID: Dear colleagues, It would be appreciated if you could bring the following position to the attention of potential candidates. 
Bioinformatician wanted Many of the world's most amazing scientific discoveries are the result of someone 'taking a closer look'. It's this inquisitive nature and relentless search for answers that fuels scientific advancement. And it's also what we'd like you to apply to us, right now. Mind you, you won't need to look too hard to discover that AgResearch is the best place to break all new ground in your career. At first glance you'll see we are New Zealand's largest research institute -world leaders in pastoral research working at the leading-edge of innovation. Scratch the surface a little more and you'll find all the diversity and intellectual challenge a Bioinformatician could ask for. This is a highly collaborative role where you'll be involved in everything from the analysis of genomic data to the design, development, implementation and testing of bioinformatics tools. Knowledge sharing is a pivotal component of our success, so you can also look forward to acting in consultant capacity (both internally and externally) and the autonomy to contribute to scientific publications. Our people are at the pinnacle in their professions, so with your biological background, higher qualification in bioinformatics or computing and your experience in contributing bioinformatics expertise to research groups, you'll not only fit right in, you'll hit the ground running. An outstanding communicator, time manager and relationship builder, you'll also come to us with a thorough knowledge of Unix, pipeline-development, web based technologies and scripting and programming languages. AgResearch is a unique organisation at the forefront of our field, and as far as your future's concerned, that makes us well worth a closer look. There are many benefits waiting to be discovered here, so isn't it time you experienced them? 
The job description is available online and applications are invited at www.agresearch.co.nz/recruitment/ Reference AGR661, or contact Nauman Maqbool for further information. Applications close 11 April 2008. Regards, Russell Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz AgResearch Farming Food and Health. First Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From Marc.Logghe at ablynx.com Thu Mar 27 09:26:24 2008 From: Marc.Logghe at ablynx.com (Marc Logghe) Date: Thu, 27 Mar 2008 14:26:24 +0100 Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds Message-ID: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com> Hi all, I am a little bit confused about the above mentioned seq_inds() method. At first, I had the impression that the method returns an array of positions in the hsp (hit or query) sequence. At least that is what one would expect looking at the example usage in the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods second code block). Am I correct in believing you can only do this if your hsp query stretch starts at position 1 of the query sequence? 
I think seq_inds() returns a list of positions relative to the query/hit sequence. So, the code shown in the HOWTO is a kind of special case. However, I do not understand how seq_inds() is dealing with gaps. An example. If you blast the worm protein ZK822.4 against swissprot using blastp at ncbi you get this hsp as top: >sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460 Length=1461 Score = 35.8 bits (81), Expect = 0.48, Method: Composition-based stats. Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%) Query 402 IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL 453 +A+ E TT K +KQ ++ NK NK KK T+ P+AA+ + I AE +Q L Sbjct 139 VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL 193 Now, if you call seq_inds(query => 'gap') on that particular hsp object, you get these positions: 417, 431, 432. Obviously, there is no gap in the original query sequence at these positions. How do you have to read these numbers ? Remark also that for instance 417 is the res just in front of the gap. Regards, Marc From bix at sendu.me.uk Thu Mar 27 10:46:35 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 27 Mar 2008 14:46:35 +0000 Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com> References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com> Message-ID: <47EBB34B.8010606@sendu.me.uk> Marc Logghe wrote: > Hi all, > > I am a little bit confused about the above mentioned seq_inds() method. > At first, I had the impression that the method returns an array of > positions in the hsp (hit or query) sequence. Yes... > At least that is what one would expect looking at the example usage in > the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods > second code block). > > Am I correct in believing you can only do this if your hsp query stretch > starts at position 1 of the query sequence? No... 
> Query 402 IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL 453 > +A+ E TT K +KQ ++ NK NK KK T+ P+AA+ + I AE +Q L > Sbjct 139 VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL 193 > > Now, if you call seq_inds(query => 'gap') on that particular hsp object, > you get these positions: 417, 431, 432. Obviously, there is no gap in > the original query sequence at these positions. > How do you have to read these numbers ? Remark also that for instance > 417 is the res just in front of the gap. Its purpose is to let you know the position in query or subject coordinates where something interesting happened in the alignment. So seq_inds(query => 'gap') is telling you all the places that a gap starts in the alignment in terms of the query coordinates. Hence 417 etc. (Actually, does 432 make sense? Shouldn't it be 431 twice?) From Marc.Logghe at ablynx.com Thu Mar 27 11:09:56 2008 From: Marc.Logghe at ablynx.com (Marc Logghe) Date: Thu, 27 Mar 2008 16:09:56 +0100 Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds In-Reply-To: <47EBB34B.8010606@sendu.me.uk> References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com> <47EBB34B.8010606@sendu.me.uk> Message-ID: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com> Hi Sendu, Chris > > At least that is what one would expect looking at the example usage in > > the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods > > second code block). > > > > Am I correct in believing you can only do this if your hsp query stretch > > starts at position 1 of the query sequence? > > No... # put all the conserved matches in query strand into an array my @str_array = split "",$hsp->query_string; foreach ( $hsp->seq_inds('query','conserved') ){ push @conserved,$str_array[$_ - 1]; } $hsp->query_string will return 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL' In my example using the 'gap' class (instead of 'conserved'), @str_array will contain 417, 431 and 432. 
The off-by-one indices do not exist in that array.
Therefore, I still think the howto shows a special case where the hsp
query sequence starts at 1 (compared to 402 in my particular example).

> >
> > Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
> >             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> > Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
> >
> > Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> > you get these positions: 417, 431, 432. Obviously, there is no gap in
> > the original query sequence at these positions.
> > How do you have to read these numbers ? Remark also that for instance
> > 417 is the res just in front of the gap.
>
> Its purpose is to let you know the position in query or subject
> coordinates where something interesting happened in the alignment. So
> seq_inds(query => 'gap') is telling you all the places that a gap starts
> in the alignment in terms of the query coordinates. Hence 417 etc.

So, this means you have to interpret that as a gap is coming after 417 ?

>
>
> (Actually, does 432 make sense? Shouldn't it be 431 twice?)

Don't know, depends on how you have to 'read' this.
Thanks for looking into this.
Regards,
Marc

From cjfields at uiuc.edu Thu Mar 27 11:05:59 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 10:05:59 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <489252B3-2255-45C3-9219-A8F8A0978B89@uiuc.edu>

According to the GenericHSP::seq_inds() POD, seq_inds() reports
residue positions (indices) for the query/subject based on
identity/conservation, i.e. these refer to the original sequence
positions as determined by the HSP data, not alignment column
positions. 'gaps' should be reported at the position prior to where a gap is inserted.
However I think something is getting borked when the gap length is
longer than one, so I would partially qualify this as a bug.

Example:

When I ran this using bioperl-live it gives a different set of gap
indices which appear to be correct. I reran the BLASTP using the web
form using your query against swissprot and parsed it. I got slightly
different results for the BLAST report (probably differences in the
query sequence):

>gi|74746888|sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461

 Score = 35.8 bits (81), Expect = 0.47, Method: Composition-based stats.
 Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)

Query  394  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  445
            +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

.....

seq_inds('query' => 'gaps') reports 409, 423, and 424, which is
partially correct, i.e. there is a gap inserted after positions 409
and 423 in the query. However, no gap is present after 424; I think
this occurs b/c the gap length is 2. The other HSPs report similar
problems.

chris

P.S. Just saw that Sendu posted; I agree, seq. positions with gap
lengths > 1 should be repeated. Should be easy to fix that.

On Mar 27, 2008, at 8:26 AM, Marc Logghe wrote:

> Hi all,
>
> I am a little bit confused about the above mentioned seq_inds() method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.
>
> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
>
> Am I correct in believing you can only do this if your hsp query stretch
> starts at position 1 of the query sequence?
>
> I think seq_inds() returns a list of positions relative to the query/hit
> sequence. So, the code shown in the HOWTO is a kind of special case.
> > However, I do not understand how seq_inds() is dealing with gaps. > > An example. If you blast the worm protein ZK822.4 against swissprot > using blastp at ncbi you get this hsp as top: > > > >> sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460 > Length=1461 > > Score = 35.8 bits (81), Expect = 0.48, Method: Composition-based > stats. > Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%) > > Query 402 IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL > 453 > +A+ E TT K +KQ ++ NK NK KK T+ P+AA+ + I AE +Q L > Sbjct 139 VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL > 193 > > > > Now, if you call seq_inds(query => 'gap') on that particular hsp > object, > you get these positions: 417, 431, 432. Obviously, there is no gap in > the original query sequence at these positions. > How do you have to read these numbers ? Remark also that for instance > 417 is the res just in front of the gap. > > Regards, > > Marc > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Mar 27 12:04:20 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 27 Mar 2008 11:04:20 -0500 Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds In-Reply-To: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com> References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com> <47EBB34B.8010606@sendu.me.uk> <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com> Message-ID: On Mar 27, 2008, at 10:09 AM, Marc Logghe wrote: > Hi Sendu, Chris > >>> At least that is what one would expect looking at the example usage > in >>> the HOWTOs > (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>> second code block). 
>>> >>> Am I correct in believing you can only do this if your hsp query > stretch >>> starts at position 1 of the query sequence? >> >> No... > > > # put all the conserved matches in query strand into an array > my @str_array = split "",$hsp->query_string; > foreach ( $hsp->seq_inds('query','conserved') ){ > push @conserved,$str_array[$_ - 1]; > } > > > $hsp->query_string will return > 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL' > > In my example using the 'gap' class (instead of 'conserved'), > @str_array > will contain 417, 431 and 432. The off-by-one indices do not exist in > that array. > Therefore, I still think the howto shows a special case where the hsp > query sequence starts at 1 (compared to 402 in my particular example). We'll have to look at it; it should probably be clarified particularly in reference to 'gaps' and use of seq positions vs. HSP (or alignment) positions. Think of it this way; seq_inds() takes 'identical', 'conserved', etc., all of which refer to the original positions (indices) of the sequence which fall into the particular category asked for. In these cases we are using the coordinates for query/hit directly from the HSP info in the report. This is done with the express purpose of mapping attributes back to the original sequence, be it the query or subject. Gaps, however, are tricky, since sequence coordinates refer to residues (not gaps) when using BLAST. In this case we use the sequence position prior to the gap to note where a gap is inserted. The previous results, then, would be wrong as there is no gap inserted after 432. I just committed a fix which just repeats the position based on the number of gaps. 
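
The convention described above (a gap is reported at the residue just before it, repeated once per gap column) can be sketched in plain Perl, along with the coordinate-to-column lookup the HOWTO snippet needs when the HSP does not start at 1. Both helper names are hypothetical and are not BioPerl methods; the only inputs assumed are the aligned query string and the HSP start coordinate from the example in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: walk an aligned HSP string and
# report, in original sequence coordinates, the residue preceding each gap
# character. The position is repeated once per gap column.
sub gap_positions {
    my ($aligned, $start) = @_;
    my $pos = $start - 1;    # coordinate of the last residue seen so far
    my @gaps;
    for my $char (split //, $aligned) {
        if ($char eq '-') {
            push @gaps, $pos;
        }
        else {
            $pos++;
        }
    }
    return @gaps;
}

# Hypothetical helper: translate a sequence coordinate into a 0-based index
# into the aligned string, skipping gap columns and honouring the HSP start.
sub column_of {
    my ($aligned, $start, $wanted) = @_;
    my $pos   = $start - 1;
    my @chars = split //, $aligned;
    for my $i (0 .. $#chars) {
        next if $chars[$i] eq '-';
        $pos++;
        return $i if $pos == $wanted;
    }
    return;    # coordinate lies outside this HSP
}

my $query = 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL';
print join(', ', gap_positions($query, 402)), "\n";    # 417, 431, 431
```

With column_of(), a loop over seq_inds() results can index the split query string by column rather than by "$_ - 1", which is only valid when the HSP starts at position 1 and contains no gaps.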
>>> Query 402 IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL > 453 >>> +A+ E TT K +KQ ++ NK NK KK T+ P+AA+ + I AE +Q L >>> Sbjct 139 VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL > 193 >>> >>> Now, if you call seq_inds(query => 'gap') on that particular hsp > object, >>> you get these positions: 417, 431, 432. Obviously, there is no gap > in >>> the original query sequence at these positions. >>> How do you have to read these numbers ? Remark also that for > instance >>> 417 is the res just in front of the gap. >> >> Its purpose is to let you know the position in query or subject >> coordinates where something interesting happened in the alignment. So >> seq_inds(query => 'gap') is telling you all the places that a gap > starts >> in the alignment in terms of the query coordinates. Hence 417 etc. > > So, this means you have to interpret that as a gap is coming after > 417 ? Yes. >> (Actually, does 432 make sense? Shouldn't it be 431 twice?) > Don't know, depends on how you have to 'read' this. > Thanks for looking into this. > Regards, > Marc Repeating the position based on the number of gaps is now the default in bioperl-live. Just working on fixing problems with collapsing numbers and tests and everything should be fine. chris From hiekeen at gmail.com Sat Mar 29 12:09:18 2008 From: hiekeen at gmail.com (Jinyan Huang) Date: Sun, 30 Mar 2008 00:09:18 +0800 Subject: [Bioperl-l] Gene Id converts. Message-ID: Hi, I have a list of gene bank accession id. I want to convert these ids to NCBI id. For example: >From NM_011917 to 2919914. How can I do it? Thanks -- Best regards, Jinyan Huang (ekeen) School of Life Sciences and Technology, 1302 Room Tongji University Siping Road 1239, Shanghai 200092 P.R. China Tel :0086-21-65981041 Msn: hiekeen at hotmail.com eMail: hiekeen at gmail.com From cjfields at uiuc.edu Sat Mar 29 13:42:50 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 29 Mar 2008 12:42:50 -0500 Subject: [Bioperl-l] Gene Id converts. 
In-Reply-To: 
References: 
Message-ID: 

There are the GenBank LiveLists (updated every Sunday), which have
accession/version/UID mappings for nuc and protein GenBank records. I
haven't used them personally but they are worth a look:

ftp://ftp.ncbi.nih.gov/genbank/livelists/

There is also gene2accession, which contains mappings between
accession and UID (though this is more EntrezGene-related, I believe):

ftp://ftp.ncbi.nih.gov/gene/DATA/

Both have documentation detailing formats. I would recommend using
one of the above two on a local database setup if you plan on
converting a large number of accessions.

Bio::DB::EUtilities can also do this but is web-based via eutils.
There are a couple of stub examples in the Cookbook HOWTO under
'efetch' on converting accessions to UID (and vice versa), though note
there is no one-to-one correspondence. You can also convert UIDs to
accessions using 'esummary' but the converse (accession to GI)
requires, strangely, using efetch to grab the UIDs first, then
re-retrieving the accessions via esummary for one-to-one
correspondence.

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

chris

On Mar 29, 2008, at 11:09 AM, Jinyan Huang wrote:

> Hi,
>
> I have a list of gene bank accession id. I want to convert these ids
> to NCBI id.
>
> For example:
>
> From NM_011917 to 2919914.
>
> How can I do it?
>
> Thanks
>
>
> --
> Best regards,
> Jinyan Huang (ekeen)
> School of Life Sciences and Technology, 1302 Room
> Tongji University
> Siping Road 1239, Shanghai 200092
> P.R. China
> Tel :0086-21-65981041
> Msn: hiekeen at hotmail.com
> eMail: hiekeen at gmail.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From pradel.d at gmail.com Mon Mar 31 10:35:26 2008
From: pradel.d at gmail.com (Damien Pradel)
Date: Mon, 31 Mar 2008 16:35:26 +0200
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
Message-ID: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>

Hello,

I use the SeqIO module in order to parse EMBL files.
Unfortunately I got a problem: the ID is not recognised, and instead
of the ID value I get "unknown_id" ...

So to solve this problem I have modified the file embl.pm, located in
the SeqIO directory, at line 189 as follows:

    if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
        ($name,$mol,$div) = ($1,$2,$3);
    }
    unless( defined $name && length($name) ) {
        $name = "unknown_id";
    }

to:

    if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
        ($name,$mol,$div) = ($1,$2,$3);
    }
    unless( defined $name && length($name) ) {
        $name = "unknown_id";
    }

With this modification, the ID value is correctly collected.

Hope it will help.

Damien

From golharam at umdnj.edu Mon Mar 31 15:31:56 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 31 Mar 2008 15:31:56 -0400
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module
Message-ID: <47F13C2C.4070909@umdnj.edu>

I have a (very) basic SAX implementation of a SeqIO module to parse
GenBank XML records. Right now, it only reads in basic information
regarding the sequence and the sequence itself.

It does not yet parse the features table. Should I submit it to be
included in bioperl or wait until I implement more for the features table?
I'm not sure when I'll get around to it though Ryan From cjfields at uiuc.edu Mon Mar 31 16:05:51 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 31 Mar 2008 15:05:51 -0500 Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module In-Reply-To: <47F13C2C.4070909@umdnj.edu> References: <47F13C2C.4070909@umdnj.edu> Message-ID: <4A3D5CD8-13D7-4CBF-B89A-CE81B8804C61@uiuc.edu> You can submit it either to me directly or to bugzilla (start a new bug report as an enhancement request, then attach the relevant files). Does it have a test suite available? If not, you should try setting one up: http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests chris On Mar 31, 2008, at 2:31 PM, Ryan Golhar wrote: > I have a (very) basic SAX implementation of a SeqIO module to parse > GenBank XML records. Right now, it only reads in basic information > regarding the sequence and the sequence itself. > > It does not yet parse the features table. Should I submit it to be > included in bioperl or wait until I implement more for the features > table? I'm not sure when I'll get around to it though > > Ryan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Mar 31 19:58:44 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 31 Mar 2008 18:58:44 -0500 Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection In-Reply-To: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com> References: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com> Message-ID: <629B8FAF-3A1F-41E2-BFF1-A709DDE56A09@uiuc.edu> The parser no longer has this line; it has been updated to work with both old and new format EMBL. You might want to try updating from Subversion or install the nightly build. 
http://bioperl.org/DIST/nightly_builds/

chris

On Mar 31, 2008, at 9:35 AM, Damien Pradel wrote:

> Hello,
>
> I use the SeqIO module in order to parse EMBL files.
> Unfortunately I got a problem: the ID was not recognised because
> instead of the ID value I get the answer "unknown_id" ...
>
> So to solve this problem I have modified the file embl.pm located in
> directory SeqIO at the line 189 as follow :
>
>     if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>         ($name,$mol,$div) = ($1,$2,$3);
>     }
>     unless( defined $name && length($name) ) {
>         $name = "unknown_id";
>     }
>
> in :
>     if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>         ($name,$mol,$div) = ($1,$2,$3);
>     }
>     unless( defined $name && length($name) ) {
>         $name = "unknown_id";
>     }
>
> With this modification, the ID value is correctly collected.
>
> Hope it will help.
>
> Damien
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From dfog22 at hotmail.com Wed Mar 26 10:13:25 2008
From: dfog22 at hotmail.com (MathGon)
Date: Wed, 26 Mar 2008 07:13:25 -0700 (PDT)
Subject: [Bioperl-l] File concatenation
Message-ID: <16301515.post@talk.nabble.com>

For my first post, I will introduce myself. I'm a PhD student in
microbiology focusing on horizontal gene transfer in hyperthermophilic
Archaea.

I retrieve a GenBank file for each contig of an unfinished genome. I
want to produce a single GenBank file by concatenation. I didn't manage
to find such a script and I'm not experienced enough in Perl to write
it...
Have you got another solution or a script for me?

Best regards...
-- 
View this message in context: http://www.nabble.com/File-concatenation-tp16301515p16301515.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
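
As a rough sketch of the concatenation asked about above: each GenBank flat-file record ends with a "//" line, so writing the per-contig files one after another gives a valid multi-record file. The helper name and the file names in the usage note are hypothetical, and this produces one file holding many records, not one merged sequence record:

```perl
use strict;
use warnings;

# Hypothetical helper: write the GenBank flat files listed in @$inputs to
# $output, one after another. Each record already ends with a "//" line,
# so plain concatenation yields a multi-record file.
sub concat_genbank {
    my ($output, $inputs) = @_;
    open my $out, '>', $output or die "Cannot write $output: $!";
    for my $file (@$inputs) {
        open my $in, '<', $file or die "Cannot read $file: $!";
        my $last = '';
        while (my $line = <$in>) {
            print {$out} $line;
            $last = $line;
        }
        close $in;
        # Guard against a final line that lacks its newline.
        print {$out} "\n" if length $last && $last !~ /\n\z/;
    }
    close $out or die "Cannot close $output: $!";
    return $output;
}
```

A call might look like concat_genbank('genome.gb', [ sort glob 'contig_*.gb' ]); the resulting file can then be read record by record with Bio::SeqIO's next_seq().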