From brunovecchi at yahoo.com.ar Fri Aug 1 00:16:16 2008
From: brunovecchi at yahoo.com.ar (Bruno Vecchi)
Date: Fri, 01 Aug 2008 01:16:16 -0300
Subject: [Bioperl-l] Bio::Biblio doesn't find articles [SOLVED]
Message-ID: <48928E10.7090903@yahoo.com.ar>

An HTML attachment was scrubbed...
URL:

From Kevin.Clancy at invitrogen.com Fri Aug 1 18:30:30 2008
From: Kevin.Clancy at invitrogen.com (Clancy, Kevin)
Date: Fri, 1 Aug 2008 15:30:30 -0700
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
Message-ID: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>

Hi Folks,

I am using the Windows version of BioPerl 1.5.2_100. I was recently compiling a tool with ActiveState's PerlApp that included BioPerl modules. I received an error for the Bio::SeqIO module, which was calling for the Bio::SeqIO::staden::read method(?) on lines 312-314 of the Bio::SeqIO.pm module. I don't appear to have a copy of the staden module under the Bio::SeqIO directory, and it doesn't appear to be present in the current BioPerl trunk. I simply commented this out of my SeqIO.pm file to perform my build, and it's all running normally.

Was this simply a reference to a nonexistent module, or am I missing something?

Thank you for your help.

kevin

Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad, CA 92008
Phone: (768) 268 8356
Email: kevin.clancy at invitrogen.com

From jason at bioperl.org Sat Aug 2 08:58:05 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Aug 2008 07:58:05 -0500
Subject: [Bioperl-l] Inframe stop codon
In-Reply-To: <516747.39380.qm@web36405.mail.mud.yahoo.com>
References: <516747.39380.qm@web36405.mail.mud.yahoo.com>
Message-ID:

[regarding PAML analyses]

You would need to translate the cDNA sequence and identify where the stop codon is, then remove that codon or remove that sequence from your bulk analyses.

It depends on why you think the stop codon is in the sequence: mis-annotation, a pseudogene, or something else? If this affects only a small percentage of a lot of sequences, I would probably just skip them. If it is the terminal stop codon that is being included in the sequences, you just need to remove the last codon from each sequence before providing it to PAML.

The Seq HOWTO has many examples of how to manipulate a sequence object with substr and trunc, as well as the simple seq() method, which gives you the sequence as a string that you can manipulate and then use to update the sequence object afterwards. As in:

  my $str = $seq->seq;
  # remove the last codon from this cDNA sequence
  substr($str, -3, 3, '');
  $seq->seq($str);

Alternatively you can use trunc to truncate the sequence:

  my $trunc = $seq->trunc(1, $seq->length - 3);
  $seq = $trunc;

You can translate the sequence with the $seq->translate method, then test for the presence of a stop codon. (This is exactly the code that runs in the pairwise_kaks script in the scripts/utilities/ directory.) If you have a stop codon, you need to figure out whether it is at the end of the sequence or not. If it is the terminal codon, you can just lop off the last codon on all your sequences, but if it is internal, you need to decide what you want to do with this sequence. If there are multiple stop codons, I am not sure it is appropriate to run PAML here, unless you are interested in some sort of pseudo-rate calculation that has many of the codons omitted. Otherwise you may just want to calculate a DNA substitution rate for the sequences to make the comparison.

I suggest working through a single file by hand to get the appropriate steps down; coding it up will then be easier. I am sure folks on the list can help too, so it is important to post to the mailing list - I don't see any messages from you on the list about this query.

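The decision described above (trim a terminal stop, set aside anything with an internal stop) can be sketched on plain strings. This is only an illustration, not the pairwise_kaks code: the helper names inframe_stops and clean_for_paml are made up, and it assumes reading frame 0, a length that is a multiple of three, and the standard genetic code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): 0-based codon indices of
# in-frame stop codons in a cDNA string, reading frame 0.
sub inframe_stops {
    my ($cdna) = @_;
    my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);    # standard code only
    my @stops;
    my $n_codons = int(length($cdna) / 3);
    for my $i (0 .. $n_codons - 1) {
        my $codon = uc substr($cdna, 3 * $i, 3);
        push @stops, $i if $is_stop{$codon};
    }
    return @stops;
}

# Hypothetical helper: drop a terminal stop codon if there is one, and
# return undef when an internal stop remains so the caller can skip it.
sub clean_for_paml {
    my ($cdna) = @_;
    my @stops = inframe_stops($cdna);
    my $last  = int(length($cdna) / 3) - 1;
    if (@stops && $stops[-1] == $last) {
        substr($cdna, 3 * $last, 3, '');    # remove the terminal stop codon
        pop @stops;
    }
    return @stops ? undef : $cdna;
}

# Toy sequences, made up for illustration.
for my $cdna (qw(ATGGCCTAA ATGTAAGCC ATGGCCGCC)) {
    my $clean = clean_for_paml($cdna);
    print "$cdna => ", (defined $clean ? $clean : 'internal stop, skipped'), "\n";
}
```

With a Bio::Seq object the same check could be fed from $seq->seq and the cleaned string written back with $seq->seq($clean), as in the snippet above.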
-jason

On Aug 2, 2008, at 5:42 AM, Tannistha wrote:

> Hi Jason,
>
> Please suggest me how to filter the inframe stop codons,
> aa_to_dna_aln returns the sequence with in-frame stop codons.
> I have posted my query along with the input files to the forum.
>
> Thanks for your earlier advice, runmode =0 is working for me.
>
> Look forward to your reply
>
> Best Regards
> Tannistha
>
> Dr. Tannistha Nandi
> email: tannistha3 at yahoo.com

From David.Messina at sbc.su.se Sun Aug 3 15:10:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 3 Aug 2008 21:10:18 +0200
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
References: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
Message-ID: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>

Hi Kevin,

The staden module is an oddball one, to be sure.

A search on the BioPerl website turns up this FAQ entry:
http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F

Also the Windows install page
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
says:

> Some external programs such as Staden and
> the EMBOSS suite of programs can only
> be installed on Windows by using Cygwin and its gcc
> C compiler (see Bioperl in Cygwin, below)

In any case, the staden module (and associated external libraries) is used only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr binary formats. So your edit shouldn't cause you any problems otherwise.

Dave

From cjfields at uiuc.edu Sun Aug 3 16:20:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 15:20:52 -0500
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
References: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net> <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
Message-ID:

This seems to be a problem with PerlApp and eval{}; judging by a quick Google search, this isn't the only module affected. The line in question is wrapped in an eval{} to check for the availability of Bio::SeqIO::staden::read (but not die on it).

BTW, the eval was moved into the relevant plugin modules post-1.5.2, so the eval{} is checked when the module is loaded dynamically (i.e. when a format requiring it is passed in). It was causing other issues with ActivePerl installations and was redundant, so it was removed.

http://bugzilla.open-bio.org/show_bug.cgi?id=2295

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From btemperton at googlemail.com Sat Aug 2 16:05:37 2008
From: btemperton at googlemail.com (Benbo)
Date: Sat, 2 Aug 2008 13:05:37 -0700 (PDT)
Subject: [Bioperl-l] Finding possible primers regex
Message-ID: <18792782.post@talk.nabble.com>

Hi there,

I'm trying to write a perl script to scan an aligned multiple entry fasta file and find possible primers. So far I've produced a string which contains bases which match all sequences and * where they don't match, e.g.

1) TTAGCCTAA
2) TTAGCAGAA
3) TTACCCTAA

would give TTA*C**AA.

I want to parse this string and pull out all sequences which are 18-21 bp in length and have no more than 4 * in them.

So far, I've got this:

while($fragment_match =~ /([GTAC*]{18,21})/g){
    print "$1\n";
}

hoping to match all fragments 18-21 characters in length. However even that doesn't work as it has essentially chunked it into 21 char blocks, rather than what I hoped for of

0-18
0-19
0-20
0-21
1-19
1-20
1-21
1-22

etc.

Can anyone let me know if this is already possible in BioPerl, or how one would go about it with regex. Sadly I'm fairly new to perl and getting to grips with BioPerl, so please treat me gently :).

Many thanks,

Ben

--
View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

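The enumeration Ben describes (every window of 18-21 characters, starting at every offset, with at most four * characters) can also be produced without a regex, using two nested loops over substr. The sketch below is plain Perl with no BioPerl calls; the helper name candidate_windows and the toy consensus string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): list every window of
# $min..$max characters in $consensus with at most $max_mismatch '*'.
# Returns [start, end, fragment] triples; offsets are 0-based and the
# end is exclusive, matching the 0-18, 0-19, ... listing above.
sub candidate_windows {
    my ($consensus, $min, $max, $max_mismatch) = @_;
    my @hits;
    for my $start (0 .. length($consensus) - $min) {
        for my $len ($min .. $max) {
            last if $start + $len > length $consensus;
            my $fragment   = substr($consensus, $start, $len);
            my $mismatches = ($fragment =~ tr/*//);    # counts '*', string unchanged
            push @hits, [$start, $start + $len, $fragment]
                if $mismatches <= $max_mismatch;
        }
    }
    return @hits;
}

# Toy consensus string, made up for illustration.
my $consensus = 'TTA*C**AAGGCTTAGC*ATTGCAGGT';
for my $hit (candidate_windows($consensus, 18, 21, 4)) {
    printf "%d-%d %s\n", @$hit;
}
```

The /g loop in the question steps past each match before trying again, which is why it yields non-overlapping 21-character chunks; the explicit loops visit every start offset and every length instead.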
From cjfields at uiuc.edu Mon Aug 4 00:08:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 23:08:51 -0500
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <18792782.post@talk.nabble.com>
References: <18792782.post@talk.nabble.com>
Message-ID: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>

There is a trick to this which is discussed more extensively in 'Mastering Regular Expressions'. Essentially you have to embed code into the regex and trick the parser into backtracking using a negative lookahead. The match itself fails (i.e. no match is returned), but the embedded code is executed for each match attempt.

The following script is a slight modification of one I used. It checks the consensus string from the input alignment (in aligned FASTA format here), extracts the alignment slice for each match, then spits the alignment out to STDOUT in clustalw format.

This should work for perl 5.8 and up, but it's only been tested on perl 5.10. You should be able to adapt this to fit what you want.

use strict;
use warnings;
use Bio::AlignIO;

# assumption: the aligned FASTA file is given as the first argument
my $file = shift;

my $in  = Bio::AlignIO->new(-file => $file, -format => 'fasta');
my $out = Bio::AlignIO->new(-fh => \*STDOUT, -format => 'clustalw');

while (my $aln = $in->next_aln) {
    # columns below 100% identity come back as '?'
    my $c = $aln->consensus_string(100);
    my @matches;
    # the trailing (?!) always fails, so the engine backtracks through
    # every candidate and the (?{ }) block runs on each attempt
    $c =~ m/
        ([GTAC?]{18,21})
        (?{
            my $match = check_match($1);
            push @matches, [$match, pos(), length($match)] if defined $match;
        })
        (?!)
    /xig;
    for my $match (@matches) {
        # convert end position and length into 1-based start/end columns
        my ($hit, $st, $end) =
            ($match->[0], $match->[1] - $match->[2] + 1, $match->[1]);
        my $newaln = $aln->slice($st, $end);
        $out->write_aln($newaln);
    }
}

# return the fragment if it has no more than 4 '?' columns
sub check_match {
    my $match = shift;
    return unless $match;
    my $ct = $match =~ tr/?/?/;
    return $match if $ct <= 4;
    return;    # explicit, so a rejected fragment comes back undefined
}

chris

From heikki at sanbi.ac.za Mon Aug 4 02:42:57 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 4 Aug 2008 08:42:57 +0200
Subject: [Bioperl-l] Bio::Coordinate::Pair
In-Reply-To:
References:
Message-ID: <200808040842.57466.heikki@sanbi.ac.za>

Prashanth,

Your example coordinates do not do the conversion but more or less report the locations of your features in some third coordinates. The way to think of coordinate pairs is to use them as HSPs. You tell the pair object what the matching segment is in the pair of sequences.

The synopsis in the Bio::Coordinate::Pair class file gives the following example:

  use Bio::Location::Simple;
  use Bio::Coordinate::Pair;

  my $match1 = Bio::Location::Simple->new
      (-seq_id => 'propeptide', -start => 21, -end => 40, -strand => 1);
  my $match2 = Bio::Location::Simple->new
      (-seq_id => 'peptide', -start => 1, -end => 20, -strand => 1);
  my $pair = Bio::Coordinate::Pair->new(-in => $match1, -out => $match2);

  # location to match
  my $pos = Bio::Location::Simple->new
      (-start => 25, -end => 25, -strand => -1);

  my $res = $pair->map($pos);
  print $res->match->start;    # 5

In other words, region 21-40 in the propeptide matches locations 1-20 in the final peptide.

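For an ungapped pair on the same strand, the arithmetic behind this mapping is a simple offset. The plain-Perl sketch below only illustrates that offset; map_position is a made-up helper, not the Bio::Coordinate::Pair API, and it ignores strand and gaps.

```perl
use strict;
use warnings;

# Hypothetical helper: map a position from the 'in' segment onto the
# 'out' segment of an ungapped, same-strand pair. Returns undef (empty
# list in list context) when the position is outside the 'in' segment.
sub map_position {
    my ($pos, $in_start, $in_end, $out_start) = @_;
    return if $pos < $in_start || $pos > $in_end;
    return $pos - $in_start + $out_start;
}

# propeptide 21-40 paired with peptide 1-20
print map_position(25, 21, 40, 1), "\n";

# 'in' segment 1000-2000 paired with 'out' segment 1100-2100:
# 1500 lies inside the pair, 500 does not
print map_position(1500, 1000, 2000, 1100), "\n";
my $outside = map_position(500, 1000, 2000, 1100);
print defined $outside ? $outside : 'no match', "\n";
```

The second pair uses the numbers from the original question: position 500 falls outside the 1000-2000 segment, so there is nothing to convert, whereas 1500 would map to 1600.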
Therefore conversion from 25 gives 5:

      signalp       21  25            40
--------------------|---|--------------|
                    1   5     pep     20

I hope this clarifies it. The advantage of using these objects over manual conversion is that the code has been debugged (no all too easy +/-1 errors) and that they can be chained together.

Yours,

-Heikki

On Tuesday 29 July 2008 22:07:55 Prashanth Athri wrote:
> Dear Professor Lehvaslaiho:
>
> I had a quick question about the module - Bio::Coordinate::Pair
>
> The BioPerl tutorial has the following example:
>
> $input_coordinates = Bio::Location::Simple->new
>     (-seq_id => 'propeptide', -start => 1000, -end => 2000, -strand=>1 );
>
> $output_coordinates = Bio::Location::Simple->new
>     (-seq_id => 'peptide', -start => 1100, -end => 2100, -strand=>1 );
>
> $pair = Bio::Coordinate::Pair->new
>     (-in => $input_coordinates , -out => $output_coordinates );
>
> $pos = Bio::Location::Simple->new (-start => 500, -end => 500 );
>
> $res = $pair->map($pos);
> $converted_start = $res->start;
>
> The way I understand it, $converted_start should return "1600". But when I
> run this snippet, it returns "500". Could you please let me know how
> $pair->map($pos) is processed?
>
> I appreciate your time and thanks in advance.
>
> Regards,
> Prashanth

--
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Senior Scientist      skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From lengjingmao at gmail.com Tue Aug 5 03:36:23 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 15:36:23 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>
Message-ID: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>

Hi there,

I have a sequence dataset which contains about 200 sequences. There are some identical sequences in it. Are there any BioPerl modules which can remove those identical sequences?

Thanks a lot.

Yours,
shaohua

From bernd.web at gmail.com Tue Aug 5 05:49:55 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 5 Aug 2008 11:49:55 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
References: <18792782.post@talk.nabble.com> <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
Message-ID: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>

Hi,

There is a BioPerl utility script that does this.
See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.

"scripts/utilities/bp_nrdb.PLS
Make a non-redundant database based on sequence, not id. Requires Digest::MD5."

Alternatively, you can make a hash using the sequences as keys.

Regards,
Bernd

From diriano at uni-potsdam.de Tue Aug 5 06:28:58 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riano Pachon)
Date: Tue, 05 Aug 2008 12:28:58 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
References: <18792782.post@talk.nabble.com> <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F> <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
Message-ID: <48982B6A.4050304@uni-potsdam.de>

Hi all,

Or you might try a non-bioperl solution that works pretty well, check:

http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86

Best,

Diego

--
___________________________________
Diego Mauricio Riaño Pachón
Biologist - PhD student
AG Mueller-Roeber
Institute for Biochemistry and Biology
University of Potsdam

Address: Karl-Liebknecht-Str. 24-25
         Haus 20
         14476 Golm
         Germany

Tel: +49 331 977 2809
Fax: +49 331 977 2512

web: http://www.geocities.com/dmrp.geo

From cjfields at uiuc.edu Tue Aug 5 11:19:54 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Aug 2008 10:19:54 -0500
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <48982B6A.4050304@uni-potsdam.de>
References: <18792782.post@talk.nabble.com> <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F> <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com> <48982B6A.4050304@uni-potsdam.de>
Message-ID: <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>

Here are two links which go into detail (the last is a specific implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From lengjingmao at gmail.com Tue Aug 5 11:24:22 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 23:24:22 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com> <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F> <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com> <48982B6A.4050304@uni-potsdam.de> <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>
Message-ID: <3A95AD6D18A749F3B73C135CCC8E7C90@6B2F7FFC298C46F>

Hi,

Thanks a lot for the help!

Cheers,
shaohua

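Bernd's suggestion in this thread, a hash keyed on the sequences, can be sketched in a few lines of plain Perl. This is an illustration only and not how bp_nrdb is implemented: the helper name unique_by_sequence and the toy records are made up, records are assumed to be [id, sequence] pairs, and comparison is assumed to be case-insensitive.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): keep the first record seen
# for each distinct sequence. Each record is an [id, sequence] pair.
sub unique_by_sequence {
    my @records = @_;
    my %seen;
    return grep { !$seen{ uc($_->[1]) }++ } @records;
}

# Toy records, made up for illustration.
my @records = (
    [ seq1 => 'ATGGCC' ],
    [ seq2 => 'ATGGCA' ],
    [ seq3 => 'atggcc' ],    # same sequence as seq1 under a different id
);

# print the survivors in FASTA-like form
print ">$_->[0]\n$_->[1]\n" for unique_by_sequence(@records);
```

For a dataset of about 200 sequences, holding every sequence as a hash key is cheap. With BioPerl objects, the pairs could be built from $seq->display_id and $seq->seq while reading with Bio::SeqIO.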
>> Regards, >> Bernd >> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan >> wrote: >>> Hi, there , >>> >>> I have a sequence dataset which contains about 200 sequences. >>> there are some identical sequences in this. is there any bioperl >>> modules which can remove those identical sequences? >>> >>> thanks a lot. >>> yours, >>> shaohua >>> ----- Original Message ----- >>> From: "Benbo" >>> To: >>> Sent: Sunday, August 03, 2008 4:05 AM >>> Subject: [Bioperl-l] Finding possible primers regex >>> >>> >>>> Hi there, >>>> I'm trying to write a perl script to scan an aligned multiple >>>> entry fasta >>>> file and find possible primers. So far I've produced a string >>>> which contains >>>> bases which match all sequences and * where they don't match e.g. >>>> 1) TTAGCCTAA >>>> 2) TTAGCAGAA >>>> 3) TTACCCTAA >>>> >>>> would give TTA*C**AA. >>>> >>>> I want to parse this string and pull out all sequences which are >>>> 18-21 bp in >>>> length and have no more than 4 * in them. >>>> >>>> So far, I've got this: >>>> >>>> while($fragment_match =~ /([GTAC*]{18,21})/g){ >>>> print "$1\n"; >>>> } >>>> >>>> hoping to match all fragments 18-21 characters in length. However >>>> even that >>>> doesn't work as it has essentially chunked it into 21 char >>>> blocks, rather >>>> than what I hoped for of >>>> 0-18 >>>> 0-19 >>>> 0-20 >>>> 0-21 >>>> 1-19 >>>> 1-20 >>>> 1-21 >>>> 1-22 >>>> >>>> etc. >>>> >>>> Can anyone let me know if this is already possible in BioPerl, or >>>> how one >>>> would go about it with regex. Sadly I'm fairly new to perl and >>>> getting to >>>> grips with BioPerl, so please treat me gently :). >>>> >>>> Many thanks, >>>> >>>> Ben >>>> >>>> >>>> >>>> -- >>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html >>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > ___________________________________ > Diego Mauricio Ria?o Pach?n > Biologist - PhD student > AG Mueller-Roeber > Institute for Biochemistry and Biology > University of Potsdam > > Address: Karl-Liebknecht-Str. 24-25 > Haus 20 > 14476 Golm > Germany > > Tel: +49 331 977 2809 > Fax: +49 331 977 2512 > > web: http://www.geocities.com/dmrp.geo > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From martin.senger at gmail.com Tue Aug 5 22:53:07 2008 From: martin.senger at gmail.com (Martin Senger) Date: Wed, 6 Aug 2008 10:53:07 +0800 Subject: [Bioperl-l] Bio::Biblio doesn't find articles Message-ID: <4d93f07c0808051953k4cb7511cg5ec4cd93f53cfd0f@mail.gmail.com> I am afraid that the server that serves the MEDLINE database to the Bio::Biblio module (using the SOAP protocol), and that is running at EBI, may be not fully supported. I am not working at EBI anymore and I have stopped to monitor their servers. I am still their collaborator - but I am not, unfortunately, involved in the MEDLINE tools anymore. I would be happy to continue to maintain the Bio::Biblio module but it relies on a server that I do not anymore control. 
Cheers, Martin -- Martin Senger email: martin.senger at gmail.com,m.senger at cgiar.org skype: martinsenger From Russell.Smithies at agresearch.co.nz Wed Aug 6 17:20:04 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 7 Aug 2008 09:20:04 +1200 Subject: [Bioperl-l] not BioPerl Message-ID: Has anyone taken a look at the new Perl interface to the NCBI C++ Toolkit? Unfortunately, I can't even get their examples working as I'm behind a firewall and documentation on setting proxy stuff is virtually non-existant :-( Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Wed Aug 6 17:33:27 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 6 Aug 2008 16:33:27 -0500 Subject: [Bioperl-l] not BioPerl In-Reply-To: References: Message-ID: Looks like they're binary releases for 32- and 64-bit linux (quite large, at 25 MB). Would be nice to have the C++ bindings for other OS's (my guess is this was set up via swig). I have access to a linux cluster, so I may give this a try soon. 
chris On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote: > Has anyone taken a look at the new Perl interface to the NCBI C++ > Toolkit? > Unfortunately, I can't even get their examples working as I'm behind a > firewall and documentation on setting proxy stuff is virtually > non-existant :-( > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From vinaykmittal at gatech.edu Wed Aug 6 16:56:22 2008 From: vinaykmittal at gatech.edu (Mittal, Vinay K) Date: Wed, 6 Aug 2008 16:56:22 -0400 (EDT) Subject: [Bioperl-l] Error installing Biopel Module Message-ID: <469631287.3995201218056182383.JavaMail.root@mail5.gatech.edu> Hi, I just installed Active perl 5.10.0 and was trying to install Bioperl Modules. 
While installing Bioperl through the package manager (ppm), I am getting the following errors:

ppm install failed: Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core

I don't know what the problem is. I have never used Bioperl Modules before. Thanks. -- -------- Vinay Kumar Mittal MS,Bioinformatics Georgia Institute of Technology
From rfrancis at ichr.uwa.edu.au Wed Aug 6 21:11:28 2008 From: rfrancis at ichr.uwa.edu.au (Richard Francis) Date: Thu, 07 Aug 2008 09:11:28 +0800 Subject: [Bioperl-l] AlignIO::clustalw match_line query Message-ID: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
Dear List, I wonder if you can help. I'm having trouble finding out on which criteria the conserved and semi-conserved substitution decisions for a match line produced from the match_line function in AlignIO are based. I note that match_line produces the same output as an alignment match line would from ClustalW and indeed is used in the AlignIO::clustalw module, but are the substitution decisions based on the same Venn diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color, i.e. are they faithful to the generation of the match line from within ClustalW itself? I need to know this as part of a paper I'm writing so I would really appreciate your help with this.
Kind regards and thanks in advance, Richard Francis ##################################################################################### This e-mail message has been scanned for Viruses and Content and cleared by MailMarshal #####################################################################################
From jason at bioperl.org Wed Aug 6 22:26:06 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 6 Aug 2008 19:26:06 -0700 Subject: [Bioperl-l] AlignIO::clustalw match_line query In-Reply-To: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au> References: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au> Message-ID:
Implemented independently, but it was based on what the clustalw documentation says. The main code is in the match_line function in Bio::SimpleAlign. See the CONSERVATION_GROUPS hash, which looks like this:

%CONSERVATION_GROUPS = (
    'strong' => [ qw( STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW ) ],
    'weak'   => [ qw( CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY ) ],
);

So a 'strong' (":") on the match line would be coded where the residues seen in a column are only 'S', 'T', or 'A' (for example).

It was checked against clustalw output by hand when it was implemented. If you know of any inconsistencies, let us know.

-jason

On Aug 6, 2008, at 6:11 PM, Richard Francis wrote: > Dear List, > > I wonder if you can help. > > I'm having trouble finding out on which criteria the conserved and > semi-conserved substitution decisions for a match line produced > from the > match_line function in AlignIO are based. > > I note that match_line produces the same output as an alignment match > line would from ClustalW and indeed is used in the AlignIO::clustalw > module, but are the substitution decisions based on the same Venn > diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color > ie are they faithful to the generation of the match line from within > ClustalW itself?
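A rough sketch of how a match-line symbol could be derived from the groups quoted in Jason's reply. This is not the Bio::SimpleAlign code itself: the helper names in_one_group and column_symbol are hypothetical, and leaving any column that contains a gap or non-letter unmarked is an assumption of this sketch. The rule illustrated is: '*' for an identical column, ':' when all residues fall inside a single 'strong' group, '.' when they fall inside a single 'weak' group, and a blank otherwise.

```perl
use strict;
use warnings;

# Groups as quoted in the reply above.
my %CONSERVATION_GROUPS = (
    'strong' => [ qw( STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW ) ],
    'weak'   => [ qw( CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY ) ],
);

# True if all residues in @$residues are members of one single group
# at the given level ('strong' or 'weak').
sub in_one_group {
    my ( $level, $residues ) = @_;
    for my $group ( @{ $CONSERVATION_GROUPS{$level} } ) {
        # grep counts the residues that are NOT in this group
        return 1 unless grep { index( $group, $_ ) < 0 } @$residues;
    }
    return 0;
}

# Match-line symbol for one alignment column, passed as a string of residues.
sub column_symbol {
    my ($column) = @_;
    my %seen = map { ( uc($_) => 1 ) } split //, $column;
    my @residues = keys %seen;
    return ' ' if grep { !/^[A-Z]$/ } @residues;    # gaps etc.: assumed unmarked
    return '*' if @residues == 1;
    return ':' if in_one_group( 'strong', \@residues );
    return '.' if in_one_group( 'weak',   \@residues );
    return ' ';
}

# One symbol per example column.
print join( '', map { column_symbol($_) } qw( AAA STA CSA AWK A-A ) ), "\n";
```

Checking 'strong' before 'weak' matters, since some residue sets (for example S and A) fit groups at both levels.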
> > I need to know this as part of a paper I?m writing so I would really > appreciate your help with this. > > Kind regards and thanks in advance, > > Richard Francis > ###################################################################### > ############### > This e-mail message has been scanned for Viruses and Content and > cleared > by MailMarshal > ###################################################################### > ############### > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From betts at embl.de Thu Aug 7 08:42:59 2008 From: betts at embl.de (Matthew Betts) Date: Thu, 7 Aug 2008 14:42:59 +0200 (CEST) Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure Message-ID: Hi, Has any one tried to draw secondary structure with Bio::Graphics? i.e. two different types of glyph with different colours on the same track. Could use a hash reference to get the different glyph types (would be nice if there was a cylinder glyph and a thick arrow glyph), or heterogeneous segments to get the different colours, but I can't see how to do both at the same time. Any example code or suggestions on how I could implement it would be great. Thanks, Matthew -- Matthew Betts PhD, Russell Group (Structural Bioinformatics) EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany phone: +49 (0)6221 387 8305; mailto:betts at embl.de From cain.cshl at gmail.com Thu Aug 7 10:08:39 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 7 Aug 2008 10:08:39 -0400 Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure In-Reply-To: References: Message-ID: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com> Hi Matthew, I don't have any code examples, but people have used GBrowse for protein secondary structure, which uses Bio::Graphics underneath the hood. 
If you want to put more than one glyph and/or more than one color in a track, it is fairly easy. You just need to provide a callback for each option when you create the track, like this:

$panel->add_track($features_array_ref,
                  -glyph   => sub {
                      # code to set the glyph according to the attributes of the feature
                  },
                  -bgcolor => sub {
                      # code to set the color
                  },
                  -fgcolor => 'black',
                  # ...etc...
                 );

For more information, see the biographics howto: http://www.bioperl.org/wiki/HOWTO:Graphics Scott On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts wrote: > > Hi, > > Has any one tried to draw secondary structure with Bio::Graphics? i.e. two > different types of glyph with different colours on the same track. > > Could use a hash reference to get the different glyph types (would be nice > if there was a cylinder glyph and a thick arrow glyph), or heterogeneous > segments to get the different colours, but I can't see how to do both at > the same time. > > Any example code or suggestions on how I could implement it would be > great. > > Thanks, > > Matthew > > -- > Matthew Betts PhD, Russell Group (Structural Bioinformatics) > EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany > phone: +49 (0)6221 387 8305; mailto:betts at embl.de > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
From betts at embl.de Thu Aug 7 12:27:28 2008 From: betts at embl.de (Matthew Betts) Date: Thu, 7 Aug 2008 18:27:28 +0200 (CEST) Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure In-Reply-To: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com> References: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com> Message-ID:
Hi Scott, Thanks for that, was a great help - I didn't realise I could use a code ref for anything other than the glyph name. I'm now doing this:

$panel->add_track(
    '-bgcolor' => sub {
        my($feature) = @_;
        $feature->display_name eq 'strand' ? 'cyan' : 'magenta';
    },
    '-strand_arrow' => sub {
        my($feature) = @_;
        $feature->display_name eq 'strand' ? 1 : 0;
    },
);

Matthew On Thu, 7 Aug 2008, Scott Cain wrote: > Hi Matthew, > > I don't have any code examples, but people have used GBrowse for > protein secondary structure, which uses Bio::Graphics underneath the > hood. > > If you want to put more than one glyph and/or more than one color in a > track, it is fairly easy. You just need to provide a callback for > each option when you create the track, like this: > > $panel->add_track($features_array_ref, > -glyph => sub { #code to set the glyph > according the attributes of the feature }, > -bgcolor => sub { #code to set the color }, > -fgcolor => 'black', > ...etc... > ); > > For more information, see the biographics howto: > > http://www.bioperl.org/wiki/HOWTO:Graphics > > Scott > > > > On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts wrote: > > > > Hi, > > > > Has any one tried to draw secondary structure with Bio::Graphics? i.e. two > > different types of glyph with different colours on the same track.
> > > > Could use a hash reference to get the different glyph types (would be nice > > if there was a cylinder glyph and a thick arrow glyph), or heterogeneous > > segments to get the different colours, but I can't see how to do both at > > the same time. > > > > Any example code or suggestions on how I could implement it would be > > great. > > > > Thanks, > > > > Matthew > > > > -- > > Matthew Betts PhD, Russell Group (Structural Bioinformatics) > > EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany > > phone: +49 (0)6221 387 8305; mailto:betts at embl.de > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From jay at jays.net Thu Aug 7 12:32:29 2008 From: jay at jays.net (Jay Hannah) Date: Thu, 07 Aug 2008 11:32:29 -0500 Subject: [Bioperl-l] not BioPerl In-Reply-To: References: Message-ID: <489B239D.8060305@jays.net> Smithies, Russell wrote: > Has anyone taken a look at the new Perl interface to the NCBI C++ Toolkit? > Unfortunately, I can't even get their examples working as I'm behind a > firewall and documentation on setting proxy stuff is virtually > non-existant :-( > Do people actually use the NCBI C++ Toolkit for things outside of NCBI? What? I tried to leverage it a year or so ago for an Entrez/sequence/search project and got nowhere. j From jcherry at ncbi.nlm.nih.gov Thu Aug 7 13:06:28 2008 From: jcherry at ncbi.nlm.nih.gov (Josh Cherry) Date: Thu, 7 Aug 2008 13:06:28 -0400 (EDT) Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl) Message-ID: For those who may be wondering what this is about, a Perl interface to the NCBI C++ Toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. The C++ Toolkit is the main code base that we develop and use at NCBI. 
It includes many things that may be of interest to BioPerl users, such as sequence analysis algorithms, means for interacting with NCBI databases, and facilities for reading, writing, and manipulating NCBI data model objects (usually defined by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and readable from ASN.1 and XML). Russell, I think you can make things work from behind a firewall by setting some environment variables: set CONN_FIREWALL to 1, possibly set CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT as appropriate. Please email me if you can't get things to work. I'll see that decent instructions for this are included in the next release. Josh Cherry On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote: > Has anyone taken a look at the new Perl interface to the NCBI C++ > Toolkit? > Unfortunately, I can't even get their examples working as I'm behind a > firewall and documentation on setting proxy stuff is virtually > non-existant :-( > > > Russell Smithies From tristan.lefebure at gmail.com Thu Aug 7 13:35:24 2008 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Thu, 7 Aug 2008 13:35:24 -0400 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees Message-ID: <200808071335.24668.tristan.lefebure@gmail.com> Hi list, I'm using a script very similar to bp_taxonomy2tree.pl distributed with BioPerl (with the only difference that I'm using taxids instead of taxon names). Basically, the script generates a taxonomic tree given a list of taxids using the NCBI taxonomy db. For each taxon, it generates a taxon object, and then merge this object to a tree object that keeps growing. It runs very well with a small number of taxa, but with many taxa (>1000), it is very very very slow (about a week for 3000 taxa). The slowness is due to the function merge_lineage (line 65), which merges the existing tree object with a new taxon object. I guess that the difficulty with a big tree (i.e. 
more than 1000 leaf) is to find the nodes in common between the tree and the new taxon object... Would you have any idea of how to get around the problem? Should I look under the hood of merge_lineage to try to improve it for large trees? Thanks! Version: bioperl-1.5.2_102 OS: GNU/Linux -Tristan From cjfields at illinois.edu Thu Aug 7 13:38:53 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 7 Aug 2008 12:38:53 -0500 Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl) In-Reply-To: References: Message-ID: Josh, Thanks for the update. I saw that these are only binaries for linux 32/64-bit. Are there plans to either support other OS's (OS X, Win, etc) or to maybe make a release with the XS-bindings so users can work towards that? With additional support I can see this easily fitting into several spots in BioPerl, but otherwise I'm unsure. chris On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote: > For those who may be wondering what this is about, a Perl interface > to the NCBI C++ Toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/ > . The C++ Toolkit is the main code base that we develop and use at > NCBI. It includes many things that may be of interest to BioPerl > users, such as sequence analysis algorithms, means for interacting > with NCBI databases, and facilities for reading, writing, and > manipulating NCBI data model objects (usually defined by ASN.1 > specifications; writeable as ASN.1, XML, and JSON, and readable from > ASN.1 and XML). > > Russell, I think you can make things work from behind a firewall by > setting some environment variables: set CONN_FIREWALL to 1, possibly > set CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and > CONN_HTTP_PROXY_PORT as appropriate. Please email me if you can't > get things to work. I'll see that decent instructions for this are > included in the next release. 
> > Josh Cherry > > > On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote: > >> Has anyone taken a look at the new Perl interface to the NCBI C++ >> Toolkit? >> Unfortunately, I can't even get their examples working as I'm >> behind a >> firewall and documentation on setting proxy stuff is virtually >> non-existant :-( >> >> >> Russell Smithies > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From jcherry at ncbi.nlm.nih.gov Thu Aug 7 14:04:17 2008 From: jcherry at ncbi.nlm.nih.gov (Josh Cherry) Date: Thu, 7 Aug 2008 14:04:17 -0400 (EDT) Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl) In-Reply-To: References: Message-ID: Chris, Support for other OS's is definitely a possibility, depending on community feedback (how useful are the wrappers in general, and how much demand is there for them on other platforms?). I wish I could magically make them available for Windows and OS X, but there are some technical issues to work out. Josh On Thu, 7 Aug 2008, Chris Fields wrote: > Josh, > > Thanks for the update. I saw that these are only binaries for linux > 32/64-bit. Are there plans to either support other OS's (OS X, Win, etc) or > to maybe make a release with the XS-bindings so users can work towards that? > With additional support I can see this easily fitting into several spots in > BioPerl, but otherwise I'm unsure. > > chris > > On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote: > >> For those who may be wondering what this is about, a Perl interface to the >> NCBI C++ Toolkit is available at >> ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. The C++ Toolkit is >> the main code base that we develop and use at NCBI. 
It includes many >> things that may be of interest to BioPerl users, such as sequence analysis >> algorithms, means for interacting with NCBI databases, and facilities for >> reading, writing, and manipulating NCBI data model objects (usually defined >> by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and readable >> from ASN.1 and XML). >> >> Russell, I think you can make things work from behind a firewall by setting >> some environment variables: set CONN_FIREWALL to 1, possibly set >> CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT >> as appropriate. Please email me if you can't get things to work. I'll see >> that decent instructions for this are included in the next release. >> >> Josh Cherry >> >> >> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote: >> >>> Has anyone taken a look at the new Perl interface to the NCBI C++ >>> Toolkit? >>> Unfortunately, I can't even get their examples working as I'm behind a >>> firewall and documentation on setting proxy stuff is virtually >>> non-existant :-( >>> >>> >>> Russell Smithies >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > From bix at sendu.me.uk Thu Aug 7 18:20:29 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 07 Aug 2008 23:20:29 +0100 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees In-Reply-To: <200808071335.24668.tristan.lefebure@gmail.com> References: <200808071335.24668.tristan.lefebure@gmail.com> Message-ID: <489B752D.2080209@sendu.me.uk> Tristan Lefebure wrote: > I'm using a script very similar to bp_taxonomy2tree.pl distributed with > BioPerl (with the only difference that I'm using taxids instead of taxon > names). 
Basically, the script generates a taxonomic tree given a list of > taxids using the NCBI taxonomy db. For each taxon, it generates a taxon > object, and then merge this object to a tree object that keeps growing. It > runs very well with a small number of taxa, but with many taxa (>1000), it is > very very very slow (about a week for 3000 taxa). > > The slowness is due to the function merge_lineage (line 65), which merges the > existing tree object with a new taxon object. I guess that the difficulty > with a big tree (i.e. more than 1000 leaf) is to find the nodes in common > between the tree and the new taxon object... > > Would you have any idea of how to get around the problem? Should I look under > the hood of merge_lineage to try to improve it for large trees? Yes, please do. It might have been me that wrote that, in which case I didn't do anything fancy or consider the above problem. From cjfields at illinois.edu Thu Aug 7 20:42:16 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 7 Aug 2008 19:42:16 -0500 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees In-Reply-To: <489B752D.2080209@sendu.me.uk> References: <200808071335.24668.tristan.lefebure@gmail.com> <489B752D.2080209@sendu.me.uk> Message-ID: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote: > Tristan Lefebure wrote: >> I'm using a script very similar to bp_taxonomy2tree.pl distributed >> with BioPerl (with the only difference that I'm using taxids >> instead of taxon names). Basically, the script generates a >> taxonomic tree given a list of taxids using the NCBI taxonomy db. >> For each taxon, it generates a taxon object, and then merge this >> object to a tree object that keeps growing. It runs very well with >> a small number of taxa, but with many taxa (>1000), it is very very >> very slow (about a week for 3000 taxa). 
>> The slowness is due to the function merge_lineage (line 65), which >> merges the existing tree object with a new taxon object. I guess >> that the difficulty with a big tree (i.e. more than 1000 leaf) is >> to find the nodes in common between the tree and the new taxon >> object... >> Would you have any idea of how to get around the problem? Should I >> look under the hood of merge_lineage to try to improve it for large >> trees? > > Yes, please do. It might have been me that wrote that, in which case > I didn't do anything fancy or consider the above problem. Actually I thought that was fixed; wasn't some caching added for that script at one point? chris From bix at sendu.me.uk Fri Aug 8 03:50:50 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 08 Aug 2008 08:50:50 +0100 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees In-Reply-To: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu> References: <200808071335.24668.tristan.lefebure@gmail.com> <489B752D.2080209@sendu.me.uk> <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu> Message-ID: <489BFADA.1060308@sendu.me.uk> Chris Fields wrote: > > On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote: > >> Tristan Lefebure wrote: >>> I'm using a script very similar to bp_taxonomy2tree.pl distributed >>> with BioPerl (with the only difference that I'm using taxids instead >>> of taxon names). Basically, the script generates a taxonomic tree >>> given a list of taxids using the NCBI taxonomy db. For each taxon, it >>> generates a taxon object, and then merge this object to a tree object >>> that keeps growing. It runs very well with a small number of taxa, >>> but with many taxa (>1000), it is very very very slow (about a week >>> for 3000 taxa). >>> The slowness is due to the function merge_lineage (line 65), which >>> merges the existing tree object with a new taxon object. I guess that >>> the difficulty with a big tree (i.e. 
more than 1000 leaf) is to find >>> the nodes in common between the tree and the new taxon object... >>> Would you have any idea of how to get around the problem? Should I >>> look under the hood of merge_lineage to try to improve it for large >>> trees? >> >> Yes, please do. It might have been me that wrote that, in which case I >> didn't do anything fancy or consider the above problem. > > Actually I thought that was fixed; Oh yeah. Looks like I did something related to 'speedup for merge_lineage()' on the 18th Dec 2006. Tristan, checkout Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem. From tristan.lefebure at gmail.com Fri Aug 8 12:02:32 2008 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Fri, 8 Aug 2008 12:02:32 -0400 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees In-Reply-To: <489BFADA.1060308@sendu.me.uk> References: <200808071335.24668.tristan.lefebure@gmail.com> <489B752D.2080209@sendu.me.uk> <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu> <489BFADA.1060308@sendu.me.uk> Message-ID: Yes indeed, with the svn code it took 10 minutes (compared to one week!) Thanks, -Tristan On Fri, Aug 8, 2008 at 3:50 AM, Sendu Bala wrote: > Chris Fields wrote: > >> >> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote: >> >> Tristan Lefebure wrote: >>> >>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed with >>>> BioPerl (with the only difference that I'm using taxids instead of taxon >>>> names). Basically, the script generates a taxonomic tree given a list of >>>> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon >>>> object, and then merge this object to a tree object that keeps growing. It >>>> runs very well with a small number of taxa, but with many taxa (>1000), it >>>> is very very very slow (about a week for 3000 taxa). >>>> The slowness is due to the function merge_lineage (line 65), which >>>> merges the existing tree object with a new taxon object. 
I guess that the >>>> difficulty with a big tree (i.e. more than 1000 leaf) is to find the nodes >>>> in common between the tree and the new taxon object... >>>> Would you have any idea of how to get around the problem? Should I look >>>> under the hood of merge_lineage to try to improve it for large trees? >>>> >>> >>> Yes, please do. It might have been me that wrote that, in which case I >>> didn't do anything fancy or consider the above problem. >>> >> >> Actually I thought that was fixed; >> > > Oh yeah. Looks like I did something related to 'speedup for > merge_lineage()' on the 18th Dec 2006. Tristan, checkout > Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem. >
From rvos at interchange.ubc.ca Fri Aug 8 19:59:20 2008 From: rvos at interchange.ubc.ca (Rutger Vos) Date: Fri, 8 Aug 2008 16:59:20 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? Message-ID: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
Hi, while going through a large genbank file (ftp://ftp.ncbi.nlm.nih.gov/genbank/gbpri21.seq.gz) I ran into malloc errors. Just for the record (I doubt this does anyone any good), I got:

perl(391) malloc: *** vm_allocate(size=8421376) failed (error code=3)
perl(391) malloc: *** error: can't allocate region
perl(391) malloc: *** set a breakpoint in szone_error to debug
Out of memory!

What I was trying to do is go through the file, and only write out those seq objects that aren't human, and that have CDS features, i.e.:

################################################
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $dir = shift @ARGV; # the directory with *.gz files
my $out = shift @ARGV; # the directory to write to...
mkdir $out if not -d $out; # ...which may need to be created

opendir my $dirhandle, $dir or die $!;
for my $archive ( readdir $dirhandle ) {
    next if $archive !~ /\.gz$/;
    my $file = $archive;
    $file =~ s/\.gz$//;

    # external call to the gunzip utility,
    # such that we keep the archive
    system( "gunzip -c \"${dir}/${archive}\" > \"${dir}/${file}\"" );

    # object that parses genbank files,
    # returns Bio::Seq objects
    my $reader = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => "${dir}/${file}"
    );

    # object that receives Bio::Seq objects,
    # writes genbank files
    my $writer = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => ">${out}/${file}",
    );

    while ( my $seq = $reader->next_seq ) {
        my $name = $seq->species->binomial;
        if ( $name ne 'Homo sapiens' ) {

            # search for coding sequences among the features
            my $HasCDS = 0;
            FEATURE: for my $f ( $seq->get_SeqFeatures ) {
                if ( $f->primary_tag eq 'CDS' ) {
                    $HasCDS++;
                    last FEATURE;
                }
            }

            # write the sequence to file
            if ( $HasCDS ) {
                $writer->write_seq( $seq );
            }
        }
    }

    # delete the extracted, unfiltered file
    unlink "${dir}/${file}";
}
################################################

Okay, so it runs out of memory. Can I do something to fix that? Should I flush on either of the I/O objects after each $seq? Could there be memory leaks in the Bio::Seq objects? Should I $seq->DESTROY them explicitly or something like that? Thanks, Rutger -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com
From David.Messina at sbc.su.se Sat Aug 9 07:04:04 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 9 Aug 2008 13:04:04 +0200 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
Message-ID: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>

Hi Rutger,
I ran your script on the same genbank file and, while I did not run out of
memory, I did see what appears to be a memory leak. Even when I manually
undef'd the reader and writer objects every 1000 records, memory usage
continued to grow.

I can't quite figure out what's going on, though.
If I run a different program using SeqIO (the simple sequence converter from
the SeqIO HOWTO) on the same input file, I don't see this same runaway
growth.
Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
sequence converter holds steady at about 12MB of real memory, whereas on
5.10 it grows, albeit slowly, for as long as the program is executing. When
I killed it about 20% of the way through the file, it was up to about 44MB
of real memory.
Anyone else have a chance to look at this?

Dave

From rvos at interchange.ubc.ca Sat Aug 9 07:36:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 04:36:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
Message-ID: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>

Hi Dave,

thanks for the reply. The memory usage is in fact much more atrocious
than just 44MB: I'm actually looping over all 36 such archives (the
genbank primates), and on my macbook it steadily increases to over 1GB
of memory.
What seemed to be helping somewhat is to call $reader->flush and $writer->flush after each seq, at least to the extent that I make it through that one file, but last time I tried I didn't get much further: the whole terminal process died shortly after instead. I seem to vaguely recall that even if perl free()'s memory, that doesn't necessarily mean that the memory is returned to the OS for the runtime of the program - depending on the OS and perl version. What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. Rutger On Sat, Aug 9, 2008 at 4:04 AM, Dave Messina wrote: > Hi Rutger, > I ran your script on the same genbank file and, while I did not run out of > memory, I did see what appears to be a memory leak. Even when I manually > undef'd the reader and writer object every 1000 records, memory usage > continued to grow. > > I can't quite figure out what's going on, though. > If I run a different program using SeqIO (the simple sequence converter from > the SeqIO HOWTO) on the same input file, I don't see this same runaway > growth. > Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the > sequence converter holds steady at about 12MB of real memory, whereas on > 5.10 it grows, albeit slowly, for as long as the program is executing. When > I killed it about 20% of the way through the file, it was up to about 44MB > of real memory. > Anyone else have a chance to look at this? > > Dave > -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com From David.Messina at sbc.su.se Sat Aug 9 08:58:56 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 9 Aug 2008 14:58:56 +0200 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? 
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> Message-ID: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> > > I seem to vaguely recall that even if perl free()'s memory that doesn't > necessarily mean that the memory is returned to the OS for the runtime of > the program I believe that's correct. > What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. > perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. Dave From cjfields at illinois.edu Sat Aug 9 09:56:19 2008 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 9 Aug 2008 08:56:19 -0500 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> Message-ID: <57147D88-ABE6-44E0-8D76-790B0C735438@illinois.edu> There is definitely a memory leak. I can confirm it on OSX 10.4/10.5 using bioperl-live. I'll try looking into it this weekend, but I can't promise when it'll be fixed; my laptop is on the fritz. chris On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: >> >> I seem to vaguely recall that even if perl free()'s memory that >> doesn't >> necessarily mean that the memory is returned to the OS for the >> runtime of >> the program > > > I believe that's correct. > > > >> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >> > > perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. 
> > > Dave From cjfields at illinois.edu Sat Aug 9 10:15:23 2008 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 9 Aug 2008 09:15:23 -0500 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> Message-ID: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu> Forgot to mention, maybe we can file this as a bug? It's a pretty serious one but it should be easy to narrow down; the change had to be introduced fairly recently. chris On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: >> >> I seem to vaguely recall that even if perl free()'s memory that >> doesn't >> necessarily mean that the memory is returned to the OS for the >> runtime of >> the program > > > I believe that's correct. > > > >> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >> > > perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. > > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From hlapp at gmx.net Sat Aug 9 12:00:46 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 9 Aug 2008 12:00:46 -0400 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? 
In-Reply-To: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu> Message-ID: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net> This smells of circular references somewhere. I think the first point I would go looking is the species storing - does the problem go away if you turn that off? Maybe the version of weaken() is at play here? -hilmar On Aug 9, 2008, at 10:15 AM, Chris Fields wrote: > Forgot to mention, maybe we can file this as a bug? It's a pretty > serious one but it should be easy to narrow down; the change had to > be introduced fairly recently. > > chris > > On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: > >>> >>> I seem to vaguely recall that even if perl free()'s memory that >>> doesn't >>> necessarily mean that the memory is returned to the OS for the >>> runtime of >>> the program >> >> >> I believe that's correct. >> >> >> >>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >>> >> >> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. >> >> >> Dave >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 9 12:07:30 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 9 Aug 2008 12:07:30 -0400 Subject: [Bioperl-l] Finding possible primers regex In-Reply-To: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu> References: <18792782.post@talk.nabble.com> <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu> Message-ID: <591AE8EB-4D45-4859-A93E-EA9BF01CA9C6@gmx.net> This looks like a neat trick. Do you think it's worth including as a SimpleAlign method (obviously w/o the printing to STDOUT)? I can imagine that a lot of people might appreciate it. -hilmar On Aug 4, 2008, at 12:08 AM, Chris Fields wrote: > On Aug 2, 2008, at 3:05 PM, Benbo wrote: > >> >> Hi there, >> I'm trying to write a perl script to scan an aligned multiple entry >> fasta >> file and find possible primers. So far I've produced a string which >> contains >> bases which match all sequences and * where they don't match e.g. >> 1) TTAGCCTAA >> 2) TTAGCAGAA >> 3) TTACCCTAA >> >> would give TTA*C**AA. >> >> I want to parse this string and pull out all sequences which are >> 18-21 bp in >> length and have no more than 4 * in them. >> >> So far, I've got this: >> >> while($fragment_match =~ /([GTAC*]{18,21})/g){ >> print "$1\n"; >> } >> >> hoping to match all fragments 18-21 characters in length. However >> even that >> doesn't work as it has essentially chunked it into 21 char blocks, >> rather >> than what I hoped for of >> 0-18 >> 0-19 >> 0-20 >> 0-21 >> 1-19 >> 1-20 >> 1-21 >> 1-22 >> >> etc. 
>> >> Can anyone let me know if this is already possible in BioPerl, or >> how one >> would go about it with regex. Sadly I'm fairly new to perl and >> getting to >> grips with BioPerl, so please treat me gently :). >> >> Many thanks, >> >> Ben > > There is a trick to this which is discussed more extensively in > 'Mastering Regular Expressions'. Essentially you have to embed code > into the regex and trick the parser into backtracking using a > negative lookahead. The match itself fails (i.e. no match is > returned), but the embedded code is executed for each match attempt, > > The following script is a slight modification of one I used which > checks the consensus string from the input alignment (in aligned > FASTA format here), extracts the alignment slice using that match, > then spit the alignment out to STDOUT in clustalw format. This > should work for perl 5.8 and up, but it's only been tested on perl > 5.10. You should be able to use this to fit what you want. > > my $in = Bio::AlignIO->new(-file => $file, > -format => 'fasta'); > my $out = Bio::AlignIO->new(-fh => \*STDOUT, > -format => 'clustalw'); > > while (my $aln = $in->next_aln) { > my $c = $aln->consensus_string(100); > my @matches; > $c =~ m/ > ([GTAC?]{18,21}) > (?{my $match = check_match($1); > push @matches, [$match, > pos(), > length($match)] > if defined $match;}) > (?!) 
> /xig; > for my $match (@matches) { > my ($hit, $st, $end) = ($match->[0], > $match->[1] - $match->[2] + 1, > $match->[1]); > my $newaln = $aln->slice($st, $end); > $out->write_aln($newaln); > } > } > > sub check_match { > my $match = shift; > return unless $match; > my $ct = $match =~ tr/?/?/; > return $match if $ct <= 4; > } > > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rvos at interchange.ubc.ca Sat Aug 9 13:47:33 2008 From: rvos at interchange.ubc.ca (Rutger Vos) Date: Sat, 9 Aug 2008 10:47:33 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu> <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net> Message-ID: <2bb9b24a0808091047t46a6bfa8r7e11a3a1537180@mail.gmail.com> I am sure my version of weaken() works as advertised. Is there a way to turn off species storing from outside the code base or do you mean I go and start commenting bits out in Bio::SeqIO::genbank (or wherever)? On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp wrote: > This smells of circular references somewhere. I think the first point I > would go looking is the species storing - does the problem go away if you > turn that off? Maybe the version of weaken() is at play here? > > -hilmar > > On Aug 9, 2008, at 10:15 AM, Chris Fields wrote: > >> Forgot to mention, maybe we can file this as a bug? 
It's a pretty serious >> one but it should be easy to narrow down; the change had to be introduced >> fairly recently. >> >> chris >> >> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: >> >>>> >>>> I seem to vaguely recall that even if perl free()'s memory that doesn't >>>> necessarily mean that the memory is returned to the OS for the runtime >>>> of >>>> the program >>> >>> >>> I believe that's correct. >>> >>> >>> >>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >>>> >>> >>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. >>> >>> >>> Dave >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Marie-Claude Hofmann >> College of Veterinary Medicine >> University of Illinois Urbana-Champaign >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com From hartzell at alerce.com Sat Aug 9 14:33:51 2008 From: hartzell at alerce.com (George Hartzell) Date: Sat, 9 Aug 2008 11:33:51 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> Message-ID: <18589.58127.57270.352974@almost.alerce.com> I'm pretty sure that this fixes the problem: g. 

Index: Bio/Species.pm
===================================================================
--- Bio/Species.pm (revision 14791)
+++ Bio/Species.pm (working copy)
@@ -340,6 +340,7 @@
         }

         $self->{_species} = $species;
+        weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
     }
     return $self->{_species};
 }

From cjfields at illinois.edu Sat Aug 9 15:08:27 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Sat, 9 Aug 2008 14:08:27 -0500 (CDT)
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <20080809140827.BHN28056@expms6.cites.uiuc.edu>

I'm pretty sure it's not due to a particular version of weaken(), though
it does sound like a circular references issue. I have tried this with
perl 5.8.6, 5.8.8, and 5.10 (all Mac OS, either 10.4 or 10.5); all have
the same memory leak issues.

You can try using SeqBuilder to get rid of the Bio::Species object. I'll
give that a try when I can. Unfortunately my laptop is now with the local
Apple geniuses awaiting a motherboard, so I can't get to it right away
(I'll give it a try on my wife's laptop).

chris

---- Original message ----
>Date: Sat, 9 Aug 2008 10:47:33 -0700
>From: "Rutger Vos"
>Subject: Re: [Bioperl-l] malloc errors while using Bio::SeqIO?
>To: "Hilmar Lapp"
>Cc: Chris Fields , bioperl list
>
>I am sure my version of weaken() works as advertised. Is there a way
>to turn off species storing from outside the code base or do you mean
>I go and start commenting bits out in Bio::SeqIO::genbank (or
>wherever)?
>
>On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp wrote:
>> This smells of circular references somewhere. I think the first point I
>> would go looking is the species storing - does the problem go away if you
>> turn that off? Maybe the version of weaken() is at play here?
>>
>> -hilmar
>>
>> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>>
>>> Forgot to mention, maybe we can file this as a bug?
It's a pretty serious >>> one but it should be easy to narrow down; the change had to be introduced >>> fairly recently. >>> >>> chris >>> >>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: >>> >>>>> >>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't >>>>> necessarily mean that the memory is returned to the OS for the runtime >>>>> of >>>>> the program >>>> >>>> >>>> I believe that's correct. >>>> >>>> >>>> >>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >>>>> >>>> >>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. >>>> >>>> >>>> Dave >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Marie-Claude Hofmann >>> College of Veterinary Medicine >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> ================================================= ========== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> ================================================= ========== >> >> >> >> > > > >-- >Dr. Rutger A. Vos >Department of zoology >University of British Columbia >http://www.nexml.org >http://rutgervos.blogspot.com >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Sat Aug 9 20:17:52 2008 From: hartzell at alerce.com (George Hartzell) Date: Sat, 9 Aug 2008 17:17:52 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? 
In-Reply-To: <18589.58127.57270.352974@almost.alerce.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <18589.58127.57270.352974@almost.alerce.com> Message-ID: <18590.13232.892714.952555@almost.alerce.com> George Hartzell writes: > > I'm pretty sure that this fixes the problem: > > g. > > Index: Bio/Species.pm > =================================================================== > --- Bio/Species.pm (revision 14791) > +++ Bio/Species.pm (working copy) > @@ -340,6 +340,7 @@ > } > > $self->{_species} = $species; > + weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'}); > } > return $self->{_species}; > } Actually, it's a bit clearer with the weaken moved up in the block so that it's closer to where the new tree is allocated. Chris suggested that I go ahead and I commit it. g. From David.Messina at sbc.su.se Sun Aug 10 05:57:07 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 10 Aug 2008 11:57:07 +0200 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <18590.13232.892714.952555@almost.alerce.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <18589.58127.57270.352974@almost.alerce.com> <18590.13232.892714.952555@almost.alerce.com> Message-ID: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com> Nice, George -- holds steady at about 32MB now. Much better. :) Dave From hartzell at alerce.com Sun Aug 10 12:51:39 2008 From: hartzell at alerce.com (George Hartzell) Date: Sun, 10 Aug 2008 09:51:39 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? 
In-Reply-To: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <18589.58127.57270.352974@almost.alerce.com> <18590.13232.892714.952555@almost.alerce.com> <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
Message-ID: <18591.7323.244987.436383@almost.alerce.com>

Dave Messina writes:
 > Nice, George -- holds steady at about 32MB now.
 > Much better. :)

Good to hear.

Bonus points go to rvos@ for providing such a nice clean bug report and
test case; it made running it down much more appealing.

g.

From valiente at lsi.upc.edu Mon Aug 11 04:09:37 2008
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 11 Aug 2008 11:09:37 +0300
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To:
References:
Message-ID:

Despite the speedup for merge_lineage, the get_lca method still runs
very slowly on a large number of nodes (say, 1500 nodes), and it does not
rely on merge_lineage. In the get_lca method, all the lineages are first
collected in @paths in order to later find their $lca, while it might be
faster to process each $path as soon as it is obtained with the
get_lineage_nodes method.

Any other ideas on how to speed up the get_lca method?

Thanks,

Gabriel

From lmanchon at univ-montp2.fr Mon Aug 11 12:32:20 2008
From: lmanchon at univ-montp2.fr (Laurent Manchon)
Date: Mon, 11 Aug 2008 18:32:20 +0200
Subject: [Bioperl-l] protein pattern scan
Message-ID: <5.0.2.1.2.20080811182952.00bebff0@pop.univ-montp2.fr>

Hi,

Do you know if it's possible to search for a protein motif in a multifasta
protein file using bioperl, returning the motif, the position and the name
of the corresponding sequence?

Thank you for your help.
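
One way to approach the question above is a plain regular-expression scan over each sequence. The sketch below is a minimal illustration and not the approach described in the BioPerl FAQ: the helper name `scan_motif`, the example motif (the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}) and the command-line handling are all assumptions made for the example. The only BioPerl calls are `Bio::SeqIO->new`, `next_seq`, `display_id` and `seq`, and the module is loaded only when a FASTA file name is given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [sequence name, 1-based start, matched text]
# for every occurrence of $motif in $residues, including overlapping ones.
sub scan_motif {
    my ($name, $residues, $motif) = @_;
    my @hits;
    # The lookahead matches without consuming characters, so the search
    # resumes one residue further on and overlapping hits are not skipped.
    while ($residues =~ /(?=($motif))/g) {
        push @hits, [$name, $-[0] + 1, $1];
    }
    return @hits;
}

# Example motif (an assumption for illustration): N-glycosylation site.
my $motif = qr/N[^P][ST][^P]/;

# With a multi-FASTA file on the command line, read it through Bio::SeqIO
# and print one tab-separated line per hit: name, position, matched text.
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
    while (my $seq = $in->next_seq) {
        for my $hit (scan_motif($seq->display_id, $seq->seq, $motif)) {
            print join("\t", @$hit), "\n";
        }
    }
}
```

Run as, for example, `perl scan_motif.pl proteins.fasta` (both file names are placeholders). A compiled `qr//` pattern keeps the motif separate from the scanning loop, so a different pattern can be swapped in without touching `scan_motif`.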

+---------------------------------------------+
 Laurent Manchon
 Email: lmanchon at univ-montp2.fr
+---------------------------------------------+

From cjfields at illinois.edu Mon Aug 11 13:32:05 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 12:32:05 -0500 (CDT)
Subject: [Bioperl-l] protein pattern scan
Message-ID: <20080811123205.BHO45474@expms6.cites.uiuc.edu>

This is covered in the FAQ:

http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F

chris

---- Original message ----
>Date: Mon, 11 Aug 2008 18:32:20 +0200
>From: Laurent Manchon
>Subject: [Bioperl-l] protein pattern scan
>To: bioperl-l at lists.open-bio.org
>
>Hi,
>
>do you know if it's possible to search protein motif in a multifasta
>protein file
>using bioperl to return the motif, the position and the name of the
>corresponding sequence ?
>
>thank you for your help.
>
>
>+---------------------------------------------+
> Laurent Manchon
> Email: lmanchon at univ-montp2.fr
>+---------------------------------------------+
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bix at sendu.me.uk Mon Aug 11 13:44:37 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Aug 2008 18:44:37 +0100
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To:
References:
Message-ID: <48A07A85.6050601@sendu.me.uk>

Gabriel Valiente wrote:
> Despite the speedup for merge_lineage, the get_lca method still runs
> very slow on a large number of nodes (say, 1500 nodes) and it does not
> rely on merge_lineage. In the get_lca method, all the lineages are first
> collected in @paths in order to later find their $lca, while it might be
> faster to process each $path as soon as it is obtained with the
> get_lineage_nodes method.
If you try that idea out and it works, please do commit it. I've no further suggestions atm, but I haven't had a chance to look at it to remind myself what happens. From cjfields at illinois.edu Mon Aug 11 15:50:38 2008 From: cjfields at illinois.edu (Christopher Fields) Date: Mon, 11 Aug 2008 14:50:38 -0500 (CDT) Subject: [Bioperl-l] Finding possible primers regex Message-ID: <20080811145038.BHO59267@expms6.cites.uiuc.edu> When I can I could try generating a method which accepts a regex/Bio::Tools::SeqPattern and returns an AlignIO stream or array of SimpleAlign instances (the former could be attached to a temp file for iteration). Any preference? chris ---- Original message ---- >Date: Sat, 9 Aug 2008 12:07:30 -0400 >From: Hilmar Lapp >Subject: Re: [Bioperl-l] Finding possible primers regex >To: Chris Fields >Cc: Benbo , Bioperl-l at lists.open-bio.org > >This looks like a neat trick. Do you think it's worth including as a >SimpleAlign method (obviously w/o the printing to STDOUT)? I can >imagine that a lot of people might appreciate it. > > -hilmar > >On Aug 4, 2008, at 12:08 AM, Chris Fields wrote: > >> On Aug 2, 2008, at 3:05 PM, Benbo wrote: >> >>> >>> Hi there, >>> I'm trying to write a perl script to scan an aligned multiple entry >>> fasta >>> file and find possible primers. So far I've produced a string which >>> contains >>> bases which match all sequences and * where they don't match e.g. >>> 1) TTAGCCTAA >>> 2) TTAGCAGAA >>> 3) TTACCCTAA >>> >>> would give TTA*C**AA. >>> >>> I want to parse this string and pull out all sequences which are >>> 18-21 bp in >>> length and have no more than 4 * in them. >>> >>> So far, I've got this: >>> >>> while($fragment_match =~ /([GTAC*]{18,21})/g){ >>> print "$1\n"; >>> } >>> >>> hoping to match all fragments 18-21 characters in length. 
However >>> even that >>> doesn't work as it has essentially chunked it into 21 char blocks, >>> rather >>> than what I hoped for of >>> 0-18 >>> 0-19 >>> 0-20 >>> 0-21 >>> 1-19 >>> 1-20 >>> 1-21 >>> 1-22 >>> >>> etc. >>> >>> Can anyone let me know if this is already possible in BioPerl, or >>> how one >>> would go about it with regex. Sadly I'm fairly new to perl and >>> getting to >>> grips with BioPerl, so please treat me gently :). >>> >>> Many thanks, >>> >>> Ben >> >> There is a trick to this which is discussed more extensively in >> 'Mastering Regular Expressions'. Essentially you have to embed code >> into the regex and trick the parser into backtracking using a >> negative lookahead. The match itself fails (i.e. no match is >> returned), but the embedded code is executed for each match attempt, >> >> The following script is a slight modification of one I used which >> checks the consensus string from the input alignment (in aligned >> FASTA format here), extracts the alignment slice using that match, >> then spit the alignment out to STDOUT in clustalw format. This >> should work for perl 5.8 and up, but it's only been tested on perl >> 5.10. You should be able to use this to fit what you want. >> >> my $in = Bio::AlignIO->new(-file => $file, >> -format => 'fasta'); >> my $out = Bio::AlignIO->new(-fh => \*STDOUT, >> -format => 'clustalw'); >> >> while (my $aln = $in->next_aln) { >> my $c = $aln->consensus_string(100); >> my @matches; >> $c =~ m/ >> ([GTAC?]{18,21}) >> (?{my $match = check_match($1); >> push @matches, [$match, >> pos(), >> length($match)] >> if defined $match;}) >> (?!) 
>> /xig; >> for my $match (@matches) { >> my ($hit, $st, $end) = ($match->[0], >> $match->[1] - $match->[2] + 1, >> $match->[1]); >> my $newaln = $aln->slice($st, $end); >> $out->write_aln($newaln); >> } >> } >> >> sub check_match { >> my $match = shift; >> return unless $match; >> my $ct = $match =~ tr/?/?/; >> return $match if $ct <= 4; >> } >> >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >-- >=========================================================== >: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >=========================================================== > > > From hlapp at gmx.net Mon Aug 11 22:35:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 11 Aug 2008 22:35:13 -0400 Subject: [Bioperl-l] Finding possible primers regex In-Reply-To: <20080811145038.BHO59267@expms6.cites.uiuc.edu> References: <20080811145038.BHO59267@expms6.cites.uiuc.edu> Message-ID: Actually, now that you ask I'm wondering whether one wouldn't sometimes want to retain the relationship between the match and the resulting spliced alignment? If so, neither AlignIO nor array would accomplish that, right? Other than that I myself don't have a strong preference either way. I suppose AlignIO stream is somewhat more extensible, since as you say it could be coupled to a file if the resulting set of alignments is really large. -hilmar On Aug 11, 2008, at 3:50 PM, Christopher Fields wrote: > When I can I could try generating a method which accepts a regex/ > Bio::Tools::SeqPattern and returns an AlignIO stream or array of > SimpleAlign instances (the former could be attached to a temp file > for iteration). Any preference? 
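
For the original primer question in this thread, overlapping windows can also be enumerated without embedded code. The sketch below swaps the `(?{...})(?!)` trick for a zero-width lookahead that is run once per window length; it is an illustration under stated assumptions, not the script posted to the list. The function name `candidate_windows` and its parameters are hypothetical, and the wildcard is the `*` character from Benbo's example rather than the `?` that Chris's script matches against.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [start, end, fragment] (1-based, inclusive)
# for every window of $min..$max characters in $consensus that contains
# at most $max_wild '*' characters.
sub candidate_windows {
    my ($consensus, $min, $max, $max_wild) = @_;
    my @hits;
    for my $len ($min .. $max) {
        # The lookahead consumes nothing, so after each success the engine
        # retries one character further on and every overlapping window of
        # this length is visited.
        while ($consensus =~ /(?=([GTAC*]{$len}))/g) {
            my $frag  = $1;
            my $start = $-[0] + 1;          # @- holds 0-based match offsets
            my $wild  = ($frag =~ tr/*//);  # tr with no replacement only counts
            next if $wild > $max_wild;
            push @hits, [$start, $start + $len - 1, $frag];
        }
    }
    return @hits;
}
```

For the primer case the call would be `candidate_windows($string, 18, 21, 4)`. The start and end values are 1-based so that they could be passed on to something like `$aln->slice($start, $end)` as in the quoted script. Looping over lengths is slower than a single pass but keeps the filter in ordinary Perl, which avoids relying on how a given perl version handles code blocks inside a pattern.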
> > chris > > ---- Original message ---- >> Date: Sat, 9 Aug 2008 12:07:30 -0400 >> From: Hilmar Lapp >> Subject: Re: [Bioperl-l] Finding possible primers regex >> To: Chris Fields >> Cc: Benbo , Bioperl-l at lists.open-bio.org >> >> This looks like a neat trick. Do you think it's worth including as a >> SimpleAlign method (obviously w/o the printing to STDOUT)? I can >> imagine that a lot of people might appreciate it. >> >> -hilmar >> >> On Aug 4, 2008, at 12:08 AM, Chris Fields wrote: >> >>> On Aug 2, 2008, at 3:05 PM, Benbo wrote: >>> >>>> >>>> Hi there, >>>> I'm trying to write a perl script to scan an aligned multiple entry >>>> fasta >>>> file and find possible primers. So far I've produced a string which >>>> contains >>>> bases which match all sequences and * where they don't match e.g. >>>> 1) TTAGCCTAA >>>> 2) TTAGCAGAA >>>> 3) TTACCCTAA >>>> >>>> would give TTA*C**AA. >>>> >>>> I want to parse this string and pull out all sequences which are >>>> 18-21 bp in >>>> length and have no more than 4 * in them. >>>> >>>> So far, I've got this: >>>> >>>> while($fragment_match =~ /([GTAC*]{18,21})/g){ >>>> print "$1\n"; >>>> } >>>> >>>> hoping to match all fragments 18-21 characters in length. However >>>> even that >>>> doesn't work as it has essentially chunked it into 21 char blocks, >>>> rather >>>> than what I hoped for of >>>> 0-18 >>>> 0-19 >>>> 0-20 >>>> 0-21 >>>> 1-19 >>>> 1-20 >>>> 1-21 >>>> 1-22 >>>> >>>> etc. >>>> >>>> Can anyone let me know if this is already possible in BioPerl, or >>>> how one >>>> would go about it with regex. Sadly I'm fairly new to perl and >>>> getting to >>>> grips with BioPerl, so please treat me gently :). >>>> >>>> Many thanks, >>>> >>>> Ben >>> >>> There is a trick to this which is discussed more extensively in >>> 'Mastering Regular Expressions'. Essentially you have to embed code >>> into the regex and trick the parser into backtracking using a >>> negative lookahead. The match itself fails (i.e. 
no match is >>> returned), but the embedded code is executed for each match attempt, >>> >>> The following script is a slight modification of one I used which >>> checks the consensus string from the input alignment (in aligned >>> FASTA format here), extracts the alignment slice using that match, >>> then spit the alignment out to STDOUT in clustalw format. This >>> should work for perl 5.8 and up, but it's only been tested on perl >>> 5.10. You should be able to use this to fit what you want. >>> >>> my $in = Bio::AlignIO->new(-file => $file, >>> -format => 'fasta'); >>> my $out = Bio::AlignIO->new(-fh => \*STDOUT, >>> -format => 'clustalw'); >>> >>> while (my $aln = $in->next_aln) { >>> my $c = $aln->consensus_string(100); >>> my @matches; >>> $c =~ m/ >>> ([GTAC?]{18,21}) >>> (?{my $match = check_match($1); >>> push @matches, [$match, >>> pos(), >>> length($match)] >>> if defined $match;}) >>> (?!) >>> /xig; >>> for my $match (@matches) { >>> my ($hit, $st, $end) = ($match->[0], >>> $match->[1] - $match->[2] + 1, >>> $match->[1]); >>> my $newaln = $aln->slice($st, $end); >>> $out->write_aln($newaln); >>> } >>> } >>> >>> sub check_match { >>> my $match = shift; >>> return unless $match; >>> my $ct = $match =~ tr/?/?/; >>> return $match if $ct <= 4; >>> } >>> >>> >>> chris >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mirhan at indiana.edu Mon Aug 11 23:46:35 2008 From: mirhan at indiana.edu (Han, Mira) Date: Mon, 11 Aug 2008 23:46:35 -0400 Subject: 
[Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: Message-ID: Hi, Yes it is true that it's similar to get_all_Annotations, it's basically a recursive version of it. I wanted to provide a method to get at nested annotations without going through all the if(isa collection) do recursive call.. etc. every time, because most of the xml elements are implemented as nested annotation collections to the nodes. ( I am contemplating using tagtrees instead of nested annotation collections in the future, but as of now, Annotation::tagtrees was documented as a temporary implementation, so I passed on that option. ) I forgot about the interface part. At least for my purpose I would think it's a good function to have in the interface. I agree that adding a recursive option to get_all_Annotations would be better. Mira On 8/11/08 11:28 PM, "Hilmar Lapp" wrote: Hi Mira - On Aug 11, 2008, at 11:33 AM, Han, Mira wrote: > Added get_deep_Annotations in Annotation::Collection > in order to get annotations that are within nested collections. I hope I'm not contradicting Chris here, but we will probably want to think about this a bit more. Your implementation won't work as it is assuming an interface function that isn't defined on the interface (both get_deep_Annotations() and _deep_annotation_helper()). Also, it does nearly the same as get_all_Annotations(), and passing on the keys to nested collections should maybe simply be an option to that method. Alternatively, one could add an option -recurse to get_Annotations. The other difference you note is that your method does not flatten the nested annotations, but unless I am missing something your implementation does flatten annotations from nested collections. 
So even if we need a separate method for this, something like get_nested_Annotations() would probably be a more appropriate name, and if we do need a separate method, it should be compelling enough to add it to the interface too (as otherwise your code will only work with certain implementation classes). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From mirhan at indiana.edu Tue Aug 12 00:00:28 2008 From: mirhan at indiana.edu (Han, Mira) Date: Tue, 12 Aug 2008 00:00:28 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu> Message-ID: Oh yes, I meant get_Annotations, I want a get_Annotations that is recursive and passes the keys to the recursive calls. On 8/11/08 11:54 PM, "Hilmar Lapp" wrote: Hi Mira - On Aug 11, 2008, at 11:46 PM, Han, Mira wrote: > Yes it is true that it's similar to get_all_Annotations, it's > basically a recursive version of it. I suppose you mean get_Annotations(), right? (get_all_Annotations() is already recursive) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From hlapp at duke.edu Mon Aug 11 23:54:43 2008 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 11 Aug 2008 23:54:43 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: References: Message-ID: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu> Hi Mira - On Aug 11, 2008, at 11:46 PM, Han, Mira wrote: > Yes it is true that it's similar to get_all_Annotations, it's > basically a recursive version of it. I suppose you mean get_Annotations(), right? 
(get_all_Annotations() is already recursive) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From mrphysh at juno.com Tue Aug 12 10:30:36 2008 From: mrphysh at juno.com (mrphysh at juno.com) Date: Tue, 12 Aug 2008 14:30:36 GMT Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem Message-ID: <20080812.083036.25924.0@webmail02.vgs.untd.com> I am studying bioperl and making progress. I have been struggling with the database retrieval from on-line databases. This is an example................ #!/usr/bin/perl -w use Bio::Perl; $seq_object = get_sequence('swiss',"ROA1_HUMAN"); write_sequence(">roa1.fasta",'fasta',$seq_object); exit; This script gives Can't locate IO/String.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at ee_bpo.pl line 12. BEGIN failed--compilation aborted at ee_bpo.pl line 12. I have chased around with the paths in @INC, using "use lib'. This is an install problem. The original installation was with perl Makefile.pl. I reinstalled over the old with cpan. stuff like this: cpan> o conf prerequisites_policy follow cpan> i /bioperl/ cpan> install Bundle::BioPerl cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz This all seemed to proceed smoothly. this guy did not produce an error. use Bio::Perl; I am almost thinking that the problem is with the perl. But regular ftp through perl works: use Net::FTP;#I found this in usr/share/perl/5.8.8/Net As a perl command this module seems to work. I looked in the archives and found nothing. I think I have done my homework. any ideas? I run Ubuntu on a pentium III (and love it). the version of Ubuntu is new. 
the Perl (and MySQL) came with the OS: perl 5.8.8 John Brigham in Denver. From jay at jays.net Tue Aug 12 11:08:59 2008 From: jay at jays.net (Jay Hannah) Date: Tue, 12 Aug 2008 10:08:59 -0500 Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem In-Reply-To: <20080812.083036.25924.0@webmail02.vgs.untd.com> References: <20080812.083036.25924.0@webmail02.vgs.untd.com> Message-ID: On Aug 12, 2008, at 2:30 PM, mrphysh at juno.com wrote: > Can't locate IO/String.pm in @INC ... > cpan> install Bundle::BioPerl > cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz > cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz > This all seemed to proceed smoothly bioperl-1.2.1 is very old. Apparently Bundle::BioPerl is out of date? Here's lots of info about installing BioPerl: http://www.bioperl.org/wiki/Getting_BioPerl I recommend using bioperl-live directly from SVN, but I'm sort of a rebel like that. :) Alternately, you could try just doing a cpan> install IO::String HTH, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From heikki at sanbi.ac.za Thu Aug 14 09:14:48 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 14 Aug 2008 15:14:48 +0200 Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ? Message-ID: <200808141514.49124.heikki@sanbi.ac.za> A generic method for retrieving nodes from a Bio::Tree::TreeI objects is Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id' attribute unless a field is given. I can retrieve nodes based on internal id like this: $tree->find_node(-internal_id => $internal_id); I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that retrieves by id. However, the POD documentation claims that it retrieves by internal id. What needs to be done? A. Fix the doc to speak about id B. 
Fix the code to retrieve by internal_id C. Fix the doc and create findnode_by_internal_id() C. Remove findnode_by_id() as redundant and confusing D. Deprecate findnode_by_id() as redundant and confusing There are no tests for findnode_by_id(), which to me tilts the selection towards D and A for now. Any other opinions? -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From hlapp at gmx.net Thu Aug 14 18:28:20 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 14 Aug 2008 18:28:20 -0400 Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API? In-Reply-To: <48A448DD.4000206@psb.ugent.be> References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com> <48A448DD.4000206@psb.ugent.be> Message-ID: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net> Hi Erick, how did you determine that go-perl is specific to GO? I've found it to work quite well for any kind of OBO-formatted ontology. Also, you note that BioPerl doesn't have the ability to write in certain formats, and to intersect and "unify" (would you mind explaining what you mean by that?) ontologies. It seems that your implementation of RDF etc export isn't really reusable or modular in any way, but I'd love to bring the intersection function over to BioPerl (BTW when you decided to roll your own ontology API, did you get the impression that BioPerl isn't receptive to you adding to it?). Would you mind pointing me to the place in the code where I would find that, as I can't seem to find it. 
-hilmar On Aug 14, 2008, at 11:01 AM, Erick Antezana wrote: > Hi Arne, > > if you plan to work with PERL, you might take a look at ONTO-PERL : > > http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn042 > http://search.cpan.org/dist/ONTO-PERL/ > http://search.cpan.org/src/EASR/ONTO-PERL-1.13/doc/example00.html > > ONTO-PERL has been used intensively to build the Cell Cycle Ontology. > > cheers, > Erick > > Arne Muller wrote: >> Dear All, >> >> I'm new to this list and don't know much about ontologies in general >> (I worked a bit with GO some time ago). >> >> Let me explain my problem: We have several related vocabularies >> (non-hierarchical and redundant because of different spellings etc >> ...) to describe organs and tissues in our department, and we need to >> map each of these vocabs to all of our other legacy vocabs that >> describe similar concepts. We'd like to use the adult mouse anatomy >> ontology and modify/extend it with additional terms (if necessary), >> synonyms and dbXrefs. Most of our vocabs should be mapped as dbXrefs >> to existing terms in the MA ontology. The goal is that different >> units >> in our department use slightly different vocabulary to describe >> samples, and we now need link these different system (always the same >> old story ... ;-). >> >> For the moment I'm not planning to turn our messy legacy vocabs into >> OBO formated ontologies and to map them via cross products and the >> OBO >> relation ontology - though this might be the most proper way to do >> ... (comments are welcome). >> >> I'll have to write an "easy to use" tool that allows our data curator >> to easily map the legacy vocabs as dbXrefs of terms in the MA >> ontology. The question is, how am I gonna do this? I've a fairly good >> idea of how my software (java webapp) should look like, but are there >> any APIs and implementations of the OBO model as well as a DB schema >> and mappings between the model and the schema? 
>> >> I've had a look into the OLS from the EBI that seems to be fairly >> simple (which is good ;-) and that uses the oboedit.jar somewhere at >> the back-end. I've also found something like an obo api on >> http://wiki.geneontology.org/index.php/OBO-Edit:_Getting_the_Source_Code#.28Optional.29_Getting_the_OBO_API_from_Subclipse >> but so far I've not found any documentation nor examples on how to >> get >> started. >> >> I'd be happy to hear how developers and bioinformatics people use obo >> in their own tools (I better ask before going DIY ...). >> >> thanks a lot for your comments and help >> +kind regards, >> >> Arne >> >> ------------------------------------------------------------------------ >> >> ------------------------------------------------------------------------- >> This SF.Net email is sponsored by the Moblin Your Move Developer's >> challenge >> Build the coolest Linux based applications with Moblin SDK & win >> great prizes >> Grand prize is a trip for two to an Open Source event anywhere in >> the world >> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Obo-discuss mailing list >> Obo-discuss at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/obo-discuss >> > > -- > ================================================================== > Erick Antezana http://www.cellcycleontology.org > PhD student > Tel:+32 (0)9 331 38 24 fax:+32 (0)9 3313809 > VIB Department of Plant Systems Biology, Ghent University > Technologiepark 927, 9052 Gent, BELGIUM > erant at psb.ugent.be http://www.psb.ugent.be/~erant > ================================================================== > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's > challenge > Build the coolest Linux based applications with Moblin SDK & win 
> great prizes > Grand prize is a trip for two to an Open Source event anywhere in > the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Obo-discuss mailing list > Obo-discuss at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/obo-discuss -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mjanis at chem.ucla.edu Thu Aug 14 19:37:05 2008 From: mjanis at chem.ucla.edu (Michael Janis) Date: Thu, 14 Aug 2008 16:37:05 -0700 Subject: [Bioperl-l] Code to contribute Message-ID: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu> Hi, I've had some perl code lying around for what seems like forever and I'd like to contribute it to bioperl, if such facilities don't already exist in bioperl. The code implements shuffling (DNA or RNA) keeping the dinucleotide composition (and codon usage) intact through an Eulerian path approach as described in Altschul and Erickson (1985). The code seeds the Eulerian paths by keeping the first and last nucleotide invariant in the shuffle - which has minimal detrimental effects to the purpose of the algorithm, in my experience. A quick search on the bioperl website shows that there is a mutation.pls script, and facilities for using Sean Eddy's SQUID C library, which implements the same function (I wrote this particular function before I knew how to use C). As such, it's probably not as elegant as Sean Eddy's implementation, but it works - and it's entirely in perl. The bioperl developer pages suggest a post to the mailing list as the best place to start contributing to bioperl. Is this a useful function to add to the project? 
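[To make the constraint concrete: a shuffle that preserves dinucleotide composition leaves the count of every overlapping 2-mer unchanged. The sketch below is not the code offered in this message and does not implement the Altschul and Erickson shuffle itself; it only checks the property such a shuffle is supposed to keep. The helper names dinuc_counts() and same_dinuc_composition() are hypothetical, and it uses core Perl only.]

```perl
use strict;
use warnings;

# Count every overlapping dinucleotide in a sequence string.
sub dinuc_counts {
    my ($seq) = @_;
    my %count;
    $count{ substr($seq, $_, 2) }++ for 0 .. length($seq) - 2;
    return \%count;
}

# True if both sequences contain exactly the same dinucleotides
# the same number of times (case-insensitive).
sub same_dinuc_composition {
    my ($x, $y) = @_;
    my ($cx, $cy) = (dinuc_counts(uc $x), dinuc_counts(uc $y));
    return 0 unless scalar(keys %$cx) == scalar(keys %$cy);
    for my $pair (keys %$cx) {
        return 0 unless exists $cy->{$pair} && $cy->{$pair} == $cx->{$pair};
    }
    return 1;
}
```

[For example, AGACA and ACAGA both consist of one each of AG, GA, AC and CA and share their first and last nucleotide, so one is a valid dinucleotide-preserving shuffle of the other. A check like this could serve as a starting point for the tests a contributed module would need.]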
Best Regards, Michael ------------------------------- Michael Janis mjanis at chem.ucla.edu ------------------------------- From rvos at interchange.ubc.ca Thu Aug 14 19:51:43 2008 From: rvos at interchange.ubc.ca (Rutger Vos) Date: Thu, 14 Aug 2008 16:51:43 -0700 Subject: [Bioperl-l] Fwd: Code to contribute In-Reply-To: <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com> References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu> <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com> Message-ID: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com> Sounds exciting! I bet the general advice you'll get is to i) check out the latest code from svn ii) see which bioperl objects/interfaces (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl iii) write a class that performs the algorithm as some sort of analysis factory taking the sequence object (or ideally object interface) as an input iv) run that class by the mailing list v) check it into svn. On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis wrote: > Hi, > > > > I've had some perl code lying around for what seems like forever and I'd > like to contribute it to bioperl, if such facilities don't already exist in > bioperl. The code implements shuffling (DNA or RNA) keeping the > dinucleotide composition (and codon usage) intact through a Eularian path > approach as described in Altschul and Erickson (1985). The code seeds the > Eularian paths by keeping the first and last nucleotide invariant in the > shuffle - which has minimal detrimental effects to the purpose of the > algorithm, in my experience. > > > > A quick search on the bioperl website shows that there is a mutation.pls > script, and facilities for using Sean Eddy's SQUID C library, which > implements the same function (I wrote this particular function before I knew > how to use C). As such, it's probably not as elegant as Sean Eddy's > implementation, but it works - and it's entirely in perl. 
> > > > The bioperl developer pages suggest a post to the mailing list as the best > place to start contributing to bioperl. Is this a useful function to add to > the project? > > > > Best Regards, > > > > Michael > > > > ------------------------------- > > Michael Janis > > mjanis at chem.ucla.edu > > ------------------------------- > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com From mjanis at chem.ucla.edu Thu Aug 14 19:55:04 2008 From: mjanis at chem.ucla.edu (Michael Janis) Date: Thu, 14 Aug 2008 16:55:04 -0700 Subject: [Bioperl-l] Fwd: Code to contribute In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com> References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu> <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com> <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com> Message-ID: <008701c8fe69$2cee6020$86cb2060$@ucla.edu> Thanks, Rutger, I'll do exactly that! (give me a few days) Best Regards, Michael ------------------------------- Michael Janis mjanis at chem.ucla.edu ------------------------------- -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos Sent: Thursday, August 14, 2008 4:52 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Fwd: Code to contribute Sounds exciting! I bet the general advice you'll get is to i) check out the latest code from svn ii) see which bioperl objects/interfaces (e.g. 
Bio::Seq) you'd use to integrate your algorithm into bioperl iii) write a class that performs the algorithm as some sort of analysis factory taking the sequence object (or ideally object interface) as an input iv) run that class by the mailing list v) check it into svn. On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis wrote: > Hi, > > > > I've had some perl code lying around for what seems like forever and I'd > like to contribute it to bioperl, if such facilities don't already exist in > bioperl. The code implements shuffling (DNA or RNA) keeping the > dinucleotide composition (and codon usage) intact through a Eularian path > approach as described in Altschul and Erickson (1985). The code seeds the > Eularian paths by keeping the first and last nucleotide invariant in the > shuffle - which has minimal detrimental effects to the purpose of the > algorithm, in my experience. > > > > A quick search on the bioperl website shows that there is a mutation.pls > script, and facilities for using Sean Eddy's SQUID C library, which > implements the same function (I wrote this particular function before I knew > how to use C). As such, it's probably not as elegant as Sean Eddy's > implementation, but it works - and it's entirely in perl. > > > > The bioperl developer pages suggest a post to the mailing list as the best > place to start contributing to bioperl. Is this a useful function to add to > the project? > > > > Best Regards, > > > > Michael > > > > ------------------------------- > > Michael Janis > > mjanis at chem.ucla.edu > > ------------------------------- > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com -- Dr. Rutger A. 
Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Thu Aug 14 21:17:23 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 15 Aug 2008 13:17:23 +1200 Subject: [Bioperl-l] Fwd: Code to contribute In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com> References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com> <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com> Message-ID: You forgot 2 points, vi) write documentation/examples, and vii) write tests ;-) Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Rutger Vos > Sent: Friday, 15 August 2008 11:52 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Fwd: Code to contribute > > Sounds exciting! I bet the general advice you'll get is to i) check > out the latest code from svn ii) see which bioperl objects/interfaces > (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl > iii) write a class that performs the algorithm as some sort of > analysis factory taking the sequence object (or ideally object > interface) as an input iv) run that class by the mailing list v) check > it into svn. > > On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis wrote: > > Hi, > > > > > > > > I've had some perl code lying around for what seems like forever and I'd > > like to contribute it to bioperl, if such facilities don't already exist in > > bioperl. The code implements shuffling (DNA or RNA) keeping the > > dinucleotide composition (and codon usage) intact through a Eularian path > > approach as described in Altschul and Erickson (1985). 
The code seeds the > > Eularian paths by keeping the first and last nucleotide invariant in the > > shuffle - which has minimal detrimental effects to the purpose of the > > algorithm, in my experience. > > > > > > > > A quick search on the bioperl website shows that there is a mutation.pls > > script, and facilities for using Sean Eddy's SQUID C library, which > > implements the same function (I wrote this particular function before I knew > > how to use C). As such, it's probably not as elegant as Sean Eddy's > > implementation, but it works - and it's entirely in perl. > > > > > > > > The bioperl developer pages suggest a post to the mailing list as the best > > place to start contributing to bioperl. Is this a useful function to add to > > the project? > > > > > > > > Best Regards, > > > > > > > > Michael > > > > > > > > ------------------------------- > > > > Michael Janis > > > > mjanis at chem.ucla.edu > > > > ------------------------------- > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Dr. Rutger A. Vos > Department of zoology > University of British Columbia > http://www.nexml.org > http://rutgervos.blogspot.com > > > > -- > Dr. Rutger A. Vos > Department of zoology > University of British Columbia > http://www.nexml.org > http://rutgervos.blogspot.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From mirhan at indiana.edu Fri Aug 15 02:11:46 2008 From: mirhan at indiana.edu (Han, Mira) Date: Fri, 15 Aug 2008 02:11:46 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: Message-ID: Hi, I've renamed get_deep_Annotations() to get_nested_Annotations(). It takes the arguments -keys and -recursive, and behaves exactly like get_Annotations() when -recursive is not set (tested by replacing the get_Annotations() calls in Annotation.t). I made it a new function instead of modifying get_Annotations() because I wasn't sure how to modify it in a backwards-compatible way. I thought of adding the function to the interface AnnotationCollectionI, but it seemed that get_all_Annotations() was missing from the interface as well, so I decided to ask whether it should be added to the interface at all. Isn't it possible that certain implementations of the interface have functions that are specific to that implementation? Mira On 8/12/08 12:00 AM, "Mira Han" wrote: Oh yes, I meant get_Annotations, I want a get_Annotations that is recursive and passes the keys to the recursive calls. On 8/11/08 11:54 PM, "Hilmar Lapp" wrote: Hi Mira - On Aug 11, 2008, at 11:46 PM, Han, Mira wrote: > Yes it is true that it's similar to get_all_Annotations, it's > basically a recursive version of it. I suppose you mean get_Annotations(), right? 
(get_all_Annotations() is already recursive) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at illinois.edu Fri Aug 15 09:59:42 2008 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 15 Aug 2008 08:59:42 -0500 Subject: [Bioperl-l] Fwd: Code to contribute In-Reply-To: References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com> <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com> Message-ID: Agreed! We're hoping to move to a more structured core after 1.6, which will require decent documentation and tests for inclusion. My feeling is one should feel free to add code samples to relevant pages/sections in the BioPerl wiki, or write up your own HOWTO (it's not terribly hard to do, and it adds to your karma). chris On Aug 14, 2008, at 8:17 PM, Smithies, Russell wrote: > You forgot 2 points, > > vi) write documentation/examples, and vii) write tests > > ;-) > > > > Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of Rutger Vos >> Sent: Friday, 15 August 2008 11:52 a.m. >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Fwd: Code to contribute >> >> Sounds exciting! I bet the general advice you'll get is to i) check >> out the latest code from svn ii) see which bioperl objects/interfaces >> (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl >> iii) write a class that performs the algorithm as some sort of >> analysis factory taking the sequence object (or ideally object >> interface) as an input iv) run that class by the mailing list v) >> check >> it into svn. 
>> >> On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis > wrote: >>> Hi, >>> >>> >>> >>> I've had some perl code lying around for what seems like forever and > I'd >>> like to contribute it to bioperl, if such facilities don't already > exist in >>> bioperl. The code implements shuffling (DNA or RNA) keeping the >>> dinucleotide composition (and codon usage) intact through a Eularian > path >>> approach as described in Altschul and Erickson (1985). The code > seeds the >>> Eularian paths by keeping the first and last nucleotide invariant in > the >>> shuffle - which has minimal detrimental effects to the purpose of > the >>> algorithm, in my experience. >>> >>> >>> >>> A quick search on the bioperl website shows that there is a > mutation.pls >>> script, and facilities for using Sean Eddy's SQUID C library, which >>> implements the same function (I wrote this particular function > before I knew >>> how to use C). As such, it's probably not as elegant as Sean Eddy's >>> implementation, but it works - and it's entirely in perl. >>> >>> >>> >>> The bioperl developer pages suggest a post to the mailing list as > the best >>> place to start contributing to bioperl. Is this a useful function > to add to >>> the project? >>> >>> >>> >>> Best Regards, >>> >>> >>> >>> Michael >>> >>> >>> >>> ------------------------------- >>> >>> Michael Janis >>> >>> mjanis at chem.ucla.edu >>> >>> ------------------------------- >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> Dr. Rutger A. Vos >> Department of zoology >> University of British Columbia >> http://www.nexml.org >> http://rutgervos.blogspot.com >> >> >> >> -- >> Dr. Rutger A. 
Vos >> Department of zoology >> University of British Columbia >> http://www.nexml.org >> http://rutgervos.blogspot.com >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at illinois.edu Fri Aug 15 10:12:10 2008 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 15 Aug 2008 09:12:10 -0500 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: References: Message-ID: <3371D481-1416-4976-9846-83CF83395039@illinois.edu> The method get_all_annotation_keys() is present in AnnotationCollectionI but not get_all_Annotations(), though I doubt it is set up for recursive retrieval (something that might be worth testing). I don't have a problem adding this in. Hilmar, thoughts? 
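A toy sketch in plain Perl, not BioPerl code, of the retrieval described in this thread: a key-filtered lookup that takes -keys and -recursive and passes the same keys down into nested collections. The function name get_nested and the hash-of-lists data model are assumptions made for illustration only. A real Bio::Annotation::Collection stores annotation objects, and when -recursive is off this toy simply skips nested collections rather than returning them.

```perl
use strict;
use warnings;

# Toy model only: a "collection" is a hash mapping a key to a list of values.
# A value is either a plain string or a nested collection (a hash ref).
sub get_nested {
    my ($coll, %args) = @_;
    my %want      = map { $_ => 1 } @{ $args{-keys} || [] };
    my $recursive = $args{-recursive};
    my @found;
    for my $key (sort keys %$coll) {
        for my $value (@{ $coll->{$key} }) {
            if (ref $value eq 'HASH') {
                # Descend only on request, handing the same keys to the
                # recursive call; otherwise the nested collection is skipped.
                push @found, get_nested($value, %args) if $recursive;
            }
            elsif (!%want || $want{$key}) {
                # Plain value: keep it if no keys were given or its key matches.
                push @found, $value;
            }
        }
    }
    return @found;
}

my $coll = {
    comment   => ['top-level comment'],
    reference => [ { comment => ['nested comment'], dblink => ['nested link'] } ],
};

print join(', ', get_nested($coll, -keys => ['comment'])), "\n";
print join(', ', get_nested($coll, -keys => ['comment'], -recursive => 1)), "\n";
```

Without -recursive only the top-level comment comes back; with it, the nested comment is found as well, even though the key of the enclosing collection ('reference') does not match.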
chris On Aug 15, 2008, at 1:11 AM, Han, Mira wrote: > > Hi, > I've fixed the get_deep_Annotations() to get_nested_Annotations() > It has arguments -keys and -recursive, > And behaves exactly like get_Annotations() when recursive is not set > (tested by replacing the get_Annotations() in the Annotation.t) > I made it a new function instead of modifying get_Annotations() > because I wasn't sure how to modify it to be backwards compatible. > I thought of adding the function to the interface > AnnotationCollectionI, > But it seemed like get_all_Annotations() was missing from the > interface as well, > So decided to ask if it should be added to the interface at all. > Isn't it possible that certain implementations of the interface has > functions that are only specific to that implementation? > > Mira > > > On 8/12/08 12:00 AM, "Mira Han" wrote: > > > Oh yes, > I meant get_Annotations, > I want a get_Annotations that is recursive and passes the keys to > the recursive calls. > > > > On 8/11/08 11:54 PM, "Hilmar Lapp" wrote: > > Hi Mira - > > On Aug 11, 2008, at 11:46 PM, Han, Mira wrote: > >> Yes it is true that it's similar to get_all_Annotations, it's >> basically a recursive version of it. > > > I suppose you mean get_Annotations(), right? (get_all_Annotations() is > already recursive) > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : > =========================================================== > > > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From jorvis at gmail.com Fri Aug 15 15:45:23 2008 From: jorvis at gmail.com (Joshua Orvis) Date: Fri, 15 Aug 2008 15:45:23 -0400 Subject: [Bioperl-l] help creating de novo GFF3 Message-ID: I don't have a lot of experience with Bioperl and have used it mostly for simple format conversions or parsing Genbank files. 
I need to create a quick script to create GFF3 and decided to give bioperl a try again instead of just printing the columns myself but have had a few problems. My apologies for the narrative here but I know it can sometimes be informative to hear 'how' a user arrived at a problem rather than just knowing the problem itself.

Is there a documented explicit mapping between the GFF3 columns and the predefined tags (ID, Name, etc.) and their Bioperl object attribute equivalents? Is it preferable to create Bio::SeqFeature::Generic objects and pass them to Bio::Tools::GFF->write_feature or rather to create Bio::SeqFeature::Annotated and pass them to Bio::FeatureIO::gff?

I may be overlooking it, but a simple tutorial showing how to create and define a new sequence object, attach annotations to it and dump in GFF format seems to be missing. This seems like a basic thing to do - most of the documentation I find is about converting between formats rather than creating new annotation.

Here are some of the problems I (a typical naive user?) ran into when adventuring with bioperl here. My first attempt resulted in the string "SEQ" as column 0 in all my GFF output. I thought that maybe this was because my features weren't 'attached' to a sequence, so I created a Bio::Seq::RichSeq object and tried both (separately): $seq->add_SeqFeature( $feat ); and $feat->attach_seq( $seq ); Neither changed the first column of output. Looking at the docs.bioperl.org methods for Bio::SeqFeature::Generic I found the seq_id attribute, which came with the warning: "This attribute should *not* be used in GFF dumping" - but since it's the only thing I did that worked, I used it anyway.

Next I wanted to have ID tags within my last column. I first tried setting all the relevant attributes I could see on my features (id, primary_tag, display_name, display_id, etc.) but none of these caused ID=? to be created.
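For comparison, the "just printing the columns myself" route mentioned above can be sketched in plain Perl. This is a minimal sketch under stated assumptions, not a BioPerl API: the helper names gff3_escape and gff3_line and all feature values are hypothetical. It writes the nine tab-separated GFF3 columns, puts ID and Parent first in column 9, percent-encodes the characters GFF3 reserves there, and orders the coordinates so that start <= end, leaving orientation to the strand column.

```perl
use strict;
use warnings;

# Percent-encode the characters that GFF3 reserves inside column 9
# (tab, newline, carriage return, %, ;, =, & and ,).
sub gff3_escape {
    my ($text) = @_;
    $text =~ s/([\t\n\r%;=&,])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build one tab-separated GFF3 line. Hypothetical helper, not a BioPerl call.
sub gff3_line {
    my (%f) = @_;

    # GFF3 wants start <= end; orientation is carried by the strand column.
    my ($start, $end) = sort { $a <=> $b } ($f{start}, $f{end});

    # Write ID first, then Parent, then the other attributes alphabetically,
    # skipping attributes whose value is undefined.
    my $attrs = $f{attributes} || {};
    my %rank  = (ID => 0, Parent => 1);
    my @keys  = sort {
        (exists $rank{$a} ? $rank{$a} : 2) <=> (exists $rank{$b} ? $rank{$b} : 2)
            or $a cmp $b
    } grep { defined $attrs->{$_} } keys %$attrs;
    my $col9 = join ';', map { $_ . '=' . gff3_escape($attrs->{$_}) } @keys;

    # Undefined or empty columns are written as '.'.
    my @cols = map { (defined($_) && length($_)) ? $_ : '.' } (
        $f{seqid}, $f{source}, $f{type}, $start, $end,
        $f{score}, $f{strand}, $f{phase}, $col9,
    );
    return join("\t", @cols) . "\n";
}

print "##gff-version 3\n";
print gff3_line(
    seqid      => 'contig1',
    type       => 'gene',
    start      => 900,
    end        => 100,
    strand     => '-',
    attributes => { ID => 'gene0001', Name => 'abc1', product_name => 'kinase; putative' },
);
```

Printing by hand gives full control over the tag names in the last column, at the cost of bypassing whatever validation the BioPerl writers perform.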
Next, I tried something like this:

  my $feat = new Bio::SeqFeature::Annotated (
      -start   => $start,
      -end     => $end,
      -strand  => $strand,
      -primary => 'gene',
      -seq_id  => $asmbl_id,  ## this works but is discouraged
      -tag     => {
          ID           => $transcript->{pub_locus},
          product_name => $transcript->{com_name},
          ec_number    => $transcript->{'ec#'},
          gene_symbol  => $transcript->{gene_sym}
      }
  );

My hopes that passing it via the -tag option would do the trick failed, as it created a line like this instead:

  10263 . gene 58512 56983 . + . iD=AN9220.4;

Notice the 'ID' -> 'iD' transformation (without any command-line warnings). I'm still stuck on this one (Parent would be next) but overall guidance or pointers to a tutorial/documentation I'm overlooking would be great. JO

From jason at bioperl.org Fri Aug 15 19:00:04 2008 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Aug 2008 16:00:04 -0700 Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ? In-Reply-To: <200808141514.49124.heikki@sanbi.ac.za> References: <200808141514.49124.heikki@sanbi.ac.za> Message-ID: I think D. There should probably only be one find_node function. findnode_by_id was written by Ramiro for the re-rooting code and I guess it wasn't checked to reduce unneeded functions. I don't have any problems removing/deprecating it but will need to update the code that depends on it to use find_node properly. -jason On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote: > A generic method for retrieving nodes from a Bio::Tree::TreeI > objects is > Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the > 'id' > attribute unless a field is given. I can retrieve nodes based on > internal id > like this: > > $tree->find_node(-internal_id => $internal_id); > > I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that > retrieves by id. > However, the POD documentation claims that it retrieves by internal > id. > > What needs to be done? > > A. Fix the doc to speak about id > B. Fix to code to retrieve by internal_id > C.
Fix the doc and create findnode_by_internal_id() > C. Remove findnode_by_id() as redundant and confusing > D. Deprecate findnode_by_id() as redundant and confusing > > There are no tests for findnode_by_id() which to me tilts selection > to D and A > for now. > > Any other opinions? > > -Heikki > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From schmidtc at udel.edu Fri Aug 15 19:42:44 2008 From: schmidtc at udel.edu (Carl Schmidt) Date: Fri, 15 Aug 2008 19:42:44 -0400 Subject: [Bioperl-l] lazy symbol binding Message-ID: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu> When I attempt bp_load_gff.pl I get the following error: dyld: lazy symbol binding failed: Symbol not found: _mysql_init Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/ auto/DBD/mysql/mysql.bundle Expected in: dynamic lookup dyld: Symbol not found: _mysql_init Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/ auto/DBD/mysql/mysql.bundle Expected in: dynamic lookup Trace/BPT trap Any suggestions? I apologize if this is the wrong place for posting such a question. Thanks Carl Carl J. 
Schmidt Department of Animal & Food Sciences University of Delaware Newark, DE 19716 schmidtc at udel.edu http://copland.udel.edu/~schmidtc From rvos at interchange.ubc.ca Fri Aug 15 20:11:48 2008 From: rvos at interchange.ubc.ca (Rutger Vos) Date: Fri, 15 Aug 2008 17:11:48 -0700 Subject: [Bioperl-l] Fwd: lazy symbol binding In-Reply-To: <2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com> References: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu> <2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com> Message-ID: <2bb9b24a0808151711m5eba5500k468a0effc711f3c@mail.gmail.com> ---------- Forwarded message ---------- From: Rutger Vos Date: Fri, Aug 15, 2008 at 5:11 PM Subject: Re: [Bioperl-l] lazy symbol binding To: Carl Schmidt This looks like a faulty install of the combo of DBI/DBD::mysql Specifically, the perl driver (DBD::mysql) for the mysql database builds a dynamic library on installation, which the DBI interface tries to load dynamically, but fails. Unfortunately, this is by no means bioperl related. Try a reinstall as per the instructions at: http://search.cpan.org/~capttofu/DBD-mysql-4.008/lib/DBD/mysql/INSTALL.pod On Fri, Aug 15, 2008 at 4:42 PM, Carl Schmidt wrote: > When I attempt bp_load_gff.pl > I get the following error: > > dyld: lazy symbol binding failed: Symbol not found: _mysql_init > Referenced from: > /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle > Expected in: dynamic lookup > > dyld: Symbol not found: _mysql_init > Referenced from: > /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle > Expected in: dynamic lookup > > Trace/BPT trap > > Any suggestions? I apologize if this is the wrong place for posting such a > question. > > Thanks > Carl > > Carl J. 
Schmidt > Department of Animal & Food Sciences > University of Delaware > Newark, DE 19716 > schmidtc at udel.edu > http://copland.udel.edu/~schmidtc > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com From hlapp at duke.edu Sat Aug 16 13:43:46 2008 From: hlapp at duke.edu (Hilmar Lapp) Date: Sat, 16 Aug 2008 13:43:46 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: <3371D481-1416-4976-9846-83CF83395039@illinois.edu> References: <3371D481-1416-4976-9846-83CF83395039@illinois.edu> Message-ID: On Aug 15, 2008, at 10:12 AM, Chris Fields wrote: > The method get_all_annotation_keys() is present in > AnnotationCollectionI but not get_all_Annotations() Interesting. I wonder whether that was a result of the forward and reverse surgeries done to the Annotation* interfaces. (I'm off-line as I write this so can't check.) > , though I doubt it is set up for recursive retrieval (something > that might be worth testing). I don't have a problem adding this > in. Hilmar, thoughts? get_all_Annotations() has always been recursive (similarly as get_all_SeqFeatures() is for SeqI). However, the recursive behavior is different from the behavior that Mira wants. Specifically, if get_all_Annotations() finds a nested collection under a matching tag, it will consider the entire nested collection as match, and returns the recursively flattened out annotation objects it contains. What Mira needs (if I am understanding her implementation correctly) is recursively retrieving annotations if their tag matches the query key or set of keys. 
I.e., a nested collection would be searched for matching tags even if the tag of the collection itself does not match, and if it does match, only those of the contained annotations would be returned that have matching tags. I'm not sure whether it's better to fold both behaviors into one method which has an optional argument to control which one is desired, or to have two methods. I'm leaning towards having two methods, because support for an added optional argument in external implementations of the interface is hard to test for, as opposed to testing for the presence/absence of a new method. On the other hand, if the existing method wasn't even on the interface to begin with (which I'm not yet convinced about) then that shouldn't really be an issue. It seems Chris you are also for adding a second method (and putting get_all_Annotations() (back) into the interface)? Does anyone else have thoughts or preferences on this? -hilmar > > > chris > > On Aug 15, 2008, at 1:11 AM, Han, Mira wrote: > >> >> Hi, >> I've fixed the get_deep_Annotations() to get_nested_Annotations() >> It has arguments -keys and -recursive, >> And behaves exactly like get_Annotations() when recursive is not >> set (tested by replacing the get_Annotations() in the Annotation.t) >> I made it a new function instead of modifying get_Annotations() >> because I wasn't sure how to modify it to be backwards compatible. >> I thought of adding the function to the interface >> AnnotationCollectionI, >> But it seemed like get_all_Annotations() was missing from the >> interface as well, >> So decided to ask if it should be added to the interface at all. >> Isn't it possible that certain implementations of the interface has >> functions that are only specific to that implementation? >> >> Mira >> >> >> On 8/12/08 12:00 AM, "Mira Han" wrote: >> >> >> Oh yes, >> I meant get_Annotations, >> I want a get_Annotations that is recursive and passes the keys to >> the recursive calls. 
>> >> >> >> On 8/11/08 11:54 PM, "Hilmar Lapp" wrote: >> >> Hi Mira - >> >> On Aug 11, 2008, at 11:46 PM, Han, Mira wrote: >> >>> Yes it is true that it's similar to get_all_Annotations, it's >>> basically a recursive version of it. >> >> >> I suppose you mean get_Annotations(), right? (get_all_Annotations() >> is >> already recursive) >> >> -hilmar >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : >> =========================================================== >> >> >> >> >> >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From heikki at sanbi.ac.za Sun Aug 17 03:02:31 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Sun, 17 Aug 2008 09:02:31 +0200 Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ? In-Reply-To: References: <200808141514.49124.heikki@sanbi.ac.za> Message-ID: <200808170902.32485.heikki@sanbi.ac.za> Done. findnode_by_id() was not and is not used anywhere in BioPerl core code. Thanks for input, -Heikki On Saturday 16 August 2008 01:00:04 Jason Stajich wrote: > I think D. > > There should probably only be one find_node function. > > findnode_by_id was written by Ramiro for the re-rooting code and I > guess it wasn't checked to reduce unneeded functions. I don't have > any problems removing/deprecating it but will need to update the code > that depends on it to use find_node properly. 
> > -jason > > On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote: > > A generic method for retrieving nodes from a Bio::Tree::TreeI > > objects is > > Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the > > 'id' > > attribute unless a field is given. I can retrieve nodes based on > > internal id > > like this: > > > > $tree->find_node(-internal_id => $internal_id); > > > > I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that > > retrieves by id. > > However, the POD documentation claims that it retrieves by internal > > id. > > > > What needs to be done? > > > > A. Fix the doc to speak about id > > B. Fix to code to retrieve by internal_id > > C. Fix the doc and create findnode_by_internal_id() > > C. Remove findnode_by_id() as redundant and confusing > > D. Deprecate findnode_by_id() as redundant and confusing > > > > There are no tests for findnode_by_id() which to me tilts selection > > to D and A > > for now. > > > > Any other opinions? > > > > -Heikki > > > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: 
heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From mike.thon at gmail.com Mon Aug 18 01:01:43 2008 From: mike.thon at gmail.com (Michael Thon) Date: Mon, 18 Aug 2008 07:01:43 +0200 Subject: [Bioperl-l] Build.PL options? Message-ID: Hi all - I am trying to write a port of bioperl 1.5.2 to enable its installation with the macports system (www.macports.org). I'm not too familiar with the Build.PL build system - is there any way to disable the dependency resolution that the build script does (i.e. without patching the script)?
Thanks Mike From David.Messina at sbc.su.se Mon Aug 18 03:37:53 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 18 Aug 2008 09:37:53 +0200 Subject: [Bioperl-l] Build.PL options? In-Reply-To: References: Message-ID: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com> Hi Mike, Great to hear you're planning add BioPerl to MacPorts. Thanks! I'm afraid I don't know how to disable the dependency resolution, but while you're waiting for others to chime in , here are a couple of things you might try if you haven't already: - The Build.PL for BioPerl is configured to automatically write out a Makefile.PL for you. If you're more familiar with MakeFile.PL, then you could work with that. Module::Build tries to maintain a certain level of cross-compatibility with ExtUtils::MakeMaker; how do you disable dependency resolution with the Makefile.PL system? - The Build.PL API is pretty flexible, and it's easy to change just about any behavior by passing parameters to Build.PL on the command line. See http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/API.pod and http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/Cookbook.pm I'm curious -- it seems to me that a major purpose of Build.PL and Makefile.PL is specifying dependencies (and installing them where necessary). Does MacPorts override that and do its own dependency-checking for Perl modules? Dave From neetisomaiya at gmail.com Mon Aug 18 07:45:38 2008 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 18 Aug 2008 17:15:38 +0530 Subject: [Bioperl-l] need help in parsing KEGG data Message-ID: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com> Hi, I am fetching data from the ent gene file of KEGG which is available here : ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch gene names and pathways in which they participate. I am getting the gene names fine. 
But this method "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){ }" does'nt seem to be working. I am not able to get the data of the pathways in which the gene is involved. Can someone please suggest how I can get the pathway data of genes from the KEGG ent file?? Thanks. -- -Neeti Even my blood says, B positive From johnsonm at gmail.com Mon Aug 18 09:26:26 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 18 Aug 2008 08:26:26 -0500 Subject: [Bioperl-l] need help in parsing KEGG data In-Reply-To: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com> References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com> Message-ID: On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya wrote: > I am fetching data from the ent gene file of KEGG which is available here : > ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent > > I am using Bio::SeqIO with file format of type KEGG.
I am trying to fetch > gene names and pathways in which they participate. I am getting the gene > names fine. But this method > > "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){ > }" > > does'nt seem to be working. I am not able to get the data of the pathways in > which the gene is involved. > > Can someone please suggest how I can get the pathway data of genes from the > KEGG ent file??

What exactly do you mean by "doesn't seem to be working" and what version of BioPerl are you using? The code below seems to function as expected with BioPerl 1.5.2, producing output like this:

  hsa04612 Antigen processing and presentation
  hsa01430 Cell Communication
  hsa04020 Calcium signaling pathway
  hsa04080 Neuroactive ligand-receptor interaction
  hsa04540 Gap junction
  ...
  ...
  ...

  #!/wherever/bin/perl

  use strict;
  use warnings;

  use Bio::SeqIO;

  my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);

  while (my $seq = $seqio->next_seq()) {
      foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {
          ## $pathway should be a Bio::Annotation::Comment
          print $pathway->text(), "\n";
      }
  }

From johnsonm at gmail.com Mon Aug 18 09:29:38 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 18 Aug 2008 08:29:38 -0500 Subject: [Bioperl-l] need help in parsing KEGG data In-Reply-To: References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com> Message-ID: On Mon, Aug 18, 2008 at 8:26 AM, Mark Johnson wrote: > What exactly do you mean by "doesn't seem to be working" and what > version of BioPerl are you using? The code below seems to function as > expected with BioPerl 1.5.2, producing output like this: Note that I downloaded ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent and provided the path to it as the argument to the script.

From mike.thon at gmail.com Mon Aug 18 15:48:02 2008 From: mike.thon at gmail.com (Michael Thon) Date: Mon, 18 Aug 2008 21:48:02 +0200 Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com> References: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com> Message-ID: <5BE575B1-67E1-41B1-AB2F-728558377DDA@gmail.com> On Aug 18, 2008, at 9:37 AM, Dave Messina wrote: > > I'm curious -- it seems to me that a major purpose of Build.PL and > Makefile.PL is specifying dependencies (and installing them where > necessary). Does MacPorts override that and do its own dependency- > checking for Perl modules? > Hi Dave - Thanks for the links - I will check them out. MacPorts can resolve dependencies that are specified for each package, much like rpm and other Linux packaging systems, so it's probably better to disable the dependency resolution in the bioperl build script and let MacPorts handle them. It looks like I can patch the Build.PL script pretty easily, unless I can find a better way. Mike From johnsonm at gmail.com Mon Aug 18 16:53:48 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 18 Aug 2008 15:53:48 -0500 Subject: [Bioperl-l] Bio::Annotation issues with BioSQL Message-ID: I'm presently refactoring an in-house protein annotation pipeline and converting it to use BioSQL as a data store. I've noticed some slightly screwy behavior with regard to how some of the Bio::Annotation classes are handled:

- Instances of Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue attached to the annotation collection for a sequence feature (Bio::SeqFeature::Generic) are converted to tags/values on the feature.
- Instances of Bio::Annotation::DBLink with attached comments lose the comment.
I'm storing and retrieving things thusly:

    my $dbadp = Bio::DB::BioDB->new(
        -database => 'biosql',
        -user     => $user,
        -pass     => $pass,
        -dbname   => $ora_instance,
        -driver   => 'Oracle'
    );

    my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

    my $seq = Bio::Seq->new(
        -id               => 'DEBUG001',
        -accession_number => 'DBG001',
        -desc             => 'Debug Sequence',
        -seq              => 'GATTACA',
        -namespace        => 'DEBUG',
    );

    my $feature = Bio::SeqFeature::Generic->new(
        -seq_id       => 'DEBUG001',
        -display_name => 'FEAT0001',
        -primary      => 'debug',
        -source       => 'test',
        -start        => 3,
        -end          => 5,
        -strand       => 1,
    );

    my $dblink = Bio::Annotation::DBLink->new(
        -database   => 'FAKE001',
        -primary_id => 'FK1234567890',
        -comment    => 'This is a fake comment',
    );

    $feature->annotation->add_Annotation('ANNO0001', $dblink);

    $seq->add_SeqFeature($feature);

    my $pseq = $dbadp->create_persistent($seq);

    $pseq->store();

    $adp->commit();


    my $dbadp = Bio::DB::BioDB->new( ... );

    my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

    my $query = Bio::DB::Query::BioQuery->new();

    $query->datacollections([ "Bio::PrimarySeqI s", ]);

    $query->where(["s.display_id like DEBUG%'"]);

    my $result = $adp->find_by_query($query);

    while (my $seq = $result->next_object()) {

        my @features = $seq->get_SeqFeatures();

        foreach my $feature (@features) {

            ## Contents of Bio::Annotation::SimpleValue and
            ## Bio::Annotation::StructuredValue have migrated to
            ## tag/value pairs on $feature and are missing from
            ## $annotation_collection.
            ##
            ## Comments have gone missing from Bio::Annotation::DBLink,
            ## but DBLinks are otherwise intact and present.
            my $annotation_collection = $feature->annotation();
            ...
            ...

        }

    }

Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?
Is there even a place in the BioSQL schema for a comment to be attached
to a DBLink?
From neetisomaiya at gmail.com Tue Aug 19 00:31:28 2008 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 19 Aug 2008 10:01:28 +0530 Subject: [Bioperl-l] need help in parsing KEGG data In-Reply-To: References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com> Message-ID: <764978cf0808182131p620a2dedu40b651be50be5b3c@mail.gmail.com> Thanks a lot for the reply. It was a problem of the bioperl version. ~Neeti. On Mon, Aug 18, 2008 at 6:56 PM, Mark Johnson wrote: > On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya > wrote: > > > I am fetching data from the ent gene file of KEGG which is available here > : > > ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent > > > > I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch > > gene names and pathways in which they participate. I am getting the gene > > names fine. But this method > > > > "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){ > > }" > > > > does'nt seem to be working. I am not able to get the data of the pathways > in > > which the gene is involved. > > > > Can someone please suggest how I can get the pathway data of genes from > the > > KEGG ent file?? > > What exactly do you mean by "doesn't seem to be working" and what > version of BioPerl are you using? The code below seems to function as > expected with BioPerl 1.5.2, producing output like this: > > hsa04612 Antigen processing and presentation > hsa01430 Cell Communication > hsa04020 Calcium signaling pathway > hsa04080 Neuroactive ligand-receptor interaction > hsa04540 Gap junction > ... > ... > ... 
> > #!/wherever/bin/perl > > use strict; > use warnings; > > use Bio::SeqIO; > > > my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]); > > while (my $seq = $seqio->next_seq()) { > > foreach my $pathway ($seq->annotation->get_Annotations('pathway')) { > > ## $pathway should be a Bio::Annotation::Comment > print $pathway->text(), "\n"; > > } > > } > -- -Neeti Even my blood says, B positive From wgallin at ualberta.ca Tue Aug 19 02:25:27 2008 From: wgallin at ualberta.ca (Warren Gallin) Date: Tue, 19 Aug 2008 00:25:27 -0600 Subject: [Bioperl-l] EUtilities help Message-ID: Hi, Is there a cogent document on using Bio::DB::EUtilities with Bioperl 1.5.2 around somewhere? We upgraded and now my scripts are broken when invoking it. Any pointers appreciated. Thanks, Warren Gallin From David.Messina at sbc.su.se Tue Aug 19 03:30:26 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 19 Aug 2008 09:30:26 +0200 Subject: [Bioperl-l] EUtilities help In-Reply-To: References: Message-ID: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com> Hi Warren, Are you upgrading to 1.5.2 or downgrading from bioperl-live? If the former, you might consider going all the way to bioperl-live, whose EUtilities support is improved significantly and documented extensively here: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook If the latter, I don't believe there is, but Chris Fields will know for sure and will probably chime in. Dave To get bioperl-live as a tarball: http://www.bioperl.org/DIST/nightly_builds/ or via Subversion: http://www.bioperl.org/wiki/Using_Subversion From bix at sendu.me.uk Tue Aug 19 04:34:50 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Aug 2008 09:34:50 +0100 Subject: [Bioperl-l] Build.PL options? In-Reply-To: References: Message-ID: <48AA85AA.7010405@sendu.me.uk> Michael Thon wrote: > Hi all - I am trying to write a port of bioperl 1.5.2 to enable its > installation with the macports system (www.macports.org). 
I'm not too > familiar with the Build.PL build system - is there any way to disable > the dependency resolution that the build script does (i.e. without > patching the script)? How else will you be doing the dependency resolution? If your system just installs all dependencies itself beforehand, then dependency resolution won't be invoked in Build.PL :) Otherwise, don't run Build.PL at all (why do you need to?), or have your system answer default to all questions: echo | perl Build.PL From mike.thon at gmail.com Tue Aug 19 11:16:03 2008 From: mike.thon at gmail.com (Michael Thon) Date: Tue, 19 Aug 2008 17:16:03 +0200 Subject: [Bioperl-l] Build.PL options? In-Reply-To: <48AA85AA.7010405@sendu.me.uk> References: <48AA85AA.7010405@sendu.me.uk> Message-ID: > How else will you be doing the dependency resolution? If your system > just installs all dependencies itself beforehand, then dependency > resolution won't be invoked in Build.PL :) > Otherwise, don't run Build.PL at all (why do you need to?), Good point. Now, why didn't I think of that? :) Does Build.PL do anything other than copy Perl modules (and install dependencies)? > or have your system answer default to all questions: > echo | perl Build.PL > From downloadondemand at gmail.com Thu Aug 14 19:06:04 2008 From: downloadondemand at gmail.com (N) Date: Fri, 15 Aug 2008 02:06:04 +0300 Subject: [Bioperl-l] HOWTO:Graphics/BLAST output Message-ID: <923c9ce30808141606k61d9cc23nb18e55dec5112ac4@mail.gmail.com> Hello again! Followed HOWTO:Graphics and encountered problem. While parsing blast output i have clearly offending for me part of file. The problem is that in this hit there are two HSPs, but the second one is in "not right" strand orientation (Plus/Plus vs. Plus/Minus). How can i tell bioperl to use only HSPs oriented along with the best-scored HSP? Or better, althrough not related to this mailing list: How can i tell BLAST to put this second HSP to a separate hit? What am i doing/thinking wrong? 
Attached small png with problem. The second HSP is in white, but it is biologically without sense, isn't it? Thanks. BLASTN 2.2.18 [Mar-02-2008] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= gi|145226209 (471 letters) Database: ../database/1000 24 sequences; 19,482 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value gi|145226176 hypothetical_protein 599 e-174 gi|145226174 hypothetical_protein 30 0.007 gi|145226175 ATP-dependent_exoDNAse_(exonuclease_V)_alpha_subuni... 26 0.11 gi|145226190 hypothetical_protein 24 0.43 gi|145226195 protein_of_unknown_function_DUF1526 22 1.7 gi|145226193 hypothetical_protein 22 1.7 gi|145226187 aminoglycoside_nucleotidyltransferase 22 1.7 gi|145226181 diguanylate_cyclase 22 1.7 gi|145226179 hypothetical_protein 22 1.7 gi|145226178 hypothetical_protein 22 1.7 gi|145226177 putative_methyl-accepting_chemotaxis_sensory_transd... 
22 1.7 >gi|145226176 hypothetical_protein Length = 477 Score = 599 bits (302), Expect = e-174 Identities = 428/470 (91%) Strand = Plus / Plus Query: 1 atgaatgcagacgtgtcagtcaaccagtggaatccgttagaggtagccgctgaggcgaca 60 |||||||| ||||||||||||||||||||||||||||||||||| ||||||||||||||| Sbjct: 1 atgaatgcggacgtgtcagtcaaccagtggaatccgttagaggtggccgctgaggcgaca 60 Query: 61 atcgctgccgccacagccgcgctggtgtgggaaggcccagacagctacggggtgctggaa 120 || ||||||||||| ||||| |||||||||||||||||||||||||||||||| |||| Sbjct: 61 attgctgccgccacggccgcattggtgtgggaaggcccagacagctacggggtgttggag 120 Query: 121 cgggtcgccggggccacagcgaaaggcatagcaacagctcggatagccgccgaaatcatg 180 ||||| ||||||||||||||||||||||| ||||||||||||||| |||||||||||||| Sbjct: 121 cgggtagccggggccacagcgaaaggcatggcaacagctcggataaccgccgaaatcatg 180 Query: 181 gctgacgtcaccacctcagttcagttcactgcggccaccgaacatgcgcgcggcggcgct 240 |||||||||||||||||||||||||||||||||||| |||| ||||||||||||||||| Sbjct: 181 gctgacgtcaccacctcagttcagttcactgcggccgacgaagatgcgcgcggcggcgct 240 Query: 241 gtagcgggacttccggggtggctggcgccgcggtgggcggcgtccgtgcgtgccgcactg 300 |||||||| ||||||||||||||||||||||||||||||||||||||||| | ||||||| Sbjct: 241 gtagcggggcttccggggtggctggcgccgcggtgggcggcgtccgtgcgcggcgcactg 300 Query: 301 gacgaactcgaagccgccgggcggcccggctacgccatggtcaaggcgatcacctggcct 360 ||||| ||||||||||||||||| |||||| || |||| | || || || |||| Sbjct: 301 gacgagctcgaagccgccgggcgccccggcgacatcatgatgaaagcccggacacggccg 360 Query: 361 gccttgcgcagcgtcgcggggtggacccaagacgggccgctgcaaacatggcagacggct 420 || |||||||||| |||| ||||||||||||||| |||||||| ||||||||||||||| Sbjct: 361 gcactgcgcagcgtggcggtgtggacccaagacggaccgctgcagacatggcagacggct 420 Query: 421 ctaattgtgagcgaagcacggactgctctggctcaccgcgtaggcgtctg 470 || || ||| ||||||||||||||||||||||||||||||||||||||| Sbjct: 421 ctgatcgtggacgaagcacggactgctctggctcaccgcgtaggcgtctg 470 Score = 22.3 bits (11), Expect = 1.7 Identities = 11/11 (100%) Strand = Plus / Minus Query: 124 gtcgccggggc 134 ||||||||||| Sbjct: 333 gtcgccggggc 323 The rest of output truncated... 
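
For the strand question above, one possible approach is to filter the HSPs of each hit before adding them to the feature. The following is only a minimal sketch: the helper name hsps_matching_best_strand is made up for illustration, "best-scored" is assumed to mean the highest bit score, and the hit and HSP objects are assumed to provide the hsps(), bits() and strand('query'/'hit') methods that Bio::SearchIO results normally offer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return only those HSPs of a
# hit whose relative orientation (query strand * hit strand) matches the
# orientation of the HSP with the highest bit score.
sub hsps_matching_best_strand {
    my ($hit) = @_;

    my @hsps = $hit->hsps;
    return () unless @hsps;

    # Assumption: the "best" HSP is the one with the highest bit score.
    my ($best) = sort { $b->bits <=> $a->bits } @hsps;

    # Plus/Plus gives 1, Plus/Minus gives -1.
    my $wanted = $best->strand('query') * $best->strand('hit');

    return grep {
        $_->strand('query') * $_->strand('hit') == $wanted
    } @hsps;
}
```

In the HOWTO loop, the HSPs returned by this helper could be passed to add_sub_SeqFeature() in place of every HSP from next_hsp(), so that the short Plus/Minus match is not drawn. Whether dropping such HSPs is biologically appropriate depends on the data.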
-------------- next part -------------- A non-text attachment was scrubbed... Name: tst.png Type: image/png Size: 1668 bytes Desc: not available URL: From UKaraoz at lbl.gov Thu Aug 14 20:03:51 2008 From: UKaraoz at lbl.gov (Ulas Karaoz) Date: Thu, 14 Aug 2008 17:03:51 -0700 Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for blasttable Message-ID: Hi, I found out that the save_output routine in RemoteBlast.pm doesn't save the output properly into a file when the Blast output is a hittable(blasttable). Might the reason be the fact that the tabular output has a line that starts with a # while the parser is looking for a line starting with just BLASTN, as in the section pasted below: if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i || $l =~/^RPS-BLAST\s*.+$/i ) { $seentop=1; } From erant at psb.ugent.be Fri Aug 15 08:25:59 2008 From: erant at psb.ugent.be (Erick Antezana) Date: Fri, 15 Aug 2008 14:25:59 +0200 Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API? In-Reply-To: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net> References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com> <48A448DD.4000206@psb.ugent.be> <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net> Message-ID: <48A575D7.7010709@psb.ugent.be> Hi Hilmar, Hilmar Lapp wrote: > Hi Erick, > > how did you determine that go-perl is specific to GO? I've found it to > work quite well for any kind of OBO-formatted ontology. we have used go-perl during the gestation (2005) of the ontologies we built/handled/etc. In particular while dealing with GO (as it was originally conceived for that purpose). It is extremely useful indeed. However; as new extensions were needed by our team, new modules were developed on top of the existing ones and that code was actually inducing a new development branch which ended up in something called onto-perl. Onto-perl, as you might have realized, is strongly influenced by go-perl. 
On the other hand, we had had some communications with Chris Mungall at that time, and he manifested he might drop further development on go-perl. Nevertheless, some time later he told me he will continue, which is good since many systems are based on it. But, by then we already had a sort of independent package which has shown to be useful. > > Also, you note that BioPerl doesn't have the ability to write in > certain formats, and to intersect and "unify" (would you mind > explaining what you mean by that?) ontologies. While working with several OBO ontologies, we needed to have them (or part of them) merged, intersected, join (=get one ontology=unify). It can be of course a bit subjective..since you can "unify" ontologies based on different features/approaches/etc and while building application ontologies (such as CCO) you might be confronted to identify identical terms coming from different ontologies and get only one in your integrated resource.... > It seems that your implementation of RDF etc export isn't really > reusable or modular in any way, The exports (RDF, OWL, ...) are part of the Ontology module's functionality. have you had any particular problems while exporting an ontology? Please let us know so that it can be fixed or improved. On the other hand, that module is nowadays undergoing a deep improvement (not released yet) to accommodate a huge set of "ontologiz-ed" resources into an RDF repository. I would be also interested in discussing about it so that we could improve it. > but I'd love to bring the intersection function over to BioPerl excellent! I think this is an appropriate time to make the diverse developments converge so that the users community could have a standard set of tools. We are interested in following up these discussions. > (BTW when you decided to roll your own ontology API, did you get the > impression that BioPerl isn't receptive to you adding to it?). 
as I mentioned, we took originally go-perl since it offered much more functionalities that BioPerl::Ontology::* > Would you mind pointing me to the place in the code where I would find > that, as I can't seem to find it. I forwarded your request to the developer who might give you more details about it. > > -hilmar > cheers, Erick From bix at sendu.me.uk Tue Aug 19 11:56:32 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Aug 2008 16:56:32 +0100 Subject: [Bioperl-l] Build.PL options? In-Reply-To: References: <48AA85AA.7010405@sendu.me.uk> Message-ID: <48AAED30.1090103@sendu.me.uk> Michael Thon wrote: >> How else will you be doing the dependency resolution? If your system >> just installs all dependencies itself beforehand, then dependency >> resolution won't be invoked in Build.PL :) >> Otherwise, don't run Build.PL at all (why do you need to?), > > Good point. Now, why didn't I think of that? :) > Does Build.PL do anything other than copy Perl modules (and install > dependencies)? It generates the 'Build' script, which provides a whole host of functions: ./Build help The only one other than 'install' that might be relevant to you is ./Build test to run the test suite, if testing is part of you own system. There are other standard ways to run the tests though, that don't need you to create the Build script. Somewhere along the lines, probably during the install function, it also creates man files and other forms of documentation iirc, and installs those too. If you handle dependencies, installation and testing (or don't do testing) yourself, and don't care about man files (perldoc is good enough?) you can ignore Build.PL. 
From xxq.t.xu at gmail.com Tue Aug 19 12:57:41 2008 From: xxq.t.xu at gmail.com (XQ Xu) Date: Tue, 19 Aug 2008 09:57:41 -0700 Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined primer Message-ID: <3fde82050808190957y271aa52eh30e39a438cc8a8e3@mail.gmail.com> Hi all, I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need use Primer3 to calculate Tm for some pre-defined primers; however there is no direct way to calculate Tm with Primer3. I have to call Primer3 and supply a pre-defined primer, a template, etc to let it run and hopefully Primer3 finds a pair of primers for me, then I have to open the output and find out what the Tm is for my pre-defined primer. Do I miss any function that can do this quickly for me? I know there's another module (Bio::SeqFeature::Primer) can do this quickly, but the Tm is calculated with different parameters; therefore it's not good to use it while I use Primer3 to design primers. Any input? Thanks! -Tony From cjfields at illinois.edu Tue Aug 19 13:39:03 2008 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 19 Aug 2008 12:39:03 -0500 Subject: [Bioperl-l] EUtilities help In-Reply-To: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com> References: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com> Message-ID: As mentioned in previous posts and in the original POD, the original Bio::DB::EUtilities was experimental (unstable API). It is deprecated in favor of the bioperl-live interface, which splits the user agent and parameter handling (Bio::DB::EUtilities, EUtilParameters) from the parsers (Bio::Tools::EUtilities). Unfortunately, the original design was too rushed so any XML changes broke the tools; I basically had to start from the ground up again. Any changes to eutil output should now be easier to deal with (famous last words). 
I am still planning on adding a few things to it (including tests for the parser and user agent) but it shouldn't change substantially from what is in the cookbook. chris On Aug 19, 2008, at 2:30 AM, Dave Messina wrote: > Hi Warren, > > Are you upgrading to 1.5.2 or downgrading from bioperl-live? > > If the former, you might consider going all the way to bioperl-live, > whose > EUtilities support is improved significantly and documented > extensively > here: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > If the latter, I don't believe there is, but Chris Fields will know > for sure > and will probably chime in. > > > Dave > > To get bioperl-live > as a tarball: http://www.bioperl.org/DIST/nightly_builds/ > or via Subversion: http://www.bioperl.org/wiki/Using_Subversion > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at illinois.edu Tue Aug 19 14:00:42 2008 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 19 Aug 2008 13:00:42 -0500 Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for blasttable In-Reply-To: References: Message-ID: <4B8B416E-2DA7-4B1E-9DE2-E1301B467637@illinois.edu> Saving tabular BLAST to a file is working for me using bioperl-live. NCBI recently changed tabular BLAST output which broke parsing (there is an extra column now, can't remember what), but it is now fixed. chris On Aug 14, 2008, at 7:03 PM, Ulas Karaoz wrote: > Hi, > > I found out that the save_output routine in RemoteBlast.pm doesn't > save the output properly into a file when the Blast output is a > hittable(blasttable). 
Might the reason be the fact that the tabular > output has a line that starts with a # while the parser is looking > for a line starting with just BLASTN, as in the section pasted below: > > if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i || > $l =~/^RPS-BLAST\s*.+$/i ) { > $seentop=1; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From hlapp at gmx.net Tue Aug 19 13:56:42 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 19 Aug 2008 13:56:42 -0400 Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: References: Message-ID: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote: > I'm presently refactoring an in-house protein annotation pipeline > and converting it to use BioSQL as a data store. I've noticed some > slightly screwy behavior with regard to how some of the > Bio::Annotation classes are handled: > > -Instances of Bio::Annotation::SimpleValue and > Bio::Annotation::StructuredValue attached to the annotation collection > for a sequence feature (Bio::SeqFeature::Generic) are converted to > tags/values on the feature. > > -Instances of Bio::Annotation::DBLink with attached comments loose > the comment. > [...] > $query->where(["s.display_id like DEBUG%'"]); There's a single quote missing here, but I'm assuming that's a result of copy/paste editing? > [...] > Is bioperl-db / BioSQL trying to tell me that I shouldn't be using > Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue? Your example code doesn't contain an example for where you are getting the B::A::StructuredValue object from. If you didn't create that yourself, it would be good to know what you did to end up with that. 
Chris Fields has written B::A::Tagtree which would be way forward, and if you created the object yourself, can you take a look at that and see whether that class wouldn't serve your purpose as well or even better? In order to be stored in BioSQL structured (hierarchical, nested) annotation is flattened into a string representation, because BioSQL can't store nested annotation collections natively. Right now if I am not mistaken upon retrieval this is not converted back into a B::A::Tagtree object but rather left flat. This is being worked on though, we've just discussed some issues connected with that. I could make B::A::StructuredValue work the same way, but I'm not sure what it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the hood, which is much cleaner, and more extensible in the future. As for SimpleValue annotation versus tag/value annotation for seqfeatures, yes right now these are treated interchangeably for the purposes of BioSQL and Bioperl-db. You can do this easily too on your end by using Bio::SeqFeature::AnnotationAdaptor. > Is there even a place in the BioSQL schema for a comment to be > attached > to a DBLink? No there isn't. I thought it is but it turns out that this isn't yet one of the desirable extensions to BioSQL from 1.1.x onwards, as documented on the wiki: http://www.biosql.org/wiki/Enhancement_Requests I'll add it (but feel free to do so yourself, especially if you have other enhancmenets). 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From johnsonm at gmail.com Wed Aug 20 14:43:25 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 20 Aug 2008 13:43:25 -0500 Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> References: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> Message-ID: On Tue, Aug 19, 2008 at 12:56 PM, Hilmar Lapp wrote: > On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote: > There's a single quote missing here, but I'm assuming that's a result of > copy/paste editing? Yes, I was a bit sloppy with the example. > Your example code doesn't contain an example for where you are getting the > B::A::StructuredValue object from. If you didn't create that yourself, it > would be good to know what you did to end up with that. Chris Fields has > written B::A::Tagtree which would be way forward, and if you created the > object yourself, can you take a look at that and see whether that class > wouldn't serve your purpose as well or even better? I created the B::A::StructuredValue myself. I'm using it to store the output from PSORTb, which gives a cellular localization and a score for a protein sequence (gene), which I'm trying to keep paired together, if possible. I'll take a look at B::A::Tagtree, that's probably a better fit. > In order to be stored in BioSQL structured (hierarchical, nested) annotation > is flattened into a string representation, because BioSQL can't store nested > annotation collections natively. Right now if I am not mistaken upon > retrieval this is not converted back into a B::A::Tagtree object but rather > left flat. This is being worked on though, we've just discussed some issues > connected with that. The data I have isn't really deeply nested. I just like to keep related annotation in one object, if possible. 
> I could make B::A::StructuredValue work the same way, but I'm not sure what > it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the > hood, which is much cleaner, and more extensible in the future. Perhaps B::A::StructuredValue should be deprecated? > As for SimpleValue annotation versus tag/value annotation for seqfeatures, > yes right now these are treated interchangeably for the purposes of BioSQL > and Bioperl-db. You can do this easily too on your end by using > Bio::SeqFeature::AnnotationAdaptor. I'll check out the AnnotationAdaptor, but I'll probably just end using seqfeature tags/values. They're functionally equivalent to B::A::SimpleValue. >> Is there even a place in the BioSQL schema for a comment to be attached >> to a DBLink? > > No there isn't. I thought it is but it turns out that this isn't yet one of > the desirable extensions to BioSQL from 1.1.x onwards, as documented on the > wiki: > > http://www.biosql.org/wiki/Enhancement_Requests > > I'll add it (but feel free to do so yourself, especially if you have other > enhancmenets). I'll take a look at the wiki....I'll file that as a feature request if I get there before you do it. From cjfields at illinois.edu Wed Aug 20 16:25:55 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 20 Aug 2008 15:25:55 -0500 Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL In-Reply-To: References: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net> Message-ID: <9872D07D-61AB-4F0A-A477-35AA87ABF72E@illinois.edu> On Aug 20, 2008, at 1:43 PM, Mark Johnson wrote: > ... > >> I could make B::A::StructuredValue work the same way, but I'm not >> sure what >> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag >> under the >> hood, which is much cleaner, and more extensible in the future. > > Perhaps B::A::StructuredValue should be deprecated? Probably. The only place it was used in core was SeqIO::swiss (and now that uses Tagtree in bioperl-live). 
Let me know if you have any problems with Bio::Annotation::Tagtree. I am planning on doing some more work with it soon. chris From cjfields at illinois.edu Thu Aug 21 10:26:22 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 21 Aug 2008 09:26:22 -0500 Subject: [Bioperl-l] Annotations issue (GenBank) Message-ID: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu> I'm working on a GenBank patch and noticed a few cases where annotations are being stored as all uppercase strings (CONTIG, WGS, etc). I'm planning on converting these to lowercase (e.g. 'wgs', 'contig') for consistency with other annotation tag values. I'm making sure output is consistent as well. These are used fairly infrequently so I don't think it should cause problems, but just in case, does anyone have a problem with this change? chris Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From awitney at sgul.ac.uk Thu Aug 21 10:41:10 2008 From: awitney at sgul.ac.uk (Adam Witney) Date: Thu, 21 Aug 2008 15:41:10 +0100 Subject: [Bioperl-l] adding HSP information to BLAST output graphic (Bio::Graphics) Message-ID: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk> Hi, I am going through the Bio::Graphics HOWTO on the wiki. Looking at render_blast4.pl, the description text describes the whole hit and is set for the whole track, but i would like to be able to add HSP information such as the identity matches onto the picture, this is stored in the $hsp object. How would i go about adding that to the picture? 
the relevant piece of code is:

    my $track = $panel->add_track(
        -glyph       => 'graded_segments',
        -label       => 1,
        -connector   => 'dashed',
        -bgcolor     => 'blue',
        -font2color  => 'red',
        -sort_order  => 'high_score',
        -description => sub {
            my $feature = shift;
            return unless $feature->has_tag('description');
            my ($description) = $feature->each_tag_value('description');
            my $score = $feature->score;
            "$description, score=$score";
            # "score=$score";
        },
    );

    next unless $hit->significance < 1E-20;

    my $feature = Bio::SeqFeature::Generic->new(
        -score        => $hit->raw_score,
        -display_name => $hit->name,
        -tag          => { description => $hit->description },
    );

    while( my $hsp = $hit->next_hsp ) {
        $feature->add_sub_SeqFeature($hsp,'EXPAND');
    }

    $track->add_feature($feature);

thanks for any help

adam

From cjfields at illinois.edu Thu Aug 21 12:01:11 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 11:01:11 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
In-Reply-To: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
References: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
Message-ID: <0ACE20F0-43E5-4E12-9437-094871766083@illinois.edu>

I went ahead and committed this; if there are any disagreements about
it I can back it out or modify as needed.

chris

On Aug 21, 2008, at 9:26 AM, Chris Fields wrote:

> I'm working on a GenBank patch and noticed a few cases where
> annotations are being stored as all uppercase strings (CONTIG, WGS,
> etc). I'm planning on converting these to lowercase (e.g. 'wgs',
> 'contig') for consistency with other annotation tag values. I'm
> making sure output is consistent as well.
>
> These are used fairly infrequently so I don't think it should cause
> problems, but just in case, does anyone have a problem with this
> change?
>
> chris

From mshafiullah at mail.unomaha.edu Thu Aug 21 15:35:54 2008
From: mshafiullah at mail.unomaha.edu (Mohammad Shafiullah)
Date: Thu, 21 Aug 2008 14:35:54 -0500
Subject: [Bioperl-l] bioperl-network test error
Message-ID:

To whom it may concern:

Encountered the following error while running ./Build test on
bioperl-network-1.5.2_100

Can't stat scripts: No such file or directory
 at /usr/share/perl5/Module/Build/Base.pm line 3836
t/Edge...........ok
t/Graph-MD5......1/19 Not an ARRAY reference at /usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
All 19 subtests passed
t/Graph-Seq......1/16 Not an ARRAY reference at /usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
All 16 subtests passed
t/IO_dip_tab.....ok
t/IO_psi.........ok
t/Interaction....ok
t/Node...........ok
t/ProteinNet.....ok

Test Summary Report
-------------------
t/Graph-MD5 (Wstat: 2304 Tests: 19 Failed: 0)
  Non-zero exit status: 9
t/Graph-Seq (Wstat: 65280 Tests: 16 Failed: 0)
  Non-zero exit status: 255
Files=8, Tests=292, 2 wallclock secs ( 0.00 usr 0.02 sys + 1.33 cusr 0.37 csys = 1.72 CPU)
Result: FAIL
Failed 2/8 test programs. 0/292 subtests failed.

Please advise on the issue.

Sincerely,

- Mohammad

From bosborne11 at verizon.net Thu Aug 21 16:43:35 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 16:43:35 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To:
References:
Message-ID: <83EB2D5B-37F2-4163-8A14-F93ECE740197@verizon.net>

Mohammad,

Try replacing the ModuleBuildBioperl.pm file that you have with the
one that's attached. Then do this inside the bioperl-network-1.5.2_100
directory:

    ./Build clean
    perl Build.PL
    ./Build test

What do you see? Also, what's your version of Perl?

Brian O.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ModuleBuildBioperl.pm Type: text/x-perl-script Size: 40360 bytes Desc: not available URL: -------------- next part -------------- : On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote: > To whom it may concern: > > Encountered the following error while running ./Build test on > bioperl-network-1.5.2_100 > > Can't stat scripts: No such file or directory > at /usr/share/perl5/Module/Build/Base.pm line 3836 > t/Edge...........ok > t/Graph-MD5......1/19 Not an ARRAY reference at > /usr/share/perl5/Heap/Elem.pm line 31. > t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900) > All 19 subtests passed > t/Graph-Seq......1/16 Not an ARRAY reference at > /usr/share/perl5/Heap/Elem.pm line 31. > t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00) > All 16 subtests passed > t/IO_dip_tab.....ok > t/IO_psi.........ok > t/Interaction....ok > t/Node...........ok > t/ProteinNet.....ok > > Test Summary Report > ------------------- > t/Graph-MD5 (Wstat: 2304 Tests: 19 Failed: 0) > Non-zero exit status: 9 > t/Graph-Seq (Wstat: 65280 Tests: 16 Failed: 0) > Non-zero exit status: 255 > Files=8, Tests=292, 2 wallclock secs ( 0.00 usr 0.02 sys + 1.33 > cusr > 0.37 csys = 1.72 CPU) > Result: FAIL > Failed 2/8 test programs. 0/292 subtests failed. > > Please advise on the issue. > > Sincerely, > > - Mohammad > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Thu Aug 21 17:57:17 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 21 Aug 2008 17:57:17 -0400 Subject: [Bioperl-l] bioperl-network test error In-Reply-To: References: Message-ID: Mohammad, Take a look at this: http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2007-06/msg00365.html It discusses that Heap::Elem error when using Graph. Brian O. 
On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote: > To whom it may concern: > > Encountered the following error while running ./Build test on > bioperl-network-1.5.2_100 > > Can't stat scripts: No such file or directory > at /usr/share/perl5/Module/Build/Base.pm line 3836 > t/Edge...........ok > t/Graph-MD5......1/19 Not an ARRAY reference at > /usr/share/perl5/Heap/Elem.pm line 31. > t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900) > All 19 subtests passed > t/Graph-Seq......1/16 Not an ARRAY reference at > /usr/share/perl5/Heap/Elem.pm line 31. > t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00) > All 16 subtests passed > t/IO_dip_tab.....ok > t/IO_psi.........ok > t/Interaction....ok > t/Node...........ok > t/ProteinNet.....ok > > Test Summary Report > ------------------- > t/Graph-MD5 (Wstat: 2304 Tests: 19 Failed: 0) > Non-zero exit status: 9 > t/Graph-Seq (Wstat: 65280 Tests: 16 Failed: 0) > Non-zero exit status: 255 > Files=8, Tests=292, 2 wallclock secs ( 0.00 usr 0.02 sys + 1.33 > cusr > 0.37 csys = 1.72 CPU) > Result: FAIL > Failed 2/8 test programs. 0/292 subtests failed. > > Please advise on the issue. > > Sincerely, > > - Mohammad > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dominic at bioinf.uni-leipzig.de Fri Aug 22 09:35:21 2008 From: dominic at bioinf.uni-leipzig.de (Dominic Rose) Date: Fri, 22 Aug 2008 15:35:21 +0200 Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm Message-ID: <48AEC099.20106@bioinf.uni-leipzig.de> Hi, just a short suggestion to improve the code: in function sub _build_nt_matrix() one finds the following lines: my $ti_index = $NucleotideIndexes{$ti}; my $tj_index = $NucleotideIndexes{$tj}; if( ! defined $ti_index ) { print "ti_index not defined for $ti\n"; next; } However, it should be possible to stop/silence the printing of that error message. 
Many alignments contain N's, which causes many, many "ti_index not defined for N" messages. That should be avoidable.

Thanks,
Dominic

--
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig

WWW http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax: +49 341 97-16679

From heikki at sanbi.ac.za  Wed Aug 27 02:23:39 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 27 Aug 2008 08:23:39 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
In-Reply-To: <48AEC099.20106@bioinf.uni-leipzig.de>
References: <48AEC099.20106@bioinf.uni-leipzig.de>
Message-ID: <200808270823.39852.heikki@sanbi.ac.za>

Dominic,

You are absolutely right. I've changed 'print' into '$self->warn' in the SVN. Now it is possible to set $object->verbose(-1) to silence the warning or, if deemed necessary, set $object->verbose(2) and catch the error with an eval statement.

Thanks for reporting this,

-Heikki

On Friday 22 August 2008 15:35:21 Dominic Rose wrote:
> Hi,
>
> just a short suggestion to improve the code:
>
> in function
>
> sub _build_nt_matrix()
>
> one finds the following lines:
>
> my $ti_index = $NucleotideIndexes{$ti};
> my $tj_index = $NucleotideIndexes{$tj};
>
> if( ! defined $ti_index ) {
> print "ti_index not defined for $ti\n";
> next;
> }
>
> However, it should be possible to stop/silence the printing of that
> error message. Many alignments contain N's what causes many many
> "ti_index not defined for N" messages. That should be avoidable.
>
> Thanks,
> Dominic

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From giles.weaver at googlemail.com  Wed Aug 27 06:39:19 2008
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Wed, 27 Aug 2008 11:39:19 +0100
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined primer
In-Reply-To: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
References: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
Message-ID: <1d06cd5d0808270339p4432f91bo72f48364696caf5a@mail.gmail.com>

Hi Tony,

It isn't well documented, but Primer3 includes a program called oligotm, which is used to calculate the Tm of short sequences (up to 32bp). You can run it directly by typing something like "oligotm ACGTACGTACGTACGT" in the terminal. Just typing oligotm will give you the options.

If you are using Linux, these snippets of code may help you call oligotm from within a perl script:

use IPC::Open3;

sub _run_oligotm {
    my ($class, $sequence) = @_;
    my $run = "oligotm -tp 1 -sc 1 $sequence";
    my $pid = open3(\*WTRFH, \*RDRFH, \*ERRFH, $run);
    close (WTRFH);
    my ($tm, $errors);
    # read the child's STDOUT (the Tm) and STDERR (any error text)
    while (<RDRFH>) { $tm .= $_;}
    while (<ERRFH>) { $errors .= $_;}
    chomp $tm;
    return ($tm, $errors);
}

You'll need to put this in a package or edit out the $class bit for it to work.

This is my first post to this list. I'm receiving the digest so replying to posts is a bit of a faff. Can anyone recommend a better way of replying to posts than replying to the digest, editing it and pasting the subject into the subject field?
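
A possible variant of the open3 call above, as a minimal sketch only: it passes the command as a list so the sequence never goes through the shell, uses lexical filehandles, and reaps the child with waitpid. The helper name run_capture is made up for illustration, and the running perl interpreter stands in for oligotm so that the snippet works without Primer3 installed; the oligotm flags in the comment are simply those from the snippet above.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# Hypothetical helper: run a command without the shell, return (stdout, stderr).
sub run_capture {
    my @cmd = @_;
    my $err = gensym;    # open3 needs a pre-made glob to keep STDERR separate
    my $pid = open3(my $in, my $out, $err, @cmd);
    close $in;           # nothing to send on the child's STDIN
    my $stdout = do { local $/; <$out> };   # slurp all of STDOUT
    my $stderr = do { local $/; <$err> };   # then all of STDERR
    waitpid($pid, 0);    # reap the child so it does not linger as a zombie
    $stdout = '' unless defined $stdout;
    $stderr = '' unless defined $stderr;
    chomp $stdout;
    return ($stdout, $stderr);
}

# Stand-in command. With Primer3 installed this would be something like
#   run_capture('oligotm', '-tp', 1, '-sc', 1, $sequence)
my ($tm, $errors) = run_capture($^X, '-e', 'print "61.5\n"');
print "Tm: $tm\n";
```

Reading STDOUT to the end before STDERR is fine for a tool that prints one number, but it could block on a child that writes a lot to STDERR first; in that case something like IO::Select would be the safer route.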
Giles Weaver Unilever R&D > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 19 Aug 2008 09:57:41 -0700 > From: "XQ Xu" > Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined > primer > To: bioperl-l at lists.open-bio.org > Message-ID: > <3fde82050808190957y271aa52eh30e39a438cc8a8e3 at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi all, > I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need use > Primer3 to calculate Tm for some pre-defined primers; however there is no > direct way to calculate Tm with Primer3. I have to call Primer3 and supply > a > pre-defined primer, a template, etc to let it run and hopefully Primer3 > finds a pair of primers for me, then I have to open the output and find out > what the Tm is for my pre-defined primer. Do I miss any function that can > do > this quickly for me? > I know there's another module (Bio::SeqFeature::Primer) can do this > quickly, but the Tm is calculated with different parameters; therefore it's > not good to use it while I use Primer3 to design primers. > Any input? > Thanks! > -Tony > From awitney at sgul.ac.uk Wed Aug 27 12:21:12 2008 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 27 Aug 2008 17:21:12 +0100 Subject: [Bioperl-l] adding HSP information to BLAST output graphic (Bio::Graphics) In-Reply-To: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk> References: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk> Message-ID: <8939216F-7AFC-40BB-A416-8508E3E5D871@sgul.ac.uk> after some digging around myself, this seems to do the trick, although i don't know if it will always work, as i am having to add a separate sort function. On 21 Aug 2008, at 15:41, Adam Witney wrote: > > Hi, > > I am going through the Bio::Graphics HOWTO on the wiki. 
>
> Looking at render_blast4.pl, the description text describes the
> whole hit and is set for the whole track, but i would like to be
> able to add HSP information such as the identity matches onto the
> picture, this is stored in the $hsp object. How would i go about
> adding that to the picture?
>
> the relevant piece of code is:
>
> my $track = $panel->add_track(
> -glyph => 'graded_segments',
> -label => 1,
> -connector => 'dashed',
> -bgcolor => 'blue',
> -font2color => 'red',
> -sort_order => 'high_score',

    -part_labels => sub {
        my ($feature, undef, $partno) = @_;
        my @features = sort_features($feature->get_SeqFeatures());
        return $features[$partno]->num_identical . '/' . $features[$partno]->length
             . ' (score=' . $features[$partno]->score . ')'
            if $features[$partno];
    },

> -description => sub {
> my $feature = shift;
> return unless $feature->has_tag('description');
> my ($description) = $feature->each_tag_value('description');
> my $score = $feature->score;
> "$description, score=$score";
> # "score=$score";
> },
> );

sub sort_features {
    my @array = @_;
    if(@array < 2){return @array}
    my @sorted = sort {$a->start <=> $b->start} @array;
    return @sorted;
}

is this the best way to achieve this?

thanks

adam

From mauricio at open-bio.org  Thu Aug 28 13:43:04 2008
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 28 Aug 2008 12:43:04 -0500
Subject: [Bioperl-l] Pdoc updates
Message-ID: <48B6E3A8.3050305@open-bio.org>

For those who use the online Pdoc documentation (doc.bioperl.org), this is to let you know that the CvsWeb links at the top of any module page now link to the proper place in the SVN web interface.

Cheers,
Mauricio.
From jaudall at gmail.com Fri Aug 29 02:46:20 2008 From: jaudall at gmail.com (Joshua Udall) Date: Thu, 28 Aug 2008 23:46:20 -0700 Subject: [Bioperl-l] DB_File and assembly IO Message-ID: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com> Bioperl - I'm trying to read/parse a single cap3 ace file with several thousand contigs. I get a DB_File error at Contig247. Here's the error: ------------- EXCEPTION ------------- MSG: Unable to tie DB_File handle STACK Bio::SeqFeature::Collection::new /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195 STACK Bio::Assembly::Contig::new /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256 STACK Bio::Assembly::IO::ace::next_assembly /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148 STACK toplevel /Users/jaudall/bin/read_ace.pl:214 ------------------------------------- Looking at the Collection::new, the error is on the middle line: $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File', $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE; # or die "Cannot open file: $!\n" ; $self->{'_btree'} || $self->throw("Unable to tie DB_File handle"); return $self; If I uncomment out the $! die statement that I inserted, I get this: 'Cannot open file tree: Too many open files' Apparently the Collection constructor is creating a new index file for each one and the handles for each are sticking around? That confuses me because reading more about the Collection.pm and DB_File, it appeared to me that no files were written by default (as I'm doing), rather the Collection objects are all stored in memory. I'm pretty sure the error is not a permission error, and if it is not the open file-handles, what else should I look for? 
If I 'warn' the error instead of throwing it, I get:

Can't call method "get_dup" on an undefined value at /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360

This kind of makes sense because the index appears not to be created and it can't look stuff up in an undefined tied hash. I'm stuck.

Thanks for any help and suggestions.

OSX, perl 5.8.8, bioperl-live (svn last week)

--
Joshua Udall
Assistant Professor
295 WIDB
Plant and Wildlife Science Dept.
Brigham Young University
Provo, UT 84602
801-422-9307
Fax: 801-422-0008
USA

From florent.angly at gmail.com  Fri Aug 29 04:40:25 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 29 Aug 2008 18:40:25 +1000
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
Message-ID: <48B7B5F9.1050608@gmail.com>

Hi Joshua,

I don't know the specifics of DB_File, but the 'Cannot open file tree: Too many open files' is pretty explicit.
If you're on Unix/Linux you can check the files that are open by your program by typing:
lsof | grep name_of_program
There is probably a filehandle that is not closed somewhere in your code or the BioPerl code.
Best,

Florent

Joshua Udall wrote:
> Bioperl -
>
> I'm trying to read/parse a single cap3 ace file with several thousand
> contigs. I get a DB_File error at Contig247.
Here's the error: > > ------------- EXCEPTION ------------- > MSG: Unable to tie DB_File handle > STACK Bio::SeqFeature::Collection::new > /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195 > STACK Bio::Assembly::Contig::new > /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256 > STACK Bio::Assembly::IO::ace::next_assembly > /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148 > STACK toplevel /Users/jaudall/bin/read_ace.pl:214 > ------------------------------------- > > Looking at the Collection::new, the error is on the middle line: > > $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File', > $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE; # or die "Cannot open > file: $!\n" ; > $self->{'_btree'} || $self->throw("Unable to tie DB_File handle"); > return $self; > > If I uncomment out the $! die statement that I inserted, I get this: > > 'Cannot open file tree: Too many open files' > > Apparently the Collection constructor is creating a new index file for each > one and the handles for each are sticking around? That confuses me because > reading more about the Collection.pm and DB_File, it appeared to me that no > files were written by default (as I'm doing), rather the Collection objects > are all stored in memory. I'm pretty sure the error is not a permission > error, and if it is not the open file-handles, what else should I look for? > > > If I 'warn' the error instead of throwing it, I get: > > Can't call method "get_dup" on an undefined value at > /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360 > > This kind of makes sense because the index appears not be be created and it > can't look stuff up in an undefined tied hash. I'm stuck. > > Thanks for any help and suggestions. 
>
> OSX, perl 5.8.8, bioperl-live (svn last week)
>
>

From cjfields at illinois.edu  Fri Aug 29 10:30:49 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Aug 2008 09:30:49 -0500
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <48B7B5F9.1050608@gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com> <48B7B5F9.1050608@gmail.com>
Message-ID: <5717CE96-EC24-46D9-A922-88702B1647A1@illinois.edu>

This is a known problem with Bio::Assembly and stems from having a DB_File tied (opened) for each Bio::Assembly::Contig (via a retained Bio::SeqFeature::Collection). You can extend the number of open filehandles on UNIX'y flavors using ulimit (see following link), but I'm not sure about Win32.

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

The general bug is reproducible using the following simple script. If needed, adjust the range end in the for loop to exceed the ulimit (via 'ulimit -n'); Mac OS X 10.5 is set to 2560.

---------------------------
use Bio::Assembly::Contig;

my @contigs;

push @contigs, Bio::Assembly::Contig->new() for (1..10000);
---------------------------

I'll open a bug report on this for tracking (for release 1.7, along with any other Bio::Assembly issues). That doesn't mean it won't get fixed sooner, just that we aren't under pressure with the next release, which already has a full plate.

IMO, I don't think there needs to be one SF::Collection per contig; one instance should do for the entire assembly, using the same SF::Collection passed in to each contig and distinguishing the contig using the SeqFeature seq_id. It would also be nice if we could change that to also allow other SeqFeature::CollectionI (i.e. Bio::DB::SeqFeature::Store and the like, for instance).

chris

On Aug 29, 2008, at 3:40 AM, Florent Angly wrote:

> Hi Joshua,
>
> I don't know the specifics of DB_File, but the 'Cannot open file
> tree: Too many open files' is pretty explicit.
> If you're on Unix/Linux you can check the files that are open by > your program by typing: > lsof | grep name_of_program > There is probably a filehandle that in not closed somewhere in your > code or the BioPerl code. > Best, > > Florent > > > > Joshua Udall wrote: >> Bioperl - >> >> I'm trying to read/parse a single cap3 ace file with several thousand >> contigs. I get a DB_File error at Contig247. Here's the error: >> >> ------------- EXCEPTION ------------- >> MSG: Unable to tie DB_File handle >> STACK Bio::SeqFeature::Collection::new >> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195 >> STACK Bio::Assembly::Contig::new >> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256 >> STACK Bio::Assembly::IO::ace::next_assembly >> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148 >> STACK toplevel /Users/jaudall/bin/read_ace.pl:214 >> ------------------------------------- >> >> Looking at the Collection::new, the error is on the middle line: >> >> $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File', >> $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE; # or die >> "Cannot open >> file: $!\n" ; >> $self->{'_btree'} || $self->throw("Unable to tie DB_File handle"); >> return $self; >> >> If I uncomment out the $! die statement that I inserted, I get this: >> >> 'Cannot open file tree: Too many open files' >> >> Apparently the Collection constructor is creating a new index file >> for each >> one and the handles for each are sticking around? That confuses me >> because >> reading more about the Collection.pm and DB_File, it appeared to me >> that no >> files were written by default (as I'm doing), rather the Collection >> objects >> are all stored in memory. I'm pretty sure the error is not a >> permission >> error, and if it is not the open file-handles, what else should I >> look for? 
>>
>>
>> If I 'warn' the error instead of throwing it, I get:
>>
>> Can't call method "get_dup" on an undefined value at
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm
>> line 360
>>
>> This kind of makes sense because the index appears not be be
>> created and it
>> can't look stuff up in an undefined tied hash. I'm stuck.
>>
>> Thanks for any help and suggestions.
>>
>> OSX, perl 5.8.8, bioperl-live (svn last week)
>>

From milan.gilic at st.t-com.hr  Sat Aug 23 17:15:39 2008
From: milan.gilic at st.t-com.hr (Milan)
Date: Sat, 23 Aug 2008 23:15:39 +0200
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: <48B07DFB.3040201@st.t-com.hr>

From dominic at bioinf.uni-leipzig.de  Mon Aug 25 06:40:42 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Mon, 25 Aug 2008 12:40:42 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48B28C2A.8050803@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

my $ti_index = $NucleotideIndexes{$ti};
my $tj_index = $NucleotideIndexes{$tj};

if( ! defined $ti_index ) {
    print "ti_index not defined for $ti\n";
    next;
}

However, it should be possible to avoid the printing of that error message. Many alignments contain N's, which causes many, many "ti_index not defined for N" messages. It should be possible to switch that message off.

Thanks,
Dominic

--
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig

WWW http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax: +49 341 97-16679
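
For readers of this thread, a minimal sketch of the verbosity switch that Heikki's reply describes (verbose(-1) to silence, verbose(2) plus eval to catch). It is not the BioPerl source: My::Stats and count_known are hypothetical stand-ins written so the snippet runs without BioPerl, and the warn method only imitates the levels named in that reply. With BioPerl installed, the same calls would presumably be made on the Bio::Align::DNAStatistics object itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BioPerl object; NOT the Bio::Root::Root code.
package My::Stats;

sub new { my ($class) = @_; return bless { verbose => 0 }, $class }

# Get or set the verbosity level.
sub verbose {
    my ($self, $level) = @_;
    $self->{verbose} = $level if defined $level;
    return $self->{verbose};
}

# -1: stay silent; 0 or 1: print a warning; 2: turn the warning into an exception.
sub warn {
    my ($self, $msg) = @_;
    my $v = $self->verbose;
    return if $v < 0;
    die "EXCEPTION: $msg\n" if $v >= 2;
    CORE::warn("WARNING: $msg\n");
    return;
}

# Count the bases that have a known index, warning about and skipping the rest.
sub count_known {
    my ($self, @bases) = @_;
    my %index = (A => 0, T => 1, C => 2, G => 3);
    my $known = 0;
    for my $base (@bases) {
        if (!defined $index{$base}) {
            $self->warn("index not defined for $base");
            next;
        }
        $known++;
    }
    return $known;
}

package main;

my $stats = My::Stats->new;
$stats->verbose(-1);                           # no messages for the N's
my $n = $stats->count_known(qw(A N C N G));

$stats->verbose(2);                            # the same kind of input now dies
my $ok = eval { $stats->count_known(qw(A N)); 1 };
print "known=$n, caught=", ($ok ? 'no' : 'yes'), "\n";
```

The point of routing the message through a method rather than a bare print is that the caller, not the library, decides whether an unknown base is noise, a warning, or an error.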