From mmokrejs at ribosome.natur.cuni.cz Fri Sep 1 03:24:06 2006
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Fri, 01 Sep 2006 09:24:06 +0200
Subject: [Bioperl-l] Memory requirements for conversion from embl to genbank
In-Reply-To:
References: <44F6D9CA.3000005@ribosome.natur.cuni.cz> <44F6EEE3.1020903@sendu.me.uk> <44F6FB22.4070308@ribosome.natur.cuni.cz> <44F70555.90809@sendu.me.uk> <44F75CE5.4050401@ribosome.natur.cuni.cz>
Message-ID: <44F7E016.6060407@ribosome.natur.cuni.cz>

This problem did not appear with 1.5.1. I will contact the people for sure
next week.
M.

Chris Fields wrote:
> Yet another problem?!? Martin, I'll ask this again, mainly out of
> curiosity: have you ever contacted the people who generated this to let
> them know of the problems?
>
> This one is definitely not valid: can't have a lineage w/o an organism!
>
> Chris
>
> On Aug 31, 2006, at 5:04 PM, Martin MOKREJŠ wrote:
>
>> Sendu,
>> one more problem with the taxonomic code:
>>
>> ID 5MLE000012 standard; mRNA; VRL; 421 BP.
>> XX
>> AC BB136482;
>> XX
>> DT 26-JUL-2001 (Rel. 15, Created)
>> DT 26-JUL-2001 (Rel. 15, Last updated, Version 1)
>> XX
>> DE 5'UTR in Murine leukemia virus Mo Ampho MCF recombinant gPr80
>> envelope
>> DE polyprotein (env) gene, complete cds.
>> XX
>> DR EMBL; U36991;
>> DR UTR; CC147674;
>> XX
>> OC Viruses; Retro-transcribing viruses; Retroviridae;
>> Orthoretrovirinae;
>> OC Gammaretrovirus.
>> XX
>> UT 5'UTR;
>> XX
>> FH Key Location/Qualifiers
>> FH
>> FT 5'UTR 1..421
>> FT /source="EMBL::U36991:1..421"
>> FT /gene="env"
>> FT /product="gPr80 envelope polyprotein"
>> FT VECTOR 1..132
>> FT /source="EMBL::U36991:1..132"
>> FT /evidence="Similarity"
>> FT /db_xref="EMBL:"
>> FT /note="Possible vector contamination"
>> FT /note="Length=133 BP.
Identities=99.2%" >> XX >> SQ Sequence 421 BP; 88 A; 142 C; 118 G; 73 T; 0 other; >> acttgtggtc tcgctgttcc ttgggagggt ctcctctgag tgattgacta >> ccgtcagcgg 60 >> gggtctttca tttgggggct cgtccgggat cgggagaccc ctgcccaggg >> accaccgacc 120 >> caccaccggg agctcactta caggcccttc aagcagtaca acgagaggtc >> tggaagccac 180 >> tggctgcggc ctatcaggac cagcaagacc agccagtgat accacacccc >> ttccgtgtcg 240 >> gcgacaccgt gtgggtacgc cggcaccaga ctaagaactt ggaacctcgt >> tggaaaggac 300 >> cctataccgt cctgctgacc acccccaccg ctctcaaagt agacggcatc >> gctgcgtgga 360 >> tccacgccgc tcacgtaaag gcggcgacaa cccctccggc cggaacagca >> tcaggaccga 420 >> >> c >> 421 >> // >> >> >> >> ------------- EXCEPTION ------------- >> MSG: Must supply a Bio::Taxon >> STACK Bio::DB::Taxonomy::list::ancestor /usr/lib/perl5/site_perl/ >> 5.8.8/Bio/DB/Taxonomy/list.pm:332 >> STACK Bio::Taxon::ancestor /usr/lib/perl5/site_perl/5.8.8/Bio/ >> Taxon.pm:476 >> STACK Bio::Tree::TreeFunctionsI::get_lineage_nodes /usr/lib/perl5/ >> site_perl/5.8.8/Bio/Tree/TreeFunctionsI.pm:197 >> STACK Bio::Tree::Tree::new /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/ >> Tree.pm:112 >> STACK Bio::Species::classification /usr/lib/perl5/site_perl/5.8.8/ >> Bio/Species.pm:182 >> STACK Bio::SeqIO::embl::_read_EMBL_Species /usr/lib/perl5/site_perl/ >> 5.8.8/Bio/SeqIO/embl.pm:1094 >> STACK Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/ >> SeqIO/embl.pm:330 >> STACK toplevel testparsing.pl:22 >> >> -------------------------------------- >> >> I guess the 'OS' line missing caused that. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- Dr. 
Martin Mokrejs Faculty of Science, Charles University Vinicna 5, 128 43 Prague, Czech Republic http://www.iresite.org http://www.iresite.org/~mmokrejs From daniel.lang at biologie.uni-freiburg.de Fri Sep 1 06:11:58 2006 From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang) Date: Fri, 01 Sep 2006 12:11:58 +0200 Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails Message-ID: <44F8076E.10102@biologie.uni-freiburg.de> Hi, when using Bio::Registry (bioperl-live) to fetch uniprot entries from local indexed uniprot *.dats, I had to realize that several entries could not be retrieved despite the fact that they are present in the files! A closer look reveals that they are of status PRELIMINARY: uniprot_trembl.dat:ID Q16EZ1_AEDAE PRELIMINARY; PRT; 222 AA. I don't "grep" PRELIMINARY anywhere in my cvs checkout.. I also can't retrieve the sequences from the online database defined as follows: [swissprot_ebi] protocol=biofetch location=http://www.ebi.ac.uk/cgi-bin/dbfetch dbname=swall Is this a bug or a feature? If its a feature, how can I bypass it? Thanks in advance, Daniel -- Daniel Lang University of Freiburg, Plant Biotechnology Schaenzlestr. 1, D-79104 Freiburg fax: +49 761 203 6945 phone: +49 761 203 6974 homepage: http://www.plant-biotech.net/ e-mail: daniel.lang at biologie.uni-freiburg.de ################################################# My software never has bugs. It just develops random features. ################################################# From sdavis2 at mail.nih.gov Fri Sep 1 07:53:14 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 1 Sep 2006 07:53:14 -0400 Subject: [Bioperl-l] UCSC database backend In-Reply-To: <44F7768F.1060002@bcm.tmc.edu> References: <44F7768F.1060002@bcm.tmc.edu> Message-ID: <200609010753.14323.sdavis2@mail.nih.gov> On Thursday 31 August 2006 19:53, Caleb Davis wrote: > Hi folks, first time caller here. Love the show! > > I just started going through the archive and saw this thread. 
I vote in
> favor of this interface, for what it's worth. What about doing it this
> way?:
>
> $objSeqIO = Bio::SeqIO->new(-file => '~/seq/myseqCustomTrack.bed',
> -format => 'bed',
> -assembly => 'hg18',
> -track => 'hg18_myfavgenes'); #see example

Hi, Caleb. Welcome to the list.

What you are proposing seems to be two separate but related tasks. First,
parse bed-format files into bioperl-compatible sequence objects. Second,
once those are in, pull sequence if desired from UCSC.

For the first, you could certainly write a parser for bed format that would
give back sequence objects. You might also want to look at the GFF format,
as there are quite a few tools for GFF parsing, formatting, and sequence
retrieval from local databases.

For the second task, if what you are after is a straightforward way of
retrieving arbitrary sequences based on location, then you might want to
look at the DAS service set up by UCSC. Doing what you propose would be as
simple as reading in a format of your choice and then constructing a URL
like:

http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=chr1:1,5000;segment=chr10:52000,53000

which will return an XML-format file containing two sequences. As you can
see, the construction of the URL is trivial. See here for more information:

http://genome.ucsc.edu/FAQ/FAQdownloads#download23

Sean

From bix at sendu.me.uk Fri Sep 1 08:25:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Sep 2006 13:25:57 +0100
Subject: [Bioperl-l] SiteMatrix changes
In-Reply-To: <44F73B8C.3040406@sendu.me.uk>
References: <45036910@webmail.utk.edu> <44F73B8C.3040406@sendu.me.uk>
Message-ID: <44F826D5.8050404@sendu.me.uk>

Sendu Bala wrote:
> skirov wrote:
>> Sounds OK with me. To summarize:
>> 1. Correction is disabled by default.
>> 2. Correction should be applied to all positions.
>> 3. Thresholds for IUPAC consensus can be user defined.
>> 4. A fix for IUPAC consensus calculation: change the defaukt behavior.
>> 5.
Document the options >> Does this sounds right? > > Yes, sounds good to me. I'll code those up shortly. This is now done. I didn't quite do it the way you suggested: the 'threshold' for IUPAC consensus is implemented as significance level for rounding the frequencies. This way we don't have to suffer some arbitrary cutoff that does unexpected things. You may also want to check the changes to the documentation (mostly in new() and the description) to make sure I understood and explained everything well enough. The consensus string now also enforces the supplied or default threshold, and treats the threshold the way most people might think of such a thing - the minimum acceptable value (inclusive). This doesn't seem to actually change the answer for the few test matrices used by the test scripts (though the test script answers have changed since we're not doing pseudo-count correction anymore). One issue is that there's no way for the user to decide to do pseudo-count correction or not when using the PSM::IO modules. The correction should probably be farmed out to a separate method. I don't plan to do this myself. From skirov at utk.edu Fri Sep 1 09:14:09 2006 From: skirov at utk.edu (skirov) Date: Fri, 1 Sep 2006 09:14:09 -0400 Subject: [Bioperl-l] SiteMatrix changes Message-ID: <450A67B7@webmail.utk.edu> >===== Original Message From Sendu Bala ===== >Sendu Bala wrote: >> skirov wrote: >>> Sounds OK with me. To summarize: >>> 1. Correction is disabled by default. >>> 2. Correction should be applied to all positions. >>> 3. Thresholds for IUPAC consensus can be user defined. >>> 4. A fix for IUPAC consensus calculation: change the defaukt behavior. >>> 5. Document the options >>> Does this sounds right? >> >> Yes, sounds good to me. I'll code those up shortly. > >This is now done. I didn't quite do it the way you suggested: the >'threshold' for IUPAC consensus is implemented as significance level for >rounding the frequencies. 
This way we don't have to suffer some
>arbitrary cutoff that does unexpected things. You may also want to check
>the changes to the documentation (mostly in new() and the description)
>to make sure I understood and explained everything well enough.
>
Thanks Sendu. I will do that.

>The consensus string now also enforces the supplied or default
>threshold, and treats the threshold the way most people might think of
>such a thing - the minimum acceptable value (inclusive). This doesn't
>seem to actually change the answer for the few test matrices used by the
>test scripts (though the test script answers have changed since we're
>not doing pseudo-count correction anymore).
>
>One issue is that there's no way for the user to decide to do
>pseudo-count correction or not when using the PSM::IO modules. The
>correction should probably be farmed out to a separate method. I don't
>plan to do this myself.

I will look into this.
Stefan

>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From neetisomaiya at gmail.com Fri Sep 1 06:57:28 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 1 Sep 2006 16:27:28 +0530
Subject: [Bioperl-l] NEED HELP URGENTLY
Message-ID: <764978cf0609010357p5c035cbbx487650663c7c3182@mail.gmail.com>

Hi,

I have written a BLAST parser script, and when I try running it on a BLASTN
output file to get the first 10 hits, it gives me 10 random hits, and not
the first 10 hits. Can anyone give me any idea what could be the cause of
this behaviour of the parser?
Please find attached my parser script and the input and output files. The
script can be run as: perl parserMethod.pl 10, where 10 is the number of
hits expected in the parsed output file.

--
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parserMethod.pl Type: application/octet-stream Size: 5395 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060901/0e2b8b0e/attachment-0003.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: input Type: application/octet-stream Size: 1488455 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060901/0e2b8b0e/attachment-0004.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: OutPutFile Type: application/octet-stream Size: 132030 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060901/0e2b8b0e/attachment-0005.obj From akarger at CGR.Harvard.edu Fri Sep 1 10:54:39 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 1 Sep 2006 10:54:39 -0400 Subject: [Bioperl-l] Write a fasta file with custom title line. Message-ID: > From: Siddhartha Basu [mailto:basu at pharm.sunysb.edu] > > Staffa, Nick (NIH/NIEHS) [C] wrote: > > I would like to construct title lines for the fasta > > sequences I want to right to a file. > > I don't see in the documentation on-line for SeqIO or > > write_seq how to specify this. > > Please point the way. > > Hi Nick, > > You could use Bio::Seq::BaseSeqProcessor to customize the title line. > Write your own title processing class which should inherit from > Bio::Seq::BaseSeqProcessor overriding its "process_seq" > method. I think requiring someone to write a whole class just to get their favorite FASTA output is a bit much, don't you? They might as well just explicitly print out '>', the ID, a space, the description, a newline, and the sequence (while Bioperlers are adding a description setter in the FASTA writer, if there isn't one). -Amir From cjfields at uiuc.edu Fri Sep 1 11:34:04 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Sep 2006 10:34:04 -0500 Subject: [Bioperl-l] Write a fasta file with custom title line. 
In-Reply-To:
Message-ID: <001f01c6cddc$0eb4b1a0$15327e82@pyrimidine>

Amir,

What you describe (ID + description) can be done within Bioperl directly
using SeqIO w/o building a class. When using method preferred_id_type() you
can choose between primary ID, accession, accession.version, or display ID.
One way to customize the output is by using 'display' for the
preferred_id_type() argument and passing display_id() the customized string
value. The description is automatically appended every time. We could add an
additional parameter to make appending the description optional if anyone is
interested (should be fairly straightforward to add).

I posted this previously but here is the demo again:

use Bio::SeqIO;

my $seqin = Bio::SeqIO->new(-file => shift @ARGV,
                            -format => 'genbank');

my $seqout = Bio::SeqIO->new(-fh => \*STDOUT,
                             -format => 'fasta');

# From Bio::SeqIO::fasta
$seqout->preferred_id_type('display');

my $ct = 1;
while (my $seq = $seqin->next_seq) {
    # override the regular display_id with your own
    $seq->display_id('foo'.$ct);
    $seqout->write_seq($seq);
    $ct++;
}

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Amir Karger
> Sent: Friday, September 01, 2006 9:55 AM
> To: bioperl-l
> Subject: Re: [Bioperl-l] Write a fasta file with custom title line.
>
> > From: Siddhartha Basu [mailto:basu at pharm.sunysb.edu]
> >
> > Staffa, Nick (NIH/NIEHS) [C] wrote:
> > > I would like to construct title lines for the fasta
> > > sequences I want to right to a file.
> > > I don't see in the documentation on-line for SeqIO or
> > > write_seq how to specify this.
> > > Please point the way.
> >
> > Hi Nick,
> >
> > You could use Bio::Seq::BaseSeqProcessor to customize the title line.
> > Write your own title processing class which should inherit from
> > Bio::Seq::BaseSeqProcessor overriding its "process_seq"
> > method.
>
> I think requiring someone to write a whole class just to get their
> favorite FASTA output is a bit much, don't you? They might as well just
> explicitly print out '>', the ID, a space, the description, a newline,
> and the sequence (while Bioperlers are adding a description setter in
> the FASTA writer, if there isn't one).
>
> -Amir
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From basu at pharm.sunysb.edu Fri Sep 1 11:42:55 2006
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Fri, 01 Sep 2006 11:42:55 -0400
Subject: [Bioperl-l] Write a fasta file with custom title line.
In-Reply-To:
References:
Message-ID: <44F854FF.6030003@pharm.sunysb.edu>

Amir Karger wrote:
>>From: Siddhartha Basu [mailto:basu at pharm.sunysb.edu]
>>
>>Staffa, Nick (NIH/NIEHS) [C] wrote:
>>
>>>I would like to construct title lines for the fasta
>>>sequences I want to right to a file.
>>>I don't see in the documentation on-line for SeqIO or
>>>write_seq how to specify this.
>>>Please point the way.
>>
>>Hi Nick,
>>
>>You could use Bio::Seq::BaseSeqProcessor to customize the title line.
>>Write your own title processing class which should inherit from
>>Bio::Seq::BaseSeqProcessor overriding its "process_seq"
>>method.
>
>
> I think requiring someone to write a whole class just to get their
> favorite FASTA output is a bit much, don't you?

I believe it also depends on the nature of the application. If there is
potential for reuse, class-based inheritance could be advantageous.

>They might as well just
> explicitly print out '>', the ID, a space, the description, a newline,
> and the sequence (while Bioperlers are adding a description setter in
> the FASTA writer, if there isn't one).

That's true; however, based on the original poster's intention, I tried to
adhere closely to the Bioperl SeqIO system.
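
For comparison, the explicit-print approach Amir describes ('>', the ID, a space, the description, a newline, and the sequence) could be sketched in plain Perl roughly as follows. This is only an illustration under stated assumptions: format_fasta is a hypothetical helper, not a Bioperl function, and the 60-residue line width is an assumed default rather than anything SeqIO mandates.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): build one FASTA record
# with a caller-supplied ID and description on the title line.
sub format_fasta {
    my ($id, $desc, $seq, $width) = @_;
    $width ||= 60;    # assumed default line width

    # Append the description only when one was actually supplied.
    my $title = (defined($desc) && length($desc)) ? "$id $desc" : $id;

    # Wrap the sequence into lines of at most $width residues.
    (my $body = $seq) =~ s/(.{1,$width})/$1\n/g;

    return ">$title\n$body";
}

print format_fasta('foo1', 'my custom description', 'ACGT' x 20);
```

With a Bio::Seq object, the arguments would presumably come from $seq->display_id, $seq->desc and $seq->seq.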
-siddhartha > > -Amir > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From simon.andrews at bbsrc.ac.uk Fri Sep 1 11:37:42 2006 From: simon.andrews at bbsrc.ac.uk (Simon Andrews) Date: Fri, 1 Sep 2006 16:37:42 +0100 Subject: [Bioperl-l] NEED HELP URGENTLY In-Reply-To: <764978cf0609010357p5c035cbbx487650663c7c3182@mail.gmail.com> References: <764978cf0609010357p5c035cbbx487650663c7c3182@mail.gmail.com> Message-ID: On 1 Sep 2006, at 11:57, neeti somaiya wrote: > Hi, > > I have written a BLAST parser script, and when I try running it on > a BLASTN > ouput file to get the first 10 hits, it gives me 10 random hits, > and not the > first 10 hits.Can anyone give me any idea what could be the cause > of this > behaviour of the parser. Looking at your blast file you appear to have duplicate entries in your blast database (more than one hit to the same accession). In that case your problem is probably this: http://bugzilla.open-bio.org/show_bug.cgi?id=1986 There are patches and workrounds listed in the bug report, but there's no real fix as yet. Hope this helps Simon. From osborne1 at optonline.net Fri Sep 1 11:41:13 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 01 Sep 2006 11:41:13 -0400 Subject: [Bioperl-l] NEED HELP URGENTLY In-Reply-To: <764978cf0609010357p5c035cbbx487650663c7c3182@mail.gmail.com> Message-ID: Neeti, http://www.bioperl.org/wiki/HOWTO:SearchIO#Sorting The behavior you're seeing is the intended behavior of the parser. Brian O. On 9/1/06 6:57 AM, "neeti somaiya" wrote: > Hi, > > I have written a BLAST parser script, and when I try running it on a BLASTN > ouput file to get the first 10 hits, it gives me 10 random hits, and not the > first 10 hits.Can anyone give me any idea what could be the cause of this > behaviour of the parser. > Please find attached my parser script and the input and output files. 
The > script can be ran as : perl parserMethod.pl 10, where 10 is the number of > hits expected in the parsed output file. From cjfields at uiuc.edu Fri Sep 1 12:11:20 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Sep 2006 11:11:20 -0500 Subject: [Bioperl-l] NEED HELP URGENTLY In-Reply-To: References: <764978cf0609010357p5c035cbbx487650663c7c3182@mail.gmail.com> Message-ID: I plan on looking at this one this weekend. I would suggest trying XML output parsing to see if that helps. Chris On Sep 1, 2006, at 10:37 AM, Simon Andrews wrote: > > On 1 Sep 2006, at 11:57, neeti somaiya wrote: > >> Hi, >> >> I have written a BLAST parser script, and when I try running it on >> a BLASTN >> ouput file to get the first 10 hits, it gives me 10 random hits, >> and not the >> first 10 hits.Can anyone give me any idea what could be the cause >> of this >> behaviour of the parser. > > Looking at your blast file you appear to have duplicate entries in > your blast database (more than one hit to the same accession). In > that case your problem is probably this: > > http://bugzilla.open-bio.org/show_bug.cgi?id=1986 > > There are patches and workrounds listed in the bug report, but > there's no real fix as yet. > > Hope this helps > > Simon. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cdavis at bcm.tmc.edu Fri Sep 1 12:48:29 2006 From: cdavis at bcm.tmc.edu (Caleb Davis) Date: Fri, 01 Sep 2006 11:48:29 -0500 Subject: [Bioperl-l] trouble with pairwise_kaks.PLS on Cygwin/XP platform In-Reply-To: References: Message-ID: <44F8645D.6050104@bcm.tmc.edu> Brian, I tried changing the tmpdir environmental variable using the different path formats you suggested, and the only one that works for the clustalw wrappers is the c:/cygwin/tmp type. None of them execute the script all the way through...? --Caleb Brian Osborne wrote: > Caleb, > > In Cygwin there are 3 possible formats for a given path. Examples: > > /cygdrive/c/cygwin/tmp > c:/cygwin/tmp > /tmp > > What works will depend on the application and how it was compiled. Have you > tried all variations? > > Brian O. > > > On 8/31/06 8:00 PM, "Caleb F. Davis" wrote: > > >> out why. I'm guessing it has something to do with the tmpdir >> environmental variable and the differences between windows and unix >> > > > From aw2 at sanger.ac.uk Fri Sep 1 12:24:46 2006 From: aw2 at sanger.ac.uk (Adam Woolfe) Date: Fri, 1 Sep 2006 17:24:46 +0100 (BST) Subject: [Bioperl-l] NEED HELP URGENTLY In-Reply-To: References: Message-ID: If you can face using SearchIO in bioperl-1.2.3 that sorts the output whereas the later versions dont. not sure why! Adam From osborne1 at optonline.net Fri Sep 1 13:50:38 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 01 Sep 2006 13:50:38 -0400 Subject: [Bioperl-l] trouble with pairwise_kaks.PLS on Cygwin/XP platform In-Reply-To: <44F8645D.6050104@bcm.tmc.edu> Message-ID: Caleb, I'm no longer using Cygwin so I'm simply guessing here - how about compiling clustalw within Cygwin rather than using the Windows executable? In this case you'd reset TMPDIR to '/tmp'. By the way, I don't think your problem has to do with the value of TMPDIR, but recompiling is easy and may help you. Brian O. 
On 9/1/06 12:48 PM, "Caleb Davis" wrote: > Brian, > > I tried changing the tmpdir environmental variable using the different > path formats you suggested, and the only one that works for the clustalw > wrappers is the c:/cygwin/tmp type. None of them execute the script all > the way through...? > > --Caleb > > Brian Osborne wrote: >> Caleb, >> >> In Cygwin there are 3 possible formats for a given path. Examples: >> >> /cygdrive/c/cygwin/tmp >> c:/cygwin/tmp >> /tmp >> >> What works will depend on the application and how it was compiled. Have you >> tried all variations? >> >> Brian O. >> >> >> On 8/31/06 8:00 PM, "Caleb F. Davis" wrote: >> >> >>> out why. I'm guessing it has something to do with the tmpdir >>> environmental variable and the differences between windows and unix >>> >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Sep 1 14:23:03 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Sep 2006 13:23:03 -0500 Subject: [Bioperl-l] NEED HELP URGENTLY In-Reply-To: Message-ID: <000601c6cdf3$a9f96a40$15327e82@pyrimidine> Yes, but the older SearchIO may not parse more current BLAST output (like 2.2.11). You're usually safe with XML regardless of BLAST version, though: it's the most stable. Interesting that v 1.2.3 works, though, so I'll have to investigate what changed in between then and now. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Adam Woolfe > Sent: Friday, September 01, 2006 11:25 AM > Cc: neeti somaiya; bioperl-l > Subject: Re: [Bioperl-l] NEED HELP URGENTLY > > If you can face using SearchIO in bioperl-1.2.3 that sorts the output > whereas the later versions dont. not sure why! 
> > Adam > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Sep 1 17:45:57 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Sep 2006 16:45:57 -0500 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? Message-ID: <000001c6ce10$04deb2a0$15327e82@pyrimidine> Does anyone have suggestions for a test suite other than the regular ol' Test.pm? I would like something that's a bit more flexible for Bioperl but won't require a ton of revisions for current tests. Jason had suggested moving tests over to using Test::More for it's flexibility (and it's a bit easier to document with messages). I think switching over to this makes sense, but there may be other options out there that I don't know about. I'm considering trying out Test::More for my EUtilities tests, but if anyone has suggestions I would be glad to hear them. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From lincoln.stein at gmail.com Fri Sep 1 18:11:31 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 1 Sep 2006 18:11:31 -0400 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? In-Reply-To: <000001c6ce10$04deb2a0$15327e82@pyrimidine> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> Message-ID: <6dce9a0b0609011511w33b4a999w201c82da8541f223@mail.gmail.com> I use Test::More for the CGI module and am very happy with it. Lincoln On 9/1/06, Chris Fields wrote: > > Does anyone have suggestions for a test suite other than the regular ol' > Test.pm? I would like something that's a bit more flexible for Bioperl > but > won't require a ton of revisions for current tests. > > Jason had suggested moving tests over to using Test::More for it's > flexibility (and it's a bit easier to document with messages). 
I think > switching over to this makes sense, but there may be other options out > there > that I don't know about. > > I'm considering trying out Test::More for my EUtilities tests, but if > anyone > has suggestions I would be glad to hear them. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From mkiwala at watson.wustl.edu Fri Sep 1 18:21:21 2006 From: mkiwala at watson.wustl.edu (Michael Kiwala) Date: Fri, 01 Sep 2006 17:21:21 -0500 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? In-Reply-To: <000001c6ce10$04deb2a0$15327e82@pyrimidine> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> Message-ID: <44F8B261.3040606@watson.wustl.edu> Test::Class is nice. Unfortunately, it would introduce a dependency on a non-standard perl module for anyone planning on running the tests, which would be everyone installing Bioperl. If we used Test::Class Bioperl would either have to force users who wish to run tests during installation to have Test::Class installed, or Bioperl would need a separate set of tests based on Test::More for the purpose of running during installation. I'm not sure if the benefits of using Test::Class outweigh the costs in this instance, but it is a nice testing framework. Chris Fields wrote: > Does anyone have suggestions for a test suite other than the regular ol' > Test.pm? I would like something that's a bit more flexible for Bioperl but > won't require a ton of revisions for current tests. 
> > Jason had suggested moving tests over to using Test::More for it's > flexibility (and it's a bit easier to document with messages). I think > switching over to this makes sense, but there may be other options out there > that I don't know about. > > I'm considering trying out Test::More for my EUtilities tests, but if anyone > has suggestions I would be glad to hear them. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From goshng at gmail.com Sat Sep 2 07:39:09 2006 From: goshng at gmail.com (Sang Chul Choi) Date: Sat, 2 Sep 2006 07:39:09 -0400 Subject: [Bioperl-l] slow parsing and slow debugging in PDB parsing object Message-ID: <33f36270609020439y60ab2c1dm1d3db6c8c67946c1@mail.gmail.com> Hi, I'm using PDB parsing object, Bio::Structure. I'm not complaining its slow parsing, but I love this object because it has been saving my time of coding. During the debugging of my code, I often call the parsing function, I guess, Structure::IO->next_structure, which could be the parsing object. Is there a way to bypass the parsing part by writing down the object into a file and directly reading the file during the debugging? I think that this might not be an appropriate question to this because this could be a general perl object technique. But, I will appreciate any help. Thank you, Sang Chul -- =============================== Live, Learn, and Love! 
E-mail : goshng at empal dot com goshng at gmail dot com =============================== From osborne1 at optonline.net Sat Sep 2 16:17:34 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sat, 02 Sep 2006 16:17:34 -0400 Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails In-Reply-To: <44F8076E.10102@biologie.uni-freiburg.de> Message-ID: Daniel, Bug, presumably in SeqIO/swiss.pm. Can you send me a small file with such a PRELIMINARY entry? Brian O. On 9/1/06 6:11 AM, "Daniel Lang" wrote: > Hi, > > when using Bio::Registry (bioperl-live) to fetch uniprot entries from > local indexed uniprot *.dats, I had to realize that several entries > could not be retrieved despite the fact that they are present in the > files! A closer look reveals that they are of status PRELIMINARY: > > uniprot_trembl.dat:ID Q16EZ1_AEDAE PRELIMINARY; PRT; 222 AA. > > I don't "grep" PRELIMINARY anywhere in my cvs checkout.. > I also can't retrieve the sequences from the online database defined as > follows: > [swissprot_ebi] > protocol=biofetch > location=http://www.ebi.ac.uk/cgi-bin/dbfetch > dbname=swall > > Is this a bug or a feature? If its a feature, how can I bypass it? > > Thanks in advance, > Daniel From cjfields at uiuc.edu Sun Sep 3 01:02:28 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 3 Sep 2006 00:02:28 -0500 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? In-Reply-To: <44FA3EC3.3040903@infotech.monash.edu.au> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> <44FA3EC3.3040903@infotech.monash.edu.au> Message-ID: <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> From the INSTALL: o SYSTEM REQUIREMENTS - perl 5.005 or later*. Frankly, I think we should change that to v 5.8; it's been out for over three years now. perl 5.10 isn't too far off (let alone Perl6) and perl 5.005, according to CPAN, is 8 years old. 
Chris On Sep 2, 2006, at 9:32 PM, Torsten Seemann wrote: > Chris Fields wrote: >> Does anyone have suggestions for a test suite other than the >> regular ol' >> Test.pm? I would like something that's a bit more flexible for >> Bioperl but >> won't require a ton of revisions for current tests. > > I notice that BioPerl still comes bundled with t/Test.pm 1.15 and > most of the t/*.t test scripts have a BEGIN block which loads the > bundled one if a local copy doesn't exist. > > Is this still necessary? > > I think Test.pm has been part of the core Perl distribution since > 5.6 (happy to be corrected - it's in "man perltoc" if you have 5.6 > installed), and Test::Simple and Test::More are standard in Perl 5.8. > > What is the minimum Perl version required for bioperl-live? > > -- > Torsten Seemann > Victorian Bioinformatics Consortium, Monash University, Australia Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Sun Sep 3 04:45:38 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 03 Sep 2006 09:45:38 +0100 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? In-Reply-To: <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> <44FA3EC3.3040903@infotech.monash.edu.au> <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> Message-ID: <44FA9632.6070601@sendu.me.uk> Chris Fields wrote: > From the INSTALL: > > o SYSTEM REQUIREMENTS > > - perl 5.005 or later*. On the wiki it says 'Bioperl currently requires at least Perl 5.6.0 but at least 5.8.0 is recommened [sic]' http://www.bioperl.org/wiki/Installing_BioPerl > Frankly, I think we should change that to v 5.8; it's been out for > over three years now. perl 5.10 isn't too far off (let alone Perl6) > and perl 5.005, according to CPAN, is 8 years old. I'm certainly with you regarding 5.8. Full disclosure: my Bio::PullParserI requires 5.8 for certain things. 
I was going to add some <5.8 detection and work-around code, but it would be much better if we could just bump the requirement to 5.8. From cjfields at uiuc.edu Sun Sep 3 10:31:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 3 Sep 2006 09:31:37 -0500 Subject: [Bioperl-l] min. perl ver. for next release, was Bioperl tests In-Reply-To: <44FA9632.6070601@sendu.me.uk> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> <44FA3EC3.3040903@infotech.monash.edu.au> <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> <44FA9632.6070601@sendu.me.uk> Message-ID: <0CD17008-66A9-408A-AAA0-8CD90A280617@uiuc.edu> On Sep 3, 2006, at 3:45 AM, Sendu Bala wrote: > Chris Fields wrote: >> From the INSTALL: >> >> o SYSTEM REQUIREMENTS >> >> - perl 5.005 or later*. > > On the wiki it says 'Bioperl currently requires at least Perl 5.6.0 > but > at least 5.8.0 is recommened [sic]' > > http://www.bioperl.org/wiki/Installing_BioPerl > > >> Frankly, I think we should change that to v 5.8; it's been out for >> over three years now. perl 5.10 isn't too far off (let alone Perl6) >> and perl 5.005, according to CPAN, is 8 years old. > > I'm certainly with you regarding 5.8. Full disclosure: my > Bio::PullParserI requires 5.8 for certain things. I was going to add > some <5.8 detection and work-around code, but it would be much > better if > we could just bump the requirement to 5.8. Agreed. We should probably get some input from Brian, Hilmar, etc. As Torsten and I have pointed out, Test and Test::More are included in 5.8 so there's no need to include it in the distribution. perl v. 5.8 has been out since late 2002, so I think it's feasible. Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From golharam at umdnj.edu Mon Sep 4 01:14:09 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 04 Sep 2006 01:14:09 -0400
Subject: [Bioperl-l] Retrieving update sequence from GenBank
Message-ID: <03b101c6cfe0$f4c59050$2f01a8c0@GOLHARMOBILE1>

I'm using Bio::DB::GenBank::get_Seq_by_id, passing in GI:76628052. The record has been replaced by GI:78369420 (NP_001030432).

Is there any way to determine whether a sequence has been replaced and what its replacement ID is, or to follow the replacement tree to get the latest sequence?

Ryan

From aximili23 at gmail.com Mon Sep 4 05:42:21 2006
From: aximili23 at gmail.com (Carlo Lapid)
Date: Mon, 4 Sep 2006 17:42:21 +0800
Subject: [Bioperl-l] standalone blast won't work
Message-ID:

Hi,

I installed blast on my machine, and I wrote a test bioperl script that blasts a short test sequence against a small local database that I created. It worked fine. But when I incorporated the same code in a CGI script for a web-based program, I got the following error message in the error logs:

------------- EXCEPTION -------------, referer: http://localhost/asa/cgi-bin/asa.cgi
MSG: blastall call crashed: -1 /usr/local/blast-2.2.14/bin/blastall -p blastp -d /usr/local/blast-2.2.14/data/ig_heavy.gp -i /tmp/Cht6r7QkNs -o /tmp/CzYdy6GJ1b
, referer: http://localhost/asa/cgi-bin/asa.cgi
, referer: http://localhost/asa/cgi-bin/asa.cgi
STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/StandAloneBlast.pm:732, referer: http://localhost/asa/cgi-bin/asa.cgi
STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/StandAloneBlast.pm:680, referer: http://localhost/asa/cgi-bin/asa.cgi
STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/StandAloneBlast.pm:536, referer: http://localhost/asa/cgi-bin/asa.cgi

Can anybody help?
I've already placed the blast/bin folder with the executable files in the PATH environment variable, and I've pointed BLASTDB and BLASTMAT to the blast/data directory as well. The relevant program code is below:

use CGI;
use Bio::Tools::Run::StandAloneBlast;

{
    $ENV{PATH}=":/usr/local/blast-2.2.14/bin/:";
}

my $sequence = Bio::Seq->new(-id => "test_query", -seq => $input_sequence);
my @params = (program  => 'blastp',
              database => '/usr/local/blast-2.2.14/data/ig_heavy.gp');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
my $blast_report = $factory->blastall($sequence);
my $result = $blast_report->next_result;

print $cgi->hr, $cgi->p();
print $cgi->br;

while( my $hit = $result->next_hit()) {
    print $cgi->br;
    print "Hit name: ", $hit->name(), space(5),
          "Significance: ", $hit->significance();
}

Any help would be immensely appreciated.

Carlo

From n.haigh at sheffield.ac.uk Mon Sep 4 06:02:34 2006
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Mon, 04 Sep 2006 11:02:34 +0100
Subject: [Bioperl-l] Taxonomy and entrez
Message-ID: <44FBF9BA.4010808@sheffield.ac.uk>

I'm trying to write a script that parses a BLAST report, retrieves the taxonomic information from entrez and prints the binomial name for each hit.

I'm new to these objects so I started with the classify_hits_kingdom script from the script directory in CVS. However, I have problems with things that are subspecies etc.

I have a script that uses the following objects:

my $taxdb = Bio::DB::Taxonomy->new(-source => 'entrez');
my $node = $taxdb->get_Taxonomy_Node($taxid);
print $node->binomial;

However, I get warnings such as:
-------------------- WARNING ---------------------
MSG: can't create a species object for Brassica rapa subsp. pekinensis (bai cai) because it isn't a species but is a 'subspecies' instead
---------------------------------------------------

I've had a look around and it appears that there has been an overhaul of species, taxonomy etc since 1.5.1.
I was wondering if someone could point me in the right direction for doing this with bioperl-live? Thanks Nathan From bix at sendu.me.uk Mon Sep 4 06:28:40 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Sep 2006 11:28:40 +0100 Subject: [Bioperl-l] standalone blast won't work In-Reply-To: References: Message-ID: <44FBFFD8.4000908@sendu.me.uk> Carlo Lapid wrote: > Hi, > > I installed blast on my machine, and I wrote a test bioperl script that > blasts a short test sequence against a small local database that I created. > It worked fine. But when I incorporated the same code in a CGI script for a > web-based program, I get the following error message in the error logs: [...] > /usr/local/blast-2.2.14/bin/blastall -p blastp -d /usr/local/blast-2.2.14/data/ig_heavy.gp -i /tmp/Cht6r7QkNs Does a new CGI script that just runs the above command directly work? (you may need to change the input file and add an output file option to see if it works properly) If it crashes as well this is a non-bioperl problem. Perhaps your server is set to kill high cpu/ long-running processes? From bix at sendu.me.uk Mon Sep 4 07:22:12 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Sep 2006 12:22:12 +0100 Subject: [Bioperl-l] Taxonomy and entrez In-Reply-To: <44FBF9BA.4010808@sheffield.ac.uk> References: <44FBF9BA.4010808@sheffield.ac.uk> Message-ID: <44FC0C64.30204@sendu.me.uk> Nathan Haigh wrote: > I'm trying write a script that parses a BLAST report and retrieves the > taxanomic information from entrez and prints the binomial name for each hit. > > I'm new to these objects so I started with the classify_hits_kingdom > script from the script directory in CVS. 
However, I have problems with
> things that are subspecies etc
>
> I have a script that uses the following objects:
>
> my $taxdb = Bio::DB::Taxonomy->new(-source => 'entrez');
> my $node = $taxdb->get_Taxonomy_Node($taxid);
> print $node->binomial;
>
> However, I get warnings such as:
> -------------------- WARNING ---------------------
> MSG: can't create a species object for Brassica rapa subsp. pekinensis
> (bai cai) because it isn't a species but is a 'subspecies' instead
> ---------------------------------------------------
>
> I've had a look around and it appears that there has been an overhaul of
> species, taxonomy etc since 1.5.1. I was wondering if someone could
> point me in the right direction for doing this with bioperl-live?

First install the very latest bioperl-live from cvs:
http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1
(or do a cvs checkout; the tarball above may not be up-to-date enough, as I only just committed some fixes here)

use Bio::DB::Taxonomy;
use Bio::Species;

my $taxdb = Bio::DB::Taxonomy->new(-source => 'entrez');
my $taxon = $taxdb->get_taxon(51351);

# $taxon isa Bio::Taxon, which has no 'binomial' method
print $taxon->scientific_name;
# prints 'Brassica rapa subsp. pekinensis'

# If you really really want a Bio::Species object (no need):
my $species = Bio::Species->new(-id => 51351);
$species->db_handle($taxdb);
print $species->binomial;
# prints 'Brassica rapa subsp.'
print $species->binomial('FULL');
# prints 'Brassica rapa subsp. pekinensis'
print $species->genus;
# prints 'Brassica'
print $species->species;
# prints 'rapa subsp.'
print $species->sub_species; # prints 'pekinensis' From osborne1 at optonline.net Mon Sep 4 12:20:09 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 04 Sep 2006 12:20:09 -0400 Subject: [Bioperl-l] Retrieving update sequence from GenBank In-Reply-To: <03b101c6cfe0$f4c59050$2f01a8c0@GOLHARMOBILE1> Message-ID: Ryan, Bio::DB::SeqVersion is designed to do this: use Bio::DB::SeqVersion; my $query = Bio::DB::SeqVersion->new(-type => 'gi'); # all GIs, which will include the GI used to query my @all_gis = $query->get_all(2); # the most recent GI, which may or may not be the GI used to query my $live_gi = $query->get_recent(2); Brian O. On 9/4/06 1:14 AM, "Ryan Golhar" wrote: > I'm using Bio::DB:GenBank::get_Seq_by_id passing in the GI:76628052. > The record has been replaced by GI:78369420 (NP_001030432). > > Is there any way to determine if a sequence has been replaced, and what > its replacement ID is, or follow the replacement tree to get the latest > sequence? > > Ryan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Sep 4 15:09:42 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Sep 2006 14:09:42 -0500 Subject: [Bioperl-l] Taxonomy and entrez In-Reply-To: <44FC0C64.30204@sendu.me.uk> Message-ID: <000001c6d055$b4883220$15327e82@pyrimidine> ... > # prints 'Brassica rapa subsp.' > print $species->binomial('FULL'); > # prints 'Brassica rapa subsp. pekinensis' Sendu, To most biologists 'binomial' means 'Genus species'. If we decide to continue using binomial(), adding an argument to retrieve something that isn't binomial in nature doesn't make much sense. The only exception would be if you are always at the species rank, and that isn't always true for sequence file organism data. 
Also, since we will eventually get rid of Bio::Species altogether, it doesn't make much sense to me why you want to promote using binomial(), which will be deprecated as well. I'm a little confused: shouldn't we be promoting use of your new Bio::Taxon-based (DB-based) API for retrieving taxonomy information instead of using the old Bio::Species-based API? Or am I missing something? Anyway, there is a method for Bio::Species/Bio::Taxon already in place for getting the full name: scientific_name(), which will not be deprecated. The only way I can think of to retrieve the true 'binomial name' w/o reliance on the sequence file data would be to traverse the lineage to the species rank for a node (taxon) and retrieve the scientific_name(), which should always be 'Genus species'. I guess you could also use the genus rank and get something like 'Bacillus sp.'. This data wouldn't be a kludge from the sequence file information, which we are trying to get away from anyway. Chris From bix at sendu.me.uk Mon Sep 4 18:22:41 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Sep 2006 23:22:41 +0100 Subject: [Bioperl-l] Taxonomy and entrez In-Reply-To: <000001c6d055$b4883220$15327e82@pyrimidine> References: <000001c6d055$b4883220$15327e82@pyrimidine> Message-ID: <44FCA731.80802@sendu.me.uk> Chris Fields wrote: > ... > >> # prints 'Brassica rapa subsp.' >> print $species->binomial('FULL'); >> # prints 'Brassica rapa subsp. pekinensis' > > To most biologists 'binomial' means 'Genus species'. If we decide to > continue using binomial(), adding an argument to retrieve something that > isn't binomial in nature doesn't make much sense. I didn't add it. It was a pre-existing feature of Bio::Species that I have made work for backward compatibility. My interpretation of the option (which can be any true value) is essentially a request for the trinomial name. 
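
The lineage traversal Chris describes above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code behind binomial(): it assumes only that each node responds to rank(), scientific_name() and ancestor(), as Bio::Taxon objects do. The stand-in class Local::Node, the helper name species_level_name and the sample names are made up for illustration, so that the example runs without a taxonomy database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a taxon node; real code would pass Bio::Taxon
# objects, which offer the same three accessors.
package Local::Node;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub rank            { $_[0]{rank} }
sub scientific_name { $_[0]{name} }
sub ancestor        { $_[0]{ancestor} }

package main;

# Walk from a node towards the root and return the scientific name of the
# first node whose rank is 'species' (nothing if no such node exists).
sub species_level_name {
    my ($node) = @_;
    while (defined $node) {
        my $rank = $node->rank;
        return $node->scientific_name
            if defined $rank && $rank eq 'species';
        $node = $node->ancestor;
    }
    return;
}

# Sample lineage, hard-coded for illustration only.
my $genus   = Local::Node->new(rank => 'genus', name => 'Brassica');
my $species = Local::Node->new(rank => 'species', name => 'Brassica rapa',
                               ancestor => $genus);
my $subsp   = Local::Node->new(rank => 'subspecies',
                               name => 'Brassica rapa subsp. pekinensis',
                               ancestor => $species);

print species_level_name($subsp), "\n";
```

Starting from the subspecies node, the walk stops at the species-rank ancestor, so no string surgery on the organism name from the sequence file is needed.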
> Anyway, there is a method for Bio::Species/Bio::Taxon already in place for > getting the full name: scientific_name(), which will not be deprecated. The > only way I can think of to retrieve the true 'binomial name' w/o reliance on > the sequence file data would be to traverse the lineage to the species rank > for a node (taxon) and retrieve the scientific_name(), which should always > be 'Genus species'. binomial() is there for backward compatibility, as is the entire Bio::Species class. It allows people to (hopefully) get the same strings they are used to getting. FYI, binomial() is indeed implemented using traversals of the lineage tree. It just does some fantastic and surprising things to give us that old Bio::Species flavour we all know and love (?). > Also, since we will eventually get rid of Bio::Species altogether, it > doesn't make much sense to me why you want to promote using binomial(), > which will be deprecated as well. I'm a little confused: shouldn't we be > promoting use of your new Bio::Taxon-based (DB-based) API for retrieving > taxonomy information instead of using the old Bio::Species-based API? Or am > I missing something? I'm not promoting it. I stated that you didn't need to use Bio::Species. I mentioned it in case the OP wanted the familiarity of Bio::Species and the behaviour of its old methods. From cjfields at uiuc.edu Mon Sep 4 20:31:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Sep 2006 19:31:26 -0500 Subject: [Bioperl-l] Taxonomy and entrez In-Reply-To: <44FCA731.80802@sendu.me.uk> References: <000001c6d055$b4883220$15327e82@pyrimidine> <44FCA731.80802@sendu.me.uk> Message-ID: <35A7634B-8B0B-40DC-81EB-CD92F945EE0F@uiuc.edu> On Sep 4, 2006, at 5:22 PM, Sendu Bala wrote: ... > > binomial() is there for backward compatibility, as is the entire > Bio::Species class. It allows people to (hopefully) get the same > strings > they are used to getting. 
> > FYI, binomial() is indeed implemented using traversals of the lineage > tree. It just does some fantastic and surprising things to give us > that > old Bio::Species flavour we all know and love (?). Heh, not so much. Anyway, I didn't notice the 'argument' form of binomial(). My bad! Still, I have to say, having a 'trinomial' or full name accessible via the binomial() doesn't make too much sense. Oh well, that's part of the reason we're switching. ... > I'm not promoting it. I stated that you didn't need to use > Bio::Species. > I mentioned it in case the OP wanted the familiarity of > Bio::Species and > the behaviour of its old methods. Right, but you didn't indicate that Bio::Species is deprecated, either. If I'm told that something can be accomplished two different ways (Bio::Taxon, Bio::Species), I would expect it to stay that way. This isn't quite true with Bio::Species, which is slated to be deprecated soon. Not a big deal, but something to remember. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Mon Sep 4 23:46:45 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 4 Sep 2006 23:46:45 -0400 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? In-Reply-To: <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> <44FA3EC3.3040903@infotech.monash.edu.au> <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> Message-ID: <6C9B421D-ED96-4458-8C2D-399614CB80B7@gmx.net> A year or so ago some key servers at the Sanger were still on 5.005. I believe due to some code constructs (regex or some such?) in a couple of modules we are now effectively requiring 5.6.x. Mac OSX 10.2 will have 5.6.0, so requiring a higher version means you do not support Jaguar anymore. -hilmar On Sep 3, 2006, at 1:02 AM, Chris Fields wrote: > From the INSTALL: > > o SYSTEM REQUIREMENTS > > - perl 5.005 or later*. 
> > Frankly, I think we should change that to v 5.8; it's been out for > over three years now. perl 5.10 isn't too far off (let alone Perl6) > and perl 5.005, according to CPAN, is 8 years old. > > Chris > > > On Sep 2, 2006, at 9:32 PM, Torsten Seemann wrote: > >> Chris Fields wrote: >>> Does anyone have suggestions for a test suite other than the >>> regular ol' >>> Test.pm? I would like something that's a bit more flexible for >>> Bioperl but >>> won't require a ton of revisions for current tests. >> >> I notice that BioPerl still comes bundled with t/Test.pm 1.15 and >> most of the t/*.t test scripts have a BEGIN block which loads the >> bundled one if a local copy doesn't exist. >> >> Is this still necessary? >> >> I think Test.pm has been part of the core Perl distribution since >> 5.6 (happy to be corrected - it's in "man perltoc" if you have 5.6 >> installed), and Test::Simple and Test::More are standard in Perl 5.8. >> >> What is the minimum Perl version required for bioperl-live? >> >> -- >> Torsten Seemann >> Victorian Bioinformatics Consortium, Monash University, Australia > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon Sep 4 23:52:56 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 4 Sep 2006 23:52:56 -0400 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? 
In-Reply-To: <44FA9632.6070601@sendu.me.uk>
References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> <44FA3EC3.3040903@infotech.monash.edu.au> <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> <44FA9632.6070601@sendu.me.uk>
Message-ID:

On Sep 3, 2006, at 4:45 AM, Sendu Bala wrote:
>
> I'm certainly with you regarding 5.8. Full disclosure: my
> Bio::PullParserI requires 5.8 for certain things. I was going to add
> some <5.8 detection and work-around code, but it would be much
> better if
> we could just bump the requirement to 5.8.

You can always require a certain version for a module, so long as the module is "optional" (as in "non-core"; the double quotes acknowledge that there is no clear definition of what should fall into that bin and what shouldn't).

Just make sure your test exits gracefully and the module prints a clear message about the cause of death when it dies.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Tue Sep 5 00:00:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 5 Sep 2006 00:00:33 -0400
Subject: [Bioperl-l] min. perl ver. for next release, was Bioperl tests
In-Reply-To: <0CD17008-66A9-408A-AAA0-8CD90A280617@uiuc.edu>
References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> <44FA3EC3.3040903@infotech.monash.edu.au> <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> <44FA9632.6070601@sendu.me.uk> <0CD17008-66A9-408A-AAA0-8CD90A280617@uiuc.edu>
Message-ID:

On Sep 3, 2006, at 10:31 AM, Chris Fields wrote:
> perl v. 5.8 has been out since late 2002, so I think it's feasible.

Just a note - it really doesn't matter at all how long these versions have been available. What you need to know from people is how many are still on versions prior to the one you want to make a requirement.

For those who are, it is unlikely that they are because they enjoy running an ages old version of perl.
More likely, the sysadmin is unwilling or cannot upgrade perl, e.g. due to side effects. Bioperl requiring a higher version will not change this. By requiring a higher version you are dropping support for these people, or asking them to please install their own local version of perl. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Sep 5 00:12:32 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 5 Sep 2006 00:12:32 -0400 Subject: [Bioperl-l] Taxonomy and entrez In-Reply-To: <35A7634B-8B0B-40DC-81EB-CD92F945EE0F@uiuc.edu> References: <000001c6d055$b4883220$15327e82@pyrimidine> <44FCA731.80802@sendu.me.uk> <35A7634B-8B0B-40DC-81EB-CD92F945EE0F@uiuc.edu> Message-ID: On Sep 4, 2006, at 8:31 PM, Chris Fields wrote: > Right, but you didn't indicate that Bio::Species is deprecated, > either. It's not deprecated just yet, is it? -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From goshng at gmail.com Tue Sep 5 01:47:21 2006 From: goshng at gmail.com (Sang Chul Choi) Date: Tue, 5 Sep 2006 01:47:21 -0400 Subject: [Bioperl-l] Any way to figure out a changed SwissProt ID? Message-ID: <33f36270609042247g7ffbbe23j6bca60dd6ba91049@mail.gmail.com> Hi, I'm trying to fetch a protein sequence named Q9H2C9, which must be swiss prot sequence. I could not fetch it using Bio::DB::GenBank. I found out that the actual sequence name is not Q9H2C9, but P00533 by searching NCBI's GenBank. I guess that the sequence name had changed. Is there a way to figure out that a SwissProt ID's protein sequence has changed to which other ID using BioPerl? Thank you, Sang Chul -- =============================== Live, Learn, and Love! 
E-mail : goshng at empal dot com goshng at gmail dot com =============================== From torsten.seemann at infotech.monash.edu.au Sat Sep 2 22:32:35 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 03 Sep 2006 12:32:35 +1000 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? In-Reply-To: <000001c6ce10$04deb2a0$15327e82@pyrimidine> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> Message-ID: <44FA3EC3.3040903@infotech.monash.edu.au> Chris Fields wrote: > Does anyone have suggestions for a test suite other than the regular ol' > Test.pm? I would like something that's a bit more flexible for Bioperl but > won't require a ton of revisions for current tests. I notice that BioPerl still comes bundled with t/Test.pm 1.15 and most of the t/*.t test scripts have a BEGIN block which loads the bundled one if a local copy doesn't exist. Is this still necessary? I think Test.pm has been part of the core Perl distribution since 5.6 (happy to be corrected - it's in "man perltoc" if you have 5.6 installed), and Test::Simple and Test::More are standard in Perl 5.8. What is the minimum Perl version required for bioperl-live? -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia From torsten.seemann at infotech.monash.edu.au Tue Sep 5 00:34:47 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 05 Sep 2006 14:34:47 +1000 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? In-Reply-To: <6C9B421D-ED96-4458-8C2D-399614CB80B7@gmx.net> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> <44FA3EC3.3040903@infotech.monash.edu.au> <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> <6C9B421D-ED96-4458-8C2D-399614CB80B7@gmx.net> Message-ID: <44FCFE67.7030003@infotech.monash.edu.au> >> Frankly, I think we should change that to v 5.8; it's been out for >> over three years now. 
perl 5.10 isn't too far off (let alone Perl6)
>> and perl 5.005, according to CPAN, is 8 years old.

> A year or so ago some key servers at the Sanger were still on 5.005.
> I believe due to some code constructs (regex or some such?) in a couple
> of modules we are now effectively requiring 5.6.x. Mac OSX 10.2 will
> have 5.6.0, so requiring a higher version means you do not support
> Jaguar anymore.

Well that suggests we should try and make 5.6 the minimum. My quick audit below suggests we already require it anyway. But we need to audit which Perl features any modules, especially core ones, are using that may have come from 5.8.

Here's a quick summary:

Perl 5.6 - http://www.perl.com/pub/a/2000/04/whatsnew.html
% perldoc perl56delta

- unicode
- threading
- version tuples
- lexical warnings
- lvalue subs
- weak references
    Bio/SeqFeature/Gene/GeneStructure.pm ?
- posix character classes [: :]
    Bio/LiveSeq/IO/Loader.pm ?
    Bio/Tools/GuessSeqFormat.pm ?
- binary numbers 0bNNNNN
- 'our' variables
    Bio/DB/Biblio/eutils.pm
    Bio/DB/Expression/geo.pm
    Bio/DB/Expression.pm
    Bio/SeqIO/chaos.pm
    Bio/Tools/Run/StandAloneBlast.pm
    Bio/Tools/Run/.#StandAloneBlast.pm.1.57
    Bio/Tools/SiRNA/Ruleset/tuschl.pm
    Bio/Tools/WebBlat.pm
- auto vivify file handles
- 3 argument open()
- largefile support
    Surely something, perhaps the Index or GFF3 stuff?

Perl 5.8 - http://www.perl.com/pub/a/2003/01/16/whatsnew.html
% perldoc perl58delta

- unicode proper
- threading proper
- new PerlIO magic
- safe signals
- Test::More
    Chris F desires to use this.
- no pseudo hashes - blessing a ref into another ref - self tying arrays and hashes - no $* in regexp - pack/unpack D/F -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From Bernhard.Schmalhofer at biomax.com Tue Sep 5 06:12:29 2006 From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer) Date: Tue, 05 Sep 2006 12:12:29 +0200 Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More? In-Reply-To: <44FCFE67.7030003@infotech.monash.edu.au> References: <000001c6ce10$04deb2a0$15327e82@pyrimidine> <44FA3EC3.3040903@infotech.monash.edu.au> <9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu> <6C9B421D-ED96-4458-8C2D-399614CB80B7@gmx.net> <44FCFE67.7030003@infotech.monash.edu.au> Message-ID: <44FD4D8D.3000506@biomax.com> Torsten Seemann wrote: > >> Frankly, I think we should change that to v 5.8; it's been out for > >> over three years now. perl 5.10 isn't too far off (let alone Perl6) > >> and perl 5.005, according to CPAN, is 8 years old. I remember that a while back the same kind of discussion took place on the Parrot mailing list, http://www.parrotcode.org. The result was to require 5.6.1 or later, as perl 5.6.0 wasn't very stable. With 5.6.1 required, some cleanup could be done, e.g. - Replace 'use vars qw( $dummy );' with 'our $dummy;' - Get rid of IO::Scalar. Just my $0.02, Bernhard Schmalhofer -- ************************************************** Dipl.-Physiker Bernhard Schmalhofer Senior Developer Biomax Informatics AG Lochhamer Str. 
11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************

From daniel.lang at biologie.uni-freiburg.de Tue Sep 5 05:57:59 2006
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Tue, 05 Sep 2006 11:57:59 +0200
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To:
References:
Message-ID: <44FD4A27.3060401@biologie.uni-freiburg.de>

Hi Brian,

sorry for the belated response! I've compiled a set of 100 PRELIMINARY entries for you from the latest uniprot_trembl release. I've tried to reproduce the bug using only these as input to build an index, but (sadly) all of them can be retrieved using the latest checkout:-(

Maybe it's not connected to these entries after all, but to the size or some other feature of the uniprot distribution? I could now make it work using the 1.5.1 release.

Originally, I built the index using the flat protocol; when I try bdb with bioperl-live, even more problems occur:

bp_bioflat_index.pl --dbname sw -i bdb -f swiss -l . -c uniprot_sprot.dat

------------- EXCEPTION -------------
MSG: The lineage 'Eukaryota, Metazoa, Chordata, Craniata, Vertebrata, Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, Pipoidea, Pipidae, Xenopodinae, Xenopus, Silurana, Xenopus, tropicalis' had two non-consecutive nodes with the same name. Can't cope!
STACK Bio::DB::Taxonomy::list::add_lineage /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:163
STACK Bio::DB::Taxonomy::list::new /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:100
STACK Bio::DB::Taxonomy::new /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy.pm:106
STACK Bio::Species::classification /home/lang/bioperl/bioperl-live/Bio/Species.pm:171
STACK Bio::SeqIO::swiss::_read_swissprot_Species /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:1049
STACK Bio::SeqIO::swiss::next_seq /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:240
STACK Bio::DB::Flat::parse_one_record /home/lang/bioperl/bioperl-live/Bio/DB/Flat.pm:333
STACK Bio::DB::Flat::BDB::_index_file /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:235
STACK Bio::DB::Flat::BDB::build_index /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:218
STACK toplevel /share/apps/bioperl/bioperl-live/scripts_temp/bp_bioflat_index.pl:113

But I think this is connected to the new changes to taxonomy handling in Bio::Taxon...
I'm unsure whether to submit this separately, but I could also provide an example of a swissprot entry that causes this error.

Thanks, again.
Daniel

Brian Osborne wrote:
> Daniel,
>
> Bug, presumably in SeqIO/swiss.pm. Can you send me a small file with such a
> PRELIMINARY entry?
>
> Brian O.
>
>
> On 9/1/06 6:11 AM, "Daniel Lang"
> wrote:
>
>> Hi,
>>
>> when using Bio::Registry (bioperl-live) to fetch uniprot entries from
>> local indexed uniprot *.dats, I had to realize that several entries
>> could not be retrieved despite the fact that they are present in the
>> files! A closer look reveals that they are of status PRELIMINARY:
>>
>> uniprot_trembl.dat:ID Q16EZ1_AEDAE PRELIMINARY; PRT; 222 AA.
>>
>> I don't "grep" PRELIMINARY anywhere in my cvs checkout..
>> I also can't retrieve the sequences from the online database defined as
>> follows:
>> [swissprot_ebi]
>> protocol=biofetch
>> location=http://www.ebi.ac.uk/cgi-bin/dbfetch
>> dbname=swall
>>
>> Is this a bug or a feature? If it's a feature, how can I bypass it?
>>
>> Thanks in advance,
>> Daniel
>
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 100_exemplary_PRELIMINARY.swiss
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060905/74109b66/attachment-0001.pl

From bix at sendu.me.uk Tue Sep 5 08:13:46 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 05 Sep 2006 13:13:46 +0100
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To: <44FD4A27.3060401@biologie.uni-freiburg.de>
References: <44FD4A27.3060401@biologie.uni-freiburg.de>
Message-ID: <44FD69FA.70107@sendu.me.uk>

Daniel Lang wrote:
[...]
> ------------- EXCEPTION -------------
> MSG: The lineage 'Eukaryota, Metazoa, Chordata, Craniata, Vertebrata,
> Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, Pipoidea,
> Pipidae, Xenopodinae, Xenopus, Silurana, Xenopus, tropicalis' had two
> non-consecutive nodes with the same name. Can't cope!
[...]
> But I think this is connected to the new changes to taxonomy handling in
> Bio::Taxon...
> I'm unsure whether to submit this separately, but I could also provide an
> example of such a swissprot entry that causes this error.

Yes, it is to do with the taxonomy changes. If you can send me the
problem entry that had a species of Xenopus tropicalis, I'll fix this
particular problem.

I don't know about the original problem, but if Brian doesn't solve it,
please add a bug report to http://bugzilla.bioperl.org/index.cgi


Cheers,
Sendu.
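[Editorial sketch, not part of the original message.] The condition named in the exception above, a lineage in which the same name appears twice with other names in between, can be illustrated with a few lines of Python. This is only a minimal sketch of the condition itself, assuming the lineage is available as a comma-separated string as printed in the MSG line. It is not the code in Bio::DB::Taxonomy::list, and the function name find_nonconsecutive_duplicates is hypothetical.

```python
def find_nonconsecutive_duplicates(lineage):
    """Return names that recur in the lineage with other names in between.

    Hypothetical helper for illustration only; it does not reproduce the
    BioPerl implementation that raised the exception.
    """
    last_index = {}   # name -> index of its most recent occurrence
    duplicates = []
    for i, name in enumerate(lineage):
        seen_before = name in last_index
        if seen_before and last_index[name] != i - 1 and name not in duplicates:
            duplicates.append(name)
        last_index[name] = i
    return duplicates


# Lineage string copied from the MSG line of the exception.
lineage_text = (
    "Eukaryota, Metazoa, Chordata, Craniata, Vertebrata, "
    "Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, "
    "Pipoidea, Pipidae, Xenopodinae, Xenopus, Silurana, "
    "Xenopus, tropicalis"
)
lineage = [name.strip() for name in lineage_text.split(",")]

print(find_nonconsecutive_duplicates(lineage))  # prints ['Xenopus']
```

Here "Xenopus" occurs once before and once after "Silurana", which is the case the message says the taxonomy code could not handle at the time.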
From cjfields at uiuc.edu Tue Sep 5 08:22:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Sep 2006 07:22:56 -0500
Subject: [Bioperl-l] Taxonomy and entrez
In-Reply-To:
References: <000001c6d055$b4883220$15327e82@pyrimidine> <44FCA731.80802@sendu.me.uk> <35A7634B-8B0B-40DC-81EB-CD92F945EE0F@uiuc.edu>
Message-ID: <9AA228A6-C120-4E20-8965-F42389CF31A0@uiuc.edu>

Hilmar,

Not yet. But it's planned to be removed by maybe 1.7-1.8. I don't
think it's productive to indicate support for a module we're trying to
get rid of. We should be trying to steer people towards using Bio::Taxon.

Chris

On Sep 4, 2006, at 11:12 PM, Hilmar Lapp wrote:

>
> On Sep 4, 2006, at 8:31 PM, Chris Fields wrote:
>
>> Right, but you didn't indicate that Bio::Species is deprecated,
>> either.
>
> It's not deprecated just yet, is it?
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From daniel.lang at biologie.uni-freiburg.de Tue Sep 5 08:39:11 2006
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Tue, 05 Sep 2006 14:39:11 +0200
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To: <44FD69FA.70107@sendu.me.uk>
References: <44FD4A27.3060401@biologie.uni-freiburg.de> <44FD69FA.70107@sendu.me.uk>
Message-ID: <44FD6FEF.20605@biologie.uni-freiburg.de>

Hi Sendu,

thanks for the fast response!
Here is the example...

Cheers,
Daniel

Sendu Bala wrote:
> Daniel Lang wrote:
> [...]
>> ------------- EXCEPTION -------------
>> MSG: The lineage 'Eukaryota, Metazoa, Chordata, Craniata, Vertebrata,
>> Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, Pipoidea,
>> Pipidae, Xenopodinae, Xenopus, Silurana, Xenopus, tropicalis' had two
>> non-consecutive nodes with the same name. Can't cope!
> [...]
>> But I think this is connected to the new changes to taxonomy handling in
>> Bio::Taxon...
>> I'm unsure whether to submit this separately, but I could also provide an
>> example of such a swissprot entry that causes this error.
>
> Yes, it is to do with the taxonomy changes. If you can send me the
> problem entry that had a species of Xenopus tropicalis, I'll fix this
> particular problem.
>
> I don't know about the original problem, but if Brian doesn't solve it
> please add a bug report to http://bugzilla.bioperl.org/index.cgi
>
>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: example_taxonomy_bug
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060905/970f5834/attachment.pl

From bix at sendu.me.uk Tue Sep 5 09:44:53 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 05 Sep 2006 14:44:53 +0100
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To: <44FD6FEF.20605@biologie.uni-freiburg.de>
References: <44FD4A27.3060401@biologie.uni-freiburg.de> <44FD69FA.70107@sendu.me.uk> <44FD6FEF.20605@biologie.uni-freiburg.de>
Message-ID: <44FD7F55.1000503@sendu.me.uk>

Daniel Lang wrote:
> Hi Sendu,
>
> thanks for the fast response!
> Here is the example...

Thanks. This should now be fixed in cvs (grab swiss.pm 1.93 at least).

Please clarify whether you can reliably reproduce your original problem.
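[Editorial sketch, not part of the original message.] To narrow down which entries trigger the original retrieval problem, it can help to list the entry names whose ID line carries a given status, such as the PRELIMINARY status shown in the grep output earlier in the thread. The following is a minimal sketch under the assumption that ID lines follow the layout seen there (entry name, then status terminated by a semicolon). The function name entries_with_status and the sample lines are hypothetical, and no BioPerl API is involved.

```python
import re

# Matches unquoted ID lines such as:
#   ID Q16EZ1_AEDAE PRELIMINARY; PRT; 222 AA.
# Group 1 is the entry name, group 2 the status word before the first ';'.
ID_LINE = re.compile(r"^ID\s+(\S+)\s+(\w+);")


def entries_with_status(lines, status="PRELIMINARY"):
    """Return entry names whose ID line has the requested status.

    Hypothetical helper for illustration; email-quoted lines ("> ID ...")
    are deliberately not matched.
    """
    names = []
    for line in lines:
        match = ID_LINE.match(line)
        if match and match.group(2) == status:
            names.append(match.group(1))
    return names


# Made-up three-line sample; only the ID line is taken from the thread.
sample = [
    "ID Q16EZ1_AEDAE PRELIMINARY; PRT; 222 AA.",
    "AC Q16EZ1;",
    "//",
]

print(entries_with_status(sample))  # prints ['Q16EZ1_AEDAE']
```

Because the function accepts any iterable of lines, an open file handle on a local .dat file could be passed in the same way, so the file would not need to be read into memory at once.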
From cjfields at uiuc.edu Tue Sep 5 10:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Sep 2006 09:13:43 -0500
Subject: [Bioperl-l] (completely OT) Dr. Ewan Birney on BMC
Message-ID: <001b01c6d0f5$7eb00e70$15327e82@pyrimidine>

All,

Dr. Ewan Birney espouses the benefits of open-access publishing!

http://www.biomedcentral.com/profiles/movies/

Sorry if this embarrasses you Ewan!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Sep 5 10:50:52 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Sep 2006 09:50:52 -0500
Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More?
In-Reply-To: <44FD4D8D.3000506@biomax.com>
Message-ID: <002201c6d0fa$afc73fb0$15327e82@pyrimidine>

> Torsten Seemann wrote:
> > >> Frankly, I think we should change that to v 5.8; it's been out for
> > >> over three years now. perl 5.10 isn't too far off (let alone Perl6)
> > >> and perl 5.005, according to CPAN, is 8 years old.
>
> I remember that a while back the same kind of discussion took place on
> the Parrot mailing list, http://www.parrotcode.org.
> The result was to require 5.6.1 or later, as perl 5.6.0 wasn't very
> stable.
>
> With 5.6.1 required, some cleanup could be done, e.g.
> - Replace 'use vars qw( $dummy );' with 'our $dummy;'
> - Get rid of IO::Scalar.
>
>
> Just my $0.02,
>
> Bernhard Schmalhofer

I remember something about problems with v5.6. So I agree here: at the
very least, we should be using 5.6.1. We probably should think about syntax
updates as well.

My point with requiring v. 5.8 is that we can take advantage of several
features present in v. 5.8 and not present in v. 5.6; Test::More was only
one of them. Torsten pointed out several features that were major changes
between 5.6 and 5.8. However, Hilmar also has a strong point about leaving
those perl 5.6 users behind.
Hilmar, do you have any suggestions on how we would poll users for their
perl versions? I suppose we could do something like that here if needed.
Anyway, until we know more we could stick with requiring v. 5.6.1 and
strongly recommending v. 5.8, and move to a 5.8 requirement later (maybe for
bioperl v. 1.6).

As for Test::More, we could always include it as a requirement along with
v. 5.6.1 if needed, or include it in Bundle::Bioperl. Or (most extreme) just
include it in the distribution like we currently do with Test (not my
favorite option, just more bioperl-core bloat).

Chris

From osborne1 at optonline.net Tue Sep 5 11:06:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 05 Sep 2006 11:06:27 -0400
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To: <44FD4A27.3060401@biologie.uni-freiburg.de>
Message-ID:

Daniel,

Well, if you can isolate the bug please add it to bugzilla.

Brian O.


On 9/5/06 5:57 AM, "Daniel Lang" wrote:

> Hi Brian,
>
> sorry for the belated response!
> I've compiled you a set of 100 PRELIMINARY entries from the latest
> uniprot_trembl release. I've tried to reproduce the bug using only these
> as input to build an index, but (sadly) all of them can be retrieved
> using the latest checkout:-(
> Maybe it's not connected to these entries after all, but the size or some
> other feature of the uniprot distribution?
> I now could make it work using the 1.5.1 release.
>
> Originally, I've built the index using flat protocol, when I try bdb and
> bioperl-live even more problems occur:
>
> bp_bioflat_index.pl --dbname sw -i bdb -f swiss -l . -c uniprot_sprot.dat
>
> ------------- EXCEPTION -------------
> MSG: The lineage 'Eukaryota, Metazoa, Chordata, Craniata, Vertebrata,
> Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, Pipoidea,
> Pipidae, Xenopodinae, Xenopus, Silurana, Xenopus, tropicalis' had two
> non-consecutive nodes with the same name. Can't cope!
> STACK Bio::DB::Taxonomy::list::add_lineage
> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:163
> STACK Bio::DB::Taxonomy::list::new
> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:100
> STACK Bio::DB::Taxonomy::new
> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy.pm:106
> STACK Bio::Species::classification
> /home/lang/bioperl/bioperl-live/Bio/Species.pm:171
> STACK Bio::SeqIO::swiss::_read_swissprot_Species
> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:1049
> STACK Bio::SeqIO::swiss::next_seq
> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:240
> STACK Bio::DB::Flat::parse_one_record
> /home/lang/bioperl/bioperl-live/Bio/DB/Flat.pm:333
> STACK Bio::DB::Flat::BDB::_index_file
> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:235
> STACK Bio::DB::Flat::BDB::build_index
> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:218
> STACK toplevel
> /share/apps/bioperl/bioperl-live/scripts_temp/bp_bioflat_index.pl:113
>
> But I think this is connected to the new changes to taxonomy handling in
> Bio::Taxon...
> I'm unsure whether to submit this separately, but I could also provide an
> example of such a swissprot entry that causes this error.
>
> Thanks, again.
>
> Daniel
>
> Brian Osborne wrote:
>> Daniel,
>>
>> Bug, presumably in SeqIO/swiss.pm. Can you send me a small file with such a
>> PRELIMINARY entry?
>>
>> Brian O.
>>
>>
>> On 9/1/06 6:11 AM, "Daniel Lang"
>> wrote:
>>
>>> Hi,
>>>
>>> when using Bio::Registry (bioperl-live) to fetch uniprot entries from
>>> local indexed uniprot *.dats, I had to realize that several entries
>>> could not be retrieved despite the fact that they are present in the
>>> files! A closer look reveals that they are of status PRELIMINARY:
>>>
>>> uniprot_trembl.dat:ID Q16EZ1_AEDAE PRELIMINARY; PRT; 222 AA.
>>>
>>> I don't "grep" PRELIMINARY anywhere in my cvs checkout..
>>> I also can't retrieve the sequences from the online database defined as
>>> follows:
>>> [swissprot_ebi]
>>> protocol=biofetch
>>> location=http://www.ebi.ac.uk/cgi-bin/dbfetch
>>> dbname=swall
>>>
>>> Is this a bug or a feature? If it's a feature, how can I bypass it?
>>>
>>> Thanks in advance,
>>> Daniel
>>
>>
>
>
>
> ID Q50214_METMT PRELIMINARY; PRT; 163 AA.
> AC Q50214;
> DT 01-NOV-1996, integrated into UniProtKB/TrEMBL.
> DT 01-NOV-1996, sequence version 1.
> DT 30-MAY-2006, entry version 25.
> DE Methyl-coenzyme M reductase subunit A (Fragment).
> GN Name=mcrI;
> OS Methanococcoides methylutens.
> OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales;
> OC Methanosarcinaceae; Methanococcoides.
> OX NCBI_TaxID=2226;
> RN [1]
> RP NUCLEOTIDE SEQUENCE.
> RX MEDLINE=96174929; PubMed=8590683;
> RA Springer E., Sachs M.S., Woese C.R., Boone D.R.;
> RT "Partial gene sequences for the A subunit of methyl-coenzyme M
> RT reductase (mcrI) as a phylogenetic tool for Methanosarcinaceae.";
> RL Int. J. Syst. Bacteriol. 45:554-559(1995).
> CC -----------------------------------------------------------------------
> CC Copyrighted by the UniProt Consortium, see http://
> www.uniprot.org/terms
> CC Distributed under the Creative Commons Attribution-NoDerivs License
> CC -----------------------------------------------------------------------
> DR EMBL; U22235; AAC43410.1; -; Genomic_DNA.
> DR HSSP; P07962; 1E6Y.
> DR SMR; Q50214; 1-163.
> DR InterPro; IPR008924; MCR_a/b_chain_C.
> DR InterPro; IPR009047; MCR_alpha_C.
> DR Pfam; PF02249; MCR_alpha; 1.
> FT NON_TER 1 1
> FT NON_TER 163 163
> SQ SEQUENCE 163 AA; 17548 MW; 9E1945AD201B2C68 CRC64;
> GSYMSGGVGF TQYATAAYTN NILDDNLYYN VDYINDKYDG AANKGTDNKV KATMDVVKDI
> ATESTIYGLE NYEKYPTALE DHFGGSQRAT VLSAAAGSAT ALATGNGNAG LSAWYLSMYL
> HKEAHGRLGF FGFDLQDQCG ATNVFSFQSD EGLPVELRGP NYP
> //
>
> ID Q8NKL4_9EURY PRELIMINARY; PRT; 239 AA.
> AC Q8NKL4;
> DT 01-OCT-2002, integrated into UniProtKB/TrEMBL.
> DT 01-OCT-2002, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Galand P.E., Saarnio S., Fritze H., Yrjala K.; > RT "Depth related diversity of methanogen Archaea in Finnish oligotrophic > RT fen."; > RL FEMS Microbiol. Ecol. 42:441-449(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ489768; CAD33991.2; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q8NKL4; 1-238. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25658 MW; 7FC48B771D61AC25 CRC64; > FIAAYNMCAG EAAVADLAFA AKHASLVEMA NMLPARRARG PNEPGGLSFG FMADMVQTDR > IYPDDPVRSP LEVVAAGCML YDQIWLGSYM SGGVGFTQYA TAAYTDNILD DYSYYGNDYA > KKYGANGKAP KTMDVVNDIA TEVTLYGIEQ YEKYPTTLED HFGGSQRATV LAAASGVAAA > IATGNSNAGL SAWYLSMLLH KDAWGRLGFY GYDLQDQCGS TNVFSVRSDE GSPDELRGA > // > > ID O93734_9EURY PRELIMINARY; PRT; 330 AA. > AC O93734; > DT 01-MAY-1999, integrated into UniProtKB/TrEMBL. > DT 01-MAY-1999, sequence version 1. > DT 30-MAY-2006, entry version 22. > DE F420-dependent alcohol dehydrogenase (Fragment). > GN Name=adf; > OS Methanoculleus thermophilus. 
> OC Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales; > OC Methanomicrobiaceae; Methanoculleus. > OX NCBI_TaxID=2200; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Berk H., Thauer R.K.; > RL Submitted (FEB-1999) to the EMBL/GenBank/DDBJ databases. > RN [2] > RP X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS). > RX PubMed=15016352; DOI=10.1016/j.str.2004.02.010; > RA Aufhammer S.W., Warkentin E., Berk H., Shima S., Thauer R.K., > RA Ermler U.; > RT "Coenzyme binding in F420-dependent secondary alcohol dehydrogenase, a > RT member of the bacterial luciferase family."; > RL Structure 12:361-370(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; Y18729; CAA77275.1; -; Genomic_DNA. > DR PDB; 1RHC; X-ray; A=1-330. > DR InterPro; IPR011251; Luciferase_like. > DR Pfam; PF00296; Bac_luciferase; 1. > FT NON_TER 330 330 > SQ SEQUENCE 330 AA; 37158 MW; CD1528E3B0AF11EB CRC64; > MKTQIGYFAS LEQYRPMDAL EQAIRAEKVG FDSVWVDDHF HPWYHDNAQS AQAWAWMGAA > LQATKKVFIS TCITCPIMRY NPAIVAQTFA TLRQMYPGRV GVAVGAGEAM NEVPVTGEWP > SVPVRQDMTV EAVKVMRMLW ESDKPVTFKG DYFTLDKAFL YTKPDDEVPL YFSGMGPKGA > KLAGMYGDHL MTVAAAPSTL KNVTIPKFEE GAREAGKDPS KMEHAMLIWY SVDPDYDKAV > EALRFWAGCL VPSMFKYKVY DPKEVQLHAN LVHCDTIKEN YMCATDAEEM IKEIERFKEA > GINHFCLGNS SPDVNFGIDI FKEVIPAVRD > // > > ID Q50215_9EURY PRELIMINARY; PRT; 163 AA. > AC Q50215; > DT 01-NOV-1996, integrated into UniProtKB/TrEMBL. > DT 01-NOV-1996, sequence version 1. > DT 30-MAY-2006, entry version 25. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrI; > OS Methanohalophilus mahii. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; Methanohalophilus. > OX NCBI_TaxID=2176; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX MEDLINE=96174929; PubMed=8590683; > RA Springer E., Sachs M.S., Woese C.R., Boone D.R.; > RT "Partial gene sequences for the A subunit of methyl-coenzyme M > RT reductase (mcrI) as a phylogenetic tool for Methanosarcinaceae."; > RL Int. J. Syst. Bacteriol. 45:554-559(1995). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; U22237; AAC43411.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q50215; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17617 MW; 764122F547B7EDD7 CRC64; > GSYMSGGVGF TQYSTAAYTN NILDDNLYYD VEYINDKYDG AAEKGIDNKV APSMDVIKDI > ATESTLYGLE NYEKYPVALE DHFGGSQRAT VLSAAAGSAG SLATGNANAG LSAWYLCMYL > HKEGHGRLGF FGFDLQDQCG ATNTFSYQSD EGLPHELRGP NYP > // > > ID Q9C4G1_9EURY PRELIMINARY; PRT; 157 AA. > AC Q9C4G1; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 17. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-MCR47. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosaetaceae; environmental samples. > OX NCBI_TaxID=143078; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313850; AAK16880.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q9C4G1; 1-157. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 157 157 > SQ SEQUENCE 157 AA; 16858 MW; 6A87301935FB48DF CRC64; > GSYMSGGVGF TQYATAAYTN DVLDDFCYYG VDFANDKFGG LAKAPQTLDV AKELGTEVTL > YGMEQYESFP TLLEDHFGGS QRASVLAAAS GITAAIASGH SQVGLAAWYL SMLLHKEGWG > RLGFFGYDLQ DQCGPTNVFS YQSDEGNPVE LRGANYP > // > > ID Q59578_METTF PRELIMINARY; PRT; 569 AA. > AC Q59578; O08492; P97184; > DT 01-NOV-1996, integrated into UniProtKB/TrEMBL. > DT 01-NOV-1996, sequence version 1. > DT 30-MAY-2006, entry version 28. > DE Tungsten formylmethanofuran dehydrogenase (EC 1.2.99.5). > GN Name=fwdA; > OS Methanobacterium thermoformicicum. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; Methanothermobacter. > OX NCBI_TaxID=145262; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=Marburg/DSM 2133; > RX MEDLINE=96163477; PubMed=8575452; > RA Hochheimer A., Schmitz R.A., Thauer R.K., Hedderich R.; > RT "The tungsten formylmethanofuran dehydrogenase from Methanobacterium > RT thermoautotrophicum contains sequence motifs characteristic for > RT enzymes containing molybdopterin dinucleotide."; > RL Eur. J. Biochem. 234:910-920(1995). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; X87970; CAA61213.1; -; Genomic_DNA. > DR PIR; S63545; S57456. > DR GO; GO:0018493; F:formylmethanofuran dehydrogenase activity; IEA. > DR GO; GO:0016491; F:oxidoreductase activity; IEA. > DR InterPro; IPR013108; Amidohydro_3. > DR InterPro; IPR012027; FwdA. > DR InterPro; IPR011059; Metal-dep_hydro_comp. > DR Pfam; PF07969; Amidohydro_3; 1. > DR PIRSF; PIRSF006453; FwdA; 1. > KW Oxidoreductase. > SQ SEQUENCE 569 AA; 62916 MW; 6002F909C6458F99 CRC64; > MEYIIKNGFV YCPLNNVDGE KMDICLRDGK IVESVSDSAK VIDASGKIVM PGGVDPHSHI > AGAKVNVGRM YRPEDSRRDA EKFKAGRAGS GFSVPSTFMT GYRYAQMGYT TAMEAAMPPL > LARHTHEEFH DTPIIDHAAY PLFGNNWFVM EYLKEGDVDA CAAYASWLLK ATKGYTIKIV > NPAGTEAWGW GGNVHGIHDP APYFDITPAE IIKGLAEVNE KLQLPHSIHL HCNDLGHPGN > YETTLASFDV PKNIKANPAT GERDTVLYAT HVQFHSYGGT TWRDFVSEAP KIADYVNKND > HIVIDVGQIT LDETTTMTAD GPMEYDLHSL NGLKWANCDV ELETGSGVVP FIYSARAPVP > AVQWAIGMEL FLLIDDPAKV CLTTDSPNAG PFTRYPRVIA WLMSNKYRMN LIEGELHKWA > QRKSTIATVD REYTFSEIAQ ITRSTSAKVL GLSETKGHLG VGADADIAVY NINPETIDPS > VDYMAIEEGF SRAAYVLKDG EIVVKDGEVV ASPHGRTYWV DTRVDESTYN EVLAKVESKF > KQYYSVNFAN YPVQDEYLPK SAPVKGVML > // > > ID Q48921_METBU PRELIMINARY; PRT; 163 AA. > AC Q48921; > DT 01-NOV-1996, integrated into UniProtKB/TrEMBL. > DT 01-NOV-1996, sequence version 1. > DT 30-MAY-2006, entry version 25. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrI; > OS Methanococcoides burtonii. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; Methanococcoides. > OX NCBI_TaxID=29291; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RC STRAIN=DSM 6242; > RX MEDLINE=96174929; PubMed=8590683; > RA Springer E., Sachs M.S., Woese C.R., Boone D.R.; > RT "Partial gene sequences for the A subunit of methyl-coenzyme M > RT reductase (mcrI) as a phylogenetic tool for Methanosarcinaceae."; > RL Int. J. Syst. Bacteriol. 45:554-559(1995). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; U22234; AAC43406.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q48921; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17494 MW; D1B83159FB0DA521 CRC64; > GSYMSGGVGF TQYATAAYTN NILDDNLYYN VDYINDKYDG AANKGADNKV KATMDVVKDI > ATESTIYGIE NYEKYPTALE DHFGGSQRAT VLSAAAGSAT ALATGNGNAG LSGWYLSMYL > HKEALGRLGF FGFDLQDQCG ATNVFSFQSD EGLPLELRGP NYP > // > > ID Q9C4H3_9EURY PRELIMINARY; PRT; 163 AA. > AC Q9C4H3; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 16. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-MCR34. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; environmental samples. > OX NCBI_TaxID=143058; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313838; AAK16868.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q9C4H3; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17533 MW; F881F49FA3B2AC71 CRC64; > GSYMSGGVGF TQYATAAYTD DILDNNVYYD VDYINDKYNG AANVGKDNKV KASLDVVKDI > ATESTLYGIE TYEKFPTALE DHFGGSQRAT VLAAAAGVAC SLATANANAG LSGWYLSMYL > HKEAWGRLGF FGYDLQDQCG ATNVLSYQGD EGLPDELRGP NYP > // > > ID Q9C4T7_9ARCH PRELIMINARY; PRT; 148 AA. > AC Q9C4T7; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 16. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrI; > OS uncultured archaeon 94D. > OC Archaea; environmental samples. > OX NCBI_TaxID=140222; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Hougaard E., Westermann P.; > RL Submitted (MAY-2000) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF268650; AAK07761.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q9C4T7; 1-148. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. 
> FT NON_TER 1 1 > FT NON_TER 148 148 > SQ SEQUENCE 148 AA; 16014 MW; 8FE4799E15A3F012 CRC64; > TQYATAAYTD DILDNNVYYN VDYINDKYNG AANVGTDNKV KATLDVVKDI ATESTLYGIE > TYEKFPTALE DHFGGSQRAT VLAAAAGVAT ALATANANAG LSGWYLSMYL HKEAWGRLGF > FGFDLQDQCG ATNVLSYQGD EGLPDELR > // > > ID Q9C4U9_9ARCH PRELIMINARY; PRT; 149 AA. > AC Q9C4U9; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 15. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrI; > OS uncultured archaeon 22C. > OC Archaea; environmental samples. > OX NCBI_TaxID=140195; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Hougaard E., Westermann P.; > RL Submitted (MAY-2000) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF268623; AAK07749.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q9C4U9; 1-149. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 149 149 > SQ SEQUENCE 149 AA; 16054 MW; 96C1717B04EB559E CRC64; > YMSGGVGFTQ YATAAYTNDV LDDFSYYGVD YANDKFGGFA KAPATIDVAK ELATEVTLYG > IEQYEAFPTL LEDHFGGSQR AAVLAAASGI TSAIATGHSQ IGLAGWYLSM LLHKEAWGRL > GFFGYDLQDQ CGPTNVFSYQ SDEGNPLEL > // > > ID Q50386_9EURY PRELIMINARY; PRT; 163 AA. > AC Q50386; > DT 01-NOV-1996, integrated into UniProtKB/TrEMBL. > DT 01-NOV-1996, sequence version 1. > DT 30-MAY-2006, entry version 25. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrI; > OS Methanohalophilus. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae. > OX NCBI_TaxID=2175; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RC STRAIN=HCM6; > RX MEDLINE=96174929; PubMed=8590683; > RA Springer E., Sachs M.S., Woese C.R., Boone D.R.; > RT "Partial gene sequences for the A subunit of methyl-coenzyme M > RT reductase (mcrI) as a phylogenetic tool for Methanosarcinaceae."; > RL Int. J. Syst. Bacteriol. 45:554-559(1995). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; U22255; AAC43422.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q50386; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17595 MW; FD6122A402A2F226 CRC64; > GSYMSGGVGF TQYSTAAYTN NILDDNLYYD VEYINDKYDG AADKGIDNKV APSMDVIKDI > ATESTLYGLE NYEKYPVALE DHFGGSQRAT VLSAAAGSAG SLATGNANAG LSAWYLCMYL > IKEGHGRLGF FGYDLQDQCG ATNTFSYQSD EGLPHELRGP NYP > // > > ID Q977Q9_9CREN PRELIMINARY; PRT; 194 AA. > AC Q977Q9; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Uncharacterized protein. > OS uncultured crenarchaeote 4B7. > OC Archaea; Crenarchaeota; Thermoprotei; marine archaeal group 1; > OC environmental samples. > OX NCBI_TaxID=44557; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21633832; PubMed=11772643; DOI=10.1128/AEM.68.1.335-345.2002; > RA Beja O., Koonin E.V., Aravind L., Taylor L.T., Seitz H., Stein J.L., > RA Bensen D.C., Feldman R.A., Swanson R.V., DeLong E.F.; > RT "Comparative genomic analysis of archaeal genotypic variants in a > RT single population and in two different oceanic provinces."; > RL Appl. Environ. Microbiol. 68:335-345(2002). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; U40238; AAK66811.1; -; Genomic_DNA. > SQ SEQUENCE 194 AA; 21665 MW; 0B0E1876DB9B074E CRC64; > MNYKIPSDAM VGKYTVVADT HFGQIEKSFF VINDFDVVNS VPSSEPSSKV SKIIEKFNRI > AGNEISITLD EKSTVESTLV PRVIQGSLFT SARGEESSVN LRITTTNGQC IIGPDSNCLV > TESTRKPGEI YSLVSIDDVN YNIRYSGNDV RLEKFSIVPE DSNSKIVLDT WNVEIIKDDQ > PSRFYYKVSY VALQ > // > > ID Q9C4G2_9EURY PRELIMINARY; PRT; 157 AA. > AC Q9C4G2; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 15. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-MCR46. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=143120; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313849; AAK16879.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q9C4G2; 1-157. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. 
> FT NON_TER 1 1 > FT NON_TER 157 157 > SQ SEQUENCE 157 AA; 16867 MW; 4D6E12DDBA0F8976 CRC64; > GGYMSGGVGF TMYATPAYTN DILDDYVYWG VDYLANKFGK PGSAKATIET VKDIGTEVTL > YGIEQYEKYP TTLEDHFGGS QRATVLALAA GTATAMATGH SNAGLSAWYL SMYLHKEAWG > RLGFYGYDLQ DQCGATNVFS LGSDEGCLGE VRGANYP > // > > ID Q977H6_9EURY PRELIMINARY; PRT; 138 AA. > AC Q977H6; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured Methanobacterium sp. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; Methanobacterium; environmental samples. > OX NCBI_TaxID=176306; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=22315376; PubMed=12427943; > RA Luton P.E., Wayne J.M., Sharp R.J., Riley P.W.; > RT "The mcrA gene as an alternative to 16S rRNA in the phylogenetic > RT analysis of methanogen populations in landfill."; > RL Microbiology 148:3521-3530(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF414031; AAL29280.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q977H6; 1-138. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 138 138 > SQ SEQUENCE 138 AA; 15223 MW; 602A614E8FA498CF CRC64; > YTDNILDDFT YFGKEYVEDK YGITEAPNNM DTVLDVASEV TFYALEQFED YPALLETIFG > GSQRASIVAA AAGCSTAFAT GNAQTGLSGW YLSMYLHKEQ HSRLGFYGYD LQDQCGASNV > FSIRGDEGLP TELRGANY > // > > ID Q9C4E7_9EURY PRELIMINARY; PRT; 239 AA. > AC Q9C4E7; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 16. 
> DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-ME29. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=143086; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313864; AAK16894.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q9C4E7; 1-238. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25859 MW; EA6EEA4517DAC8EA CRC64; > MISAYKQAAG EAATGDFAYA AKHAEVIHMG NALPQRRARG ENEPGGIAFG FLADIVQTGR > KYPDDPVRQT LDVVAAGAML YDQIWLGSYM SGGVGFTQYA TASYTDNVLD DFTYFGQEYV > EDKYGMTEAP NNMDTVLDVA SEVNFYALEQ FEEYPALLET IFGGSQRASI VAAAAGCSTA > FATGNAQTGL SGWYLSMYLH KEQHSRLGFY GYDLQDQCGA SNVFSIRGDE GLPLEARGA > // > > ID Q48922_9EURY PRELIMINARY; PRT; 163 AA. > AC Q48922; > DT 01-NOV-1996, integrated into UniProtKB/TrEMBL. > DT 01-NOV-1996, sequence version 1. > DT 30-MAY-2006, entry version 25. 
> DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrI; > OS Methanolobus bombayensis. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; Methanolobus. > OX NCBI_TaxID=38023; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=B-1; > RX MEDLINE=96174929; PubMed=8590683; > RA Springer E., Sachs M.S., Woese C.R., Boone D.R.; > RT "Partial gene sequences for the A subunit of methyl-coenzyme M > RT reductase (mcrI) as a phylogenetic tool for Methanosarcinaceae."; > RL Int. J. Syst. Bacteriol. 45:554-559(1995). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; U22257; AAC43407.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q48922; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17484 MW; 399388A228E6EFD9 CRC64; > GSYMSGGVGF TQYATAAYCN NILDDNLYYN VDYINDKYDG AANKGTDNKV AANLDVVKDI > ATESTIYGLE NYEKYPTTLE DHFGGSQRAT SLSAAAGSAV SIATGNGNAG LSGWYLCMYL > HKEALGRLGF FGYDLQDQCG ATNVLSYQSD EGLALELRGP NYP > // > > ID Q977H5_9EURY PRELIMINARY; PRT; 146 AA. > AC Q977H5; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 14. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured Methanocorpusculum sp. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales; > OC Methanocorpusculaceae; Methanocorpusculum; environmental samples. > OX NCBI_TaxID=176309; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX MEDLINE=22315376; PubMed=12427943; > RA Luton P.E., Wayne J.M., Sharp R.J., Riley P.W.; > RT "The mcrA gene as an alternative to 16S rRNA in the phylogenetic > RT analysis of methanogen populations in landfill."; > RL Microbiology 148:3521-3530(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF414032; AAL29281.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 146 146 > SQ SEQUENCE 146 AA; 16360 MW; 7CC6067D71956D53 CRC64; > YTDNVLDDFT YYGMDYLHDK YKIDWKNPNP KDKVKATQEV VNDIATEVNL YGMEQYEQFP > TMMEDHFGGS QRAAVLAAAS GITTSIATGN SNAGLNGWYL SMLLHKDGWS RLGFFGYDLQ > DQCGSANSLS IRPDEGCIGE FRGPNY > // > > ID Q9C4D4_9EURY PRELIMINARY; PRT; 239 AA. > AC Q9C4D4; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 16. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-ME44. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=143090; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313877; AAK16907.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q9C4D4; 1-238. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25806 MW; 3EA01889C8018009 CRC64; > MISAYKQAAG EAATGDFAYA AKHAEVVHMG SYLPVRRARG ENEPGGIAFG FLADIVQTPR > AYPDDPVRQT LDAVAAGAML YDQIWLGSYM SGGVGFTQYA TAAYTDNVLD DFTYFGQEYV > EDKYGMTEAP NNMDTVLDVA SEVNFYALEQ FEDYPALLET IFGGSQRASI VAAAAGCSTA > FATGNAQTGL SGWYLSMYLH KEQHSRLGFY GYDLQDQCGA SNVFSIRGDE GLPLEARGA > // > > ID Q977I1_9EURY PRELIMINARY; PRT; 146 AA. > AC Q977I1; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 14. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured Methanocorpusculum sp. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales; > OC Methanocorpusculaceae; Methanocorpusculum; environmental samples. > OX NCBI_TaxID=176309; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=22315376; PubMed=12427943; > RA Luton P.E., Wayne J.M., Sharp R.J., Riley P.W.; > RT "The mcrA gene as an alternative to 16S rRNA in the phylogenetic > RT analysis of methanogen populations in landfill."; > RL Microbiology 148:3521-3530(2002). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF414026; AAL29275.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 146 146 > SQ SEQUENCE 146 AA; 16309 MW; 5B6B73519374DB7F CRC64; > YTDNILDDFT YYGMDYLHDK YKIDTKNPNP NDKVKATQEV VNDIASEVNL YGMEQYEQFP > TMMEDHFGGS QRAAVLAAAS GITTSIATSN SNAGLNGWYL SMLMHKDGWS RLGFFGYDLQ > DQCGSANSLS IRPDEGCIGE FRGPNY > // > > ID Q8NKK8_9EURY PRELIMINARY; PRT; 239 AA. > AC Q8NKK8; > DT 01-OCT-2002, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2002, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Galand P.E., Saarnio S., Fritze H., Yrjala K.; > RT "Depth related diversity of methanogen Archaea in Finnish oligotrophic > RT fen."; > RL FEMS Microbiol. Ecol. 42:441-449(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ489774; CAD33997.2; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q8NKK8; 1-238. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. 
> DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25656 MW; 1A486F1D22686292 CRC64; > FIAAYNMCAG EAAVADLAFA AKHASLVEMA NMLPARRARG PNEPGGLSFG FMADICQKDR > IKPDDPVEVS LEVVAAGCML YDQIWLGSYM SGGVGFTQYA TAAYTDNILD DYSYYGNDYA > KKYGANGKAP KTMDVVNDLA TEVTLYGIEQ YEKYPTTLED HFGGSQRATV LAAASGVTAA > IATGNSNAGL SAWYQSMLLH KDAWGRLGFY GYDLQDQCGS TNVFSVRSDE GSPDELRGA > // > > ID Q977G2_9EURY PRELIMINARY; PRT; 139 AA. > AC Q977G2; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS Methanobrevibacter ruminantium. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; Methanobrevibacter. > OX NCBI_TaxID=83816; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=DSM 1093; > RX MEDLINE=22315376; PubMed=12427943; > RA Luton P.E., Wayne J.M., Sharp R.J., Riley P.W.; > RT "The mcrA gene as an alternative to 16S rRNA in the phylogenetic > RT analysis of methanogen populations in landfill."; > RL Microbiology 148:3521-3530(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF414046; AAL29295.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q977G2; 1-139. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. 
> FT NON_TER 1 1 > FT NON_TER 139 139 > SQ SEQUENCE 139 AA; 15325 MW; CAB4B4D04B22FB49 CRC64; > YTDNVLDDFS YFGKDYVEDK YGDLCSAPND MDTVLDVGSA VTFYSLEQYE EYPALLETHF > GGSQRAAVVS AASGISTAFA TGNAQTGLSA WYLAQYLHKE QHSRLGFYGY DLQDQCGAAN > VFAIRNDEGL PLELRGPNY > // > > ID Q9C4I3_9EURY PRELIMINARY; PRT; 157 AA. > AC Q9C4I3; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 16. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-MCR24. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosaetaceae; environmental samples. > OX NCBI_TaxID=143073; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313828; AAK16858.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 157 157 > SQ SEQUENCE 157 AA; 16803 MW; F5FCAD90BACC7FFC CRC64; > GSYMSGGVGF TQYATAAYTN DVLDDFCYYG VDFAVDKLGG FAKAPKTLDV AKELGTEVTL > YGMEQYEAYP TLLEDHFGGS QRASVLAAAS GITAAIASGH SQVGLAAWYL SMLLHKEGWG > RLGFFGYDLQ GQCGPTNVFS YQSDEGNPVE MRGANYP > // > > ID Q8NKK4_9EURY PRELIMINARY; PRT; 239 AA. > AC Q8NKK4; > DT 01-OCT-2002, integrated into UniProtKB/TrEMBL. 
> DT 01-OCT-2002, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Galand P.E., Saarnio S., Fritze H., Yrjala K.; > RT "Depth related diversity of methanogen Archaea in Finnish oligotrophic > RT fen."; > RL FEMS Microbiol. Ecol. 42:441-449(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ489778; CAD34001.2; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q8NKK4; 1-238. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25584 MW; EFB2217B1B240F09 CRC64; > FIAAYNMCAG EAAVADLAFA AKHASLVEMA SMLPARRARG PNEPGGLSFG FMADICQKDR > IKPDDPVEVS LEVVAAGCML YDQIWLGSYM SGGVGFTQYA TAAYTDNILD DYSYYGNDYA > KKYGANGKAP KTMDVVNDIA TEVTLYGIEQ YEKYPTTLED HFGGSQRATV LAAASGVAAA > IATGNSNAGL SAWYLSMLLH KDAWGRLGFY GYDLQDQCGS TNVFSVRSDE GSPDELRGA > // > > ID Q8NKK9_9EURY PRELIMINARY; PRT; 239 AA. > AC Q8NKK9; > DT 01-OCT-2002, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2002, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. 
> OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Galand P.E., Saarnio S., Fritze H., Yrjala K.; > RT "Depth related diversity of methanogen Archaea in Finnish oligotrophic > RT fen."; > RL FEMS Microbiol. Ecol. 42:441-449(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ489773; CAD33996.2; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q8NKK9; 1-238. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25669 MW; 4B9ABECE39E14B60 CRC64; > FIAAYNMCAG EAAVADLAFA AKHASLVEMA NMLPARRARG PNEPGGLSFG FMADICRKDR > IKPDDPVEVS LEVVAAGCML YDQIWLGSYM SGGVGFTQYA TAAYTDNILD DYSYYGNDYA > KKYGANGKAP KTMDVVNDLA TEVTLYGIEQ YEKYPTTLED HFGGSQRATV LAAASGVTAA > IATGNSNAGL SAWYLSMLLH KDAWGRLGFY GYDLQDQCGS TNVFSVRSDE GSPDELRGA > // > > ID O74030_METWO PRELIMINARY; PRT; 569 AA. > AC O74030; > DT 01-NOV-1998, integrated into UniProtKB/TrEMBL. > DT 01-NOV-1998, sequence version 1. > DT 30-MAY-2006, entry version 22. > DE Tungsten formylmethanofuran dehydrogenase subunit fwdA. > GN Name=fwdA; > OS Methanobacterium wolfei. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; Methanothermobacter. > OX NCBI_TaxID=145261; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RA Hochheimer A., Hedderich R., Thauer R.K.; > RT "The formylmethanofuran dehydrogenase isoenzymes in Methanobacterium > RT wolfei and Methanobacterium thermoautotrophicum: Induction of the > RT molybdenum isoenzyme by molybdate and constitutive synthesis of the > RT tungsten isoenzyme."; > RL Submitted (JUL-1998) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ009688; CAA08783.1; -; Genomic_DNA. > DR InterPro; IPR013108; Amidohydro_3. > DR InterPro; IPR012027; FwdA. > DR InterPro; IPR011059; Metal-dep_hydro_comp. > DR Pfam; PF07969; Amidohydro_3; 1. > DR PIRSF; PIRSF006453; FwdA; 1. > SQ SEQUENCE 569 AA; 62668 MW; 39C0351A7932E678 CRC64; > MEYIIKNGFV YCPLNGVDGE KMDICVKDGK IVESVSDSAK VIDASGKIVM PGGVDPHSHI > AGAKVNVGRM YRPEDSKRDA EKFKGGRAGS GFSVPSTFMT GYRYAQMGYT TAMEAAMPPL > LARHTHEEFH DTPIIDHAAY PLFGNNWFVM EYLKEGDVDA CAAYASWLLR ATKGYTIKIV > NPAGTEAWGW GGNVHGIYDP APYFDITPAE IIKGLAEVNE KLQLPHSIHL HCNDLGHPGN > YETTLASFDV PKNIKPNPAT GSRDTVLYAT HVQFHSYGGT TWRDFLSEAP KIADYVNKND > HIVIDVGQIT LDETTTMTAD GPMEYDLHSL NGLKWANCDV ELETGSGVVP FIYSARAPVP > AVQWAIGMEL FLLIDNLEKV CLTTDSPNAG PFTRYPRVIA WLMSNKYRMN LIEGELHKWA > QRKSTVATID REYTFSEIAQ ITRATSAKVL GLSDTKGHLG VGADADIAVY DINPETIDPS > AEYMAIEEAF SRAACVLKDG GIVVKDGEVV ASPHGRTYWV DTQVDESIYS EVLANVESKF > KQYYSVNFAN YPVQDDYLPK SAPVKGVML > // > > ID Q9C4J6_9EURY PRELIMINARY; PRT; 156 AA. > AC Q9C4J6; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 16. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-MCR09. 
> OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=143096; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313815; AAK16845.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q9C4J6; 1-156. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 16957 MW; 9D40CDE824BF5950 CRC64; > GSYMSGGVGF TQYATAAYTD NVLDDFTYFG QEYVEDKYGM TEAPNNMDTV LDVASEVNFY > ALEQFEDYPA LLETIFGGSQ RASIVAAAAG CSTAFATGNA QTGLSGWYLS MYLHKEQHSR > LGFYGYDLQD QCGASNVFSI RGDEGLPLEA RGANYP > // > > ID Q9C4J4_9EURY PRELIMINARY; PRT; 155 AA. > AC Q9C4J4; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 16. > DE Methyl-coenzyme M reductase II alpha subunit (Fragment). > GN Name=mrtA; > OS uncultured methanogen RS-MCR11. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=143097; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313817; AAK16847.1; -; Genomic_DNA. > DR HSSP; Q49605; 1E6V. > DR SMR; Q9C4J4; 1-155. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 155 155 > SQ SEQUENCE 155 AA; 16792 MW; 43455219C74F304A CRC64; > GAYMPGGVGF TQYATAAYTD DILDDFLYYG KEYVEDKFGI CQAKADMDVV KDISSEVTLY > AMEQYEIPTL LESHFGGSQR AAVAAAAAGC STAFATGNSN AGINGWYLSQ ILHKEVHSRL > GFYGYDLQDQ CGASNSLSVR SDEGLIHELR GPNYP > // > > ID Q9C4G7_9EURY PRELIMINARY; PRT; 156 AA. > AC Q9C4G7; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 15. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-MCR41. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=143118; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313844; AAK16874.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q9C4G7; 1-156. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 17046 MW; 8410E6ADB309F0A7 CRC64; > GSYMSGGVGF TQYATAAYTD DILDDFTYYG YDYAKGKYKI GQTKPTMDIV NDLGTEVTLY > GIEQYEKYPT TLEDHFGGSQ RATVLAAASG VTTAMATGNS NAGLSAWYLS MYLHKEAWGR > LGFFGYDLQD QCGATNVFSC RSDEGAIDEL RGPNYP > // > > ID Q8NKL6_9EURY PRELIMINARY; PRT; 239 AA. > AC Q8NKL6; > DT 01-OCT-2002, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2002, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Galand P.E., Saarnio S., Fritze H., Yrjala K.; > RT "Depth related diversity of methanogen Archaea in Finnish oligotrophic > RT fen."; > RL FEMS Microbiol. Ecol. 42:441-449(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ489765; CAD33988.2; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q8NKL6; 1-238. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. 
> DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25637 MW; F32E0A0F95048BE3 CRC64; > FIAAYNMCAG EAAVADLAFA AKHASLVEMA NMLPARRARG PNEPGGLSFG FMADMVQTDR > IYPDDPVRSS LEVVAAGCML YDQIWLGSYM SGGVGFTQYA TAAYTDNILD DYSYYGNDYA > KKYGEDGKAP ATMDVVNDLG TEVTLYGTEQ YEKYPTTLED HFGGSQRATV LAAASGVTAA > IATGNSNAGL SAWYLSMLLH KDAWGRLGFY GYDLQDQCGS TNVFSVRSDE GAPDELRGA > // > > ID Q977S5_9CREN PRELIMINARY; PRT; 105 AA. > AC Q977S5; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 14. > DE Rhodanese-related sulfurtransferase/oxidoreductase. > OS uncultured crenarchaeote 4B7. > OC Archaea; Crenarchaeota; Thermoprotei; marine archaeal group 1; > OC environmental samples. > OX NCBI_TaxID=44557; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21633832; PubMed=11772643; DOI=10.1128/AEM.68.1.335-345.2002; > RA Beja O., Koonin E.V., Aravind L., Taylor L.T., Seitz H., Stein J.L., > RA Bensen D.C., Feldman R.A., Swanson R.V., DeLong E.F.; > RT "Comparative genomic analysis of archaeal genotypic variants in a > RT single population and in two different oceanic provinces."; > RL Appl. Environ. Microbiol. 68:335-345(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; U40238; AAK66795.1; -; Genomic_DNA. > DR GO; GO:0016740; F:transferase activity; IEA. > DR InterPro; IPR001763; Rhodanese-like. > DR Pfam; PF00581; Rhodanese; 1. > DR SMART; SM00450; RHOD; 1. > DR PROSITE; PS50206; RHODANESE_3; 1. > KW Transferase. 
> SQ SEQUENCE 105 AA; 11534 MW; 4633DAA6005AD61B CRC64; > MSIQYKLIMA ESISKDQLKQ EQGSFVIIDV REPDEISNGA IENSKNMSLG LAIRNAKKGQ > IDDLKDKKIC VYCASGYRGN IAADELVKAG FSAVNLEGGY MAWTQ > // > > ID Q9C4F8_9EURY PRELIMINARY; PRT; 239 AA. > AC Q9C4F8; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 15. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogen RS-ME05. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=143081; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21217833; PubMed=11321536; > RX DOI=10.1046/j.1462-2920.2001.00179.x; > RA Lueders T., Chin K.-J., Conrad R., Friedrich M.; > RT "Molecular analyses of methyl-coenzyme M reductase alpha-subunit > RT (mcrA) genes in rice field soil and enrichment cultures reveal the > RT methanogenic phenotype of a novel archaeal lineage."; > RL Environ. Microbiol. 3:194-204(2001). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF313853; AAK16883.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q9C4F8; 1-238. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. 
> FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25933 MW; AE0DC2B140FAB645 CRC64; > MISAYKQAAG EAATGDFAYA AKHAEVILMG NALPQRRARG ENEPGGIAFG FLADIVQTGR > KYPBDPVRQT LDVVAAGAML YDQIWLGSYM SRGVGFTQYA TASYTDNVLD DFTYFGQEYV > EDKYGMTEAP NNMDTVLDVA SEVNFYALEQ FEEYPALLET IFGGSQRASI VAAAAGCSTA > FATGNAQTGL SGWYLSMYLH KEQHSRLGFY GYDLQDQCGA SNVFSIRGDE GLPLEARGA > // > > ID Q977L5_9CREN PRELIMINARY; PRT; 122 AA. > AC Q977L5; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Rossmann-fold nucleotide-binding protein. > OS uncultured crenarchaeote 74A4. > OC Archaea; Crenarchaeota; Thermoprotei; marine archaeal group 1; > OC environmental samples. > OX NCBI_TaxID=166279; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21633832; PubMed=11772643; DOI=10.1128/AEM.68.1.335-345.2002; > RA Beja O., Koonin E.V., Aravind L., Taylor L.T., Seitz H., Stein J.L., > RA Bensen D.C., Feldman R.A., Swanson R.V., DeLong E.F.; > RT "Comparative Genomic Analysis of Archaeal Genotypic Variants in a > RT Single Population and in Two Different Oceanic Provinces."; > RL Appl. Environ. Microbiol. 68:335-345(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF393466; AAK96093.1; -; Genomic_DNA. > SQ SEQUENCE 122 AA; 12806 MW; 942F6F9091E4CC24 CRC64; > MNACSHGAKD GNGITVGILP QDDPKFANDY CDIVIPSGMG FTRDFLNALS ADGVIIVGGG > SGTLSEVCAA YMSKKPMVAI RGIESSITPY IDGYVDHRKN VKIIGADTAK EAVEKILELI > TA > // > > ID Q49114_9EURY PRELIMINARY; PRT; 163 AA. > AC Q49114; > DT 01-NOV-1996, integrated into UniProtKB/TrEMBL. > DT 01-NOV-1996, sequence version 1. > DT 30-MAY-2006, entry version 24. > DE Methyl-coenzyme M reductase subunit A (Fragment). 
> GN Name=mcrI; > OS Methanohalobium evestigatum. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; Methanohalobium. > OX NCBI_TaxID=2322; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=96174929; PubMed=8590683; > RA Springer E., Sachs M.S., Woese C.R., Boone D.R.; > RT "Partial gene sequences for the A subunit of methyl-coenzyme M > RT reductase (mcrI) as a phylogenetic tool for Methanosarcinaceae."; > RL Int. J. Syst. Bacteriol. 45:554-559(1995). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; U22236; AAC43408.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q49114; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17559 MW; 8D906F78370AC39D CRC64; > GSYMSGGVGF TQYATAAYTN NILDDNLYYD IEYINDKYNN AGKPGADNKA PATMDVVKDI > ATESTLYGLE NYEKYPTALE DHFGGSQRAT VLSAAAGSAT SIATGNSNAG LSAWYLSMYL > HKEAHGRLGF FGFDLQDQCG AANVFSYQSD EGLPLELRGP NYP > // > > ID Q8NKW0_ACIAM PRELIMINARY; PRT; 179 AA. > AC Q8NKW0; > DT 01-OCT-2002, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2002, sequence version 1. > DT 30-MAY-2006, entry version 15. > DE Hypothetical protein. > OS Acidianus ambivalens (Desulfurolobus ambivalens). > OC Archaea; Crenarchaeota; Thermoprotei; Sulfolobales; Sulfolobaceae; > OC Acidianus. > OX NCBI_TaxID=2283; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RC STRAIN=DSM3772, and Lei 10; > RX MEDLINE=22830413; PubMed=12949162; DOI=10.1099/mic.0.26455-0; > RA Laska S., Lottspeich F., Kletzin A.; > RT "Membrane-bound hydrogenase and sulfur reductase of the > RT hyperthermophilic and acidophilic archaeon Acidianus ambivalens."; > RL Microbiology 149:2357-2371(2003). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=DSM3772, and Lei 10; > RA Laska S.; > RL Thesis (2000), Institute of Microbiology and Genetics, Darmstadt > RL University of Technology, Darmstadt, Germany. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ320523; CAC86880.1; -; Genomic_DNA. > DR GO; GO:0005488; F:binding; IEA. > DR InterPro; IPR011990; TPR-like_helical. > DR InterPro; IPR001440; TPR_1. > DR InterPro; IPR013105; TPR_2. > DR InterPro; IPR013026; TPR_region. > DR Pfam; PF00515; TPR_1; 1. > DR Pfam; PF07719; TPR_2; 2. > DR SMART; SM00028; TPR; 3. > DR PROSITE; PS50005; TPR; 4. > DR PROSITE; PS50293; TPR_REGION; 1. > KW Hypothetical protein; Repeat; TPR repeat. > SQ SEQUENCE 179 AA; 21241 MW; 57267A2E8FC4B26E CRC64; > MSDIDIKINE YKRLIKIEPN NAWHHFHLGE ALEEKGELND AIEEYSRAVE LDPTFPDFHF > KKAEALLKLS KFQEAVQVYD NAIRKDPKDA HLYHYFKGEI YEELGKYEEA LKEYNRAIEW > SPNNSWYRQA KIYLLEKMNR LEDALKEVDE LIKIDSSAVN LELRKEILAK LKNNNNIKV > // > > ID Q50828_9EURY PRELIMINARY; PRT; 163 AA. > AC Q50828; > DT 01-NOV-1996, integrated into UniProtKB/TrEMBL. > DT 01-NOV-1996, sequence version 1. > DT 30-MAY-2006, entry version 25. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrI; > OS Methanolobus vulcani. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; Methanolobus. > OX NCBI_TaxID=38026; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX MEDLINE=96174929; PubMed=8590683; > RA Springer E., Sachs M.S., Woese C.R., Boone D.R.; > RT "Partial gene sequences for the A subunit of methyl-coenzyme M > RT reductase (mcrI) as a phylogenetic tool for Methanosarcinaceae."; > RL Int. J. Syst. Bacteriol. 45:554-559(1995). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; U22245; AAC43427.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q50828; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17522 MW; D0A88D74E71C032D CRC64; > GSYMSGGVGF TQYATAAYCN NILDDNLYYN VDYINDKYDG AANKGTDNKV KASLDVVKDI > ATESTIYGLE NYEKYPTTLE DHFGGSQRAT SLSAAAGSAV ALATGNGNAG LSGWYLCMYL > HKEAHGRLGF FGYDLQDQCG ATNVLSYQSD EGLALELRGP NYP > // > > ID Q8NKM9_9CREN PRELIMINARY; PRT; 221 AA. > AC Q8NKM9; > DT 01-OCT-2002, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2002, sequence version 1. > DT 30-MAY-2006, entry version 15. > DE Putative phosphoserine phosphatase. > OS uncultured crenarchaeote. > OC Archaea; Crenarchaeota; environmental samples. > OX NCBI_TaxID=29281; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=22252841; PubMed=12366755; > RX DOI=10.1046/j.1462-2920.2002.00345.x; > RA Quaiser A., Ochsenreiter T., Klenk H.P., Kletzin A., Treusch A.H., > RA Meurer G., Eck J., Sensen C.W., Schleper C.; > RT "First insight into the genome of an uncultivated crenarchaeote from > RT soil."; > RL Environ. Microbiol. 4:603-611(2002). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ496176; CAD42691.1; -; Genomic_DNA. > DR GO; GO:0003824; F:catalytic activity; IEA. > DR GO; GO:0016791; F:phosphoric monoester hydrolase activity; IEA. > DR GO; GO:0008152; P:metabolism; IEA. > DR InterPro; IPR005834; Dehalogen-like_hydro. > DR InterPro; IPR006383; HAD-SF_hydro_IB_PSP-like. > DR Pfam; PF00702; Hydrolase; 1. > DR TIGRFAMs; TIGR01488; HAD-SF-IB; 1. > SQ SEQUENCE 221 AA; 24742 MW; 88F6ED94D289722E CRC64; > MLYPMEFKST LAVFDMDGTL IDGRLIEVLS KKFGLYAQVR HIQSDKSIPG YVKTQKIAAV > IRGIEEREIE IALDSIPPAK NSQEVISLLK KKGFRIGIIT DSYSVAAQAL VNKLDLDFFY > ANELKVDNGI VTGEINMPLG WEKIDCFCKN SVCKRYHMEI HAKKICADIK NTIAIGDTKG > DLCMIKQAGI GIAYMPKDKY INETINKVNT PDMIGVLDFI E > // > > ID Q977G1_9EURY PRELIMINARY; PRT; 138 AA. > AC Q977G1; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS Methanosphaera stadtmanae. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; Methanosphaera. > OX NCBI_TaxID=2317; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=DSM 3091; > RX MEDLINE=22315376; PubMed=12427943; > RA Luton P.E., Wayne J.M., Sharp R.J., Riley P.W.; > RT "The mcrA gene as an alternative to 16S rRNA in the phylogenetic > RT analysis of methanogen populations in landfill."; > RL Microbiology 148:3521-3530(2002). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF414047; AAL29296.1; -; Genomic_DNA. > DR HSSP; Q49605; 1E6V. > DR SMR; Q977G1; 1-138. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 138 138 > SQ SEQUENCE 138 AA; 14930 MW; D4A7A5D2426675C4 CRC64; > YTDEILDDFI YYGKDYVEGK YGGLCQAEAT SEVVKDIASE VTLYGLEQYE IPAALEDHFG > GSQRAAVLAA GAGCSVAFAT ANSNAGVNGW YLSQLLHKEG HSRLGFYGYD LQDQCGSSNS > LSVRSDEGLI HELRGPNY > // > > ID Q8NKT2_ACIAM PRELIMINARY; PRT; 145 AA. > AC Q8NKT2; > DT 01-OCT-2002, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2002, sequence version 1. > DT 30-MAY-2006, entry version 11. > DE Molybdopterin biosynthesis protein (Fragment). > GN Name=moeA; > OS Acidianus ambivalens (Desulfurolobus ambivalens). > OC Archaea; Crenarchaeota; Thermoprotei; Sulfolobales; Sulfolobaceae; > OC Acidianus. > OX NCBI_TaxID=2283; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=Lei 10; > RX MEDLINE=22830413; PubMed=12949162; DOI=10.1099/mic.0.26455-0; > RA Laska S., Lottspeich F., Kletzin A.; > RT "Membrane-bound hydrogenase and sulfur reductase of the > RT hyperthermophilic and acidophilic archaeon Acidianus ambivalens."; > RL Microbiology 149:2357-2371(2003). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=Lei 10; > RA Laska S.; > RL Thesis (2000), Department of Institute of Microbiology and Genetics, > RL Darmstadt University of Technology, Darmstadt, Germany. 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ345004; CAC86942.1; -; Genomic_DNA. > DR GO; GO:0006777; P:Mo-molybdopterin cofactor biosynthesis; IEA. > DR InterPro; IPR001453; MPT_bd. > DR Pfam; PF00994; MoCF_biosynth; 1. > DR ProDom; PD002460; MoCF_biosynth; 1. > FT NON_TER 1 1 > SQ SEQUENCE 145 AA; 15661 MW; 7A6AC63B61AA3465 CRC64; > VIKAKADKYG WGIVYKEIVP DDENKIVEAI KSAVNNGAEA VIVTGGTSVD ATDKTPIAIS > KLGKIISYGL PIKPTTMSIV AMWDSIPIFG ISAGGIYYRD YNAIDVMFTR LMAGIIPRKE > DIAMMGHGGL LPNFQPKMKL QSINS > // > > ID Q977L6_9CREN PRELIMINARY; PRT; 393 AA. > AC Q977L6; > DT 01-DEC-2001, integrated into UniProtKB/TrEMBL. > DT 01-DEC-2001, sequence version 1. > DT 30-MAY-2006, entry version 19. > DE 3-hydroxyacyl-CoA dehydrogenase. > OS uncultured crenarchaeote 74A4. > OC Archaea; Crenarchaeota; Thermoprotei; marine archaeal group 1; > OC environmental samples. > OX NCBI_TaxID=166279; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=21633832; PubMed=11772643; DOI=10.1128/AEM.68.1.335-345.2002; > RA Beja O., Koonin E.V., Aravind L., Taylor L.T., Seitz H., Stein J.L., > RA Bensen D.C., Feldman R.A., Swanson R.V., DeLong E.F.; > RT "Comparative Genomic Analysis of Archaeal Genotypic Variants in a > RT Single Population and in Two Different Oceanic Provinces."; > RL Appl. Environ. Microbiol. 68:335-345(2002). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF393466; AAK96092.1; -; Genomic_DNA. > DR HSSP; Q16836; 1F0Y. 
> DR GO; GO:0009331; C:glycerol-3-phosphate dehydrogenase complex; IEA. > DR GO; GO:0004367; F:glycerol-3-phosphate dehydrogenase (NAD+) a...; IEA. > DR GO; GO:0016491; F:oxidoreductase activity; IEA. > DR GO; GO:0006631; P:fatty acid metabolism; IEA. > DR GO; GO:0006072; P:glycerol-3-phosphate metabolism; IEA. > DR InterPro; IPR006108; 3HC_DH_C. > DR InterPro; IPR006176; 3HCDH_NAD_bd. > DR InterPro; IPR008927; 6DGDH_C_like. > DR InterPro; IPR000205; FAD/NAD(P)_BS. > DR InterPro; IPR006168; NAD_Gly3P_DH. > DR Pfam; PF00725; 3HCDH; 1. > DR Pfam; PF02737; 3HCDH_N; 1. > DR PRINTS; PR00077; GPDHDRGNASE. > SQ SEQUENCE 393 AA; 44177 MW; 0384DDDD22E9B91E CRC64; > MKKYVIKDTE FHQHMLVKNI TVLGSGVMGH GIAQVSATAG YNVVLRDIKQ EFLDKAMEKI > KWSLDKLVSK EKITKEEADS IFSRIIPIVD LKEAVKNAEM IIEVVPEIME LKKKVYAELD > AVAGPDVIFA SNTSTLPITE IANTTSRPEK FIGIHFFNPP QLMKLVEIIP GEKTTQAITD > LTQEYVKSVN KQAVPSRKDV PGFIINRLFI PMVHEACYAQ DRTNATLEEI DSAVKFKLGF > PMGIFELADF TGMDVIHKAT VEMHLRDKKV INPHPTIGKM FDEKKLGQKS GEGYYKYSDD > KYERVALSEE LAEKFNPIQL VANILNNAAW LITNGASDIS EIEKAAQLGL GLKKPLFETA > KEIGIKNIVD ELNRLEKEHG EFYKPDPLLI SML > // > > ID Q9C4S0_METAC PRELIMINARY; PRT; 310 AA. > AC Q9C4S0; > DT 01-JUN-2001, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2001, sequence version 1. > DT 30-MAY-2006, entry version 14. > DE Hypothetical protein (Fragment). > OS Methanosarcina acetivorans. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; Methanosarcina. > OX NCBI_TaxID=2214; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=C2A; > RX PubMed=15882413; DOI=10.1111/j.1365-2958.2005.04616.x; > RA Pritchett M.A., Metcalf W.W.; > RT "Genetic, physiological and biochemical characterization of multiple > RT methanol methyltransferase isozymes in Methanosarcina acetivorans > RT C2A."; > RL Mol. Microbiol. 56:1183-1194(2005). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AF319042; AAK07602.1; -; Genomic_DNA. > DR GO; GO:0003677; F:DNA binding; IEA. > DR GO; GO:0003887; F:DNA-directed DNA polymerase activity; IEA. > DR GO; GO:0006260; P:DNA replication; IEA. > DR InterPro; IPR004013; PHP_C. > DR InterPro; IPR003141; PHP_N. > DR Pfam; PF02811; PHP; 1. > DR SMART; SM00481; POLIIIAc; 1. > KW Hypothetical protein. > FT NON_TER 1 1 > SQ SEQUENCE 310 AA; 35378 MW; 0E738DBCF3A7D5AC CRC64; > QEGWKKVDLH VHSSCSYDVP PAKAMHPSVL FEKARAKGLD YVTFTDHDTV EAYNLLGWDR > EGLVPGVEIS IKDPENIGHT LHVNVFELDS EEFGELEAIV NQEHDFKSFI RYLRLHDLPH > IYNHPFWFAI GDRPNLRAVP ELIKQFPVIE YNMQDLTEKN LITAALARKY GKGLAATTDS > HTGGMGAVYT LAKGDSFREY FDNIKNGRSY MVIEGGARRH LTKELNAWVE LVFSMDRNAR > GEVGFTTNVK TFDRLIGFFA NGKIREFPRI NGLAMKFFQN FSRSGLPAYM YMRAEKPLIS > RIEKVVNLTA > // > > ID Q7ZAF4_9EURY PRELIMINARY; PRT; 162 AA. > AC Q7ZAF4; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured Methanosarcinaceae archaeon. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; environmental samples. > OX NCBI_TaxID=176230; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Chin K.-J., Lueders T., Klose M., Conrad R.; > RL Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases. 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY125612; AAN02179.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q7ZAF4; 1-162. > DR InterPro; IPR008924; MCR_a/b_chain_C. > FT NON_TER 1 1 > FT NON_TER 162 162 > SQ SEQUENCE 162 AA; 17640 MW; 075472D65C69DF73 CRC64; > GAYMSGGVGF TQYATAAYTD DILDDFVYYG YDYIKGKYGV AKAKCTMDVV NDIGTEVTLY > GIEQYEKYPT TLEDHFGGSQ RATVLSAAAG SVTSMATGNA NAGLSAWYLS MYLHKEAWGR > LGFFGYDLQD QCGATNVFSC RSDEGAIDEL RGPNYPNYAM NV > // > > ID Q7ZAF2_9EURY PRELIMINARY; PRT; 162 AA. > AC Q7ZAF2; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Methyl-coenzyme M reductase II alpha subunit (Fragment). > GN Name=mrtA; > OS uncultured Methanobacteriaceae archaeon. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=176233; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Chin K.-J., Lueders T., Klose M., Conrad R.; > RL Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY125614; AAN02181.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q7ZAF2; 1-162. > DR InterPro; IPR008924; MCR_a/b_chain_C.
> FT NON_TER 1 1 > FT NON_TER 162 162 > SQ SEQUENCE 162 AA; 17577 MW; 6B661B485480F0B3 CRC64; > GSYMSGGVGF TQYATAAYTD NVLDDFTYYG KEYVEDKYGM TEAPNTMETV LDVGSEVTFY > ALEQFEEYPA LLETVFGGSQ RASIVAAAAG ASTGFATGNA QTGLSAWYLS MYLHKEQHSR > LGFYGYDLQD QCGAANTFAI RGDEGLPLEA RGANYPNYAM NV > // > > ID Q7ZAD5_9EURY PRELIMINARY; PRT; 163 AA. > AC Q7ZAD5; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured Methanobacteriaceae archaeon. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=176233; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Chin K.-J., Lueders T., Klose M., Conrad R.; > RL Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY125640; AAN02207.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q7ZAD5; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17793 MW; 8CAB3E679B971E1F CRC64; > LGSYMSGGVG FTQYATASYT DNVLDDFTYF GQEYVEDKYG MTEAPNNMDT VLDVASEVNF > YALEQFEEYP ALLETIFGGS QRASIVAAAA GCSTAFATGN AQTGLSGWYL SMYLHKEQHS > RLGFYGYDLQ DQCGASNVFS IRGDEGLPLE ARGANYPNYA MNV > // > > ID Q7ZAC4_9EURY PRELIMINARY; PRT; 162 AA. > AC Q7ZAC4; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Methyl-coenzyme M reductase II alpha subunit (Fragment). > GN Name=mrtA; > OS uncultured Methanobacteriaceae archaeon.
> OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=176233; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Chin K.-J., Lueders T., Klose M., Conrad R.; > RL Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY125659; AAN02226.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q7ZAC4; 1-162. > DR InterPro; IPR008924; MCR_a/b_chain_C. > FT NON_TER 1 1 > FT NON_TER 162 162 > SQ SEQUENCE 162 AA; 17679 MW; D7357D7790590D5F CRC64; > GSYMSGGVGF TQYATASYTD NVLDDFTYFG QEYVEDKYGM TEAPNNMDTV LDVASEVNFY > ALEQFEEYPA LLETIFGGSQ RASIVAAAAG CSTAFATGNA QTGLSGWYLS MYLHKEQHSR > LGFYGYDLQD QCGASNVFSI RGDEGLPLEA RGANYPNYAM NV > // > > ID Q7ZAC2_9EURY PRELIMINARY; PRT; 162 AA. > AC Q7ZAC2; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured Methanosarcinaceae archaeon. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; environmental samples. > OX NCBI_TaxID=176230; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Chin K.-J., Lueders T., Klose M., Conrad R.; > RL Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY125661; AAN02228.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN.
> DR SMR; Q7ZAC2; 1-162. > DR InterPro; IPR008924; MCR_a/b_chain_C. > FT NON_TER 1 1 > FT NON_TER 162 162 > SQ SEQUENCE 162 AA; 17641 MW; 7989F6A4E5B474E1 CRC64; > GSYMSGGVGF TQYATAAYTD DILDDFCYYG YDYIKGKYGV AKAKCTMDVV NDIGSEVTLY > GIEQYEKYPT TLEDHFGGSQ RATVLSAAAG VTTSIATGNA NAGLSAWYLS MYLHKEAWGR > LGFFGYDLQD QCGATNVFSC RSDEGAIDEL RGPNYPNYAM NV > // > > ID Q7ZAC0_9EURY PRELIMINARY; PRT; 162 AA. > AC Q7ZAC0; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Methyl-coenzyme M reductase II alpha subunit (Fragment). > GN Name=mrtA; > OS uncultured Methanobacteriaceae archaeon. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=176233; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Chin K.-J., Lueders T., Klose M., Conrad R.; > RL Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY125663; AAN02230.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q7ZAC0; 1-162. > DR InterPro; IPR008924; MCR_a/b_chain_C. > FT NON_TER 1 1 > FT NON_TER 162 162 > SQ SEQUENCE 162 AA; 17593 MW; B47E9015E1F4FFDB CRC64; > GSYMSGGVGF TQYATAAYTD NVLDDFTYFG QEYVEDKYGM TEAPNTMETV LDVGSEVTFY > ALEQFEDYPA LLETIFGGSQ RASIVAAAAG CSAGFATGNA QTGLSAWYLS MYLHKEQHSR > LGFYGYDLQD QCGASNVFSI RGDEGLPLEA RGANYPNYAM NV > // > > ID Q7ZAB9_9EURY PRELIMINARY; PRT; 162 AA. > AC Q7ZAB9; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured Methanobacteriaceae archaeon.
> OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; environmental samples. > OX NCBI_TaxID=176233; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Chin K.-J., Lueders T., Klose M., Conrad R.; > RL Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY125665; AAN02232.1; -; Genomic_DNA. > DR HSSP; P11558; 1HBN. > DR SMR; Q7ZAB9; 1-162. > DR InterPro; IPR008924; MCR_a/b_chain_C. > FT NON_TER 1 1 > FT NON_TER 162 162 > SQ SEQUENCE 162 AA; 17603 MW; 92756C77904C5818 CRC64; > GSYMSGGVGF TQYATASYTD NVLDDFTYFG QEYVEDKYGM TEAPNNMDTV PDVASEVNFY > ALEQFEEYPA LLETIFGGSQ RASIVAAAAG CSTAFATGNA QTGLSGWYLS MYLHKEQHSR > LGSYGYDLQD QCGASNVFSI RGDEGLPLEA RGANYPNYAM NV > // > > ID Q7ZA90_9EURY PRELIMINARY; PRT; 147 AA. > AC Q7ZA90; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase II alpha subunit (Fragment). > GN Name=mrtA; > OS Methanothermobacter thermophilus. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; Methanothermobacter. > OX NCBI_TaxID=49341; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=DSM 6529; > RA Lausten T., Westermann P., Ahring B.K.; > RL Submitted (MAY-2003) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY289753; AAQ18239.1; -; Genomic_DNA.
> DR HSSP; Q49605; 1E6V. > DR SMR; Q7ZA90; 1-147. > DR InterPro; IPR008924; MCR_a/b_chain_C. > FT NON_TER 1 1 > FT NON_TER 147 147 > SQ SEQUENCE 147 AA; 15996 MW; 2B4EA5B0D40B0DAE CRC64; > GGVGFTQYAT AAYTDDILDD FVYYGMEYVD DKYGICGTKP TMDVVRDIST EVTLYSLEQY > EEYPTLLEDH FGGSQRAAVA AAAAGCSTAF ATGNSNAGIN GWYLSQILHK EAHSRLGFYG > YDLQDQCGAS NSLSIRSDEG LIHELRG > // > > ID Q7ZA87_9EURY PRELIMINARY; PRT; 147 AA. > AC Q7ZA87; > DT 01-OCT-2003, integrated into UniProtKB/TrEMBL. > DT 01-OCT-2003, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase II alpha subunit (Fragment). > GN Name=mrtA; > OS Methanothermobacter thermoflexus. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; Methanothermobacter. > OX NCBI_TaxID=49340; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=DSM 7268; > RA Lausten T., Westermann P., Ahring B.K.; > RL Submitted (MAY-2003) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY303951; AAQ21199.1; -; Genomic_DNA. > DR HSSP; Q49605; 1E6V. > DR SMR; Q7ZA87; 1-147. > DR InterPro; IPR008924; MCR_a/b_chain_C. > FT NON_TER 1 1 > FT NON_TER 147 147 > SQ SEQUENCE 147 AA; 16012 MW; D0B87FB0D41DD7A0 CRC64; > GGVGFTQYAT ASYTDDILDD FVYYGMEYVD DKYGICGTKP TMDVVRDIST EVTLYSLEQY > EEYPTLLEDH FGGSQRAAVA AAAAGCSTAF ATGNSNAGIN GWYLSQILHK EAHSRLGFYG > YDLQDQCGAS NSLSIRSDEG LIHELRG > // > > ID Q877F5_9EURY PRELIMINARY; PRT; 240 AA. > AC Q877F5; > DT 01-JUN-2003, integrated into UniProtKB/TrEMBL. > DT 01-JUN-2003, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS Methanomethylovorans hollandica.
> OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosarcinaceae; Methanomethylovorans. > OX NCBI_TaxID=101192; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX MEDLINE=22748895; PubMed=12866859; DOI=10.1078/072320203322346173; > RA Simankova M.V., Kotsyurbenko O.R., Lueders T., Nozhevnikova A.N., > RA Wagner B., Conrad R., Friedrich M.W.; > RT "Isolation and characterization of new strains of methanogens from > RT cold terrestrial habitats."; > RL Syst. Appl. Microbiol. 26:312-318(2003). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY260442; AAP20897.1; -; Genomic_DNA. > DR HSSP; P07962; 1E6Y. > DR SMR; Q877F5; 1-240. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 240 240 > SQ SEQUENCE 240 AA; 25625 MW; 2C23726350D3BE77 CRC64; > EAAVADLSYA AKHAGVISMG EMLPARRARG PNEPGGISFG HLSDIVQTSR VDPEDPAHVA > LEVVGAGCML YDQIWLGSYM SGGVGFTQYA TAAYTNNILD DNLYYNVDYI NDKYNGAAKK > GTDNKVKATL EVVKDIATES TLYGMENYEK YPTALEDHFG GSQRATVLSA AAGAATAIAT > GNGNAGLSGW YLSMYLHKEG HGRLGFFGFD LQDQCGATNV LSYQSDEGLP VELRGPNYPN > // > > ID Q75NC6_9ARCH PRELIMINARY; PRT; 216 AA. > AC Q75NC6; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Methyl-coenzyme M reductase (Fragment). > GN Name=mcrA; > OS uncultured archaeon. > OC Archaea; environmental samples. 
> OX NCBI_TaxID=115547; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15574947; DOI=10.1128/AEM.70.12.7445-7455.2004; > RA Inagaki F., Tsunogai U., Suzuki M., Kosaka A., Machiyama H., Takai K., > RA Nunoura T., Nealson K.H., Horikoshi K.; > RT "Characterization of C1-metabolizing prokaryotic communities in > RT methane seep habitats at the Kuroshima Knoll, southern Ryukyu Arc, by > RT analyzing pmoA, mmoX, mxaF, mcrA, and 16S rRNA genes."; > RL Appl. Environ. Microbiol. 70:7445-7455(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AB176931; BAD16563.1; -; Genomic_DNA. > DR SMR; Q75NC6; 1-216. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 216 216 > SQ SEQUENCE 216 AA; 22858 MW; D618DE46A9F2AF4B CRC64; > AGEAAVADLA FGAKHAGLVS MSEMLPARRA RGPNEPGGLS FGYMADIVQT SRITPDDPCN > VTLEVAGAAS MLYDQIWLGG YMSGGVGFTQ FASCVYTNDI LDDFLYWGAD YIVDKYGEYG > KAEVSLGAIK DIATVVTLYG LENYEEYPTA LEDHFGGSQR ATVIAISAGG STALATGNSS > AAMSAWHLSM YLHKEAWGRL GAFGYDQQDQ CGAANV > // > > ID Q70CN7_9EURY PRELIMINARY; PRT; 253 AA. > AC Q70CN7; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RA Galand P.E., Juottonen H., Fritze H., Yrjala K.; > RL Submitted (OCT-2003) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ586250; CAE51953.1; -; Genomic_DNA. > DR SMR; Q70CN7; 2-253. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 253 253 > SQ SEQUENCE 253 AA; 27113 MW; 2D29FDF761E13A32 CRC64; > FFISAYSMCA GEAAVADLSF AAKHAALVSM GEMLPARRAR GPNEPGGLSF GHISDIIQTS > RKAIDDPAKV ALEVVGAGCM LYDQIWLGSY MSGGVGFTQY ATAAYTDDIL DNNVYYNIDY > INDKYKGAAK VGKDNKIKAT LDVVKDIATE STIYGIETYE KFPTALEDHF GGSQRATVLA > AAAGVATAIA TANANAGLSA WYLSMYLHKE AWGRLGFFGY DLQDQCGATN VLSYQGDEGL > PDELRGPNYP NYA > // > > ID Q704C6_THETE PRELIMINARY; PRT; 735 AA. > AC Q704C6; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 10. > DE Trehalose-6-phosphate synthase-phosphatase. > GN Name=tpsp; > OS Thermoproteus tenax. > OC Archaea; Crenarchaeota; Thermoprotei; Thermoproteales; > OC Thermoproteaceae; Thermoproteus. > OX NCBI_TaxID=2271; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX PubMed=15028704; DOI=10.1128/JB.186.7.2179-2194.2004; > RA Siebers B., Tjaden B., Michalke K., Doerr C., Ahmed H., Zaparty M., > RA Gordon P., Sensen C.W., Zibat A., Klenk H.-P., Schuster S.C., > RA Hensel R.; > RT "Reconstruction of the central carbohydrate metabolism of > RT Thermoproteus tenax using genomic and biochemical data."; > RL J. Bacteriol. 186:2179-2194(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ621287; CAF18468.1; -; Genomic_DNA. > DR GO; GO:0003825; F:alpha,alpha-trehalose-phosphate synthase (U...; IEA. > DR GO; GO:0003824; F:catalytic activity; IEA. > DR GO; GO:0008152; P:metabolism; IEA. > DR GO; GO:0005992; P:trehalose biosynthesis; IEA. > DR InterPro; IPR001830; Glyco_trans_20. > DR InterPro; IPR006379; HAD-SF_hydro_IIB. > DR InterPro; IPR012766; Trehalose_OtsA. > DR InterPro; IPR003337; Trehalose_PPase. > DR Pfam; PF00982; Glyco_transf_20; 1. > DR Pfam; PF02358; Trehalose_PPase; 1. > DR TIGRFAMs; TIGR01484; HAD-SF-IIB; 1. > DR TIGRFAMs; TIGR00685; T6PP; 1. > DR TIGRFAMs; TIGR02400; trehalose_OtsA; 1. 
> SQ SEQUENCE 735 AA; 82040 MW; AF7A68E3844804ED CRC64; > MGGQVRLIVV SNRLPVTISP SGEIRESVGG LATAMKSFLG AVNGGRELGL EEVVWVGWSG > VPSERESNDL RERLRGMGLE PVPLSSEEVE GFYEGFSNST LWPLFHGFSE YATYEEKHWR > AYRGVNEKYA KAVVALARPG DLVWIHDYHL MLAPAIVREA AEVGVGFFLH IPFPPAELLQ > LLPSEWRREI LEGLLGSDLV GFHTYEYSAN FSRSVVRFLG YKVEMGAIAV GHRRVRVGVF > PIGIDFDRFY NSSQDPSVVE EMAKLREMLG RAKVVFSIDR LDYTKGVLRR VAAWERFLRE > HPEWRGRAVF VLVVVPSRTG VPMYEEMKRQ IDREVGRING ELGELNWVPI VYLYRFIPSP > TLMALYNIAD VALITPLRDG MNLVAKEFVA SKRDCRGVLI LSELAGASKE LAEALVINPN > DVGGTAEAIA EALSMSEDEQ CRRIRAMQER LRMRDVVRWG TDFIYSLISA KSAREEVEKA > LRYMEELSVD KLKSDFAKAK RRLLLLDYDG TLVPHYPYPH MAVPDGDLLE LLSRLAALPE > TAVYVVSGRG RDFLDGWLGR LPVGLVAEHG FFLKHPGGEW KSLGKVDPSW RQYAKGIMED > FASNVPGSFV EVKEAGIAWH YRNADETIAE KAVVELIDAL SNALAGSGLS ILRGKKVVEV > RPAGYTKGTA AKMLLDELSP DFVFVAGDDE TDEGMFEVAP QSAYTVKVGP GPTLAKFRVG > DYRGLRSLLE QLRPP > // > > ID Q704B9_THETE PRELIMINARY; PRT; 610 AA. > AC Q704B9; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Glycogen phosphorylase (EC 2.4.1.1). > GN Name=glgP; > OS Thermoproteus tenax. > OC Archaea; Crenarchaeota; Thermoprotei; Thermoproteales; > OC Thermoproteaceae; Thermoproteus. > OX NCBI_TaxID=2271; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15028704; DOI=10.1128/JB.186.7.2179-2194.2004; > RA Siebers B., Tjaden B., Michalke K., Doerr C., Ahmed H., Zaparty M., > RA Gordon P., Sensen C.W., Zibat A., Klenk H.-P., Schuster S.C., > RA Hensel R.; > RT "Reconstruction of the central carbohydrate metabolism of > RT Thermoproteus tenax using genomic and biochemical data."; > RL J. Bacteriol. 186:2179-2194(2004). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ621294; CAF18475.1; -; Genomic_DNA. > DR GO; GO:0004645; F:phosphorylase activity; IEA. > DR GO; GO:0016757; F:transferase activity, transferring glycosyl...; IEA. > DR GO; GO:0009058; P:biosynthesis; IEA. > DR GO; GO:0005975; P:carbohydrate metabolism; IEA. > DR InterPro; IPR000173; GAP_DH. > DR InterPro; IPR001296; Glyco_trans_1. > DR InterPro; IPR000811; Glyco_trans_35. > DR PANTHER; PTHR11468; Glyco_trans_35; 1. > DR Pfam; PF00534; Glycos_transf_1; 1. > DR PIRSF; PIRSF000460; Pprylas_GlgP; 1. > DR PROSITE; PS00071; GAPDH; UNKNOWN_1. > KW Glycosyltransferase; Transferase. > SQ SEQUENCE 610 AA; 68870 MW; E136805C0AE3F9BD CRC64; > MLPSPYPPVI GDLAATTNLP VLTFVKPVNG PAANMSLLCG ESGSVPGGSS SHNIFIASPL > PPMKSLAIEA SSSCTTNFLL VRSTLRILPC HPGMDCLSIL IYHFYLRRVA RFGDIAFKLT > MPPRAMIVSI TPELALDGSK NYAGGLGVLE GDKFYAAARL GVPYTVITLL YDRGYVEYTE > QNGQLEPTEE DQSNFIKRLE LFRTCWTTVG GEEVEVAFYL YRLNTATAVF VKPLRPAWAA > QATERLYIER TELERFRKYI ILAKAALRYI EKFIGWDRVK YVDLQEAYTA LVPLLKPDPR > YRLVVHTPAP WGHPTFPSRF FRQELGFDLA MNPVVLTELG LAASSEGVVV SKKMVHFAAR > TFPHHAHKIK AVTNAVEIPR WQHPAVADVK GPEELKAARA RVKEEALRAL GVRTDRPVIV > WARRLTSYKR PHYMVQLIEE VNTDLFFILG GRAHPNDEYG RRIMAEFKRL AATRPNVLYI > PSLYVEDMRK IIWAADIFTF TPFSGWEASG TSFMKAGING VPSVASRDGA VVEVVRDRYN > GWLFGEDRTE LVSPDTPDID EREYAEFKQK VNEALDALAD GSYWEVAFNA YKTFREYYSM > ERLFKEYGYL > // > > ID Q702A3_9CREN PRELIMINARY; PRT; 448 AA. > AC Q702A3; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 14. > DE Putative molybdopterin biosynthesis protein. > OS uncultured crenarchaeote. > OC Archaea; Crenarchaeota; environmental samples. 
> OX NCBI_TaxID=29281; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Treusch A.H., Kletzin A., Raddatz G., Ochsenreiter T., Quaiser A., > RA Meurer G., Schuster S.C., Schleper C.; > RT "Characterization of Large-Insert DNA Libraries from Soil for > RT Environmental Genomic Studies of Archaea."; > RL Environ. Microbiol. 9:970-980(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ627421; CAF28721.1; -; Genomic_DNA. > DR HSSP; P12282; 1JWA. > DR GO; GO:0003824; F:catalytic activity; IEA. > DR GO; GO:0006790; P:sulfur metabolism; IEA. > DR InterPro; IPR000205; FAD/NAD(P)_BS. > DR InterPro; IPR009036; MoeB. > DR InterPro; IPR007901; MoeZ_MoeB. > DR InterPro; IPR000594; ThiF_NAD_FAD_bd. > DR InterPro; IPR003749; ThiS. > DR Pfam; PF05237; MoeZ_MoeB; 1. > DR Pfam; PF00899; ThiF; 1. > DR Pfam; PF02597; ThiS; 1. > SQ SEQUENCE 448 AA; 49046 MW; 3FDED96767FCF5DA CRC64; > MTNIEFVIPS VLTKGTGEKK IPLDATDLQD AFTKITEQLG EDFKRKVLDM NGKPRSLINI > YINGKNMRFS NDGMATKLNS GDSIYILPAV AGGSELKNED LQRYSRQIML EEIGFVGLEK > LRKAKVCVVG VGGIGNPVVT QLTAMGVGKL KIVDRDIIEI SNLHRQHLYT ENDLGKVKVE > AAKERLEKIN SSVEIEALPN SVTKYTAESI IRGYDIVVDA LDSIDARYAL NDACIKLNIP > LIYAGALGML GSVCTIIPNK TACLRCIFPA LAEDDMPTCS TEGVHPSILY LVGGIQVSEA > VKIILGEKPT LENKLMYVDL NDLSLEKISV FRQEECPSCG TKRIDIDELE TKQLIIEELC > GRDRGKRTYT VTPSHISSSL NLIGIEKNAE RLGYTIKTKG ELGLTIMSNN SDNLSISFMS > SGAATIVGAK SEDEALSIYK SFVDDIKP > // > > ID Q6ZXB1_9EURY PRELIMINARY; PRT; 156 AA. > AC Q6ZXB1; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. 
> OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15812059; DOI=10.1128/AEM.71.4.2195-2198.2005; > RA Galand P.E., Fritze H., Conrad R., Yrjala K.; > RT "Pathways for methanogenesis and diversity of methanogenic Archaea in > RT three different boreal peatlands."; > RL Appl. Environ. Microbiol. 71:2195-2198(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ704551; CAG28707.1; -; Genomic_DNA. > DR SMR; Q6ZXB1; 1-155. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 16799 MW; 721FEB54E1886332 CRC64; > GGVGFTQYAT AAYTDNILDD YSYYGNDYAK KYGADGKAPA TMDVVNDLGT EVTLYGIEQY > EKYPTTLEDH FGGSQRATVL AAASGVTAAI ATGNSNAGLS AWYLSMLLHK DAWGRLGFYG > YDLQDQCGST NTFSVRSDEG APDELRGANY PNYAMK > // > > ID Q6ZXA7_9EURY PRELIMINARY; PRT; 156 AA. > AC Q6ZXA7; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15812059; DOI=10.1128/AEM.71.4.2195-2198.2005; > RA Galand P.E., Fritze H., Conrad R., Yrjala K.; > RT "Pathways for methanogenesis and diversity of methanogenic Archaea in > RT three different boreal peatlands."; > RL Appl. Environ. Microbiol. 71:2195-2198(2005). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ704555; CAG28711.1; -; Genomic_DNA. > DR SMR; Q6ZXA7; 1-156. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 16863 MW; B2F9EBC5FE28283D CRC64; > GGVGFTQYAT AAYTDNILDD YTEYGVDYIK KKHGGIGKAK CTQEVVSDIA TEVNLYGMKQ > YEQYPTALES HFGGSQRASV LAAASGISVG LATANSNAGL NGWYLSMLMH KEGWSRLGFF > GYDLQDQCGS ANCMSIRPDE GLLGELRGPN YPNYAM > // > > ID Q6V265_9EURY PRELIMINARY; PRT; 154 AA. > AC Q6V265; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Nercessian O.G., Bienvenu N., Moreira D., Prieur D., Jeanthon C.; > RL Submitted (JUL-2003) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY354021; AAQ56610.1; -; Genomic_DNA. > DR SMR; Q6V265; 1-154. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. 
> FT NON_TER 1 1 > FT NON_TER 154 154 > SQ SEQUENCE 154 AA; 16894 MW; 5E8D758B5BD5179D CRC64; > GGVGFTQYAT AAYTDDILDD FCYYGLDYVE NKYGRNGTKP TMDVVEDIAT EVTLYALEQY > DEYPTLLEDH FGGSQRAAVA AAAAGVSTCM ATGNSNAGVN AWYLSQIMHK EYHSRLGFYG > YDLQDQCGAS NSLSIRNDES SPLELRGPNY PNYA > // > > ID Q6V263_9EURY PRELIMINARY; PRT; 154 AA. > AC Q6V263; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Nercessian O.G., Bienvenu N., Moreira D., Prieur D., Jeanthon C.; > RL Submitted (JUL-2003) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY354023; AAQ56612.1; -; Genomic_DNA. > DR SMR; Q6V263; 1-154. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 154 154 > SQ SEQUENCE 154 AA; 16689 MW; 1BD57F0837C60913 CRC64; > GVGFTQYATA VYTDNILDGY VYYGLDYVED KYGLAEAEPD MDVVKDVATE VTLYGLEQYE > RYPAAMETHF GGSQRAAVCA AAAGCSVAFA TGHAQAGVNG WYLSQILHKE AHGRLGFYGY > DLQDQCGAAN SLSVRSDEGL PLELRGPNYP NYAM > // > > ID Q6V262_9EURY PRELIMINARY; PRT; 156 AA. > AC Q6V262; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RA Nercessian O.G., Bienvenu N., Moreira D., Prieur D., Jeanthon C.; > RL Submitted (JUL-2003) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY354024; AAQ56613.1; -; Genomic_DNA. > DR SMR; Q6V262; 1-155. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 16983 MW; 92FD931E6BBD8D5D CRC64; > GGVGFTQYAT AVYTDNILDD YVYYGLEYVE DKYGLAEADP SMDVVKDVAT EVTLYGLEQY > ERYPAAMETH FGGSQRAAVC AAATGCSVSF ATGHAQAGLN GWYLSQIMHK EGHGRLGFYG > YDLQDQCGAA NTLSVRSDEG LPLELRGPNY PNYAMK > // > > ID Q6V253_9EURY PRELIMINARY; PRT; 154 AA. > AC Q6V253; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS Methanothermococcus okinawensis. > OC Archaea; Euryarchaeota; Methanococci; Methanococcales; > OC Methanococcaceae; Methanothermococcus. > OX NCBI_TaxID=155863; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=DSM 14208; > RA Nercessian O.G., Bienvenu N., Moreira D., Prieur D., Jeanthon C.; > RL Submitted (JUL-2003) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY354033; AAQ56622.1; -; Genomic_DNA. > DR SMR; Q6V253; 1-154. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. 
> DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 154 154 > SQ SEQUENCE 154 AA; 16879 MW; 4C04E07CDA356AE5 CRC64; > GGVGFTQYAT ATYTDDILDD FSYYGYDYVE KKYGINGTKP TMDVVEDIAT EVTLYGLEQY > DEFPALLEDH FGGSQRAGVV AAASGISVCM ATGNSNAGVN GWYLSQILHK EYHSRLGFYG > YDLQDQCGAS NSLSIRNDES SPLELRGPNY PNYA > // > > ID Q6TWD8_9EURY PRELIMINARY; PRT; 374 AA. > AC Q6TWD8; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 10. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS Methanobacterium aarhusense. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC Methanobacteriaceae; Methanobacterium. > OX NCBI_TaxID=256826; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=H2-LR; > RA Shlimon A.G., Friedrich M.W., Niemann H., Ramsing N.B., Finster K.; > RT "Methanobacterium aarhusense sp. nov., a novel methanogen isolated > RT from a marine sediment (Aarhus Bay, Denmark)."; > RL Int. J. Syst. Evol. Microbiol. 54:759-763(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY386125; AAR27839.1; -; Genomic_DNA. > DR SMR; Q6TWD8; 1-374. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. 
> FT NON_TER 1 1 > FT NON_TER 374 374 > SQ SEQUENCE 374 AA; 40718 MW; 9F8E7CB126CA17D0 CRC64; > VIVGLNTAHT VIEKRLGKEV TPETITHYLG NCNHAMPGAA VVQEHMVETN PSLVADSYVK > IFTGNDEISD EVDQRFVINI NEEFPEEQAK VLKAEVGDGM WQVVKIPTIV SRTCDGGTTS > RWSAMQIGMS MISAYKQAAG EAATGDFAYA AKHAEVVHMG TYLPVRRARG ENEPGGIAFG > FLADICQSSR VNMDDPVRVA LDVVASGAML YDQIWLGSYM SGGVGFTQYA TAAYTDNILD > DFTYYGKEYV EDKFGLCEAP NNMDTVLDVA SEVTFYGLEQ YEEYPALLET QFGGSQRAAV > AAAAAACSTG FATGNAQTAL SGWYLSMYLH KEQHSRLGFY GYDLQDQCGA SNVFSIRGDE > GLPLELRGAN YPNY > // > > ID Q6T569_9EURY PRELIMINARY; PRT; 239 AA. > AC Q6T569; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 10. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured Methanobacteriales archaeon. > OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; > OC environmental samples. > OX NCBI_TaxID=194842; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=14871211; DOI=10.1111/j.1462-2920.2004.00568.x; > RA Newberry C.J., Webster G., Cragg B.A., Parkes R.J., Weightman A.J., > RA Fry J.C.; > RT "Diversity of prokaryotes and methanogenesis in deep subsurface > RT sediments from the Nankai Trough, Ocean Drilling Program Leg 190."; > RL Environ. Microbiol. 6:274-287(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY436537; AAR36119.1; -; Genomic_DNA. > DR SMR; Q6T569; 1-239. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. 
> DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 239 239 > SQ SEQUENCE 239 AA; 25871 MW; 2737778B60BA4F85 CRC64; > MISAYNQCAG EGATGDFAYA SKHAEVIHMG TYLPVRRARA ENEPGGIAFG FLADMVQSTR > VNPDDPVRSA LDVVAAGAAL YDQIWLGSYM SGGVGFTQYA SAAHTDNILD DFLYYGKEYV > EDKFGICEAP NNMDTVLDVG SEVTFYGLEQ YEEYPALLET QFGGSQRASV VSAAAGCATA > FATGNSQTGL SAWYLSMYLH KEQHSRLGFY GYDLQDQCGA SNVFSIRNDE GLPVEMRGP > // > > ID Q6SIF9_9EURY PRELIMINARY; PRT; 160 AA. > AC Q6SIF9; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 12. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured euryarchaeote. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=114243; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15528519; DOI=10.1128/AEM.70.11.6559-6568.2004; > RA Castro H., Ogram A., Reddy K.R.; > RT "Phylogenetic characterization of methanogenic assemblages in > RT eutrophic and oligotrophic areas of the Florida Everglades."; > RL Appl. Environ. Microbiol. 70:6559-6568(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY458420; AAR24541.1; -; Genomic_DNA. > DR SMR; Q6SIF9; 2-160. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 160 160 > SQ SEQUENCE 160 AA; 17731 MW; BB46B7EBD0AEABC9 CRC64; > DFTQYATAAY TDNILDDYTY YGMDYIKDKY KVDWKNPSEK GLAKANQDVI NDIATEVTLY > GMEQYEQFPT ALETHFGGSQ RASVLAAASG LSTAIATGNS NAGLNGWYLS MLLHKEGWSR > LGFYGYDLQD QCGSANTELF RADEGCVGEL RGANYPNYAM > // > > ID Q6SEI7_9EURY PRELIMINARY; PRT; 162 AA. 
> AC Q6SEI7; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 11. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured euryarchaeote. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=114243; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15528519; DOI=10.1128/AEM.70.11.6559-6568.2004; > RA Castro H., Ogram A., Reddy K.R.; > RT "Phylogenetic characterization of methanogenic assemblages in > RT eutrophic and oligotrophic areas of the Florida Everglades."; > RL Appl. Environ. Microbiol. 70:6559-6568(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY459319; AAR24561.1; -; Genomic_DNA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 162 162 > SQ SEQUENCE 162 AA; 18115 MW; 3FF9CD3C09CBC441 CRC64; > WCRFTQYATA AYTDNILDEY TYYGMDYIKD KYKVDWKNPN DKDKVKPTQD IANDMATEVA > LNGMEQYEQF PTLMEDHFGG SQRAGVLAAA CGLTASIATG NSNAGLNAWY LCMLLHKEGW > SRLGFFGYDL QDQCGSANSL AIRPDEGAIG ELRGPNYPNY AM > // > > ID Q6SCJ1_9EURY PRELIMINARY; PRT; 156 AA. > AC Q6SCJ1; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 11. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured euryarchaeote. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=114243; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX PubMed=15528519; DOI=10.1128/AEM.70.11.6559-6568.2004; > RA Castro H., Ogram A., Reddy K.R.; > RT "Phylogenetic characterization of methanogenic assemblages in > RT eutrophic and oligotrophic areas of the Florida Everglades."; > RL Appl. Environ. Microbiol. 70:6559-6568(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY460215; AAR24565.1; -; Genomic_DNA. > DR SMR; Q6SCJ1; 1-156. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 17088 MW; 427F985B2412A482 CRC64; > GGVGFTQYAT AAYTDNVLDD YTYYGQDYVK DKYKGWTKAP QTQAAVNDIA TEVTLYGMEQ > YENFPSLLED HFGGSQRASV LAAAAGLTCS MATANSNAGL NGWYLSMLLH KEGWSRLGFY > GYDLQDQCGS ANCMAIRSDE GLITELRGAN YPNYAM > // > > ID Q6MZC9_9ARCH PRELIMINARY; PRT; 589 AA. > AC Q6MZC9; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 13. > DE Adenine deaminase (EC 3.5.4.2). > GN Name=adeC; ORFNames=C2_0013; > OS uncultured archaeon. > OC Archaea; environmental samples. > OX NCBI_TaxID=115547; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=14685246; DOI=10.1038/nature02207; > RA Krueger M., Meyerdierks A., Gloeckner F.O., Amann R., Widdel F., > RA Kube M., Reinhardt R., Kahnt J., Boecher R., Thauer R.K., Shima S.; > RT "A conspicuous nickel protein in microbial mats that oxidise methane > RT anaerobically."; > RL Nature 426:878-881(2003). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; BX649197; CAE46371.1; -; Genomic_DNA. > DR GO; GO:0000034; F:adenine deaminase activity; IEA. > DR GO; GO:0016787; F:hydrolase activity; IEA. > DR GO; GO:0006146; P:adenine catabolism; IEA. > DR InterPro; IPR006679; Adenine_deam. > DR InterPro; IPR006680; Amidohydro_1. > DR InterPro; IPR011550; Amidohydro_like. > DR InterPro; IPR011059; Metal-dep_hydro_comp. > DR Pfam; PF01979; Amidohydro_1; 1. > DR ProDom; PD001248; Amidohydro_like; 1. > DR TIGRFAMs; TIGR01178; ade; 1. > KW Hydrolase. > SQ SEQUENCE 589 AA; 64490 MW; 062BF973558662E9 CRC64; > MASTNLLKMD PYLLIKFPKL KCQKMNALID IALGDKKADL VLKNVNLVNV CSGEIYETDI > AIAHGLIAGL GRYEGRLEID AQDNYAVPGL IDGHTHIEMS MLSVREFAKA VVPRGTTAVV > ADPHEIANVL GIAGIRALFD DARTTALKVF CMAPSCVPST DPSLGLETSG ATIDHTEIRK > LLRMDGIIGL AEVMNYVGVI AKEEVVWEKI EIAKALKMPI DGHAPLLSGK ELNAYVVSGA > GSDHENTTYE ESQEKLRLGM RVMVREGSVA KNLKKIVPLL RTVDTRNCML VTDGDRTPLD > LKDEGYLDYV LRRAIEEGID PVKAVQMCTI NPAQWFKLDA EIGCIAPGKI ADIVLLKNLD > TFEVEKVIVN GKPDFAQRSE YSFNYPQYRE SVRIMRVKPD EFVIEQEGAI KARIIGLIEG > ELLTEELVEE ISGIEIARDI LEIGVLERHH YSGNMGLGFV KGFGLKIGAI ASTIAHDSHN > IVVIGTDEED MALACNRLKD IGGGIVLCNG KEVTSELKLP IAGLMSDNGL DYVMRKQKEM > DDNISEMGCK LPAPFIAISF LALPVIPKLK ITDKGLVDVA KREIVGVFL > // > > ID Q6JIA4_9EURY PRELIMINARY; PRT; 160 AA. > AC Q6JIA4; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 11. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured euryarchaeote. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=114243; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX PubMed=15528519; DOI=10.1128/AEM.70.11.6559-6568.2004; > RA Castro H., Ogram A., Reddy K.R.; > RT "Phylogenetic characterization of methanogenic assemblages in > RT eutrophic and oligotrophic areas of the Florida Everglades."; > RL Appl. Environ. Microbiol. 70:6559-6568(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY458406; AAR22531.1; -; Genomic_DNA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 160 160 > SQ SEQUENCE 160 AA; 17857 MW; 45AAC1637FA4A82E CRC64; > DFTQYATAAY TDNILDEYTY YGMDYIKDKY KVDWKNPNEK DKVKPTQDIV NDMATEVALN > GMEQYEQFPT LMEDHFGGSQ RAGVLAAACG LTTSIATGNS NAGLNAWYLC MLLHKEGWSR > LGFFGYDLQD QCGSANSLAI RPDEGAIGEL RGPNYPNYAM > // > > ID Q6JIA0_9EURY PRELIMINARY; PRT; 160 AA. > AC Q6JIA0; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 11. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured euryarchaeote. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=114243; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15528519; DOI=10.1128/AEM.70.11.6559-6568.2004; > RA Castro H., Ogram A., Reddy K.R.; > RT "Phylogenetic characterization of methanogenic assemblages in > RT eutrophic and oligotrophic areas of the Florida Everglades."; > RL Appl. Environ. Microbiol. 70:6559-6568(2004). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY458410; AAR22535.1; -; Genomic_DNA. > DR SMR; Q6JIA0; 2-160. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 160 160 > SQ SEQUENCE 160 AA; 17638 MW; 41D7BC21326D12ED CRC64; > DFTQYATAAY TDNILDDYTY YGMDYVKDKY KVDYKNPSEK GLAKSTQDVI NDIATEVTLY > GMEQYEQFPT ALETHFGGSQ RASVLAAASG LSTAIATGNS NAGLNGWYLS MLLHKEGWSR > LGFYGYDLQD QCGSANTESF RADEGAVGEL RGANYPNYAM > // > > ID Q6JI97_9EURY PRELIMINARY; PRT; 162 AA. > AC Q6JI97; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 11. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured euryarchaeote. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=114243; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15528519; DOI=10.1128/AEM.70.11.6559-6568.2004; > RA Castro H., Ogram A., Reddy K.R.; > RT "Phylogenetic characterization of methanogenic assemblages in > RT eutrophic and oligotrophic areas of the Florida Everglades."; > RL Appl. Environ. Microbiol. 70:6559-6568(2004). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY458413; AAR22538.1; -; Genomic_DNA. > DR SMR; Q6JI97; 1-162. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. 
> FT NON_TER 1 1 > FT NON_TER 162 162 > SQ SEQUENCE 162 AA; 17550 MW; 9FE49DB657A17C6A CRC64; > GGVGFTQYAT AAYTDDILDN NVYYDVDYIN QKYKGAAEVG KNNKIKATLD VVKDIATEST > IYGIETYEKF PTALEDHFGG SQRATVLAAA AGVAIALATA NANAGLSGWY LSMYLHKEAW > GRLGFFGYDL QDQCGATNVL SYQGDEGLPD ELRGPNYPNY AM > // > > ID Q6ITY5_9EURY PRELIMINARY; PRT; 156 AA. > AC Q6ITY5; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured Methanosaeta sp. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales; > OC Methanosaetaceae; Methanosaeta; environmental samples. > OX NCBI_TaxID=183756; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15946291; DOI=10.1111/j.1462-2920.2004.00766.x; > RA Banning N., Brock F., Fry J.C., Parkes R.J., Hornibrook E.R., > RA Weightman A.J.; > RT "Investigation of the methanogen population structure and activity in > RT a brackish lake sediment."; > RL Environ. Microbiol. 7:947-960(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY625582; AAT45704.1; -; Genomic_DNA. > DR SMR; Q6ITY5; 1-156. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 16803 MW; 804C926EB4A05656 CRC64; > GGVGFTQYAT AAYTNDVLDD FSYYACDYGV DKYGGWGEAP AVLDTSKDIA TETTLYAMEQ > YEAFPTLLED HFGGSQRSAV MAAASAIGSA CLTGNSQSGL AGWYLSHLIH KDGWGRMGFF > GYDLQDQCGP TNVFSYQGDE GSPLELRGAN YPNYAM > // > > ID Q6ITY2_9EURY PRELIMINARY; PRT; 163 AA. > AC Q6ITY2; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. 
> DT 30-MAY-2006, entry version 9. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured Methanomicrobiales archaeon. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales; > OC environmental samples. > OX NCBI_TaxID=183760; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15946291; DOI=10.1111/j.1462-2920.2004.00766.x; > RA Banning N., Brock F., Fry J.C., Parkes R.J., Hornibrook E.R., > RA Weightman A.J.; > RT "Investigation of the methanogen population structure and activity in > RT a brackish lake sediment."; > RL Environ. Microbiol. 7:947-960(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY625585; AAT45707.1; -; Genomic_DNA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17935 MW; C7854F3AF24865F5 CRC64; > GGVGFTQYAT AAYTDNILDE YTYYGMDYIK QKYKVDWANP AEKDKVKPTQ DIVNDIATEV > ALNGMEQYEQ YPTLMEDHFG GSQRAGVLAA ACGLTCSIAT GNSNAGLNGW YLCMLLHKEG > WSRLGFFGYD LQDQCGSANS LAIRPDEGAI GEFRGPNYSN YAM > // > > ID Q6ITX6_9EURY PRELIMINARY; PRT; 163 AA. > AC Q6ITX6; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured Methanomicrobiales archaeon. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales; > OC environmental samples. > OX NCBI_TaxID=183760; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX PubMed=15946291; DOI=10.1111/j.1462-2920.2004.00766.x; > RA Banning N., Brock F., Fry J.C., Parkes R.J., Hornibrook E.R., > RA Weightman A.J.; > RT "Investigation of the methanogen population structure and activity in > RT a brackish lake sediment."; > RL Environ. Microbiol. 7:947-960(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY625591; AAT45713.1; -; Genomic_DNA. > DR SMR; Q6ITX6; 1-163. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 163 163 > SQ SEQUENCE 163 AA; 17859 MW; 0B21BA1171B080C1 CRC64; > GGVGFTQYAT AAYTDNILDD YTYYGMDYIK DKYKVDWKNP NEKGLAKSNQ DVINDIATEV > TLYGMEQYEQ FPTALETHFG GSQRASVLAA ASGLSTAIAT GNSNAGLNGW YLSMLLHKEG > WSRLGFFGYD LQDQCGSTNS LSVRPDEGAV GELRGPNYPN YAM > // > > ID Q6ITX0_9EURY PRELIMINARY; PRT; 211 AA. > AC Q6ITX0; > DT 05-JUL-2004, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2004, sequence version 1. > DT 30-MAY-2006, entry version 10. > DE Methyl-coenzyme M reductase alpha subunit (Fragment). > GN Name=mcrA; > OS uncultured Methanomicrobiales archaeon. > OC Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales; > OC environmental samples. > OX NCBI_TaxID=183760; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15946291; DOI=10.1111/j.1462-2920.2004.00766.x; > RA Banning N., Brock F., Fry J.C., Parkes R.J., Hornibrook E.R., > RA Weightman A.J.; > RT "Investigation of the methanogen population structure and activity in > RT a brackish lake sediment."; > RL Environ. Microbiol. 7:947-960(2005). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY625597; AAT45719.1; -; Genomic_DNA. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 211 211 > SQ SEQUENCE 211 AA; 22996 MW; D6135CB3891F28BF CRC64; > AAVADLAFAA KHAGVIQMGD ILPARRARGP NEPGGIKFGH FADMIQADRK YPNDPARATL > EVVGAGAMLF DQIWLGSYMS GGVGFTQYAT AAYTDNILDD YTYYGMDYIK SKYKVNWQSP > SEKDKVKATQ DVVNDIATEV NLYGMEQYEQ YPTALEDHFG GSQRASVLAA ASGISVSIAT > GNSNAGLNGW YLSMLMHKEG WSRLGFFGYD L > // > > ID Q69GZ5_METVO PRELIMINARY; PRT; 1199 AA. > AC Q69GZ5; > DT 13-SEP-2004, integrated into UniProtKB/TrEMBL. > DT 13-SEP-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Chromosomal segregation protein. > GN Name=SMC; > OS Methanococcus voltae. > OC Archaea; Euryarchaeota; Methanococci; Methanococcales; > OC Methanococcaceae; Methanococcus. > OX NCBI_TaxID=2188; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=PS; > RA Feldman R., Overbeek R., Whitman W.; > RT "Chromosomal segregation protein SMC."; > RL Submitted (OCT-2003) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY288521; AAQ22369.1; -; Genomic_DNA. 
> DR GO; GO:0005694; C:chromosome; IEA. > DR GO; GO:0005524; F:ATP binding; IEA. > DR GO; GO:0016887; F:ATPase activity; IEA. > DR GO; GO:0005525; F:GTP binding; IEA. > DR GO; GO:0005515; F:protein binding; IEA. > DR GO; GO:0051277; P:chromosome organization and biogenesis (sen...; IEA. > DR InterPro; IPR003439; ABC_transp_like. > DR InterPro; IPR005289; GTP_bd. > DR InterPro; IPR010935; SMC_hinge. > DR InterPro; IPR003395; SMC_N. > DR InterPro; IPR011891; SMC_prok_A. > DR Pfam; PF06470; SMC_hinge; 1. > DR Pfam; PF02463; SMC_N; 1. > DR ProDom; PD000006; ABC_transporter; 1. > DR TIGRFAMs; TIGR00650; MG442; 1. > DR TIGRFAMs; TIGR02169; SMC_prok_A; 1. > SQ SEQUENCE 1199 AA; 137830 MW; 6BB179C2979CAA1D CRC64; > MISISEIHLK NFKSFKNTKL KIPDGFTAIL GPNGSGKSNT IDGICFVLGK TSAKSLRAGK > FNQLITYHNG KRADYAEVTL FFDNINREIP IDSDKVGICR KVKLNGDNNY YVVWYEVEKQ > NTKINTESSQ KKTSKASKVE KRRRMKKNEV LDLLSKISLI ADGPNIILQG DLLRIIDTSP > NERRKILDEV SGVAEFDEKS EKAKKELSQA REYIEKIDIR INEVRANLEK LKKEKEDAEK > YTVYNKKLKV TKYILTSKKV EFLKMVLDET KDEIEALKET KNCYIQDISN IDSEIIGLKV > KINELVNELN EKGSEEVMEL HKSIKELEVN LNNDKNALEN AIDDLKHTLK MEESKNNDLN > ETKEKINNIR IDTLKKEAEA KVLIKEIEKL NEERQNLEKK VEQSESQVKA LKNQESKLSE > RINDTQKELY GLKNELNQLE NTLNNRTFDY QKNNETIENL TNQIAEFSDL EDTKKLYKEL > EDIAVELEFS KKKLQEKITE RNDSQSKLDN LHSEYVKENA RIKTLKDMEN FSLDRAVKGV > LDAKLPGVVD IAGNLAKTKG EYKTAIEVAG GARLNHIVVK KMDDGSRAIN YLKQKRLGRA > TFLPMDRIKG MDAKDISDTG IIGKAIDLVE FDIKYTNVFK FIFGNTHIVD NLENAKKLSL > KYKARFVTLE GEVIEPSGAM VGGNIRRNSA IKVDIDMKKL TNLSEDIKEL EQILSNVKDE > IERLNNKINT CSTRKLELDN RLKIARDQEF KKEEITKSNN LKIKELNMLN SKIDDEISEL > TDEKEILSQK VQNLDNKLSE VMGQRERIVN EIKSYENSEL SKRIKEIDHK IRENESSKNT > LENEIKKGAI LVKEVLIPKI SELNSNIKSL ADKKNMFKNS VEIYKSNIES NSSILSDKRG > KYEELTKGLK DLTDKKECYE LEIENLQNNK EELREKATDI DNQVNVINVD RAKYETRLEE > EERKLYLCDT LENIEDISDE MIEETYSLEI DDLERNQALL ESSIKKLEPV NMRAIEDYDF > INERYEELFG KRKEYEQEEG KYLQLISEVQ KRKKETFMKT YDRVAENYEQ IYGEIGGNGK > LSLENEEDPF SGGLLIDASP MNKQLQNLDV MSGGEKSLTA LAFLFAIQRL 
NPSPFYVLDE > VDAALDTKNA SLIGDMISNA SKESQFIVIS HREQMISKSN VMYGVCMENG LSKIVSVKL > // > > ID Q64EJ0_9ARCH PRELIMINARY; PRT; 169 AA. > AC Q64EJ0; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 7. > DE Carbon monoxide dehydrogenase beta subunit. > GN Name=cdhC; ORFNames=GZ11A10_42; > OS uncultured archaeon GZfos11A10. > OC Archaea; environmental samples. > OX NCBI_TaxID=285399; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714815; AAU82187.1; -; Genomic_DNA. > DR GO; GO:0003824; F:catalytic activity; IEA. > DR GO; GO:0008152; P:metabolism; IEA. > DR InterPro; IPR003704; CO_DH_CoA_synth. > DR Pfam; PF02552; CO_dh; 1. > DR PIRSF; PIRSF006035; CO_dh_b_ACDS_e; 1. > DR TIGRFAMs; TIGR00315; cdhB; 1. > SQ SEQUENCE 169 AA; 18111 MW; 0D34595386D8A56A CRC64; > MTRAYQCANV PGPHLGKVGT SASVASAIKR AKRPVLIVGS ELTNLDWALD LAKKQDIPVV > ATAHVAGTMR EKGVDPDRVL GAIEITNLLK SPEWGGIRGE GQHDLAIFVD VKYYLEAQML > STLKHFAPHI KTISLTKGHQ PNADQSLPNL SEKKLGKFLA GVVEGLAEG > // > > ID Q64EE9_9ARCH PRELIMINARY; PRT; 565 AA. > AC Q64EE9; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 11. > DE Adenine deaminase. 
> GN Name=adeC; ORFNames=GZ11H11_35; > OS uncultured archaeon GZfos11H11. > OC Archaea; environmental samples. > OX NCBI_TaxID=285398; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714816; AAU82228.1; -; Genomic_DNA. > DR GO; GO:0000034; F:adenine deaminase activity; IEA. > DR GO; GO:0016787; F:hydrolase activity; IEA. > DR GO; GO:0006146; P:adenine catabolism; IEA. > DR InterPro; IPR006679; Adenine_deam. > DR InterPro; IPR006680; Amidohydro_1. > DR InterPro; IPR011550; Amidohydro_like. > DR InterPro; IPR011059; Metal-dep_hydro_comp. > DR Pfam; PF01979; Amidohydro_1; 1. > DR ProDom; PD001248; Amidohydro_like; 1. > DR TIGRFAMs; TIGR01178; ade; 1. 
> SQ SEQUENCE 565 AA; 61595 MW; CE23ED0096F61E64 CRC64; > MNDLVDIASG NKKADLVLKN ANLVNVCSGE IYETDIAIAH GLIAGLGRYE GRVEIDAQDN > YAVPGLIDGH THIEMSMLSV REFAKAVVPR GTTAVVADPH EIANVLGIAG IRALLDDART > TALKVFCMAP SCVPSTDPSL GLETSGATLD HREIRKLLQM DGIIGLAEVM NYVGVIAKEE > EVWEKIEIAK ALKMPIDGHA PLLSGKELNA YVVSGAGSDH ENTTYEESRE KLRLGMRVMV > REGSVAKNLK NIAPLLRTVD TRNCMLVTDG DRTPLDLKDE GYLDYVLRRA IEEGIDPVKA > VQMCTINPAQ WFKLDNIIGC IAPGKIADIV LLKKLDTFEV EKVIVNGKPD FAQRSEYSFK > YPQYRESVRI MRVKPDEFVI AQEGAIKARI IGLIEGELLT EELVEEISGI EIARDILEIG > VLERHHYSGN MGLGFVKGFG LKTGAIASTI AHDSHNIVVI GTNEEDMALA CNRLKAIGGG > IVLCNGKEVT SELKLPIAGL MSDKGLDYVM RKQKEMDDSI SEMGCKLPAP FIAISFLALP > VIPKLKITDK GLVDVAKREI VAVFL > // > > ID Q64DQ6_9ARCH PRELIMINARY; PRT; 308 AA. > AC Q64DQ6; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 7. > DE Dolichol-P-glucose synthetase. > GN ORFNames=GZ17F1_42; > OS uncultured archaeon GZfos17F1. > OC Archaea; environmental samples. > OX NCBI_TaxID=285395; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714823; AAU82471.1; -; Genomic_DNA. > DR InterPro; IPR005242; CHP374. > DR Pfam; PF03706; UPF0104; 1. > DR TIGRFAMs; TIGR00374; CHP374; 1. 
> SQ SEQUENCE 308 AA; 33388 MW; FC3342236818F92B CRC64; > MTSILAILIL FLIASIIGRA DIIDELRTAS PALIALSIPV YLISWPLRGI RYQQILAEIG > VKEDLNFLVG CIAVSQSANV FLPARIGDVA RAYLLKKVKN ISWITGLSSL VVERMFDIIA > ITVIGGFAVL GLRGMLLDQW VTDVITIAGL ATVTVFGILV LFLKLRTTIG GVIDRFIREI > ALVSTNPRAF AIVLTGSLLV WLVDTLTCFV ILNAFSEQIS LSMIPVIFLA IAVGNLTKIF > PITPGSIGPY EAVLTGIFSL GGIDTAIGFA AAVLDHFVKN LVTLILGRVY LSRFDLSWSQ > LVEQSEQQ > // > > ID Q64DQ1_9ARCH PRELIMINARY; PRT; 462 AA. > AC Q64DQ1; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Tungsten formylmethanofuran dehydrogenase subunit A. > GN Name=fwdA; ORFNames=GZ17G11_3; > OS uncultured archaeon GZfos17G11. > OC Archaea; environmental samples. > OX NCBI_TaxID=285394; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714824; AAU82476.1; -; Genomic_DNA. > DR GO; GO:0016787; F:hydrolase activity; IEA. > DR InterPro; IPR006680; Amidohydro_1. > DR InterPro; IPR011059; Metal-dep_hydro_comp. > DR Pfam; PF01979; Amidohydro_1; 1. 
> SQ SEQUENCE 462 AA; 52175 MW; 44E101B5EB9B5B27 CRC64; > MAIQIRNGTV YDPANGIEGE KMDIFVDDGK IVDEIKPDEI IDATDKTVMA GGIDVHSHVA > TYGLNLARFT FGFPTVGEIG YAYALMGYTH VNEPLMTLNT ASYVHHELSS IPLVDTSALI > VLSLYDIDKE IREGDKASVK SALPLLLDLT KSIDIKIYDV RVRYAKKGFF YRDIDVTKCL > KFFYELVEEP KILLRTYPEL LDEDTSVLRG FCMAHIGSGI GDDARYDAAI EILNKGGAVD > LGIVSPYQDE AVNMRIGYRA ATEKFVSMDI GLEQPLIISE VDEKREGNNK SLRFALDALE > YLPSSNISFS TDSPSGCLFS SYPKLFTWLL SRRNRKELMP DVEYSLYELA QITRTNPARQ > LGLENKGHLG IGADADIAIY DMDENTGWRE LEKRLGNCSF LLKDGEAIIR EGKLNKERAR > KKTFYFVPEV TEKAYESELI ERICNRRSFR AEHLRVDECF LH > // > > ID Q64DN6_9ARCH PRELIMINARY; PRT; 510 AA. > AC Q64DN6; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl coenzyme M reductase I subunit alpha. > GN Name=mcrA; ORFNames=GZ18B6_1; > OS uncultured archaeon GZfos18B6. > OC Archaea; environmental samples. > OX NCBI_TaxID=285363; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714825; AAU82491.1; -; Genomic_DNA. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. 
> DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > SQ SEQUENCE 510 AA; 56185 MW; 9716DE2F2CCC05F5 CRC64; > MPQGQRFLMP YMMNHTDIMV NADDLHWINN AAMQQCWDDM KRGIVLGLDD AHGLLEARLG > KEVTPDTISH YMEVLNHALP GGAVIQEHMV ETKPMLVNDS YAKIFTGDDD LADAVDRRFL > LDINKEFAAG WEKPGEQADQ LKDAIGKKIW QVLWMPTVVG RMTDGGTMFR WVGMQVGMTM > ISAYKLCAGE SVTGEFAYYA KHAAVVQLSN YMPVKRARAH NEPGGMPLGI NADSVRSNAL > FPNDPIRNEL ESIAVAAMVY DQLWFGTYMS GGVGFTQYAS ATYTDNILED FCYKGCEIGL > DYAGGEMASI KGDKLNMDIL EEIIRAENDY ALTQYEAYPT VAESHFGGSV RACCAAAGCG > SAVACATGLA QPTLSAWSLS QLGHYERKGR LGFFGYDLQD QCTACGSYSY QSDEGMPFEM > RGVNYPNYAM NVGHQSAYGG LVAGAHCANH DAWVLSPLWK VAFSDRDLPF DRGYVTREYG > LGANREYTKV AGERDLIIAG YYGREPGAKL > // > > ID Q64C70_9ARCH PRELIMINARY; PRT; 561 AA. > AC Q64C70; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl coenzyme M reductase subunit alpha. > GN Name=mcrA; ORFNames=GZ26B2_4; > OS uncultured archaeon GZfos26B2. > OC Archaea; environmental samples. > OX NCBI_TaxID=285406; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714839; AAU83007.1; -; Genomic_DNA. 
> DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > SQ SEQUENCE 561 AA; 61463 MW; C30CF6E30A992C71 CRC64; > MAYKYPSEKI FVEALKAKFA GLDLADQKTR YVRAGFVQNA RKREFQAAGM RVAEQRGIKQ > YDVNVHLGGM TLGQRQLVPY KLSTQPDIVE GDDLHYVNNP AMQQMWDDMK RTIIVGMDLA > HETLEKRLGK EVTPESIAGY MEAVNHTMPG AAIVQEHMVE THPGLVDDCY VKMFTGDDEL > ADEIDSQYVI NINDLFDKEG QNEKIKAAIG KTTWQAIHIP TIVVRCCDGG NTSRWSAMQL > GMSYIAAYNM CAGEAAVADL AFAAKHAAAV QMAEMLPARR ARSPNEPGGL SFGYCADMVQ > KMRVKPDDPV LYTLEVVACG TMLYDQIWLG SYMSGGVGFT QYATAAYTND VLDDFTYYGY > DYALNKFGPD GTAPNDLATA TDLATEVTLN GMECYEDYPT LLEDHFGGSQ RAGILAAASA > CTTGIATGNA QVALSAWYMS MYVHKEGWGR LGFFGYDLQD QCGATNVCSY QGDEGCCLEL > RGANYPNYAM NVGHQGEYAG FTGAAHAGAH DAYCCNPLIK VCFADPSLVF DFADIRKEYA > RGAMRTFRPA GERSLVIPAG V > // > > ID Q64BR7_9ARCH PRELIMINARY; PRT; 332 AA. > AC Q64BR7; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Ketol-acid reductoisomerase. > GN ORFNames=GZ26G2_30; > OS uncultured archaeon GZfos26G2. > OC Archaea; environmental samples. > OX NCBI_TaxID=285389; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714843; AAU83160.1; -; Genomic_DNA. > DR GO; GO:0016853; F:isomerase activity; IEA. > DR GO; GO:0004455; F:ketol-acid reductoisomerase activity; IEA. > DR GO; GO:0009082; P:branched chain family amino acid biosynthesis; IEA. > DR InterPro; IPR013023; AcH_isomrdctse. > DR InterPro; IPR000506; AcH_isomrdctse_C. > DR InterPro; IPR013116; IlvN. > DR Pfam; PF01450; IlvC; 1. > DR Pfam; PF07991; IlvN; 1. > DR TIGRFAMs; TIGR00465; ilvC; 1. > KW Isomerase. > SQ SEQUENCE 332 AA; 36857 MW; D5904ACF0CB78E4D CRC64; > MEILHDEDVD DSILRDKTIA VMGYGAQGDA QANCLKDSGI NVVIGETEIL GGNKNPSWEK > AKEDGFEVLP IDKAAEKGDV VHILLPDEVQ PAIYENQIKP QLKAGKALCF SHGFNICFKR > IVPPEDVDVI MVAPKAPGTE ERKAYLEGFG VPGLVAVKQN PSGEAREVAL AMTKAMHWTK > AGILECTFEQ ETYEDLFGEQ CVLCGGLVEL MRNGFEVLVE AGYPPEMAYF ECVHEMKLIV > DLVWQGGIKR MAEVISNTAE YGMWAVGHQI IGPEVKEKMK EALKRVENGE FANEWVDEYK > RGIPFLKASR EKMGEHQVET VGAEIRKLFA QK > // > > ID Q64BA0_9ARCH PRELIMINARY; PRT; 215 AA. > AC Q64BA0; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Tungsten formylmethanofuran dehydrogenase subunit A. > GN Name=fwdA; ORFNames=GZ27E7_1; > OS uncultured archaeon GZfos27E7. > OC Archaea; environmental samples. > OX NCBI_TaxID=285383; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. 
> RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714847; AAU83327.1; -; Genomic_DNA. > DR InterPro; IPR013108; Amidohydro_3. > DR InterPro; IPR011059; Metal-dep_hydro_comp. > DR Pfam; PF07969; Amidohydro_3; 1. > SQ SEQUENCE 215 AA; 24158 MW; 23A47E5E6B10C395 CRC64; > MYSGKSPVNT IQWAIGQELA LLVKDPWKVA LTTDHPNAGP FIGYPIIISM LMSKKRRDES > AEDMHKVIFD RAIYPSIDRE YDLYEIAIIT RGMAAKALGL HEHGKGHLGV GADGDVAIYD > MSPEERDAGK IQKAFLNTKY TIKGGQIVVK DGEVVATPDG KTYFVTPECD DTLTDEMMVK > LKDKFDHYYS VTYNNYAVQD AYVPNPYEIK APWRS > // > > ID Q64AT2_9ARCH PRELIMINARY; PRT; 225 AA. > AC Q64AT2; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 6. > DE Hypothetical protein. > GN ORFNames=GZ29E12_7; > OS uncultured archaeon GZfos29E12. > OC Archaea; environmental samples. > OX NCBI_TaxID=285380; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714851; AAU83495.1; -; Genomic_DNA. > DR InterPro; IPR013216; Methyltransf_11. > DR Pfam; PF08241; Methyltransf_11; 1. > KW Hypothetical protein. > SQ SEQUENCE 225 AA; 25282 MW; A3479529FFAFADA0 CRC64; > MGKIPIFEDG AEEYDKWFDE NPFAYKSEVL ALGKFIPKNS KGLEVGVGTG RFAAPMGIQM > GVEPARAMAD IARKRGIEVY EAKAEELPFD NESFDFILMA TTICFLQNPM LALQESTRVI > KSGGYIIIGM IDSDSFLGKA YESKKKDSKF YRYAHFYSVN QVLKWVRKLG YCHIKICQTI > FKNPGKITAT EPVKDGSGRG GFVVISAQKD VRLDFHPDHV RRSLS > // > > ID Q64AN1_9ARCH PRELIMINARY; PRT; 568 AA. > AC Q64AN1; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 11. > DE Adenine deaminase. > GN Name=adeC; ORFNames=GZ30H9_33; > OS uncultured archaeon GZfos30H9. > OC Archaea; environmental samples. > OX NCBI_TaxID=285372; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714852; AAU83546.1; -; Genomic_DNA. 
> DR GO; GO:0000034; F:adenine deaminase activity; IEA. > DR GO; GO:0016787; F:hydrolase activity; IEA. > DR GO; GO:0006146; P:adenine catabolism; IEA. > DR InterPro; IPR006679; Adenine_deam. > DR InterPro; IPR006680; Amidohydro_1. > DR InterPro; IPR011550; Amidohydro_like. > DR InterPro; IPR011059; Metal-dep_hydro_comp. > DR Pfam; PF01979; Amidohydro_1; 1. > DR ProDom; PD001248; Amidohydro_like; 1. > DR TIGRFAMs; TIGR01178; ade; 1. > SQ SEQUENCE 568 AA; 62160 MW; 182260FF0D5B4F01 CRC64; > MTKMNDLVDI ALGNKKADLV LKNVNLVNVC SGEIYETDIA IAHDVIAGLG RYEGRLEIDA > QDNYAVPGLI DGHTHIEMSM LSVREFAKAV VPRGTTAVVA DPHEIANVLG IAGIRALLDD > ARTTALKVFC MAPSCVPSTD PSLGLETSGA MIDHTEIRKL LRMDGIIGLA EVMNYVGVIA > KEEEVWEKIE IAKALKMPID GHAPLLSGKE LNAYVVSGAG SDHENTTYEE SEEKLRLGMR > VMVREGSVAK NLKNIAPLLR TVDTRNCMLV TDGDRTPLDL KDEGYLDYVL RRAIEEGIDP > VKAVQMCTIN PAQWFKLDAE IGCIAPGKIA DIVLLKKLDT FEVEKVIVNG KPDFAQRSEY > SFNYPHYRES VRIMRVKPDE FVIEQEGAIK ARIIGLIEGE LQTEELVEEI SGIEIARDIL > EICVLERHHY SGNMGLGFVK GFGLKTGAIA STIAHDSHNI VVIGTNEEDM AVACNRLKDI > GGGIVLCNGE EVTSELKLPI AGLMSDNGLD YVMRKQKEMD DNISEMGCKL PAPFIAISFL > ALPVIPKLKI TDKGLVDVAK REIVGVFL > // > > ID Q649Z7_9ARCH PRELIMINARY; PRT; 449 AA. > AC Q649Z7; > DT 25-OCT-2004, integrated into UniProtKB/TrEMBL. > DT 25-OCT-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl coenzyme M reductase subunit beta. > GN Name=mcrB; ORFNames=GZ33H6_29; > OS uncultured archaeon GZfos33H6. > OC Archaea; environmental samples. > OX NCBI_TaxID=285376; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15353801; DOI=10.1126/science.1100025; > RA Hallam S.J., Putnam N., Preston C.M., Detter J.C., Rokhsar D., > RA Richardson P.M., DeLong E.F.; > RT "Reverse methanogenesis: testing the hypothesis with environmental > RT genomics."; > RL Science 305:1457-1462(2004). > RN [2] > RP NUCLEOTIDE SEQUENCE. > RA Putnam N., Detter J.C., Richardson P.M., Rokhsar D.; > RL Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases. 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY714858; AAU83780.1; -; Genomic_DNA. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR003179; MCR_beta. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02241; MCR_beta; 1. > DR Pfam; PF02783; MCR_beta_N; 1. > DR PIRSF; PIRSF000263; Meth_CoM_rd_beta; 1. > SQ SEQUENCE 449 AA; 48001 MW; 88711BD50CEBBFB9 CRC64; > MYSMMCRVSE ISKEEKKNMA DEIDLYDDRG SVLAKGVPLQ AISPLKNAAM RKIVNLTIRT > GAVDLAGLEK KLKTGAIAGR GMVIRGVAKD LPILDKASTI AGEVEEMLRI ESGDDTEVTL > MSGEKRMLIQ VPTARILADY SVGLTAAMGA LTHAIIDVCD VSMWDAPYVH AAVWGMYPQN > PDPADGAVKM LVDIPMKNEG PGFTLRNIPV NHLAATVRKR AIQGAALTMI LEEAAQFEMG > NCMGPHERGH LLDLAYEGLN ANNLLYNLVK DNGKGTLADV VYGLVEKAKA DGVIKPKKKM > GSGYVVYEAD DPQLWNAYAS AGLLAAVCVN CGAMRAGQSV PGCIMYYNVL LEHETGMPGV > DYGMTQGASV SSSFFSHSIY GGGGPGVFYG NHIVTRHPKG QFIPCFCASM CLDADTMYFS > PARTSALYGE VLGAIPEFAE PMKAVAGAV > // > > ID Q5W1H7_9EURY PRELIMINARY; PRT; 156 AA. > AC Q5W1H7; > DT 07-DEC-2004, integrated into UniProtKB/TrEMBL. > DT 07-DEC-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl-coenzyme M reductase, subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Juottonen H., Galand P.E., Tuittila E.S., Laine J., Fritze H., > RA Yrjala K.; > RT "Methanogen communities and Bacteria along an ecohydrological gradient > RT in a northern raised bog complex."; > RL Environ. Microbiol. 7:1547-1557(2005). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ853818; CAH68737.1; -; Genomic_DNA. > DR SMR; Q5W1H7; 1-156. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 16826 MW; 933E29E6D0D3479E CRC64; > GGVGFTQYAT AAYTNDVLDD FCYYTVDYAN DKYGGFAKAP ATLEVAKDIA TESTLYSAEQ > YESFPTLLED HFGGSQRASV MAAASGIGAA IASGHSQIGL AGWYLSMLLH KEAWGRLGFF > GYDLQDQCGP TNVFSYQSDE GNPTELRGAN YPNYAM > // > > ID Q5W1G6_9EURY PRELIMINARY; PRT; 155 AA. > AC Q5W1G6; > DT 07-DEC-2004, integrated into UniProtKB/TrEMBL. > DT 07-DEC-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl-coenzyme M reductase, subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Juottonen H., Galand P.E., Tuittila E.S., Laine J., Fritze H., > RA Yrjala K.; > RT "Methanogen communities and Bacteria along an ecohydrological gradient > RT in a northern raised bog complex."; > RL Environ. Microbiol. 7:1547-1557(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ853829; CAH68748.1; -; Genomic_DNA. > DR SMR; Q5W1G6; 1-155. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. 
> FT NON_TER 1 1 > FT NON_TER 155 155 > SQ SEQUENCE 155 AA; 16687 MW; BFF954EC7783321D CRC64; > GGVGFTQYAT AAYTDNILDD YSYYGNDYAK KYGADGKAPS TMDVVNDLGT EVTLYGIEQY > EKYPTTLEDH FGGSQRATVL AAASGVTAAI ATGNSNAGLS AWYLSMLLHK DAWGRLGFYG > YDLQDQCGST NTFSVRSDEG APDELRGANY PNYAM > // > > ID Q5W1G5_9EURY PRELIMINARY; PRT; 156 AA. > AC Q5W1G5; > DT 07-DEC-2004, integrated into UniProtKB/TrEMBL. > DT 07-DEC-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl-coenzyme M reductase, subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Juottonen H., Galand P.E., Tuittila E.S., Laine J., Fritze H., > RA Yrjala K.; > RT "Methanogen communities and Bacteria along an ecohydrological gradient > RT in a northern raised bog complex."; > RL Environ. Microbiol. 7:1547-1557(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ853830; CAH68749.1; -; Genomic_DNA. > DR SMR; Q5W1G5; 1-156. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 16861 MW; 01CD98138E827CCC CRC64; > GGVGFTQYAT AAYTDSILDD YCYYGLDYIK AKHGGLGKAK KTQEVLNDIA TEVTLYGMEQ > YEQYPTTLES HFGGSQRASV LAAAAGISCA PATANSNAGL NGWYMSMLAH KEGWSRLGFF > GYDLQDQCGS TNSMSIRPDE GCIGELRGPN YPNYAM > // > > ID Q5W1G4_9EURY PRELIMINARY; PRT; 156 AA. > AC Q5W1G4; > DT 07-DEC-2004, integrated into UniProtKB/TrEMBL. > DT 07-DEC-2004, sequence version 1. > DT 30-MAY-2006, entry version 8. > DE Methyl-coenzyme M reductase, subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. 
> OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Juottonen H., Galand P.E., Tuittila E.S., Laine J., Fritze H., > RA Yrjala K.; > RT "Methanogen communities and Bacteria along an ecohydrological gradient > RT in a northern raised bog complex."; > RL Environ. Microbiol. 7:1547-1557(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ853831; CAH68750.1; -; Genomic_DNA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 156 156 > SQ SEQUENCE 156 AA; 16855 MW; 7B44DA99BF5C1A14 CRC64; > GGVGFTQYAT ATYTDNILDD MCYNIADYVK KKYGGFTKAK VSMDTILDVA TEATEYGLDQ > YDMYPTLAET HFGGSQRTSV VSASAGVGVS LATGRAQAGV NGWYLSEILH KEIAGRLGFY > GYDAQDQMGA ANSFAFRGDE GLPMELRGPN YPNYAM > // > > ID Q5TIW9_9EURY PRELIMINARY; PRT; 155 AA. > AC Q5TIW9; > DT 21-DEC-2004, integrated into UniProtKB/TrEMBL. > DT 21-DEC-2004, sequence version 1. > DT 30-MAY-2006, entry version 7. > DE Methyl-coenzyme M reductase, subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RA Juottonen H., Galand P.E., Tuittila E.S., Laine J., Fritze H., > RA Yrjala K.; > RT "Methanogen communities and Bacteria along an ecohydrological gradient > RT in a northern raised bog complex."; > RL Environ. Microbiol. 7:1547-1557(2005). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ862825; CAH89321.1; -; Genomic_DNA. > DR SMR; Q5TIW9; 1-155. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 155 155 > SQ SEQUENCE 155 AA; 16685 MW; A4EB54EB3E1D9469 CRC64; > GGVGFTQYAT AAYTDNILDD YAYYGNDYAK KYGADGKAPA TMDVVNDLGT EVTLYGIEQY > EKYPTTLEDH FGGSQRATVL AAASGVTTAI ATGNSNAGLS AWYLSMLLHK DAWGRLGFYG > YDLQDQCGST NTFSVRSDEG APDELRGANY PNYAM > // > > ID Q5NVZ6_9ARCH PRELIMINARY; PRT; 400 AA. > AC Q5NVZ6; > DT 04-JAN-2005, integrated into UniProtKB/TrEMBL. > DT 04-JAN-2005, sequence version 1. > DT 30-MAY-2006, entry version 9. > DE Eukaryotic type DNA primase, small subunit (EC 2.7.7.-). > GN Name=priA; ORFNames=orf10; > OS uncultured archaeon. > OC Archaea; environmental samples. > OX NCBI_TaxID=115547; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=16329940; DOI=10.1016/j.femsec.2004.12.004; > RA Erkel C., Kemnitz D., Kube M., Ricke P., Chin K.-J., Dedysh S., > RA Reinhardt R., Conrad R., Liesack W.; > RT "Retrieval of first genome data for rice cluster I methanogens by a > RT combination of cultivation and molecular techniques."; > RL FEMS Microbiol. Ecol. 53:187-204(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; CR626857; CAH04810.1; -; Genomic_DNA. > DR GO; GO:0003896; F:DNA primase activity; IEA. > DR GO; GO:0016779; F:nucleotidyltransferase activity; IEA. 
> DR GO; GO:0016740; F:transferase activity; IEA. > DR GO; GO:0006269; P:DNA replication, synthesis of RNA primer; IEA. > DR InterPro; IPR002755; DNA_primase_S. > DR PANTHER; PTHR10536; DNA_primase_S; 1. > DR Pfam; PF01896; DNA_primase_S; 2. > DR PIRSF; PIRSF005538; DNA_primase_S; 1. > DR TIGRFAMs; TIGR00335; primase_sml; 1. > KW Nucleotidyltransferase; Transferase. > SQ SEQUENCE 400 AA; 45530 MW; 482EDC1DAB000E66 CRC64; > MNDQTRAFLR ERFRDYYSRS KVFVPPGLAQ REWGFIFHEE TVGVAMRRHK AFNSEGELAD > YLRSMPPAHA YHSAAYYRHP QAPTMQEKDW LGADLIFDLD ADHLPGVKNM TYGEMLDNVK > VEIIRLIDEF LIDDLGFREK DMDIVFSGGR GYHVHVRDER VRTLKSPERR EIVDYLLGTG > LEPDRMFIRT NQRIDTGTTS VAGVWLIRGF DSVPGGWDRR VARHIVEKLD QIGRLPDKDA > KEALRAFSLE SKDVKRILHV ARDPASLQKI REKGLIELSG NLEGFFRSIL AGTIDQFKVS > LAGKTDEPVT ADIKRLIRLP GSIHGGSSFR VTPLTRAQLE SFNPLEDAII FSDDPVRVLV > TRPAVVEMKG KIYRVSEGVG RLPENVAMFL MCRGSADYEP > // > > ID Q5IEN3_9ARCH PRELIMINARY; PRT; 240 AA. > AC Q5IEN3; > DT 15-FEB-2005, integrated into UniProtKB/TrEMBL. > DT 15-FEB-2005, sequence version 1. > DT 30-MAY-2006, entry version 7. > DE Methyl coenzyme-M reductase (Fragment). > GN Name=mcrA; > OS uncultured archaeon. > OC Archaea; environmental samples. > OX NCBI_TaxID=115547; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15746419; DOI=10.1126/science.1102556; > RA Kelley D.S., Karson J.A., Fruh-Green G.L., Yoerger D.R., Shank T.M., > RA Butterfield D.A., Hayes J.M., Schrenk M.O., Olson E.J., > RA Proskurowski G., Jakuba M., Bradley A., Larson B., Ludwig K., > RA Glickson D., Buckman K., Bradley A.S., Brazelton W.J., Roe K., > RA Elend M.J., Delacour A., Bernasconi S.M., Lilley M.D., Baross J.A., > RA Summons R.E., Sylva S.P.; > RT "A serpentinite-hosted ecosystem: the Lost City hydrothermal field."; > RL Science 307:1428-1434(2005). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY760635; AAW31860.1; -; Genomic_DNA. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 240 240 > SQ SEQUENCE 240 AA; 25826 MW; A0C50445B5347CC7 CRC64; > QIGMSFINAY KMCAGESATR EFAFMAKHAS VVQMANYMPV RRARATSELG GLPLGITCDM > TRSPALFPND PVRAALESIA DGALLLDQLW FGTYMSGGVG FTQYASATYT DNILEDFCYK > AEEIAIDMFG DHCAAEPTME NIEKLVRAES EYCLTQYEAY PTVAESHFGG SVRAACQSAG > AAVAVACATG SADAALNGWA LAQLLHYASV GRLGFYGYDL QDQCTSSTSF SYRSDEGLPF > // > > ID Q5IEN2_9ARCH PRELIMINARY; PRT; 242 AA. > AC Q5IEN2; > DT 15-FEB-2005, integrated into UniProtKB/TrEMBL. > DT 15-FEB-2005, sequence version 1. > DT 30-MAY-2006, entry version 7. > DE Methyl coenzyme-M reductase (Fragment). > GN Name=mcrA; > OS uncultured archaeon. > OC Archaea; environmental samples. > OX NCBI_TaxID=115547; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=15746419; DOI=10.1126/science.1102556; > RA Kelley D.S., Karson J.A., Fruh-Green G.L., Yoerger D.R., Shank T.M., > RA Butterfield D.A., Hayes J.M., Schrenk M.O., Olson E.J., > RA Proskurowski G., Jakuba M., Bradley A., Larson B., Ludwig K., > RA Glickson D., Buckman K., Bradley A.S., Brazelton W.J., Roe K., > RA Elend M.J., Delacour A., Bernasconi S.M., Lilley M.D., Baross J.A., > RA Summons R.E., Sylva S.P.; > RT "A serpentinite-hosted ecosystem: the Lost City hydrothermal field."; > RL Science 307:1428-1434(2005). 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY760636; AAW31861.1; -; Genomic_DNA. > DR SMR; Q5IEN2; 1-234. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 242 242 > SQ SEQUENCE 242 AA; 26091 MW; 87DF2FF18E27E82D CRC64; > SYSMCAGEAA VADLSYAAKH AGVIQMGDML PARRARSPNE PGGVSFGHLA DIIQTSRVKS > DDPTKVALEV IGAGCMLYDQ IWLGSYMSGR VGFTQYATAA YTDNILDDNM YYMVDYINEK > YNGAADKGVD NKVEATLDVV KDIATESTLY GLENYELYPT TLESHFGGSQ RATVLSAAAG > CSTSLATGNG NAGLSGWYLS MYLHKEAHGR LGFYGYDLQD QCGAANVFSY QSDEDCLLNF > VV > // > > ID Q5IEM9_9ARCH PRELIMINARY; PRT; 242 AA. > AC Q5IEM9; > DT 15-FEB-2005, integrated into UniProtKB/TrEMBL. > DT 15-FEB-2005, sequence version 1. > DT 30-MAY-2006, entry version 7. > DE Methyl coenzyme-M reductase (Fragment). > GN Name=mcrA; > OS uncultured archaeon. > OC Archaea; environmental samples. > OX NCBI_TaxID=115547; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX PubMed=15746419; DOI=10.1126/science.1102556; > RA Kelley D.S., Karson J.A., Fruh-Green G.L., Yoerger D.R., Shank T.M., > RA Butterfield D.A., Hayes J.M., Schrenk M.O., Olson E.J., > RA Proskurowski G., Jakuba M., Bradley A., Larson B., Ludwig K., > RA Glickson D., Buckman K., Bradley A.S., Brazelton W.J., Roe K., > RA Elend M.J., Delacour A., Bernasconi S.M., Lilley M.D., Baross J.A., > RA Summons R.E., Sylva S.P.; > RT "A serpentinite-hosted ecosystem: the Lost City hydrothermal field."; > RL Science 307:1428-1434(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY760639; AAW31864.1; -; Genomic_DNA. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 242 242 > SQ SEQUENCE 242 AA; 25984 MW; 26DB7A0FE1509985 CRC64; > MSFINAYKMC AGESATGEFA FMAKHASVVQ MANYMPVRRA RATNELGGLP LGITCDMTRS > PALFPNDPVR AALESIAVGA LLLDQLWFGT YMSGGVGFTQ YASATYTDNI LEDFCYKAEE > IAIDMFGDHC AAEPTMENIE KLVRAESEYC LTQYEAYPTV AESHFGGSVR AACQSAGAAV > AVACATGSAD AALNGWALAQ LLHYASVGRL GFYGYDLQDQ CTSSTSFSYR SDEGLPFEMR > GA > // > > ID Q5EGK4_9EURY PRELIMINARY; PRT; 256 AA. > AC Q5EGK4; > DT 15-MAR-2005, integrated into UniProtKB/TrEMBL. > DT 15-MAR-2005, sequence version 1. > DT 30-MAY-2006, entry version 6. > DE McrA (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RA Kormas K.A., Meziti A., Dahlman A., de Lange G., Lykousis V.; > RT "Molecular fingerprinting of methanogens in eastern Mediterranean Sea > RT mud volcanoes."; > RL Geochim. Cosmochim. Acta 69:A77-A77(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY883171; AAW80300.1; -; Genomic_DNA. > DR SMR; Q5EGK4; 1-256. > DR GO; GO:0016782; F:transferase activity, transferring sulfur-c...; IEA. > DR GO; GO:0015948; P:methanogenesis; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR InterPro; IPR003183; MCR_alpha_N. > DR InterPro; IPR009024; MCR_fer_like. > DR Pfam; PF02249; MCR_alpha; 1. > DR Pfam; PF02745; MCR_alpha_N; 1. > FT NON_TER 1 1 > FT NON_TER 256 256 > SQ SEQUENCE 256 AA; 27697 MW; 37D4C4F5C8AE04C2 CRC64; > AMQIGMSFIS SYHMCAGEAA VADLAFTAKH AGLVEMSEML PARRARGPNE PGGLSFGHMC > DIVQTSRKFR DDPCKIALET CAAAMMLYDQ IWLGGYMSGG VGFTQYATSA YTNNTVDDNL > YADTEYGWDT FGTSIGDCKA PSIDIIRDMG TWGALYGLEL YENYPTALED HFGGSQRATV > ISTATGAACA ITTGNSNAGL SAWYLSMYLH KEAHGRLGFF GYDLQDQCGA TNVFSYQSDE > GLLAEMRGAN YPNYAM > // > > ID Q573A6_9ARCH PRELIMINARY; PRT; 201 AA. > AC Q573A6; > DT 10-MAY-2005, integrated into UniProtKB/TrEMBL. > DT 10-MAY-2005, sequence version 1. > DT 30-MAY-2006, entry version 6. > DE Methyl-coenzyme M reductase alpha subunit (EC 2.8.4.1) (Fragment). > GN Name=mcrA; > OS uncultured archaeon. > OC Archaea; environmental samples. > OX NCBI_TaxID=115547; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX PubMed=16034418; DOI=10.1038/nature03796; > RA Webster G., Parkes J., Cragg B.A., Weightman A.J., Newberry C.J., > RA Ferdelman T., Kallmeyer J., Jorgensen J., Aiello B.B., Fry J.C.; > RT "Deep sub-seafloor prokaryotes stimulated at interfaces over > RT geological time."; > RL Nature 436:390-394(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AJ867766; CAI30322.1; -; Genomic_DNA. > DR SMR; Q573A6; 1-197. > DR GO; GO:0050524; F:coenzyme-B sulfoethylthiotransferase activity; IEA. > DR GO; GO:0016740; F:transferase activity; IEA. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > KW Transferase. > FT NON_TER 1 1 > FT NON_TER 201 201 > SQ SEQUENCE 201 AA; 21402 MW; 8D267F2311353444 CRC64; > VSMGEMLPAR RARGPNEPGG LSFGHLSDII QASRVSKDPA KIALEVVGAG CMLYDQIWLG > SYMSGGVGFT QYATAAYTDD ILDNNVYYDV DYINDKYNGA ANVGKDNKVK ASLDVVKDIA > TESTLYGIET YEKFPTALED HFGGSQRATV LAAAAGVACS LATANANAGL SGWYLSMYLH > KEAWGKLGSF GFDLQDQCGA P > // > > ID Q4U340_9ARCH PRELIMINARY; PRT; 613 AA. > AC Q4U340; > DT 05-JUL-2005, integrated into UniProtKB/TrEMBL. > DT 05-JUL-2005, sequence version 1. > DT 30-MAY-2006, entry version 5. > DE Alpha glucosidase (Fragment). > OS archaeon #33-9. > OC Archaea. > OX NCBI_TaxID=328513; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RC STRAIN=#33-9; > RA Ng C.C., Shyu Y.T.; > RT "Thermo-stable protein alpha glucosidase translated from cell-free > RT expression system from thermophile."; > RL Submitted (APR-2005) to the EMBL/GenBank/DDBJ databases. 
> CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; DQ017387; AAY42612.1; -; Genomic_DNA. > DR GO; GO:0004339; F:glucan 1,4-alpha-glucosidase activity; IEA. > DR GO; GO:0005976; P:polysaccharide metabolism; IEA. > DR InterPro; IPR008928; 6hp_glycosidase. > DR InterPro; IPR011613; Glyco_hydro_15_rel. > DR InterPro; IPR012343; Glyco_trans_sub. > DR Pfam; PF00723; Glyco_hydro_15; 1. > FT NON_TER 1 1 > SQ SEQUENCE 613 AA; 71334 MW; 32E033F5BF8C4D8B CRC64; > RMLINFDEKG RIVDIYYPYI GMENQTSGNP IRLAIWDKDK NTISLDEEWE TTVLYLDEAN > MVEIRSDIRK LGLSLLSYNF LDPDDPIYMS IIKVANNENN NRNIKIFFIH DINLYSNPFG > DTAFYDPLSL SIVHYKSKRY LAFKVFTTTS IFSEYNVGKG DLIGDIYDGN LGLNGIENGD > VNSSMGIEAN IDPNSYVNLY YVIVADRNLE DLRQKIRKIN FANVETSFTL TYMFWRNWLK > KNKLFRNSLM QDIKRVYDVS LFVIRNHMDI NGSIIASSDF SFVKIYGDSY QYCWPRDAAI > AAYALDLAGY KELALRHFQF ISNIVNSEGF LYHKYNPNTT LASSWHPWFY KGKRIYPIQE > DETALEVWAI VSHYEKYEDI DEILPLYKKF VKPALKFMMS FMEEGLPKPS FDLWEERYGI > HIYTVSTVYG ALTKGAKLAY DVGDEILSED LSDTSGLLKG MVLKRMTYNG RFIRRIDEEN > NQDLTVDSSL YAPFFFGLVD ANDKIMINTI NEIENKLTVN GGVIRYENDM YQRRKKQPNP > WIITTLWLAE YYATINDKNK ANEYIKWVIN RALPTGFLPE QVDPETFEPT SVTPLVWSHA > EFIIAINKLL NHI > // > > ID Q4QXS4_9EURY PRELIMINARY; PRT; 154 AA. > AC Q4QXS4; > DT 19-JUL-2005, integrated into UniProtKB/TrEMBL. > DT 19-JUL-2005, sequence version 1. > DT 30-MAY-2006, entry version 6. > DE Methyl-coenzyme M reductase subunit A (Fragment). > GN Name=mcrA; > OS uncultured methanogenic archaeon. > OC Archaea; Euryarchaeota; environmental samples. > OX NCBI_TaxID=198240; > RN [1] > RP NUCLEOTIDE SEQUENCE. 
> RX PubMed=16329940; DOI=10.1016/j.femsec.2004.12.004; > RA Erkel C., Kemnitz D., Kube M., Ricke P., Chin K.-J., Dedysh S., > RA Reinhardt R., Conrad R., Liesack W.; > RT "Retrieval of first genome data for rice cluster I methanogens by a > RT combination of cultivation and molecular techniques."; > RL FEMS Microbiol. Ecol. 53:187-204(2005). > CC ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http:// > www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ----------------------------------------------------------------------- > DR EMBL; AY683452; AAV85896.1; -; Genomic_DNA. > DR SMR; Q4QXS4; 1-154. > DR InterPro; IPR008924; MCR_a/b_chain_C. > DR InterPro; IPR009047; MCR_alpha_C. > DR Pfam; PF02249; MCR_alpha; 1. > FT NON_TER 1 1 > FT NON_TER 154 154 > SQ SEQUENCE 154 AA; 16794 MW; E06374541C3AD2FC CRC64; > GGVGFTQYAT AAYTDDILDD FCYYGYDYIK GKYGIAKAKP TMDVVNDIGT EVTLYGIEQY > EKYPTTLEDH FGGSQRATVL SAAAGVTTSL ATGNANAGLS AWYLSMYLHK EAWGRLGFFG > YDLQDQCGAT NVFSCRSDEG AIDELRGPNY PNYA > // > > ID Q49KC1_9ARCH PRELIMINARY; PRT; 176 AA. > AC Q49KC1; > DT 13-SEP-2005, integrated into UniProtKB/TrEMBL. > DT 13-SEP-2005, sequence version 1. > DT 30-MAY-2006, entry version 5. > DE Methyl coenzyme M reductase (Fragment). > GN Name=mcr; > OS uncultured archaeon. > OC Archaea; environmental samples. > OX NCBI_TaxID=115547; > RN [1] > RP NUCLEOTIDE SEQUENCE. > RX PubMed=16085853; DOI=10.1128/AEM.71.8.4592-4601.2005; > RA Dhillon A., Lever M., Lloyd K.G., Albert D.B., Sogin M.L., Teske A.; > RT "Methanogen diversity evidenced by molecular characterization of > RT methyl coenzyme M reductase A (mcrA) genes in hydrothermal sediments > RT of the Guaymas Basin."; > RL Appl. Environ. Microbiol. 71:4592-4601(2005). 
> CC -----------------------------------------------------------------------
> CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
> CC Distributed under the Creative Commons Attribution-NoDerivs License
> CC -----------------------------------------------------------------------
> DR EMBL; AY837766; AAX46041.1; -; Genomic_DNA.
> DR SMR; Q49KC1; 1-176.
> DR InterPro; IPR008924; MCR_a/b_chain_C.
> DR InterPro; IPR009047; MCR_alpha_C.
> DR Pfam; PF02249; MCR_alpha; 1.
> FT NON_TER 1 1
> FT NON_TER 176 176
> SQ SEQUENCE 176 AA; 19653 MW; 736595D964EAC724 CRC64;
> YDQIWLGSYM SGGVGFTQYA TAAYTDNILD DFTYYGMDYL HDKYKIDWKN PNPKDKVKAT
> QEVVNDIAAE VNLYSMEQYE QFPTMMEDHF GGSQRAAVLG AACGLTTSIA TGNSNAGLNG
> WYLSMLMHKD GWSRLGFFGY DLQDQCGSAN SLSIRPDEGC IGEFRGPNYP NYAMNV
> //
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bix at sendu.me.uk Tue Sep 5 11:45:34 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 05 Sep 2006 16:45:34 +0100
Subject: [Bioperl-l] min. perl ver. for next release, was Bioperl tests
In-Reply-To:
References: <000001c6ce10$04deb2a0$15327e82@pyrimidine>
	<44FA3EC3.3040903@infotech.monash.edu.au>
	<9743018B-E89D-475A-808E-37598FC547F4@uiuc.edu>
	<44FA9632.6070601@sendu.me.uk>
	<0CD17008-66A9-408A-AAA0-8CD90A280617@uiuc.edu>
Message-ID: <44FD9B9E.4050206@sendu.me.uk>

Hilmar Lapp wrote:
> On Sep 3, 2006, at 10:31 AM, Chris Fields wrote:
>
>> perl v. 5.8 has been out since late 2002, so I think it's feasible.
>
> Just a note - it really doesn't matter at all how long these versions
> have been available. What you need to know from people is how many
> are still on versions prior to the one you want to make a requirement.
>
> For those who are it is unlikely that they are because they enjoy
> running an ages old version of perl. More likely, the sysadmin is
> unwilling or cannot upgrade perl, e.g. due to side effects.
> Bioperl requiring a higher version will not change this.

But there may also be lots of people who just never had any reason to
upgrade.

> By requiring a higher version you are dropping support for these
> people

Even if no module in Bioperl required the latest Perl to function, to
what extent do we support people on older Perls anyway?

How can we claim the latest version of Bioperl is compatible with 5.005
(or whatever), when few (none?) of the developers test the code fully on
earlier versions of Perl? How do we properly support something we don't
use?

Anyway, support for older versions of Perl shouldn't be dropped 'just
because', but perhaps only when something critical can no longer be done
at all under old Perls.

For those that dislike all the backward-compatibility cruft necessary,
one would suppose that the version of Bioperl released for Perl 6 would
be a fresh, clean start.

From lincoln.stein at gmail.com Tue Sep 5 13:43:13 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 5 Sep 2006 13:43:13 -0400
Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More?
In-Reply-To: <002201c6d0fa$afc73fb0$15327e82@pyrimidine>
References: <44FD4D8D.3000506@biomax.com>
	<002201c6d0fa$afc73fb0$15327e82@pyrimidine>
Message-ID: <6dce9a0b0609051043u14ccfc0dta0f3bff25d38efd2@mail.gmail.com>

I think that we need to require at least 5.6.1. I don't want to test
against earlier versions.

Lincoln

On 9/5/06, Chris Fields wrote:
> > > Torsten Seemann wrote:
> > > >> Frankly, I think we should change that to v 5.8; it's been out for
> > > >> over three years now. perl 5.10 isn't too far off (let alone Perl6)
> > > >> and perl 5.005, according to CPAN, is 8 years old.
> >
> > I remember that a while back the same kind of discussion took place on
> > the Parrot mailing list, http://www.parrotcode.org.
> > The result was to require 5.6.1 or later, as perl 5.6.0 wasn't very
> > stable.
> >
> > With 5.6.1 required, some cleanup could be done, e.g.
> > - Replace 'use vars qw( $dummy );' with 'our $dummy;'
> > - Get rid of IO::Scalar.
> >
> >
> > Just my $0.02,
> >
> > Bernhard Schmalhofer
>
> I remember something about problems with v5.6. So I agree here: at the very
> least, we should be using 5.6.1. We probably should think about syntax
> updates as
>
> My point with requiring v. 5.8 is we can take advantage of several features
> present in v. 5.8 and not present in v. 5.6; Test::More was only one of
> them. Torsten pointed out several features that were major changes between
> 5.6 and 5.8. However, Hilmar also has a strong point about leaving those
> perl 5.6 users behind.
>
> Hilmar, do you have any suggestions on how we would poll users for their
> perl versions? I suppose we could do something like that here if needed.
>
> Anyway, until we know more we could stick with requiring v. 5.6.1 and
> strongly recommending v. 5.8, and move to a 5.8 requirement later (maybe for
> bioperl v. 1.6). As for Test::More, we could always include it as a
> requirement along with v. 5.6.1 if needed, or include it in Bundle::Bioperl.
> Or (most extreme) just include it in the distribution like we currently do
> with Test (not my favorite option, just more bioperl-core bloat).
>
> Chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Tue Sep 5 14:31:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Sep 2006 13:31:10 -0500
Subject: [Bioperl-l] min. perl ver. for next release, was Bioperl tests
In-Reply-To: <44FD9B9E.4050206@sendu.me.uk>
Message-ID: <003b01c6d119$75f75990$15327e82@pyrimidine>

...
> But there may also be lots of people who just never had any reason to
> upgrade.

There are plenty of reasons to upgrade; just check Torsten's post for
v. 5.6 and v. 5.8. And, like I mentioned before, v. 5.10 is not far off
either.

The wiki states that v5.6 is required; based on Bernhard's and Lincoln's
comments I think we should stick with a minimum of v. 5.6.1 for now in the
next release.

> Even if no module in Bioperl required the latest Perl to function, to
> what extent do we support people on older Perls anyway?
>
> How can we claim the latest version of Bioperl is compatible with 5.005
> (or whatever), when few (none?) of the developers test the code fully on
> earlier versions of Perl? How do we properly support something we don't
> use?
>
> Anyway, support for older version of Perl shouldn't be dropped 'just
> because', but perhaps only when something critical can no longer be done
> at all under old Perls.

I don't think we would drop support 'just because.' If anything, we want
the ability to use the many advantages that v. 5.8 offers. As for extra
modules (like my Test::More) we could add those to Bundle::BioPerl as a
prereq. Test::More supposedly works for 5.0045 and up.

Requiring that we continually support a version of perl that is eight years
old and is no longer actively maintained (v5.005, which is what the INSTALL
docs state) doesn't make much sense. In my opinion, we effectively hobble
ourselves by not allowing the developers to utilize advantages in newer
versions of perl.

> For those that dislike all the backward-compatibility cruft necessary,
> one would suppose that the version of Bioperl released for Perl 6 would
> be a fresh, clean start.

Yes, but then, using the same logic, we would run into the same problems
down the road with Perl6.8. At some point we have to address how long we
can support older versions of perl. Three years past the perl release? Ten?
'Indefinitely' is not really the best answer.
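
The minimum-version idea discussed above (require v5.6.1, recommend v5.8)
could be checked in an install script roughly as follows. This is only a
sketch under assumptions: the function name check_perl_version and the two
threshold variables are hypothetical illustrations, not existing Bioperl
code, and the thresholds simply restate the proposal in this thread.

```perl
use strict;

# Hypothetical thresholds taken from the proposal in this thread;
# they are not existing Bioperl settings.
my $REQUIRED    = 5.006001;   # v5.6.1, proposed hard minimum
my $RECOMMENDED = 5.008;      # v5.8.0, proposed recommendation

# Classify a perl version given in the numeric form used by $]
# (for example 5.00503 for 5.005_03, 5.008008 for 5.8.8).
sub check_perl_version {
    my ($version) = @_;
    return 'unsupported' if $version < $REQUIRED;
    return 'supported'   if $version < $RECOMMENDED;
    return 'recommended';
}

# Report the status of the perl that is running this script.
print "perl $] is ", check_perl_version($]), "\n";
```

The sketch avoids 'use warnings' and 'our' so that the check itself still
compiles on the older perls it is meant to detect.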

We could use a rough 'support window', where we could stop actively
supporting perl versions five years older than the bioperl version, using
v5.6.1 here as an example. We could also recommend (push?) v5.8. Here, by
'active support' I mean if certain modules don't work b/c the perl version
is too old (5.005 or less), we will not modify code to support it. At the
very least, it will not be our top priority.

Chris

From cjfields at uiuc.edu Tue Sep 5 16:04:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Sep 2006 15:04:25 -0500
Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More?
In-Reply-To: <6dce9a0b0609051043u14ccfc0dta0f3bff25d38efd2@mail.gmail.com>
Message-ID: <000001c6d126$802bbbb0$15327e82@pyrimidine>

> I think that we need to require at least 5.6.1. I don't want to test
> against earlier versions.
>
> Lincoln

I agree. Perl 5.005 is pretty old.

As for using Test::More, we could add Test::Simple to the bioperl /t
directory instead of Test and use a similar eval{} test to check for a
local installation of Test::More, using the one prepackaged with Bioperl as
a fallback. Alternatively, we can add Test::Simple prereqs to Makefile.PL
and maybe get Chris D. to add it to Bundle::BioPerl. Test::Simple should
work for older versions of perl, even 5.0045.

Chris

From cjfields at uiuc.edu Tue Sep 5 16:36:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Sep 2006 15:36:28 -0500
Subject: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More?
In-Reply-To: <6C9B421D-ED96-4458-8C2D-399614CB80B7@gmx.net>
Message-ID: <000501c6d12a$f70661a0$15327e82@pyrimidine>

Hilmar,

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, September 04, 2006 10:47 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org; Torsten Seemann
> Subject: Re: [Bioperl-l] Bioperl tests: Test, Test::Simple, Test::More?
>
> A year or so ago some key servers at the Sanger were still on 5.005.
>
> I believe due to some code constructs (regex or some such?) in a
> couple of modules we are now effectively requiring 5.6.x. Mac OSX
> 10.2 will have 5.6.0, so requiring a higher version means you do not
> support Jaguar anymore.
>
> -hilmar

You can install a newer (v 5.8) version of perl on Jaguar, if needed.
Apple has some directions on how you could do this:

http://developer.apple.com/internet/opensource/perl.html

Anyway, we're getting off the subject (tests). I have tried adding
Test::Simple's lib directory contents (Test/) to t/ in my local
bioperl-live checkout and ran tests with perl 5.6 (which doesn't have
Test::Simple installed) and EUtilities.t, which requires Test::More in
Test::Simple. It seems to work fine. I just ran a simple eval{} similar to
what we use for Test.pm currently in all the tests.

The Test::Simple directory is ~125 KB, so isn't too big compared to some of
the test data we've added over the years. Do you think it's worth adding
Test::Simple to CVS? It would give us a bit more flexibility here and we
wouldn't have to worry about the perl versioning issues.

Chris

From daniel.lang at biologie.uni-freiburg.de Wed Sep 6 05:11:42 2006
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed, 06 Sep 2006 11:11:42 +0200
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To:
References:
Message-ID: <44FE90CE.6000000@biologie.uni-freiburg.de>

Hi Brian,

I'm iterating now over all uniprot_trembl sequences and record for which
retrieval fails - Let's see if STANDARDs also fail...

How is the second field of the swissprot ID line handled anyway? Because
PRELIMINARYs end up as STANDARD when being parsed by Bio::SeqIO::swiss.

On the other side I'm still confused why there's no error or warning
when the retrieval fails. Can you give me a hint which modules (besides
swiss.pm) to look at?

Cheers,
Daniel

Brian Osborne wrote:
> Daniel,
>
> Well, if you can isolate the bug please add it to bugzilla.
>
> Brian O.
>
>
> On 9/5/06 5:57 AM, "Daniel Lang"
> wrote:
>
>> Hi Brian,
>>
>> sorry for the belated response!
>> I've compiled you a set of 100 PRELIMINARY entries from the latest
>> uniprot_trembl release. I've tried to reproduce the bug using only these
>> as input to build an index, but (sadly) all of them can be retrieved
>> using the latest checkout:-(
>> Maybe its not connected to these entries after all, but the size or some
>> other feature of the uniprot distribution?
>> I now could make it work using the 1.5.1 release.
>>
>> Originally, I've built the index using flat protocol, when I try bdb and
>> bioperl-live even more problems occur:
>>
>> bp_bioflat_index.pl --dbname sw -i bdb -f swiss -l . -c uniprot_sprot.dat
>>
>> ------------- EXCEPTION -------------
>> MSG: The lineage 'Eukaryota, Metazoa, Chordata, Craniata, Vertebrata,
>> Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, Pipoidea,
>> Pipidae, Xenopodinae, Xenopus, Silurana, Xenopus, tropicalis' had two
>> non-consecutive nodes with the same name. Can't cope!
>> STACK Bio::DB::Taxonomy::list::add_lineage
>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:163
>> STACK Bio::DB::Taxonomy::list::new
>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:100
>> STACK Bio::DB::Taxonomy::new
>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy.pm:106
>> STACK Bio::Species::classification
>> /home/lang/bioperl/bioperl-live/Bio/Species.pm:171
>> STACK Bio::SeqIO::swiss::_read_swissprot_Species
>> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:1049
>> STACK Bio::SeqIO::swiss::next_seq
>> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:240
>> STACK Bio::DB::Flat::parse_one_record
>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat.pm:333
>> STACK Bio::DB::Flat::BDB::_index_file
>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:235
>> STACK Bio::DB::Flat::BDB::build_index
>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:218
>> STACK toplevel
>> /share/apps/bioperl/bioperl-live/scripts_temp/bp_bioflat_index.pl:113
>>
>> But I think this is connected to the new changes to taxonomy handling in
>> Bio::Taxon...
>> I'm unsure wether to submit this separately, but I could also provide an
>> example of such a swissprot entry that causes this error.
>>
>> Thanks, again.
>>
>> Daniel
>>
>> Brian Osborne wrote:
>>> Daniel,
>>>
>>> Bug, presumably in SeqIO/swiss.pm. Can you send me a small file with such a
>>> PRELIMINARY entry?
>>>
>>> Brian O.
>>>
>>>
>>> On 9/1/06 6:11 AM, "Daniel Lang"
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> when using Bio::Registry (bioperl-live) to fetch uniprot entries from
>>>> local indexed uniprot *.dats, I had to realize that several entries
>>>> could not be retrieved despite the fact that they are present in the
>>>> files! A closer look reveals that they are of status PRELIMINARY:
>>>>
>>>> uniprot_trembl.dat:ID Q16EZ1_AEDAE PRELIMINARY; PRT; 222 AA.
>>>>
>>>> I don't "grep" PRELIMINARY anywhere in my cvs checkout..
>>>> I also can't retrieve the sequences from the online database defined as
>>>> follows:
>>>> [swissprot_ebi]
>>>> protocol=biofetch
>>>> location=http://www.ebi.ac.uk/cgi-bin/dbfetch
>>>> dbname=swall
>>>>
>>>> Is this a bug or a feature? If its a feature, how can I bypass it?
>>>>
>>>> Thanks in advance,
>>>> Daniel
>>>
>>
>>

From cjfields at uiuc.edu Wed Sep 6 08:31:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Sep 2006 07:31:01 -0500
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To: <44FE90CE.6000000@biologie.uni-freiburg.de>
References: <44FE90CE.6000000@biologie.uni-freiburg.de>
Message-ID: <6BA761DC-6A97-4DCD-95AA-73F6177262AB@uiuc.edu>

Daniel,

Could you add a bug report to Bugzilla that describes the problems
(conversion to STANDARD)? Attach an example input file with PRELIMINARY
and an example output file which ends up as STANDARD.

The lack of warning with no retrieval is odd but not unheard of.
Specifically, it may be something wrong with biofetch's way of retrieving
remote data (not Bio::Registry). We can probably add a check for that.

Chris

On Sep 6, 2006, at 4:11 AM, Daniel Lang wrote:

> Hi Brian,
>
> I'm iterating now over all uniprot_trembl sequences and record for which
> retrieval fails - Lets see if STANDARDs also fail...
>
> How is the second field of the swissprot ID line handled anyway? Because
> PRELIMINARYs end up as STANDARD when being parsed by Bio::SeqIO::swiss.
>
> On the other side I'm still confused why there's no error or warning
> when the retrieval fails. Can you give me a hint which modules (besides
> swiss.pm) to look at?
>
> Cheers,
> Daniel
>
> Brian Osborne wrote:
>> Daniel,
>>
>> Well, if you can isolate the bug please add it to bugzilla.
>>
>> Brian O.
>>
>>
>> On 9/5/06 5:57 AM, "Daniel Lang"
> freiburg.de>
>> wrote:
>>
>>> Hi Brian,
>>>
>>> sorry for the belated response!
>>> I've compiled you a set of 100 PRELIMINARY entries from the latest >>> uniprot_trembl release. I've tried to reproduce the bug using >>> only these >>> as input to build an index, but (sadly) all of them can be retrieved >>> using the latest checkout:-( >>> Maybe its not connected to these entries after all, but the size >>> or some >>> other feature of the uniprot distribution? >>> I now could make it work using the 1.5.1 release. >>> >>> Originally, I've built the index using flat protocol, when I try >>> bdb and >>> bioperl-live even more problems occur: >>> >>> bp_bioflat_index.pl --dbname sw -i bdb -f swiss -l . -c >>> uniprot_sprot.dat >>> >>> ------------- EXCEPTION ------------- >>> MSG: The lineage 'Eukaryota, Metazoa, Chordata, Craniata, >>> Vertebrata, >>> Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, Pipoidea, >>> Pipidae, Xenopodinae, Xenopus, Silurana, Xenopus, tropicalis' had >>> two >>> non-consecutive nodes with the same name. Can't cope! >>> STACK Bio::DB::Taxonomy::list::add_lineage >>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:163 >>> STACK Bio::DB::Taxonomy::list::new >>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:100 >>> STACK Bio::DB::Taxonomy::new >>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy.pm:106 >>> STACK Bio::Species::classification >>> /home/lang/bioperl/bioperl-live/Bio/Species.pm:171 >>> STACK Bio::SeqIO::swiss::_read_swissprot_Species >>> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:1049 >>> STACK Bio::SeqIO::swiss::next_seq >>> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:240 >>> STACK Bio::DB::Flat::parse_one_record >>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat.pm:333 >>> STACK Bio::DB::Flat::BDB::_index_file >>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:235 >>> STACK Bio::DB::Flat::BDB::build_index >>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:218 >>> STACK toplevel >>> /share/apps/bioperl/bioperl-live/scripts_temp/bp_bioflat_index.pl: >>> 113 >>> >>> 
But I think this is connected to the new changes to taxonomy >>> handling in >>> Bio::Taxon... >>> I'm unsure wether to submit this separately, but I could also >>> provide an >>> example of such a swissprot entry that causes this error. >>> >>> Thanks, again. >>> >>> Daniel >>> >>> Brian Osborne wrote: >>>> Daniel, >>>> >>>> Bug, presumably in SeqIO/swiss.pm. Can you send me a small file >>>> with such a >>>> PRELIMINARY entry? >>>> >>>> Brian O. >>>> >>>> >>>> On 9/1/06 6:11 AM, "Daniel Lang" >>> freiburg.de> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> when using Bio::Registry (bioperl-live) to fetch uniprot >>>>> entries from >>>>> local indexed uniprot *.dats, I had to realize that several >>>>> entries >>>>> could not be retrieved despite the fact that they are present >>>>> in the >>>>> files! A closer look reveals that they are of status PRELIMINARY: >>>>> >>>>> uniprot_trembl.dat:ID Q16EZ1_AEDAE PRELIMINARY; PRT; >>>>> 222 AA. >>>>> >>>>> I don't "grep" PRELIMINARY anywhere in my cvs checkout.. >>>>> I also can't retrieve the sequences from the online database >>>>> defined as >>>>> follows: >>>>> [swissprot_ebi] >>>>> protocol=biofetch >>>>> location=http://www.ebi.ac.uk/cgi-bin/dbfetch >>>>> dbname=swall >>>>> >>>>> Is this a bug or a feature? If its a feature, how can I bypass it? >>>>> >>>>> Thanks in advance, >>>>> Daniel >>>> >>> >>> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From koski at cenix-bioscience.com Wed Sep 6 08:38:33 2006
From: koski at cenix-bioscience.com (Liisa Koski)
Date: Wed, 6 Sep 2006 14:38:33 +0200
Subject: [Bioperl-l] Entrezgene.pm $uncaptured data
Message-ID: <200609061438.33155.koski@cenix-bioscience.com>

Hi,
I'm curious about the $uncaptured data returned from the entrezgene parser. I can't seem to find any info on it. The $gene and $genestructure data is straightforward (see below) but can someone please tell me how the $uncaptured data is structured? I'm trying to pull out Conserved Domain information and I'm pretty sure it's contained in the $uncaptured data.

Thanks,
Liisa

Below is taken from the POD description:

    my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                                -file   => $file,
                                -debug  => 'on' );
    my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;

The $genestructure is a Bio::Cluster::SequenceFamily object. It contains all Refseqs and the genomic contigs that are associated with the particular gene.

....what object does $uncaptured refer to? If any?

From osborne1 at optonline.net Wed Sep 6 09:49:41 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 06 Sep 2006 09:49:41 -0400
Subject: [Bioperl-l] Entrezgene.pm $uncaptured data
In-Reply-To: <200609061438.33155.koski@cenix-bioscience.com>
Message-ID:

Liisa,

$uncaptured is a reference to a plain array, not an object.

Brian O.

From osborne1 at optonline.net Wed Sep 6 10:24:17 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 06 Sep 2006 10:24:17 -0400
Subject: [Bioperl-l] Entrezgene.pm $uncaptured data
In-Reply-To: <200609061438.33155.koski@cenix-bioscience.com>
Message-ID:

Liisa,

It looks to me like the CDD information is in the SequenceFamily object. Did you try and look at the Annotations associated with the sequences in the SequenceFamily?

Brian O.

On 9/6/06 8:38 AM, "Liisa Koski" wrote:

> $uncaptured data is structured? I'm trying to pull out Conserved Domain
> information

From cjfields at uiuc.edu Wed Sep 6 10:59:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Sep 2006 09:59:01 -0500
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To: <44FE90CE.6000000@biologie.uni-freiburg.de>
Message-ID: <001e01c6d1c5$0128ee60$15327e82@pyrimidine>

Brian,

I have found the issue with Bio::SeqIO::swiss; apparently UniProt has switched to using the following ID line format:

ID ENTRY_NAME DATA_CLASS; MOLECULE_TYPE; SEQUENCE_LENGTH.

For SwissProt IDs:

ID CYC_BOVIN STANDARD; PRT; 104 AA.
ID GIA2_GIALA STANDARD; PRT; 296 AA.

For TrEMBL (preliminary protein):

ID Q5XPV6 PRELIMINARY; PRT; 231 AA.

SeqIO 'swiss' sequence output currently uses the first (SwissProt) version; it's hardcoded in a sprintf() statement.
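To illustrate the ID line issue described above, here is a minimal sketch in plain Perl. It is not the actual Bio::SeqIO::swiss code: the helper names parse_id_line and format_id_line are hypothetical, the regular expression assumes the single-space layout of the example lines quoted in this message, and the column widths in the sprintf() format are illustrative only. The point is simply that the data class is read from the record and passed through, rather than being fixed to 'STANDARD' in the format string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split an ID line of the form
# "ID ENTRY_NAME DATA_CLASS; MOLECULE_TYPE; SEQUENCE_LENGTH AA." into fields.
# Returns a hash reference, or nothing if the line does not match.
sub parse_id_line {
    my ($line) = @_;
    my ($name, $class, $mol, $len) =
        $line =~ /^ID\s+(\S+)\s+(\w+);\s+(\w+);\s+(\d+)\s+AA\.\s*$/
        or return;
    return { name => $name, class => $class, mol => $mol, length => $len };
}

# Hypothetical writer: the data class comes from the parsed record instead
# of being hardcoded. The column widths are an assumption for illustration.
sub format_id_line {
    my ($rec) = @_;
    return sprintf 'ID   %-12s %s; %s; %d AA.',
        @{$rec}{qw(name class mol length)};
}

# Round-trip the two kinds of ID line mentioned in the message.
for my $line ('ID CYC_BOVIN STANDARD; PRT; 104 AA.',
              'ID Q5XPV6 PRELIMINARY; PRT; 231 AA.') {
    my $rec = parse_id_line($line) or next;
    print format_id_line($rec), "\n";
}
```

With something along these lines, a PRELIMINARY entry would keep its data class on output instead of being rewritten as STANDARD.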
I guess TrEMBL didn't have a designation before, so this complicates things a little.

There are a few other (small) formatting differences I have also found which we could update fairly easily.

In the section of the release notes describing differences between SwissProt/EMBL format, this is listed:

* EMBL entry ID lines have an additional three-letter taxonomic division 'token' inserted between the data class and the molecule type;

I suppose we could use division() to store 'STANDARD' and 'PRELIMINARY' (or 'Swiss-Prot' and 'TrEMBL' if that's nicer).

Chris

From osborne1 at optonline.net Wed Sep 6 12:41:50 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 06 Sep 2006 12:41:50 -0400
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To: <001e01c6d1c5$0128ee60$15327e82@pyrimidine>
Message-ID:

Chris,

Yes, I saw this but was waiting for Daniel's sample.

division() is not a great way to set this value since it's meant for taxonomic "divisions" (e.g. "PRI" in Genbank). On the other hand what else is there?
authority() doesn't seem right either. What about:

$seq->seq_version($DATA_CLASS)

None of them are ideal but this is the closest, in my opinion. Then "Swiss-prot" and "TrEMBL" could be set by namespace() or authority().

Brian O.

From fgarret at ub.edu Wed Sep 6 13:45:05 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Wed, 06 Sep 2006 19:45:05 +0200
Subject: [Bioperl-l] Change in BioPerl module
Message-ID: <44FF0921.8030702@ub.edu>

Hi all,

I'm using Bioperl v1.4, and I think a minor change could be made to the "gap_line" function in the "SimpleAlign.pm" module so that it uses the gap character (defined in $self->gap_char) and not '-'.

change line 1228 to => my $gap= ($refchar eq $self->gap_char);
change line 1230 to => $gap= 1 if( $seq->[$pos] eq $self->gap_char);

What do you think?

Thanks,
FG

From cjfields at uiuc.edu Wed Sep 6 14:09:42 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Sep 2006 13:09:42 -0500
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To:
Message-ID: <000601c6d1df$a361b6c0$15327e82@pyrimidine>

Brian,

There is a problem with the way the division is parsed in SeqIO::swiss (it is pulled from the entry name 'xxxx_yyyy', after the underscore). This leads to some weird 'division' designations, like 'CHLTR', '9PICO', etc.
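To make the entry-name issue concrete, here is a minimal sketch in plain Perl, assuming the division is simply whatever follows the first underscore of the entry name. The helper name division_from_entry_name is hypothetical and this is not the actual swiss.pm code; the entry names are the ones quoted earlier in this thread. In UniProt entry names the part after the underscore is a species mnemonic, which is why the result does not look like a taxonomic division.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the text after the first
# underscore of an entry name, mimicking the behaviour described above.
sub division_from_entry_name {
    my ($entry_name) = @_;
    my (undef, $suffix) = split /_/, $entry_name, 2;
    return $suffix;    # undef when the name contains no underscore
}

# Entry names taken from the ID lines quoted in this thread.
for my $name (qw(CYC_BOVIN GIA2_GIALA Q16EZ1_AEDAE Q5XPV6)) {
    my $division = division_from_entry_name($name);
    printf "%-13s => %s\n", $name, defined $division ? $division : '(none)';
}
```

Names without an underscore yield no value at all, so any code that builds the ID line from this field would need a fallback.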
What we probably should do is not set division() at all (or set it to 'unknown') since the release notes specify SwissProt/TrEMBL doesn't use them (and, looking at our old test sequences, never used them). We could always leave it the way it is, but I don't think using these odd designations for divisions makes much sense.

I'll probably use namespace() as it seems to fit the closest to storing the specific database name (Swiss-Prot or TrEMBL). I'm using version() to hold the current number for the entry version (latest update version), which should probably go in seq_version() instead.

As SwissProt => 'STANDARD' and TrEMBL => 'PRELIMINARY', we could just build the ID line based on the namespace() designation, falling back to the true division() or 'UNK'.

Chris

From staffa at niehs.nih.gov Wed Sep 6 14:58:19 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Wed, 6 Sep 2006 14:58:19 -0400
Subject: [Bioperl-l] Write a fasta file with custom title line.
In-Reply-To: <000701c6cd26$c59dc3e0$15327e82@pyrimidine>
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08B40@NIHCESMLBX6.nih.gov>

I took someone's fine suggestion and wrote out my own fasta file line by line with a most simple perl subroutine -- most simple! Since I was using BioPerl, I thought I should let it do this work, but some things are simpler without it. Thank you all for your answers.

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu]
Sent: Thursday, August 31, 2006 1:56 PM
To: Staffa, Nick (NIH/NIEHS) [C]; 'bioperl-l'
Subject: RE: [Bioperl-l] Write a fasta file with custom title line.
Nick,

I think you can do it by changing the preferred ID type to 'display ID' for the SeqIO object, then changing the display ID to whatever you want:

    use Bio::SeqIO;

    my $seqin  = Bio::SeqIO->new(-file => shift, -format => 'genbank',);
    my $seqout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');

    $seqout->preferred_id_type('display');

    my $ct = 1;
    while (my $seq = $seqin->next_seq) {
        $seq->display_id('foo'.$ct);
        $seqout->write_seq($seq);
        $ct++;
    }

This version appends the sequence description:

    >foo1 5'UTR in Aspergillus niger icdA mRNA for NADP-dependent isocitrate dehydrogenase precursor, complete cds.
    TCCGTTCTTCCCTCATTCCTCCCCAGGTTCTTGCTTATTGCAGGAAAGATTATTTCCCAG
    AGTGAACAGAAACGATTTTCCGGGTTCTCCGATCTGCCCCGTGAAGGTCCCTTACAGCAA
    CAGCATCTCGTCCAGTCCGGCTTAACGGCAGCTTCCGAACTCCACCACCGCCTCCTTCCA
    GCCGAGAGCTTCACGGCTTCTTGCTTGTCTTTCCCTCGGACTTTCCCCGGTTCCTCCCAC
    ACAGCAGGCCCAGTTACTGTCGAGTCTTTGGCAATCCATCCCACACC
    >foo2 5'UTR in Fission yeast mRNA for cytochrome c oxidase subunit IV, complete cds.
    GTTATAAATCATCAATAATTGTCTTTTAAG
    ...

I personally think it would be better to add a 'custom' option so you could add whatever you wanted without the description added in. What did you have in mind here?

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS) [C]
> Sent: Thursday, August 31, 2006 11:57 AM
> To: bioperl-l
> Subject: [Bioperl-l] Write a fasta file with custom title line.
>
> I would like to construct title lines for the fasta sequences I want to
> write to a file.
> I don't see in the documentation on-line for SeqIO or write_seq how to
> specify this.
> Please point the way.
>
>
> Nick Staffa
> Telephone: 919-316-4569 (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L.
Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina

From cjfields at uiuc.edu Wed Sep 6 15:52:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Sep 2006 14:52:24 -0500
Subject: [Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails
In-Reply-To: <44FE90CE.6000000@biologie.uni-freiburg.de>
Message-ID: <000001c6d1ed$f9f4eb70$15327e82@pyrimidine>

Daniel,

I have committed to CVS a fix for SeqIO::swiss which gives the correct header based on the input sequence. The originating DB, based on STANDARD or PRELIMINARY (Swiss-Prot or TrEMBL, respectively) is parsed into the sequence namespace which can be retrieved via:

$seq->namespace()

This is used for rebuilding the ID line and date lines.

I'll look into the other issue (no warnings on retrieval) soon.

Chris

From golharam at umdnj.edu Thu Sep 7 01:09:20 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 07 Sep 2006 01:09:20 -0400
Subject: [Bioperl-l] Problem retrieving CDS by Acession #
Message-ID: <04e001c6d23b$c7a3c4e0$2f01a8c0@GOLHARMOBILE1>

Hi,

I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid accession #, XM_547879.2, for instance.
I get the message in return: -------------------- WARNING --------------------- MSG: acc (gb|XM_547879.2) does not exist --------------------------------------------------- If I go to NCBI, and enter the accession, the GenBank entry comes up. At first I suspected it was the version number, but removing the version number still causes the same error. Am I doing something wrong? Ryan From DGroskreutz at twt.com Thu Sep 7 02:01:16 2006 From: DGroskreutz at twt.com (DGroskreutz at twt.com) Date: Thu, 7 Sep 2006 01:01:16 -0500 Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office. Message-ID: I will be out of the office starting 09/05/2006 and will not return until 09/11/2006. I will be out of the office until Monday, Sept. 11th. I will reply when I get back into the office. Thanks, Deb From ewijaya at i2r.a-star.edu.sg Thu Sep 7 02:33:17 2006 From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward) Date: Thu, 07 Sep 2006 14:33:17 +0800 Subject: [Bioperl-l] Bioperl Module for Computing Background Distributions Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06152C@mailbe01.teak.local.net> Dear Expert, Is there any existing Bioperl module that computes background distributions of nucleotides given a set of DNA sequences? Basically it computes: frequency of nucleotide A(denin) / Total number of bases and so forth for T or C or G. Regards, Edward WIJAYA SINGAPORE ------------ Institute For Infocomm Research - Disclaimer ------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you. 
-------------------------------------------------------- From sdavis2 at mail.nih.gov Thu Sep 7 06:44:43 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 7 Sep 2006 06:44:43 -0400 Subject: [Bioperl-l] Bioperl Module for Computing Background Distributions In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A06152C@mailbe01.teak.local.net> References: <3ACF03E372996C4EACD542EA8A05E66A06152C@mailbe01.teak.local.net> Message-ID: <200609070644.43126.sdavis2@mail.nih.gov> On Thursday 07 September 2006 02:33, Wijaya Edward wrote: > Dear Expert, > > Is there any existing Bioperl module that > computes background distributions of nucleotides > given a set of DNA sequences? > > Basically it computes: > > frequency of nucleotide A(denin) / Total number of bases > > and so forth for T or C or G. This is pretty simple to do with straight perl. Sean #!/usr/bin/perl use strict; my $DNA = "ACCTGGATCCCGCTTTGACA"; my %base_hash; map {$base_hash{$_}++} split("",$DNA); print "Length of DNA: ",length($DNA),"\n"; foreach my $base (keys %base_hash) { print join("\t",$base,$base_hash{$base}, $base_hash{$base}/length($DNA))."\n"; } From sdavis2 at mail.nih.gov Thu Sep 7 06:48:05 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 7 Sep 2006 06:48:05 -0400 Subject: [Bioperl-l] Problem retrieving CDS by Acession # In-Reply-To: <04e001c6d23b$c7a3c4e0$2f01a8c0@GOLHARMOBILE1> References: <04e001c6d23b$c7a3c4e0$2f01a8c0@GOLHARMOBILE1> Message-ID: <200609070648.05374.sdavis2@mail.nih.gov> On Thursday 07 September 2006 01:09, Ryan Golhar wrote: > Hi, > > I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid > accession #, XM_547879.2, for instance. > > I get the message in return: > > -------------------- WARNING --------------------- > MSG: acc (gb|XM_547879.2) does not exist > --------------------------------------------------- > > If I go to NCBI, and enter the accession, the GenBank entry comes up. 
> At first I suspected it was the version number, but removing the version > number still causes the same error. > > Am I doing something wrong? from the Docs for Bio::DB::Genbank: $seq = $gb->get_Seq_by_acc('J00522'); # Accession Number $seq = $gb->get_Seq_by_version('J00522.1'); # Accession.version $seq = $gb->get_Seq_by_gi('405830'); # GI Number So, you might try using get_Seq_by_version(....). I didn't test it, but give that a shot. Sean
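The choice between the two lookups quoted above can be sketched roughly as follows. This is only a sketch under stated assumptions: split_accession and fetch_by_id are hypothetical helper names (they are not BioPerl functions), and the only BioPerl calls used are the get_Seq_by_acc() and get_Seq_by_version() methods quoted above from the Bio::DB::GenBank docs. Nothing is fetched from NCBI unless an identifier is passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "XM_547879.2" into ("XM_547879", 2).
# Returns (accession, undef) when there is no ".N" version suffix.
sub split_accession {
    my ($id) = @_;
    return ($1, $2) if $id =~ /^(.+)\.(\d+)$/;
    return ($id, undef);
}

# Hypothetical helper: pick the lookup method that matches the identifier.
# $gb is expected to be a Bio::DB::GenBank object.
sub fetch_by_id {
    my ($gb, $id) = @_;
    my (undef, $version) = split_accession($id);
    return defined $version
        ? $gb->get_Seq_by_version($id)   # exact accession.version
        : $gb->get_Seq_by_acc($id);      # bare accession
}

# Only contact NCBI when an identifier is given on the command line.
if (@ARGV) {
    require Bio::DB::GenBank;
    my $gb  = Bio::DB::GenBank->new;
    my $seq = fetch_by_id($gb, $ARGV[0]);
    print $seq->display_id, "\t", $seq->length, "\n" if $seq;
}
```

Run with an identifier such as XM_547879.2 as the first argument to try a versioned lookup; with a bare accession the same script falls back to get_Seq_by_acc().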
From khoueiry at ibdm.univ-mrs.fr Thu Sep 7 06:46:25 2006 From: khoueiry at ibdm.univ-mrs.fr (pierre) Date: Thu, 07 Sep 2006 12:46:25 +0200 Subject: [Bioperl-l] Bioperl Module for Computing Background Distributions In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A06152C@mailbe01.teak.local.net> References: <3ACF03E372996C4EACD542EA8A05E66A06152C@mailbe01.teak.local.net> Message-ID: <1157625985.8570.7.camel@localhost> An embedded and charset-unspecified text was scrubbed...
Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060907/18531010/attachment.pl From cjfields at uiuc.edu Thu Sep 7 09:07:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Sep 2006 08:07:45 -0500 Subject: [Bioperl-l] Bioperl Module for Computing Background Distributions In-Reply-To: <200609070644.43126.sdavis2@mail.nih.gov> References: <3ACF03E372996C4EACD542EA8A05E66A06152C@mailbe01.teak.local.net> <200609070644.43126.sdavis2@mail.nih.gov> Message-ID: Sean et al,

Using tr/// is also a very fast way to count bases. This is used in various SeqIO modules for speeding things up. TMTOWTDI!

  # tr/// with an empty replacement list counts matches without changing $dna;
  # both cases are listed so that upper-case input is counted too.
  my $a_count = ($dna =~ tr/aA//);
  my $c_count = ($dna =~ tr/cC//);
  my $g_count = ($dna =~ tr/gG//);
  my $t_count = ($dna =~ tr/tT//);
  my $len     = length($dna);
  # ... then divide each count by $len, and so on

Chris

On Sep 7, 2006, at 5:44 AM, Sean Davis wrote: > On Thursday 07 September 2006 02:33, Wijaya Edward wrote: >> Dear Expert, >> >> Is there any existing Bioperl module that >> computes background distributions of nucleotides >> given a set of DNA sequences? >> >> Basically it computes: >> >> frequency of nucleotide A(denin) / Total number of bases >> >> and so forth for T or C or G. > > This is pretty simple to do with straight perl. > > Sean > > > #!/usr/bin/perl > use strict; > > my $DNA = "ACCTGGATCCCGCTTTGACA"; > > my %base_hash; > > map {$base_hash{$_}++} split("",$DNA); > > print "Length of DNA: ",length($DNA),"\n"; > foreach my $base (keys %base_hash) { > print join("\t",$base,$base_hash{$base}, > $base_hash{$base}/length($DNA))."\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Sep 7 11:04:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Sep 2006 10:04:45 -0500 Subject: [Bioperl-l] Bio::Location::Fuzzy CoordinatePolicy questions Message-ID: <000301c6d28e$f4cde100$15327e82@pyrimidine> Hilmar (or whomever can answer this), I was looking at a few bug fixes (bug 992 in particular) and noticed that, although LocationI-implementing objects are supposed to use a CoordinatePolicy for determining start/end coordinates for fuzzy locations, Bio::Location::Fuzzy::to_FTstring() does not (it uses max/min_start() and max/min_end() instead). To me, it seems that this should be building the location string using the coordinate_policy->start()/end() methods instead (as suggested in the bug report). The default CoordinatePolicy for Location::Fuzzy is Bio::Location:: WidestCoordPolicy. Would there be any objection to changing this? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From golharam at umdnj.edu Thu Sep 7 10:32:46 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 07 Sep 2006 10:32:46 -0400 Subject: [Bioperl-l] Problem retrieving CDS by Acession # In-Reply-To: <200609070648.05374.sdavis2@mail.nih.gov> Message-ID: <04e501c6d28a$7cf245d0$2f01a8c0@GOLHARMOBILE1> > On Thursday 07 September 2006 01:09, Ryan Golhar wrote: > > Hi, > > > > I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid > > accession #, XM_547879.2, for instance. > > > > I get the message in return: > > > > -------------------- WARNING --------------------- > > MSG: acc (gb|XM_547879.2) does not exist > > --------------------------------------------------- > > > > If I go to NCBI, and enter the accession, the GenBank entry > comes up. > > At first I suspected it was the version number, but removing the > > version number still causes the same error. 
> > > > Am I doing something wrong? > > from the Docs for Bio::DB::Genbank: > > $seq = $gb->get_Seq_by_acc('J00522'); # Accession Number > $seq = $gb->get_Seq_by_version('J00522.1'); # Accession.version > $seq = $gb->get_Seq_by_gi('405830'); # GI Number > > So, you might try using get_Seq_by_version(....). I didn't > test it, but give > that a shot. get_Seq_by_version() worked. That does not explain why get_Seq_by_acc does not work with the primary part of the accession #. From sdavis2 at mail.nih.gov Thu Sep 7 11:48:39 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 7 Sep 2006 11:48:39 -0400 Subject: [Bioperl-l] Problem retrieving CDS by Acession # In-Reply-To: <04e501c6d28a$7cf245d0$2f01a8c0@GOLHARMOBILE1> References: <04e501c6d28a$7cf245d0$2f01a8c0@GOLHARMOBILE1> Message-ID: <200609071148.39823.sdavis2@mail.nih.gov> On Thursday 07 September 2006 10:32, Ryan Golhar wrote: > > On Thursday 07 September 2006 01:09, Ryan Golhar wrote: > > > Hi, > > > > > > I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid > > > accession #, XM_547879.2, for instance. > > > > > > I get the message in return: > > > > > > -------------------- WARNING --------------------- > > > MSG: acc (gb|XM_547879.2) does not exist > > > --------------------------------------------------- > > > > > > If I go to NCBI, and enter the accession, the GenBank entry > > > > comes up. > > > > > At first I suspected it was the version number, but removing the > > > version number still causes the same error. > > > > > > Am I doing something wrong? > > > > from the Docs for Bio::DB::Genbank: > > > > $seq = $gb->get_Seq_by_acc('J00522'); # Accession Number > > $seq = $gb->get_Seq_by_version('J00522.1'); # Accession.version > > $seq = $gb->get_Seq_by_gi('405830'); # GI Number > > > > So, you might try using get_Seq_by_version(....). I didn't > > test it, but give > > that a shot. > > get_Seq_by_version() worked. 
> > That does not explain why get_Seq_by_acc does not work with the primary > part of the accession #. As an example of why this shouldn't work, doing a search in entrez (online version) will bring up the newest version of an accession if the version is not included. If one specifies the version, though, one gets that version, even if it is not the newest. So, asking get_Seq_by_acc() with a version and ignoring the version would potentially get you the wrong version for the accession. If you know that you want the most recent version, just strip the version information and use get_Seq_by_acc(). Sean From cjfields at uiuc.edu Thu Sep 7 12:33:39 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Sep 2006 11:33:39 -0500 Subject: [Bioperl-l] Problem retrieving CDS by Acession # In-Reply-To: <200609071148.39823.sdavis2@mail.nih.gov> Message-ID: <000101c6d29b$5fb72600$15327e82@pyrimidine> ... > > > > get_Seq_by_version() worked. > > > > That does not explain why get_Seq_by_acc does not work with the primary > > part of the accession #. > > As an example of why this shouldn't work, doing a search in entrez (online > version) will bring up the newest version of an accession if the version > is > not included. If one specifies the version, though, one gets that > version, > even if it is not the newest. So, asking get_Seq_by_acc() with a version > and > ignoring the version would potentially get you the wrong version for the > accession. > > If you know that you want the most recent version, just strip the version > information and use get_Seq_by_acc(). As an aside, if you want only one unique sequence, such as through get_Seq_by* methods, you should consider using the GI. NCBI recommends retrieving sequence data using the GI or accession.version and not the accession only. Using the accession works 99% of the time but I have seen a few instances when retrieving sequence using the accession only gets the wrong sequence via Bio::DB::GenBank/GenPept get_Seq_by_acc(), sometimes getting even mixed sequences: http://article.gmane.org/gmane.comp.lang.perl.bio.general/11560/ Part of the reason is a quirk with some sequences that are returned via EUtilities via NCBI, which may be due to misclassification in the database. Another is the accession isn't considered unique by NCBI as they may not assign it (it may come from another database, for instance), so there may be more than one sequence returned.
get_Seq* methods only return one sequence from the stream, that being the first one (they only expect one anyway). If the first one in the sequence stream is the wrong one... Chris From golharam at umdnj.edu Thu Sep 7 13:16:46 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 07 Sep 2006 13:16:46 -0400 Subject: [Bioperl-l] Problem retrieving CDS by Acession # In-Reply-To: <200609071148.39823.sdavis2@mail.nih.gov> Message-ID: <04f101c6d2a1$659cf2b0$2f01a8c0@GOLHARMOBILE1> > -----Original Message----- > From: Sean Davis [mailto:sdavis2 at mail.nih.gov] > Sent: Thursday, September 07, 2006 11:49 AM > To: golharam at umdnj.edu > Cc: bioperl-l at lists.open-bio.org; 'bioperl-l' > Subject: Re: [Bioperl-l] Problem retrieving CDS by Acession # > > > On Thursday 07 September 2006 10:32, Ryan Golhar wrote: > > > On Thursday 07 September 2006 01:09, Ryan Golhar wrote: > > > > Hi, > > > > > > > > I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid > > > > accession #, XM_547879.2, for instance. > > > > > > > > I get the message in return: > > > > > > > > -------------------- WARNING --------------------- > > > > MSG: acc (gb|XM_547879.2) does not exist > > > > --------------------------------------------------- > > > > > > > > If I go to NCBI, and enter the accession, the GenBank entry > > > > > > comes up. > > > > > > > At first I suspected it was the version number, but > removing the > > > > version number still causes the same error. > > > > > > > > Am I doing something wrong? > > > > > > from the Docs for Bio::DB::Genbank: > > > > > > $seq = $gb->get_Seq_by_acc('J00522'); # Accession Number > > > $seq = $gb->get_Seq_by_version('J00522.1'); # > Accession.version > > > $seq = $gb->get_Seq_by_gi('405830'); # GI Number > > > > > > So, you might try using get_Seq_by_version(....). I > didn't test it, > > > but give that a shot. > > > > get_Seq_by_version() worked. 
> > > > That does not explain why get_Seq_by_acc does not work with the > > primary part of the accession #. > > As an example of why this shouldn't work, doing a search in > entrez (online > version) will bring up the newest version of an accession if > the version is > not included. If one specifies the version, though, one gets > that version, > even if it is not the newest. So, asking get_Seq_by_acc() > with a version and > ignoring the version would potentially get you the wrong > version for the > accession. > > If you know that you want the most recent version, just strip > the version > information and use get_Seq_by_acc(). > > Sean > Sorry, maybe I'm not being clear. Suppose I only had the accession #, XM_547879. If I call get_Seq_by_acc('XM_547879'), it gives the warning above. That shouldn't be because I'm giving a valid accession number. I suspect something is wrong in the parsing of whatever NCBI is returning. From jay at jays.net Thu Sep 7 13:37:24 2006 From: jay at jays.net (Jay Hannah) Date: Thu, 07 Sep 2006 12:37:24 -0500 Subject: [Bioperl-l] t/SeqIO.t -- improvements? Message-ID: <450058D4.5070702@jays.net> So if I understand t/SeqIO.t correctly it reads (for example) t/data/test.fasta and writes t/data/fasta.out and if nothing explodes the test passes. ... Wouldn't it be better if: (1) fasta.out was generated to contain *all* sequences, not just the first. (2) a test was added to verify that fasta.out exactly matches test.fasta (diff is blank). Or have I misunderstood something? Are patches welcome? Shall I submit the patch to this mailing list or elsewhere? Should I worry about Windows compatibility? Thanks, j From MEC at stowers-institute.org Thu Sep 7 14:16:18 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Thu, 7 Sep 2006 13:16:18 -0500 Subject: [Bioperl-l] t/SeqIO.t -- improvements? Message-ID: Jey, et al, I like this idea and have wondered the same in the past. 
I have sometimes done exactly what Jay is suggesting: I code a script, validate its output independently, and then save the 'validated' output as reference for future tests. For instance, I wrote a script TFBSscan (basically a wrapper around TFBS modules) whose test.t looks like this:

  use Test::More tests => 6;
  my $diff;
  #./TFBSscan --notag_db --DBmatrix JASPAR=PFM:ID:MA0026,MA0100,MA0002 ./t/testseqs.fasta > ./t/t1.out &
  # I manually validated ./t/t1.out, so now let's make it the basis for a test:
  $diff = `TFBSscan --notag_db --DBmatrix JASPAR=PFM:ID:MA0026,MA0100,MA0002 ./t/testseqs.fasta | diff ./t/t1.out - `;
  ok (! $diff, 'JASPAR') or diag("here's the unexpected difference :\n$diff");

etc, etc...

I figured that there are some useful abstractions one could put on top of this, perhaps in another Test:: module, but I've not seen it. I'm curious in general as to others' approaches to this, both in the BioPerl project and elsewhere.

Cheers, Malcolm

>-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jay Hannah >Sent: Thursday, September 07, 2006 12:37 PM >To: bioperl-l >Subject: [Bioperl-l] t/SeqIO.t -- improvements? > >So if I understand t/SeqIO.t correctly it reads (for example) > > t/data/test.fasta > >and writes > > t/data/fasta.out > >and if nothing explodes the test passes. > >... > >Wouldn't it be better if: >(1) fasta.out was generated to contain *all* sequences, not >just the first. >(2) a test was added to verify that fasta.out exactly matches >test.fasta (diff is blank). > >Or have I misunderstood something? > >Are patches welcome? Shall I submit the patch to this mailing >list or elsewhere? Should I worry about Windows compatibility?
> >Thanks, > >j > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From osborne1 at optonline.net Thu Sep 7 14:18:45 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 07 Sep 2006 14:18:45 -0400 Subject: [Bioperl-l] Change in BioPerl module In-Reply-To: <44FF0921.8030702@ub.edu> Message-ID: Filipe,

It looks like this is fixed in bioperl-live:

  sub gap_line {
      my ($self,$gapchar) = @_;
      $gapchar = $gapchar || $self->gap_char;
      my %gap_hsh; # column gaps vector
      foreach my $seq ( $self->each_seq ) {
          my $i = 0;
          map {$gap_hsh{$_->[0]} = undef}
              grep {$_->[1] eq $gapchar}
              map {[$i++, $_]}
              split(//, uc ($seq->seq));
      }
      my $gap_line;
      foreach my $pos ( 0..$self->length-1 ) {
          $gap_line .= (exists $gap_hsh{$pos}) ? $gapchar:'.';
      }
      return $gap_line;
  }

Although it may not seem so based on version numbers, 1.4 is quite old; it was released more than 2.5 years ago. I'd recommend using bioperl-live or version 1.5.1.

Brian O.

On 9/6/06 1:45 PM, "Filipe Garrett" wrote: > > Hi all, > > I'm using Bioperl v1.4 and at the "SimpleAlign.pm" module I think that a > minor change could be made to the "gap_line" function in order to use > the gap character (defined in $self->gap_char) and not '-'. > > change line 1228 to => my $gap= ($refchar eq $self->gap_char); > > change line 1230 to => $gap= 1 if( $seq->[$pos] eq $self->gap_char); > > > What do you think?
> Thanks, > FG > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Thu Sep 7 14:28:16 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 7 Sep 2006 14:28:16 -0400 Subject: [Bioperl-l] Bio::Location::Fuzzy CoordinatePolicy questions In-Reply-To: <000301c6d28e$f4cde100$15327e82@pyrimidine> References: <000301c6d28e$f4cde100$15327e82@pyrimidine> Message-ID: <5A57B042-C6A6-4CD3-B175-851BA3B7635C@gmx.net> I guess there's arguments either way, depending on what you interpret the contract to to_FTstring() to be. If you argue that to_FTstring() needs to return a GenBank feature table-compliant string then you cannot leave it to the CoordinatePolicy to decide on the end points that satisfy compliance. Conversely, if you argue that to_FTstring() may return a string which is GenBank feature table-compliant only in terms of formatting, then there is freedom of interpretation as to what the end points could be. I'm leaning to the former, as there isn't a universal standard for feature location formatting. Or if there is, then this method doesn't implement it. It really is there to return a GenBank-compliant string. Or so I think - other opinions welcome, and are likely to exist. -hilmar On Sep 7, 2006, at 11:04 AM, Chris Fields wrote: > Hilmar (or whomever can answer this), > > I was looking at a few bug fixes (bug 992 in particular) and > noticed that, > although LocationI-implementing objects are supposed to use a > CoordinatePolicy for determining start/end coordinates for fuzzy > locations, > Bio::Location::Fuzzy::to_FTstring() does not (it uses max/min_start > () and > max/min_end() instead). To me, it seems that this should be > building the > location string using the coordinate_policy->start()/end() methods > instead > (as suggested in the bug report). 
> > The default CoordinatePolicy for Location::Fuzzy is Bio::Location:: > WidestCoordPolicy. > > Would there be any objection to changing this? > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Sep 7 14:56:21 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 7 Sep 2006 14:56:21 -0400 Subject: [Bioperl-l] t/SeqIO.t -- improvements? In-Reply-To: <450058D4.5070702@jays.net> References: <450058D4.5070702@jays.net> Message-ID: <4FAE924D-45A6-4943-B54D-4B969809E2FE@gmx.net> On Sep 7, 2006, at 1:37 PM, Jay Hannah wrote: > Wouldn't it be better if: > (1) fasta.out was generated to contain *all* sequences, not just > the first. Possibly. > (2) a test was added to verify that fasta.out exactly matches > test.fasta (diff is blank). No. The goal is not exact reproduction (you'd use cp for that) but writing out a file that is valid FASTA format and contains the same information as the input file. You could (and should) still do a semantic diff; just that's a lot harder to implement. There have been threads previously about comparing objects for equality (in the notion that two files which are semantically equal should result in 'equal' objects once read into the object model). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Sep 7 15:04:50 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Sep 2006 14:04:50 -0500 Subject: [Bioperl-l] Problem retrieving CDS by Acession # In-Reply-To: <04f101c6d2a1$659cf2b0$2f01a8c0@GOLHARMOBILE1> Message-ID: <000301c6d2b0$819042b0$15327e82@pyrimidine> ... 
> > If you know that you want the most recent version, just strip > > the version > > information and use get_Seq_by_acc(). > > > > Sean > > > > > Sorry, maybe I'm not being clear. Suppose I only had the accession #, > XM_547879. If I call get_Seq_by_acc('XM_547879'), it gives the warning > above. That shouldn't be because I'm giving a valid accession number. > I suspect something is wrong in the parsing of whatever NCBI is > returning. You might try updating your local Bioperl; I can retrieve it using bioperl-live: --------------------- use Bio::DB::GenBank; use Bio::SeqIO; my $fac = Bio::DB::GenBank->new(); my $seq = $fac->get_Seq_by_acc('XM_547879'); my $seqout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank'); $seqout->write_seq($seq); --------------------- Gets this sequence (truncated for brevity): --------------------- LOCUS XM_547879 1056 bp mRNA linear MAM 30-AUG-2005 DEFINITION PREDICTED: Canis familiaris similar to Synaptojanin 2 binding protein (Mitochondrial outer membrane protein 25) (LOC490757), mRNA. ACCESSION XM_547879 VERSION XM_547879.2 GI:73964245 KEYWORDS . SOURCE Canis familiaris (dog). ORGANISM Canis familiaris Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Carnivora; Caniformia; Canidae; Canis. .... 
--------------------- Chris From sdavis2 at mail.nih.gov Thu Sep 7 15:11:51 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 7 Sep 2006 15:11:51 -0400 Subject: [Bioperl-l] Problem retrieving CDS by Acession # In-Reply-To: <04f101c6d2a1$659cf2b0$2f01a8c0@GOLHARMOBILE1> References: <04f101c6d2a1$659cf2b0$2f01a8c0@GOLHARMOBILE1> Message-ID: <200609071511.51430.sdavis2@mail.nih.gov> On Thursday 07 September 2006 13:16, Ryan Golhar wrote: > > -----Original Message----- > > From: Sean Davis [mailto:sdavis2 at mail.nih.gov] > > Sent: Thursday, September 07, 2006 11:49 AM > > To: golharam at umdnj.edu > > Cc: bioperl-l at lists.open-bio.org; 'bioperl-l' > > Subject: Re: [Bioperl-l] Problem retrieving CDS by Acession # > > > > On Thursday 07 September 2006 10:32, Ryan Golhar wrote: > > > > On Thursday 07 September 2006 01:09, Ryan Golhar wrote: > > > > > Hi, > > > > > > > > > > I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid > > > > > accession #, XM_547879.2, for instance. > > > > > > > > > > I get the message in return: > > > > > > > > > > -------------------- WARNING --------------------- > > > > > MSG: acc (gb|XM_547879.2) does not exist > > > > > --------------------------------------------------- > > > > > > > > > > If I go to NCBI, and enter the accession, the GenBank entry > > > > > > > > comes up. > > > > > > > > > At first I suspected it was the version number, but > > > > removing the > > > > > > > version number still causes the same error. > > > > > > > > > > Am I doing something wrong? > > > > > > > > from the Docs for Bio::DB::Genbank: > > > > > > > > $seq = $gb->get_Seq_by_acc('J00522'); # Accession Number > > > > $seq = $gb->get_Seq_by_version('J00522.1'); # > > > > Accession.version > > > > > > $seq = $gb->get_Seq_by_gi('405830'); # GI Number > > > > > > > > So, you might try using get_Seq_by_version(....). I > > > > didn't test it, > > > > > > but give that a shot. > > > > > > get_Seq_by_version() worked. 
> > > > > > That does not explain why get_Seq_by_acc does not work with the > > > primary part of the accession #. > > > > As an example of why this shouldn't work, doing a search in > > entrez (online > > version) will bring up the newest version of an accession if > > the version is > > not included. If one specifies the version, though, one gets > > that version, > > even if it is not the newest. So, asking get_Seq_by_acc() > > with a version and > > ignoring the version would potentially get you the wrong > > version for the > > accession. > > > > If you know that you want the most recent version, just strip > > the version > > information and use get_Seq_by_acc(). > > > > Sean > > Sorry, maybe I'm not being clear. Suppose I only had the accession #, > XM_547879. If I call get_Seq_by_acc('XM_547879'), it gives the warning > above. That shouldn't be because I'm giving a valid accession number. > I suspect something is wrong in the parsing of whatever NCBI is > returning. I'm not sure if it makes a difference, but an XM_..... is a RefSeq accession, not Genbank. Does using Bio::DB::RefSeq do the trick? Perhaps someone else can verify one way or the other that the refseq (ref) division is treated differently than the genbank (gb) division. Note the error message that you got back that has (gb|XM_547879.2) a gb in it, not a 'ref'. Again, I didn't test this, so take it with a grain of salt. Sean From bix at sendu.me.uk Thu Sep 7 16:08:53 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 07 Sep 2006 21:08:53 +0100 Subject: [Bioperl-l] t/SeqIO.t -- improvements? In-Reply-To: <4FAE924D-45A6-4943-B54D-4B969809E2FE@gmx.net> References: <450058D4.5070702@jays.net> <4FAE924D-45A6-4943-B54D-4B969809E2FE@gmx.net> Message-ID: <45007C55.3050104@sendu.me.uk> Hilmar Lapp wrote: > On Sep 7, 2006, at 1:37 PM, Jay Hannah wrote: > >> Wouldn't it be better if: >> (1) fasta.out was generated to contain *all* sequences, not just >> the first. > > Possibly. 
>
>> (2) a test was added to verify that fasta.out exactly matches
>> test.fasta (diff is blank).
>
> No. The goal is not exact reproduction (you'd use cp for that) but
> writing out a file that is valid FASTA format and contains the same
> information as the input file.

Round-trip tests would be extremely valuable and would be /very/ much
appreciated. The lack of any has left some large bugs (e.g. in taxonomy
parsing) completely unnoticed/unfixed for years.

Don't just do a simple diff on the output file, since differences may not
indicate errors (Hilmar's point). Instead, read the output file in again
and make sure the resulting object (and any objects it contains)
contains all the same information as the object generated when reading
the original input file.

Ideally the output file would also be checked independently of the
Bioperl parser being tested, but that may only be possible in a limited
way (otherwise you'd end up writing a whole new parser...). But e.g. if a
file format specifies a maximum line width, at least check
that the output file has no lines longer than that. (Again, a real
problem, and you'll almost certainly discover some bugs related to this
if you write the tests.)

So if you have the time, please add tests and attach them to specific
new bug reports if your tests reveal bugs
(http://bugzilla.open-bio.org/), or just email your patch(es) direct to me.

Cheers,
Sendu.

From golharam at umdnj.edu Thu Sep 7 15:54:50 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 07 Sep 2006 15:54:50 -0400
Subject: [Bioperl-l] Cvs update tests fail
Message-ID: <04fb01c6d2b7$7a9c9c90$2f01a8c0@GOLHARMOBILE1>

So I've updated my installation of bioperl from cvs using 'cvs update'.
I reran 'perl Makefile.PL' followed by 'make; make test'.

I see the following errors from 'make test':

t/BioDBSeqFeature............Can't locate Bio/DB/SeqFeature/Store.pm in
@INC (@INC contains: /users/golharam/cvswork/bioperl-live/ . ..
./blib/lib t /users/golharam/bioperl/bioperl-live/blib/lib /users/golharam/bioperl/bioperl-live/blib/arch /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at t/BioDBSeqFeature.t line 30. BEGIN failed--compilation aborted at t/BioDBSeqFeature.t line 30. t/BioDBSeqFeature............dubious Test returned status 2 (wstat 512, 0x200) Scalar found where operator expected at (eval 152) line 1, near "'int' $__val" (Missing operator before $__val?) DIED. FAILED tests 1-43 Failed 43/43 tests, 0.00% okay t/BioDBSeqFeature_BDB........Can't locate Bio/DB/SeqFeature/Store.pm in @INC (@INC contains: /users/golharam/cvswork/bioperl-live/ . .. ./blib/lib t /users/golharam/bioperl/bioperl-live/blib/lib /users/golharam/bioperl/bioperl-live/blib/arch /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at t/BioDBSeqFeature.t line 30. BEGIN failed--compilation aborted at t/BioDBSeqFeature.t line 30. t/BioDBSeqFeature_BDB........FAILED tests 1-43 Failed 43/43 tests, 0.00% okay Any ideas? From cjfields at uiuc.edu Thu Sep 7 16:27:52 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Sep 2006 15:27:52 -0500 Subject: [Bioperl-l] t/SeqIO.t -- improvements? In-Reply-To: <4FAE924D-45A6-4943-B54D-4B969809E2FE@gmx.net> Message-ID: <000701c6d2bc$187f0520$15327e82@pyrimidine> > > Wouldn't it be better if: > > (1) fasta.out was generated to contain *all* sequences, not just > > the first. > > Possibly. 
My view on that was to demonstrate that you get separate PrimarySeqI objects from a SeqIO stream and you can write one or the other. It would be fairly straightforward to send both sequence for each format to the file. > > (2) a test was added to verify that fasta.out exactly matches > > test.fasta (diff is blank). > > No. The goal is not exact reproduction (you'd use cp for that) but > writing out a file that is valid FASTA format and contains the same > information as the input file. > > You could (and should) still do a semantic diff; just that's a lot > harder to implement. There have been threads previously about > comparing objects for equality (in the notion that two files which > are semantically equal should result in 'equal' objects once read > into the object model). > > -hilmar For diffs, you might be able to try a line-for-line 'diff' using Test::More and like() (yes, I know I'm beating that drum to death). I remember seeing some direct file comparisons somewhere in the test suite, genbank.t maybe, in relation to Bio::Species output. I think there was a little cheating though as the input test data looks like it was already passed through SeqIO! Chris From cjfields at uiuc.edu Thu Sep 7 16:28:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Sep 2006 15:28:38 -0500 Subject: [Bioperl-l] Bio::Location::Fuzzy CoordinatePolicy questions In-Reply-To: <5A57B042-C6A6-4CD3-B175-851BA3B7635C@gmx.net> Message-ID: <000801c6d2bc$33b803a0$15327e82@pyrimidine> > I guess there's arguments either way, depending on what you interpret > the contract to to_FTstring() to be. > > If you argue that to_FTstring() needs to return a GenBank feature > table-compliant string then you cannot leave it to the > CoordinatePolicy to decide on the end points that satisfy compliance. 
> > Conversely, if you argue that to_FTstring() may return a string which > is GenBank feature table-compliant only in terms of formatting, then > there is freedom of interpretation as to what the end points could be. > > I'm leaning to the former, as there isn't a universal standard for > feature location formatting. Or if there is, then this method doesn't > implement it. It really is there to return a GenBank-compliant > string. Or so I think - other opinions welcome, and are likely to exist. > > -hilmar The default coordinate policy for Bio::Location::Atomic is WidestCoordPolicy, which is GenBank-compliant (i.e. it passes tests when set explicitly). That should work by default as Location::Fuzzy is-a Location::Atomic. It would then be left up to the user to change it if they want something non-compliant (NarrowestCoordPolicy or AvWithinCoordPolicy or whatever). Anyway, I managed to fix the bug w/o having to manipulate the coordinate policy. It passes all tests so I'll go ahead and commit it. Chris From cjfields at uiuc.edu Thu Sep 7 16:35:49 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Sep 2006 15:35:49 -0500 Subject: [Bioperl-l] Cvs update tests fail In-Reply-To: <04fb01c6d2b7$7a9c9c90$2f01a8c0@GOLHARMOBILE1> Message-ID: <000901c6d2bd$34c119c0$15327e82@pyrimidine> Make sure you run 'make clean; perl Makefile.PL; make;' before you run 'make test' or 'make install'. If that doesn't fix it, what I do (when I run into issues like this) is delete the old bioperl-live directory and checkout a fresh copy, then run 'perl Makefile.PL; make; make test;' from that copy. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Ryan Golhar > Sent: Thursday, September 07, 2006 2:55 PM > To: 'bioperl-l' > Subject: [Bioperl-l] Cvs update tests fail > > So I've updated my installation of bioperl from cvs using 'cvs update'. 
> I reran 'perl Makefile.PL' followed by 'make; make test'. > > I see the following errors from 'make test': > > t/BioDBSeqFeature............Can't locate Bio/DB/SeqFeature/Store.pm in > @INC (@INC contains: /users/golharam/cvswork/bioperl-live/ . .. > ./blib/lib t /users/golharam/bioperl/bioperl-live/blib/lib > /users/golharam/bioperl/bioperl-live/blib/arch > /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 > /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl > /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at > t/BioDBSeqFeature.t line 30. > BEGIN failed--compilation aborted at t/BioDBSeqFeature.t line 30. > t/BioDBSeqFeature............dubious > > Test returned status 2 (wstat 512, 0x200) > Scalar found where operator expected at (eval 152) line 1, near "'int' > $__val" > (Missing operator before $__val?) > DIED. FAILED tests 1-43 > Failed 43/43 tests, 0.00% okay > t/BioDBSeqFeature_BDB........Can't locate Bio/DB/SeqFeature/Store.pm in > @INC (@INC contains: /users/golharam/cvswork/bioperl-live/ . .. > ./blib/lib t /users/golharam/bioperl/bioperl-live/blib/lib > /users/golharam/bioperl/bioperl-live/blib/arch > /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 > /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl > /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at > t/BioDBSeqFeature.t line 30. > BEGIN failed--compilation aborted at t/BioDBSeqFeature.t line 30. > t/BioDBSeqFeature_BDB........FAILED tests 1-43 > > Failed 43/43 tests, 0.00% okay > > Any ideas? 
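
A "Can't locate ... in @INC" failure like the one quoted above can be
narrowed down by checking which @INC directories, if any, actually hold
the module file. What follows is only a minimal sketch in core Perl: the
find_in_inc helper is a hypothetical name, not part of Bioperl, and the
module path is simply the one from the error message.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list every @INC directory that holds the given
# module file (path form, e.g. 'Bio/DB/SeqFeature/Store.pm').
sub find_in_inc {
    my ($rel_path) = @_;
    my @hits;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks that may sit in @INC
        my $candidate = File::Spec->catfile($dir, split m{/}, $rel_path);
        push @hits, $candidate if -f $candidate;
    }
    return @hits;
}

# 'strict.pm' is included only as a module that should always be found.
for my $module ('Bio/DB/SeqFeature/Store.pm', 'strict.pm') {
    my @found = find_in_inc($module);
    if (@found) {
        print "$module found at: $_\n" for @found;
    }
    else {
        print "$module is not in any \@INC directory\n";
    }
}
```

Running it with the same search path as the tests (for example with
"perl -I. -Iblib/lib" from the top of the checkout, assuming that is how
the tests are run) should show whether the file is missing from the
working copy or merely not on the path.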
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lincoln.stein at gmail.com Thu Sep 7 16:34:01 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 7 Sep 2006 16:34:01 -0400 Subject: [Bioperl-l] Cvs update tests fail In-Reply-To: <04fb01c6d2b7$7a9c9c90$2f01a8c0@GOLHARMOBILE1> References: <04fb01c6d2b7$7a9c9c90$2f01a8c0@GOLHARMOBILE1> Message-ID: <6dce9a0b0609071334g1d981884jf4d4a23f42763da4@mail.gmail.com> When you run "cvs update" be sure to use the -d option in order to create new directories. You probably don't have a Bio/DB/SeqFeature directory at all! cvs update -d Lincoln On 9/7/06, Ryan Golhar wrote: > > So I've updated my installation of bioperl from cvs using 'cvs update'. > I reran 'perl Makefile.PL' followed by 'make; make test'. > > I see the following errors from 'make test': > > t/BioDBSeqFeature............Can't locate Bio/DB/SeqFeature/Store.pm in > @INC (@INC contains: /users/golharam/cvswork/bioperl-live/ . .. > ./blib/lib t /users/golharam/bioperl/bioperl-live/blib/lib > /users/golharam/bioperl/bioperl-live/blib/arch > /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 > /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl > /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at > t/BioDBSeqFeature.t line 30. > BEGIN failed--compilation aborted at t/BioDBSeqFeature.t line 30. > t/BioDBSeqFeature............dubious > > Test returned status 2 (wstat 512, 0x200) > Scalar found where operator expected at (eval 152) line 1, near "'int' > $__val" > (Missing operator before $__val?) > DIED. FAILED tests 1-43 > Failed 43/43 tests, 0.00% okay > t/BioDBSeqFeature_BDB........Can't locate Bio/DB/SeqFeature/Store.pm in > @INC (@INC contains: /users/golharam/cvswork/bioperl-live/ . .. 
> ./blib/lib t /users/golharam/bioperl/bioperl-live/blib/lib > /users/golharam/bioperl/bioperl-live/blib/arch > /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 > /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl > /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at > t/BioDBSeqFeature.t line 30. > BEGIN failed--compilation aborted at t/BioDBSeqFeature.t line 30. > t/BioDBSeqFeature_BDB........FAILED tests 1-43 > > Failed 43/43 tests, 0.00% okay > > Any ideas? > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Thu Sep 7 17:17:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Sep 2006 16:17:19 -0500 Subject: [Bioperl-l] t/SeqIO.t -- improvements? In-Reply-To: <45007C55.3050104@sendu.me.uk> Message-ID: <000201c6d2c3$011fe140$15327e82@pyrimidine> ... > Round-trip tests would be extremely valuable and would be /very/ much > appreciated. The lack of any have left some large bugs (eg. in taxonomy > parsing) completely unnoticed/unfixed for years. > > Don't just do a simple diff on the output file since differences may not > indicate errors (Hilmar's point). Instead read the output file in again > and make sure the resulting object (and any any objects they contain) > contains all the same information as the object generated when reading > the original input file. 
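
One way to read the round-trip idea quoted above, offered as a minimal
sketch and not as anything taken from the Bioperl test suite: parse the
input, write it out with different line wrapping, parse the output
again, and compare the records instead of the bytes. The parse_fasta,
format_fasta and same_records helpers are hypothetical stand-ins written
in plain Perl so that the example runs without Bioperl; a real test
would presumably do the reading and writing through Bio::SeqIO.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real parser: turn FASTA text into a list
# of [id, description, sequence] records.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S*)\s*(.*)$/) {
            push @records, [$1, $2, ''];
        }
        elsif (@records) {
            (my $chunk = $line) =~ s/\s+//g;
            $records[-1][2] .= $chunk;    # join wrapped sequence lines
        }
    }
    return \@records;
}

# Hypothetical writer: emit records with a chosen sequence line width.
sub format_fasta {
    my ($records, $width) = @_;
    my $out = '';
    for my $rec (@$records) {
        my ($id, $desc, $seq) = @$rec;
        $out .= length $desc ? ">$id $desc\n" : ">$id\n";
        $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    }
    return $out;
}

# Semantic comparison: same number of records, and the same id,
# description and sequence in each, whatever the line wrapping was.
sub same_records {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#$x) {
        for my $field (0 .. 2) {
            return 0 unless $x->[$i][$field] eq $y->[$i][$field];
        }
    }
    return 1;
}

my $input  = ">seq1 first test\nACGTACGTAC\nGTAC\n>seq2\nTTGACA\n";
my $first  = parse_fasta($input);
my $output = format_fasta($first, 5);    # deliberately re-wrapped
my $second = parse_fasta($output);

print "byte-identical: ", ($input eq $output ? "yes" : "no"), "\n";
print "same records:   ", (same_records($first, $second) ? "yes" : "no"), "\n";
```

The re-wrapped output is not byte-identical to the input, yet both parse
to the same records, which is the distinction between a plain diff and a
semantic comparison made in this thread.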
That should be fairly simple to do, but I believe some round-trip tests are already performed in embl.t, swiss.t, and genbank.t, so maybe those test suites are better places for them. Anyway, the more tests there are, and the more strenuous the tests, the better. > Ideally the output file would also be checked independently of the > Bioperl parser being tested, but that may be only possible in a limited > way (otherwise you'd end up writing a whole new parser...). But eg. if a > file format specifies that there is a maximum line width, at least check > that the output file has no lines longer than that. (Again, a real > problem, and you'll almost certainly discover some bugs related to this > if you write the tests.) Wonderful idea! This wouldn't be hard to do either. Would this be for all regular non-XML formats (sequences and alignments)? > So if you have the time, please add tests and attach them to specific > new bug reports if your tests reveal bugs > (http://bugzilla.open-bio.org/), or just email your patch(s) direct to me. > > Cheers, > Sendu. Chris From bix at sendu.me.uk Thu Sep 7 17:54:00 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 07 Sep 2006 22:54:00 +0100 Subject: [Bioperl-l] t/SeqIO.t -- improvements? In-Reply-To: <000701c6d2bc$187f0520$15327e82@pyrimidine> References: <000701c6d2bc$187f0520$15327e82@pyrimidine> Message-ID: <450094F8.9030303@sendu.me.uk> Chris Fields wrote: > > For diffs, you might be able to try a line-for-line 'diff' using Test::More > and like() (yes, I know I'm beating that drum to death). I remember seeing > some direct file comparisons somewhere in the test suite, genbank.t maybe, > in relation to Bio::Species output. I think there was a little cheating > though as the input test data looks like it was already passed through > SeqIO! That would have been my additions, and no cheating was involved, as far as I was aware, honest g'vnor! 
(Seriously, the input couldn't have gone through since previously it was incapable of generating the output that is the input... and if you understood that you win a cookie.) From nkamesh at gmail.com Thu Sep 7 17:51:50 2006 From: nkamesh at gmail.com (kamesh narasimhan) Date: Fri, 8 Sep 2006 03:21:50 +0530 Subject: [Bioperl-l] removing redundant accession numbers Message-ID: Hi ppl, I am newbie to perl/bioperl programming. I currently have a task, (which looks a bit daunting to me now...). I have a text file, in which I have a set of accession numbers and which look like this acession_numbers.txt contain: (a '>'' followed by two lower case alphabets followed by ten digits). >ci0100130090 >ci0100130320 >ci0100130340 >ci0100130574 >ci0100130090 >ci0100130804 >ci0100130945 >ci0100130986 >ci0100130090 >ci0100131137 >ci0100131140 >ci0100130320 >ci0100130340 >ci0100130804 >ci0100130945 Some of the accession numbers may be repeated in the file, like for example >ci0100130090 is repeated 3 times, >ci0100130340 is repeated 3 times etc; >ci0100130320 2 times etc; I would want the output file for a program telling me, that output file.txt >ci0100130090 - 3 times >ci0100130320 - 2 times ....... I tried perl scripting with the idea of getting to read the $/ = '>' and getting each element in an array....however, ya..i am not able to proceed....and seem to going nowhere.... any help with scripting (and if possible with comments) in this regard will be greatly appreciated. Thanks a zillion in advance From bix at sendu.me.uk Thu Sep 7 18:51:36 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 07 Sep 2006 23:51:36 +0100 Subject: [Bioperl-l] removing redundant accession numbers In-Reply-To: References: Message-ID: <4500A278.9060708@sendu.me.uk> kamesh narasimhan wrote: > Hi ppl, > > I am newbie to perl/bioperl programming. [...] 
> I tried perl scripting with the idea of getting to read the $/ = '>'
> and getting each element in an array....however, ya..i am not able to
> proceed....and seem to going nowhere....
>
> any help with scripting (and if possible with comments) in this regard
> will be greatly appreciated.

This list is for Bioperl-specific help. For learning the basics of Perl
I would suggest you visit http://learn.perl.org/.

However, a hint: the solution to your problem involves using a hash.

From cjfields at uiuc.edu Thu Sep 7 19:04:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Sep 2006 18:04:34 -0500
Subject: [Bioperl-l] removing redundant accession numbers
In-Reply-To:
Message-ID: <000401c6d2d1$fc780640$15327e82@pyrimidine>

Not necessarily a Bioperl question, but...

Use a hash and autovivification, not an array:

my %ids;
while (<>) {
    chomp;
    $ids{$_}++;    # count each accession line as it is read
}
print "$_\t$ids{$_} times\n" for sort keys %ids;

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of kamesh narasimhan
> Sent: Thursday, September 07, 2006 4:52 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] removing redundant accession numbers
>
> Hi ppl,
>
> I am newbie to perl/bioperl programming.
>
> I currently have a task, (which looks a bit daunting to me now...). I
> have a text file, in which I have a set of accession numbers and which
> look like this
>
> acession_numbers.txt contain: (a '>'' followed by two lower case
> alphabets followed by ten digits).
> > >ci0100130090 > >ci0100130320 > >ci0100130340 > >ci0100130574 > >ci0100130090 > >ci0100130804 > >ci0100130945 > >ci0100130986 > >ci0100130090 > >ci0100131137 > >ci0100131140 > >ci0100130320 > >ci0100130340 > >ci0100130804 > >ci0100130945 > > Some of the accession numbers may be repeated in the file, like for > example >ci0100130090 is repeated 3 times, >ci0100130340 is repeated 3 > times etc; >ci0100130320 2 times etc; > > I would want the output file for a program telling me, that > > output file.txt > > >ci0100130090 - 3 times > >ci0100130320 - 2 times > ....... > > I tried perl scripting with the idea of getting to read the $/ = '>' > and getting each element in an array....however, ya..i am not able to > proceed....and seem to going nowhere.... > > any help with scripting (and if possible with comments) in this regard > will be greatly appreciated. > > Thanks a zillion in advance > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nkamesh at gmail.com Thu Sep 7 17:21:33 2006 From: nkamesh at gmail.com (kamesh narasimhan) Date: Fri, 8 Sep 2006 02:51:33 +0530 Subject: [Bioperl-l] removing redundant accession numbers Message-ID: Hi ppl, I am newbie to perl/bioperl programming. I currently have a task, (which looks a bit daunting to me now...). I have a text file, in which I have a set of accession numbers and which look like this acession_numbers.txt contain: (a '>'' followed by two lower case alphabets followed by ten digits). 
>ci0100130090 >ci0100130320 >ci0100130340 >ci0100130574 >ci0100130090 >ci0100130804 >ci0100130945 >ci0100130986 >ci0100130090 >ci0100131137 >ci0100131140 >ci0100130320 >ci0100130340 >ci0100130804 >ci0100130945 Some of the accession numbers may be repeated in the file, like for example >ci0100130090 is repeated 3 times, >ci0100130340 is repeated 3 times etc; >ci0100130320 2 times etc; I would want the output file for a program telling me, that output file.txt >ci0100130090 - 3 times >ci0100130320 - 2 times ....... I tried perl scripting with the idea of getting to read the $/ = '>' and getting each element in an array....however, ya..i am not able to proceed....and seem to going nowhere.... any help with scripting (and if possible with comments) in this regard will be greatly appreciated. Thanks a zillion in advance From torsten.seemann at infotech.monash.edu.au Thu Sep 7 18:41:57 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 08 Sep 2006 08:41:57 +1000 Subject: [Bioperl-l] removing redundant accession numbers In-Reply-To: References: Message-ID: <4500A035.1060100@infotech.monash.edu.au> > acession_numbers.txt contain: (a '>'' followed by two lower case > alphabets followed by ten digits). 
> Some of the accession numbers may be repeated in the file, like for > example >ci0100130090 is repeated 3 times, >ci0100130340 is repeated 3 > times etc; >ci0100130320 2 times etc; > I would want the output file for a program telling me, that >> ci0100130090 - 3 times >> ci0100130320 - 2 times If you are on a Unix system the easiest way is to use the power of piping between shell commands: % cat acession_numbers.txt | sort | uniq -c 3 >ci0100130090 2 >ci0100130320 2 >ci0100130340 1 >ci0100130574 2 >ci0100130804 2 >ci0100130945 1 >ci0100130986 1 >ci0100131137 1 >ci0100131140 If you want to strip the '>' symbol and put the count after the accession with the 'times', just add more parts to the pipe: % cat acession_numbers.txt | sed -e 's/^>//' | sort | uniq -c | awk '{ print $2,"-",$1,"times" }' ci0100130090 - 3 times ci0100130320 - 2 times ci0100130340 - 2 times ci0100130574 - 1 times ci0100130804 - 2 times ci0100130945 - 2 times ci0100130986 - 1 times ci0100131137 - 1 times ci0100131140 - 1 times Hope that helps, -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From arareko at campus.iztacala.unam.mx Thu Sep 7 20:14:29 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 07 Sep 2006 19:14:29 -0500 Subject: [Bioperl-l] Cvs update tests fail In-Reply-To: <6dce9a0b0609071334g1d981884jf4d4a23f42763da4@mail.gmail.com> References: <04fb01c6d2b7$7a9c9c90$2f01a8c0@GOLHARMOBILE1> <6dce9a0b0609071334g1d981884jf4d4a23f42763da4@mail.gmail.com> Message-ID: <4500B5E5.2000307@campus.iztacala.unam.mx> In a similar way, the -P option removes directories that are empty after running the update: cvs up -dP Mauricio. Lincoln Stein wrote: > When you run "cvs update" be sure to use the -d option in order to create > new directories. You probably don't have a Bio/DB/SeqFeature directory at > all! 
> > cvs update -d > > Lincoln > > On 9/7/06, Ryan Golhar wrote: >> So I've updated my installation of bioperl from cvs using 'cvs update'. >> I reran 'perl Makefile.PL' followed by 'make; make test'. >> >> I see the following errors from 'make test': >> >> t/BioDBSeqFeature............Can't locate Bio/DB/SeqFeature/Store.pm in >> @INC (@INC contains: /users/golharam/cvswork/bioperl-live/ . .. >> ./blib/lib t /users/golharam/bioperl/bioperl-live/blib/lib >> /users/golharam/bioperl/bioperl-live/blib/arch >> /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 >> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl >> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at >> t/BioDBSeqFeature.t line 30. >> BEGIN failed--compilation aborted at t/BioDBSeqFeature.t line 30. >> t/BioDBSeqFeature............dubious >> >> Test returned status 2 (wstat 512, 0x200) >> Scalar found where operator expected at (eval 152) line 1, near "'int' >> $__val" >> (Missing operator before $__val?) >> DIED. FAILED tests 1-43 >> Failed 43/43 tests, 0.00% okay >> t/BioDBSeqFeature_BDB........Can't locate Bio/DB/SeqFeature/Store.pm in >> @INC (@INC contains: /users/golharam/cvswork/bioperl-live/ . .. >> ./blib/lib t /users/golharam/bioperl/bioperl-live/blib/lib >> /users/golharam/bioperl/bioperl-live/blib/arch >> /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 >> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl >> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at >> t/BioDBSeqFeature.t line 30. >> BEGIN failed--compilation aborted at t/BioDBSeqFeature.t line 30. >> t/BioDBSeqFeature_BDB........FAILED tests 1-43 >> >> Failed 43/43 tests, 0.00% okay >> >> Any ideas? 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From nkamesh at gmail.com Fri Sep 8 00:04:24 2006 From: nkamesh at gmail.com (kamesh narasimhan) Date: Fri, 08 Sep 2006 09:34:24 +0530 Subject: [Bioperl-l] removing redundant accession numbers In-Reply-To: <4500A035.1060100@infotech.monash.edu.au> References: <4500A035.1060100@infotech.monash.edu.au> Message-ID: Thanks a lot.. that lil bit of commands did the job.... On 9/8/06, Torsten Seemann wrote: > > acession_numbers.txt contain: (a '>'' followed by two lower case > > alphabets followed by ten digits). > > Some of the accession numbers may be repeated in the file, like for > > example >ci0100130090 is repeated 3 times, >ci0100130340 is repeated 3 > > times etc; >ci0100130320 2 times etc; > > I would want the output file for a program telling me, that > >> ci0100130090 - 3 times > >> ci0100130320 - 2 times > > If you are on a Unix system the easiest way is to use the power of > piping between shell commands: > > % cat acession_numbers.txt | sort | uniq -c > > 3 >ci0100130090 > 2 >ci0100130320 > 2 >ci0100130340 > 1 >ci0100130574 > 2 >ci0100130804 > 2 >ci0100130945 > 1 >ci0100130986 > 1 >ci0100131137 > 1 >ci0100131140 > > If you want to strip the '>' symbol and put the count after the > accession with the 'times', just add more parts to the pipe: > > % cat acession_numbers.txt > | sed -e 's/^>//' > | sort > | uniq -c > | awk '{ print $2,"-",$1,"times" }' > > ci0100130090 - 3 times > ci0100130320 - 2 times > ci0100130340 - 2 times > ci0100130574 - 1 times > ci0100130804 - 2 times > ci0100130945 - 2 times > ci0100130986 - 1 times > ci0100131137 - 1 times > ci0100131140 - 1 times > > Hope that helps, > > -- > Dr Torsten
Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > > From phil.princely at gmail.com Fri Sep 8 01:53:31 2006 From: phil.princely at gmail.com (Phil Princely) Date: Fri, 8 Sep 2006 14:53:31 +0900 Subject: [Bioperl-l] formatdb Message-ID: hi all I'm new to this list, and new to BioPerl, so excuse me if this is off topic. My background is computers, but I'm working with BLAST at the moment. I've installed BLAST and have it running nicely on MacOS X. I found that entering a new database for use in WWWBLAST was tedious, with 5 or so steps, so I wrote a perl script which does the following: takes as input a FASTA file name and n or p, depending on whether it is a nucleotide or protein sequence; runs it through formatdb with the correct flags; moves the output files to the correct directory; changes the config and html files so that the new database is accessible in WWWBLAST. My script isn't the prettiest, or the most error resistant. Then I found BioPerl and was wondering if the same functionality is built in. So my question is.. Is there an easy way to run formatdb and change the relevant files using BioPerl? Sorry if this is off topic, and thanks in advance Phil P. From ewijaya at i2r.a-star.edu.sg Fri Sep 8 05:34:43 2006 From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward) Date: Fri, 08 Sep 2006 17:34:43 +0800 Subject: [Bioperl-l] Meaning of Scores and Sample code for Bio::Matrix::Scoring Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061531@mailbe01.teak.local.net> Hi, Where can I find further information on the meaning of the values returned by Bio::Matrix::Scoring functions? For example, I would like to know the meaning of these scores and how they are calculated. 1. expected_score 2. Lambda (is this the prior probability of a substring generated according to background input sequence probability or something else?) And what values should we pass to them? e.g.
it is not clear to me what does $newval refer to in this context: $obj->expected_score($newval) Perhaps a pointer of sample script that uses these score can be useful also. The information provided in the doc is quite minimum. Regards, Edward WIJAYA SINGAPORE ------------ Institute For Infocomm Research - Disclaimer ------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
-------------------------------------------------------- From hlapp at gmx.net Fri Sep 8 08:34:09 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 8 Sep 2006 08:34:09 -0400 Subject: [Bioperl-l] removing redundant accession numbers In-Reply-To: <000401c6d2d1$fc780640$15327e82@pyrimidine> References: <000401c6d2d1$fc780640$15327e82@pyrimidine> Message-ID: Actually, this is trivial in Unix: $ sort my.list.of.accessions | uniq -c -hilmar On Sep 7, 2006, at 7:04 PM, Chris Fields wrote: > Not necessarily a Bioperl question, but... > > Use a hash and autovivification, not an array: > > my %ids; > > while (<>) { > chomp; > $ids{$_}++; > } > > print "$_\t$ids{$_} times\n" for sort keys %ids; > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of kamesh narasimhan >> Sent: Thursday, September 07, 2006 4:52 PM >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] removing redundant accession numbers >> >> Hi ppl, >> >> I am newbie to perl/bioperl programming. >> >> I currently have a task, (which looks a bit daunting to me now...). I >> have a text file, in which I have a set of accession numbers and >> which >> look like this >> >> acession_numbers.txt contain: (a '>'' followed by two lower case >> alphabets followed by ten digits). >> >>> ci0100130090 >>> ci0100130320 >>> ci0100130340 >>> ci0100130574 >>> ci0100130090 >>> ci0100130804 >>> ci0100130945 >>> ci0100130986 >>> ci0100130090 >>> ci0100131137 >>> ci0100131140 >>> ci0100130320 >>> ci0100130340 >>> ci0100130804 >>> ci0100130945 >> >> Some of the accession numbers may be repeated in the file, like for >> example >ci0100130090 is repeated 3 times, >ci0100130340 is >> repeated 3 >> times etc; >ci0100130320 2 times etc; >> >> I would want the output file for a program telling me, that >> >> output file.txt >> >>> ci0100130090 - 3 times >>> ci0100130320 - 2 times >> ....... 
>> >> I tried perl scripting with the idea of getting to read the $/ = '>' >> and getting each element in an array....however, ya..i am not able to >> proceed....and seem to going nowhere.... >> >> any help with scripting (and if possible with comments) in this >> regard >> will be greatly appreciated. >> >> Thanks a zillion in advance >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From osborne1 at optonline.net Fri Sep 8 09:44:15 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 08 Sep 2006 09:44:15 -0400 Subject: [Bioperl-l] formatdb In-Reply-To: Message-ID: Phil, Sadly, no. There really should be such a thing, or even just a module to run formatdb, it could be part of the bioperl-run package. Brian O. On 9/8/06 1:53 AM, "Phil Princely" wrote: > Is there an easy way to run formatdb and change the relevant files > using BioPerl? From staffa at niehs.nih.gov Fri Sep 8 10:13:31 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Fri, 8 Sep 2006 10:13:31 -0400 Subject: [Bioperl-l] Length of a sequence. Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08B4A@NIHCESMLBX6.nih.gov> Wondering why I'm having trouble finding out how to get the total length of a sequence. I am doing wonderfully parsing all the features information. y'd think this would be simple. What's the best thing to do? use Bio::SeqIO; # Get the sequence ... 
$seqio_object = Bio::SeqIO->new(-file => "$filename" ); $seq_object = $seqio_object->next_seq; >>> What's the magic code to put HERE????<<<<<<<<<<<<<<<<<<< print "Sequence $fileroot has length $len\n"; # Get the features as an array of objects # This seems to get them all. Makes an array of objects. my @features = $seq_object->get_SeqFeatures(); # just top level Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From dulebl at blic.net Fri Sep 8 10:28:09 2006 From: dulebl at blic.net (dulebl at blic.net) Date: Fri, 8 Sep 2006 10:28:09 -0400 Subject: [Bioperl-l] A Bioperl problem Message-ID: <200609081428.k88ES9ns010172@newportal.open-bio.org> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060908/9c919bc9/attachment.pl From cjfields at uiuc.edu Fri Sep 8 10:31:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Sep 2006 09:31:27 -0500 Subject: [Bioperl-l] Length of a sequence. In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08B4A@NIHCESMLBX6.nih.gov> Message-ID: <000f01c6d353$785831f0$15327e82@pyrimidine> Nick, It should be '$seq_object->length()'. If you need a list of the methods for a particular Bioperl object type, I recommend the Deobfuscator: http://www.bioperl.org/wiki/Deobfuscator http://bioperl.org/cgi-bin/deob_interface.cgi Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS) [C] > Sent: Friday, September 08, 2006 9:14 AM > To: bioperl-l > Subject: [Bioperl-l] Length of a sequence. > > Wondering why I'm having trouble finding out how to get the total length > of a sequence. > I am doing wonderfully parsing all the features information. > y'd think this would be simple. > What's the best thing to do? > > > use Bio::SeqIO; > # Get the sequence > ... > $seqio_object = Bio::SeqIO->new(-file => "$filename" ); > $seq_object = $seqio_object->next_seq; > >>> What's the magic code to put HERE????<<<<<<<<<<<<<<<<<<< > print "Sequence $fileroot has length $len\n"; > # Get the features as an array of objects > # This seems to get them all. Makes an array of objects. > my @features = $seq_object->get_SeqFeatures(); # just top level > > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From skirov at utk.edu Fri Sep 8 10:41:34 2006 From: skirov at utk.edu (skirov) Date: Fri, 8 Sep 2006 10:41:34 -0400 Subject: [Bioperl-l] Length of a sequence. Message-ID: <451C107D@webmail.utk.edu> Sorry, I meant perldoc Bio::Seq of course (/search length) or you can do plain length($seq_object->seq); My point is - read the object documentation first- it would have most answers to your questions. 
>===== Original Message From "Staffa, Nick (NIH/NIEHS) [C]" ===== >Wondering why I'm having trouble finding out how to get the total length of a sequence. >I am doing wonderfully parsing all the features information. >y'd think this would be simple. >What's the best thing to do? > > >use Bio::SeqIO; ># Get the sequence >.. >$seqio_object = Bio::SeqIO->new(-file => "$filename" ); >$seq_object = $seqio_object->next_seq; >>> What's the magic code to put HERE????<<<<<<<<<<<<<<<<<<< >print "Sequence $fileroot has length $len\n"; ># Get the features as an array of objects ># This seems to get them all. Makes an array of objects. >my @features = $seq_object->get_SeqFeatures(); # just top level > > > >Nick Staffa >Telephone: 919-316-4569 (NIEHS: 6-4569) >Scientific Computing Support Group >NIEHS Information Technology Support Services Contract >(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) >National Institute of Environmental Health Sciences >National Institutes of Health >Research Triangle Park, North Carolina > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Sep 8 11:10:41 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Sep 2006 10:10:41 -0500 Subject: [Bioperl-l] A Bioperl problem In-Reply-To: <200609081428.k88ES9ns010172@newportal.open-bio.org> Message-ID: <001601c6d358$f31a2ec0$15327e82@pyrimidine> Swissprot access was fixed in CVS. You can update from there using the instructions on the wiki. http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows If you can wait a bit longer, we will try to release a PPM for WinXP ActivePerl in the next developer release. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of dulebl at blic.net > Sent: Friday, September 08, 2006 9:28 AM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] A Bioperl problem > > > I have a problem to access Swiss-prot database and I suspect that it is a > instaling or set up problem. > I use Windows XP sp2 have installed ActivePerl 5.8.7.815 and Bioperl > 1.2.3. and > later I installed Bioperl-1.4 > Well, I tried get sequence from swiss-prot but it all the time writes: > > b From skirov at utk.edu Fri Sep 8 10:35:20 2006 From: skirov at utk.edu (skirov) Date: Fri, 8 Sep 2006 10:35:20 -0400 Subject: [Bioperl-l] Length of a sequence. Message-ID: <451BF908@webmail.utk.edu> Surprisingly the method is length: perldoc -f length Bio::Seq >===== Original Message From "Staffa, Nick (NIH/NIEHS) [C]" ===== >Wondering why I'm having trouble finding out how to get the total length of a sequence. >I am doing wonderfully parsing all the features information. >y'd think this would be simple. >What's the best thing to do? > > >use Bio::SeqIO; ># Get the sequence >.. >$seqio_object = Bio::SeqIO->new(-file => "$filename" ); >$seq_object = $seqio_object->next_seq; >>> What's the magic code to put HERE????<<<<<<<<<<<<<<<<<<< >print "Sequence $fileroot has length $len\n"; ># Get the features as an array of objects ># This seems to get them all. Makes an array of objects. >my @features = $seq_object->get_SeqFeatures(); # just top level > > > >Nick Staffa >Telephone: 919-316-4569 (NIEHS: 6-4569) >Scientific Computing Support Group >NIEHS Information Technology Support Services Contract >(Science Task Monitor: Jack L. 
Field( field1 at niehs.nih.gov ) >National Institute of Environmental Health Sciences >National Institutes of Health >Research Triangle Park, North Carolina > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From georose at gmail.com Fri Sep 8 17:04:48 2006 From: georose at gmail.com (GeOrgE RosEnbERg) Date: Fri, 8 Sep 2006 15:04:48 -0600 Subject: [Bioperl-l] bioperl installation on fedora core 5 Message-ID: <54da06110609081404n7ad4e496l2b226b3dad23f38d@mail.gmail.com> Dear Bioperl, I have installed bioperl 1.5.1 on a fedora core 5 x64 machine and I am having problems parsing information out of blast reports. I went back and started to run bptutorial.pl and I am getting the following error when running #5: Beginning example of sequence manipulation without explicit Seq objects... -------------------- WARNING --------------------- MSG: id (ROA1_HUMAN) does not exist --------------------------------------------------- Can't call method "display_id" on an undefined value at bptutorial.pl line 3945. Could this error be related to my parsing problem or show that I have not installed bioperl correctly? Thank you, George From cjfields at uiuc.edu Fri Sep 8 17:26:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Sep 2006 16:26:24 -0500 Subject: [Bioperl-l] bioperl installation on fedora core 5 In-Reply-To: <54da06110609081404n7ad4e496l2b226b3dad23f38d@mail.gmail.com> Message-ID: <000001c6d38d$709d8a20$15327e82@pyrimidine> You will likely need to update to CVS for proper BLAST parsing, especially if it is using a remote blast (I can't remember bp #5 off the top of my head). NCBI had changed BLAST output formatting which broke BLAST parsing in bioperl 1.5.1. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of GeOrgE RosEnbERg > Sent: Friday, September 08, 2006 4:05 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] bioperl installation on fedora core 5 > > Dear Bioperl, > > I have installed bioperl 1.5.1 on a fedora core 5 x64 machine and I am > having problems parsing information out of blast reports. > > I went back and started to run bptutorial.pl and I am getting the > following > error when running #5: > > Beginning example of sequence manipulation without explicit Seq objects... > > -------------------- WARNING --------------------- > MSG: id (ROA1_HUMAN) does not exist > --------------------------------------------------- > Can't call method "display_id" on an undefined value at bptutorial.pl line > 3945. > > Could this error be related to my parsing problem or show that I have not > installed bioperl correctly? > > Thank you, > > George > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mmacho at gmail.com Sat Sep 9 04:31:44 2006 From: mmacho at gmail.com (ende) Date: Sat, 9 Sep 2006 10:31:44 +0200 Subject: [Bioperl-l] Blast temporary open files not closed Message-ID: Processing a FASTA file with about 500 DNA sequences, my MacOSX (which allows at most 512 open files) crashes... You need to split the problem into pieces or (in the bash shell, with ulimit -n 1024) raise the maximum number of open files. This makes no sense to me, since my Perl program does not leave any file open without a corresponding close. On the other hand, the problem arises when the number of DNA sequences grows _in one file_. In the code I run blast (StandAloneBlast... $blastMachine->blastall) for each seq.
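Before raising the limit, it can help to compare the descriptor limit with the number of files a process actually holds open. The following is only a minimal sketch, not part of BioPerl or BLAST: the function name count_open_fds is made up for illustration, and it assumes either a Linux-style /proc directory or an installed lsof (the route that would apply on Mac OS X).

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): count the file
# descriptors currently held open by the process with the given PID.
count_open_fds() {
    pid=$1
    if [ -d "/proc/$pid/fd" ]; then
        # Linux: /proc/<pid>/fd holds one entry per open descriptor.
        ls "/proc/$pid/fd" | wc -l | tr -d ' '
    else
        # Elsewhere: keep only lsof rows whose FD column starts with a
        # digit (e.g. 0u, 4r), skipping the header and cwd/txt entries.
        lsof -p "$pid" 2>/dev/null | awk 'NR > 1 && $4 ~ /^[0-9]/' | wc -l | tr -d ' '
    fi
}

# Soft limit on open files for this shell and the programs it starts.
echo "soft limit: $(ulimit -n)"

# Descriptors held open by the current shell ($$ is its PID).
echo "open now:   $(count_open_fds $$)"
```

Called with the PID of the running Perl program instead of $$, a count that climbs toward the limit as sequences are processed would point to descriptors not being released. Running ulimit -n 1024 in the same shell raises the soft limit for programs started afterwards, as far as the hard limit allows.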
Then, inspecting the stopped Perl program, I confirmed my suspicions. BioPerl (StandAloneBlast) does not close the temporary files it opens. Those files seem to be created to hold the sequences that are then processed by the blastall program... The output of lsof (like the MacOSX System Monitor) indicates that those files are still held open but no longer exist on disk (!?) The output of lsof +p pidofperlprogram COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME perl 21977 juanfc cwd VDIR 14,7 238 6835993 /Users/ juanfc/Documents/programperl 21977 juanfc txt VREG 14,7 19280 1589055 /usr/bin/perl perl 21977 juanfc txt VREG 14,7 23476 1580272 /System/ Library/Perl/5.8.6/darwin-thread-multi-2level/auto/IO/IO.bundle perl 21977 juanfc txt VREG 14,7 17772 1580263 /System/ Library/Perl/5.8.6/darwin-thread-multi-2level/auto/Fcntl/Fcntl.bundle perl 21977 juanfc txt VREG 14,7 114116 1580381 /System/ Library/Perl/5.8.6/darwin-thread-multi-2level/auto/POSIX/POSIX.bundle perl 21977 juanfc txt VREG 14,7 23684 1580265 /System/ Library/Perl/5.8.6/darwin-thread-multi-2level/auto/File/Glob/Glob.bundle perl 21977 juanfc txt VREG 14,7 1797788 6275687 /usr/lib/dyld perl 21977 juanfc txt VREG 14,7 4379472 6276030 /usr/lib/ libSystem.B.dylib perl 21977 juanfc txt VREG 14,7 1086420 6276221 /System/ Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/libperl.dylib perl 21977 juanfc 0u VCHR 4,2 0t3748 63113092 /dev/ttyp2 perl 21977 juanfc 1u VCHR 4,2 0t3748 63113092 /dev/ttyp2 perl 21977 juanfc 2u VCHR 4,2 0t3748 63113092 /dev/ttyp2 perl 21977 juanfc 3u VCHR 4,2 0t3748 63113092 /dev/ttyp2 perl 21977 juanfc 4r VREG stat(/ private/tmp/FJPRR4sVob): No such file or directory perl 21977 juanfc 5r VREG stat(/ private/tmp/MYkH56SwfZ): No such file or directory perl 21977 juanfc 6r VREG stat(/ private/tmp/6mgSei6w73): No such file or directory perl 21977 juanfc 7r VREG stat(/ private/tmp/VYBpVSlg59): No such file or directory perl 21977 juanfc 8r VREG stat(/ private/tmp/cfMJ74N25D): No such file or directory perl 21977 juanfc
9r VREG stat(/ private/tmp/M6fgaPCIXa): No such file or directory perl 21977 juanfc 10r VREG stat(/ private/tmp/tMap0OjaUP): No such file or directory perl 21977 juanfc 11r VREG stat(/ private/tmp/W87vtYjsCt): No such file or directory perl 21977 juanfc 12r VREG stat(/ private/tmp/iHWH3bCMPC): No such file or directory perl 21977 juanfc 13r VREG stat(/ private/tmp/IxcDMdaDyv): No such file or directory perl 21977 juanfc 14r VREG stat(/ private/tmp/iKBhFmI6VR): No such file or directory perl 21977 juanfc 15r VREG stat(/ private/tmp/8JnbRBsiTS): No such file or directory perl 21977 juanfc 16r VREG stat(/ private/tmp/hZWlCYbOsY): No such file or directory perl 21977 juanfc 17r VREG stat(/ private/tmp/8p1Vp3jmkV): No such file or directory perl 21977 juanfc 18r VREG stat(/ private/tmp/tClGgoph8f): No such file or directory perl 21977 juanfc 19r VREG stat(/ private/tmp/60BSipkClP): No such file or directory perl 21977 juanfc 20r VREG stat(/ private/tmp/aWZeJiVHvV): No such file or directory perl 21977 juanfc 21r VREG stat(/ private/tmp/87xqhVYjV1): No such file or directory perl 21977 juanfc 22r VREG stat(/ private/tmp/06m1ko4XjK): No such file or directory perl 21977 juanfc 23r VREG stat(/ private/tmp/s7P3AA6F9o): No such file or directory perl 21977 juanfc 24r VREG stat(/ private/tmp/dJt3mpTS8m): No such file or directory perl 21977 juanfc 25r VREG stat(/ private/tmp/xqL5jguilE): No such file or directory perl 21977 juanfc 26r VREG stat(/ private/tmp/AFWKFmgBh2): No such file or directory perl 21977 juanfc 27r VREG stat(/ private/tmp/zWLMrdNlv9): No such file or directory perl 21977 juanfc 28r VREG stat(/ private/tmp/svOtPNyYIs): No such file or directory perl 21977 juanfc 29r VREG stat(/ private/tmp/WpVvGitW33): No such file or directory perl 21977 juanfc 30r VREG stat(/ private/tmp/pyoXSEsRvM): No such file or directory perl 21977 juanfc 31r VREG stat(/ private/tmp/JoBSaJNqf6): No such file or directory perl 21977 juanfc 32r VREG stat(/ private/tmp/Uw9ABNTqcN): No 
such file or directory perl 21977 juanfc 33r VREG stat(/ private/tmp/wG4la2L5if): No such file or directory perl 21977 juanfc 34r VREG stat(/ private/tmp/IHTz2BeDOv): No such file or directory perl 21977 juanfc 35r VREG stat(/ private/tmp/t9sXizoEim): No such file or directory perl 21977 juanfc 36r VREG stat(/ private/tmp/DJDliIFd30): No such file or directory perl 21977 juanfc 37r VREG stat(/ private/tmp/atGFL3GEa6): No such file or directory perl 21977 juanfc 38r VREG stat(/ private/tmp/PuwFdWLjw0): No such file or directory perl 21977 juanfc 39r VREG stat(/ private/tmp/XoYmuh7JeD): No such file or directory perl 21977 juanfc 40r VREG stat(/ private/tmp/tkXqKOoQ4L): No such file or directory perl 21977 juanfc 41r VREG stat(/ private/tmp/XyQTSlDWZK): No such file or directory perl 21977 juanfc 42r VREG stat(/ private/tmp/sIcuE8SxrK): No such file or directory perl 21977 juanfc 43r VREG stat(/ private/tmp/lyhfZf9XlK): No such file or directory perl 21977 juanfc 44r VREG stat(/ private/tmp/5XK8C6WySy): No such file or directory perl 21977 juanfc 45r VREG stat(/ private/tmp/i4SUNdxxJP): No such file or directory perl 21977 juanfc 46r VREG stat(/ private/tmp/sCT2jmC3Ti): No such file or directory perl 21977 juanfc 47r VREG stat(/ private/tmp/eW27HGMYEH): No such file or directory perl 21977 juanfc 48r VREG stat(/ private/tmp/iKDldazryA): No such file or directory perl 21977 juanfc 49r VREG stat(/ private/tmp/Fim4GJg3UY): No such file or directory perl 21977 juanfc 50r VREG stat(/ private/tmp/iJNWvR2ixG): No such file or directory perl 21977 juanfc 51r VREG stat(/ private/tmp/U9OA0HiQgm): No such file or directory perl 21977 juanfc 52r VREG stat(/ private/tmp/URFitXhlku): No such file or directory perl 21977 juanfc 53r VREG stat(/ private/tmp/IauwZ1ogs6): No such file or directory perl 21977 juanfc 54r VREG stat(/ private/tmp/S5pc159wKf): No such file or directory perl 21977 juanfc 55r VREG stat(/ private/tmp/YENXM1G78k): No such file or directory perl 21977 juanfc 56r 
VREG stat(/ private/tmp/uDruzEPBaM): No such file or directory perl 21977 juanfc 57r VREG stat(/ private/tmp/NmW0DL9xaQ): No such file or directory perl 21977 juanfc 58r VREG stat(/ private/tmp/nn5Mw2ZfGv): No such file or directory perl 21977 juanfc 59r VREG stat(/ private/tmp/Pw7byHTVUb): No such file or directory perl 21977 juanfc 60r VREG stat(/ private/tmp/2AXMWaFtrM): No such file or directory perl 21977 juanfc 61r VREG stat(/ private/tmp/EU1vNIqLF1): No such file or directory perl 21977 juanfc 62r VREG stat(/ private/tmp/uph6n3kGfL): No such file or directory perl 21977 juanfc 63r VREG stat(/ private/tmp/g5WLuOEcrG): No such file or directory perl 21977 juanfc 64r VREG stat(/ private/tmp/aSvXqPRW46): No such file or directory perl 21977 juanfc 65r VREG stat(/ private/tmp/aJfHCcsEKx): No such file or directory perl 21977 juanfc 66r VREG stat(/ private/tmp/L7vRyF2kOX): No such file or directory perl 21977 juanfc 67r VREG stat(/ private/tmp/gND8bhWI9B): No such file or directory perl 21977 juanfc 68r VREG stat(/ private/tmp/4hukSHCFDZ): No such file or directory perl 21977 juanfc 69r VREG stat(/ private/tmp/6oKfCqu4D0): No such file or directory perl 21977 juanfc 70r VREG stat(/ private/tmp/5W8jmOltQt): No such file or directory perl 21977 juanfc 71r VREG stat(/ private/tmp/zJdLHNfwNb): No such file or directory perl 21977 juanfc 72r VREG stat(/ private/tmp/MLuGvv77HZ): No such file or directory perl 21977 juanfc 73r VREG stat(/ private/tmp/1ThCRyzfTx): No such file or directory perl 21977 juanfc 74r VREG stat(/ private/tmp/pnlSoTXEhf): No such file or directory perl 21977 juanfc 75r VREG stat(/ private/tmp/BYK4UnHbHF): No such file or directory perl 21977 juanfc 76r VREG stat(/ private/tmp/Omp320pcne): No such file or directory perl 21977 juanfc 77r VREG stat(/ private/tmp/oWgmotebi5): No such file or directory perl 21977 juanfc 78r VREG stat(/ private/tmp/p5sAeD6P2S): No such file or directory perl 21977 juanfc 79r VREG stat(/ private/tmp/jSTlPlrVv6): No such 
file or directory perl 21977 juanfc 80r VREG stat(/ private/tmp/8r6Hcsd5lk): No such file or directory perl 21977 juanfc 81r VREG stat(/ private/tmp/zziCynavEX): No such file or directory perl 21977 juanfc 82r VREG stat(/ private/tmp/cFhq1SuNLW): No such file or directory perl 21977 juanfc 83r VREG stat(/ private/tmp/dUDKQ3ylKn): No such file or directory perl 21977 juanfc 84r VREG stat(/ private/tmp/gVzwuTxImT): No such file or directory perl 21977 juanfc 85r VREG stat(/ private/tmp/iVMuHYGpdD): No such file or directory perl 21977 juanfc 86r VREG stat(/ private/tmp/TYpIJ0XS5r): No such file or directory perl 21977 juanfc 87r VREG stat(/ private/tmp/Fm8AGy6o9J): No such file or directory perl 21977 juanfc 88r VREG stat(/ private/tmp/J4gj6DGGVI): No such file or directory perl 21977 juanfc 89r VREG stat(/ private/tmp/Q0dKLWOJwc): No such file or directory perl 21977 juanfc 90r VREG stat(/ private/tmp/N3D1bRmMyw): No such file or directory perl 21977 juanfc 91r VREG stat(/ private/tmp/UajHqmaUDM): No such file or directory perl 21977 juanfc 92r VREG stat(/ private/tmp/4JYJIScAdA): No such file or directory perl 21977 juanfc 93r VREG stat(/ private/tmp/f6EBKb9nw9): No such file or directory perl 21977 juanfc 94r VREG stat(/ private/tmp/IRuBwvfHMS): No such file or directory perl 21977 juanfc 95r VREG stat(/ private/tmp/kxJdU1yPB9): No such file or directory perl 21977 juanfc 96r VREG stat(/ private/tmp/m1C5ffIUVF): No such file or directory perl 21977 juanfc 97r VREG stat(/ private/tmp/gcVcrpYtoT): No such file or directory perl 21977 juanfc 98r VREG stat(/ private/tmp/NY19yFrBZs): No such file or directory perl 21977 juanfc 99r VREG stat(/ private/tmp/mzFibP1FqR): No such file or directory perl 21977 juanfc 100r VREG stat(/ private/tmp/oUZRAW6h3l): No such file or directory perl 21977 juanfc 101r VREG stat(/ private/tmp/3zErWi27lP): No such file or directory perl 21977 juanfc 102r VREG stat(/ private/tmp/5wrMPduqip): No such file or directory perl 21977 juanfc 103r 
VREG stat(/ private/tmp/O5rhMCu28A): No such file or directory Please, help. -- Juan Falgueras Profesor del Depto. de Lenguajes y Ciencias de la Computación Universidad de Málaga From hlapp at gmx.net Sat Sep 9 08:16:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 9 Sep 2006 08:16:05 -0400 Subject: [Bioperl-l] Blast temporary open files not closed In-Reply-To: References: Message-ID: <77107E48-E645-4764-B0E2-64C0221A3E54@gmx.net> Which version of Bioperl, which version of Mac OSX? You may also need to post your script or an equivalent script that shows the problem. There are plenty of us who run this sort of thing without problem, so I suspect that the way you do things in your script has some bearing on the problem. On Sep 9, 2006, at 4:31 AM, ende wrote: > > Processing a fasta file with about 500 dna seqs.. my MacOSX (that > has the max number of opened files up to 512) crashes... You need to > divide the problem in pieces or (in bash shell, with ulimit -n 1024) > augment that max number of opened files. > > This has no sense for me since my perl program nor leave any open > file without its corresponding closing. On the other side, the > problem arises when the number of dnas grows _in one file_. > > In the code I run blast (StandAloneBlast... $blastMachine->blastall) > for each seq. > > > Then sniffing int the perl program stopped perl program I confirmed > my suspects. BioPerl (StandAloneBlast) does not closes temporary > opened files. Those files seems to be created to save seqs for to be > then processed by blastall program... The output of lsof indicates > (as MacOSX System Monitor) that those files are left opened but not > there (!?)
>
> The output of lsof +p pidofperlprogram
>
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> [...]
> perl 21977 juanfc 101r VREG stat(/private/tmp/3zErWi27lP): No such file
or directory
> perl 21977 juanfc 102r VREG stat(/private/tmp/5wrMPduqip): No such file or directory
> perl 21977 juanfc 103r VREG stat(/private/tmp/O5rhMCu28A): No such file or directory
>
>
>
> Please, help.
>
>
> --
> Juan Falgueras
> Profesor del Depto. de Lenguajes y Ciencias de la Computación
> Universidad de Málaga
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From osborne1 at optonline.net Sat Sep 9 09:45:06 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 09 Sep 2006 09:45:06 -0400
Subject: [Bioperl-l] Blast temporary open files not closed
In-Reply-To:
Message-ID:

Juan,

I recall a bug like this was fixed a while back - what version of Bioperl
are you using? By the way, always give version numbers when reporting a
bug; the answer "already fixed" is very common.

Brian O.

On 9/9/06 4:31 AM, "ende" wrote:

>
> Processing a fasta file with about 500 dna seqs.. my MacOSX (that
> has the max number of opened files up to 512) crashes... You need to
> divide the problem in pieces or (in bash shell, with ulimit -n 1024)
> augment that max number of opened files.
>
> This has no sense for me since my perl program nor leave any open
> file without its corresponding closing. On the other side, the
> problem arises when the number of dnas grows _in one file_.
>
> In the code I run blast (StandAloneBlast... $blastMachine->blastall)
> for each seq.
>
>
> Then sniffing int the perl program stopped perl program I confirmed
> my suspects. BioPerl (StandAloneBlast) does not closes temporary
> opened files. Those files seems to be created to save seqs for to be
> then processed by blastall program...
> The output of lsof indicates
> (as MacOSX System Monitor) that those files are left opened but not
> there (!?)
>
> The output of lsof +p pidofperlprogram
>
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> [...]
> perl 21977 juanfc 100r VREG stat(/private/tmp/oUZRAW6h3l): No
such file or directory
> perl 21977 juanfc 101r VREG stat(/private/tmp/3zErWi27lP): No such file or directory
> perl 21977 juanfc 102r VREG stat(/private/tmp/5wrMPduqip): No such file or directory
> perl 21977 juanfc 103r VREG stat(/private/tmp/O5rhMCu28A): No such file or directory
>
>
>
> Please, help.
>
>
> --
> Juan Falgueras
> Profesor del Depto. de Lenguajes y Ciencias de la Computación
> Universidad de Málaga
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Sat Sep 9 14:06:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 9 Sep 2006 13:06:47 -0500
Subject: [Bioperl-l] Blast temporary open files not closed
In-Reply-To:
Message-ID: <000001c6d43a$bb83b340$15327e82@pyrimidine>

According to the mail list archives the tempfile issue with StandAloneBlast
was supposed to be fixed by version 1.4.

http://article.gmane.org/gmane.comp.lang.perl.bio.general/4677

The above post also has a possible fix (using '$factory->io->_io_cleanup()').

As has been previously mentioned by Brian and Hilmar, having example code,
the OS, and the bioperl version would help in future posts.

Chris

> Processing a fasta file with about 500 dna seqs.. my MacOSX (that
> has the max number of opened files up to 512) crashes... You need to
> divide the problem in pieces or (in bash shell, with ulimit -n 1024)
> augment that max number of opened files.
>
> This has no sense for me since my perl program nor leave any open
> file without its corresponding closing. On the other side, the
> problem arises when the number of dnas grows _in one file_.
>
> In the code I run blast (StandAloneBlast... $blastMachine->blastall)
> for each seq.
>
>
> Then sniffing int the perl program stopped perl program I confirmed
> my suspects. BioPerl (StandAloneBlast) does not closes temporary
> opened files.
> Those files seems to be created to save seqs for to be
> then processed by blastall program... The output of lsof indicates
> (as MacOSX System Monitor) that those files are left opened but not
> there (!?)
>
> The output of lsof +p pidofperlprogram
>
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> perl 21977 juanfc cwd VDIR 14,7 238 6835993 /Users/
> juanfc/Documents/programperl 21977 juanfc txt VREG 14,7 ....
> perl 21977 juanfc 102r VREG stat(/private/tmp/5wrMPduqip): No such file or directory
> perl 21977 juanfc 103r VREG stat(/private/tmp/O5rhMCu28A): No such file or directory
>
>
>
> Please, help.
>
>
> --
> Juan Falgueras
> Profesor del Depto. de Lenguajes y Ciencias de la Computación
> Universidad de Málaga

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From dr.hogart at gmail.com Sat Sep 9 16:16:34 2006
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Sun, 10 Sep 2006 00:16:34 +0400
Subject: [Bioperl-l] how to select the best hsp?
Message-ID:

Hi all.
How can I select the best HSP in each hit when parsing BLAST results
with BioPerl?
Thank you in advance.

From dr.hogart at gmail.com Sat Sep 9 16:22:29 2006
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Sun, 10 Sep 2006 00:22:29 +0400
Subject: [Bioperl-l] how to select the best hsp?
Message-ID:

Hi all.
How can I select the best HSP in each hit when parsing BLAST results
with BioPerl?
Thank you in advance.

From maximilianh at gmail.com Sat Sep 9 17:03:05 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Sat, 9 Sep 2006 23:03:05 +0200
Subject: [Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?
In-Reply-To: <001501c6c619$37167ae0$15327e82@pyrimidine>
References: <001501c6c619$37167ae0$15327e82@pyrimidine>
Message-ID: <76f031ae0609091403x17148915g5ca2857b9cbb1e74@mail.gmail.com>

(just found this old mail, sorry for the delay, reading too many mailing
lists, how many hours do you guys spend on reading mailing lists per day???)

Transfac versions 3.4 and 4.0 (if I remember well) had a much more open
licence; at that time you were still allowed to download and distribute the
file (you can still find these old versions on the net, e.g.
http://biotech.embl-ebi.ac.uk:8400/sw/common/test/matrix.dat).
I guess an older version could be used for the test cases in Bioperl.

Another argument for supporting the transfac format in Bioperl is that it is
the only de-facto standard format for matrices. Many pwm scanners and
websites can parse it or at least supply a converter for Transfac into their
own format.

cheers,
Max

On 22/08/06, Chris Fields wrote:
>
> Hilmar,
>
> No, unfortunately no TRANSFAC or similar matrices. But there are a few
> other similar resources out there that may provide matrices:
>
> http://molbiol-tools.ca/DNA_Motifs.htm
>
> This one allows you to create a matrix from input sequences:
>
> http://molbiol-tools.ca/Jie_Zheng/
>
>
> Chris
>
> > -----Original Message-----
> > From: Hilmar Lapp [mailto:hlapp at gmx.net]
> > Sent: Tuesday, August 22, 2006 1:04 PM
> > To: Chris Fields
> > Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?
> >
> > Good idea if that's feasible and not too difficult (or do they
> > provide transfac format themselves?).
> >
> > -hilmar
> >
> > On Aug 22, 2006, at 1:20 PM, Chris Fields wrote:
> >
> > > ....
> > >> I've sent an email to their support address (though this may take
> > >> a long time to get a reply to, going on past experience).
> > >>
> > >> This is the full legal spiel they have:
> > >> http://www.gene-regulation.com/pub/databases/transfac/doc/misc.html
> > >>
> > >> There's nothing about restrictions on using the data format, they
> > >> haven't tried to shut down the TFBS:: modules, and it would be illegal
> > >> for them to do so according to fair use in many countries, their home
> > >> country of Germany especially. In short, the module itself would not be
> > >> a problem. The only cause for concern is the test data, which is not
> > >> possible without express permission. I've asked for permission so now we
> > >> just wait.
> > >
> > > Based on that you could proceed. As long as the format itself isn't
> > > restricted you could create 'foo' data for the time being for tests.
> > >
> > > You might use some of the data from George Church's E. coli work converted
> > > to TRANSFAC format and matrices (just reference it if you do); I believe
> > > this is public domain (the data has been published).
> > > Most of these are in the form of alignments only:
> > >
> > > http://arep.med.harvard.edu/ecoli_matrices/
> > >
> > > Chris
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > --
> > ===========================================================
> > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> > ===========================================================
> >
> >
> >
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Maximilian Haeussler,
CNRS/INRA Gif-sur-Yvette, France
tel: +33 6 12 82 76 16
skype: maximilianhaeussler

From osborne1 at optonline.net Sat Sep 9 17:46:30 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 09 Sep 2006 17:46:30 -0400
Subject: [Bioperl-l] how to select the best hsp?
In-Reply-To:
Message-ID:

Sergei,

Take a look at the SearchIO HOWTO. A list of HOWTOs:

http://www.bioperl.org/wiki/HOWTOs

Brian O.

On 9/9/06 4:22 PM, "sergei ryazansky" wrote:

> Hi all.
> How can I select the best HSP in each hit when parsing BLAST results
> with BioPerl?
> Thank you in advance.

From cjfields at uiuc.edu Sat Sep 9 18:35:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 9 Sep 2006 17:35:25 -0500
Subject: [Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?
In-Reply-To: <76f031ae0609091403x17148915g5ca2857b9cbb1e74@mail.gmail.com>
References: <001501c6c619$37167ae0$15327e82@pyrimidine> <76f031ae0609091403x17148915g5ca2857b9cbb1e74@mail.gmail.com>
Message-ID:

That should work for tests at least. I don't know if the format has
changed much (my guess is no).

Sendu seemed to be the one who was most interested in taking this on. Sendu?

Chris

On Sep 9, 2006, at 4:03 PM, Maximilian Haeussler wrote:

> (just found this old mail, sorry for the delay, reading too many mailing
> lists, how many hours do you guys spend on reading mailing lists per
> day???)
>
> Transfac versions 3.4 and 4.0 (If I remember well) had a much more open
> licence, at that time you were still allowed to download and distribute the
> file (you can still find these old versions on the net, e.g.
> http://biotech.embl-ebi.ac.uk:8400/sw/common/test/matrix.dat).
> I guess an older version could be used for the test cases in Bioperl.
>
> Another argument for supporting the transfac format in Bioperl is that it is
> the only de-facto standard format for matrices. Many pwm scanners and
> websites can parse it or at least supply a converter for Transfac into their
> own format.
>
> cheers,
> Max
>
>
> On 22/08/06, Chris Fields wrote:
>>
>> Hilmar,
>>
>> No, unfortunately no TRANSFAC or similar matrices. But there are a few
>> other similar resources out there that may provide matrices:
>>
>> http://molbiol-tools.ca/DNA_Motifs.htm
>>
>> This one allows you to create a matrix from input sequences:
>>
>> http://molbiol-tools.ca/Jie_Zheng/
>>
>>
>> Chris
>>
>>> -----Original Message-----
>>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>>> Sent: Tuesday, August 22, 2006 1:04 PM
>>> To: Chris Fields
>>> Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?
>>>
>>> Good idea if that's feasible and not too difficult (or do they
>>> provide transfac format themselves?).
>>>
>>> -hilmar
>>>
>>> On Aug 22, 2006, at 1:20 PM, Chris Fields wrote:
>>>
>>>> ....
>>>>> I've sent an email to their support address (though this may take
>>>>> a long time to get a reply to, going on past experience).
>>>>> >>>>> This is the full legal spiel they have: >>>>> http://www.gene-regulation.com/pub/databases/transfac/doc/ >>>>> misc.html >>>>> >>>>> There's nothing about restrictions on using the data format, they >>>>> haven't tried to shut down the TFBS:: modules, and it would be >>>>> illegal >>>>> for them to do so according to fair use in many countries, >>>>> their home >>>>> country of Germany especially. In short, the module itself would >>>>> not be >>>>> a problem. The only cause for concern is the test data, which >>>>> is not >>>>> possible without express permission. I've asked for permission so >>>>> now we >>>>> just wait. >>>> >>>> Based on that you could proceed. As long as the format itself >>>> isn't >>>> restricted you could create 'foo' data for the time being for >>>> tests. >>>> >>>> You might use some of the data from George Church's E. coli work >>>> converted >>>> to TRANSFAC format and matrices (just reference it if you do); I >>>> believe >>>> this is public domain (the data has been published). 
Most of these >>>> are in >>>> the form of alignments only: >>>> >>>> http://arep.med.harvard.edu/ecoli_matrices/ >>>> >>>> Chris >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Maximilian Haeussler, > CNRS/INRA Gif-sur-Yvette, France > tel: +33 6 12 82 76 16 > skype: maximilianhaeussler > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Sat Sep 9 19:03:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 10 Sep 2006 00:03:16 +0100 Subject: [Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable? In-Reply-To: References: <001501c6c619$37167ae0$15327e82@pyrimidine> <76f031ae0609091403x17148915g5ca2857b9cbb1e74@mail.gmail.com> Message-ID: <45034834.6000503@sendu.me.uk> Chris Fields wrote: > That should work for tests at least. I don't know if the format has > changed much (my guess is no). > > Sendu seemed to be the one who was most interested taking this on. Sendu? Yes, I'm currently working on this. Well, I've finished really, but am doing related things now before I commit. Regarding tests, its not just the matrix.dat file that I'm working with. 
http://biotech.embl-ebi.ac.uk:8400/sw/common/test/ does have a few of
the other files, but for the ones I've looked at, the format has changed
significantly enough that they're not really useful.

I'm still waiting for a reply from their support, but in the worst case
I can just make some files in the correct format but with gibberish data.


From cjfields at uiuc.edu  Sat Sep  9 19:29:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 9 Sep 2006 18:29:17 -0500
Subject: [Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?
In-Reply-To: <45034834.6000503@sendu.me.uk>
References: <001501c6c619$37167ae0$15327e82@pyrimidine>
 <76f031ae0609091403x17148915g5ca2857b9cbb1e74@mail.gmail.com>
 <45034834.6000503@sendu.me.uk>
Message-ID: <664002D5-425F-497D-8932-95DFC6D05666@uiuc.edu>

One of the requests that may pop up is having your modules be backwards
compatible with older TRANSFAC files, especially since the new ones are
not publicly available. I don't know how easy that would be since, as you
say, the formats have changed quite a bit...

Chris

On Sep 9, 2006, at 6:03 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> That should work for tests at least. I don't know if the format has
>> changed much (my guess is no).
>>
>> Sendu seemed to be the one who was most interested in taking this
>> on. Sendu?
>
> Yes, I'm currently working on this. Well, I've finished really, but am
> doing related things now before I commit. Regarding tests, it's not
> just the matrix.dat file that I'm working with.
> http://biotech.embl-ebi.ac.uk:8400/sw/common/test/ does have a few of
> the other files, but for the ones I've looked at the format has
> changed significantly enough that they're not really useful.
>
> I'm still waiting for a reply from their support, but worst case I can
> just make some files in the correct format but with gibberish data.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mmacho at gmail.com Sat Sep 9 19:50:10 2006 From: mmacho at gmail.com (ende) Date: Sun, 10 Sep 2006 01:50:10 +0200 Subject: [Bioperl-l] Blast temporary open files not closed In-Reply-To: References: Message-ID: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com> Thank all of you for your quick and precise answer!!! My bioperl version must be 1.20 :( I can read it in the file Bio::Perl.pm since perl -MCPAN -e shell and then i Bio::Perl does not inform me about any local version but says instead: trange distribution name [Bio:Perl] Module id = Bio::Perl CPAN_USERID BIRNEY (Ewan Birney ) CPAN_VERSION undef CPAN_FILE B/BI/BIRNEY/bioperl-1.4.tar.gz UPLOAD_DATE 2003-12-23 DSLIP_STATUS (,,,,) MANPAGE Bio::Perl - Functional access to BioPerl for people who don't know objects INST_FILE /Library/Perl/5.8.6/Bio/Perl.pm INST_VERSION undef I read 1.20 at top of /Library/5.8.6/Bio/Perl.pm file! After many (many) attempts of installing Bio::Perl via -MCPAN always it ends with the same message (of course using force!!): Failed 3/25 tests, 88.00% okay t/WABA.......................ok t/XEMBL_DB...................SOAP::Lite and/or XML::DOM not installed. This means that Bio::DB::XEMBL module is not usable. Skipping tests. t/XEMBL_DB...................ok Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- t/BioFetch_DB.t 27 1 3.70% 8 t/EMBL_DB.t 15 3 20.00% 6 13-14 t/Ontology.t 255 65280 50 100 200.00% 1-50 t/TreeIO.t 41 1 2.44% 42 t/Variation_IO.t 25 3 12.00% 15 20 25 t/simpleGOparser.t 255 65280 98 196 200.00% 1-98 121 subtests skipped. Failed 6/179 test scripts, 96.65% okay. 
154/8273 subtests failed, 98.14% okay.
make: *** [test_dynamic] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force
Failed during this command:
 BIRNEY/bioperl-1.4.tar.gz                    : make_test NO

...and I am using "force", exactly like this:

    cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz

I have installed many other modules without any problems, but this one
seems to reject me. Finally I downloaded the bioperl 1.5.1 .tar.gz and
installed it, ignoring the many errors that make test gave me. After
sudo make install, wow, the BioPerl version changed, and

    lsof +p pidofperl | grep stat | wc -l

again grows during execution until it reaches 250 (!) and the program
crashes.

It was also impossible to obtain the local BioPerl version from the CPAN
shell (the same happened on a remote Linux installation, which also said
INST_VERSION undef).

But now the Bio/Perl.pm file begins with:

# Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp
#
# BioPerl module for Bio::Perl

(I am using Mac OS X 10.4.7.)

The code launches blastall for each seq from inside a Perl object
(calling $blastMachine->io->_io_cleanup(); did not resolve the problem):

> sub DoBlastSeq ($$$$) {
>   my ($self, $db, $seq, $outFileName) = @_;
>   my %params = (
>       program  => "blastn",
>       outfile  => $outFileName,
>       database => "$self->{path_db}/$db",
>       # q    => "-5",
>       G    => 3,           # yes
>       E    => 3,           # yes
>       F    => "\"m D\"",   # "mD",  # yes
>       e    => 700,         # yes
>       Y    => 1.75e12,     # yes
>       best => 1,
>     );
>   # Drop the outfile parameter when no file name was given.
>   if (!$params{outfile}) {
>     delete $params{outfile};
>   }
>   my $outErrs = ($outFileName || "blastErrs.err");
>
>   my $blastMachine = Bio::Tools::Run::StandAloneBlast->new(%params);
>
>   # Redirect STDERR to a file while blastall runs.
>   open(OLDSTDERR, ">&", \*STDERR) or die "Can't dup STDERR: $!";
>   open(STDERR, ">", $outErrs )    or die "ERROR reopening STDERR: $!";
>
>   print "Running Blast with id: ", $seq->id(), "\n" if $DEBUG;
>
>   my $blastResult = $blastMachine->blastall($seq);
>
>   # Restore STDERR and remove the error file if it is empty.
>   close(STDERR);
>   open(STDERR, ">&", \*OLDSTDERR) or die "Can't dup OLDSTDERR: $!";
>   unlink ($outErrs) if (-z $outErrs);
>
>   return $blastResult;
> }

On 09/09/2006, at 15:45, Brian Osborne wrote:

> Juan,
>
> I recall a bug like this was fixed a while back - what version of
> Bioperl are you using? By the way, always give version numbers when
> reporting a bug; the answer "already fixed" is very common.
>
> Brian O.
>
>
> On 9/9/06 4:31 AM, "ende" wrote:
>
>>
>> Processing a FASTA file with about 500 DNA seqs, my Mac OS X (where
>> the maximum number of open files is 512) crashes. You need to
>> divide the problem into pieces or (in the bash shell, with
>> ulimit -n 1024) raise that maximum number of open files.
>>
>> This makes no sense to me, since my Perl program does not leave any
>> file open without its corresponding close. On the other hand, the
>> problem arises when the number of DNA seqs grows _in one file_.
>>
>> In the code I run blast (StandAloneBlast... $blastMachine->blastall)
>> for each seq.
>>
>>
>> Then, inspecting the stopped Perl program, I confirmed my
>> suspicions. BioPerl (StandAloneBlast) does not close temporary
>> opened files. Those files seem to be created to save seqs so that
>> they can then be processed by the blastall program... The output of
>> lsof indicates (as does the Mac OS X System Monitor) that those files
>> are left open but are not there (!?)
>> >> The output of lsof +p pidofperlprogram >> >> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME >> perl 21977 juanfc cwd VDIR 14,7 238 6835993 /Users/ >> juanfc/Documents/programperl 21977 juanfc txt VREG 14,7 >> 19280 1589055 /usr/bin/perl >> perl 21977 juanfc txt VREG 14,7 23476 1580272 /System/ >> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/IO/IO.bundle >> perl 21977 juanfc txt VREG 14,7 17772 1580263 /System/ >> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/Fcntl/Fcntl.bundle >> perl 21977 juanfc txt VREG 14,7 114116 1580381 /System/ >> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/POSIX/POSIX.bundle >> perl 21977 juanfc txt VREG 14,7 23684 1580265 /System/ >> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/File/Glob/ >> Glob.bundle >> perl 21977 juanfc txt VREG 14,7 1797788 6275687 /usr/lib/ >> dyld >> perl 21977 juanfc txt VREG 14,7 4379472 6276030 /usr/lib/ >> libSystem.B.dylib >> perl 21977 juanfc txt VREG 14,7 1086420 6276221 /System/ >> Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/libperl.dylib >> perl 21977 juanfc 0u VCHR 4,2 0t3748 63113092 /dev/ttyp2 >> perl 21977 juanfc 1u VCHR 4,2 0t3748 63113092 /dev/ttyp2 >> perl 21977 juanfc 2u VCHR 4,2 0t3748 63113092 /dev/ttyp2 >> perl 21977 juanfc 3u VCHR 4,2 0t3748 63113092 /dev/ttyp2 >> perl 21977 juanfc 4r VREG stat(/ >> ... >> Please, help. >> >> >> -- >> Juan Falgueras >> Profesor del Depto. 
de Lenguajes y Ciencias de la Computación
>> Universidad de Málaga
>
>

---- ende


From hlapp at gmx.net  Sat Sep  9 23:59:19 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Sep 2006 23:59:19 -0400
Subject: [Bioperl-l] Blast temporary open files not closed
In-Reply-To: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com>
References: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com>
Message-ID: <4FE2D9E8-F8B9-45F0-AE24-6650FD421102@gmx.net>

Your code snippet leaves plenty of opportunities for temp files to
accumulate, depending on what you do with the report object. So long as
you keep the report object around, the associated result file will
remain present and open.

You will have to provide the calling code as well if you want someone to
look at the problem.

On Sep 9, 2006, at 7:50 PM, ende wrote:

>
> Thank all of you for your quick and precise answer!!!
>
> My bioperl version must be 1.20 :(
>
> I can read it in the file Bio::Perl.pm since
>
> perl -MCPAN -e shell
> and then
> i Bio::Perl
>
> does not inform me about any local version but says instead:
>
> Strange distribution name [Bio:Perl]
> Module id = Bio::Perl
>      CPAN_USERID  BIRNEY (Ewan Birney )
>      CPAN_VERSION undef
>      CPAN_FILE    B/BI/BIRNEY/bioperl-1.4.tar.gz
>      UPLOAD_DATE  2003-12-23
>      DSLIP_STATUS (,,,,)
>      MANPAGE      Bio::Perl - Functional access to BioPerl for people
> who don't know objects
>      INST_FILE    /Library/Perl/5.8.6/Bio/Perl.pm
>      INST_VERSION undef
>
> I read 1.20 at top of /Library/5.8.6/Bio/Perl.pm file!
>
> After many (many) attempts of installing Bio::Perl via -MCPAN always
> it ends with the same message (of course using force!!):
>
>
> Failed 3/25 tests, 88.00% okay
> t/WABA.......................ok
> t/XEMBL_DB...................SOAP::Lite and/or XML::DOM not
> installed. This means that Bio::DB::XEMBL module is not usable.
> Skipping tests. > t/XEMBL_DB...................ok > Failed Test Stat Wstat Total Fail Failed List of Failed > ---------------------------------------------------------------------- > -- > ------- > t/BioFetch_DB.t 27 1 3.70% 8 > t/EMBL_DB.t 15 3 20.00% 6 13-14 > t/Ontology.t 255 65280 50 100 200.00% 1-50 > t/TreeIO.t 41 1 2.44% 42 > t/Variation_IO.t 25 3 12.00% 15 20 25 > t/simpleGOparser.t 255 65280 98 196 200.00% 1-98 > 121 subtests skipped. > Failed 6/179 test scripts, 96.65% okay. 154/8273 subtests failed, > 98.14% okay. > make: *** [test_dynamic] Error 2 > /usr/bin/make test -- NOT OK > Running make install > make test had returned bad status, won't install without force > Failed during this command: > BIRNEY/bioperl-1.4.tar.gz : make_test NO > > > ..and I am using "force" (exactly:) > > cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz > > > I have installed many other modules without no problems but this > seems to reject me. Finally I have dowloaded the bioperl > 1.5.1 .tar.gz and installed ignoring the many errors make test gave > me. sudo make install and wow!!! the bioperl version changed and > > lsof +p pidofperl | grep stat | wc -l > > again grows during the execution until reach 250 (!) and crashes. > > It was also imposible from CPAN shell to obtain the local bioperl > version (as was also impossible in a remote Linux installation, that > also said) INST_VERSION undef. 
> > But now the Bio/Perl.pm file heads: > > # Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp > # > # BioPerl module for Bio::Perl > > > > > > (I am using 10.4.7 on the MacOSX) > > the code launch blast all for each seq inside a perl object (use of > $blastMachine->io->_io_cleanup(); did not resolve the problem) > >> sub DoBlastSeq ($$$$) { >> my ($self, $db, $seq, $outFileName) = @_; >> my %params = ( >> program => "blastn", >> outfile => $outFileName, >> database => "$self->{path_db}/$db", >> # q => "-5", >> G => 3, # si >> E => 3, # si >> F => "\"m D\"", # "mD", # si >> e => 700, # si >> Y => 1.75e12, # si >> best => 1, >> ); >> if (!$params{outfile}) { >> delete $params{outfile}; >> } >> my $outErrs = ($outFileName || "blastErrs.err"), >> >> >> my $blastMachine = Bio::Tools::Run::StandAloneBlast->new(% >> params); >> >> open(OLDSTDERR, ">&", \*STDERR) or die "Can't dup STDERR: $!"; >> open(STDERR, ">", $outErrs ) or die "ERROR reopening STDERR: $!"; >> >> print "Running Blast with id: ", $seq->id(), "\n" if $DEBUG; >> >> my $blastResult = $blastMachine->blastall($seq); >> >> >> close(STDERR); >> open(STDERR, ">&", \*OLDSTDERR) or die "Can't dup OLDSTDERR: $!"; >> unlink ($outErrs) if (-z $outErrs); >> >> return $blastResult; >> } > > > > > > > > > > El 09/09/2006, a las 15:45, Brian Osborne escribi?: > >> Juan, >> >> I recall a bug like this was fixed a while back - what version >> Bioperl are >> you using? By the way, always give version numbers when reporting a >> bug, the >> answer "already fixed" is very common. >> >> Brian O. >> >> >> On 9/9/06 4:31 AM, "ende" wrote: >> >>> >>> Processing a fasta file with about 500 dna seqs.. my MacOSX (that >>> has the max number of opened files up to 512) crashes... You need to >>> divide the problem in pieces or (in bash shell, with ulimit -n 1024) >>> augment that max number of opened files. >>> >>> This has no sense for me since my perl program nor leave any open >>> file without its corresponding closing. 
On the other side, the >>> problem arises when the number of dnas grows _in one file_. >>> >>> In the code I run blast (StandAloneBlast... $blastMachine->blastall) >>> for each seq. >>> >>> >>> Then sniffing int the perl program stopped perl program I confirmed >>> my suspects. BioPerl (StandAloneBlast) does not closes temporary >>> opened files. Those files seems to be created to save seqs for >>> to be >>> then processed by blastall program... The output of lsof indicates >>> (as MacOSX System Monitor) that those files are left opened but not >>> there (!?) >>> >>> The output of lsof +p pidofperlprogram >>> >>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME >>> perl 21977 juanfc cwd VDIR 14,7 238 6835993 /Users/ >>> juanfc/Documents/programperl 21977 juanfc txt VREG 14,7 >>> 19280 1589055 /usr/bin/perl >>> perl 21977 juanfc txt VREG 14,7 23476 1580272 /System/ >>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/IO/IO.bundle >>> perl 21977 juanfc txt VREG 14,7 17772 1580263 /System/ >>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/Fcntl/ >>> Fcntl.bundle >>> perl 21977 juanfc txt VREG 14,7 114116 1580381 /System/ >>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/POSIX/ >>> POSIX.bundle >>> perl 21977 juanfc txt VREG 14,7 23684 1580265 /System/ >>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/File/Glob/ >>> Glob.bundle >>> perl 21977 juanfc txt VREG 14,7 1797788 6275687 /usr/lib/ >>> dyld >>> perl 21977 juanfc txt VREG 14,7 4379472 6276030 /usr/lib/ >>> libSystem.B.dylib >>> perl 21977 juanfc txt VREG 14,7 1086420 6276221 /System/ >>> Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/libperl.dylib >>> perl 21977 juanfc 0u VCHR 4,2 0t3748 63113092 /dev/ttyp2 >>> perl 21977 juanfc 1u VCHR 4,2 0t3748 63113092 /dev/ttyp2 >>> perl 21977 juanfc 2u VCHR 4,2 0t3748 63113092 /dev/ttyp2 >>> perl 21977 juanfc 3u VCHR 4,2 0t3748 63113092 /dev/ttyp2 >>> perl 21977 juanfc 4r VREG stat(/ >>> ... >>> Please, help. 
>>> >>> >>> -- >>> Juan Falgueras >>> Profesor del Depto. de Lenguajes y Ciencias de la Computaci?n >>> Universidad de M?laga >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > ---- ende > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sun Sep 10 00:21:22 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 9 Sep 2006 23:21:22 -0500 Subject: [Bioperl-l] Blast temporary open files not closed In-Reply-To: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com> References: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com> Message-ID: <5D8FACC3-5A30-4EF8-BDF4-FC279564A83A@uiuc.edu> On Sep 9, 2006, at 6:50 PM, ende wrote: > > Thank all of you for your quick and precise answer!!! > > My bioperl version must be 1.20 :( ... > CPAN_FILE B/BI/BIRNEY/bioperl-1.4.tar.gz Nope, Bioperl v 1.4. I think Hilmar's on the right track (about the report object remaining open). You could try upgrading to the newest developer release or bioperl- live(CVS), not on CPAN but on the Bioperl website: http://www.bioperl.org/wiki/Getting_BioPerl http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix Chris ... Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dr.hogart at gmail.com Sun Sep 10 07:15:58 2006 From: dr.hogart at gmail.com (sergei ryazansky) Date: Sun, 10 Sep 2006 15:15:58 +0400 Subject: [Bioperl-l] how to select the best hsp? References: Message-ID: already looked.. but i didn`t found something useful. 
Most probably I simply overlooked the required information. Please help
me find it.

On Sun, 10 Sep 2006 01:46:30 +0400, Brian Osborne wrote:

> Sergei,
>
> Take a look at the SearchIO HOWTO. A list of HOWTOs:
>
> http://www.bioperl.org/wiki/HOWTOs
>
>
> Brian O.
>
>
> On 9/9/06 4:22 PM, "sergei ryazansky" wrote:
>
>> Hi all.
>> How can I select the best HSP in each hit when parsing a BLAST
>> result with BioPerl?
>> Thank you in advance.

--


From mmacho at gmail.com  Sun Sep 10 08:19:15 2006
From: mmacho at gmail.com (ende)
Date: Sun, 10 Sep 2006 14:19:15 +0200
Subject: [Bioperl-l] Blast temporary open files not closed
In-Reply-To: <4FE2D9E8-F8B9-45F0-AE24-6650FD421102@gmx.net>
References: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com>
 <4FE2D9E8-F8B9-45F0-AE24-6650FD421102@gmx.net>
Message-ID: <9D007453-72FC-4F1D-AAEF-9C979EE6EB0F@gmail.com>

On 10/09/2006, at 5:59, Hilmar Lapp wrote:

> Your code snippet leaves plenty of opportunities for temp files
> accumulating, depending on what you do with the report object. So
> long as you use the report object the associated result file will
> remain present and open.
>

That sounds possible, and I had thought of it before, but I return a
local copy of the result of $blastMachine->blastall(..), not the
$blastMachine itself. How could the original object get caught up with
that copy?

> You will have to provide the calling code as well if you want
> someone to look at the problem.
>

From the main object I call this DoBlast:

    ########################
    #                      #
    #   ANALYZE EACH SEQ   #
    #                      #
    ########################
    #
    for (my $i=0; $i < $self->{inSeqArrTotal}; ++$i) {
      my $s = ${$self->{inSeqArr}}[$i];
      my $ID = $s->id();
      local *x = \$self->{SQ}{$ID};  # short for ref curr seq

      $x->{seq} = $s;

      # search for cloning vector seqs. in all the query seqs.
      #
      my $blastTemp = $self->DoBlastSeq($self->{cloningVectorsFilename}, $s);

and then I use this $blastTemp and save it in another place of the
object:

      $x->{blastcv} = $blastTemp;

...
> On Sep 9, 2006, at 7:50 PM, ende wrote: > >> >> Thank all of you for your quick and precise answer!!! >> >> My bioperl version must be 1.20 :( >> >> I can read it in the file Bio::Perl.pm since >> >> perl -MCPAN -e shell >> and then >> i Bio::Perl >> >> does not inform me about any local version but says instead: >> >> trange distribution name [Bio:Perl] >> Module id = Bio::Perl >> CPAN_USERID BIRNEY (Ewan Birney ) >> CPAN_VERSION undef >> CPAN_FILE B/BI/BIRNEY/bioperl-1.4.tar.gz >> UPLOAD_DATE 2003-12-23 >> DSLIP_STATUS (,,,,) >> MANPAGE Bio::Perl - Functional access to BioPerl for people >> who don't know objects >> INST_FILE /Library/Perl/5.8.6/Bio/Perl.pm >> INST_VERSION undef >> >> I read 1.20 at top of /Library/5.8.6/Bio/Perl.pm file! >> >> After many (many) attempts of installing Bio::Perl via -MCPAN always >> it ends with the same message (of course using force!!): >> >> >> Failed 3/25 tests, 88.00% okay >> t/WABA.......................ok >> t/XEMBL_DB...................SOAP::Lite and/or XML::DOM not >> installed. This means that Bio::DB::XEMBL module is not usable. >> Skipping tests. >> t/XEMBL_DB...................ok >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> t/BioFetch_DB.t 27 1 3.70% 8 >> t/EMBL_DB.t 15 3 20.00% 6 13-14 >> t/Ontology.t 255 65280 50 100 200.00% 1-50 >> t/TreeIO.t 41 1 2.44% 42 >> t/Variation_IO.t 25 3 12.00% 15 20 25 >> t/simpleGOparser.t 255 65280 98 196 200.00% 1-98 >> 121 subtests skipped. >> Failed 6/179 test scripts, 96.65% okay. 154/8273 subtests failed, >> 98.14% okay. 
>> make: *** [test_dynamic] Error 2 >> /usr/bin/make test -- NOT OK >> Running make install >> make test had returned bad status, won't install without force >> Failed during this command: >> BIRNEY/bioperl-1.4.tar.gz : make_test NO >> >> >> ..and I am using "force" (exactly:) >> >> cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz >> >> >> I have installed many other modules without no problems but this >> seems to reject me. Finally I have dowloaded the bioperl >> 1.5.1 .tar.gz and installed ignoring the many errors make test gave >> me. sudo make install and wow!!! the bioperl version changed and >> >> lsof +p pidofperl | grep stat | wc -l >> >> again grows during the execution until reach 250 (!) and crashes. >> >> It was also imposible from CPAN shell to obtain the local bioperl >> version (as was also impossible in a remote Linux installation, that >> also said) INST_VERSION undef. >> >> But now the Bio/Perl.pm file heads: >> >> # Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp >> # >> # BioPerl module for Bio::Perl >> >> >> >> >> >> (I am using 10.4.7 on the MacOSX) >> >> the code launch blast all for each seq inside a perl object (use of >> $blastMachine->io->_io_cleanup(); did not resolve the problem) >> >>> sub DoBlastSeq ($$$$) { >>> my ($self, $db, $seq, $outFileName) = @_; >>> my %params = ( >>> program => "blastn", >>> outfile => $outFileName, >>> database => "$self->{path_db}/$db", >>> # q => "-5", >>> G => 3, # si >>> E => 3, # si >>> F => "\"m D\"", # "mD", # si >>> e => 700, # si >>> Y => 1.75e12, # si >>> best => 1, >>> ); >>> if (!$params{outfile}) { >>> delete $params{outfile}; >>> } >>> my $outErrs = ($outFileName || "blastErrs.err"), >>> >>> >>> my $blastMachine = Bio::Tools::Run::StandAloneBlast->new(% >>> params); >>> >>> open(OLDSTDERR, ">&", \*STDERR) or die "Can't dup STDERR: $!"; >>> open(STDERR, ">", $outErrs ) or die "ERROR reopening STDERR: >>> $!"; >>> >>> print "Running Blast with id: ", $seq->id(), "\n" if $DEBUG; >>> >>> my 
$blastResult = $blastMachine->blastall($seq); >>> >>> >>> close(STDERR); >>> open(STDERR, ">&", \*OLDSTDERR) or die "Can't dup OLDSTDERR: >>> $!"; >>> unlink ($outErrs) if (-z $outErrs); >>> >>> return $blastResult; >>> } >> >> >> >> >> >> >> >> >> >> El 09/09/2006, a las 15:45, Brian Osborne escribi?: >> >>> Juan, >>> >>> I recall a bug like this was fixed a while back - what version >>> Bioperl are >>> you using? By the way, always give version numbers when reporting a >>> bug, the >>> answer "already fixed" is very common. >>> >>> Brian O. >>> >>> >>> On 9/9/06 4:31 AM, "ende" wrote: >>> >>>> >>>> Processing a fasta file with about 500 dna seqs.. my MacOSX (that >>>> has the max number of opened files up to 512) crashes... You >>>> need to >>>> divide the problem in pieces or (in bash shell, with ulimit -n >>>> 1024) >>>> augment that max number of opened files. >>>> >>>> This has no sense for me since my perl program nor leave any open >>>> file without its corresponding closing. On the other side, the >>>> problem arises when the number of dnas grows _in one file_. >>>> >>>> In the code I run blast (StandAloneBlast... $blastMachine- >>>> >blastall) >>>> for each seq. >>>> >>>> >>>> Then sniffing int the perl program stopped perl program I confirmed >>>> my suspects. BioPerl (StandAloneBlast) does not closes temporary >>>> opened files. Those files seems to be created to save seqs for >>>> to be >>>> then processed by blastall program... The output of lsof indicates >>>> (as MacOSX System Monitor) that those files are left opened but not >>>> there (!?) 
>>>> >>>> The output of lsof +p pidofperlprogram >>>> >>>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME >>>> perl 21977 juanfc cwd VDIR 14,7 238 6835993 /Users/ >>>> juanfc/Documents/programperl 21977 juanfc txt VREG 14,7 >>>> 19280 1589055 /usr/bin/perl >>>> perl 21977 juanfc txt VREG 14,7 23476 1580272 /System/ >>>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/IO/IO.bundle >>>> perl 21977 juanfc txt VREG 14,7 17772 1580263 /System/ >>>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/Fcntl/ >>>> Fcntl.bundle >>>> perl 21977 juanfc txt VREG 14,7 114116 1580381 /System/ >>>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/POSIX/ >>>> POSIX.bundle >>>> perl 21977 juanfc txt VREG 14,7 23684 1580265 /System/ >>>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/File/Glob/ >>>> Glob.bundle >>>> perl 21977 juanfc txt VREG 14,7 1797788 6275687 /usr/lib/ >>>> dyld >>>> perl 21977 juanfc txt VREG 14,7 4379472 6276030 /usr/lib/ >>>> libSystem.B.dylib >>>> perl 21977 juanfc txt VREG 14,7 1086420 6276221 /System/ >>>> Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/libperl.dylib >>>> perl 21977 juanfc 0u VCHR 4,2 0t3748 63113092 /dev/ >>>> ttyp2 >>>> perl 21977 juanfc 1u VCHR 4,2 0t3748 63113092 /dev/ >>>> ttyp2 >>>> perl 21977 juanfc 2u VCHR 4,2 0t3748 63113092 /dev/ >>>> ttyp2 >>>> perl 21977 juanfc 3u VCHR 4,2 0t3748 63113092 /dev/ >>>> ttyp2 >>>> perl 21977 juanfc 4r VREG stat(/ >>>> ... >>>> Please, help. >>>> >>>> >>>> -- >>>> Juan Falgueras >>>> Profesor del Depto. 
de Lenguajes y Ciencias de la Computación
>>>> Universidad de Málaga
>>>
>>
>> ---- ende
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

---- ende


From hlapp at gmx.net  Sun Sep 10 09:40:46 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 10 Sep 2006 09:40:46 -0400
Subject: [Bioperl-l] Blast temporary open files not closed
In-Reply-To: <9D007453-72FC-4F1D-AAEF-9C979EE6EB0F@gmail.com>
References: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com>
 <4FE2D9E8-F8B9-45F0-AE24-6650FD421102@gmx.net>
 <9D007453-72FC-4F1D-AAEF-9C979EE6EB0F@gmail.com>
Message-ID:

On Sep 10, 2006, at 8:19 AM, ende wrote:

> On 10/09/2006, at 5:59, Hilmar Lapp wrote:
>
>> Your code snippet leaves plenty of opportunities for temp files
>> accumulating, depending on what you do with the report object. So
>> long as you use the report object the associated result file will
>> remain present and open.
>>
>
> That sounds possible, and I had thought of it before, but I return a
> local copy of the result of $blastMachine->blastall(..), not the
> $blastMachine itself. How could the original object get caught up
> with that copy?

Never assume the inner workings to be what you think they ought to be.
In this case the issue is not the original object but the result file,
because it is parsed on demand. That is, by the time the function
returns, the file has not been fully parsed yet and therefore remains
open.
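
The point about on-demand parsing can be illustrated with a minimal,
self-contained sketch. LazyReport below is a hypothetical stand-in, not
a BioPerl class: like a report that is parsed lazily, it keeps its
result file open until it is explicitly closed. The summarize helper
(also hypothetical) reads what it needs, closes the file, and returns
plain data, so keeping one summary per sequence does not keep one file
descriptor per sequence. How this maps onto the object returned by
blastall is an assumption and is not tested here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a report that is parsed on demand:
# it holds an open handle on its result file until close_file() is called.
package LazyReport;
use File::Temp ();

sub new {
    my ($class, @lines) = @_;
    my ($fh, $path) = File::Temp::tempfile(UNLINK => 1);
    print {$fh} "$_\n" for @lines;
    seek($fh, 0, 0) or die "seek failed: $!";   # rewind for reading
    return bless { fh => $fh, path => $path }, $class;
}

# Return the next unread line, or undef when exhausted or closed.
sub next_line {
    my ($self) = @_;
    my $fh = $self->{fh};
    return undef unless $fh;
    my $line = readline($fh);
    chomp $line if defined $line;
    return $line;
}

# Release the file descriptor; safe to call more than once.
sub close_file {
    my ($self) = @_;
    my $fh = delete $self->{fh};
    return 0 unless $fh;
    close($fh) or die "close failed: $!";
    return 1;
}

sub is_open { return defined $_[0]{fh} ? 1 : 0 }

package main;

# Read everything that is needed, close the file, keep only plain data.
sub summarize {
    my ($report) = @_;
    my @lines;
    while (defined(my $line = $report->next_line)) {
        push @lines, $line;
    }
    $report->close_file;
    return { count => scalar @lines, first => $lines[0] };
}

my %summary;
my @reports;
for my $id (qw(seq1 seq2 seq3)) {
    my $report = LazyReport->new("hit for $id", "another hit for $id");
    $summary{$id} = summarize($report);   # store data, not the report
    push @reports, $report;               # kept only to check they are closed
}

my $still_open = grep { $_->is_open } @reports;
print "$_: $summary{$_}{count} lines\n" for sort keys %summary;
print "reports still open: $still_open\n";
```

Storing the report objects themselves in a hash, as the calling loop
does with $x->{blastcv}, would correspond to skipping the close_file
call: each stored object would then keep its file open for as long as
the hash entry exists.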
> >> You will have to provide the calling code as well if you want >> someone to look at the problem. >> > > > From the main object I call this DoBlast > > ######################## > # # > # ANALIZE EACH SEQ # > # # > ######################## > # > for (my $i=0; $i < $self->{inSeqArrTotal}; ++$i) { > my $s = ${$self->{inSeqArr}}[$i]; > my $ID = $s->id(); > local *x = \$self->{SQ}{$ID}; # short for ref curr seq > > > $x->{seq} = $s; > > > # search for cloning vector seqs. in all the query seqs. > # > my $blastTemp = $self->DoBlastSeq($self-> > {cloningVectorsFilename}, $s); > > > and then use this $blastTemp and save it in another place of the > object: > > $x->{blastcv} = $blastTemp; Well this is still not the complete story but sounds like you're storing all the report objects which will have the files remain open. -hilmar > > ... > > > >> On Sep 9, 2006, at 7:50 PM, ende wrote: >> >>> >>> Thank all of you for your quick and precise answer!!! >>> >>> My bioperl version must be 1.20 :( >>> >>> I can read it in the file Bio::Perl.pm since >>> >>> perl -MCPAN -e shell >>> and then >>> i Bio::Perl >>> >>> does not inform me about any local version but says instead: >>> >>> trange distribution name [Bio:Perl] >>> Module id = Bio::Perl >>> CPAN_USERID BIRNEY (Ewan Birney ) >>> CPAN_VERSION undef >>> CPAN_FILE B/BI/BIRNEY/bioperl-1.4.tar.gz >>> UPLOAD_DATE 2003-12-23 >>> DSLIP_STATUS (,,,,) >>> MANPAGE Bio::Perl - Functional access to BioPerl for >>> people >>> who don't know objects >>> INST_FILE /Library/Perl/5.8.6/Bio/Perl.pm >>> INST_VERSION undef >>> >>> I read 1.20 at top of /Library/5.8.6/Bio/Perl.pm file! >>> >>> After many (many) attempts of installing Bio::Perl via -MCPAN always >>> it ends with the same message (of course using force!!): >>> >>> >>> Failed 3/25 tests, 88.00% okay >>> t/WABA.......................ok >>> t/XEMBL_DB...................SOAP::Lite and/or XML::DOM not >>> installed. This means that Bio::DB::XEMBL module is not usable. 
>>> Skipping tests. >>> t/XEMBL_DB...................ok >>> Failed Test Stat Wstat Total Fail Failed List of Failed >>> -------------------------------------------------------------------- >>> ---- >>> ------- >>> t/BioFetch_DB.t 27 1 3.70% 8 >>> t/EMBL_DB.t 15 3 20.00% 6 13-14 >>> t/Ontology.t 255 65280 50 100 200.00% 1-50 >>> t/TreeIO.t 41 1 2.44% 42 >>> t/Variation_IO.t 25 3 12.00% 15 20 25 >>> t/simpleGOparser.t 255 65280 98 196 200.00% 1-98 >>> 121 subtests skipped. >>> Failed 6/179 test scripts, 96.65% okay. 154/8273 subtests failed, >>> 98.14% okay. >>> make: *** [test_dynamic] Error 2 >>> /usr/bin/make test -- NOT OK >>> Running make install >>> make test had returned bad status, won't install without force >>> Failed during this command: >>> BIRNEY/bioperl-1.4.tar.gz : make_test NO >>> >>> >>> ..and I am using "force" (exactly:) >>> >>> cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz >>> >>> >>> I have installed many other modules without no problems but this >>> seems to reject me. Finally I have dowloaded the bioperl >>> 1.5.1 .tar.gz and installed ignoring the many errors make test gave >>> me. sudo make install and wow!!! the bioperl version changed and >>> >>> lsof +p pidofperl | grep stat | wc -l >>> >>> again grows during the execution until reach 250 (!) and crashes. >>> >>> It was also imposible from CPAN shell to obtain the local bioperl >>> version (as was also impossible in a remote Linux installation, that >>> also said) INST_VERSION undef. 
>>> >>> But now the Bio/Perl.pm file heads: >>> >>> # Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp >>> # >>> # BioPerl module for Bio::Perl >>> >>> >>> >>> >>> >>> (I am using 10.4.7 on the MacOSX) >>> >>> the code launch blast all for each seq inside a perl object (use of >>> $blastMachine->io->_io_cleanup(); did not resolve the problem) >>> >>>> sub DoBlastSeq ($$$$) { >>>> my ($self, $db, $seq, $outFileName) = @_; >>>> my %params = ( >>>> program => "blastn", >>>> outfile => $outFileName, >>>> database => "$self->{path_db}/$db", >>>> # q => "-5", >>>> G => 3, # si >>>> E => 3, # si >>>> F => "\"m D\"", # "mD", # si >>>> e => 700, # si >>>> Y => 1.75e12, # si >>>> best => 1, >>>> ); >>>> if (!$params{outfile}) { >>>> delete $params{outfile}; >>>> } >>>> my $outErrs = ($outFileName || "blastErrs.err"), >>>> >>>> >>>> my $blastMachine = Bio::Tools::Run::StandAloneBlast->new(% >>>> params); >>>> >>>> open(OLDSTDERR, ">&", \*STDERR) or die "Can't dup STDERR: $!"; >>>> open(STDERR, ">", $outErrs ) or die "ERROR reopening STDERR: >>>> $!"; >>>> >>>> print "Running Blast with id: ", $seq->id(), "\n" if $DEBUG; >>>> >>>> my $blastResult = $blastMachine->blastall($seq); >>>> >>>> >>>> close(STDERR); >>>> open(STDERR, ">&", \*OLDSTDERR) or die "Can't dup OLDSTDERR: >>>> $!"; >>>> unlink ($outErrs) if (-z $outErrs); >>>> >>>> return $blastResult; >>>> } >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> El 09/09/2006, a las 15:45, Brian Osborne escribi?: >>> >>>> Juan, >>>> >>>> I recall a bug like this was fixed a while back - what version >>>> Bioperl are >>>> you using? By the way, always give version numbers when reporting a >>>> bug, the >>>> answer "already fixed" is very common. >>>> >>>> Brian O. >>>> >>>> >>>> On 9/9/06 4:31 AM, "ende" wrote: >>>> >>>>> >>>>> Processing a fasta file with about 500 dna seqs.. my MacOSX (that >>>>> has the max number of opened files up to 512) crashes... 
You >>>>> need to >>>>> divide the problem in pieces or (in bash shell, with ulimit -n >>>>> 1024) >>>>> augment that max number of opened files. >>>>> >>>>> This has no sense for me since my perl program nor leave any open >>>>> file without its corresponding closing. On the other side, the >>>>> problem arises when the number of dnas grows _in one file_. >>>>> >>>>> In the code I run blast (StandAloneBlast... $blastMachine- >>>>> >blastall) >>>>> for each seq. >>>>> >>>>> >>>>> Then sniffing int the perl program stopped perl program I >>>>> confirmed >>>>> my suspects. BioPerl (StandAloneBlast) does not closes temporary >>>>> opened files. Those files seems to be created to save seqs for >>>>> to be >>>>> then processed by blastall program... The output of lsof >>>>> indicates >>>>> (as MacOSX System Monitor) that those files are left opened but >>>>> not >>>>> there (!?) >>>>> >>>>> The output of lsof +p pidofperlprogram >>>>> >>>>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME >>>>> perl 21977 juanfc cwd VDIR 14,7 238 6835993 /Users/ >>>>> juanfc/Documents/programperl 21977 juanfc txt VREG 14,7 >>>>> 19280 1589055 /usr/bin/perl >>>>> perl 21977 juanfc txt VREG 14,7 23476 1580272 /System/ >>>>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/IO/IO.bundle >>>>> perl 21977 juanfc txt VREG 14,7 17772 1580263 /System/ >>>>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/Fcntl/ >>>>> Fcntl.bundle >>>>> perl 21977 juanfc txt VREG 14,7 114116 1580381 /System/ >>>>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/POSIX/ >>>>> POSIX.bundle >>>>> perl 21977 juanfc txt VREG 14,7 23684 1580265 /System/ >>>>> Library/Perl/5.8.6/darwin-thread-multi-2level/auto/File/Glob/ >>>>> Glob.bundle >>>>> perl 21977 juanfc txt VREG 14,7 1797788 6275687 /usr/ >>>>> lib/ >>>>> dyld >>>>> perl 21977 juanfc txt VREG 14,7 4379472 6276030 /usr/ >>>>> lib/ >>>>> libSystem.B.dylib >>>>> perl 21977 juanfc txt VREG 14,7 1086420 6276221 /System/ >>>>> 
Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/libperl.dylib >>>>> perl 21977 juanfc 0u VCHR 4,2 0t3748 63113092 /dev/ >>>>> ttyp2 >>>>> perl 21977 juanfc 1u VCHR 4,2 0t3748 63113092 /dev/ >>>>> ttyp2 >>>>> perl 21977 juanfc 2u VCHR 4,2 0t3748 63113092 /dev/ >>>>> ttyp2 >>>>> perl 21977 juanfc 3u VCHR 4,2 0t3748 63113092 /dev/ >>>>> ttyp2 >>>>> perl 21977 juanfc 4r VREG stat(/ >>>>> ... >>>>> Please, help. >>>>> >>>>> >>>>> -- >>>>> Juan Falgueras >>>>> Profesor del Depto. de Lenguajes y Ciencias de la Computaci?n >>>>> Universidad de M?laga >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >>> >>> ---- ende >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> > > > > ---- ende > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mmacho at gmail.com Sun Sep 10 12:33:14 2006 From: mmacho at gmail.com (ende) Date: Sun, 10 Sep 2006 18:33:14 +0200 Subject: [Bioperl-l] Blast temporary open files not closed In-Reply-To: References: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com> <4FE2D9E8-F8B9-45F0-AE24-6650FD421102@gmx.net> <9D007453-72FC-4F1D-AAEF-9C979EE6EB0F@gmail.com> Message-ID: <1FBFE48B-A862-4C53-9047-0058D6FD7C9F@gmail.com> El 10/09/2006, a las 15:40, Hilmar Lapp escribi?: >> $x->{blastcv} = $blastTemp; > > Well this is still not the complete story but sounds like you're > storing all the report objects which will have 
the files remain open. Thanks a lot!!! Hilmar. You have shown me the way!! I only needed not to save each $blastTemp result. I remain a bit confused but it seems as you say that things don't work as they should. Thanks, thank you very much! -- "ende" Juan Falgueras Profesor del Depto. de Lenguajes y Ciencias de la Computación Universidad de Málaga From hlapp at gmx.net Sun Sep 10 12:40:36 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 10 Sep 2006 12:40:36 -0400 Subject: [Bioperl-l] Blast temporary open files not closed In-Reply-To: <1FBFE48B-A862-4C53-9047-0058D6FD7C9F@gmail.com> References: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com> <4FE2D9E8-F8B9-45F0-AE24-6650FD421102@gmx.net> <9D007453-72FC-4F1D-AAEF-9C979EE6EB0F@gmail.com> <1FBFE48B-A862-4C53-9047-0058D6FD7C9F@gmail.com> Message-ID: <1CF309A8-42B0-4839-A52E-BCEF516D8EC3@gmx.net> On Sep 10, 2006, at 12:33 PM, ende wrote: > I remain a bit confused but it seems as you say that things don't > work as they should. I wasn't saying that they don't work as they should - I was only saying that they don't necessarily work as *you* think they should. There were most certainly reasons behind the implementation that may not concern you but do concern others. Bioperl needs to serve a variety of requirements and won't be optimized (but should still work!) for everybody's application. Great to hear that you got your issue fixed though. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From lzhtom at hotmail.com Sun Sep 10 22:00:00 2006 From: lzhtom at hotmail.com (zhihua li) Date: Mon, 11 Sep 2006 02:00:00 +0000 Subject: [Bioperl-l] ensembl perl API - very slow retrival of data? Message-ID: hi netters, has anyone had any experience in using ensembl perl API (based on bioperl) to retrieve and analyse data from ensembl?
i wanted to retrieve all the genes from ensembl core database. to do this i used a slice adaptor: $db=new Bio::EnsEMBL::DBSQL::DBAdaptor (...); my $slice_adaptor = $db->get_SliceAdaptor(); my @slices = @{$slice_adaptor->fetch_all('chromosome')}; foreach my $slice (@slices){ my @genes=@{$slice->get_all_Genes}; do something...... } it took several hours for the script to get all the genes from ensembl. if i'd used the website of BioMart and had the same task done, it'd be just a matter of minutes. So is there a better way of coding? or ensembl modules are just extremely slow? Thanks a lot! From taerwin at gmail.com Mon Sep 11 02:36:17 2006 From: taerwin at gmail.com (Tim Erwin) Date: Mon, 11 Sep 2006 16:36:17 +1000 Subject: [Bioperl-l] ensembl perl API - very slow retrival of data? In-Reply-To: References: Message-ID: Hi Zhihua Is this using the remote ensembl databases? I have done a similar thing using a local copy of the Arabidopsis ensembl and it only took about 30 seconds. This question would also be better on the ensembl-dev mailing list: http://www.ensembl.org/info/about/contact.html Regards, Tim On 9/11/06, zhihua li wrote: > > hi netters, > > has anyone had any experience in using ensembl perl API (based on bioperl) > to retrieve and analyse data from ensembl? i wanted to retrieve all the > genes from ensembl core database. to do this i used a slice adaptor: > > $db=new Bio::EnsEMBL::DBSQL::DBAdaptor (...); > my $slice_adaptor = $db->get_SliceAdaptor(); > my @slices = @{$slice_adaptor->fetch_all('chromosome')}; > foreach my $slice (@slices){ > my @genes=@{$slice->get_all_Genes}; > do something...... > } > > it took several hours for the script to get all the genes from > ensembl. if > i'd used the website of BioMart and had the same task done, it'd be just a > matter of minutes. So is there a better way of coding? or ensembl modules > are just extremely slow? > > Thanks a lot! 
> > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From birney at ebi.ac.uk Mon Sep 11 04:55:31 2006 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon, 11 Sep 2006 09:55:31 +0100 Subject: [Bioperl-l] ensembl perl API - very slow retrival of data? In-Reply-To: References: Message-ID: <1E475B44-35A7-419B-A7F4-A3F5947817CC@ebi.ac.uk> On 11 Sep 2006, at 03:00, zhihua li wrote: > hi netters, > > has anyone had any experience in using ensembl perl API (based on > bioperl) to retrieve and analyse data from ensembl? i wanted to > retrieve all the genes from ensembl core database. to do this i > used a slice adaptor: > First off, this question is probably best asked on one of the ensembl lists, such as ensembl-dev at ebi.ac.uk or to the helpdesk at ensembl (helpdesk at ensembl.org). Just to answer it directly; the perl API is slower than BioMart - BioMart is explicitly a denormalised query-optimised system which aims to provide quick response, whereas the Perl API works against the normalised data and is also designed to handle both reads and writes (though of course you can't write to our databases). Therefore if you can solve a problem through the BioMart API I would use that. That said, this looks a bit slow; the public mysql server (ensembldb.ensembl.org) can get very loaded, and perhaps that was the problem. In addition, as the API does alot of lazy-loading, internet latency as well as throughput can be a problem - if your connection is not ideal then this could cause it. But I suspect the key reason is the "do something...." part. Depending on what you are doing the API might or might not be doing alot of lazy evaluation. Just for info - I regularly use the ensembldb.ensembl.org and Perl API remotely, and often something that is genome-wide might take 1 or 2 hours or so. 
I personnally find the ease of writing with the API fine for this length of time; as I said, if you can get the info from BioMart (not everything which is accessible in the API is accessible in BioMart) go for that. Your next point is probably to describe the "do something" and post to either ensembl-dev or helpdesk - please don't ask me directly as often I am completely max'd out and will drop the email :) > $db=new Bio::EnsEMBL::DBSQL::DBAdaptor (...); > my $slice_adaptor = $db->get_SliceAdaptor(); > my @slices = @{$slice_adaptor->fetch_all('chromosome')}; > foreach my $slice (@slices){ > my @genes=@{$slice->get_all_Genes}; > do something...... > } > > it took several hours for the script to get all the genes from > ensembl. if i'd used the website of BioMart and had the same task > done, it'd be just a matter of minutes. So is there a better way > of coding? or ensembl modules are just extremely slow? > > Thanks a lot! > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mmacho at gmail.com Mon Sep 11 07:46:27 2006 From: mmacho at gmail.com (ende) Date: Mon, 11 Sep 2006 13:46:27 +0200 Subject: [Bioperl-l] Blast temporary open files not closed In-Reply-To: <1CF309A8-42B0-4839-A52E-BCEF516D8EC3@gmx.net> References: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com> <4FE2D9E8-F8B9-45F0-AE24-6650FD421102@gmx.net> <9D007453-72FC-4F1D-AAEF-9C979EE6EB0F@gmail.com> <1FBFE48B-A862-4C53-9047-0058D6FD7C9F@gmail.com> <1CF309A8-42B0-4839-A52E-BCEF516D8EC3@gmx.net> Message-ID: El 10/09/2006, a las 18:40, Hilmar Lapp escribi?: > On Sep 10, 2006, at 12:33 PM, ende wrote: > >> I remain a bit confused but it seems as you say that things don't >> work as they should. > > I wasn't saying that they don't work as they should - I was only > saying that they don't necessarily work as *you* think they should. 
> > There were most certainly reasons behind the implementation that > may not concern you but do concern others. Bioperl needs to serve a > variety of requirements and won't be optimized (but should still > work!) for everybody's application. but you must understand that is not obvious that a local variable made persistent ($blastResult) because you return it.... should keep reference to temporary files that are only built for blastall execution needs. All the data are send to doblast via memory vars and returned via memory vars... BioPerl ought to convert the content of the result to local memory vars and close and delete any /tmp/ temporary files before returning!! IMHO. > > Great to hear that you got your issue fixed though. you are very kindly. ---- ende From hlapp at gmx.net Mon Sep 11 08:10:34 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 11 Sep 2006 08:10:34 -0400 Subject: [Bioperl-l] Blast temporary open files not closed In-Reply-To: References: <7B7DC7EA-72DC-4DD5-8EEE-A93BEE5BAB9D@gmail.com> <4FE2D9E8-F8B9-45F0-AE24-6650FD421102@gmx.net> <9D007453-72FC-4F1D-AAEF-9C979EE6EB0F@gmail.com> <1FBFE48B-A862-4C53-9047-0058D6FD7C9F@gmail.com> <1CF309A8-42B0-4839-A52E-BCEF516D8EC3@gmx.net> Message-ID: On Sep 11, 2006, at 7:46 AM, ende wrote: > BioPerl ought to convert the content of the result to local memory > vars and close and delete any /tmp/temporary files before returning!! Well that's what I meant by Bioperl needs to serve a variety of requirement. Your suggestion will probably work great on your machine and your number of sequences but will fail completely on a machine with limited memory and a query of 10,000 sequences. I agree with you that the fact that temporary files will accumulate along with storing the report objects should be documented (I haven't checked - maybe it even is but not conspicuous enough?). Also, I believe that at some point it was being discussed to have the option of reading everything into memory. 
I'm not sure where that went in terms of implementation. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From dr.hogart at gmail.com Mon Sep 11 08:15:50 2006 From: dr.hogart at gmail.com (sergei ryazansky) Date: Mon, 11 Sep 2006 16:15:50 +0400 Subject: [Bioperl-l] how to select the best hsp? References: Message-ID: I did it, so I don't need help any more. On Sun, 10 Sep 2006 15:15:58 +0400, sergei ryazansky wrote: > already looked.. but i didn't found something useful. > most probably i am only passed the required information.. please help me > to find it > > On Sun, 10 Sep 2006 01:46:30 +0400, Brian Osborne > > wrote: > >> Sergei, >> >> Take a look at the SearchIO HOWTO. A list of HOWTOs: >> >> http://www.bioperl.org/wiki/HOWTOs >> >> >> Brian O. >> >> >> On 9/9/06 4:22 PM, "sergei ryazansky" wrote: >> >>> Hi all. >>> How I can select the best hsp in each hit by bioperl parsing of blast >>> result? >>> Thank you in advance. > > > From cjfields at uiuc.edu Mon Sep 11 13:21:50 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Sep 2006 12:21:50 -0500 Subject: [Bioperl-l] Test::Simple now in Bioperl Message-ID: <000001c6d5c6$c48e2a50$15327e82@pyrimidine> I have added Test::Simple to t/lib in bioperl-live for those who want to use Test::Simple or Test::More for tests. This will give us a little more flexibility, especially when we only want to run some tests but not others. Very likely everyone has it already installed (if you have perl v5.8 it is included with the core distribution), but this allows the few of those out there running various perl v.5.6 versions w/o Test::Simple to still run tests.
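As a rough illustration of running some tests but not others, a Test::More script can skip a block conditionally. This is only a sketch: the `BIOPERL_NETWORK_TESTS` environment variable is a made-up name for the example, and the assertions are placeholders rather than real BioPerl tests.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical switch for this sketch: only run the "remote" tests
# when the environment variable is set to a true value.
my $run_remote = $ENV{BIOPERL_NETWORK_TESTS} ? 1 : 0;

# A test that always runs.
ok(1 + 1 == 2, 'local test always runs');

SKIP: {
    # skip() records the two tests below as skipped and leaves the block.
    skip 'remote tests not requested', 2 unless $run_remote;

    # Placeholders standing in for tests that need a remote server.
    ok(1, 'placeholder remote test 1');
    ok(1, 'placeholder remote test 2');
}
```

Skipped tests still count toward the plan, so the script reports the same total whether or not the remote block runs.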
Test.pm will also likely be moved to t/lib in the future; according to Michael Schwern (maintainer for Test::Simple) CPAN may index any modules in the test directory but will not index those in t/lib. I have also added new tests which use Test::More (EUtilities.t) and changed Location.t and LocationFactory.t over to using Test::More. I also may move other tests over gradually, especially those modules which rely on remote servers. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From golharam at umdnj.edu Mon Sep 11 13:40:03 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 11 Sep 2006 13:40:03 -0400 Subject: [Bioperl-l] Retrieving Gene Info from NCBI Message-ID: <007801c6d5c9$50bb41f0$2f01a8c0@GOLHARMOBILE1> I'm not sure this is possible but I'll ask anyway: NCBI contains the Genomic region information in the Gene database for every known gene. For instance, if you search NCBI for XM_547879, there is 1 entry in "Gene". Follow that entry and it takes you to LOC490757...follow that and it takes you to the Gene db entry. Under genomic regions, etc, it shows the exons, intron, and UTRs. How can I extract this information from Entrez Gene? Is it possible with Bioperl? From osborne1 at optonline.net Mon Sep 11 15:49:22 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 11 Sep 2006 15:49:22 -0400 Subject: [Bioperl-l] Retrieving Gene Info from NCBI In-Reply-To: <007801c6d5c9$50bb41f0$2f01a8c0@GOLHARMOBILE1> Message-ID: Ryan, I'm not completely sure I understand the question but I will try to answer. Yes, you can retrieve genes from Entrez Gene as objects in at least 3 ways. One is to use Bio::DB::EntrezGene, but the problem here is you need to know the "Gene id", you can't use something like "LOC490757". This is due to a limitation of the Entrez Gene API, I assume this limitation is still in effect. 
As you know there are files available at NCBI that map accessions and identifiers to Gene ids. Or, there's the Ensembl API. I'd expect that you could query this API with your accession successfully but I haven't used this API much except to know that it's quite powerful. Take a look at this FAQ question: http://www.bioperl.org/wiki/Getting_Genomic_Sequences Or, you can download Entrez Genes ASN file and use SeqIO. If you choose pure Bioperl over Ensembl you'll see that there's quite a bit of information in these Sequence objects from Entrez Gene, you need to do a bit of studying to find out where the desired data is. Brian O. On 9/11/06 1:40 PM, "Ryan Golhar" wrote: > I'm not sure this is possible but I'll ask anyway: > > NCBI contains the Genomic region information in the Gene database for > every known gene. For instance, if you search NCBI for XM_547879, there > is 1 entry in "Gene". Follow that entry and it takes you to > LOC490757...follow that and it takes you to the Gene db entry. Under > genomic regions, etc, it shows the exons, intron, and UTRs. How can I > extract this information from Entrez Gene? Is it possible with Bioperl? > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Mon Sep 11 17:55:41 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 11 Sep 2006 17:55:41 -0400 Subject: [Bioperl-l] Bug 1672 - accessing Biosql using OBDA Message-ID: Hilmar, As mentioned, I've worked on this bug, #1672. You'd encouraged me to look at the adaptor modules in BioSQL/ as part of the solution but the module I wrote and added does not closely resemble the existing adaptors. It certainly uses some adaptors as well as BioQuery so it's very much a "standard" bioperl-db module. However I was not sure I wanted to call it something like OBDAAdaptor, instead I've just called it Bio::DB::BioSQL::OBDA.
Tell me if I should rename it or put it in some other directory, it?s certainly possible that there are subtleties in these module names that I?m failing to appreciate. Brian O. From jay at jays.net Mon Sep 11 21:41:14 2006 From: jay at jays.net (Jay Hannah) Date: Mon, 11 Sep 2006 20:41:14 -0500 Subject: [Bioperl-l] Bio::SeqIO -- add an ugly but fast grep hack? Message-ID: I was assigned the task of wading through 4GB of GenBank data to find all of the sequences that mentioned "Staphylococcus epidermidis" anywhere in any annotation. Separate those hits out into smaller files. Easy enough. Something like this (stripped down): foreach my $source_file (@file_list) { my $outfile = Bio::SeqIO->new(-file => ">$dir/$source_file" . '_parsed', -format => 'GenBank'); my $infile = Bio::SeqIO->new(-file => "$dir/ $source_file", -format => 'GenBank'); SEQUENCE: while(my $seqobj = $infile->next_seq()) { my $annotations = $seqobj->annotation; foreach my $key ( $annotations->get_all_annotation_keys() ) { my @values = $annotations->get_Annotations($key); foreach my $value ( @values ) { if ($value->as_text =~ /$select_feature/i) { # $select_feature is 'Staph...' $outfile->write_seq($seqobj); next SEQUENCE; } } } } } Works great. The only problem is that it takes a long time to wade through all 4GB. I'm making Perl do a TON of work and throwing 99.999% of that work away. :) I noticed that I can discover the sweet spots in the 4GB a thousand times faster with a simple grep: grep -in "Staphylococcus epidermidis" *.seq From the filenames:linenumbers that grep discovers for me I believe I could "hop" BioPerl (Bio::SeqIO) into the "sweet spots" of the files and have BioPerl just serialize *those* into sequence objects. Has anyone ever thought of adding a "raw_file_grep" argument or something? Like this? my $infile = Bio::SeqIO->new( -file => "$dir/$source_file", -format => 'GenBank', -raw_file_grep => 'Staphylococcus epidermidis' ); or perhaps implement it inside Bio::SeqIO::next_seq()? 
my $seqobj = $infile->next_seq( raw_file_grep => 'Staphylococcus epidermidis' ); (This would be a blind string-based hack for pure speed. Any string match anywhere -- no context intelligence whatsoever.) Or am I just crazy and if I'm going to do a hack like this I should just write a stand-alone file filter outside of BioPerl? Or has someone already optimized this somewhere and I just couldn't find it? Thanks, j P.S. If you're keeping score, yes I will be circling back to implement the ideas I started in previous threads... :) From hlapp at gmx.net Mon Sep 11 23:22:24 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 11 Sep 2006 23:22:24 -0400 Subject: [Bioperl-l] Bug 1672 - accessing Biosql using OBDA In-Reply-To: References: Message-ID: Hi Brian, thanks, that's great! Your choice of name and locations sound good, OBDAAdaptor would (falsely) suggest that the module is the persistence adaptor for bioperl OBDA objects. -hilmar On Sep 11, 2006, at 5:55 PM, Brian Osborne wrote: > Hilmar, > > As mentioned, I?ve worked on this bug, #1672. You?d encouraged me > to look at > the adaptor modules in BioSQL/ as part of the solution but the > module I > wrote and added does not closely resemble the existing adaptors. It > certainly uses some adaptors as well as BioQuery so it?s very much a > ?standard? bioperl-db module. However I was not sure I wanted to > call it > something like OBDAAdaptor, instead I?ve just called it > Bio::DB::BioSQL::OBDA. Tell me if I should rename it or put it in > some other > directory, it?s certainly possible that there are subtleties in > these module > names that I?m failing to appreciate. > > Brian O. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From R.Birnie at leeds.ac.uk Tue Sep 12 06:37:52 2006 From: R.Birnie at leeds.ac.uk (Richard Birnie) Date: Tue, 12 Sep 2006 11:37:52 +0100 Subject: [Bioperl-l] drawing chromosome ideograms References: Message-ID: Hi all, Are their any tools in bioperl to generate chromosome ideograms for visualising CGH data? I did find an old thread (2003) in the mailing list describing someone attempting to do this in Gbrowse and the mention of a module called ideogram.pm. I had a look at Gbrowse and I'm not sure I need that level of complexity, I just need to produce a simple static PNG or JPG for reports/slides. Although if this is the only way then I'll give it a go. I couldn't find any further mention of ideogram.pm anywhere, is this part of GBrowse? I also found a mention of a glyph in Bio::Graphics for this type of thing but I can't find it again now. Essentially I'm looking for some pointers to what is possible in this area. If this can't be done in bioperl does anyone know of any other tools (preferably free) that can do this type of thing. regards, Richard Dr Richard Birnie Scientific Officer Section of Pathology and Tumour Biology Welcome Brenner Building, LIMM St James University Hospital Beckett St, Leeds, LS9 7TF Tel:0113 3438624 e-mail: r.birnie at leeds.ac.uk From sanges at biogem.it Tue Sep 12 07:18:25 2006 From: sanges at biogem.it (Remo Sanges) Date: Tue, 12 Sep 2006 13:18:25 +0200 Subject: [Bioperl-l] drawing chromosome ideograms In-Reply-To: References: Message-ID: <45069781.4010201@biogem.it> Have you tried karyoview at Ensembl? 
http://www.ensembl.org/Homo_sapiens/karyoview HTH Remo Richard Birnie wrote: >Hi all, > >Are their any tools in bioperl to generate chromosome ideograms for visualising CGH data? I did find an old thread (2003) in the mailing list describing someone attempting to do this in Gbrowse and the mention of a module called ideogram.pm. I had a look at Gbrowse and I'm not sure I need that level of complexity, I just need to produce a simple static PNG or JPG for reports/slides. Although if this is the only way then I'll give it a go. I couldn't find any further mention of ideogram.pm anywhere, is this part of GBrowse? I also found a mention of a glyph in Bio::Graphics for this type of thing but I can't find it again now. > >Essentially I'm looking for some pointers to what is possible in this area. If this can't be done in bioperl does anyone know of any other tools (preferably free) that can do this type of thing. > >regards, >Richard > >Dr Richard Birnie >Scientific Officer >Section of Pathology and Tumour Biology >Welcome Brenner Building, LIMM >St James University Hospital >Beckett St, Leeds, LS9 7TF >Tel:0113 3438624 >e-mail: r.birnie at leeds.ac.uk > > > > > > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From R.Birnie at leeds.ac.uk Tue Sep 12 07:25:51 2006 From: R.Birnie at leeds.ac.uk (Richard Birnie) Date: Tue, 12 Sep 2006 12:25:51 +0100 Subject: [Bioperl-l] drawing chromosome ideograms References: <45069676.6030501@biogem.it> Message-ID: Thanks Remo, Looks like that will do the job nicely. Richard -----Original Message----- From: Remo Sanges [mailto:sanges at biogem.it] Sent: Tue 9/12/2006 12:13 To: Richard Birnie Subject: Re: [Bioperl-l] drawing chromosome ideograms Have you tried karyoview at Ensembl? 
http://www.ensembl.org/Homo_sapiens/karyoview HTH Remo Richard Birnie wrote: >Hi all, > >Are their any tools in bioperl to generate chromosome ideograms for visualising CGH data? I did find an old thread (2003) in the mailing list describing someone attempting to do this in Gbrowse and the mention of a module called ideogram.pm. I had a look at Gbrowse and I'm not sure I need that level of complexity, I just need to produce a simple static PNG or JPG for reports/slides. Although if this is the only way then I'll give it a go. I couldn't find any further mention of ideogram.pm anywhere, is this part of GBrowse? I also found a mention of a glyph in Bio::Graphics for this type of thing but I can't find it again now. > >Essentially I'm looking for some pointers to what is possible in this area. If this can't be done in bioperl does anyone know of any other tools (preferably free) that can do this type of thing. > >regards, >Richard > >Dr Richard Birnie >Scientific Officer >Section of Pathology and Tumour Biology >Welcome Brenner Building, LIMM >St James University Hospital >Beckett St, Leeds, LS9 7TF >Tel:0113 3438624 >e-mail: r.birnie at leeds.ac.uk > > > > > > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From benoit at ebi.ac.uk Tue Sep 12 08:04:00 2006 From: benoit at ebi.ac.uk (Benoit Ballester) Date: Tue, 12 Sep 2006 13:04:00 +0100 Subject: [Bioperl-l] drawing chromosome ideograms In-Reply-To: References: Message-ID: <4506A230.4090508@ebi.ac.uk> Richard, You can also have a look at Bioconductor - Some of the packages are specific to CGHarrays, and they are very nice. 
This is obviously off-topic on this mailing list, but if you need to normalize, analyse, plot CGH data, I suggest these packages: cghMCR : Find chromosome regions showing common gains/losses aCGH : Classes and functions for Array Comparative Genomic Hybridization data snapCGH : Segmentation, normalisation and processing of aCGH data. Regards, Benoit Richard Birnie wrote: > Hi all, > > Are their any tools in bioperl to generate chromosome ideograms for visualising CGH data? I did find an old thread (2003) in the mailing list describing someone attempting to do this in Gbrowse and the mention of a module called ideogram.pm. I had a look at Gbrowse and I'm not sure I need that level of complexity, I just need to produce a simple static PNG or JPG for reports/slides. Although if this is the only way then I'll give it a go. I couldn't find any further mention of ideogram.pm anywhere, is this part of GBrowse? I also found a mention of a glyph in Bio::Graphics for this type of thing but I can't find it again now. > > Essentially I'm looking for some pointers to what is possible in this area. If this can't be done in bioperl does anyone know of any other tools (preferably free) that can do this type of thing. > > regards, > Richard > > Dr Richard Birnie > Scientific Officer > Section of Pathology and Tumour Biology > Welcome Brenner Building, LIMM > St James University Hospital > Beckett St, Leeds, LS9 7TF > Tel:0113 3438624 > e-mail: r.birnie at leeds.ac.uk http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Benoit Ballester Ensembl Team European Bioinformatics Institute (EMBL-EBI) Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, United Kingdom From fgarret at ub.edu Tue Sep 12 09:30:27 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Tue, 12 Sep 2006 15:30:27 +0200 Subject: [Bioperl-l] Help with Bio::DB::GFF Message-ID: <4506B673.9030305@ub.edu> Hi all, I'm trying to access FlyBase Dmel GFF files through the bioperl module.
I've tried different approaches (below) but none seem to work.

"
use DBI;
use Bio::DB::GFF;

# Open the sequence database
my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                           -dsn     => 'dbi:mysql:dmel_gff:123.456.78.90',
                           -user    => 'user',
                           -pass    => 'pass');

$db->initialize(1);
$db->load_gff('dmel-4-r4.2.1.gff');

my @f;
my $segment1 = $db->segment(-name => '4', -start => 200000, -end => 230000);
my @features = $segment1->features('transcript', -automerge=>0);

@f = $db->contained_features(-start => 200000, -stop => 230000);
@f = $db->overlapping_features(-start => 200000, -stop => 230000);

@f = $db->features(-start => 200000, -end => 230000);

my $g = $db->segment(-start => 200000, $end => 230000);
"

What I need is to get all the features in a genome region.
Can anyone help me?

thanks in adv,
FG

From cjfields at uiuc.edu Tue Sep 12 11:30:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Sep 2006 10:30:23 -0500
Subject: [Bioperl-l] Bio::SeqIO -- add an ugly but fast grep hack?
In-Reply-To:
Message-ID: <000f01c6d680$60b2b400$15327e82@pyrimidine>

...
> I noticed that I can discover the sweet spots in the 4GB a thousand
> times faster with a simple grep:
>
> grep -in "Staphylococcus epidermidis" *.seq
>
> From the filenames:linenumbers that grep discovers for me I believe
> I could "hop" BioPerl (Bio::SeqIO) into the "sweet spots" of the
> files and have BioPerl just serialize *those* into sequence objects.
>
> Has anyone ever thought of adding a "raw_file_grep" argument or
> something? Like this?
>
> my $infile = Bio::SeqIO->new(
>    -file => "$dir/$source_file",
>    -format => 'GenBank',
>    -raw_file_grep => 'Staphylococcus epidermidis'
> );
>
> or perhaps implement it inside Bio::SeqIO::next_seq()?
>
> my $seqobj = $infile->next_seq(
>    raw_file_grep => 'Staphylococcus epidermidis'
> );

You have two problems with this approach:

1) It sounds like you want the system 'grep' rather than Perl's built-in;
for cross-platform reasons the built-in would have to be used here
(Windows does not have grep).

2) I don't think SeqIO would be the best way to go; you should use SeqIO
for getting sequences into objects, not for searching for specific
sequences. There may be a way to do this via flat databases, where you
would index your local GenBank file (though I can't vouch for how they
will work on a 4 GB file). But I think a non-Bioperl approach is better.

> (This would be a blind string-based hack for pure speed. Any string
> match anywhere -- no context intelligence whatsoever.)
>
> Or am I just crazy and if I'm going to do a hack like this I should
> just write a stand-alone file filter outside of BioPerl?

That would probably be your best bet. If your individual sequence records
aren't very large, you could iterate through the records in the file by
changing the input record separator so that each read gulps a whole
record, and then use a plain ol' regex, like this (modified from a
quickie script I use):

#! perl

use strict;
use warnings;

{
    # GenBank records end with a line containing only "//"
    local $/ = "//\n";
    while (my $gb = <>) {
        print $gb if $gb =~ m/Staphylococcus\sepidermidis/im;
    }
}

You could probably squeeze that into a one-liner if needed; this one was
written on WinXP, which has problems with one-liners containing quotes.

I use something like the above as a screen whose output is piped into a
second script, which then runs everything through SeqIO via STDIN to do
whatever you want with it (print accession #, convert to FASTA, etc).
You could also just save the output to a file for later processing if it
is too large.

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign From maximilianh at gmail.com Tue Sep 12 12:21:05 2006 From: maximilianh at gmail.com (Maximilian Haeussler) Date: Tue, 12 Sep 2006 18:21:05 +0200 Subject: [Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable? In-Reply-To: <76f031ae0609091403x17148915g5ca2857b9cbb1e74@mail.gmail.com> References: <001501c6c619$37167ae0$15327e82@pyrimidine> <76f031ae0609091403x17148915g5ca2857b9cbb1e74@mail.gmail.com> Message-ID: <76f031ae0609120921h67f00864xc447a9f26a7a336e@mail.gmail.com> FYI: I knew that there was an older version, finally found it on the emboss mailing list: http://newportal.open-bio.org/pipermail/emboss/2005-August/002178.html They refer to: ftp://ftp.ebi.ac.uk/pub/databases/transfac/ That is version 3.4 that I mentioned. There are quite a few matrices and sites in there. cheers, Max On 09/09/06, Maximilian Haeussler wrote: > (just found this old mail, sorry for the delay, reading too many mailing > lists, how many hours do you guys spend on reading mailng lists per day???) > > Transfac versions 3.4 and 4.0 (If I remember well) had a much more open > licence, at that time you were still allowed to download and distribute the > file (you can still find these old versions on the net, e.g. > http://biotech.embl-ebi.ac.uk:8400/sw/common/test/matrix.dat). > I guess an older version could be used for the test cases in Bioperl. > > Another argument for supporting the transfac format in Bioperl is that it is > the only de-facto standard format for matrices. Many pwm scanners and > websites can parse it or at least supply a converter for Transfac into their > own format. > > cheers, > Max > > > > On 22/08/06, Chris Fields wrote: > > Hilmar, > > > > No, unfortunately no TRANSFAC or similar matrices. 
But there are a few > > other similar resources out there that may provide matrices: > > > > http://molbiol-tools.ca/DNA_Motifs.htm > > > > This one allows you to create a matrix from input sequences: > > > > http://molbiol-tools.ca/Jie_Zheng/ > > > > > > Chris > > > > > -----Original Message----- > > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > > Sent: Tuesday, August 22, 2006 1:04 PM > > > To: Chris Fields > > > Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org > > > Subject: Re: [Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable? > > > > > > Good idea if that's feasible and not too difficult (or do they > > > provide transfac format themselves?). > > > > > > -hilmar > > > > > > On Aug 22, 2006, at 1:20 PM, Chris Fields wrote: > > > > > > > .... > > > >> I've sent an email to their support address (though this may take > > > >> a long > > > >> time to get a reply to, going on past experience). > > > >> > > > >> This is the full legal spiel they have: > > > >> > http://www.gene-regulation.com/pub/databases/transfac/doc/misc.html > > > >> > > > >> There's nothing about restrictions on using the data format, they > > > >> haven't tried to shut down the TFBS:: modules, and it would be > > > >> illegal > > > >> for them to do so according to fair use in many countries, their home > > > >> country of Germany especially. In short, the module itself would > > > >> not be > > > >> a problem. The only cause for concern is the test data, which is not > > > >> possible without express permission. I've asked for permission so > > > >> now we > > > >> just wait. > > > > > > > > Based on that you could proceed. As long as the format itself isn't > > > > restricted you could create 'foo' data for the time being for tests. > > > > > > > > You might use some of the data from George Church's E. coli work > > > > converted > > > > to TRANSFAC format and matrices (just reference it if you do); I > > > > believe > > > > this is public domain (the data has been published). 
Most of these > > > > are in > > > > the form of alignments only: > > > > > > > > http://arep.med.harvard.edu/ecoli_matrices/ > > > > > > > > Chris > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > -- > > > > =========================================================== > > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > > > =========================================================== > > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Maximilian Haeussler, > CNRS/INRA Gif-sur-Yvette, France > tel: +33 6 12 82 76 16 > skype: maximilianhaeussler -- Maximilian Haeussler, CNRS/INRA Gif-sur-Yvette, France tel: +33 6 12 82 76 16 skype: maximilianhaeussler From sdavis2 at mail.nih.gov Tue Sep 12 12:48:52 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 12 Sep 2006 12:48:52 -0400 Subject: [Bioperl-l] drawing chromosome ideograms In-Reply-To: <4506A230.4090508@ebi.ac.uk> References: <4506A230.4090508@ebi.ac.uk> Message-ID: <200609121248.52334.sdavis2@mail.nih.gov> On Tuesday 12 September 2006 08:04, Benoit Ballester wrote: > Richard, > > You can also have a look at Bioconductor - Some of the packages are > specific to CGHarrays, and they are very nice. This is obviously of > topic on this mailling list, but if you need to normalize, analyse, plot > CGH data, I suggest these packages : > > cghMCR : Find chromosome regions showing common gains/losse > aCGH : Classes and functions for Array Comparative Genomic Hybridization > data > snapCGH : Segmentation, normalisation and processing of aCGH data. I'll second this. Plotting is something that R/bioconductor does quite well and is EXTREMELY fast when dealing with numeric data. 
Sean From lincoln.stein at gmail.com Tue Sep 12 19:09:49 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 12 Sep 2006 19:09:49 -0400 Subject: [Bioperl-l] Help with Bio::DB::GFF In-Reply-To: <4506B673.9030305@ub.edu> References: <4506B673.9030305@ub.edu> Message-ID: <6dce9a0b0609121609s7dc58c34p7cccddd4ff8773c5@mail.gmail.com> The init() and load part should only be performed once. After that you should open the database read-only and do the analysis you need. What happens when you run this code? Lincoln On 9/12/06, Filipe Garrett wrote: > > Hi all, > > I'm trying to access to Flybase Dmel GFF files through the bioperl > module. I've tried different approaches (below) but none seem to work. > > > " > use DBI; > use Bio::DB::GFF; > > # Open the sequence database > my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql', > -dsn => 'dbi:mysql:dmel_gff:123.456.78.90', > -user => 'user', > -pass => 'pass'); > > $db->initialize(1); > $db->load_gff('dmel-4-r4.2.1.gff'); > > > > my @f; > my $segment1 = $db->segment(-name => '4', -start => 200000, -end => > 230000); > my @features = $segment1->features('transcript', -automerge=>0); > > > > @f = $db->contained_features(-start => 200000, -stop => 230000); > @f = $db->overlapping_features(-start => 200000, -stop => 230000); > > > @f = $db->features(-start => 200000, -end => 230000); > > > my $g = $db->segment(-start => 200000, $end => 230000); > > " > > > What I need is to get all the features in a genome region. > Can anyone help me? > > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From n.haigh at sheffield.ac.uk Wed Sep 13 08:05:16 2006
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Wed, 13 Sep 2006 13:05:16 +0100
Subject: [Bioperl-l] sequence object with no sequence data
Message-ID: <4507F3FC.60709@sheffield.ac.uk>

Just a quick one:
Is it possible to create a sequence object with no actual sequence data?

For a project I'm working on, I'd like to be able to generate such a
sequence (i.e. one that just has start/end data) and have it behave like
a real sequence, e.g. add features to it and take slices of the object.

Thanks
Nath

From lincoln.stein at gmail.com Wed Sep 13 07:55:10 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Wed, 13 Sep 2006 07:55:10 -0400
Subject: [Bioperl-l] Help with Bio::DB::GFF
In-Reply-To: <4506B673.9030305@ub.edu>
References: <4506B673.9030305@ub.edu>
Message-ID: <6dce9a0b0609130455x2114b2e0r1167ad437bde3d77@mail.gmail.com>

This is the correct way to do it:

my $segment1 = $db->segment(-name => '4', -start => 200000, -end => 230000);
my @features = $segment1->features('transcript') or die "no features";

You should also check that you are getting a database handle in $db:

my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                           -dsn     => 'dbi:mysql:dmel_gff:123.456.78.90',
                           -user    => 'user',
                           -pass    => 'pass') || die "database open failed";

Are you sure that there are features named "transcript" in that GFF file?
I'm looking at release 4.3, and there are NO "transcript" features listed.
Instead I see genes and mRNAs.

Lincoln

On 9/12/06, Filipe Garrett wrote:
>
> Hi all,
>
> I'm trying to access to Flybase Dmel GFF files through the bioperl
> module. I've tried different approaches (below) but none seem to work.
> > > " > use DBI; > use Bio::DB::GFF; > > # Open the sequence database > my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql', > -dsn => 'dbi:mysql:dmel_gff:123.456.78.90', > -user => 'user', > -pass => 'pass'); > > $db->initialize(1); > $db->load_gff('dmel-4-r4.2.1.gff'); > > > > my @f; > my $segment1 = $db->segment(-name => '4', -start => 200000, -end => > 230000); > my @features = $segment1->features('transcript', -automerge=>0); > > > > @f = $db->contained_features(-start => 200000, -stop => 230000); > @f = $db->overlapping_features(-start => 200000, -stop => 230000); > > > @f = $db->features(-start => 200000, -end => 230000); > > > my $g = $db->segment(-start => 200000, $end => 230000); > > " > > > What I need is to get all the features in a genome region. > Can anyone help me? > > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hlapp at gmx.net Wed Sep 13 08:31:08 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 13 Sep 2006 08:31:08 -0400 Subject: [Bioperl-l] sequence object with no sequence data In-Reply-To: <4507F3FC.60709@sheffield.ac.uk> References: <4507F3FC.60709@sheffield.ac.uk> Message-ID: <1B6C6F79-57CD-4D39-8102-8ACB3B81FEB4@gmx.net> Yes you can, just make sure you set the length. I'm not sure though what you mean by taking slices of it. -hilmar On Sep 13, 2006, at 8:05 AM, Nathan Haigh wrote: > Just a quick one: > Is it possible to create a sequence object with no actual sequences > data? > > For a project I'm working on, I'd like to be able to generate such a > sequence (i.e. 
to just have a start/end data) and have it behave > like a > real sequence e.g add features and can take slices of the object. > > Thanks > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From gthorisson at gmail.com Tue Sep 12 16:46:24 2006 From: gthorisson at gmail.com (=?ISO-8859-1?Q?Gu=F0mundur_=C1rni_=DE=F3risson?=) Date: Tue, 12 Sep 2006 21:46:24 +0100 Subject: [Bioperl-l] drawing chromosome ideograms In-Reply-To: <4506A230.4090508@ebi.ac.uk> References: <4506A230.4090508@ebi.ac.uk> Message-ID: <2107B0C4-3EB7-4D18-9616-E4F8E30C2F19@gmail.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Richard, ideogram.pm is part of GBrowse. I fact, I see it here in the GMOD CVS repository: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/lib/Bio/ Graphics/Glyph/ Sheldon McKay (mckays at cshl.edu, is probably on this list) at CSHL did some more work on this Bio::Graphics glyph after my initial implementation (for the HapMap installation at http://www.hapmap.org/ cgi-perl/gbrowse), fixed some color overflow bugs and added another ideogram-like glyph to GBrowse (heat map). Sheldon may be able to help you if you want to add some more functionality and/or create new similar glyphs yourself for displaying CGH data. Mummi, University of Leicester On Sep 12, 2006, at 1:04 PM, Benoit Ballester wrote: > Richard, > > You can also have a look at Bioconductor - Some of the packages are > specific to CGHarrays, and they are very nice. 
This is obviously of > topic on this mailling list, but if you need to normalize, analyse, > plot > CGH data, I suggest these packages : > > cghMCR : Find chromosome regions showing common gains/losse > aCGH : Classes and functions for Array Comparative Genomic > Hybridization > data > snapCGH : Segmentation, normalisation and processing of aCGH data. > > Regards, > > Benoit > > Richard Birnie wrote: >> Hi all, >> >> Are their any tools in bioperl to generate chromosome ideograms >> for visualising CGH data? I did find an old thread (2003) in the >> mailing list describing someone attempting to do this in Gbrowse >> and the mention of a module called ideogram.pm. I had a look at >> Gbrowse and I'm not sure I need that level of complexity, I just >> need to produce a simple static PNG or JPG for reports/slides. >> Although if this is the only way then I'll give it a go. I >> couldn't find any further mention of ideogram.pm anywhere, is this >> part of GBrowse? I also found a mention of a glyph in >> Bio::Graphics for this type of thing but I can't find it again now. >> >> Essentially I'm looking for some pointers to what is possible in >> this area. If this can't be done in bioperl does anyone know of >> any other tools (preferably free) that can do this type of thing. 
>>
>> regards,
>> Richard
>>
>> Dr Richard Birnie
>> Scientific Officer
>> Section of Pathology and Tumour Biology
>> Welcome Brenner Building, LIMM
>> St James University Hospital
>> Beckett St, Leeds, LS9 7TF
>> Tel:0113 3438624
>> e-mail: r.birnie at leeds.ac.uk
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Benoit Ballester
> Ensembl Team
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge CB10 1SD, United Kingdom
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (Darwin)

iD8DBQFFBxyhDZKDfFK+q7wRAtQPAKCTiBs8bgKNu61UgZDBKksGoAIp/QCff+Iz
D0sh7kBLZNeDyA1rHKs+W4k=
=zVKB
-----END PGP SIGNATURE-----

From tniram at hotmail.com Wed Sep 13 06:50:08 2006
From: tniram at hotmail.com (Antonio Ramos Fernández)
Date: Wed, 13 Sep 2006 12:50:08 +0200
Subject: [Bioperl-l] parsing protein accession numbers and types from >fasta headers
Message-ID:

I'd like to write a script to parse the fasta headers of fasta-formatted
protein databases and get protein accession numbers and identifiers
(uniprot, IPI, gi, Refseq, ensembl...). The idea is to build a simple
local database that relates the accession number of a protein sequence to
all its valid identifiers and to the fasta files on my system from which
they were obtained, or to check, for instance, whether a uniprot
accession exists for a given gi. However, the structure of the fasta
header is quite variable depending on the source. Any suggestions?

_________________________________________________________________
Horóscopo, tarot, numerología... Escucha lo que te dicen los astros.
http://astrocentro.msn.es/

From bernd.web at gmail.com Wed Sep 13 09:17:33 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 13 Sep 2006 15:17:33 +0200
Subject: [Bioperl-l] parsing protein accession numbers and types from >fasta headers
In-Reply-To:
References:
Message-ID: <716af09c0609130617g48d7c8f6p78c023f498f0475d@mail.gmail.com>

Hi

I tried to parse this variability and get out the dbs. So first I read
the DB type into $1 and then I got out the ID I needed for my purposes.
Of course not *Bio*Perl, but it worked for me ;-)

if ( m/>gi\|\d+\|(\w+)\|([^\|\s]*)\|(\S*)\s/ ) {
    my $name;
    #if ($1 eq 'pdb') { $name = $2.$3 } elsif ($1 eq 'sp' || $1 eq 'pir') { $name = $3 } else { $name = $2 }
    SWITCH: {
        if ($1 eq 'pdb') { $name = $2.$3; last SWITCH; }
        if ($1 eq 'sp' ) { $name = $3;    last SWITCH; }
        if ($1 eq 'pir') { $name = $3;    last SWITCH; }
        $name = $2;
    }
}

bernd

On 9/13/06, Antonio Ramos Fernández wrote:
>
> I'd like to write a script to parse fasta headers of fasta-formatted protein
> databases and get protein accession numbers and identifiers (uniprot, IPI,
> gi, Refseq, ensembl...). The idea is building a simple local database that
> relates an accession number for protein sequence with all valid identifiers
> and the fasta files from where they weher obtained at my system, or
> checking, for instance, if an uniprot accession exists for a given gi.
> However, the structure of the fasta header is quite variable depending on
> the source. Any suggestions?
>
> _________________________________________________________________
> Horóscopo, tarot, numerología... Escucha lo que te dicen los astros.
> http://astrocentro.msn.es/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From bernd.web at gmail.com Wed Sep 13 07:54:53 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 13 Sep 2006 13:54:53 +0200
Subject: [Bioperl-l] Bio::Factory::EMBOSS synopsis
Message-ID: <716af09c0609130454t4fd3955ej2c75e9c43f1c5ded@mail.gmail.com>

Hi,

I started to use Bio::Factory::EMBOSS and started with the example in
the synopsis. Although now I managed to do what I wanted, it seems like
the synopsis is not correct. See
http://doc.bioperl.org/releases/bioperl-current/bioperl-run/Bio/Factory/EMBOSS.html
(and: http://doc.bioperl.org/releases/bioperl-current/bioperl-run/Bio/Tools/Run/EMBOSSApplication.html)
for the code.

I cannot run "water". Using the latest CVS, (Emboss.pm v 1.7) and EMBOSS
v 2.9.0, I get wrong output. In the commandline the reference to the
ARRAY \@seqs_to_check appears. Output is:

sh: -c: line 1: syntax error near unexpected token `('
sh: -c: line 1: `water -gapopen 10.0 -gapextend 0.5 -seqall ARRAY(0x8bdb154) -outfile out.water -sequencea aseq -auto'

When I set $water->verbose(1); the output is:

$VAR1 = {
          '-gapopen' => '10.0',
          '-gapextend' => '0.5',
          '-seqall' => [
                         'CKRIHIGPGRAFWTTWC'
                       ],
          '-outfile' => 'out.water',
          '-sequencea' => 'aseq'
        };
Input attr: gapopen => 10.0
Input attr: gapextend => 0.5

------------- EXCEPTION -------------
MSG: Attribute [-seqall] not recognized!
STACK Bio::Tools::Run::EMBOSSApplication::run /home/bwbrandt/perllib/Bio/Tools/Run/EMBOSSApplication.pm:204
STACK toplevel emboss_factory.pl:27

Code is as in the SYNOPSIS but with:

my $seq_to_test = "aseq"; # this would have a seq here
my @seqs_to_check;        # this would be a list of seqs to compare
                          # (could be just 1)
$seqs_to_check[0] = "CKRIHIGPGRAFWTTWC";

Any suggestions what should be coded?
Thanks,
Bernd

From n.haigh at sheffield.ac.uk Wed Sep 13 09:26:44 2006
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Wed, 13 Sep 2006 14:26:44 +0100
Subject: [Bioperl-l] sequence object with no sequence data
In-Reply-To: <1B6C6F79-57CD-4D39-8102-8ACB3B81FEB4@gmx.net>
References: <4507F3FC.60709@sheffield.ac.uk> <1B6C6F79-57CD-4D39-8102-8ACB3B81FEB4@gmx.net>
Message-ID: <45080714.8060107@sheffield.ac.uk>

Hilmar Lapp wrote:
> Yes you can, just make sure you set the length. I'm not sure though
> what you mean by taking slices of it.
>
> -hilmar

One more related quick question:

Is it possible to have a sequence length measured in different units so
that decimals could be used? In particular, I'm thinking of centimorgans
as units and I'd also like to be able to draw these sequences using
Bio::Graphics.

Thanks
Nath

From hlapp at gmx.net Wed Sep 13 09:39:02 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 13 Sep 2006 09:39:02 -0400
Subject: [Bioperl-l] sequence object with no sequence data
In-Reply-To: <45080714.8060107@sheffield.ac.uk>
References: <4507F3FC.60709@sheffield.ac.uk> <1B6C6F79-57CD-4D39-8102-8ACB3B81FEB4@gmx.net> <45080714.8060107@sheffield.ac.uk>
Message-ID:

Interesting. No, you can't - length and also feature positions are
measured in basepairs.

Having said that, I'm not sure off the top of my head at which places,
if any, positions are checked to actually be integers. So, as long as
you make sure that you never ask the system to translate to basepair
coordinates, providing cM could be worth a try if you're in an
adventurous mood. I.e., make the length large enough that your highest
cM number is still lower, and never ask for subsequences. Also, the
system may check that the start of a feature is greater than or equal
to 1, so you may have to shift the coordinates by adding 1 if some of
your markers are at <1cM.
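
The coordinate shift described above can be sketched in plain Perl. This is only an illustration, not BioPerl API: the helper names cm_to_coord and coord_to_cm are hypothetical, and the scale factor of 100 (0.01 cM per coordinate unit) is an assumption added here so that fractional cM values become integers; whether BioPerl actually requires integer positions is left open in the message above.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of BioPerl.
# Assumed resolution: one coordinate unit = 0.01 cM.
my $SCALE = 100;

# Map a map position in cM to an integer pseudo-coordinate >= 1.
sub cm_to_coord {
    my ($cm) = @_;
    # Round to the nearest unit, then shift so that 0 cM maps to 1.
    return int($cm * $SCALE + 0.5) + 1;
}

# Map a pseudo-coordinate back to cM, e.g. for labelling a scale.
sub coord_to_cm {
    my ($coord) = @_;
    return ($coord - 1) / $SCALE;
}

# Made-up marker positions, for illustration only.
my @markers_cm = (0, 0.35, 12.5, 87.25);
for my $cm (@markers_cm) {
    printf "%.2f cM -> coordinate %d\n", $cm, cm_to_coord($cm);
}
```

The length given to the sequence object would then need to be at least cm_to_coord() of the highest marker position, in line with the advice to keep the length above the largest cM value.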
-hilmar On Sep 13, 2006, at 9:26 AM, Nathan Haigh wrote: > Hilmar Lapp wrote: >> Yes you can, just make sure you set the length. I'm not sure though >> what you mean by taking slices of it. >> >> -hilmar > One more related quick question: > > Is it possible to have a sequence length measured in different > units so > that decimals could be used? In particular, I'm thinking of > centimorgans > as units and I'd also like to be able to draw these sequences using > Bio::Graphics. > > Thanks > Nath > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Sep 13 10:36:33 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 13 Sep 2006 09:36:33 -0500 Subject: [Bioperl-l] parsing protein accession numbers and types from>fasta headers In-Reply-To: <716af09c0609130617g48d7c8f6p78c023f498f0475d@mail.gmail.com> Message-ID: <003401c6d742$02e6a080$15327e82@pyrimidine> I agree that the non-BioPerl way is probably best, though you can look at the Flat Database HOWTO for a fast Bioperl-ish way to index a FASTA file, get the IDs, set primary and secondary accessions, retrieve sequences, etc. http://www.bioperl.org/wiki/HOWTO:Flat_databases Bio::DB::Fasta is also a flat-db interface for accessing large FASTA databases which users seem to like. It's now capable of handling files > 4GB. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bernd Web > Sent: Wednesday, September 13, 2006 8:18 AM > To: Antonio Ramos Fern?ndez > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] parsing protein accession numbers and types > from>fasta headers > > Hi > > I tried to parse this variabilty and get out the dbs. 
So first I read > the DB type in $1 and then I got out the ID I needed for my purposes. > Of course not *Bio*Perl, but it worked for me ;-) > > if ( m/>gi\|\d+\|(\w+)\|([^\|\s]*)\|(\S*)\s/ ) { > my $name; > #if ($1 eq 'pdb') { $name = $2.$3 } elsif ($1 eq 'sp' || $1 eq > 'pir') > { $name = $3 } else { $name = $2 } > SWITCH: { > if ($1 eq 'pdb') { $name = $2.$3; last SWITCH; } > if ($1 eq 'sp' ) { $name = $3; last SWITCH; } > if ($1 eq 'pir') { $name = $3; last SWITCH; } > $name = $2; > } > > bernd > > > On 9/13/06, Antonio Ramos Fern?ndez wrote: > > > > I'd like to write a script to parse fasta headers of fasta-formatted > protein > > databases and get protein accession numbers and identifiers (uniprot, > IPI, > > gi, Refseq, ensembl...). The idea is building a simple local database > that > > relates an accession number for protein sequence with all valid > identifiers > > and the fasta files from where they weher obtained at my system, or > > checking, for instance, if an uniprot accession exists for a given gi. > > However, the structure of the fasta header is quite variable depending > on > > the source. Any suggestions? > > > > _________________________________________________________________ > > Hor?scopo, tarot, numerolog?a... Escucha lo que te dicen los astros. 
> > http://astrocentro.msn.es/ > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy at colibase.bham.ac.uk Wed Sep 13 10:26:25 2006 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Wed, 13 Sep 2006 15:26:25 +0100 Subject: [Bioperl-l] sequence object with no sequence data In-Reply-To: References: <4507F3FC.60709@sheffield.ac.uk> <1B6C6F79-57CD-4D39-8102-8ACB3B81FEB4@gmx.net> <45080714.8060107@sheffield.ac.uk> Message-ID: <45081511.508@colibase.bham.ac.uk> Hi Nath. Just a couple more suggestions to add to Hilmar's advice. If you mean what I think you do by a "slice", then you may want to look at Bio::SeqUtils->trunc_with_features I don't think it will work correctly without any sequence data, maybe you can try creating a sequence consisting of just Ns, by calling something like $seq->seq('N'x$seq->length) As for working with centiMorgan coordinates, couldn't you just multiply all your values by a fixed amount (100, say) to give you integer coordinates? It might even be possible to hack Bio::Graphics, to convert these pseudo-coordinates back into their original format if you want to draw a scale etc. Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K. http://xbase.bham.ac.uk > Interesting. No, you can't - length and also feature positions are > measured in basepairs. > > Having said that, I'm not sure off the top of my head at which places > and whether at all it is checked that the positions are actually > integers. So, so long as you make sure that you never ask the system > for translating to basepair coordinates, providing cM could be worth > a try if you're in an adventurous mood. 
I.e., make the length long > enough that your highest cM number is still lower, and never ask for > subsequences. Also, the system may check that for features the start > is greater or equal to 1, so you may have to translate the > coordinates by adding 1 if some of your markers are at <1cM. > > -hilmar > > On Sep 13, 2006, at 9:26 AM, Nathan Haigh wrote: > >> Hilmar Lapp wrote: >>> Yes you can, just make sure you set the length. I'm not sure though >>> what you mean by taking slices of it. >>> >>> -hilmar >> One more related quick question: >> >> Is it possible to have a sequence length measured in different >> units so >> that decimals could be used? In particular, I'm thinking of >> centimorgans >> as units and I'd also like to be able to draw these sequences using >> Bio::Graphics. >> >> Thanks >> Nath >> > From n.haigh at sheffield.ac.uk Wed Sep 13 12:43:00 2006 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Wed, 13 Sep 2006 17:43:00 +0100 Subject: [Bioperl-l] sequence object with no sequence data In-Reply-To: <45081511.508@colibase.bham.ac.uk> References: <4507F3FC.60709@sheffield.ac.uk> <1B6C6F79-57CD-4D39-8102-8ACB3B81FEB4@gmx.net> <45080714.8060107@sheffield.ac.uk> <45081511.508@colibase.bham.ac.uk> Message-ID: <45083514.1050306@sheffield.ac.uk> Roy Chaudhuri wrote: > Hi Nath. > > Just a couple more suggestions to add to Hilmar's advice. If you mean > what I think you do by a "slice", then you may want to look at > Bio::SeqUtils->trunc_with_features > > I don't think it will work correctly without any sequence data, maybe > you can try creating a sequence consisting of just Ns, by calling > something like $seq->seq('N'x$seq->length) > > As for working with centiMorgan coordinates, couldn't you just > multiply all your values by a fixed amount (100, say) to give you > integer coordinates? It might even be possible to hack Bio::Graphics, > to convert these pseudo-coordinates back into their original format if > you want to draw a scale etc. > > Roy. 
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
>
> http://xbase.bham.ac.uk

Hmm, another excellent idea. The possibility of hacking Bio::Graphics to
display features is definitely one worth considering.

Thanks!
Nath

From goshng at gmail.com Wed Sep 13 16:03:28 2006
From: goshng at gmail.com (Sang Chul Choi)
Date: Wed, 13 Sep 2006 16:03:28 -0400
Subject: [Bioperl-l] A failure to fetch a sequence
Message-ID: <33f36270609131303x5e03be95y45ac3f68bd9cb583@mail.gmail.com>

Hi,

I want to fetch a sequence and I tested the following simple code:

use Bio::DB::GenBank;
$gb = new Bio::DB::GenBank;
$seq = $gb->get_Seq_by_id('MUSIGHBA1'); # Unique ID
print $seq->display_id, "\n";
print $seq->seq, "\n";

Output:
-----------------------
MUSIGHBA1

-----------------------

I could get the sequence information but not the sequence itself.
I felt like my fetching work was blocked by the NCBI server because I have
run many jobs to fetch DNA and protein sequences using the get_Seq_by_***
methods. If my fetching job was blocked, what else could I do to fetch
sequences? Can I download a data file and parse it locally?

Thank you,

Sang Chul

From szhan at uoguelph.ca Wed Sep 13 16:36:13 2006
From: szhan at uoguelph.ca (szhan at uoguelph.ca)
Date: Wed, 13 Sep 2006 16:36:13 -0400
Subject: [Bioperl-l] problems for installing Bioperl 1.4
Message-ID: <20060913163613.uo7tdf2uqccgs8k0@webmail.uoguelph.ca>

Dear Bioperl users,
I have downloaded ActivePerl-5.8.8.819-MSWin32-x86-267479 and installed
it on a Windows XP PC successfully.
I used PPM to install Bioperl 1.4 following the instructions of
Installing Bioperl on Windows
(http://bioperl.org/SRC/bioperl-live/INSTALL.WIN) but got error as below:

ERROR: Installing File-Spec-0.82 would downgrade
File::Spec from version 3.12 to 0.82
and File::Spec::Functions from version 1.3 to 1.1
and File::Spec::Mac from version 1.4 to 1.2
and File::Spec::OS2 from version 1.2 to 1.1
and File::Spec::Unix from version 1.5 to 1.2
and File::Spec::VMS from version 1.4 to 1.1
and File::Spec::Win32 from version 1.6 to 1.2

Could you please help me out?
Thank you in advance!
Josh

From smarkel at scitegic.com Wed Sep 13 18:14:59 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Wed, 13 Sep 2006 15:14:59 -0700
Subject: [Bioperl-l] A failure to fetch a sequence
Message-ID:

We just had a customer report this today. It looks like the URL
used by BioPerl no longer results in the sequence data being returned.

Our test case is

GET
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gbwithparts&db=nucleotide&tool=bioperl&id=X61622&retstart=0&usehistory=n

The response ends with no sequence data:

ORIGIN
//

Changing the rettype to "gb" doesn't change the result. Using a
rettype of "fasta" does return the correct FASTA entry.

Our regressions ran fine last night, so it looks like the change
was today at NCBI.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email: smarkel at scitegic.com
SciTegic Inc.
mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 279 8804 USA web: http://www.scitegic.com > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Sang Chul Choi > Sent: Wednesday, 13 September 2006 13:03 > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] A failure to fetch a sequence > > Hi, > > I want to fetch a sequence and I tested the following simple code: > > use Bio::DB::GenBank; > $gb = new Bio::DB::GenBank; > $seq = $gb->get_Seq_by_id('MUSIGHBA1'); # Unique ID print > $seq->display_id, "\n"; print $seq->seq, "\n"; > > Output: > ----------------------- > MUSIGHBA1 > > ----------------------- > > I could get the sequence information but not the sequence itself. > I felt like my fetching work was blocked by NCBI server > because I have run many jobs to fetch DNA and protein sequences using > get_Seq_by_*** method. If my fetching job was blocked, what > else could I do for fetching sequences? Can I download > datafile and parse it locally? > > Thank you, > > Sang Chul From cjfields at uiuc.edu Wed Sep 13 20:12:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 13 Sep 2006 19:12:56 -0500 Subject: [Bioperl-l] A failure to fetch a sequence In-Reply-To: Message-ID: <000301c6d792$8a2b2c00$15327e82@pyrimidine> I've run several tests today which rely on Bio::DB::GenBank and they all seem to be passing. Using your URL directly in a browser gets the proper sequence back as well, so it was likely a 'burp' on NCBI's end. Are you still having any problems? BTW, 'gbwithparts' is set by default for the return format to always grab full sequence records (ones w/o CONTIG information) as that's what most users want. You can change that if needed to 'gb','fasta', etc. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of smarkel at scitegic.com > Sent: Wednesday, September 13, 2006 5:15 PM > To: goshng at gmail.com; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] A failure to fetch a sequence > > We just had a customer report this today. It looks like the URL > used by BioPerl no longer results in the sequence data being returned. > > Our test case is > > GET > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rett > ype=gbwithparts&db=nucleotide&tool=bioperl&id=X61622&retstart=0&usehistory > =n > > The response ends with no sequence data: > > ORIGIN > // > > Changing the rettype to "gb" doesn't change the result. Using a > rettype of "fasta" does return the correct FASTA entry. > > Our regressions ran fine last night, so it looks like the change > was today at NCBI. > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at scitegic.com > SciTegic Inc. mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 279 8804 > USA web: http://www.scitegic.com > > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > Sang Chul Choi > > Sent: Wednesday, 13 September 2006 13:03 > > To: bioperl-l at bioperl.org > > Subject: [Bioperl-l] A failure to fetch a sequence > > > > Hi, > > > > I want to fetch a sequence and I tested the following simple code: > > > > use Bio::DB::GenBank; > > $gb = new Bio::DB::GenBank; > > $seq = $gb->get_Seq_by_id('MUSIGHBA1'); # Unique ID print > > $seq->display_id, "\n"; print $seq->seq, "\n"; > > > > Output: > > ----------------------- > > MUSIGHBA1 > > > > ----------------------- > > > > I could get the sequence information but not the sequence itself. 
> > I felt like my fetching work was blocked by NCBI server > > because I have run many jobs to fetch DNA and protein sequences using > > get_Seq_by_*** method. If my fetching job was blocked, what > > else could I do for fetching sequences? Can I download > > datafile and parse it locally? > > > > Thank you, > > > > Sang Chul > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From smarkel at scitegic.com Wed Sep 13 20:42:45 2006 From: smarkel at scitegic.com (smarkel at scitegic.com) Date: Wed, 13 Sep 2006 17:42:45 -0700 Subject: [Bioperl-l] A failure to fetch a sequence Message-ID: Chris, Runs fine here, too, now. Scott > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Chris Fields > Sent: Wednesday, 13 September 2006 17:13 > To: Scott Markel; goshng at gmail.com; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] A failure to fetch a sequence > > I've run several tests today which rely on Bio::DB::GenBank > and they all seem to be passing. Using your URL directly in > a browser gets the proper sequence back as well, so it was > likely a 'burp' on NCBI's end. Are you still having any problems? > > BTW, 'gbwithparts' is set by default for the return format to > always grab full sequence records (ones w/o CONTIG > information) as that's what most users want. You can change > that if needed to 'gb','fasta', etc. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign From liugai at 126.com Thu Sep 14 04:06:25 2006 From: liugai at 126.com (liugai at 126.com) Date: Thu, 14 Sep 2006 16:06:25 +0800 Subject: [Bioperl-l] A problem with Bio::DB::Query::GenBank Message-ID: <001101c6d7d4$acfbf5b0$7b00a8c0@lenovo671a05ae> Hi Dear PPL I am a fresh for bioperl. 
My problem happened when I am learning from HOWTO:Beginner-Bioperl. Would be there anyone kindly help me out? Thank you very much! a.. The version of Bioperl :1.5.1 b.. The platform or operating system : WinXp c.. What you are trying to do: To run following script in my machine: #!/bin/perl -w use Bio::DB::GenBank; use Bio::DB::Query::GenBank; $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]"; $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query ); $gb_obj = Bio::DB::GenBank->new; $stream_obj = $gb_obj->get_Stream_by_query($query_obj); while ($seq_obj = $stream_obj->next_seq) { # do something with the sequence object print $seq_obj->display_id, "\t", $seq_obj->length, "\n"; } a.. Any error messages. Can't locate IO/String.pm in @INC (@INC contains: D:\Perl\example\ D:/Perl/lib D :/Perl/site/lib .) at D:/Perl/site/lib/Bio/DB/WebDBSeqI.pm line 83. BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/WebDBSeqI.pm line 8 3. Compilation failed in require at D:/Perl/site/lib/Bio/DB/NCBIHelper.pm line 80. BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/NCBIHelper.pm line 80. Compilation failed in require at D:/Perl/site/lib/Bio/DB/GenBank.pm line 123. BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/GenBank.pm line 123 . Compilation failed in require at C:\DOCUME~1\wk\LOCALS~1\Temp\dir1D.tmp\query.pl line 3. BEGIN failed--compilation aborted at C:\DOCUME~1\wk\LOCALS~1\Temp\dir1D.tmp\quer y.pl line 3. From liugai at 126.com Wed Sep 13 22:39:02 2006 From: liugai at 126.com (liugai at 126.com) Date: Thu, 14 Sep 2006 10:39:02 +0800 Subject: [Bioperl-l] Question from HOWTO:Beginner-Bioperl Message-ID: <000601c6d7a6$f10557c0$7b00a8c0@lenovo671a05ae> Hi Dear I am a fresh for bioperl. My problem happened when I am learning from HOWTO:Beginner-Bioperl. Would be there anyone kindly help me out? a.. The version of Bioperl :1.5.1 b.. The platform or operating system : WinXp c.. 
What you are trying to do: To run following script in my machine: #!/bin/perl -w use Bio::DB::GenBank; use Bio::DB::Query::GenBank; $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]"; $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query ); $gb_obj = Bio::DB::GenBank->new; $stream_obj = $gb_obj->get_Stream_by_query($query_obj); while ($seq_obj = $stream_obj->next_seq) { # do something with the sequence object print $seq_obj->display_id, "\t", $seq_obj->length, "\n"; } a.. Any error messages. Can't locate IO/String.pm in @INC (@INC contains: D:\Perl\example\ D:/Perl/lib D :/Perl/site/lib .) at D:/Perl/site/lib/Bio/DB/WebDBSeqI.pm line 83. BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/WebDBSeqI.pm line 8 3. Compilation failed in require at D:/Perl/site/lib/Bio/DB/NCBIHelper.pm line 80. BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/NCBIHelper.pm line 80. Compilation failed in require at D:/Perl/site/lib/Bio/DB/GenBank.pm line 123. BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/GenBank.pm line 123 . Compilation failed in require at C:\DOCUME~1\wk\LOCALS~1\Temp\dir1D.tmp\query.pl line 3. BEGIN failed--compilation aborted at C:\DOCUME~1\wk\LOCALS~1\Temp\dir1D.tmp\quer y.pl line 3. From bix at sendu.me.uk Thu Sep 14 05:10:45 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Sep 2006 10:10:45 +0100 Subject: [Bioperl-l] A problem with Bio::DB::Query::GenBank In-Reply-To: <001101c6d7d4$acfbf5b0$7b00a8c0@lenovo671a05ae> References: <001101c6d7d4$acfbf5b0$7b00a8c0@lenovo671a05ae> Message-ID: <45091C95.1000901@sendu.me.uk> liugai at 126.com wrote: > > Hi Dear PPL > I am a fresh for bioperl. My problem happened when I am learning from HOWTO:Beginner-Bioperl. Would be there anyone kindly help me out? Thank you very much! > a.. The version of Bioperl :1.5.1 > b.. 
The platform or operating system : WinXp [snip] > Can't locate IO/String.pm You do not have IO/String.pm installed (correctly). Follow the installation instructions carefully: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows From osborne1 at optonline.net Thu Sep 14 08:40:39 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 14 Sep 2006 08:40:39 -0400 Subject: [Bioperl-l] Question from HOWTO:Beginner-Bioperl In-Reply-To: <000601c6d7a6$f10557c0$7b00a8c0@lenovo671a05ae> Message-ID: liugai, It looks like you need to install the module IO::String. Brian O. On 9/13/06 10:39 PM, "liugai at 126.com" wrote: > Hi Dear > I am a fresh for bioperl. My problem happened when I am learning from > HOWTO:Beginner-Bioperl. Would be there anyone kindly help me out? > a.. The version of Bioperl :1.5.1 > b.. The platform or operating system : WinXp > c.. What you are trying to do: > To run following script in my machine: > > #!/bin/perl -w > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]"; > $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => > $query ); > > $gb_obj = Bio::DB::GenBank->new; > > $stream_obj = $gb_obj->get_Stream_by_query($query_obj); > > while ($seq_obj = $stream_obj->next_seq) { > # do something with the sequence object > print $seq_obj->display_id, "\t", $seq_obj->length, "\n"; > } > > > a.. Any error messages. > > > Can't locate IO/String.pm in @INC (@INC contains: D:\Perl\example\ D:/Perl/lib > D > :/Perl/site/lib .) at D:/Perl/site/lib/Bio/DB/WebDBSeqI.pm line 83. > BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/WebDBSeqI.pm line > 8 > 3. > Compilation failed in require at D:/Perl/site/lib/Bio/DB/NCBIHelper.pm line > 80. > BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/NCBIHelper.pm > line > 80. > Compilation failed in require at D:/Perl/site/lib/Bio/DB/GenBank.pm line 123. 
> BEGIN failed--compilation aborted at D:/Perl/site/lib/Bio/DB/GenBank.pm line 123.
> Compilation failed in require at C:\DOCUME~1\wk\LOCALS~1\Temp\dir1D.tmp\query.pl line 3.
> BEGIN failed--compilation aborted at C:\DOCUME~1\wk\LOCALS~1\Temp\dir1D.tmp\query.pl line 3.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From osborne1 at optonline.net Thu Sep 14 11:16:50 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 14 Sep 2006 11:16:50 -0400
Subject: [Bioperl-l] More on PDB and chains...
Message-ID:

Bernd,

I'm taking this discussion back into bioperl-l. You've uncovered a slightly
different bug then. Shouldn't the HETATMs always be in a separate "chain"
regardless of whether there are 1 or more than 1 polypeptide chains? So
that's one question.

Related question: shouldn't the get_chains() method only return polypeptide
chains, just as they're described in the PDB file? I would think that you'd
retrieve the HETATMs using something like:

my $hetatm = $struc->get_hetatm

In the PDB file if there are, say, 3 chains the get_chains() method returns
4. One of these is the HETATMs "chain" labelled by the id "default". I don't
think this is right since, first, the heteroatoms do not constitute a "chain"
and, second, the PDB file itself states that there are 3 chains. Perhaps
users of StructIO::pdb have other points of view?

Brian O.

From bernd.web at gmail.com Thu Sep 14 11:56:09 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 14 Sep 2006 17:56:09 +0200
Subject: [Bioperl-l] More on PDB and chains...
In-Reply-To:
References:
Message-ID: <716af09c0609140856s162b3a8fl453e281979a207d9@mail.gmail.com>

Hi,

HETATM sometimes are present in a chain. So we cannot just exclude all
HETATMS from a chain.
However, since a chain is terminated with TER we
could indeed store all non-chain HETATMs in an object (indeed like
$struc->get_hetatm).

What would be nice is to be able to see if a "residue" IN a chain is a
HETATM. (Sometimes) modified residues (e.g. CME) are also labelled
HETATM. At least internally to Structure::pdb it is clear what are
HETATMs since the PDB files are written (almost) correctly.

I used a script on
http://lists.open-bio.org/pipermail/bioperl-guts-l/2005-November/020116.html
to write PDB from 8HVP. In this case indeed at each "border" between
ATOM and HETATOM within the chain a TER is printed where the original
record has ATOM. Look for ABA and LOV HETATMS in the chain.
Indeed I agree that non-chain HETATMs should not be part of the default chain.
So a PDB record (e.g. 102L) with only one chain should have the
protein chain and a separate HETATM "chain".

Bernd

On 9/14/06, Brian Osborne wrote:
> Bernd,
>
> I'm taking this discussion back into bioperl-l. You've uncovered a slightly
> different bug then. Shouldn't the HETATMs always be in a separate "chain"
> regardless of whether there are 1 or more than 1 polypeptide chains? So
> that's one question.
>
> Related question: shouldn't the get_chains() method only return polypeptide
> chains, just as they're described in the PDB file? I would think that you'd
> retrieve the HETATMs using something like:
>
> my $hetatm = $struc->get_hetatm
>
> In the PDB file if there are, say, 3 chains the get_chains() method returns
> 4. One of these is the HETATMs "chain" labelled by the id "default". I don't
> think this is right since, first, the heteroatoms do no constitute a "chain"
> and, second, the PDB file itself states that there are 3 chains. Perhaps
> users of StructIO::pdb have other points of view?
>
> Brian O.
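
The TER idea above can be sketched without touching Bio::Structure at all.
What follows is only an illustration in plain Perl, not existing BioPerl API:
split_hetatms and its return layout are made-up names, and it assumes that a
HETATM following the TER of its chain is a non-chain heteroatom, while a
HETATM before that TER is a modified residue inside the chain. The demo
records are hand-built, not taken from 8HVP.

```perl
use strict;
use warnings;

# Hypothetical helper (not Bio::Structure API): sort PDB coordinate records
# into per-chain residues and a separate list of non-chain HETATM residues.
# Assumption: a HETATM seen after the TER of its chain is non-chain.
sub split_hetatms {
    my @records = @_;
    my (%chains, @nonchain, %closed, %seen);
    my $last_chain = ' ';

    for my $line (@records) {
        if ($line =~ /^TER/) {
            # A bare "TER" carries no chain ID, so fall back to the last one seen
            my $id = length($line) >= 22 ? substr($line, 21, 1) : $last_chain;
            $closed{$id} = 1;
            next;
        }
        next unless $line =~ /^(ATOM  |HETATM)/ && length($line) >= 26;
        my $is_het = $1 eq 'HETATM' ? 1 : 0;

        # Fixed PDB columns: resName 18-20, chainID 22, resSeq 23-26
        (my $res_name = substr($line, 17, 3)) =~ s/\s+//g;
        my $chain_id = substr($line, 21, 1);
        my $residue  = $res_name . (substr($line, 22, 4) + 0);

        if ($is_het && $closed{$chain_id}) {
            # Heteroatom after the chain was terminated: keep it out of the chain
            push @nonchain, $residue unless $seen{"$chain_id:$residue"}++;
        }
        else {
            # Residue belongs to the chain; value becomes 1 if any atom is a HETATM
            $chains{$chain_id}{$residue} ||= $is_het;
            $last_chain = $chain_id;
        }
    }
    return (\%chains, \@nonchain);
}

# Hand-built demo records laid out in the fixed PDB columns
my @demo = (
    sprintf('%-6s%5d %-4s %3s %1s%4d', 'ATOM',   1, 'CA', 'PRO', 'A',   1),
    sprintf('%-6s%5d %-4s %3s %1s%4d', 'HETATM', 2, 'CA', 'ABA', 'A',  67),
    sprintf('%-6s%5d %-4s %3s %1s%4d', 'ATOM',   3, 'CA', 'PHE', 'A',  99),
    sprintf('%-6s%5d      %3s %1s%4d', 'TER',    4,       'PHE', 'A',  99),
    sprintf('%-6s%5d %-4s %3s %1s%4d', 'HETATM', 5, 'O',  'HOH', 'A', 201),
);

my ($chains, $nonchain) = split_hetatms(@demo);
for my $id (sort keys %$chains) {
    for my $res (sort keys %{ $chains->{$id} }) {
        printf "chain %s %-7s %s\n", $id, $res,
            $chains->{$id}{$res} ? 'HETATM in chain' : 'standard';
    }
}
print "non-chain: @$nonchain\n";
```

Something like the per-residue flag here is what an is_hetatm() on a residue
could report, and the second list is what a get_hetatm() on the structure
could hand back. Older files that leave the chain ID blank on waters would
need extra handling that this sketch does not attempt.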
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From akarger at CGR.Harvard.edu Thu Sep 14 11:58:49 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 14 Sep 2006 11:58:49 -0400
Subject: [Bioperl-l] Bio::SeqIO -- add an ugly but fast grep hack?
Message-ID:

> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Subject: Re: [Bioperl-l] Bio::SeqIO -- add an ugly but fast grep hack?
>

SNIP

> If your individual sequence records aren't very large, you could
> iterate through the individual sequence records in the file by
> changing the line separator to gulp each record and use a
> plain ol' regex, like this (modified from a quickie script I use):
>
> #! perl
> use strict;
> use warnings;
>
> {
>     local $/ = "//\n";
>     while (my $gb = <>) {
>         print $gb if $gb =~ m/Staphylococcus\sepidermidis/im;
>     }
> }

Perl Golf! (Untested, as all good Perl Golf should be.)

perl -wne 'BEGIN {$/="//\n"} print if /Staphylococcus\sepidermidis/im' blah.gb > filtered.gb

Unfortunately, I can't golf down the species name :)

> You could probably squeeze that into a one-liner if needed; this one
> was from WinXP which has problems with using one-liners
> containing quotes.

OT. The Windows shell is very annoying. For those who don't know, it
basically requires you to put double quotes around scripts, not single.
(This automatically means you can't exactly port one-liners
UNIX<->Windows, because if you use single quotes, Windows doesn't get
it, and if you use double quotes, UNIX interprets any $ variables in
your script as SHELL variables instead of Perl ones.)

You can use qq~blah~ instead of "blah" in one-liners.
So the above one-liner could be used on Windows as:

perl -wne "BEGIN {$/=qq~//\n~} print if /Staphylococcus\sepidermidis/im" blah.gb > filtered.gb

(If you really need ~ inside your string, you can use other quote
characters, qq/blah/ or qq-blah- or qq{blah} will work.)

I've had pretty good luck with taking UNIX one-liners and just running
them through (a slightly more complicated version of):

# Use a non-greedy match so we correctly frame each pair of double quotes
s/"(.*?)"/qq~$1~/g;
# Greedy match spans from the first ' on the whole line to the very last one
# So we avoid messing up any apostrophes or \' inside the script
s/'(.*)'/"$1"/;

- Amir Karger
Research Computing
Bauer Center for Genomics Research
Harvard University

From cjfields at uiuc.edu Thu Sep 14 13:31:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Sep 2006 12:31:09 -0500
Subject: [Bioperl-l] More on PDB and chains...
In-Reply-To: <716af09c0609140856s162b3a8fl453e281979a207d9@mail.gmail.com>
Message-ID: <000301c6d823$91543df0$15327e82@pyrimidine>

> Hi,
>
> HETATM sometimes are present in a chain. So we cannot just exclude all
> HETATMS from a chain. However, since a chain is terminated with TER we
> could indeed store all non-chain HETATMs in an object (indeed like
> $struc->get_hetatm).

Sounds good to me. You could have a new class that inherits from the chain
class or its interface (has same methods) but acts as a container for all
the non-chain atoms, that way it differentiates itself from chain. This
object could be retrieved via a get_nonchain() method instead of
get_chains().

The Bio::Structure implementation, judging by the docs, is pretty confusing
and, IMHO, needs some work.
For instance, I wouldn't expect to get the
residues for each chain from the structure object but from the chain object,
somewhat like:

while ( my $struc = $stream->next_structure() ) {
    while (my $chain = $struc->next_chain()) {
        while (my $res = $chain->next_residue()) {
            # do work here
        }
    }
    while (my $chain = $struc->next_nonchain()) { # or whatever
        while (my $res = $chain->next_residue()) {
            # do work here
        }
    }
}

Right now, you get the residues directly from the structure object, using
the chain as input. I don't know the internals but this makes me think all
the residue data is in the structure object and not the chain object. A bit
inconvenient.

while ( my $struc = $stream->next_structure() ) {
    for my $chain ($struc->get_chains) {
        my $chainid = $chain->id;
        my @res = $struc->get_residues($chain);
        # do work here
    }
}

> What would be nice is to be able to see if a "residue" IN a chain is a
> HETATM. (Sometimes) modified residues (e.g. CME) are also labelled
> HETATM. At least internally to Structure::pdb it is clear what are
> HETATMs since the PDB files are written (almost) correctly.

This sounds more like "get_hetatm()", but should it be
$chain->get_hetatm(), not $struct->get_hetatm($chain)? Going this route,
you could have "is_hetatm($resnumber)" (boolean for residue position),
"get_hetatms()" (grabs all hetatms), "next_hetatm()" (iterate through the
hetatms), etc. This could be along with "next_residue()", "get_residues()",
etc for all residues, regardless of what type of residue they are.

Chris

> I used a script on
> http://lists.open-bio.org/pipermail/bioperl-guts-l/2005-November/020116.html
> to write PDB from 8HVP. In this case indeed at each "border" between
> ATOM and HETATOM within the chain a TER is printed where the original
> record has ATOM. Look for ABA and LOV HETATMS in the chain.
> Indeed I agree that non-chain HETATMs should not be part of the default
> chain.
> So a PDB record (e.g.
> 102L) with only one chain should have the
> protein chain and a separate HETATM "chain".
>
> Bernd
>
> On 9/14/06, Brian Osborne wrote:
> > Bernd,
> >
> > I'm taking this discussion back into bioperl-l. You've uncovered a slightly
> > different bug then. Shouldn't the HETATMs always be in a separate "chain"
> > regardless of whether there are 1 or more than 1 polypeptide chains? So
> > that's one question.
> >
> > Related question: shouldn't the get_chains() method only return polypeptide
> > chains, just as they're described in the PDB file? I would think that you'd
> > retrieve the HETATMs using something like:
> >
> > my $hetatm = $struc->get_hetatm
> >
> > In the PDB file if there are, say, 3 chains the get_chains() method returns
> > 4. One of these is the HETATMs "chain" labelled by the id "default". I don't
> > think this is right since, first, the heteroatoms do no constitute a "chain"
> > and, second, the PDB file itself states that there are 3 chains. Perhaps
> > users of StructIO::pdb have other points of view?
> >
> > Brian O.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From osborne1 at optonline.net Thu Sep 14 14:15:30 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 14 Sep 2006 14:15:30 -0400
Subject: [Bioperl-l] More on PDB and chains...
In-Reply-To: <000301c6d823$91543df0$15327e82@pyrimidine>
Message-ID:

Chris and Bernd,

I believe there's more to it than structure->chain->residue->atom, it is
currently more like entry->structure AKA model->chain->residue->atom. In
this way one can accommodate macromolecular structures or complexes composed
of more than 1 protein, each protein capable of having more than one chain.

From Entry.pm:

This object stores a whole Bio::Structure entry. It can consist of one
or more models (Bio::Structure::Model), which in turn consist of one
or more chains (Bio::Structure::Chain). A chain is composed of residues
(Bio::Structure::Residue) and a residue consists of atoms
(Bio::Structure::Atom).
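
For reference, that hierarchy maps onto the current accessors roughly as
below. This is a sketch based on my reading of the Bio::Structure::IO and
Bio::Structure::Entry docs, not something checked against every release;
the default file name is a placeholder. Note that every level is fetched
through the Entry object.

```perl
use strict;
use warnings;
use Bio::Structure::IO;

# Placeholder file name; pass a real PDB file as the first argument
my $file = shift || 'example.pdb';
my $in   = Bio::Structure::IO->new(-file => $file, -format => 'pdb');

while (my $entry = $in->next_structure) {               # Bio::Structure::Entry
    for my $model ($entry->get_models($entry)) {         # Bio::Structure::Model
        for my $chain ($entry->get_chains($model)) {     # Bio::Structure::Chain
            for my $res ($entry->get_residues($chain)) { # Bio::Structure::Residue
                # Atoms are also handed out by the Entry, keyed on the residue
                my @atoms = $entry->get_atoms($res);
                printf "%s model %s chain %s residue %s: %d atoms\n",
                    $entry->id, $model->id, $chain->id, $res->id, scalar @atoms;
            }
        }
    }
}
```

On a multi-model file the outer model loop should run more than once, which
would be one quick way to check how such entries are handled.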
My understanding is that multiple models in a single PDB file are separated by ENDMDL - Bernd, do you know of a multi-model PDB entry? However, Entry is handling all kinds of different functions e.g. getting and setting residues. I agree that this is unconventional. Brian O. On 9/14/06 1:31 PM, "Chris Fields" wrote: > The Bio::Structure implementation, judging by the docs, is pretty confusing > and, IMHO, needs some work. For instance, I wouldn?t expect to get the > residues for each chain from the structure object but from the chain object, > somewhat like: > > while ( my $struc = $stream->next_structure() ) { > while (my $chain = $struc->next_chain()) { > while (my $res = $chain->next_residue()) { > # do work here > } > } > while (my $chain = $struc->next_nonchain()) { # or whatever > while (my $res = $chain->next_residue()) { > # do work here > } > } > } From cjfields at uiuc.edu Thu Sep 14 14:44:08 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Sep 2006 13:44:08 -0500 Subject: [Bioperl-l] More on PDB and chains... In-Reply-To: Message-ID: <000301c6d82d$c3dd0e50$15327e82@pyrimidine> > Chris and Bernd, > > I believe there's more to it than structure->chain->residue->atom, it is > currently more like entry->structure AKA model->chain->residue->atom. In > this way one can accommodate macromolecular structures or complexes > composed > of more than 1 protein, each protein capable of having more than one > chain. > > >From Entry.pm: > > This object stores a whole Bio::Structure entry. It can consist of one > or more models (L), which in turn consist of one > or more chains (L). A chain is composed of residues > (L) and a residue consists of atoms > (L). > > My understanding is that multiple models in a single PDB file are > separated > by ENDMDL - Bernd, do you know of a multi-model PDB entry? > > However, Entry is handling all kinds of different functions e.g. getting > and > setting residues. I agree that this is unconventional. > > Brian O. 
Some of the small NMR structures in PDB have multiple models. So we could have something more like this: while (my $entry = $parser->next_entry) { while (my $struct = $entry->next_structure) { while (my $chain = $struct->next_chain) { # all residues while (my $res = $chain->next_residue) { while (my $atom = $res->next_atom) { # deepest level } } # hetatms only while (my $het = $chain->next_hetatm) { # similar methods to $res } } # use whatever method name for nonchain... while (my $nonchain = $struct->next_nonchain) { # similar methods to $chain } } } This creates a hierarchy similar to the SearchIO parser structure for result/hit/hsp. Then each class could have specific methods for the next level in the hierarchy. Each level in the hierarchy would need some common interface but I think it's achievable. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > > > On 9/14/06 1:31 PM, "Chris Fields" wrote: > > > The Bio::Structure implementation, judging by the docs, is pretty > confusing > > and, IMHO, needs some work. 
For instance, I wouldn?t expect to get the > > residues for each chain from the structure object but from the chain > object, > > somewhat like: > > > > while ( my $struc = $stream->next_structure() ) { > > while (my $chain = $struc->next_chain()) { > > while (my $res = $chain->next_residue()) { > > # do work here > > } > > } > > while (my $chain = $struc->next_nonchain()) { # or whatever > > while (my $res = $chain->next_residue()) { > > # do work here > > } > > } > > } > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From smarkel at scitegic.com Thu Sep 14 15:13:38 2006 From: smarkel at scitegic.com (smarkel at scitegic.com) Date: Thu, 14 Sep 2006 12:13:38 -0700 Subject: [Bioperl-l] Bio::Factory::EMBOSS synopsis Message-ID: Bernd, Based on the error message you posted > ------------- EXCEPTION ------------- > MSG: Attribute [-seqall] not recognized! it looks like there's a mismatch between the command-line arguments you're using via BioPerl and those expected by water. "water -h" will give you the correct command-line syntax. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at scitegic.com SciTegic Inc. mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 279 8804 USA web: http://www.scitegic.com > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Web > Sent: Wednesday, 13 September 2006 04:55 > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Bio::Factory::EMBOSS synopsis > > Hi, > > I started to use Bio::Factory::EMBOSS and started with the > example in the synopsis. > Although now I managed to do what I wanted, it seems like the > synopsis is not correct. 
> See > http://doc.bioperl.org/releases/bioperl-current/bioperl-run/Bi o/Factory/EMBOSS.html > (and: > http://doc.bioperl.org/releases/bioperl-current/bioperl-run/Bi o/Tools/Run/EMBOSSApplication.html) > for the code. > > I cannot run "water". Using the latest CVS, (Emboss.pm v 1.7) > and EMBOSS v 2.9.0, I get wrong output. In the commandline > the reference to the ARRAY \@seqs_to_check appears. > Output is: > sh: -c: line 1: syntax error near unexpected token `(' > sh: -c: line 1: `water -gapopen 10.0 -gapextend 0.5 -seqall > ARRAY(0x8bdb154) -outfile out.water -sequencea aseq -auto' > > When I set $water->verbose(1); the output is: > $VAR1 = { > '-gapopen' => '10.0', > '-gapextend' => '0.5', > '-seqall' => [ > 'CKRIHIGPGRAFWTTWC' > ], > '-outfile' => 'out.water', > '-sequencea' => 'aseq' > }; > Input attr: gapopen => 10.0 > Input attr: gapextend => 0.5 > > ------------- EXCEPTION ------------- > MSG: Attribute [-seqall] not recognized! > > STACK Bio::Tools::Run::EMBOSSApplication::run > /home/bwbrandt/perllib/Bio/Tools/Run/EMBOSSApplication.pm:204 > STACK toplevel emboss_factory.pl:27 > > Code is as in the SYNOPSIS but with: > my $seq_to_test = "aseq"; # this would have a seq here my > @seqs_to_check; # this would be a list of seqs to compare > # (could be just 1) $seqs_to_check[0] = > "CKRIHIGPGRAFWTTWC"; > > Any suggestions what should be coded? > > Thanks, > Bernd From bernd.web at gmail.com Thu Sep 14 18:17:06 2006 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 15 Sep 2006 00:17:06 +0200 Subject: [Bioperl-l] Bio::Factory::EMBOSS synopsis In-Reply-To: References: Message-ID: <716af09c0609141517i7d1a65dcx9facf305fb0367e6@mail.gmail.com> Hi Scott, Thanks. I also thought of the possibility that the emboss module did some extra work (due to @seqs_to_check). From the synopsis I thought that emboss.pm produced a database from the sequences (or sequence files) in the array. This is not the case. 
The ARRAY() ref is printed to the command line, causing the shell errors.

The following comments in the SYNOPSIS are confusing to me:
my $seq_to_test;   # this would have a seq here
my @seqs_to_check; # this would be a list of seqs to compare
                   # (could be just 1)

Both should be sequence fileNAMES, and \@seqs_to_check cannot be
passed (in my hands). It works when passing filenames (one with 1 seq,
1 with more than one).

$water->run({ '-asequence' => $seq_to_test,   #filename!
              '-bsequence' => $seqs_to_check, #filename!
              '-gapopen'   => '10.0',
              '-gapextend' => '0.5',
              '-outfile'   => $wateroutfile});

It would be clearer (to me at least) to adapt the synopsis (if the
above is indeed the only way to run water).

Bernd

From kellert at ohsu.edu Thu Sep 14 18:45:51 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 14 Sep 2006 15:45:51 -0700
Subject: [Bioperl-l] DNA sequence assembly for SNP analysis
Message-ID: <66EBAADD-35CC-4724-B1BC-7B4FE3390F1A@ohsu.edu>

Greetings,

What are people using for DNA sequence assembly for SNP analysis these
days?

Thanks,
Tom K

Thomas J. Keller, Ph.D.
Director, MMI Core Facility
Oregon Health & Science University
3181 SW Sam Jackson Park Rd.
Portland, OR, USA, 97239
http://www.ohsu.edu/research/core

From osborne1 at optonline.net Thu Sep 14 22:43:24 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 14 Sep 2006 22:43:24 -0400
Subject: [Bioperl-l] Bio::Factory::EMBOSS synopsis
In-Reply-To: <716af09c0609141517i7d1a65dcx9facf305fb0367e6@mail.gmail.com>
Message-ID:

Scott and Bernd,

The names of the parameters passed to EMBOSS applications have changed over
the past EMBOSS versions; Bioperl handles older versions. Some discussion:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/7682/match=emboss

Bernd, what version of EMBOSS are you using?

Brian O.

From bernd.web at gmail.com Fri Sep 15 04:50:00 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 15 Sep 2006 10:50:00 +0200
Subject: [Bioperl-l] Bio::Factory::EMBOSS synopsis
In-Reply-To:
References: <716af09c0609141517i7d1a65dcx9facf305fb0367e6@mail.gmail.com>
Message-ID: <716af09c0609150150o62deae6av912b114f7b9fa658@mail.gmail.com>

Hi Brian,

I used 2.9.0 and 4.0.0.
Below I include the adapted synopsis. This example works until the
alignment handling. The water alignment output has also changed, so no
alignments are parsed (no problem for me, I do not use this).

The synopsis is on:
http://doc.bioperl.org/releases/bioperl-current/bioperl-run/Bio/Factory/EMBOSS.html
and:
http://doc.bioperl.org/releases/bioperl-current/bioperl-run/Bio/Tools/Run/EMBOSSApplication.html

On the first, there is a small error in the Description:
set "$factory->verbose" should be set "$prog->verbose".

Bernd
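
As a minimal, BioPerl-free sketch of the problem discussed in this thread: interpolating an array reference into a command string yields ARRAY(0x...), whereas writing the sequences to a FASTA file first gives a plain filename. The helper write_fasta and the seqN record names are assumptions made for illustration; they are not part of Bio::Factory::EMBOSS.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input, mirroring the value used in this thread.
my @seqs_to_check = ('CKRIHIGPGRAFWTTWC');

# 1. An array reference interpolated into a string shows its address,
#    not its contents, which is what ended up on the shell command line.
my %args = ('-gapopen' => '10.0', '-seqall' => \@seqs_to_check);
my $bad_cmd = join ' ', 'water', map { "$_ $args{$_}" } sort keys %args;
print "$bad_cmd\n";

# 2. Writing the sequences to a temporary FASTA file gives a filename
#    that can be passed as an ordinary string.
sub write_fasta {
    my @seqs = @_;
    my ($fh, $filename) = tempfile(SUFFIX => '.fasta', UNLINK => 1);
    my $n = 0;
    for my $seq (@seqs) {
        $n++;
        print {$fh} ">seq$n\n$seq\n";
    }
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}

my $bsequence = write_fasta(@seqs_to_check);
delete $args{'-seqall'};
$args{'-bsequence'} = $bsequence;
my $good_cmd = join ' ', 'water', map { "$_ $args{$_}" } sort keys %args;
print "$good_cmd\n";
```

The returned filename could then be supplied as the '-bsequence' value in a call like the one reported to work earlier in the thread.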

From bernd.web at gmail.com Fri Sep 15 04:57:19 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 15 Sep 2006 10:57:19 +0200
Subject: [Bioperl-l] More on PDB and chains...
Message-ID: <716af09c0609150157j4d56bdeu9ab1f99ad5b7f964@mail.gmail.com>

Hi Brian,

Just to give some PDB accessions with multiple models:

1A4T
1A6B
1A6S
2CPS

Structure parses the PDB and looks for a chain. HETATMs in multichain
records do not have a "chain", so they end up in the "default" chain.
Chain parsing thus should also take the TER into account. Then the
non-chain HETATMs can be recognized, since they follow the TER in the
1-chain record. I'll check this.

Is anyone else on this list using StructureIO::pdb at all?

Bernd

On 9/14/06, Brian Osborne wrote:
> Chris and Bernd,
>
> I believe there's more to it than structure->chain->residue->atom, it is
> currently more like entry->structure AKA model->chain->residue->atom. In
> this way one can accommodate macromolecular structures or complexes composed
> of more than 1 protein, each protein capable of having more than one chain.
>
> From Entry.pm:
>
> This object stores a whole Bio::Structure entry. It can consist of one
> or more models (L<Bio::Structure::Model>), which in turn consist of one
> or more chains (L<Bio::Structure::Chain>). A chain is composed of residues
> (L<Bio::Structure::Residue>) and a residue consists of atoms
> (L<Bio::Structure::Atom>).
>
> My understanding is that multiple models in a single PDB file are separated
> by ENDMDL - Bernd, do you know of a multi-model PDB entry?
>
> However, Entry is handling all kinds of different functions e.g. getting and
> setting residues. I agree that this is unconventional.
>
> Brian O.
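
A minimal sketch of the TER idea above, in plain Perl and independent of Bio::Structure: a HETATM with a chain label stays with that chain, an unlabelled HETATM before TER goes to the default chain, and an unlabelled HETATM after TER is treated as non-chain. The records are made-up test data, and pdb_line and classify_hetatms are hypothetical helpers, not BioPerl methods.

```perl
use strict;
use warnings;

# Build the first 26 columns of a PDB-style ATOM/HETATM record:
# record name (1-6), serial (7-11), atom name (13-16),
# residue name (18-20), chain ID (22), residue number (23-26).
sub pdb_line {
    my ($record, $serial, $atom, $res, $chain, $resseq) = @_;
    return sprintf '%-6s%5d %-4s %3s %1s%4d',
        $record, $serial, $atom, $res, $chain, $resseq;
}

# Group HETATM serial numbers by where they belong.
sub classify_hetatms {
    my @lines = @_;
    my %result;
    my $after_ter = 0;
    for my $line (@lines) {
        my $padded = sprintf '%-80s', $line;   # avoid substr past end of line
        my $record = substr $padded, 0, 6;
        $record =~ s/\s+$//;
        if ($record eq 'TER') { $after_ter = 1; next; }
        next unless $record eq 'HETATM';
        my $chain  = substr $padded, 21, 1;    # column 22
        my $serial = substr($padded, 6, 5) + 0;
        my $bucket = $chain ne ' ' ? "chain $chain"
                   : $after_ter    ? 'non-chain'
                   :                 'default chain';
        push @{ $result{$bucket} }, $serial;
    }
    return \%result;
}

# Invented single-chain example: one hetero residue inside the chain,
# two hetero groups after the TER record, none with a chain label.
my @records = (
    pdb_line('ATOM',   1, 'N',  'MET', ' ', 1),
    pdb_line('ATOM',   2, 'CA', 'MET', ' ', 1),
    pdb_line('HETATM', 3, 'SE', 'MSE', ' ', 2),
    'TER',
    pdb_line('HETATM', 4, 'O',  'HOH', ' ', 201),
    pdb_line('HETATM', 5, 'CL', 'CL',  ' ', 202),
);

my $groups = classify_hetatms(@records);
for my $bucket (sort keys %$groups) {
    print "$bucket: @{ $groups->{$bucket} }\n";
}
```

The single after-TER flag is only meant for the one-chain case described above; files with several TER records would need per-chain bookkeeping.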

From oliver.burren at cimr.cam.ac.uk Fri Sep 15 04:59:33 2006
From: oliver.burren at cimr.cam.ac.uk (Oliver Burren)
Date: Fri, 15 Sep 2006 09:59:33 +0100
Subject: [Bioperl-l] Bio::Graphics::Glyph->parts differences between 1.636 and 1.654
Message-ID: <1158310773.13842.35.camel@jakarta>

Hi Bioperlers,

I'm having some problems with the CVS version of Bio::Graphics::Glyph,
especially the 'parts' method. I have written a script and module to
demonstrate the behaviour, which I am happy to supply on request.

The script is called test_parts.pl. It creates 100 random features and
adds them to a holding feature. A Bio::Graphics::Panel is created, and
the holding feature is passed to it for rendering. The glyph used to
render is test_parts, with the following 'draw' sub:

sub draw{
  my $self=shift;
  warn "Bio::Graphics::Panel API is ".Bio::Graphics::Panel::api_version()."\n";
  warn "Bio::Graphics::Glyph::testparts.pm can find ".(($self->parts =~ /ARRAY/?@{$self->parts}:$self->parts)|'no')." parts\n";
}

#With 'old' version of Bio::Graphics.

perl test_parts.pl
Top level feature contains 100 features
Bio::Graphics::Panel API is 1.636
Bio::Graphics::Glyph::testparts.pm can find 100 parts

#with 'new' (CVS co) version.

perl test_parts.pl
Top level feature contains 100 features
Bio::Graphics::Panel API is 1.654
Bio::Graphics::Glyph::testparts.pm can find 0 parts

It looks as if I'm losing parts between the two APIs?

I saw a thread on the gmod mailing list
http://sourceforge.net/mailarchive/forum.php?forum_id=31947&max_rows=25&style=flat&viewmonth=200607&viewday=26
which may be relevant, but I wasn't able to find any follow-up.

Would someone be able to advise on or document the changes that have
occurred between the 2 API versions that might be relevant, so that I
can patch some of my custom glyphs so they are compatible.

Many thanks,

Olly Burren

From n.haigh at sheffield.ac.uk Fri Sep 15 06:25:36 2006
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri, 15 Sep 2006 11:25:36 +0100
Subject: [Bioperl-l] reactive bioperl developer account
Message-ID: <450A7FA0.5040606@sheffield.ac.uk>

I haven't been working with Perl/Bioperl for a while and never got round
to getting my new password for dev.open-bio.org for cvs write access. My
username (if I remember rightly) was "nathan". Could someone arrange to
let me have access once again?

Thanks
Nathan

From jurgen.pletinckx at algonomics.com Fri Sep 15 06:16:44 2006
From: jurgen.pletinckx at algonomics.com (Jurgen Pletinckx)
Date: Fri, 15 Sep 2006 12:16:44 +0200
Subject: [Bioperl-l] More on PDB and chains...
In-Reply-To: <716af09c0609150157j4d56bdeu9ab1f99ad5b7f964@mail.gmail.com>
Message-ID: <20060915101854.4533583A0@sienna.algonomics.com>

| Is anyone else on this list using StructureIO::pdb at all?

*waves hand*

Well, occasionally, in any case.

Kris, the author of the Structure modules, was a colleague. When
he left, I tried taking over the maintenance of the modules.
Unfortunately, it is way, way down on my priority list. So low, in
fact, that I never committed anything whatsoever to repo, and only
answered the occasional query.

Regarding architecture - the original design called for stuff like

# my $struc = $stream->next_structure();
# my $chain = $struc->next_chain();
# my $res = $chain->next_residue();

but the implementation ran into severe reference management trouble.
Which is why there is now a single object keeping track of all data,
with everything of note handled via callback to that object. Yeah,
unconventional. That being said, I _think_ you can add streams of
residues, chains, ... without rewriting. But I never had the tuits.
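
A toy sketch of the "single object keeping track of all data" design described above, assuming the reference trouble was of the parent/child cross-reference kind (an assumption, the thread does not say). The package Toy::Entry and its add_child/get_children methods are hypothetical and are not the Bio::Structure::Entry API; chains and residues here hold no references to each other, and every relationship is looked up through the entry.

```perl
use strict;
use warnings;

package Toy::Entry;
use Scalar::Util qw(refaddr);

sub new {
    my $class = shift;
    return bless { children => {} }, $class;
}

# Record that $child belongs to $parent; the link lives in the entry only.
sub add_child {
    my ($self, $parent, $child) = @_;
    push @{ $self->{children}{ refaddr($parent) } }, $child;
    return $child;
}

# Look up the children of $parent through the entry.
sub get_children {
    my ($self, $parent) = @_;
    return @{ $self->{children}{ refaddr($parent) } || [] };
}

package main;

my $entry = Toy::Entry->new;
my $chain = { id => 'A' };
$entry->add_child($entry, $chain);
$entry->add_child($chain, { name => 'MET' });
$entry->add_child($chain, { name => 'ALA' });

my @chains   = $entry->get_children($entry);
my @residues = $entry->get_children($chains[0]);
print scalar(@chains), " chain, ", scalar(@residues), " residues\n";
```

In this layout an iterator such as next_chain() could be layered on top by walking the list returned from the entry, without the child objects ever pointing back at their parents.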

Regarding multiple models: the following works:

use Bio::Structure::IO;

my $io = Bio::Structure::IO->new(-file => "/PDB/a4/pdb1a4t.ent");
my $struc = $io->next_structure;
my @models = $struc->get_models;
print scalar(@models),"\n";                    # number of models
my @chains = $struc->get_chains($models[0]);
print scalar(@chains), "\n";                   # chains in the first model
my @residues=$struc->get_residues($chains[0]);
print scalar(@residues), "\n";                 # residues in the first chain

---> 20
---> 2
---> 15

So Model (and there is a Bio::Structure::Model.pm) is an optional
layer between Structure and Chain. Fun!

Finally - regarding non-chain HETATMs - I actually like the behaviour
there, and think it is consistent. If an AA chain is labeled A, then
the HETATMs which are also labeled A are understood to be part of that
chain. By extrapolation, unlabeled HETATMs are part of the unlabeled
chain. If that's not what the author intended, he surely would
have labeled the AA chain, right? Ahem. There is an awful
lot of room for interpretation there, and I argued for (and against)
most of the design decisions.

Cheers,

--
Jurgen Pletinckx
AlgoNomics NV

From bernd.web at gmail.com Fri Sep 15 07:53:15 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 15 Sep 2006 13:53:15 +0200
Subject: [Bioperl-l] More on PDB and chains...
In-Reply-To: <20060915101854.4533583A0@sienna.algonomics.com>
References: <716af09c0609150157j4d56bdeu9ab1f99ad5b7f964@mail.gmail.com> <20060915101854.4533583A0@sienna.algonomics.com>
Message-ID: <716af09c0609150453s66aad6a4s75355c24327a9bfd@mail.gmail.com>

Hi Jurgen,

Thanks for your info.
Re the chain labelling: I fully agree with you, in case the chains are
labelled. HETATMs in the chain should remain in the chain. That's why
we talked about non-chain hetero atoms.
If the chain is not labelled, which is the case if only one (default)
chain exists (e.g. 102L), all non-chain HETATMs also end up in the
default chain.
This is not nice: it becomes tedious to check if a
HETATM is part of the chain or not. In the case of such records (102L) it
could be good not to store non-chain HETATMs in the default chain and to
obey the TER chain termination label.

Regards,
Bernd

From MEC at stowers-institute.org Fri Sep 15 09:29:44 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 15 Sep 2006 08:29:44 -0500
Subject: [Bioperl-l] DNA sequence assembly for SNP analysis
Message-ID:

Good question!

Add to this list:
http://bioinformatics.org/pipermail/bio_bulletin_board/2006-May/003215.html

This new option:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=15931688&dopt=Abstract

I'm interested in others too....

Cheers,

Malcolm Cook

From fgarret at ub.edu Fri Sep 15 10:40:21 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Fri, 15 Sep 2006 16:40:21 +0200
Subject: [Bioperl-l] Bio::DB::GFF
Message-ID: <450ABB55.6060302@ub.edu>

Hi all,

I'm using Bio::DB::GFF and I need to get all the GFF features in a
particular region (e.g. 100000 to 200000). However, I need the positions
to be 'reset' in order to start from 1 (and not 100000).

I've looked in BioPerl but it looks like there's no method for it. So I
changed the 'start' and 'end' methods of the 'Bio::DB::GFF' module. The
objective is to make them get/set methods in order to change the start
and stop positions of the GFF features.

"...
# overridden methods
# start, stop, length
sub start {
  my $self = shift;
  my $pos  = shift;

  # act as a setter only in absolute mode and only when a position is given
  if ($self->absolute && defined $pos) {
    if ($self->strand < 0) { $self->{stop}  = $pos }
    else                   { $self->{start} = $pos }
  }
  return $self->strand < 0 ? $self->{stop} : $self->{start} if $self->absolute;
  $self->_abs2rel($self->{start});
}
sub end {
  my $self = shift;
  my $pos  = shift;

  # act as a setter only in absolute mode and only when a position is given
  if ($self->absolute && defined $pos) {
    if ($self->strand < 0) { $self->{start} = $pos }
    else                   { $self->{stop}  = $pos }
  }
  return $self->strand < 0 ? $self->{start} : $self->{stop} if $self->absolute;
  $self->_abs2rel($self->{stop});
}
..."

Probably these changes are not totally general, so any adjustment is welcome.

Thanks in advance,

FG

From lincoln.stein at gmail.com Fri Sep 15 10:53:05 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 15 Sep 2006 10:53:05 -0400
Subject: [Bioperl-l] Bio::DB::GFF
In-Reply-To: <450ABB55.6060302@ub.edu>
References: <450ABB55.6060302@ub.edu>
Message-ID: <6dce9a0b0609150753g17bff177h5c1b606c829bb329@mail.gmail.com>

Hi,

This is quite easy.
After you fetch the segment, tell it to use itself as the
reference system with the refseq() method:

my $s = $db->segment('Chr1',100000 => 200000);
$s->refseq($s);

From now on, all coordinates will be relative to the start of $s, so the
range will go from 1 to 100001. This will also apply to all features you
fetch from the segment.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From bernd.web at gmail.com Fri Sep 15 11:34:52 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 15 Sep 2006 17:34:52 +0200
Subject: [Bioperl-l] More on PDB and chains...
In-Reply-To: <716af09c0609150453s66aad6a4s75355c24327a9bfd@mail.gmail.com>
References: <716af09c0609150157j4d56bdeu9ab1f99ad5b7f964@mail.gmail.com> <20060915101854.4533583A0@sienna.algonomics.com> <716af09c0609150453s66aad6a4s75355c24327a9bfd@mail.gmail.com>
Message-ID: <716af09c0609150834s44a18b9cwfbf26ad50aeb6ed8@mail.gmail.com>

Hi all,

The story about the chains is somewhat more complex. A HETATM can be
part of the chain and present before TER. No problem. But we cannot just
push all HETATMs after the TER to "non-chain": some records associate
HETATMs after TER with a chain.
A structure like chain (A,B,C etc), hetatm_chain (A,B,C etc) and
non_chain (all other HETATMs) would be able to capture this.

An example PDB record where HETATMs after TER are labelled with a chain
is 1QA7.

Regards,
Bernd

From arareko at campus.iztacala.unam.mx Fri Sep 15 11:28:45 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 15 Sep 2006 10:28:45 -0500
Subject: [Bioperl-l] reactive bioperl developer account
In-Reply-To: <450A7FA0.5040606@sheffield.ac.uk>
References: <450A7FA0.5040606@sheffield.ac.uk>
Message-ID: <450AC6AD.9040800@campus.iztacala.unam.mx>

Hi Nathan,

To reactivate your account you should send an email to the OBF Helpdesk:

support at open-bio.org

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From muratem at eng.uah.edu Fri Sep 15 11:37:34 2006
From: muratem at eng.uah.edu (Mike Muratet)
Date: Fri, 15 Sep 2006 10:37:34 -0500 (CDT)
Subject: [Bioperl-l] World's smallest format converter
Message-ID:

Greetings

This subject comes up periodically, but I have a question about the design
approach.

I have a simple script that reads a Genbank record, checks the species
data, and writes it out as Fasta; I use it to build species-specific
BLAST databases from Genbank files. I noted that sometimes the Fasta
header contains the locus rather than the accession. I looked at the
source for the SeqIO methods, and the default for writing Fasta is the
display id, which defaults to the locus when reading Genbank. Many times
the locus equals the accession, but sometimes it does not.
There is a comment in genbank.pm "there can be multiple accessions". Does
anybody have any experience with this, and what happens if there are? Was
locus picked for the display id because it is more likely to be unique?

I see that one can select which flavor of id gets printed in the fasta
header, but I'm curious about what to expect if I select 'accession'.

Thanks

Mike

From dhoworth at mrc-lmb.cam.ac.uk Fri Sep 15 12:14:57 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri, 15 Sep 2006 17:14:57 +0100
Subject: [Bioperl-l] More on PDB and chains...
In-Reply-To: <716af09c0609150157j4d56bdeu9ab1f99ad5b7f964@mail.gmail.com>
References: <716af09c0609150157j4d56bdeu9ab1f99ad5b7f964@mail.gmail.com>
Message-ID: <450AD181.8050601@mrc-lmb.cam.ac.uk>

Bernd Web wrote:
> Is anyone else on this list using StructureIO::pdb at all?

I don't :(

The problems with writing a PDB parser as I see them are:

(1) Legacy data - for every rule that you try to rely on, there is at
least one PDB file that breaks it! Even today, new files are not fully
validated before release.

(2) Different uses and ambiguity - depending on the use to which
somebody wants to put the data, results may be different. Does a file
with just C alpha positions contain residues? It all depends. Do the
last few observed atoms constitute a residue? It all depends. Does one
always accept the authors' labelling? etc etc. Compatibility with
specific tools and/or other analysis of the same data also influence how
the files are interpreted.

So I think many people write their own parser, with just those tweaks
and interpretations that they require for their application. In mine, I
have over 600 lines just correcting simple errors in various PDB files
so my parser is able to read them. And I don't even read the ATOM
records! I'm just reading header lines. I did say people had different
use cases :)

I would suggest looking at other existing solutions to try to benefit
from that knowledge.
The mmCIF/XML data model and the MSD schema can suggest object and data structures. Andrew Dalke's work on the biopython module has lots of hard-won experience, I believe. See for example, Cheers, Dave From szhan at uoguelph.ca Fri Sep 15 15:55:51 2006 From: szhan at uoguelph.ca (szhan at uoguelph.ca) Date: Fri, 15 Sep 2006 15:55:51 -0400 Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XP PC using ActivePerl PPM Message-ID: <20060915155551.io6jerifwg8wg0sw@webmail.uoguelph.ca> Dear Bioperl users, I have downloaded ActivePerl-5.8.8.819 MSI (x86), and installed it on Windows XP PC successfully. I used GUI PPM to install Bioperl 1.4 by opening GUI PPM, choosing Bioperl1.4 then marking for install but got error as below: ERROR: Installing File-Spec-0.82 would downgrade File::Spec from version 3.12 to 0.82 and File::Spec::Functions from version 1.3 to 1.1 and File::Spec::Mac from version 1.4 to 1.2 and File::Spec::OS2 from version 1.2 to 1.1 and File::Spec::Unix from version 1.5 to 1.2 and File::Spec::VMS from version 1.4 to 1.1 and File::Spec::Win32 from version 1.6 to 1.2 Why did I get the error? Could you please help me out? Thanks! 
Josh From cjfields at uiuc.edu Fri Sep 15 16:53:44 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 15 Sep 2006 15:53:44 -0500 Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XPPC using ActivePerl PPM In-Reply-To: <20060915155551.io6jerifwg8wg0sw@webmail.uoguelph.ca> Message-ID: <000001c6d909$086cf810$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of szhan at uoguelph.ca > Sent: Friday, September 15, 2006 2:56 PM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows > XPPC using ActivePerl PPM > > Dear Bioperl users, > I have downloaded ActivePerl-5.8.8.819 MSI (x86), and installed it on > Windows XP PC successfully. I used GUI PPM to install Bioperl 1.4 by > opening GUI PPM, choosing Bioperl1.4 then marking for install but got > error as below: > ERROR: Installing File-Spec-0.82 would downgrade File::Spec from > version 3.12 to 0.82 and File::Spec::Functions from version 1.3 to 1.1 > and File::Spec::Mac from version 1.4 to 1.2 and File::Spec::OS2 from > version 1.2 to 1.1 and File::Spec::Unix from version 1.5 to 1.2 and > File::Spec::VMS from version 1.4 to 1.1 and File::Spec::Win32 from > version 1.6 to 1.2 > Why did I get the error? Could you please help me out? > Thanks! > > Josh I haven't used the new PPM GUI with ActivePerl 5.8.819 yet, but it's strange that it says those will be downgraded. Have you tried the command line PPM to install? I know it's still available with the distribution. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry
University of Illinois Urbana-Champaign

From mc at michaelcraige.com Fri Sep 15 15:08:24 2006
From: mc at michaelcraige.com (Michael Craige)
Date: Fri, 15 Sep 2006 15:08:24 -0400
Subject: [Bioperl-l] Aligned Fasta file to Phylip conversion
In-Reply-To:
Message-ID:

Hello all,

I need to convert some aligned fasta files quickly in my lab to phylip format for phylogenetic prediction and analysis.

My plan is to develop a script(s), but before I take the plunge - does anyone know of or have a script(s) that can perform the conversion with or without some tweaking?

I am also interested in a bio-perl module(s) that can make the process faster.

Thanks
Mike.

From cjfields at uiuc.edu Fri Sep 15 17:39:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Sep 2006 16:39:46 -0500
Subject: [Bioperl-l] Cleanup of BioPerl distribution website
Message-ID: <000001c6d90f$772729f0$15327e82@pyrimidine>

I noticed that the BioPerl site where regular distributions are kept is a little, um, unkempt. Who would we talk to about maybe moving some of the older (pre-1.4) bioperl releases into a separate directory? Chris D. maybe?

http://www.bioperl.org/DIST/

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From osborne1 at optonline.net Fri Sep 15 17:49:21 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 15 Sep 2006 17:49:21 -0400
Subject: [Bioperl-l] Aligned Fasta file to Phylip conversion
In-Reply-To:
Message-ID:

Mike,

Easy!

http://www.bioperl.org/wiki/Bptutorial.pl#III.2.2_Transforming_alignment_files_.28AlignIO.29

Brian O.

On 9/15/06 3:08 PM, "Michael Craige" wrote:

> Hello all,
>
> I need to convert some aligned fasta files quickly in my lab to phylip
> format for phylogenetic prediction and analysis.
> > My plan is to developed a script(s), but before I make the plunge - does > anyone know of or have a script(s) that can perform the conversion with or > with out some tweaking? > > I am also interested in a bio-perl module(s) that can make the process > faster. > > Thanks > Mike. > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lincoln.stein at gmail.com Fri Sep 15 18:12:33 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 15 Sep 2006 18:12:33 -0400 Subject: [Bioperl-l] Bio::Graphics::Glyph->parts differences between 1.636 and 1.654 In-Reply-To: <1158310773.13842.35.camel@jakarta> References: <1158310773.13842.35.camel@jakarta> Message-ID: <6dce9a0b0609151512h55b5078an8e5245611675a253@mail.gmail.com> I will test this. The parts() API is not supposed to have changed. If you could send me your script, that would be very helpful. Best, Lincoln On 9/15/06, Oliver Burren wrote: > > Hi Bioperlers, > > I'm having some problems with the CVS version of Bio::Graphics::Glyph > especially the 'parts' method. I have written a script and module to > demonstrate behaviour which I am happy to supply on request. > > Script is called test_parts.pl. It creates a 100 random features and > adds them to a holding feature. A Bio::Graphics::Panel is created that > this is then passed to for rendering. The glyph used to render is > test_parts with following 'draw' sub > > sub draw{ > my $self=shift; > warn "Bio::Graphics::Panel API is ".Bio::Graphics::Panel::api_version > ()."\n"; > warn "Bio::Graphics::Glyph::testparts.pm can find ".(($self->parts > =~ /ARRAY/?@{$self->parts}:$self->parts)|'no')." parts\n"; > } > > > #With 'old' version of Bio::Graphics. > > perl test_parts.pl > Top level feature contains 100 features > Bio::Graphics::Panel API is 1.636 > Bio::Graphics::Glyph::testparts.pm can find 100 parts > > #with 'new' (CVS co) version. 
> > perl test_parts.pl > Top level feature contains 100 features > Bio::Graphics::Panel API is 1.654 > Bio::Graphics::Glyph::testparts.pm can find 0 parts > > Looks as if I'm loosing parts between the two apis ? > > I saw a thread on gmod mailing list > > http://sourceforge.net/mailarchive/forum.php? > forum_id=31947&max_rows=25&style=flat&viewmonth=200607&viewday=26 which > may be relevant but I wasn't able to find any follow up. > > Would somone be able to advise/document the changes that have occured > between the 2 api versions that might be relevant so that I can patch > some of my custom glyphs so they are compatible. > > Many thanks, > > > Olly Burren > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lincoln.stein at gmail.com Fri Sep 15 18:28:54 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 15 Sep 2006 18:28:54 -0400 Subject: [Bioperl-l] Bio::Graphics::Glyph->parts differences between 1.636 and 1.654 In-Reply-To: <6dce9a0b0609151512h55b5078an8e5245611675a253@mail.gmail.com> References: <1158310773.13842.35.camel@jakarta> <6dce9a0b0609151512h55b5078an8e5245611675a253@mail.gmail.com> Message-ID: <6dce9a0b0609151528u8eab6d1q50060f47975dfc39@mail.gmail.com> Hi Oliver, Sorry the answer didn't occur to me earlier. There is a new maxdepth method that returns the number of levels of descent that the glyph can draw. Bio::Graphics::Glyph returns undef from this method, meaning that it can draw an unlimited number of levels of subparts, but Bio::Graphics::Glyph::generic returns 0, meaning that it only cares about the top level feature. This is a major performance boost. 
For your glyph, you can do one of three things:

1) override maxdepth() so that it returns the number of levels to descend into.
2) inherit from Bio::Graphics::Glyph, not generic
3) pass the -maxdepth option to the track when you create it.

I will fix the documentation now.

Lincoln

On 9/15/06, Lincoln Stein wrote:
>
> I will test this. The parts() API is not supposed to have changed. If you
> could send me your script, that would be very helpful.
>
> Best,
>
> Lincoln
>
>
> On 9/15/06, Oliver Burren wrote:
> >
> > Hi Bioperlers,
> >
> > I'm having some problems with the CVS version of Bio::Graphics::Glyph
> > especially the 'parts' method. I have written a script and module to
> > demonstrate behaviour which I am happy to supply on request.
> >
> > Script is called test_parts.pl. It creates a 100 random features and
> > adds them to a holding feature. A Bio::Graphics::Panel is created that
> > this is then passed to for rendering. The glyph used to render is
> > test_parts with following 'draw' sub
> >
> > sub draw{
> > my $self=shift;
> > warn "Bio::Graphics::Panel API is ".Bio::Graphics::Panel::api_version
> > ()."\n";
> > warn "Bio::Graphics::Glyph:: testparts.pm can find ".(($self->parts
> > =~ /ARRAY/?@{$self->parts}:$self->parts)|'no')." parts\n";
> > }
> >
> >
> > #With 'old' version of Bio::Graphics.
> >
> > perl test_parts.pl
> > Top level feature contains 100 features
> > Bio::Graphics::Panel API is 1.636
> > Bio::Graphics::Glyph::testparts.pm can find 100 parts
> >
> > #with 'new' (CVS co) version.
> >
> > perl test_parts.pl
> > Top level feature contains 100 features
> > Bio::Graphics::Panel API is 1.654
> > Bio::Graphics::Glyph::testparts.pm can find 0 parts
> >
> > Looks as if I'm loosing parts between the two apis ?
> >
> > I saw a thread on gmod mailing list
> >
> > http://sourceforge.net/mailarchive/forum.php?
> > forum_id=31947&max_rows=25&style=flat&viewmonth=200607&viewday=26 which
> > may be relevant but I wasn't able to find any follow up.
> > > > Would somone be able to advise/document the changes that have occured > > between the 2 api versions that might be relevant so that I can patch > > some of my custom glyphs so they are compatible. > > > > Many thanks, > > > > > > Olly Burren > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lincoln.stein at gmail.com Fri Sep 15 18:28:04 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 15 Sep 2006 18:28:04 -0400 Subject: [Bioperl-l] Bio::Graphics::Glyph->parts differences between 1.636 and 1.654 In-Reply-To: <6dce9a0b0609151512h55b5078an8e5245611675a253@mail.gmail.com> References: <1158310773.13842.35.camel@jakarta> <6dce9a0b0609151512h55b5078an8e5245611675a253@mail.gmail.com> Message-ID: <6dce9a0b0609151528o15935fa0xaa71b0df45dd9393@mail.gmail.com> Hi Oliver, Sorry the answer didn't occur to me earlier. There is a new maxdepth method that returns the number of levels of descent that the glyph can draw. Bio::Graphics::Glyph returns undef from this method, meaning that it can draw an unlimited number of levels of subparts, but Bio::Graphics::Glyph::generic returns 0, meaning that it only cares about the top level feature. This is a major performance boost. For your glyph, you can do one of two things: 1) override maxdepth() so that it returns the number of levels to descend into. 
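As a rough illustration of why parts() came back empty, here is a toy model of depth-limited descent. It is not the Bio::Graphics implementation: collect_parts() and the hash-based feature tree are made up for this sketch. It only mirrors the rule described above, where an undefined maxdepth means unlimited descent and 0 means the top-level feature only.

```perl
use strict;
use warnings;

# Toy feature tree: each node is { name => ..., parts => [ ... ] }.
# Hypothetical structure for illustration, not the Bio::Graphics internals.
sub collect_parts {
    my ($node, $maxdepth, $level) = @_;
    $level = 0 unless defined $level;

    # undef means "no limit"; otherwise stop once the limit is reached
    return () if defined $maxdepth && $level >= $maxdepth;

    my @found;
    for my $part (@{ $node->{parts} || [] }) {
        push @found, $part->{name}, collect_parts($part, $maxdepth, $level + 1);
    }
    return @found;
}

my $top = {
    name  => 'holder',
    parts => [
        { name => 'f1', parts => [ { name => 'f1.a' } ] },
        { name => 'f2' },
    ],
};

my @unlimited = collect_parts($top, undef);  # descend through every level
my @top_only  = collect_parts($top, 0);      # top-level feature only, no parts
my @one_level = collect_parts($top, 1);      # direct subfeatures only

printf "undef: %d parts, 0: %d parts, 1: %d parts\n",
    scalar @unlimited, scalar @top_only, scalar @one_level;
```

In a real glyph, option 1 would presumably amount to defining a maxdepth method in the glyph package that returns the desired number of levels, for example 1 to see the direct subfeatures.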
On 9/15/06, Lincoln Stein wrote: > > I will test this. The parts() API is not supposed to have changed. If you > could send me your script, that would be very helpful. > > Best, > > Lincoln > > > On 9/15/06, Oliver Burren wrote: > > > > Hi Bioperlers, > > > > I'm having some problems with the CVS version of Bio::Graphics::Glyph > > especially the 'parts' method. I have written a script and module to > > demonstrate behaviour which I am happy to supply on request. > > > > Script is called test_parts.pl. It creates a 100 random features and > > adds them to a holding feature. A Bio::Graphics::Panel is created that > > this is then passed to for rendering. The glyph used to render is > > test_parts with following 'draw' sub > > > > sub draw{ > > my $self=shift; > > warn "Bio::Graphics::Panel API is ".Bio::Graphics::Panel::api_version > > ()."\n"; > > warn "Bio::Graphics::Glyph:: testparts.pm can find ".(($self->parts > > =~ /ARRAY/?@{$self->parts}:$self->parts)|'no')." parts\n"; > > } > > > > > > #With 'old' version of Bio::Graphics. > > > > perl test_parts.pl > > Top level feature contains 100 features > > Bio::Graphics::Panel API is 1.636 > > Bio::Graphics::Glyph::testparts.pm can find 100 parts > > > > #with 'new' (CVS co) version. > > > > perl test_parts.pl > > Top level feature contains 100 features > > Bio::Graphics::Panel API is 1.654 > > Bio::Graphics::Glyph::testparts.pm can find 0 parts > > > > Looks as if I'm loosing parts between the two apis ? > > > > I saw a thread on gmod mailing list > > > > http://sourceforge.net/mailarchive/forum.php? > > forum_id=31947&max_rows=25&style=flat&viewmonth=200607&viewday=26 which > > may be relevant but I wasn't able to find any follow up. > > > > Would somone be able to advise/document the changes that have occured > > between the 2 api versions that might be relevant so that I can patch > > some of my custom glyphs so they are compatible. 
> > > > Many thanks, > > > > > > Olly Burren > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bernd.web at gmail.com Fri Sep 15 19:09:22 2006 From: bernd.web at gmail.com (Bernd Web) Date: Sat, 16 Sep 2006 01:09:22 +0200 Subject: [Bioperl-l] More on PDB and chains... In-Reply-To: <450AD181.8050601@mrc-lmb.cam.ac.uk> References: <716af09c0609150157j4d56bdeu9ab1f99ad5b7f964@mail.gmail.com> <450AD181.8050601@mrc-lmb.cam.ac.uk> Message-ID: <716af09c0609151609j2f60a087se70eb3ef29b738a9@mail.gmail.com> Hi, Regarding the pdb parsing. Jurgen writes: # my $struc = $stream->next_structure(); # my $chain = $struc->next_chain(); # my $res = $chain->next_residue(); but the implementation ran into severe reference management trouble. I am not into StructureIO, but given this I would not feel like rewriting to this structure again, since apparently it was originally like this. Regarding the HETATMs after TER in the PDB records that are included in the residue chains: looking through the code it is due to a "bug" that was solved: fix for bug #1187 for 1abm. Before this bugfix a chain A with residues and a chain A with HETATMS was created. So rolling this back partly solves the point about HETATMS. 
In case there is only one chain, like Jurgen says, it is ambiguous to state that HETATMs are non-chain, since the chain and the HETATMs (mostly HOH) are all not labelled. However, here I would take the chain only to TER. Like Dave said: difference of interpretation ;-) I wrote about my interpretation and will try to adapt pdb.pm accordingly.... Bernd From arareko at campus.iztacala.unam.mx Fri Sep 15 20:58:27 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Fri, 15 Sep 2006 19:58:27 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <000001c6d90f$772729f0$15327e82@pyrimidine> References: <000001c6d90f$772729f0$15327e82@pyrimidine> Message-ID: <450B4C33.1000704@campus.iztacala.unam.mx> I can handle that Chris. Do you have some directory setup in mind? What would you suggest to be moved? Mauricio. Chris Fields wrote: > I noticed that the BioPerl site where regular distributions are kept is a > little, um, unkempt. Who would we talk to about maybe moving some of the > older (pre-1.4) bioperl releases into a separate directory? Chris D. maybe? > > http://www.bioperl.org/DIST/ > > Chris > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Fri Sep 15 22:14:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Sep 2006 21:14:19 -0500
Subject: [Bioperl-l] Cleanup of BioPerl distribution website
In-Reply-To: <450B4C33.1000704@campus.iztacala.unam.mx>
References: <000001c6d90f$772729f0$15327e82@pyrimidine>
 <450B4C33.1000704@campus.iztacala.unam.mx>
Message-ID: <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu>

A good start might be if we could move the pre-1.4 core releases to an 'old_releases' directory, along with old PPMs and so on (for Windows releases). I may try building a bare-bones PPM for the next release. Haven't heard much from Sendu on that last bit, but it's the beginning of the fall semester so lots of the academics (me included) are pretty busy!

Chris

On Sep 15, 2006, at 7:58 PM, Mauricio Herrera Cuadra wrote:

> I can handle that Chris. Do you have some directory setup in mind?
> What
> would you suggest to be moved?
>
> Mauricio.
>
> Chris Fields wrote:
>> I noticed that the BioPerl site where regular distributions are
>> kept is a
>> little, um, unkempt. Who would we talk to about maybe moving some
>> of the
>> older (pre-1.4) bioperl releases into a separate directory? Chris
>> D. maybe?
>>
>> http://www.bioperl.org/DIST/
>>
>> Chris
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept.
of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Gen?tica > Unidad de Morfofisiolog?a y Funci?n > Facultad de Estudios Superiores Iztacala, UNAM > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Sat Sep 16 01:21:12 2006 From: jason at bioperl.org (Jason Stajich) Date: Sat, 16 Sep 2006 06:21:12 +0100 Subject: [Bioperl-l] correction to docs for Bio::SearchIO::axt In-Reply-To: <20060915221840.GA14612@bio.cse.psu.edu> References: <20060915221840.GA14612@bio.cse.psu.edu> Message-ID: right - I guess I meant the only way to get BLASTZ into our system is lav -> axt. I will see if someone wants to update the module synopsis to clarify this point for users or I will get to it when I have time. On Sep 15, 2006, at 11:18 PM, Cathy Riemer wrote: > Hello Jason, > > I noticed an error in the documentation page for the > Bio::SearchIO::axt parser at > . > > The page states that "AXT format reports [are] typically produced > by BLASTZ", but actually BLASTZ produces lav output, not axt. > Sites that provide pre-computed BLASTZ alignments in axt format > (such as UCSC's Genome Browser) convert them from lav to axt in > a post-processing step. 
> > -Cathy > > --- > Cathy Riemer > Center for Comparative Genomics and Bioinformatics > Penn State University From phil- at 21cn.com Sat Sep 16 03:26:50 2006 From: phil- at 21cn.com (Philip Yang) Date: Sat, 16 Sep 2006 15:26:50 +0800 (CST) Subject: [Bioperl-l] (no subject) Message-ID: An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060916/8c9ae1f2/attachment.html From cjfields at uiuc.edu Sat Sep 16 09:55:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Sep 2006 08:55:27 -0500 Subject: [Bioperl-l] correction to docs for Bio::SearchIO::axt In-Reply-To: References: <20060915221840.GA14612@bio.cse.psu.edu> Message-ID: <11378303-D3CC-48D4-A33B-9F20460149D6@uiuc.edu> Jason, Cathy, I changed the POD description to the following in CVS: .... =head1 DESCRIPTION This is a parser and event-generator for AXT format reports. BLASTZ reports (Schwartz et al,(2003) Genome Research, 13:103-107) are normally in LAV format but are commonly post-processed to AXT format; precomputed BLASTZ reports, such as those found in the UCSC Genome Browser, are in AXT format. This parser will also parse any AXT format produced from any lav report and directly out of BLAT. ... Chris On Sep 16, 2006, at 12:21 AM, Jason Stajich wrote: > right - I guess I meant the only way to get BLASTZ into our system is > lav -> axt. > > I will see if someone wants to update the module synopsis to clarify > this point for users or I will get to it when I have time. > > On Sep 15, 2006, at 11:18 PM, Cathy Riemer wrote: > >> Hello Jason, >> >> I noticed an error in the documentation page for the >> Bio::SearchIO::axt parser at >> . >> >> The page states that "AXT format reports [are] typically produced >> by BLASTZ", but actually BLASTZ produces lav output, not axt. >> Sites that provide pre-computed BLASTZ alignments in axt format >> (such as UCSC's Genome Browser) convert them from lav to axt in >> a post-processing step. 
>>
>> -Cathy
>>
>> ---
>> Cathy Riemer
>> Center for Comparative Genomics and Bioinformatics
>> Penn State University
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From phil- at 126.com Sat Sep 16 09:25:29 2006
From: phil- at 126.com (phil-)
Date: Sat, 16 Sep 2006 21:25:29 +0800 (CST)
Subject: [Bioperl-l] Bug in method "slice" of Bio::SimpleAlign
Message-ID: <450BFB49.0000A1.29116@bj126app35.126.com>

Hi there,

Here is the part of the original code in Bio::SimpleAlign that calculates the start and end of the sliced LocatableSeq:

    my $slice_seq = $seq->subseq($start, $seq_end);
    $new_seq->seq( $slice_seq );

    # start
    if ($start > 1) {
        my $pre_start_seq = $seq->subseq(1, $start - 1);
        $pre_start_seq =~ s/\W//g; #print "$pre_start_seq\n";
        $new_seq->start( $seq->start + CORE::length($pre_start_seq) );
    } else {
        $new_seq->start( $seq->start);
    }

    # end
    $slice_seq =~ s/\W//g;
    $new_seq->end( $new_seq->start + CORE::length($slice_seq) - 1 );

I think something goes wrong when we have a LocatableSeq that is on the negative strand. In my opinion it should be changed to this:

    my $slice_seq = $seq->subseq($start, $seq_end);
    $new_seq->seq( $slice_seq );
    $slice_seq =~ s/\W//g;

    if ($start > 1) {
        my $pre_start_seq = $seq->subseq(1, $start - 1);
        $pre_start_seq =~ s/\W//g;
        if ($seq->strand > 0){
            $new_seq->start( $seq->start + CORE::length($pre_start_seq) );
        } else {
            $new_seq->start( $seq->end - CORE::length($pre_start_seq) - CORE::length($slice_seq) + 1);
        }
    } else {
        $new_seq->start( $seq->start);
    }

    $new_seq->end( $new_seq->start + CORE::length($slice_seq) - 1 );

Maybe my code still needs optimization, and I don't know how to submit it to CVS.
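The strand-aware arithmetic proposed above can be tried out without BioPerl. The sketch below is a standalone model, not a patch to Bio::SimpleAlign: slice_coords() is a hypothetical helper that takes the gapped string, the sequence start, end and strand, and the alignment columns to keep. It assumes that on the minus strand the first alignment column maps to the highest coordinate, and unlike the snippet above it applies the minus-strand formula even when the slice starts at column 1.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SimpleAlign.
# $aligned   : gapped sequence string as it appears in the alignment
# $seq_start : start coordinate of the ungapped sequence
# $seq_end   : end coordinate of the ungapped sequence
# $strand    : 1 or -1
# $col_from, $col_to : 1-based alignment columns of the slice (inclusive)
sub slice_coords {
    my ($aligned, $seq_start, $seq_end, $strand, $col_from, $col_to) = @_;

    my $pre   = substr($aligned, 0, $col_from - 1);
    my $slice = substr($aligned, $col_from - 1, $col_to - $col_from + 1);

    # Count residues only: strip gap characters, as the original code does
    (my $pre_res   = $pre)   =~ s/\W//g;
    (my $slice_res = $slice) =~ s/\W//g;
    my $n_pre   = length $pre_res;
    my $n_slice = length $slice_res;

    # Plus strand: residues before the slice push the start upwards.
    # Minus strand: they are consumed from the high end of the range.
    my $new_start = $strand < 0
        ? $seq_end - $n_pre - $n_slice + 1
        : $seq_start + $n_pre;

    return ($new_start, $new_start + $n_slice - 1);
}

# Five residues at coordinates 11..15, one gap in column 3; keep columns 3..5
my @fwd = slice_coords('AC-GTA', 11, 15,  1, 3, 5);
my @rev = slice_coords('AC-GTA', 11, 15, -1, 3, 5);

print "plus strand:  @fwd\n";
print "minus strand: @rev\n";
```

On the plus strand the two residues in the slice sit at 13 and 14; on the minus strand the same columns correspond to 13 and 12, so the reported range moves to 12..13.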
So please help ^_^ Philip Young GuangZhou, China 2006-9-16 From cjfields at uiuc.edu Sat Sep 16 13:24:51 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Sep 2006 12:24:51 -0500 Subject: [Bioperl-l] correction to docs for Bio::SearchIO::axt In-Reply-To: <20060916164314.GA23640@bio.cse.psu.edu> References: <20060915221840.GA14612@bio.cse.psu.edu> <11378303-D3CC-48D4-A33B-9F20460149D6@uiuc.edu> <20060916164314.GA23640@bio.cse.psu.edu> Message-ID: <7642B11A-B248-4695-818E-413921D85D38@uiuc.edu> Added it. Anything else? Chris On Sep 16, 2006, at 11:43 AM, Cathy Riemer wrote: > Chris and Jason, > > Thanks for taking care of this. One small suggestion: to add the > word "many" before "precomputed", since some sites might provide > LAV (or other formats). > > -Cathy > > > On Sat, Sep 16, 2006 at 08:55:27AM -0500, Chris Fields wrote: >> Jason, Cathy, >> >> I changed the POD description to the following in CVS: >> >> .... >> >> =head1 DESCRIPTION >> >> This is a parser and event-generator for AXT format reports. BLASTZ >> reports (Schwartz et al,(2003) Genome Research, 13:103-107) are >> normally >> in LAV format but are commonly post-processed to AXT format; >> precomputed >> BLASTZ reports, such as those found in the UCSC Genome >> Browser, are in AXT format. This parser will also parse any >> AXT format produced from any lav report and directly out of BLAT. >> ... >> >> >> Chris >> >> On Sep 16, 2006, at 12:21 AM, Jason Stajich wrote: >> >>> right - I guess I meant the only way to get BLASTZ into our >>> system is >>> lav -> axt. >>> >>> I will see if someone wants to update the module synopsis to clarify >>> this point for users or I will get to it when I have time. >>> >>> On Sep 15, 2006, at 11:18 PM, Cathy Riemer wrote: >>> >>>> Hello Jason, >>>> >>>> I noticed an error in the documentation page for the >>>> Bio::SearchIO::axt parser at >>>> . 
>>>> >>>> The page states that "AXT format reports [are] typically produced >>>> by BLASTZ", but actually BLASTZ produces lav output, not axt. >>>> Sites that provide pre-computed BLASTZ alignments in axt format >>>> (such as UCSC's Genome Browser) convert them from lav to axt in >>>> a post-processing step. >>>> >>>> -Cathy >>>> >>>> --- >>>> Cathy Riemer >>>> Center for Comparative Genomics and Bioinformatics >>>> Penn State University >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cathy at bio.cse.psu.edu Sat Sep 16 12:43:14 2006 From: cathy at bio.cse.psu.edu (Cathy Riemer) Date: 16 Sep 2006 12:43:14 -0400 Subject: [Bioperl-l] correction to docs for Bio::SearchIO::axt In-Reply-To: <11378303-D3CC-48D4-A33B-9F20460149D6@uiuc.edu> References: <20060915221840.GA14612@bio.cse.psu.edu> <11378303-D3CC-48D4-A33B-9F20460149D6@uiuc.edu> Message-ID: <20060916164314.GA23640@bio.cse.psu.edu> Chris and Jason, Thanks for taking care of this. One small suggestion: to add the word "many" before "precomputed", since some sites might provide LAV (or other formats). -Cathy On Sat, Sep 16, 2006 at 08:55:27AM -0500, Chris Fields wrote: > Jason, Cathy, > > I changed the POD description to the following in CVS: > > .... > > =head1 DESCRIPTION > > This is a parser and event-generator for AXT format reports. BLASTZ > reports (Schwartz et al,(2003) Genome Research, 13:103-107) are normally > in LAV format but are commonly post-processed to AXT format; precomputed > BLASTZ reports, such as those found in the UCSC Genome > Browser, are in AXT format. 
This parser will also parse any
> AXT format produced from any lav report and directly out of BLAT.
> ...
>
>
> Chris
>
> On Sep 16, 2006, at 12:21 AM, Jason Stajich wrote:
>
> >right - I guess I meant the only way to get BLASTZ into our system is
> >lav -> axt.
> >
> >I will see if someone wants to update the module synopsis to clarify
> >this point for users or I will get to it when I have time.
> >
> >On Sep 15, 2006, at 11:18 PM, Cathy Riemer wrote:
> >
> >>Hello Jason,
> >>
> >>I noticed an error in the documentation page for the
> >>Bio::SearchIO::axt parser at
> >>.
> >>
> >>The page states that "AXT format reports [are] typically produced
> >>by BLASTZ", but actually BLASTZ produces lav output, not axt.
> >>Sites that provide pre-computed BLASTZ alignments in axt format
> >>(such as UCSC's Genome Browser) convert them from lav to axt in
> >>a post-processing step.
> >>
> >>-Cathy
> >>
> >>---
> >>Cathy Riemer
> >>Center for Comparative Genomics and Bioinformatics
> >>Penn State University
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

From Rafi.Ahmad at fagmed.uit.no Sat Sep 16 13:21:49 2006
From: Rafi.Ahmad at fagmed.uit.no (Rafi Ahmad)
Date: Sat, 16 Sep 2006 19:21:49 +0200
Subject: [Bioperl-l] generate ptt file from Genbank file
Message-ID:

Hi,

I am trying to generate a .ptt file like the NCBI ptt files, which basically contain each gene's coordinates, strand and name. I have a Genbank file from which I want to generate this ptt file. Is there any BioPerl module that can do this, or any sample script that I could modify and use?

Thanks in advance for your reply.
Regards
Rafi

From arareko at campus.iztacala.unam.mx Sat Sep 16 15:16:18 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 16 Sep 2006 14:16:18 -0500
Subject: [Bioperl-l] Cleanup of BioPerl distribution website
In-Reply-To: <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu>
References: <000001c6d90f$772729f0$15327e82@pyrimidine>
 <450B4C33.1000704@campus.iztacala.unam.mx>
 <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu>
Message-ID: <450C4D82.9050106@campus.iztacala.unam.mx>

Chris,

I've done the initial moves and created new signature files for each directory. Currently I have doubts about what to do with the following files:

- Bioperl.ppd (what does this stand for?)
- GD-SVG-0.25-ppm.tar.gz (where did this come from?)
- GD-SVG.ppd (where did this come from?)
- bioperl-1.5-ppm.tar.gz (1.5.0 or 1.5.1 release?)
- package.lst (still in use? needs updating? should it be split into corresponding subdirectories?)

Cheers,
Mauricio.

Chris Fields wrote:
> A good start might be if we could move the pre-1.4 core releases to
> an 'old_releases' directory, along with old PPMs and so on (for
> Windows releases). I may try building a bare-bones PPM for the next
> release. Haven't heard much from Sendu on that last bit, but it's
> the beginning of the fall semester so lots of the academics (me
> included) are pretty busy!
>
> Chris
>
> On Sep 15, 2006, at 7:58 PM, Mauricio Herrera Cuadra wrote:
>
>> I can handle that Chris. Do you have some directory setup in mind?
>> What
>> would you suggest to be moved?
>>
>> Mauricio.
>>
>> Chris Fields wrote:
>>> I noticed that the BioPerl site where regular distributions are
>>> kept is a
>>> little, um, unkempt. Who would we talk to about maybe moving some
>>> of the
>>> older (pre-1.4) bioperl releases into a separate directory? Chris
>>> D. maybe?
>>>
>>> http://www.bioperl.org/DIST/
>>>
>>> Chris
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept.
of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> -- >> MAURICIO HERRERA CUADRA >> arareko at campus.iztacala.unam.mx >> Laboratorio de Genética >> Unidad de Morfofisiología y Función >> Facultad de Estudios Superiores Iztacala, UNAM >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From n.haigh at sheffield.ac.uk Sat Sep 16 17:03:53 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Sat, 16 Sep 2006 22:03:53 +0100 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450C4D82.9050106@campus.iztacala.unam.mx> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> Message-ID: <450C66B9.4030801@sheffield.ac.uk> Mauricio Herrera Cuadra wrote: > Chris, > > I've done the initial moves and created new signature files for each > directory. Currently I've doubts on what to do with the following files: > > - Bioperl.ppd (what does this stands for?) > - GD-SVG-0.25-ppm.tar.gz (where does this came from?) > - GD-SVG.ppd (where does this came from?) > - bioperl-1.5-ppm.tar.gz (1.5.0 or 1.5.1 release?)
> - package.lst (still in use? needs updating? be splitted into > corresponding subdirectories?) > > Cheers, > Mauricio. > > Bioperl.ppd and bioperl-1.5-ppm.tar.gz are the files required for users wishing to install Bioperl 1.5 via ppm (i.e. mainly windows users). The same is true for GD-SVG* files which allow these users to install this module which if i remember correctly is required by Bio::Graphics::*. If i do remember correctly, this package wasn't available in any of the standard ppm repositories so was created just for the windows ppm users. They should probably be kept where they are at least until newer releases are created for these two modules. I think package.lst is required by the ppm software to identify the packages in the current directory/repository. This should be updated to contain only those packages in the same directory as this file. A new one then probably needs to be created for the old_releases directory. Someone may have a better idea of this, especially since ActiveState have been updating the ppm software etc, so feel free to correct me if needs be! Cheers Nathan From cjfields at uiuc.edu Sat Sep 16 20:13:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Sep 2006 19:13:19 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450C66B9.4030801@sheffield.ac.uk> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> Message-ID: There was a post on list recently about problems using the newer PPM GUI that comes with ActivePerl 5.8.819 (it wanted to downgrade all installed prereq modules already installed). I haven't tried it myself. I did find that the newer bioperl PPM (v. 1.5) didn't show up when I last used the command-line PPM for ActivePerl 5.8.817, so maybe package.lst needs to be updated. 
We can probably leave the PPM related stuff for now. I'm not sure whether anyone would be interested in having access to the older ones but we could probably make them available in the sub directory with a modified package.lst. We'll try getting a PPM for Windows for the next developer release and add it then. Chris On Sep 16, 2006, at 4:03 PM, Nathan S. Haigh wrote: > Mauricio Herrera Cuadra wrote: >> Chris, >> >> I've done the initial moves and created new signature files for each >> directory. Currently I've doubts on what to do with the following >> files: >> >> - Bioperl.ppd (what does this stands for?) >> - GD-SVG-0.25-ppm.tar.gz (where does this came from?) >> - GD-SVG.ppd (where does this came from?) >> - bioperl-1.5-ppm.tar.gz (1.5.0 or 1.5.1 release?) >> - package.lst (still in use? needs updating? be >> splitted into >> corresponding subdirectories?) >> >> Cheers, >> Mauricio. >> >> > Bioperl.ppd and bioperl-1.5-ppm.tar.gz are the files required for > users > wishing to install Bioperl 1.5 via ppm (i.e. mainly windows users). > The > same is true for GD-SVG* files which allow these users to install this > module which if i remember correctly is required by > Bio::Graphics::*. If > i do remember correctly, this package wasn't available in any of the > standard ppm repositories so was created just for the windows ppm > users. > They should probably be kept where they are at least until newer > releases are created for these two modules. > > I think package.lst is required by the ppm software to identify the > packages in the current directory/repository. This should be > updated to > contain only those packages in the same directory as this file. A new > one then probably needs to be created for the old_releases directory. > Someone may have a better idea of this, especially since ActiveState > have been updating the ppm software etc, so feel free to correct me if > needs be! 
> > Cheers > Nathan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Sep 16 20:23:35 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Sep 2006 19:23:35 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450C4D82.9050106@campus.iztacala.unam.mx> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> Message-ID: <557A4AF9-B1B6-45B7-8806-FC0CC4A8A374@uiuc.edu> (smacking my head...) Mauricio, Forgot to mention: thanks for doing this! Again, we could leave the PPM files for now. I'll try looking into PPM to determine what we should do with the current ones. I looked at the package.lst file and it does need to be updated but we'll probably wait until the next release so we can add the 1.5.2 PPM once it's made. Chris On Sep 16, 2006, at 2:16 PM, Mauricio Herrera Cuadra wrote: > Chris, > > I've done the initial moves and created new signature files for each > directory. Currently I've doubts on what to do with the following > files: > > - Bioperl.ppd (what does this stands for?) > - GD-SVG-0.25-ppm.tar.gz (where does this came from?) > - GD-SVG.ppd (where does this came from?) > - bioperl-1.5-ppm.tar.gz (1.5.0 or 1.5.1 release?) > - package.lst (still in use? needs updating? be splitted > into > corresponding subdirectories?) > > Cheers, > Mauricio. > > Chris Fields wrote: >> A good start might be if we could move the pre-1.4 core releases to >> an 'old_releases' directory, along with old PPMs and so on (for >> Windows releases). I may try building a bare-bones PPM for the next >> release. 
Haven't heard much from Sendu on that last bit, but it's >> the beginning of the fall semester so lots of the academics (me >> included) are pretty busy! >> >> Chris >> >> On Sep 15, 2006, at 7:58 PM, Mauricio Herrera Cuadra wrote: >> >>> I can handle that Chris. Do you have some directory setup in mind? >>> What >>> would you suggest to be moved? >>> >>> Mauricio. >>> >>> Chris Fields wrote: >>>> I noticed that the BioPerl site where regular distributions are >>>> kept is a >>>> little, um, unkempt. Who would we talk to about maybe moving some >>>> of the >>>> older (pre-1.4) bioperl releases into a separate directory? Chris >>>> D. maybe? >>>> >>>> http://www.bioperl.org/DIST/ >>>> >>>> Chris >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher - Switzer Lab >>>> Dept. of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> -- >>> MAURICIO HERRERA CUADRA >>> arareko at campus.iztacala.unam.mx >>> Laboratorio de Gen?tica >>> Unidad de Morfofisiolog?a y Funci?n >>> Facultad de Estudios Superiores Iztacala, UNAM >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Gen?tica > Unidad de Morfofisiolog?a y Funci?n > Facultad de Estudios Superiores Iztacala, UNAM > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From arareko at campus.iztacala.unam.mx Sat Sep 16 22:33:17 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sat, 16 Sep 2006 21:33:17 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> Message-ID: <450CB3ED.5020201@campus.iztacala.unam.mx> Nathan & Chris, I've taken a look into Bioperl.ppd and it uses bioperl-1.5-ppm.tar.gz which according to the ppd file corresponds to the 1.5.0 release, so I renamed Bioperl.ppd to Bioperl-1.5.ppd and moved both of them into 'old_releases'. I've also edited all ppd files in 'old_releases' to modify their CODEBASE HREF's to the new locations. Copied package.lst into 'old_releases' and modified this also. The original package.lst in 'DIST' was copied to package.lst.old as a backup, so I could edit the first one to reflect directory moves. Also generated new signature files for both directories. Please take a look and tell me if all of this changes make sense. Cheers, Mauricio. 
Chris Fields wrote: > There was a post on list recently about problems using the newer PPM > GUI that comes with ActivePerl 5.8.819 (it wanted to downgrade all > installed prereq modules already installed). I haven't tried it > myself. I did find that the newer bioperl PPM (v. 1.5) didn't show > up when I last used the command-line PPM for ActivePerl 5.8.817, so > maybe package.lst needs to be updated. > > We can probably leave the PPM related stuff for now. I'm not sure > whether anyone would be interested in having access to the older ones > but we could probably make them available in the sub directory with a > modified package.lst. > > We'll try getting a PPM for Windows for the next developer release > and add it then. > > Chris > > On Sep 16, 2006, at 4:03 PM, Nathan S. Haigh wrote: > >> Mauricio Herrera Cuadra wrote: >>> Chris, >>> >>> I've done the initial moves and created new signature files for each >>> directory. Currently I've doubts on what to do with the following >>> files: >>> >>> - Bioperl.ppd (what does this stands for?) >>> - GD-SVG-0.25-ppm.tar.gz (where does this came from?) >>> - GD-SVG.ppd (where does this came from?) >>> - bioperl-1.5-ppm.tar.gz (1.5.0 or 1.5.1 release?) >>> - package.lst (still in use? needs updating? be >>> splitted into >>> corresponding subdirectories?) >>> >>> Cheers, >>> Mauricio. >>> >>> >> Bioperl.ppd and bioperl-1.5-ppm.tar.gz are the files required for >> users >> wishing to install Bioperl 1.5 via ppm (i.e. mainly windows users). >> The >> same is true for GD-SVG* files which allow these users to install this >> module which if i remember correctly is required by >> Bio::Graphics::*. If >> i do remember correctly, this package wasn't available in any of the >> standard ppm repositories so was created just for the windows ppm >> users. >> They should probably be kept where they are at least until newer >> releases are created for these two modules. 
>> >> I think package.lst is required by the ppm software to identify the >> packages in the current directory/repository. This should be >> updated to >> contain only those packages in the same directory as this file. A new >> one then probably needs to be created for the old_releases directory. >> Someone may have a better idea of this, especially since ActiveState >> have been updating the ppm software etc, so feel free to correct me if >> needs be! >> >> Cheers >> Nathan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Sun Sep 17 00:01:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Sep 2006 23:01:27 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450CB3ED.5020201@campus.iztacala.unam.mx> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> <450CB3ED.5020201@campus.iztacala.unam.mx> Message-ID: Mauricio, I personally don't have a problem with the changes. Bioperl 1.5 had serious problems anyway. It's a shame we don't have a 1.5.1 ppm but when we release 1.5.2 we'll add one to DIST. Nathan, what do you think? 
When I make up a local ppm package for installation on WinXP, I maunally edit the XML generated (the ppd file) to make it more consistent with my local distribution, including modifying the prereqs. I think we can also include an installation script with the ppm (which I haven't played around with, but Scott Cain has done this with the Generic Genome Browser PPM package). When we start making release candidates I can try packaging everything up for Windows and have you add it to the main directory, then modify the package.lst to point to the newer ppms as well as Text::ShellWords and GD::SVG. We can have the old package.lst (in old_releases) point at the older distributions. Chris On Sep 16, 2006, at 9:33 PM, Mauricio Herrera Cuadra wrote: > Nathan & Chris, > > I've taken a look into Bioperl.ppd and it uses bioperl-1.5- > ppm.tar.gz which according to the ppd file corresponds to the 1.5.0 > release, so I renamed Bioperl.ppd to Bioperl-1.5.ppd and moved both > of them into 'old_releases'. > > I've also edited all ppd files in 'old_releases' to modify their > CODEBASE HREF's to the new locations. Copied package.lst into > 'old_releases' and modified this also. > > The original package.lst in 'DIST' was copied to package.lst.old as > a backup, so I could edit the first one to reflect directory moves. > Also generated new signature files for both directories. > > Please take a look and tell me if all of this changes make sense. > > Cheers, > Mauricio. > > Chris Fields wrote: >> There was a post on list recently about problems using the newer >> PPM GUI that comes with ActivePerl 5.8.819 (it wanted to >> downgrade all installed prereq modules already installed). I >> haven't tried it myself. I did find that the newer bioperl PPM >> (v. 1.5) didn't show up when I last used the command-line PPM for >> ActivePerl 5.8.817, so maybe package.lst needs to be updated. >> We can probably leave the PPM related stuff for now. 
I'm not >> sure whether anyone would be interested in having access to the >> older ones but we could probably make them available in the sub >> directory with a modified package.lst. >> We'll try getting a PPM for Windows for the next developer >> release and add it then. >> Chris >> On Sep 16, 2006, at 4:03 PM, Nathan S. Haigh wrote: >>> Mauricio Herrera Cuadra wrote: >>>> Chris, >>>> >>>> I've done the initial moves and created new signature files for >>>> each >>>> directory. Currently I've doubts on what to do with the >>>> following files: >>>> >>>> - Bioperl.ppd (what does this stands for?) >>>> - GD-SVG-0.25-ppm.tar.gz (where does this came from?) >>>> - GD-SVG.ppd (where does this came from?) >>>> - bioperl-1.5-ppm.tar.gz (1.5.0 or 1.5.1 release?) >>>> - package.lst (still in use? needs updating? be >>>> splitted into >>>> corresponding subdirectories?) >>>> >>>> Cheers, >>>> Mauricio. >>>> >>>> >>> Bioperl.ppd and bioperl-1.5-ppm.tar.gz are the files required >>> for users >>> wishing to install Bioperl 1.5 via ppm (i.e. mainly windows >>> users). The >>> same is true for GD-SVG* files which allow these users to install >>> this >>> module which if i remember correctly is required by >>> Bio::Graphics::*. If >>> i do remember correctly, this package wasn't available in any of the >>> standard ppm repositories so was created just for the windows >>> ppm users. >>> They should probably be kept where they are at least until newer >>> releases are created for these two modules. >>> >>> I think package.lst is required by the ppm software to identify the >>> packages in the current directory/repository. This should be >>> updated to >>> contain only those packages in the same directory as this file. A >>> new >>> one then probably needs to be created for the old_releases >>> directory. >>> Someone may have a better idea of this, especially since ActiveState >>> have been updating the ppm software etc, so feel free to correct >>> me if >>> needs be! 
>>> >>> Cheers >>> Nathan >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Gen?tica > Unidad de Morfofisiolog?a y Funci?n > Facultad de Estudios Superiores Iztacala, UNAM > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Sun Sep 17 06:09:55 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Sun, 17 Sep 2006 11:09:55 +0100 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> <450CB3ED.5020201@campus.iztacala.unam.mx> Message-ID: <450D1EF3.6060607@sheffield.ac.uk> Chris Fields wrote: > Mauricio, > > I personally don't have a problem with the changes. Bioperl 1.5 had > serious problems anyway. It's a shame we don't have a 1.5.1 ppm but > when we release 1.5.2 we'll add one to DIST. Nathan, what do you think? > > When I make up a local ppm package for installation on WinXP, I > maunally edit the XML generated (the ppd file) to make it more > consistent with my local distribution, including modifying the > prereqs. I think we can also include an installation script with the > ppm (which I haven't played around with, but Scott Cain has done this > with the Generic Genome Browser PPM package). 
When we start making > release candidates I can try packaging everything up for Windows and > have you add it to the main directory, then modify the package.lst to > point to the newer ppms as well as Text::ShellWords and GD::SVG. We > can have the old package.lst (in old_releases) point at the older > distributions. > > Chris > I don't have any immediate problems about the changes either. I've seen a few posts recently about installing Bioperl on Windows - how soon will 1.5.2 been released? It's not too difficult to generate a new ppd file etc so I could make a barebones ppd for 1.5.1? I know when I made the ppd file for 1.5 I included a lot of prereqs to ensure that most of bioperl would work without the need to manually install the modules later once the user found out that something didn't work. Personally, despite the need to download and install a lot more packages in one sitting, I thought this was important since Windows users that install bioperl are probably (or more likely) not from a programming background (no offence intended if your a whiz programmer working in Windows! :o) ). Therefore, their first experience of bioperl would get off to a better start if everything worked out of the hat after its installation, despite having a longer/bigger install. What do you think? What is the state of play with regards to tracking dependencies? I've just noticed that Makefile.PL has a lot more packages in %packages, is this a complete list of prereqs? If so, could they be added to PREREQ_PM in the WriteMakefile sub in order to make it easier for generating a ppd with a complete prereqs list? 
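A rough sketch of the idea in the paragraph above, under stated assumptions: the %packages hash here is a hypothetical stand-in that maps module names to minimum versions (the real hash in Makefile.PL may be structured differently), and the NAME and VERSION values are placeholders. Copying the hash into PREREQ_PM means the prerequisites are known to ExtUtils::MakeMaker, whose "ppd" make target writes DEPENDENCY entries from PREREQ_PM.

```perl
# Makefile.PL fragment (illustrative only, not the actual BioPerl Makefile.PL)
use strict;
use warnings;
use ExtUtils::MakeMaker;

# Hypothetical subset of optional modules; 0 means "any version"
my %packages = (
    'XML::Simple' => 0,
    'GD::SVG'     => 0,
    'IO::String'  => 0,
);

WriteMakefile(
    NAME      => 'Bio',          # placeholder
    VERSION   => '1.5.2',        # placeholder
    PREREQ_PM => { %packages },  # every listed module becomes a declared prereq
);
```

After "perl Makefile.PL", running "make ppd" (or "nmake ppd" on Windows) should produce a .ppd file listing those modules as dependencies. One design consequence: anything in PREREQ_PM is reported as a missing prerequisite on every platform, so modules that are truly optional may be better left out.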
Nath From faustov at yahoo.com Sun Sep 17 09:06:54 2006 From: faustov at yahoo.com (Fausto Rodríguez Zapata) Date: Sun, 17 Sep 2006 06:06:54 -0700 (PDT) Subject: [Bioperl-l] DNA sequence assembly for SNP analysis In-Reply-To: <66EBAADD-35CC-4724-B1BC-7B4FE3390F1A@ohsu.edu> Message-ID: <20060917130655.84383.qmail@web56403.mail.re3.yahoo.com> I've found some people who have written Perl scripts for extracting SNPs from CAP3 ACE assembly files: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12584131 http://nar.oxfordjournals.org/cgi/content/full/33/suppl_2/W493 --- Thomas J Keller wrote: > Greetings, > What are people using for DNA sequence assembly for SNP analysis > these days? > > Thanks, > Tom K > > > Thomas J. Keller, Ph.D. > Director, MMI Core Facility > Oregon Health & Science University > 3181 SW Sam Jackson Park Rd. > Portland, OR, USA, 97239 > > http://www.ohsu.edu/research/core > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Sun Sep 17 09:56:54 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 17 Sep 2006 08:56:54 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450D1EF3.6060607@sheffield.ac.uk> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> <450CB3ED.5020201@campus.iztacala.unam.mx> <450D1EF3.6060607@sheffield.ac.uk> Message-ID: On Sep 17, 2006, at 5:09 AM, Nathan S. Haigh wrote: ... > I don't have any immediate problems about the changes either.
I've > seen > a few posts recently about installing Bioperl on Windows - how soon > will > 1.5.2 been released? It's not too difficult to generate a new ppd file > etc so I could make a barebones ppd for 1.5.1? > > I know when I made the ppd file for 1.5 I included a lot of prereqs to > ensure that most of bioperl would work without the need to manually > install the modules later once the user found out that something > didn't > work. Personally, despite the need to download and install a lot more > packages in one sitting, I thought this was important since Windows > users that install bioperl are probably (or more likely) not from a > programming background (no offence intended if your a whiz programmer > working in Windows! :o) ). Therefore, their first experience of > bioperl > would get off to a better start if everything worked out of the hat > after its installation, despite having a longer/bigger install. > What do > you think? I'm not sure how the new GUI version of PPM (PPM4) affects installation. If we keep the prereqs in we might want to remove the version requirements; a previous poster stated that when they attempted installation of Bioperl 1.4 it wanted to downgrade the dependencies, likely to the versions listed in the ppd. I have never seen it do that before, so it could be something to do with the new PPM version. http://bioperl.org/pipermail/bioperl-l/2006-September/023002.html > > What is the state of play with regards to tracking dependencies? I've > just noticed that Makefile.PL has a lot more packages in %packages, is > this a complete list of prereqs? If so, could they be added to > PREREQ_PM > in the WriteMakefile sub in order to make it easier for generating > a ppd > with a complete prereqs list? > > Nath Many of those are included with current versions of ActivePerl, like XML::Simple, libwww-perl, etc. 
We are also planning on having a minimal requirement of perl 5.6.1 (Features parsing requires recursive regexes, which aren't found before 5.6.1). All the tests pass so far with ActivePerl. I haven't tried anything with CygWin, though. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Sep 17 10:00:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 17 Sep 2006 09:00:34 -0500 Subject: [Bioperl-l] DNA sequence assembly for SNP analysis In-Reply-To: <20060917130655.84383.qmail@web56403.mail.re3.yahoo.com> References: <20060917130655.84383.qmail@web56403.mail.re3.yahoo.com> Message-ID: <26A38D5B-D1EB-4FF8-AEDA-A3E275AE5184@uiuc.edu> Bioperl has some SNP tools but I'm not sure how up-to-date they are. Most of those are accessible via Bio::Cluster, Bio::ClusterIO, and Bio::Variation modules. Chris On Sep 17, 2006, at 8:06 AM, Fausto Rodr?guez Zapata wrote: > I've found some people that had made po perl scripts for extracting > SNPs > from CAP3 ACE assembly files: > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12584131 > > http://nar.oxfordjournals.org/cgi/content/full/33/suppl_2/W493 > > > > > --- Thomas J Keller wrote: > >> Greetings, >> What are people using for DNA sequence assembly for SNP analysis >> these days? >> >> Thanks, >> Tom K >> >> >> Thomas J. Keller, Ph.D. >> Director, MMI Core Facility >> Oregon Health & Science University >> 3181 SW Sam Jackson Park Rd. >> Portland, OR, USA, 97239 >> >> http://www.ohsu.edu/research/core >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From arareko at campus.iztacala.unam.mx Sun Sep 17 14:46:38 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sun, 17 Sep 2006 13:46:38 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450D1EF3.6060607@sheffield.ac.uk> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> <450CB3ED.5020201@campus.iztacala.unam.mx> <450D1EF3.6060607@sheffield.ac.uk> Message-ID: <450D980E.4050300@campus.iztacala.unam.mx> Nathan S. Haigh wrote: > I don't have any immediate problems about the changes either. I've > seen a few posts recently about installing Bioperl on Windows - how > soon will 1.5.2 been released? It's not too difficult to generate a > new ppd file etc so I could make a barebones ppd for 1.5.1? Go ahead if you want, I'll gladly upload it to the DIST directory. > I know when I made the ppd file for 1.5 I included a lot of prereqs > to ensure that most of bioperl would work without the need to > manually install the modules later once the user found out that > something didn't work. Personally, despite the need to download and > install a lot more packages in one sitting, I thought this was > important since Windows users that install bioperl are probably (or > more likely) not from a programming background (no offence intended > if your a whiz programmer working in Windows! :o) ). 
Therefore, their > first experience of bioperl would get off to a better start if > everything worked out of the hat after its installation, despite > having a longer/bigger install. What do you think? Definitely this would be better, not only for Windows installations :) > What is the state of play with regards to tracking dependencies? I've > just noticed that Makefile.PL has a lot more packages in %packages, > is this a complete list of prereqs? If so, could they be added to > PREREQ_PM in the WriteMakefile sub in order to make it easier for > generating a ppd with a complete prereqs list? > > Nath Supposedly it's complete, so they should be the same ones to add into the ppd file. Take a look into Bioperl-1.5.ppd in the 'old_releases' directory, since it's for the 1.5.0 release, the list should be very similar to the one you'll have to create. Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Sun Sep 17 17:03:03 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 17 Sep 2006 16:03:03 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450D980E.4050300@campus.iztacala.unam.mx> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> <450CB3ED.5020201@campus.iztacala.unam.mx> <450D1EF3.6060607@sheffield.ac.uk> <450D980E.4050300@campus.iztacala.unam.mx> Message-ID: <06790DBA-3DA5-49DD-AD83-D5613EE381A7@uiuc.edu> On Sep 17, 2006, at 1:46 PM, Mauricio Herrera Cuadra wrote: > Nathan S. Haigh wrote: >> I don't have any immediate problems about the changes either. I've >> seen a few posts recently about installing Bioperl on Windows - how >> soon will 1.5.2 been released?
It's not too difficult to generate a >> new ppd file etc so I could make a barebones ppd for 1.5.1? > > Go ahead if you want, I'll gladly upload it to the DIST directory. If so, we'll need to update package.lst to include it as well. > >> I know when I made the ppd file for 1.5 I included a lot of prereqs >> to ensure that most of bioperl would work without the need to >> manually install the modules later once the user found out that >> something didn't work. Personally, despite the need to download and >> install a lot more packages in one sitting, I thought this was >> important since Windows users that install bioperl are probably (or >> more likely) not from a programming background (no offence intended >> if your a whiz programmer working in Windows! :o) ). Therefore, their >> first experience of bioperl would get off to a better start if >> everything worked out of the hat after its installation, despite >> having a longer/bigger install. What do you think? > > Definitely this would be better, not only for Windows installations :) Make sure that the required packages and versions are available for Windows as well. As for other systems, we could always ask Chris D. to add any extra dependencies to Bundle::Bioperl on CPAN in anticipation for this release. > >> What is the state of play with regards to tracking dependencies? I've >> just noticed that Makefile.PL has a lot more packages in %packages, >> is this a complete list of prereqs? If so, could they be added to >> PREREQ_PM in the WriteMakefile sub in order to make it easier for >> generating a ppd with a complete prereqs list? >> >> Nath > > Supposedly it's complete, so they should be the same ones to add into > the ppd file. Take a look into Bioperl-1.5.ppd in the 'old_releases' > directory, since it's for the 1.5.0 release, the list should be very > similar to the one you'll have to create. > > Mauricio. 
> > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Gen?tica > Unidad de Morfofisiolog?a y Funci?n > Facultad de Estudios Superiores Iztacala, UNAM There may be a few more; I need to add XML::Simple myself. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Sep 17 18:17:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 17 Sep 2006 17:17:13 -0500 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450DC7E6.4030505@sheffield.ac.uk> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> <450CB3ED.5020201@campus.iztacala.unam.mx> <450D1EF3.6060607@sheffield.ac.uk> <450D980E.4050300@campus.iztacala.unam.mx> <06790DBA-3DA5-49DD-AD83-D5613EE381A7@uiuc.edu> <450DC7E6.4030505@sheffield.ac.uk> Message-ID: <0FB7F0D2-A3E5-4CE4-B7F4-BE3E1DABAFEA@uiuc.edu> >> ... > OK, I'll have a look into this sometime this week. Shall I go for a > prereqs of Perl 5.6.1? That way I'll ensure I do tests on a virtual > machine with a fresh install of Perl in order to check that > dependencies are satisfied. Should the test suite should show up > any dependencies that aren't satisfied or would the be silently > skipped? I'll also try to check multiple Perl versions from > whatever minimal version you suggest. > > What exactly should/needs to go into Bundle::Bioperl and why is it > needed? > > Nath I had mentioned to Chris D. recently that we may add a few dependencies. XML::Simple is needed by Bio::DB::EUtilities. There are several more that are present on the wiki: http://www.bioperl.org/wiki/ Installing_Bioperl_for_Unix#DEPENDENCIES_AND_Bundle::BioPerl I have a feeling that not all of them are in Bundle::Bioperl yet. 
At some point we should think about adding a script to the PPM distro, similar to Makefile.PL, that checks dependencies and installs scripts. This currently works using 'perl Makefile.PL' so there should some way of doing this (I thought this may be present in an old bioperl ppm package but I may be wrong. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Sun Sep 17 18:10:46 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Sun, 17 Sep 2006 23:10:46 +0100 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <06790DBA-3DA5-49DD-AD83-D5613EE381A7@uiuc.edu> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> <450CB3ED.5020201@campus.iztacala.unam.mx> <450D1EF3.6060607@sheffield.ac.uk> <450D980E.4050300@campus.iztacala.unam.mx> <06790DBA-3DA5-49DD-AD83-D5613EE381A7@uiuc.edu> Message-ID: <450DC7E6.4030505@sheffield.ac.uk> Chris Fields wrote: > > On Sep 17, 2006, at 1:46 PM, Mauricio Herrera Cuadra wrote: > >> Nathan S. Haigh wrote: >>> I don't have any immediate problems about the changes either. I've >>> seen a few posts recently about installing Bioperl on Windows - how >>> soon will 1.5.2 been released? It's not too difficult to generate a >>> new ppd file etc so I could make a barebones ppd for 1.5.1? >> >> Go ahead if you want, I'll gladly upload it to the DIST directory. > > If so, we'll need to update package.lst to include it as well. > >> >>> I know when I made the ppd file for 1.5 I included a lot of prereqs >>> to ensure that most of bioperl would work without the need to >>> manually install the modules later once the user found out that >>> something didn't work. 
Personally, despite the need to download and >>> install a lot more packages in one sitting, I thought this was >>> important since Windows users that install bioperl are probably (or >>> more likely) not from a programming background (no offence intended >>> if your a whiz programmer working in Windows! :o) ). Therefore, their >>> first experience of bioperl would get off to a better start if >>> everything worked out of the hat after its installation, despite >>> having a longer/bigger install. What do you think? >> >> Definitely this would be better, not only for Windows installations :) > > Make sure that the required packages and versions are available for > Windows as well. As for other systems, we could always ask Chris D. > to add any extra dependencies to Bundle::Bioperl on CPAN in > anticipation for this release. > >> >>> What is the state of play with regards to tracking dependencies? I've >>> just noticed that Makefile.PL has a lot more packages in %packages, >>> is this a complete list of prereqs? If so, could they be added to >>> PREREQ_PM in the WriteMakefile sub in order to make it easier for >>> generating a ppd with a complete prereqs list? >>> >>> Nath >> >> Supposedly it's complete, so they should be the same ones to add into >> the ppd file. Take a look into Bioperl-1.5.ppd in the 'old_releases' >> directory, since it's for the 1.5.0 release, the list should be very >> similar to the one you'll have to create. >> >> Mauricio. >> >> -- >> MAURICIO HERRERA CUADRA >> arareko at campus.iztacala.unam.mx >> Laboratorio de Gen?tica >> Unidad de Morfofisiolog?a y Funci?n >> Facultad de Estudios Superiores Iztacala, UNAM > > There may be a few more; I need to add XML::Simple myself. > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > OK, I'll have a look into this sometime this week. Shall I go for a prereqs of Perl 5.6.1? 
That way I'll ensure I do tests on a virtual machine with a fresh install of Perl in order to check that dependencies are satisfied. Should the test suite should show up any dependencies that aren't satisfied or would the be silently skipped? I'll also try to check multiple Perl versions from whatever minimal version you suggest. What exactly should/needs to go into Bundle::Bioperl and why is it needed? Nath From bix at sendu.me.uk Sun Sep 17 18:46:29 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 17 Sep 2006 23:46:29 +0100 Subject: [Bioperl-l] Cleanup of BioPerl distribution website In-Reply-To: <450DC7E6.4030505@sheffield.ac.uk> References: <000001c6d90f$772729f0$15327e82@pyrimidine> <450B4C33.1000704@campus.iztacala.unam.mx> <316F3F86-C106-48DA-AAC0-B6D758012DCD@uiuc.edu> <450C4D82.9050106@campus.iztacala.unam.mx> <450C66B9.4030801@sheffield.ac.uk> <450CB3ED.5020201@campus.iztacala.unam.mx> <450D1EF3.6060607@sheffield.ac.uk> <450D980E.4050300@campus.iztacala.unam.mx> <06790DBA-3DA5-49DD-AD83-D5613EE381A7@uiuc.edu> <450DC7E6.4030505@sheffield.ac.uk> Message-ID: <450DD045.8090702@sendu.me.uk> Nathan S. Haigh wrote: > Chris Fields wrote: >> On Sep 17, 2006, at 1:46 PM, Mauricio Herrera Cuadra wrote: >> >>> Nathan S. Haigh wrote: >>>> I don't have any immediate problems about the changes either. I've >>>> seen a few posts recently about installing Bioperl on Windows - how >>>> soon will 1.5.2 been released? I'm looking at the 25th for RC1. > OK, I'll have a look into this sometime this week. Shall I go for a > prereqs of Perl 5.6.1? That way I'll ensure I do tests on a virtual > machine with a fresh install of Perl in order to check that dependencies > are satisfied. Btw, it would be good to see what the test results are for 5.6.1. Please add them to http://www.bioperl.org/wiki/Release_1.5.2 if you do any. Cheers! 
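On the question above of whether unmet dependencies show up or get silently skipped: one way to make them visible is an explicit load check. What follows is only a minimal sketch in plain Perl, not the check that Makefile.PL actually performs. The sub name missing_modules and the module list are made up for illustration (two core modules plus one deliberately bogus name); a real list would come from whatever prerequisites the release settles on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the subset of the given module names that cannot be loaded.
# The names are whatever the caller passes in; nothing here reads
# BioPerl's Makefile.PL or its %packages hash.
sub missing_modules {
    my @wanted = @_;
    my @missing;
    for my $module (@wanted) {
        # Turn Foo::Bar into Foo/Bar.pm so require() treats it as a path.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Hypothetical list for illustration only.
my @candidates = ('File::Spec', 'List::Util', 'No::Such::Module::ForDemo');
my @missing    = missing_modules(@candidates);
print @missing ? "Missing: @missing\n" : "All modules found\n";
```

Run on a fresh Perl install, a check like this reports missing modules up front rather than relying on individual tests to notice.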
From zchou at cau.edu.cn Sat Sep 16 23:18:20 2006
From: zchou at cau.edu.cn (zchou at cau.edu.cn)
Date: Sun, 17 Sep 2006 11:18:20 +0800
Subject: [Bioperl-l] failed install Bioperl on linux
Message-ID: <1742ccc1749f74.1749f741742ccc@cau.edu.cn>

Hello, manager,

I tried to install BioPerl from the internet using CPAN, and also by
downloading Bioperl-run-1.5.1.tar.gz to my local PC; both methods
failed.

It seems that the same error occurred. How can I deal with it?

(1) Local machine, I use 'root' to install
RH9 make: Nothing to be done for `Makefile.PL'.

(2) Network
Checking if your kit is complete...
Looks good
Writing Makefile for Bio
make: *** No rule to make target `perl1'. Stop.
/usr/bin/make perl1 -- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible

Thanks a lot,
Zhuocheng

From torsten.seemann at infotech.monash.edu.au Sun Sep 17 20:36:48 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 18 Sep 2006 10:36:48 +1000
Subject: [Bioperl-l] generate ptt file from Genbank file
In-Reply-To:
References:
Message-ID: <450DEA20.3000103@infotech.monash.edu.au>

Rafi,

> I am trying to generate a .ptt file like the NCBI ptt file, which basically contains the gene co-ordiante information, its strand, name. I have a Genbank file from which i want to generate this ptt file.
> Is there any BioPerl module which can do the same, or any sample script which I can may be modify and use.
> Thanks in advance for your reply.

I don't think there is any BioPerl script to do it.
And Bio::FeatureIO doesn't support PTT - I will try and add it soon.

Until then, below is a sample script to work with!

Hope it helps,

--Torsten

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# This script takes a GenBank file as input, and produces a
# NCBI PTT file (protein table) as output. A PTT file is
# a line based, tab separated format with fixed column types.
#
# The GenBank record is read from STDIN and the table is written
# to STDOUT, so redirect both, e.g.:  perl script.pl < in.gbk > out.ptt
#
# Written by Torsten Seemann
# 18 September 2006

my $gbk = Bio::SeqIO->new(-fh=>\*STDIN, -format=>'genbank');
my $seq = $gbk->next_seq;
my @cds = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;

print $seq->description, " - 0..",$seq->length,"\n";
print scalar(@cds)," proteins\n";
print join("\t", qw(Location Strand Length PID Gene Synonym Code COG Product)),"\n";

for my $f (@cds) {
  my $gi = '-';
  $gi = $1 if tag($f, 'db_xref') =~ m/\bGI:(\d+)\b/;
  my $cog = '-';
  $cog = $1 if tag($f, 'product') =~ m/^(COG\S+)/;
  my @col = (
    $f->start.'..'.$f->end,
    $f->strand >= 0 ? '+' : '-',
    ($f->length/3)-1,
    $gi,
    tag($f, 'gene'),
    tag($f, 'locus_tag'),
    '-',                    # Code column: not derived here, so left as a placeholder
    $cog,
    tag($f, 'product'),
  );
  print join("\t", @col), "\n";
}

# Return the space-joined values of a feature tag, or '-' if absent.
sub tag {
  my($f, $tag) = @_;
  return '-' unless $f->has_tag($tag);
  return join(' ', $f->get_tag_values($tag));
}

From cmlapid at up.edu.ph Sun Sep 17 23:21:05 2006
From: cmlapid at up.edu.ph (Carlo Lapid)
Date: Mon, 18 Sep 2006 11:21:05 +0800
Subject: [Bioperl-l] using Bio::Structure::IO for homology models
Message-ID:

Hi,

I used the SWISS-MODEL homology modelling server to create a PDB
structure file for a protein sequence that I want to analyze. I then
tried to use Bioperl to open and manipulate that PDB file.
Unfortunately, I get the following error:

'x' outside of string in unpack at C:/Perl/site/lib/Bio/Structure/IO/pdb.pm line 141, line 2.

I don't get the same problem when I try to open a PDB file that I
obtained directly from the PDB database. What does this error mean? Is
it impossible to use Bioperl to manipulate structures created from
homology modelling?

The code that I used so far is pretty short and straightforward:

use strict;
use warnings;
use Bio::Structure::IO;

my $structureio = Bio::Structure::IO->new(-file => "structure.pdb");
my $structure = $structureio->next_structure;

Any help would be really appreciated.
Thanks,
Carlo

From bix at sendu.me.uk Mon Sep 18 02:43:08 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Sep 2006 07:43:08 +0100
Subject: [Bioperl-l] failed install Bioperl on linux
In-Reply-To: <1742ccc1749f74.1749f741742ccc@cau.edu.cn>
References: <1742ccc1749f74.1749f741742ccc@cau.edu.cn>
Message-ID: <450E3FFC.9090403@sendu.me.uk>

zchou at cau.edu.cn wrote:
> Hello, manager,
>
> I install bioperl from internet by using CPAN or download
> Bioperl-run-1.5.1.tar.gz into my local pc, however, all these methods
> falied.
>
> It seems that the seem error occured. How can I deal with it?
>
> (1) Local machine, I use 'root' to install
> RH9 make: Nothing to be done for `Makefile.PL'.
>
> (2) Network
> Checking if your kit is complete...
> Looks good
> Writing Makefile for Bio
> make: *** No rule to make target `perl1'. Stop.
> /usr/bin/make perl1 -- NOT OK
> Running make test
> Can't test without successful make
> Running make install
> make had returned bad status, install seems impossible

You need to say

perl Makefile.PL

to generate the make file, then

make
make test
make install

From cjfields at uiuc.edu Mon Sep 18 11:02:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Sep 2006 10:02:32 -0500
Subject: [Bioperl-l] Bio::Location::Split question
Message-ID: <000901c6db33$7aa226e0$15327e82@pyrimidine>

This is a general question about how locations are described for split
locations, so whoever has an opinion, please chip in. This is
particularly pertinent to GenBank/EMBL/swiss formats. Okay, stick with
me here...

A pretty interesting question was raised while I was working on a bug
(bug 1953), which deals with split location data with the following
formats:

join(complement(1..100),complement(201..300),complement(401..500))

complement(join(1..100,201..300,401..500))

GenBank acc #AL137247 has examples of both, if you want a real example.

According to BioPerl these are syntactically the same (look at the last
few tests in LocationFactory.t). However, according to GenBank (and the
rationale outlined in bug 1953), these are actually quite different.

According to the GenBank/EMBL/DDBJ feature table definition, the
operator 'join' means that the segments in the following parentheses
are joined in the order presented ('placed end-to-end'), whereas
'complement' means that the complementary strand of the segment in
parentheses is used. So, the operator tells one how to treat the
sequence data using the locations shown.

Here are examples from the definition:

...

complement(join(2691..4571,4918..5163))
    Joins regions 2691 to 4571 and 4918 to 5163, then complements
    the joined segments (the feature is on the strand complementary
    to the presented strand)

join(complement(4918..5163),complement(2691..4571))
    Complements regions 4918 to 5163 and 2691 to 4571, then joins
    the complemented segments (the feature is on the strand
    complementary to the presented strand)
...

Using this rationale, substituting in letters for clarity and lower
case to indicate the complement strand:

Location #1: join(complement(A..B),complement(C..D),complement(E..F))

would be:

join(b..a,d..c,f..e)

and the following:

Location #2: complement(join(A..B,C..D,E..F))

would be:

join(f..e,d..c,b..a)

The current behavior of Bio::Location::Split propagates the strand
information (flips) to the sublocations without re-sorting them. We
could sort them, but wouldn't it be much simpler to not propagate
strand changes at all? Seems we're making it more complicated than it
actually is.

Thoughts?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From fgarret at ub.edu Mon Sep 18 10:48:40 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Sep 2006 16:48:40 +0200
Subject: [Bioperl-l] DB::GFF
Message-ID: <450EB1C8.409@ub.edu>

Hi all,

When trying to fetch some GFF features from a GFF DB there are some
options that do not seem to be working well.

In Flybase, for features on the - strand, the start is bigger than the
stop position (they count from the start of the feature rather than
the start of the sequence).

I managed to fix it by using '$db->absolute(1)'. However, when
performing a '$segment->refseq($segment);' (to reset the positions) the
absolute option is ignored and the positions return to "Flybase style".

Here's the code I'm using:

use Bio::DB::GFF;
use Bio::Tools::GFF;

# Open the sequence database
my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                           -dsn     => 'dbi:mysql:dmel_gff:123.456.78.90',
                           -user    => 'user',
                           -pass    => 'pass')
    || die("database open failed");

# Ask for absolute coordinates
$db->absolute(1);
my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
# Make the segment its own reference sequence (resets the positions)
$segment->refseq($segment);
my @features = $segment->features(-types => ['gene', 'exon', 'intron',
    'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
print(scalar(@features)."\n");

# Write the features out as GFF3
my $gffio = Bio::Tools::GFF->new(-file => '>'.'gff.out.gff',
                                 -gff_version => 3);
$gffio->write_feature(@features);

exit(0);

Does anyone know what could be going wrong?

Thanks in adv,
FG

From cjfields at uiuc.edu Mon Sep 18 11:09:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Sep 2006 10:09:43 -0500
Subject: [Bioperl-l] failed install Bioperl on linux
In-Reply-To: <1742ccc1749f74.1749f741742ccc@cau.edu.cn>
Message-ID: <000a01c6db34$7cefa3e0$15327e82@pyrimidine>

> Hello, manager,

Why do I feel like I work at McDonald's?

> I install bioperl from internet by using CPAN or download
> Bioperl-run-1.5.1.tar.gz into my local pc, however, all these methods
> falied.

Bioperl == Core package

http://www.bioperl.org/wiki/Core_package

Bioperl != Run package (aka bioperl-run)

http://www.bioperl.org/wiki/Run_package

Very different packages!

> It seems that the seem error occured. How can I deal with it?
>
> (1) Local machine, I use 'root' to install
> RH9 make: Nothing to be done for `Makefile.PL'.
>
> (2) Network
> Checking if your kit is complete...
> Looks good > Writing Makefile for Bio > make: *** No rule to make target `perl1'. Stop. > /usr/bin/make perl1 -- NOT OK > Running make test > Can't test without successful make > Running make install > make had returned bad status, install seems impossible > > Thanks a lot, > Zhuocheng Sendu covers this one. Good luck! Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From lincoln.stein at gmail.com Mon Sep 18 11:26:53 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 18 Sep 2006 15:26:53 +0000 Subject: [Bioperl-l] DB::GFF In-Reply-To: <450EB1C8.409@ub.edu> References: <450EB1C8.409@ub.edu> Message-ID: <6dce9a0b0609180826m43863aa4ma8a6f82efb326815@mail.gmail.com> If you use refseq() to set relative addresses, then you must be prepared to have start features that are greater than end features. This is so that the identity of a coordinate doesn't change depending on the direction you are looking in! Lincoln On 9/18/06, Filipe Garrett wrote: > > Hi all, > > When trying to fetch some GFF features from a GFF DB there are some > options that do not seem to be working well. > > In Flybase, on the - strand features, the start is bigger than the stop > positions (they assume the start of the feature rather than the start of > the sequence). > > I managed to fix it by using the '$db->absolute(1)'. However, when > performing a '$segment->refseq($segment);' (to reset the positions) the > absolute option is ignored and returns to "Flybase style" positions. 
> > Here's the code I'm using > > 'use Bio::DB::GFF; > use Bio::Tools::GFF; > > # Open the sequence database > my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql', > -dsn => 'dbi:mysql:dmel_gff:123.456.78.90', > -user => 'user', > -pass => 'pass') || die("database open > failed"); > > $db->absolute(1); > my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000); > $segment->refseq($segment); > my @features = $segment->features(-types => ['gene', 'exon', 'intron', > 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features"; > print(scalar(@features)."\n"); > > my $gffio = Bio::Tools::GFF->new(-file => '>'.'gff.out.gff', > -gff_version => 3); > $gffio->write_feature(@features); > > exit(1);' > > > Does anyone knows what can be going wrong? > > Thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Schragi at gmail.com Mon Sep 18 05:26:22 2006 From: Schragi at gmail.com (Schragi) Date: Mon, 18 Sep 2006 02:26:22 -0700 (PDT) Subject: [Bioperl-l] Downloading multiple contigs using bioperl Message-ID: <6360351.post@talk.nabble.com> Hello, I think this might be a simple question - but I'm yet a novice... Is there any way I can download, automatically and at once, all contigs of a given genome in Genebank, and ideally merge them all into one file? Or do I have to download every contig separately in order to receive the full genome? In the latter case, is there some sort of list that provides the identifiers of all contigs of the genome I'm interested in? 
Thank you very much, Schragi -- View this message in context: http://www.nabble.com/Downloading-multiple-contigs-using-bioperl-tf2290014.html#a6360351 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at uiuc.edu Mon Sep 18 12:13:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Sep 2006 11:13:37 -0500 Subject: [Bioperl-l] Downloading multiple contigs using bioperl In-Reply-To: <6360351.post@talk.nabble.com> Message-ID: <000001c6db3d$65f0b680$15327e82@pyrimidine> > Hello, > I think this might be a simple question - but I'm yet a novice... > > Is there any way I can download, automatically and at once, all contigs of > a > given genome in Genebank, and ideally merge them all into one file? Or do > I > have to download every contig separately in order to receive the full > genome? > > In the latter case, is there some sort of list that provides the > identifiers > of all contigs of the genome I'm interested in? > > Thank you very much, > Schragi It depends on the type of sequence record. WGS files contain WGS line annotation which gives a range of sequence records that can be retrieved: LOCUS AAFC03000000 131728 rc DNA linear MAM 28-AUG-2006 DEFINITION Bos taurus whole genome shotgun sequencing project. ACCESSION AAFC00000000 VERSION AAFC00000000.3 GI:112180191 KEYWORDS WGS. .... FEATURES Location/Qualifiers source 1..131728 /organism="Bos taurus" /mol_type="genomic DNA" /isolate="L1 Dominette 01449" /db_xref="taxon:9913" /sex="female" /note="breed: Hereford" WGS AAFC03000001-AAFC03131728 WGS_SCAFLD CM000177-CM000206 WGS_SCAFLD CH974204-CH980624 // The WGS line is the range of single sequences and the scaffolds represent different scaffold or supercontig builds. The contig files contain the list of subsequences for the build (which can be pretty complex), but these aren't necessary if you want the sequence itself. 
That can be retrieved directly from GenBank using Bio::DB::GenBank with the default settings; if you use the web Entrez interface you can get the full sequences by selecting the format 'GenBank(full)'. Depending on what you are after, you may be better off downloading the sequences via ftp, though. Some of these files are very large (~100 MB or more). Retrieval via Bio::DB::GenBank converts everything into BioPerl objects before saving, so these files may take a long time if they work at all. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Mon Sep 18 15:39:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 18 Sep 2006 15:39:05 -0400 Subject: [Bioperl-l] Bio::Location::Split question In-Reply-To: <000901c6db33$7aa226e0$15327e82@pyrimidine> References: <000901c6db33$7aa226e0$15327e82@pyrimidine> Message-ID: <7D987AA3-1C2B-4454-81BA-C0F8DB87EA6E@gmx.net> I'm not sure what you're suggesting. Are you suggesting that the examples are not identical in resulting DNA sequence (because in my book they are, because the order of segments is reversed in the second example). Or are you suggesting that there is a bug in how BioPerl resolves split locations? Or both? -hilmar On Sep 18, 2006, at 11:02 AM, Chris Fields wrote: > This is a general question about how locations are described for split > locations, so whoever has an opinion, please chip in. This is > particularly > pertinent to GenBank/EMBL/swiss formats. Okay, stick with me here... > > A pretty interesting question was raised while I was working on a > bug (bug > 1953), which deals with split location data with the following > formats: > > join(complement(1..100),complement(201..300),complement(401-500)) > > complement(join(1..100,201..300,401..500)) > > GenBank acc #AL137247 has examples of both, if you want a real > example. 
> > According to BioPerl these are syntactically the same (look at the > last few > tests in LocationFactory.t). However, according to GenBank (and the > rationale outlined in bug 1953), these are actually quite different. > > Acc. to the GenBank/EMBL/DDBJ feature table definition, the use of the > operator 'join' entails that the segments in the following > parentheses are > joined in the order presented ('placed end-to-end'), whereas the > use of > 'complement' uses the complementary strand of the segment in > parentheses. > So, the operator tells one how to treat the sequence data using the > locations shown. > > Here are examples from the definition: > > ... > > complement(join(2691..4571,4918..5163)) > Joins regions 2691 to 4571 and 4918 to > 5163, then > complements the joined segments (the > feature is on > the > strand complementary to the presented > strand) > > join(complement(4918..5163),complement(2691..4571)) > Complements regions 4918 to 5163 and 2691 to 4571, > then > joins the complemented segments (the > feature is on > the > strand complementary to the presented > strand) > ... > > Using this rational, substituting in letters for clarity and lower > case to > indicate the complement strand: > > Location #1 : join(complement(A..B),complement(C..D),complement(E..F)) > > would be: > > join(b..a,d..c,f..e) > > and the following: > > Location # 2: complement(join(A..B,C..D,E..F) > > would be: > > join(f..e,d..c,b..a) > > The current behavior of Bio::Location::Split propogates the strand > information (flips) to the sublocations w/o resorting them. We > could sort > them, but wouldn't it be much simpler to not propogate strand > changes at > all? Seems we're making it more complicated than it actually is. > > Thoughts? > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From skirov at utk.edu Mon Sep 18 13:39:34 2006 From: skirov at utk.edu (skirov) Date: Mon, 18 Sep 2006 13:39:34 -0400 Subject: [Bioperl-l] failed install Bioperl on linux Message-ID: <4527BD97@webmail.utk.edu> >===== Original Message From Chris Fields ===== >> Hello, manager, > >Why do I feel like I work at MacDonald's? > >> I install bioperl from internet by using CPAN or download >> Bioperl-run-1.5.1.tar.gz into my local pc, however, all these methods >> falied. > >Bioperl == Core package > >http://www.bioperl.org/wiki/Core_package > >Bioperl != Run package (aka bioperl-run) > >http://www.bioperl.org/wiki/Run_package > >Very different packages! > >> It seems that the seem error occured. How can I deal with it? >> >> (1) Local machine, I use 'root' to install >> RH9 make: Nothing to be done for `Makefile.PL'. >> You have run make Makefile.PL which is wrong. Do (as the docs state) perl Makefile.PL make install as easy as that. Stefan >> (2) Network >> Checking if your kit is complete... >> Looks good >> Writing Makefile for Bio >> make: *** No rule to make target `perl1'. Stop. >> /usr/bin/make perl1 -- NOT OK >> Running make test >> Can't test without successful make >> Running make install >> make had returned bad status, install seems impossible >> >> Thanks a lot, >> Zhuocheng > >Sendu covers this one. > >Good luck! > >Christopher Fields >Postdoctoral Researcher - Switzer Lab >Dept. 
of Biochemistry >University of Illinois Urbana-Champaign > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Sep 18 17:55:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Sep 2006 16:55:38 -0500 Subject: [Bioperl-l] Bio::Location::Split question In-Reply-To: <7D987AA3-1C2B-4454-81BA-C0F8DB87EA6E@gmx.net> Message-ID: <000301c6db6d$30adc2d0$15327e82@pyrimidine> > I'm not sure what you're suggesting. > > Are you suggesting that the examples are not identical in resulting > DNA sequence (because in my book they are, because the order of > segments is reversed in the second example). > > Or are you suggesting that there is a bug in how BioPerl resolves > split locations? > > Or both? > > -hilmar This is from the GenBank/EMBL/DDBJ feature table definition for Locations: --------------------------------------- complement(join(2691..4571,4918..5163)) Joins regions 2691 to 4571 and 4918 to 5163, then complements the joined segments (the feature is on the strand complementary to the presented strand) join(complement(4918..5163),complement(2691..4571)) Complements regions 4918 to 5163 and 2691 to 4571, then joins the complemented segments (the feature is on the strand complementary to the presented strand) --------------------------------------- These two are the same only if the order of the locations is reversed, otherwise they aren't the same: complement(join(A..B,C..D)) join(complement(C..D),complement(A..B)) Both get the order, [dcba]. Normally only the first form is seen, and just about every time I see the second the location order is reversed, so they technically are the same. No problem there. 
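
The equivalence described above can be checked on a toy sequence. This is a minimal sketch in plain Perl that deliberately avoids Bio::Location::Split and FTLocationFactory, so it says nothing about what BioPerl itself returns. The helpers revcomp and segment, the 20-base sequence, and the coordinates standing in for A..B and C..D are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Extract a 1-based, inclusive range, as in a feature-table location.
sub segment {
    my ($dna, $start, $end) = @_;
    return substr($dna, $start - 1, $end - $start + 1);
}

my $dna = 'AAACCCGGGTTTACGTACGT';   # toy sequence, not real data
my @ab  = (1, 6);                   # stands in for A..B
my @cd  = (10, 15);                 # stands in for C..D

# complement(join(A..B,C..D)): join in the order given, then complement.
my $form1 = revcomp(segment($dna, @ab) . segment($dna, @cd));

# join(complement(C..D),complement(A..B)): complement each, order reversed.
my $form2 = revcomp(segment($dna, @cd)) . revcomp(segment($dna, @ab));

# join(complement(A..B),complement(C..D)): complement each, order kept.
my $form3 = revcomp(segment($dna, @ab)) . revcomp(segment($dna, @cd));

print "form1 eq form2: ", ($form1 eq $form2 ? "yes" : "no"), "\n";
print "form1 eq form3: ", ($form1 eq $form3 ? "yes" : "no"), "\n";
```

The first two forms give the same string; the third, with the sublocation order left unreversed, does not. That is the distinction at issue when a parser flips strand without reordering sublocations.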
However, if I take the two examples above, run them through FTLocationFactory, then use to_FTstring() to get the feature string, this is what I get: complement(join(2691..4571,4918..5163)) complement(join(4918..5163,2691..4571)) Which one is correct? From the above definition, I thought using 'join' implies that the order is important for joining the locations (at least according to the feature table definition above), starting from left to right irrespective of the location order on the sequence. Hence we have the two different variations. For Bioperl, do we always assume that the order of the locations in a join is in sequence order or in the order they appear in the original location string? And how do remote locations fit in here? It seems a simple reversal of the sublocation order should fix the above to be in the correct order for the join if we want to stick with one form. Even stranger, if they are remote locations they act differently (actually, they act somewhat correctly). If I add a faux remote location to the original strings above, this is what I get from the location object's to_FTstring(): complement(join(2691..4571,ABC1234.5:4918..5163)) join(complement(ABC1234.5:4918..5163),complement(2691..4571)) That way lies madness.... Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Mon Sep 18 18:26:51 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 18 Sep 2006 18:26:51 -0400 Subject: [Bioperl-l] Bio::Location::Split question In-Reply-To: <000301c6db6d$30adc2d0$15327e82@pyrimidine> References: <000301c6db6d$30adc2d0$15327e82@pyrimidine> Message-ID: <30D63782-E5EC-494C-A42E-3D1AC29043D8@gmx.net> On Sep 18, 2006, at 5:55 PM, Chris Fields wrote: > However, if I take the two examples above, run them through > FTLocationFactory, then use to_FTstring() to get the feature > string, this is > what I get: > > complement(join(2691..4571,4918..5163)) > > complement(join(4918..5163,2691..4571)) So this looks like a bug, right? The correct result would be if both yielded the same strings, or syntactically equivalent strings. The two above are neither identical nor syntactically equivalent. Another test is if you set a feature location from either string and then request the sub-sequence, the resulting sequence should be identical given syntactically equivalent location specifications. Do you want to file (and possibly address?) this? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From Rafi.Ahmad at fagmed.uit.no Mon Sep 18 10:55:59 2006 From: Rafi.Ahmad at fagmed.uit.no (Rafi Ahmad) Date: Mon, 18 Sep 2006 16:55:59 +0200 Subject: [Bioperl-l] generate ptt file from Genbank file In-Reply-To: <450DEA20.3000103@infotech.monash.edu.au> Message-ID: Hi Torsten, I tried to use your script but it just remains at the new line after giving the gbk input file. Is there anything I need to do more with the script. Rafi -----Original Message----- From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] Sent: 18. 
september 2006 02:37
To: Rafi Ahmad
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] generate ptt file from Genbank file

Rafi,

> I am trying to generate a .ptt file like the NCBI ptt file, which basically contains the gene coordinate information, its strand, name. I have a GenBank file from which I want to generate this ptt file.
> Is there any BioPerl module which can do the same, or any sample script which I could modify and use?
> Thanks in advance for your reply.

I don't think there is any BioPerl script to do it. And Bio::FeatureIO doesn't support PTT - I will try and add it soon. Until then, below is a sample script to work with!

Hope it helps,

--Torsten

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# This script takes a GenBank file as input, and produces a
# NCBI PTT file (protein table) as output. A PTT file is
# a line based, tab separated format with fixed column types.
#
# Written by Torsten Seemann
# 18 September 2006

# Read the GenBank record from STDIN and keep only its CDS features.
my $gbk = Bio::SeqIO->new(-fh=>\*STDIN, -format=>'genbank');
my $seq = $gbk->next_seq;
my @cds = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;

# Header: description and extent, protein count, column names.
print $seq->description, " - 0..",$seq->length,"\n";
print scalar(@cds)," proteins\n";
print join("\t", qw(Location Strand Length PID Gene Synonym Code COG Product)),"\n";

for my $f (@cds) {
  my $gi = '-';
  $gi = $1 if tag($f, 'db_xref') =~ m/\bGI:(\d+)\b/;
  my $cog = '-';
  $cog = $1 if tag($f, 'product') =~ m/^(COG\S+)/;
  my @col = (
    $f->start.'..'.$f->end,
    $f->strand >= 0 ? '+' : '-',
    ($f->length/3)-1,          # length in residues, excluding the stop codon
    $gi,
    tag($f, 'gene'),
    tag($f, 'locus_tag'),
    '-',                       # Code: placeholder so the row matches the 9-column header
    $cog,
    tag($f, 'product'),
  );
  print join("\t", @col), "\n";
}

# Return the space-joined values of a tag, or '-' if the feature lacks it.
sub tag {
  my($f, $tag) = @_;
  return '-' unless $f->has_tag($tag);
  return join(' ', $f->get_tag_values($tag));
}

From torsten.seemann at infotech.monash.edu.au Mon Sep 18 17:15:02 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 19 Sep 2006 07:15:02 +1000
Subject: [Bioperl-l] generate ptt file from Genbank file
In-Reply-To:
References:
Message-ID: <450F0C56.6050907@infotech.monash.edu.au>

Rafi,

>> my $gbk = Bio::SeqIO->new(-fh=>\*STDIN, -format=>'genbank');

> I tried to use your script but it just remains at the new line after
> giving the gbk input file.
> Is there anything I need to do more with the script.

It reads its input from STDIN. Try running it as:

genbank_to_ptt.pl < infile.gbk > outfile.ptt

Or, alternatively, you could modify the script to use ARGV rather than STDIN.

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From cdavis at bcm.tmc.edu Mon Sep 18 18:15:06 2006
From: cdavis at bcm.tmc.edu (Caleb Davis)
Date: Mon, 18 Sep 2006 17:15:06 -0500
Subject: [Bioperl-l] trouble with pairwise_kaks.PLS on Cygwin/XP platform
Message-ID: <450F1A6A.2050201@bcm.tmc.edu>

Hiyo,

I just switched from cygwin to solaris (perl 5.6.1, bioperl 1.5) and the pairwise_kaks.PLS script is falling a bit short for me there too. I don't have the path issues anymore, but I am still getting the same 'Use of uninitialized value in pattern match (m//) at /users/cdavis/perlmodules/Bio/Tools/Phylo/PAML.pm line 711' error. I checked Bio::Tools::Phylo::PAML.pm, and the offending line is 'return unless (/^Nei\s*\&\s*Gojobori/);'. It's not within a loop, so maybe there's no $_ variable to match against?

I think most of the script is working fine.
If I check the temp directories I see all the files created by codeml, but the code fails when it tries to parse the mlc file. Any help would be much appreciated. Thanks, --Caleb Below are my command line, screen output, and mlc file: ------------command line, screen output------------------- %perl pairwise_kaks.PLS -i worm_fam_2785.cdna CLUSTAL W (1.8) Multiple Sequence Alignments Sequence format is Pearson Sequence 1: CBG10100 363 aa Sequence 2: F22B7.13 525 aa Sequence 3: C38C10.4 525 aa Start of Pairwise alignments Aligning... Sequences (1:2) Aligned. Score: 9 Sequences (1:3) Aligned. Score: 8 Sequences (2:3) Aligned. Score: 96 Guide tree file created: [/var/tmp/RPJTug0JLw/8t0wv2O5ht.dnd] Start of Multiple Alignment There are 2 groups Aligning... Group 1: Sequences: 2 Score:11215 Group 2: Delayed Sequence:1 Score:2781 Alignment Score 3113 GCG-Alignment file created [/var/tmp/RPJTug0JLw/xgCEKU3GhA] Use of uninitialized value in pattern match (m//) at /users/cdavis/perlmodules/Bio/Tools/Phylo/PAML.pm line 711, line 105. Can't call method "get_MLmatrix" on an undefined value at pairwise_kaks.PLS line 177, line 105. 
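A minimal sketch of one way the first warning above can arise, assuming (as suggested earlier in this message) that $_ is undefined when the pattern match runs. The sub looks_like_ng_header() is a hypothetical stand-in and is not the code in Bio::Tools::Phylo::PAML; it only mimics reading one line and testing it, with a defined() guard that avoids the warning when the input is exhausted.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser step: read one line, then test it.
# NOT the actual PAML.pm code; it only reproduces the pattern of the match.
sub looks_like_ng_header {
    my ($fh) = @_;
    local $_ = <$fh>;             # undef once the handle is at end of file
    return 0 unless defined $_;   # guard: without it, the match below warns
    return /^Nei\s*\&\s*Gojobori/ ? 1 : 0;
}

# In-memory file holding a single header line taken from the mlc output.
my $text = "Nei & Gojobori 1986. dN/dS (dN, dS)\n";
open my $fh, '<', \$text or die "open: $!";

print looks_like_ng_header($fh), "\n";   # prints 1: header line is present
print looks_like_ng_header($fh), "\n";   # prints 0: nothing left to read
```

Whether this is what happens at line 711 cannot be told from the message alone. The second error is consistent with the parser handing back no result object, so checking that the result is defined before calling get_MLmatrix on it would at least turn the crash into a clearer message.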
---------------------mlc output file:---------------------------- CODONML (in paml 3.15, November 2005) /var/tmp/twy9U1wDNm/5Z2veeuziK Model: One dN/dS ratio Codon frequencies: F3x4 ns = 3 ls = 363 Codon usage in sequences -------------------------------------------------------------------------- Phe TTT 4 4 7 | Ser TCT 3 4 4 | Tyr TAT 4 4 2 | Cys TGT 0 1 2 TTC 14 14 7 | TCC 2 4 4 | TAC 8 7 5 | TGC 3 2 1 Leu TTA 1 0 2 | TCA 4 3 5 | *** TAA 0 0 0 | *** TGA 0 0 0 TTG 10 10 6 | TCG 9 9 5 | TAG 0 0 0 | Trp TGG 1 1 2 -------------------------------------------------------------------------- Leu CTT 4 3 2 | Pro CCT 3 2 4 | His CAT 2 1 5 | Arg CGT 4 4 3 CTC 6 7 5 | CCC 2 1 5 | CAC 2 4 2 | CGC 1 1 2 CTA 3 2 4 | CCA 2 2 6 | Gln CAA 7 6 12 | CGA 3 3 14 CTG 7 8 6 | CCG 2 2 3 | CAG 6 7 4 | CGG 1 1 3 -------------------------------------------------------------------------- Ile ATT 8 10 10 | Thr ACT 3 3 9 | Asn AAT 11 11 10 | Ser AGT 4 4 8 ATC 12 12 2 | ACC 4 4 4 | AAC 8 7 5 | AGC 5 6 4 ATA 0 1 1 | ACA 7 7 11 | Lys AAA 17 19 26 | Arg AGA 9 8 12 Met ATG 17 16 10 | ACG 5 5 2 | AAG 17 17 9 | AGG 3 3 3 -------------------------------------------------------------------------- Val GTT 1 1 4 | Ala GCT 9 9 6 | Asp GAT 21 20 16 | Gly GGT 2 2 5 GTC 4 4 4 | GCC 8 7 6 | GAC 5 5 4 | GGC 2 2 4 GTA 2 2 4 | GCA 7 7 8 | Glu GAA 12 13 10 | GGA 12 11 8 GTG 9 9 8 | GCG 3 3 3 | GAG 17 17 14 | GGG 1 1 1 -------------------------------------------------------------------------- Codon position x base (3x4) table for each sequence. 
#1: F22B7.13 position 1: T:0.17355 C:0.15152 A:0.35813 G:0.31680 position 2: T:0.28099 C:0.20110 A:0.37741 G:0.14050 position 3: T:0.22865 C:0.23691 A:0.23691 G:0.29752 #2: C38C10.4 position 1: T:0.17355 C:0.14876 A:0.36639 G:0.31129 position 2: T:0.28375 C:0.19835 A:0.38017 G:0.13774 position 3: T:0.22865 C:0.23967 A:0.23140 G:0.30028 #3: CBG10100 position 1: T:0.14325 C:0.22039 A:0.34711 G:0.28926 position 2: T:0.22590 C:0.23416 A:0.34160 G:0.19835 position 3: T:0.26722 C:0.17631 A:0.33884 G:0.21763 Sums of codon usage counts ------------------------------------------------------------------------------ Phe F TTT 15 | Ser S TCT 11 | Tyr Y TAT 10 | Cys C TGT 3 TTC 35 | TCC 10 | TAC 20 | TGC 6 Leu L TTA 3 | TCA 12 | *** * TAA 0 | *** * TGA 0 TTG 26 | TCG 23 | TAG 0 | Trp W TGG 4 ------------------------------------------------------------------------------ Leu L CTT 9 | Pro P CCT 9 | His H CAT 8 | Arg R CGT 11 CTC 18 | CCC 8 | CAC 8 | CGC 4 CTA 9 | CCA 10 | Gln Q CAA 25 | CGA 20 CTG 21 | CCG 7 | CAG 17 | CGG 5 ------------------------------------------------------------------------------ Ile I ATT 28 | Thr T ACT 15 | Asn N AAT 32 | Ser S AGT 16 ATC 26 | ACC 12 | AAC 20 | AGC 15 ATA 2 | ACA 25 | Lys K AAA 62 | Arg R AGA 29 Met M ATG 43 | ACG 12 | AAG 43 | AGG 9 ------------------------------------------------------------------------------ Val V GTT 6 | Ala A GCT 24 | Asp D GAT 57 | Gly G GGT 9 GTC 12 | GCC 21 | GAC 14 | GGC 8 GTA 8 | GCA 22 | Glu E GAA 35 | GGA 31 GTG 26 | GCG 9 | GAG 48 | GGG 3 ------------------------------------------------------------------------------ Codon position x base (3x4) table, overall position 1: T:0.16345 C:0.17355 A:0.35721 G:0.30579 position 2: T:0.26354 C:0.21120 A:0.36639 G:0.15886 position 3: T:0.24151 C:0.21763 A:0.26905 G:0.27181 Nei & Gojobori 1986. dN/dS (dN, dS) (Note: This matrix is not used in later m.l. analysis. Use runmode = -2 for ML pairwise comparison.) 
F22B7.13
C38C10.4 0.3605 (0.0142 0.0393)
CBG10100 -1.0000 (1.2263 -1.0000)-1.0000 (1.2584 -1.0000)

pairwise comparison, codon frequencies: F3x4.

2 (C38C10.4) ... 1 (F22B7.13)
lnL =-1564.430860
0.05959 6.84008 0.57546

t= 0.0596 S= 329.1 N= 759.9 dN/dS= 0.5755 dN= 0.0162 dS= 0.0282

-------------- next part --------------
An embedded message was scrubbed...
From: Caleb Davis
Subject: Re: [Bioperl-l] trouble with pairwise_kaks.PLS on Cygwin/XP platform
Date: Fri, 01 Sep 2006 11:48:29 -0500
Size: 1293
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060918/9515f6c6/attachment-0001.mht

From torsten.seemann at infotech.monash.edu.au Mon Sep 18 20:09:27 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 19 Sep 2006 10:09:27 +1000
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
Message-ID: <450F3537.5060605@infotech.monash.edu.au>

Developers,

Given the pending RC1 release, I decided to do a quick audit of bioperl-live, see below. Hopefully not much POD text got through. Hope it is useful anyway.

--Torsten

"return undef;" => "return;" # return undef intentional?
Bio/DB/Biblio/pdf.pm: return undef; Bio/DB/Biblio/pdf.pm: return undef unless $link; Bio/DB/Biblio/pdf.pm: return undef; Bio/DB/Biblio/eutils.pm: return undef; Bio/DB/WebDBSeqI.pm: return undef if ( !defined $self->ua || !defin Bio/Tools/Run/RemoteBlast.pm: return undef if ( !defined $self->ua Bio/FeatureIO/gff.pm: return undef if $self->fasta_mode(); Bio/FeatureIO/gff.pm: # be graceful about empty lines or comments, an Bio/FeatureIO/gff.pm:will return undef if not all features in the stre Bio/Root/IOManager.pm: return undef unless -e $file; Bio/Root/Object.pm: return undef unless defined $self->{'_err'}; "die" => "$self->throw" # use Bio::Perl exception handling Bio/Variation/IO.pm: $format2 = shift || die "Usage: reformat forma Bio/DB/SeqFeature/Store/DBI/mysql.pm: $db->store($feature) or die "Co Bio/DB/SeqFeature/Store/berkeleydb.pm: $db->store($feature) or die Bio/DB/SeqFeature/Store.pm: $db->store($feature) or die "Couldn't sto Bio/Graphics/Glyph.pm: my $feature = $arg{-feature} or die "No featur Bio/Graphics/Glyph/image.pm: open F,$path or die "Can't open $path: Bio/Graphics/Panel.pm: open (F,">$imagefile") || die("Can't open imag Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined( $description ) ) Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! 
defined( $mutation ) ) { Bio/LiveSeq/Chain.pm: die "_praepostinsert_array: Something went ve Bio/Tools/isPcr.pm: my $seq = $seqio->next_seq || die("cannot get a Bio/Tools/Analysis/DNA/ESEfinder.pm: die "Could not get a result" Bio/Tools/Analysis/Protein/NetPhos.pm: die "Could not get a result" u Bio/Tools/Analysis/Protein/Mitoprot.pm: die "Could not get a result" Bio/Tools/Analysis/Protein/Scansite.pm: die "Could not get a result" Bio/Tools/dpAlign.pm: die("\nThe C-compiled engine for Smith Wa Bio/Tools/ipcress.pm: my $seq = $seqio->next_seq || die("cannot get Bio/Tools/EPCR.pm: my $seq = $seqio->next_seq || die("cannot get a Bio/Tools/HMM.pm: die("\nThe C-compiled engine for Hidden Marko Bio/Seq/PrimedSeq.pm: my $file = shift || die "need a file to rea Bio/Seq/PrimedSeq.pm: my $file = shift || die "$0 "; "FIXME" Bio/AlignIO/po.pm Bio/DB/Expression/geo.pm Bio/FeatureIO/gff.pm Bio/Ontology/RelationshipType.pm Bio/SeqIO/kegg.pm Bio/SeqIO/swiss.pm Bio/Tools/ESTScan.pm Bio/Tools/Est2Genome.pm Bio/Tools/GuessSeqFormat.pm Bio/Tools/HMMER/Results.pm "???" Bio/Variation/VariantI.pm: $self->allele_mut($value); #???? Bio/DB/Biblio/soap.pm:use Bio::Biblio; # TBD: ?? WHY SHOULD I DO THIS Bio/DB/GFF/Adaptor/dbi/oracleace.pm: # then generate a bogus Homolo Bio/DB/GFF/Adaptor/dbi/mysqlace.pm: # then generate a bogus Homolog Bio/SeqFeature/Annotated.pm: Args : ??? Bio/SeqFeature/Tools/Unflattener.pm: # what should w Bio/SeqFeature/Tools/IDHandler.pm: # warn?? Bio/Root/IO.pm: $ROOTDIR = ""; # what is reasonable?? Bio/LiveSeq/Chain.pm:# *??* create hash2dchain ???? (with hashkeys use Bio/LiveSeq/Chain.pm:# **????** how about using array of arrays instea Bio/LiveSeq/Chain.pm:# in verbose $string assignment around line 721 ? "TODO" Bio/Root/Root.pm: # TODO: Fix the MSG: line of the re-thrown err Bio/Root/Storable.pm: # TODO: add cleanup and unlink methods. 
For n Bio/Seq/EncodedSeq.pm: #TODO: finish all this Bio/SeqFeature/Tools/Unflattener.pm: # TODO - we ignore this Bio/SeqFeature/Tools/Unflattener.pm: # TODO - allow more Bio/SeqFeature/Tools/Unflattener.pm: ## features. TODO: P Bio/SeqIO/game/gameWriter.pm:#TODO: can't sequences also have database Bio/SeqIO/chaos.pm: # TODO Bio/SeqIO/pir.pm: # TODO - not processing SFS data Bio/SeqIO/strider.pm: # TODO: determine 'DNA Degenerate Bio/Search/BlastUtils.pm: # TODO: Account for strand/frame issue! "***" Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Clone->new Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::FPCMarker-> Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Contig->new Bio/Map/PositionI.pm:#*** should this be overridden from RangeI? Bio/Matrix/PSM/SiteMatrix.pm: #*** IUPACp values not actually u Bio/Tree/TreeFunctionsI.pm: #*** the algorithm here hasn't really b Bio/SeqIO/agave.pm:***NOTE*** At the moment, not all of the tags are Bio/LiveSeq/Chain.pm:# **** performance concerns Bio/Map/LinkageMap.pm:#*** what is this? what calls it? note that it s Bio/Map/Marker.pm: *** does not actually add this marker to the Bio/Map/SimpleMap.pm: *** does not actually add the element to th Bio/Map/FPCMarker.pm: *** This has nothing to do with an actual Bio/Map/PositionI.pm:#*** should this be overridden from RangeI? "Why?" Bio/Search/Hit/PullHitI.pm: # why does this method even exist?! Bio/Search/Hit/PullHitI.pm: # why does this method even exist?! Bio/DB/Biblio/soap.pm:use Bio::Biblio; # TBD: ?? WHY SHOULD I DO THIS Bio/Graphics/Glyph/dot.pm: # The can() method fails with GD::SVG. Why Bio/SeqIO/bsml.pm: # Need to kill object for following code to work... Bio/SeqIO/tinyseq.pm: foreach my $subatt(@$seqatt) { # why are there t Bio/SeqIO/tinyseq.pm: # NCBI puts refseq ids in TSeq_sid, others in Bio/SeqIO/game/featHandler.pm: # Why is CDS coordinate info saved a Bio/SeqIO/swiss.pm: # Um, why would this be anything else but PRT? 
Bio/Taxonomy.pm: # taxonomy - why would you be doing things this Bio/Seq.pm: # I can't remember why not delegating was ever deemed Bio/Cluster/UniGene.pm: # why does NCBI prepend a 'g' to its own Bio/LiveSeq/IO/BioPerl.pm:# why array from each_tag_value($qual) ? Whe Bio/SearchIO/Writer/HSPTableWriter.pm:# Don't know why this Bio/SearchIO/Writer/ResultTableWriter.pm:# Don't know why t Bio/Tools/Alignment/Consed.pm:Why was this developed like this? I was Bio/Tools/Alignment/Consed.pm: # if there is a member array (why woul io/Map/Physical.pm: #*** why doesn't it call Bio::Map::Clone->new ? Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::FPCMarker-> Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Contig->new Bio/SeqFeature/Primer.pm: off from those of the idtdna web Bio/SeqFeature/Primer.pm: as primer3 does. Don't ask why, I never f Bio/Seq/PrimaryQual.pm: Returns : 1 for a valid sequence (WHY? Shouldn Bio/Seq/PrimaryQual.pm: Args : a scalar (any scalar, why PrimarySeq "hack" Bio/DB/Flat/BinarySearch.pm:# is an awful hack - in reality Michele's Bio/DB/WebDBSeqI.pm:# sorry, but this is hacked in because of BioFetch Bio/DB/SeqFeature/Store/GFF3Loader.pm: # TEMPORARY HACKS TO SIMPLIFY Bio/AlignIO/phylip.pm: #if you use a version of phylip (hacked) tha Bio/SearchIO/blast.pm: # bl2seq hackiness... Not sure I lik Bio/Tools/Sigcleave.pm:## a quick hack to make sure that we get the sc Bio/Tools/Blast/HTML.pm: # This is fine for yeast but not worm. 
This Bio/Tools/Geneid.pm: # then need to perform the hack of extract Bio/Root/Utilities.pm: # this is a quick hack to check for availabili Bio/Root/Err.pm:objects more eval/die-savvy (but the current strategy From osborne1 at optonline.net Mon Sep 18 20:25:11 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 18 Sep 2006 20:25:11 -0400 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <450F3537.5060605@infotech.monash.edu.au> Message-ID: Torsten, Forgot to say - nice to have you back! Brian O. On 9/18/06 8:09 PM, "Torsten Seemann" wrote: > Developers, > > Given the pending RC1 release, I decided to do a quick audit of > bioperl-live, see below. Hopefully no much POD text may got through. > Hope it is useful anyway. > > --Torsten > > > "return undef;" => "return;" # return undef intentional? > > Bio/DB/Biblio/pdf.pm: return undef; > Bio/DB/Biblio/pdf.pm: return undef unless $link; > Bio/DB/Biblio/pdf.pm: return undef; > Bio/DB/Biblio/eutils.pm: return undef; > Bio/DB/WebDBSeqI.pm: return undef if ( !defined $self->ua || !defin > Bio/Tools/Run/RemoteBlast.pm: return undef if ( !defined $self->ua > Bio/FeatureIO/gff.pm: return undef if $self->fasta_mode(); > Bio/FeatureIO/gff.pm: # be graceful about empty lines or comments, an > Bio/FeatureIO/gff.pm:will return undef if not all features in the stre > Bio/Root/IOManager.pm: return undef unless -e $file; > Bio/Root/Object.pm: return undef unless defined $self->{'_err'}; > > "die" => "$self->throw" # use Bio::Perl exception handling > > Bio/Variation/IO.pm: $format2 = shift || die "Usage: reformat forma > Bio/DB/SeqFeature/Store/DBI/mysql.pm: $db->store($feature) or die "Co > Bio/DB/SeqFeature/Store/berkeleydb.pm: $db->store($feature) or die > Bio/DB/SeqFeature/Store.pm: $db->store($feature) or die "Couldn't sto > Bio/Graphics/Glyph.pm: my $feature = $arg{-feature} or die "No featur > Bio/Graphics/Glyph/image.pm: open F,$path or die "Can't open $path: > 
Bio/Graphics/Panel.pm: open (F,">$imagefile") || die("Can't open imag > Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined( $description ) ) > Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined( $mutation ) ) { > Bio/LiveSeq/Chain.pm: die "_praepostinsert_array: Something went ve > Bio/Tools/isPcr.pm: my $seq = $seqio->next_seq || die("cannot get a > Bio/Tools/Analysis/DNA/ESEfinder.pm: die "Could not get a result" > Bio/Tools/Analysis/Protein/NetPhos.pm: die "Could not get a result" u > Bio/Tools/Analysis/Protein/Mitoprot.pm: die "Could not get a result" > Bio/Tools/Analysis/Protein/Scansite.pm: die "Could not get a result" > Bio/Tools/dpAlign.pm: die("\nThe C-compiled engine for Smith Wa > Bio/Tools/ipcress.pm: my $seq = $seqio->next_seq || die("cannot get > Bio/Tools/EPCR.pm: my $seq = $seqio->next_seq || die("cannot get a > Bio/Tools/HMM.pm: die("\nThe C-compiled engine for Hidden Marko > Bio/Seq/PrimedSeq.pm: my $file = shift || die "need a file to rea > Bio/Seq/PrimedSeq.pm: my $file = shift || die "$0 "; > > "FIXME" > > Bio/AlignIO/po.pm > Bio/DB/Expression/geo.pm > Bio/FeatureIO/gff.pm > Bio/Ontology/RelationshipType.pm > Bio/SeqIO/kegg.pm > Bio/SeqIO/swiss.pm > Bio/Tools/ESTScan.pm > Bio/Tools/Est2Genome.pm > Bio/Tools/GuessSeqFormat.pm > Bio/Tools/HMMER/Results.pm > > "???" > > Bio/Variation/VariantI.pm: $self->allele_mut($value); #???? > Bio/DB/Biblio/soap.pm:use Bio::Biblio; # TBD: ?? WHY SHOULD I DO THIS > Bio/DB/GFF/Adaptor/dbi/oracleace.pm: # then generate a bogus Homolo > Bio/DB/GFF/Adaptor/dbi/mysqlace.pm: # then generate a bogus Homolog > Bio/SeqFeature/Annotated.pm: Args : ??? > Bio/SeqFeature/Tools/Unflattener.pm: # what should w > Bio/SeqFeature/Tools/IDHandler.pm: # warn?? > Bio/Root/IO.pm: $ROOTDIR = ""; # what is reasonable?? > Bio/LiveSeq/Chain.pm:# *??* create hash2dchain ???? (with hashkeys use > Bio/LiveSeq/Chain.pm:# **????** how about using array of arrays instea > Bio/LiveSeq/Chain.pm:# in verbose $string assignment around line 721 ? 
> > "TODO" > > Bio/Root/Root.pm: # TODO: Fix the MSG: line of the re-thrown err > Bio/Root/Storable.pm: # TODO: add cleanup and unlink methods. For n > Bio/Seq/EncodedSeq.pm: #TODO: finish all this > Bio/SeqFeature/Tools/Unflattener.pm: # TODO - we ignore this > Bio/SeqFeature/Tools/Unflattener.pm: # TODO - allow more > Bio/SeqFeature/Tools/Unflattener.pm: ## features. TODO: P > Bio/SeqIO/game/gameWriter.pm:#TODO: can't sequences also have database > Bio/SeqIO/chaos.pm: # TODO > Bio/SeqIO/pir.pm: # TODO - not processing SFS data > Bio/SeqIO/strider.pm: # TODO: determine 'DNA Degenerate > Bio/Search/BlastUtils.pm: # TODO: Account for strand/frame issue! > > "***" > > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Clone->new > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::FPCMarker-> > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Contig->new > Bio/Map/PositionI.pm:#*** should this be overridden from RangeI? > Bio/Matrix/PSM/SiteMatrix.pm: #*** IUPACp values not actually u > Bio/Tree/TreeFunctionsI.pm: #*** the algorithm here hasn't really b > Bio/SeqIO/agave.pm:***NOTE*** At the moment, not all of the tags are > Bio/LiveSeq/Chain.pm:# **** performance concerns > Bio/Map/LinkageMap.pm:#*** what is this? what calls it? note that it s > Bio/Map/Marker.pm: *** does not actually add this marker to the > Bio/Map/SimpleMap.pm: *** does not actually add the element to th > Bio/Map/FPCMarker.pm: *** This has nothing to do with an actual > Bio/Map/PositionI.pm:#*** should this be overridden from RangeI? > > "Why?" > > Bio/Search/Hit/PullHitI.pm: # why does this method even exist?! > Bio/Search/Hit/PullHitI.pm: # why does this method even exist?! > Bio/DB/Biblio/soap.pm:use Bio::Biblio; # TBD: ?? WHY SHOULD I DO THIS > Bio/Graphics/Glyph/dot.pm: # The can() method fails with GD::SVG. Why > Bio/SeqIO/bsml.pm: # Need to kill object for following code to work... 
> Bio/SeqIO/tinyseq.pm: foreach my $subatt(@$seqatt) { # why are there t > Bio/SeqIO/tinyseq.pm: # NCBI puts refseq ids in TSeq_sid, others in > Bio/SeqIO/game/featHandler.pm: # Why is CDS coordinate info saved a > Bio/SeqIO/swiss.pm: # Um, why would this be anything else but PRT? > Bio/Taxonomy.pm: # taxonomy - why would you be doing things this > Bio/Seq.pm: # I can't remember why not delegating was ever deemed > Bio/Cluster/UniGene.pm: # why does NCBI prepend a 'g' to its own > Bio/LiveSeq/IO/BioPerl.pm:# why array from each_tag_value($qual) ? Whe > Bio/SearchIO/Writer/HSPTableWriter.pm:# Don't know why this > Bio/SearchIO/Writer/ResultTableWriter.pm:# Don't know why t > Bio/Tools/Alignment/Consed.pm:Why was this developed like this? I was > Bio/Tools/Alignment/Consed.pm: # if there is a member array (why woul > io/Map/Physical.pm: #*** why doesn't it call Bio::Map::Clone->new ? > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::FPCMarker-> > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Contig->new > Bio/SeqFeature/Primer.pm: off from those of the idtdna web > Bio/SeqFeature/Primer.pm: as primer3 does. Don't ask why, I never f > Bio/Seq/PrimaryQual.pm: Returns : 1 for a valid sequence (WHY? Shouldn > Bio/Seq/PrimaryQual.pm: Args : a scalar (any scalar, why PrimarySeq > > "hack" > > Bio/DB/Flat/BinarySearch.pm:# is an awful hack - in reality Michele's > Bio/DB/WebDBSeqI.pm:# sorry, but this is hacked in because of BioFetch > Bio/DB/SeqFeature/Store/GFF3Loader.pm: # TEMPORARY HACKS TO SIMPLIFY > Bio/AlignIO/phylip.pm: #if you use a version of phylip (hacked) tha > Bio/SearchIO/blast.pm: # bl2seq hackiness... Not sure I lik > Bio/Tools/Sigcleave.pm:## a quick hack to make sure that we get the sc > Bio/Tools/Blast/HTML.pm: # This is fine for yeast but not worm. 
This > Bio/Tools/Geneid.pm: # then need to perform the hack of extract > Bio/Root/Utilities.pm: # this is a quick hack to check for availabili > Bio/Root/Err.pm:objects more eval/die-savvy (but the current strategy > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Sep 18 21:04:08 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Sep 2006 20:04:08 -0500 Subject: [Bioperl-l] Bio::Location::Split question In-Reply-To: <30D63782-E5EC-494C-A42E-3D1AC29043D8@gmx.net> References: <000301c6db6d$30adc2d0$15327e82@pyrimidine> <30D63782-E5EC-494C-A42E-3D1AC29043D8@gmx.net> Message-ID: <67F8D081-4313-494F-A43F-7F5FB9207CCE@uiuc.edu> I'll take a look at it. There's a related bug report already in Bugzilla that's related to this but I'll add a new one that's more directly related to the issue here. I'll give your test a try to see if it makes a difference. Chris On Sep 18, 2006, at 5:26 PM, Hilmar Lapp wrote: > On Sep 18, 2006, at 5:55 PM, Chris Fields wrote: > >> However, if I take the two examples above, run them through >> FTLocationFactory, then use to_FTstring() to get the feature >> string, this is >> what I get: >> >> complement(join(2691..4571,4918..5163)) >> >> complement(join(4918..5163,2691..4571)) > > So this looks like a bug, right? The correct result would be if > both yielded the same strings, or syntactically equivalent strings. > The two above are neither identical nor syntactically equivalent. > > Another test is if you set a feature location from either string > and then request the sub-sequence, the resulting sequence should be > identical given syntactically equivalent location specifications. > > Do you want to file (and possibly address?) this? 
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bernd.web at gmail.com Tue Sep 19 04:12:02 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 19 Sep 2006 10:12:02 +0200
Subject: [Bioperl-l] EMBOSS die/warn
Message-ID: <716af09c0609190112q564848aewc7b531c12b58241b@mail.gmail.com>

Hi,

X11 on OS X reads hardcoded paths. This caused the path to EMBOSS to be missing in a newly opened xterm. However, the EMBOSS factory does not signal the absence of EMBOSS when the path to EMBOSS is missing. When I realized this, I wanted to signal the absence of EMBOSS myself and stumbled on something surprising. A small example follows below:

------------
use Bio::Factory::EMBOSS;
print "1\n";
# comment out the line below to see the die string
my $fact = new Bio::Factory::EMBOSS;
print "2\n";
die "DIE STRING NOT PRINTED";
print "3\n";
--------------

Output:
1
2

So die does work, but nothing is printed. When you comment out the "my $fact" line, the die string is printed. When EMBOSS cannot be found, die and warn no longer print their strings. I have not yet found out why this happens.

So, the above happens when the path to the EMBOSS binaries is not in $PATH.

regards,
bernd

From oliver.burren at cimr.cam.ac.uk Tue Sep 19 04:34:36 2006
From: oliver.burren at cimr.cam.ac.uk (Oliver Burren)
Date: Tue, 19 Sep 2006 09:34:36 +0100
Subject: [Bioperl-l] Bio::Graphics::Glyph->parts differences between 1.636 and 1.654
In-Reply-To: <6dce9a0b0609151528o15935fa0xaa71b0df45dd9393@mail.gmail.com>
References: <1158310773.13842.35.camel@jakarta> <6dce9a0b0609151512h55b5078an8e5245611675a253@mail.gmail.com> <6dce9a0b0609151528o15935fa0xaa71b0df45dd9393@mail.gmail.com>
Message-ID: <1158654876.14304.7.camel@jakarta>

Thanks Lincoln.
I overrode maxdepth and set it to 1. I assume that this is more efficient than setting to undef ? perl test_parts.pl Top level feature contains 100 features Bio::Graphics::Panel API is 1.654 <---- Bio::Graphics::Glyph::testparts.pm can find 100 parts. So problem solved. I am intrigued though as you said there were two things, I was just wondering about the second thing ? Thanks for your help. Olly Burren On Fri, 2006-09-15 at 18:28 -0400, Lincoln Stein wrote: > Hi Oliver, > > Sorry the answer didn't occur to me earlier. There is a new maxdepth method > that returns the number of levels of descent that the glyph can draw. > Bio::Graphics::Glyph returns undef from this method, meaning that it can > draw an unlimited number of levels of subparts, but > Bio::Graphics::Glyph::generic returns 0, meaning that it only cares about > the top level feature. This is a major performance boost. > > For your glyph, you can do one of two things: > > 1) override maxdepth() so that it returns the number of levels to descend > into. > > > On 9/15/06, Lincoln Stein wrote: > > > > I will test this. The parts() API is not supposed to have changed. If you > > could send me your script, that would be very helpful. > > > > Best, > > > > Lincoln > > > > > > On 9/15/06, Oliver Burren wrote: > > > > > > Hi Bioperlers, > > > > > > I'm having some problems with the CVS version of Bio::Graphics::Glyph > > > especially the 'parts' method. I have written a script and module to > > > demonstrate behaviour which I am happy to supply on request. > > > > > > Script is called test_parts.pl. It creates a 100 random features and > > > adds them to a holding feature. A Bio::Graphics::Panel is created that > > > this is then passed to for rendering. 
> > > The glyph used to render is test_parts with the following 'draw' sub
> > >
> > > sub draw{
> > >   my $self=shift;
> > >   warn "Bio::Graphics::Panel API is ".Bio::Graphics::Panel::api_version()."\n";
> > >   warn "Bio::Graphics::Glyph::testparts.pm can find ".(($self->parts =~ /ARRAY/?@{$self->parts}:$self->parts)|'no')." parts\n";
> > > }
> > >
> > >
> > > #With 'old' version of Bio::Graphics.
> > >
> > > perl test_parts.pl
> > > Top level feature contains 100 features
> > > Bio::Graphics::Panel API is 1.636
> > > Bio::Graphics::Glyph::testparts.pm can find 100 parts
> > >
> > > #with 'new' (CVS co) version.
> > >
> > > perl test_parts.pl
> > > Top level feature contains 100 features
> > > Bio::Graphics::Panel API is 1.654
> > > Bio::Graphics::Glyph::testparts.pm can find 0 parts
> > >
> > > Looks as if I'm losing parts between the two APIs?
> > >
> > > I saw a thread on the gmod mailing list
> > >
> > > http://sourceforge.net/mailarchive/forum.php?forum_id=31947&max_rows=25&style=flat&viewmonth=200607&viewday=26
> > > which may be relevant but I wasn't able to find any follow up.
> > >
> > > Would someone be able to advise/document the changes that have occurred
> > > between the 2 API versions that might be relevant so that I can patch
> > > some of my custom glyphs so they are compatible.
> > >
> > > Many thanks,
> > >
> > >
> > > Olly Burren
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> >
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> >
> >
>

From bix at sendu.me.uk Tue Sep 19 05:13:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Sep 2006 10:13:43 +0100
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
In-Reply-To: <450F3537.5060605@infotech.monash.edu.au>
References: <450F3537.5060605@infotech.monash.edu.au>
Message-ID: <450FB4C7.1030301@sendu.me.uk>

Torsten Seemann wrote:
> Developers,
>
> Given the pending RC1 release, I decided to do a quick audit of
> bioperl-live, see below. Hopefully no much POD text may got through.
> Hope it is useful anyway.

Thanks. Does anyone have it in mind to do anything about these? I can
take care of the undefs and dies if no one else wants to.

From zchou at cau.edu.cn Tue Sep 19 05:40:07 2006
From: zchou at cau.edu.cn (zhuocheng Hou)
Date: Tue, 19 Sep 2006 17:40:07 +0800
Subject: [Bioperl-l] about PAML running within bioperl
Message-ID: <001901c6dbcf$9af4de50$0915020a@zchou>

Hello, everyone,

I use code in the PAML HOWTO (running PAML from within Bioperl) on my
Linux OS. And I set ENV as described by instructions. At the beginning,
it seems that ClustalW ran smoothly. However, when the programme ran to
call method "get_MLmatrix", something happened. The following
information was listed: (What is the reason, and how can I solve this
problem?)

........
Sequences (2:3) Aligned. Score: 87
Sequences (2:4) Aligned. Score: 88
Sequences (2:5) Aligned. Score: 87
Sequences (2:6) Aligned. Score: 87
Sequences (2:7) Aligned. Score: 87
Sequences (2:8) Aligned. Score: 87
Sequences (3:4) Aligned. Score: 93
Sequences (3:5) Aligned. Score: 93
Sequences (3:6) Aligned. Score: 93
Sequences (3:7) Aligned. Score: 92
Sequences (3:8) Aligned. Score: 92
Sequences (4:5) Aligned.
Score: 99
Sequences (4:6) Aligned. Score: 99
Sequences (4:7) Aligned. Score: 98
Sequences (4:8) Aligned. Score: 98
Sequences (5:6) Aligned. Score: 100
Sequences (5:7) Aligned. Score: 99
Sequences (5:8) Aligned. Score: 99
Sequences (6:7) Aligned. Score: 99
Sequences (6:8) Aligned. Score: 99
Sequences (7:8) Aligned. Score: 100
Guide tree file created: [/home/zchou/TMPDIR/8QEqLivAKY/JU833u8OTP.dnd]
Start of Multiple Alignment
There are 7 groups
Aligning...
Group 1: Sequences: 2 Score:5875
Group 2: Sequences: 2 Score:5877
Group 3: Sequences: 4 Score:5864
Group 4: Sequences: 5 Score:5537
Group 5: Sequences: 6 Score:5727
Group 6: Sequences: 7 Score:5608
Group 7: Sequences: 8 Score:5607
Alignment Score 43650
GCG-Alignment file created [/home/zchou/TMPDIR/8QEqLivAKY/CussPD56rZ]
aligned aa sequences were: Bio::SimpleAlign=HASH(0x87b93f4)
Can't call method "get_MLmatrix" on an undefined value at originalpaml.pl line 57, line 332.

Zhuocheng Hou
Department of Animal Genetics and Breeding
China Agricultural University

From cjfields at uiuc.edu Tue Sep 19 10:03:33 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Sep 2006 09:03:33 -0500
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
In-Reply-To: <450F3537.5060605@infotech.monash.edu.au>
Message-ID: <001001c6dbf4$65437260$15327e82@pyrimidine>

I added a page to the wiki for code improvements a few days ago. I can
add these so we can keep tabs on these and other code 'oddities' and
also make suggestions. The other issue I found, though not as prevalent,
was the use of $`, $&, $', and $+ for regex matches, which supposedly
create a performance hit (at least acc. to Jeff Friedl). I modified
Bio::Factory::FTLocationFactory to not use these, but I did find several
other modules using them, including Bio::Root::Storable:

http://www.bioperl.org/wiki/BioPerl_code_optimization

Thanks Torsten! Let us know if you find any more.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > Sent: Monday, September 18, 2006 7:09 PM > To: 'bioperl-l' > Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and > undef/die > > Developers, > > Given the pending RC1 release, I decided to do a quick audit of > bioperl-live, see below. Hopefully no much POD text may got through. > Hope it is useful anyway. > > --Torsten > > > "return undef;" => "return;" # return undef intentional? > > Bio/DB/Biblio/pdf.pm: return undef; > Bio/DB/Biblio/pdf.pm: return undef unless $link; > Bio/DB/Biblio/pdf.pm: return undef; > Bio/DB/Biblio/eutils.pm: return undef; > Bio/DB/WebDBSeqI.pm: return undef if ( !defined $self->ua || !defin > Bio/Tools/Run/RemoteBlast.pm: return undef if ( !defined $self->ua > Bio/FeatureIO/gff.pm: return undef if $self->fasta_mode(); > Bio/FeatureIO/gff.pm: # be graceful about empty lines or comments, an > Bio/FeatureIO/gff.pm:will return undef if not all features in the stre > Bio/Root/IOManager.pm: return undef unless -e $file; > Bio/Root/Object.pm: return undef unless defined $self->{'_err'}; > > "die" => "$self->throw" # use Bio::Perl exception handling > > Bio/Variation/IO.pm: $format2 = shift || die "Usage: reformat forma > Bio/DB/SeqFeature/Store/DBI/mysql.pm: $db->store($feature) or die "Co > Bio/DB/SeqFeature/Store/berkeleydb.pm: $db->store($feature) or die > Bio/DB/SeqFeature/Store.pm: $db->store($feature) or die "Couldn't sto > Bio/Graphics/Glyph.pm: my $feature = $arg{-feature} or die "No featur > Bio/Graphics/Glyph/image.pm: open F,$path or die "Can't open $path: > Bio/Graphics/Panel.pm: open (F,">$imagefile") || die("Can't open imag > Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined( $description ) ) > Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! 
defined( $mutation ) ) { > Bio/LiveSeq/Chain.pm: die "_praepostinsert_array: Something went ve > Bio/Tools/isPcr.pm: my $seq = $seqio->next_seq || die("cannot get a > Bio/Tools/Analysis/DNA/ESEfinder.pm: die "Could not get a result" > Bio/Tools/Analysis/Protein/NetPhos.pm: die "Could not get a result" u > Bio/Tools/Analysis/Protein/Mitoprot.pm: die "Could not get a result" > Bio/Tools/Analysis/Protein/Scansite.pm: die "Could not get a result" > Bio/Tools/dpAlign.pm: die("\nThe C-compiled engine for Smith Wa > Bio/Tools/ipcress.pm: my $seq = $seqio->next_seq || die("cannot get > Bio/Tools/EPCR.pm: my $seq = $seqio->next_seq || die("cannot get a > Bio/Tools/HMM.pm: die("\nThe C-compiled engine for Hidden Marko > Bio/Seq/PrimedSeq.pm: my $file = shift || die "need a file to rea > Bio/Seq/PrimedSeq.pm: my $file = shift || die "$0 "; > > "FIXME" > > Bio/AlignIO/po.pm > Bio/DB/Expression/geo.pm > Bio/FeatureIO/gff.pm > Bio/Ontology/RelationshipType.pm > Bio/SeqIO/kegg.pm > Bio/SeqIO/swiss.pm > Bio/Tools/ESTScan.pm > Bio/Tools/Est2Genome.pm > Bio/Tools/GuessSeqFormat.pm > Bio/Tools/HMMER/Results.pm > > "???" > > Bio/Variation/VariantI.pm: $self->allele_mut($value); #???? > Bio/DB/Biblio/soap.pm:use Bio::Biblio; # TBD: ?? WHY SHOULD I DO THIS > Bio/DB/GFF/Adaptor/dbi/oracleace.pm: # then generate a bogus Homolo > Bio/DB/GFF/Adaptor/dbi/mysqlace.pm: # then generate a bogus Homolog > Bio/SeqFeature/Annotated.pm: Args : ??? > Bio/SeqFeature/Tools/Unflattener.pm: # what should w > Bio/SeqFeature/Tools/IDHandler.pm: # warn?? > Bio/Root/IO.pm: $ROOTDIR = ""; # what is reasonable?? > Bio/LiveSeq/Chain.pm:# *??* create hash2dchain ???? (with hashkeys use > Bio/LiveSeq/Chain.pm:# **????** how about using array of arrays instea > Bio/LiveSeq/Chain.pm:# in verbose $string assignment around line 721 ? > > "TODO" > > Bio/Root/Root.pm: # TODO: Fix the MSG: line of the re-thrown err > Bio/Root/Storable.pm: # TODO: add cleanup and unlink methods. 
For n > Bio/Seq/EncodedSeq.pm: #TODO: finish all this > Bio/SeqFeature/Tools/Unflattener.pm: # TODO - we ignore this > Bio/SeqFeature/Tools/Unflattener.pm: # TODO - allow more > Bio/SeqFeature/Tools/Unflattener.pm: ## features. TODO: P > Bio/SeqIO/game/gameWriter.pm:#TODO: can't sequences also have database > Bio/SeqIO/chaos.pm: # TODO > Bio/SeqIO/pir.pm: # TODO - not processing SFS data > Bio/SeqIO/strider.pm: # TODO: determine 'DNA Degenerate > Bio/Search/BlastUtils.pm: # TODO: Account for strand/frame issue! > > "***" > > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Clone->new > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::FPCMarker-> > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Contig->new > Bio/Map/PositionI.pm:#*** should this be overridden from RangeI? > Bio/Matrix/PSM/SiteMatrix.pm: #*** IUPACp values not actually u > Bio/Tree/TreeFunctionsI.pm: #*** the algorithm here hasn't really b > Bio/SeqIO/agave.pm:***NOTE*** At the moment, not all of the tags are > Bio/LiveSeq/Chain.pm:# **** performance concerns > Bio/Map/LinkageMap.pm:#*** what is this? what calls it? note that it s > Bio/Map/Marker.pm: *** does not actually add this marker to the > Bio/Map/SimpleMap.pm: *** does not actually add the element to th > Bio/Map/FPCMarker.pm: *** This has nothing to do with an actual > Bio/Map/PositionI.pm:#*** should this be overridden from RangeI? > > "Why?" > > Bio/Search/Hit/PullHitI.pm: # why does this method even exist?! > Bio/Search/Hit/PullHitI.pm: # why does this method even exist?! > Bio/DB/Biblio/soap.pm:use Bio::Biblio; # TBD: ?? WHY SHOULD I DO THIS > Bio/Graphics/Glyph/dot.pm: # The can() method fails with GD::SVG. Why > Bio/SeqIO/bsml.pm: # Need to kill object for following code to work... 
> Bio/SeqIO/tinyseq.pm: foreach my $subatt(@$seqatt) { # why are there t > Bio/SeqIO/tinyseq.pm: # NCBI puts refseq ids in TSeq_sid, others in > Bio/SeqIO/game/featHandler.pm: # Why is CDS coordinate info saved a > Bio/SeqIO/swiss.pm: # Um, why would this be anything else but PRT? > Bio/Taxonomy.pm: # taxonomy - why would you be doing things this > Bio/Seq.pm: # I can't remember why not delegating was ever deemed > Bio/Cluster/UniGene.pm: # why does NCBI prepend a 'g' to its own > Bio/LiveSeq/IO/BioPerl.pm:# why array from each_tag_value($qual) ? Whe > Bio/SearchIO/Writer/HSPTableWriter.pm:# Don't know why this > Bio/SearchIO/Writer/ResultTableWriter.pm:# Don't know why t > Bio/Tools/Alignment/Consed.pm:Why was this developed like this? I was > Bio/Tools/Alignment/Consed.pm: # if there is a member array (why woul > io/Map/Physical.pm: #*** why doesn't it call Bio::Map::Clone->new ? > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::FPCMarker-> > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Contig->new > Bio/SeqFeature/Primer.pm: off from those of the idtdna web > Bio/SeqFeature/Primer.pm: as primer3 does. Don't ask why, I never f > Bio/Seq/PrimaryQual.pm: Returns : 1 for a valid sequence (WHY? Shouldn > Bio/Seq/PrimaryQual.pm: Args : a scalar (any scalar, why PrimarySeq > > "hack" > > Bio/DB/Flat/BinarySearch.pm:# is an awful hack - in reality Michele's > Bio/DB/WebDBSeqI.pm:# sorry, but this is hacked in because of BioFetch > Bio/DB/SeqFeature/Store/GFF3Loader.pm: # TEMPORARY HACKS TO SIMPLIFY > Bio/AlignIO/phylip.pm: #if you use a version of phylip (hacked) tha > Bio/SearchIO/blast.pm: # bl2seq hackiness... Not sure I lik > Bio/Tools/Sigcleave.pm:## a quick hack to make sure that we get the sc > Bio/Tools/Blast/HTML.pm: # This is fine for yeast but not worm. 
This
> Bio/Tools/Geneid.pm: # then need to perform the hack of extract
> Bio/Root/Utilities.pm: # this is a quick hack to check for availabili
> Bio/Root/Err.pm:objects more eval/die-savvy (but the current strategy
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dmessina at wustl.edu Tue Sep 19 10:42:18 2006
From: dmessina at wustl.edu (David Messina)
Date: Tue, 19 Sep 2006 09:42:18 -0500
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
In-Reply-To: <001001c6dbf4$65437260$15327e82@pyrimidine>
References: <001001c6dbf4$65437260$15327e82@pyrimidine>
Message-ID:

Sendu, Chris, Torsten, et al.,

From running the deobfuscator on bioperl-live, I've found some
formatting inconsistencies in the POD of some modules which are hard
for the deobfuscator to parse. I've corrected some of these in my
working (checked-out) copy of bioperl-live.

Anyone object to my checking these in to CVS for inclusion in the new
release?

I don't want to get in the way of you guys who are already cleaning
up the PODs.

The intended freeze date is still Sept 25th, correct?

Dave

From cjfields at uiuc.edu Tue Sep 19 11:03:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Sep 2006 10:03:29 -0500
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
In-Reply-To:
Message-ID: <001b01c6dbfc$c68711a0$15327e82@pyrimidine>

I believe the 25th is to be the first RC, correct Sendu?

I don't have a problem myself with your additions to CVS. Which modules
were they? I have a few POD additions I need to work on for the various
Bio::DB::EUtilities modules but they shouldn't be in the way of your
additions.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: David Messina [mailto:dmessina at wustl.edu]
> Sent: Tuesday, September 19, 2006 9:42 AM
> To: Chris Fields
> Cc: Torsten Seemann; bioperl-l; Sendu Bala
> Subject: Re: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and
> undef/die
>
> Sendu, Chris, Torsten, et al.,
>
> From running the deobfuscator on bioperl-live, I've found some
> formatting inconsistencies in the POD of some modules which are hard
> for the deobfuscator to parse. I've corrected some of these in my
> working (checked-out) copy of bioperl-live.
>
> Anyone object to my checking these in to CVS for inclusion in the new
> release?
>
> I don't want to get in the way of you guys who are already cleaning
> up the PODs.
>
> The intended freeze date is still Sept 25th, correct?
>
> Dave

From bix at sendu.me.uk Tue Sep 19 10:49:48 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Sep 2006 15:49:48 +0100
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
In-Reply-To:
References: <001001c6dbf4$65437260$15327e82@pyrimidine>
Message-ID: <4510038C.5020606@sendu.me.uk>

David Messina wrote:
> Sendu, Chris, Torsten, et al.,
>
> From running the deobfuscator on bioperl-live, I've found some
> formatting inconsistencies in the POD of some modules which are hard for
> the deobfuscator to parse. I've corrected some of these in my working
> (checked-out) copy of bioperl-live.
>
> Anyone object to my checking these in to CVS for inclusion in the new
> release?

I'd say go right ahead, and thank you.

> I don't want to get in the way of you guys who are already cleaning up
> the PODs.

I'm only aware of Mauricio doing significant work on POD, but CVS should
handle the overlap. I don't see a problem.

> The intended freeze date is still Sept 25th, correct?

Yes.
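
A side note on the first audit category in this thread, "return undef;" => "return;": as far as Perl itself is concerned, the two forms differ only in list context. What follows is a minimal sketch, not code taken from bioperl-live; the sub names find_explicit_undef and find_bare_return are made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only; neither exists in BioPerl.
sub find_explicit_undef { return undef; }  # always returns the scalar undef
sub find_bare_return    { return; }        # undef in scalar context, () in list context

# List context: "return undef" yields a one-element list holding undef,
# while a bare "return" yields the empty list.
my @explicit = find_explicit_undef();
my @bare     = find_bare_return();

print "explicit undef: ", scalar(@explicit), " element(s)\n";
print "bare return:    ", scalar(@bare), " element(s)\n";

# Scalar context: both forms give an undefined value.
my $scalar_explicit = find_explicit_undef();
my $scalar_bare     = find_bare_return();
print "both scalar results are undefined\n"
    if !defined $scalar_explicit && !defined $scalar_bare;
```

Run as a plain script, the explicit form leaves one (undefined) element in the array and the bare form leaves none, so a caller testing something like `if (my @hits = lookup())` would take the branch only when the sub uses "return undef;". Whether a given "return undef;" in the audit list is intentional still has to be decided module by module.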

From dmessina at wustl.edu Tue Sep 19 11:57:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Tue, 19 Sep 2006 10:57:48 -0500
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
In-Reply-To: <001b01c6dbfc$c68711a0$15327e82@pyrimidine>
References: <001b01c6dbfc$c68711a0$15327e82@pyrimidine>
Message-ID: <5B78FD09-FAD4-455D-8145-9986F8292659@wustl.edu>

On Sep 19, 2006, at 10:03 AM, Chris Fields wrote:

> Which modules were they?

Primarily some modules in the Bio/Ontology/ directory. Their NAME pod
sections don't have their fully qualified name.

e.g.

OntologyEngineI - Interface a minimal Ontology implementation should satisfy

instead of

Bio::Ontology::OntologyEngineI - Interface a minimal Ontology implementation should satisfy

This produced the problem that Hilmar reported whereby these modules
are listed (and indexed) incorrectly in the deobfuscator's browsable
list, and clicking on them wouldn't show the lower table of methods.

Dave

From bix at sendu.me.uk Tue Sep 19 12:12:31 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Sep 2006 17:12:31 +0100
Subject: [Bioperl-l] Optional 'circular dependency' ok?
Message-ID: <451016EF.5080703@sendu.me.uk>

Hi,
I'm writing a Bioperl module in which I would like to implement one of
the methods using the Ensembl Perl API. If the method is optional and
the POD clearly states the need for Ensembl, whilst the Ensembl Perl API
never gets added as a pre-req for Bioperl (ie. no entry in Makefile.PL),
are there any problems with this?

From jason at bioperl.org Tue Sep 19 12:13:54 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 19 Sep 2006 09:13:54 -0700
Subject: [Bioperl-l] Blast Parser Error
In-Reply-To: <451006BE.1090504@gmu.edu>
References: <451006BE.1090504@gmu.edu>
Message-ID: <8273f6c20609190913oee9c992g3e7312a31366c949@mail.gmail.com>

Forwarding your question to the bioperl mailing list where it can be answered.

Please consider making your BLAST file available on a website - it is
bad form to mail someone a very large email attachment unsolicited. If
you do not have access to a website where you can make the report
available, you can submit this as a bug at http://bugzilla.open-bio.org
and attach the report to the bug.

It is possible the PARACEL BLAST format is not parseable with the latest
code - please let the list know what version of bioperl you are using,
etc. Please test your question against the latest code in CVS as well
before you ask others to debug this problem.

-jason

On 9/19/06, Lakshmi K Matukumalli wrote:
>
> Hi Jason,
>
> I am unable to get correct results with the following blast file.
> My input sequence has a repeat element so it has a number of hits.
>
> But there is only one hit with complete identity across the length of
> query.
>
> The Bioperl parser is not giving me the correct result for the first hit
> and first hsp.
>
> Can you please look into this and let me know if I am doing something
> wrong or
> if you have to fix the script.
>
> I am attaching the input file I used along with the script I used to
> print out the top hit.
>
> Thank you,
>
> Lakshmi Kumar
>
>
>
>
> use strict;
> use Bio::SearchIO;
>
> my $searchio = Bio::SearchIO->new(-file => 'unmasked.blast',
>                                   -format => 'blast');
>
> while ( my $result = $searchio->next_result() ) {
>   my $query_name = $result->query_name;
>   my ($str);
>   while( my $hit = $result->next_hit ) {
>     # process the Bio::Search::Hit::HitI object
>     while( my $hsp = $hit->next_hsp ) {
>       # process the Bio::Search::HSP::HSPI object
>       my ($qs,$qe,$hs,$he) = ($hsp->query->start,$hsp->query->end,$hsp->subject->start,$hsp->subject->end);
>       my ($chr,$Bts,$Bte) = ($hit->description =~ /Bos taurus chromosome (.*)-FRAG\[(\d+)\,(\d+)\]/);
>       $str = $query_name."\t".$chr."\t".($Bts+$hs)."\t".($Bts+$he);
>       last;
>     }
>     last;
>   }
>   print $str,"\n";
> }
>
>
>
> BLASTN 1.5.4-Paracel [2003-06-05]
>
> [SNIP]

--
Jason Stajich
jason at bioperl.org
http://www.duke.edu/~jes12/

From arareko at campus.iztacala.unam.mx Tue Sep 19 13:07:37 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 19 Sep 2006 12:07:37 -0500
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
In-Reply-To: <4510038C.5020606@sendu.me.uk>
References: <001001c6dbf4$65437260$15327e82@pyrimidine> <4510038C.5020606@sendu.me.uk>
Message-ID: <451023D9.9050507@campus.iztacala.unam.mx>

Sendu Bala wrote:
> David Messina wrote:
>> Sendu, Chris, Torsten, et al.,
>>
>> From running the deobfuscator on bioperl-live, I've found some
>> formatting inconsistencies in the POD of some modules which are hard for
>> the deobfuscator to parse. I've corrected some of these in my working
>> (checked-out) copy of bioperl-live.
>>
>> Anyone object to my checking these in to CVS for inclusion in the new
>> release?
>
> I'd say go right ahead, and thank you.

Go ahead Dave!

>> I don't want to get in the way of you guys who are already cleaning up
>> the PODs.
>
> I'm only aware of Mauricio doing significant work on POD, but CVS should
> handle the overlap. I don't see a problem.

No problem, there's always 'cvs up' just before 'cvs commit' ;)

>> The intended freeze date is still Sept 25th, correct?
>
> Yes.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Tue Sep 19 13:28:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Sep 2006 12:28:55 -0500
Subject: [Bioperl-l] Optional 'circular dependency' ok?
In-Reply-To: <451016EF.5080703@sendu.me.uk>
Message-ID: <000901c6dc11$15b84370$15327e82@pyrimidine>

I think any dependencies are supposed to be listed in the Makefile and
DEPENDENCIES regardless of how many times they are used or if the method is
optional.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Tuesday, September 19, 2006 11:13 AM
> To: bioperl-l
> Subject: [Bioperl-l] Optional 'circular dependency' ok?
>
> Hi,
> I'm writing a Bioperl module in which I would like to implement one of
> the methods using the Ensembl Perl API. If the method is optional and
> the POD clearly states the need for Ensembl, whilst the Ensembl Perl API
> never gets added as a pre-req for Bioperl (ie. no entry in Makefile.PL),
> are there any problems with this?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Tue Sep 19 13:34:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Sep 2006 12:34:00 -0500
Subject: [Bioperl-l] FW: BioPerl SeqIO-like system in BioPython
Message-ID: <000a01c6dc11$cb77ab10$15327e82@pyrimidine>

The following is a request from Peter, one of the Biopython developers,
for suggestions from us Bioperlers (Bioperlites?). They are trying to
implement a SeqIO-like system for BioPython. Any suggestions/hints/help
would be greatly appreciated.

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

Forwarded message:
-----------------------------------------------------

Chris Fields wrote:
>
> I know that BioPython is trying to get a SeqIO-like system set up.
> Let us know if you need any help/advice.
>
> Cheers!

I've thought of a couple of things - if you want to pass this on to
the appropriate BioPerl people, please do so.

Cheers,

Peter

Internal names for formats
==========================

I want to use simple strings to describe the different file formats
for use as function arguments (e.g. "fasta", "genbank"), and ideally
use the same names as BioPerl:

http://bioperl.open-bio.org/wiki/HOWTO:SeqIO#Formats

Is the webpage authoritative? I would guess they match the module
names under Bio/SeqIO/*.pm and Bio/AlignIO/*.pm

The comments from Bio/AlignIO.pm list a few more names (not listed
under SeqIO). For the moment, my intention is to also include
multiple alignments as part of our sequence reading support.

Gaps
====

How do you cope with assorted gap characters (typically dot/period and
dash, '.' and '-') and how different file formats treat them?

For example, multiple alignments in Fasta format probably use either,
depending on the source of the file.
Clustal and Phylip seem to use '-' as a gap. MSF uses '.'

Phylip apparently treats '.' as meaning "same character as the
previous sequence" which is asking for trouble.

Does BioPerl make any efforts to convert everything into an internal
standard (say '-') when loading files, and convert as appropriate when
writing them?

This old thread suggests it is (was) left in the end user's hands:

http://bioperl.org/pipermail/bioperl-l/2004-February/014915.html

Peter

From lstein at cshl.edu Tue Sep 19 13:38:31 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 19 Sep 2006 17:38:31 +0000
Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die
In-Reply-To: <450F3537.5060605@infotech.monash.edu.au>
References: <450F3537.5060605@infotech.monash.edu.au>
Message-ID: <6dce9a0b0609191038l2e63fc05w97b4844f5e364934@mail.gmail.com>

Torsten,

Thanks for doing this audit. I'll take care of my modules.

Lincoln

On 9/19/06, Torsten Seemann wrote:
>
> Developers,
>
> Given the pending RC1 release, I decided to do a quick audit of
> bioperl-live, see below. Hopefully no much POD text may got through.
> Hope it is useful anyway.
>
> --Torsten
>
>
> "return undef;" => "return;" # return undef intentional?
> > Bio/DB/Biblio/pdf.pm: return undef; > Bio/DB/Biblio/pdf.pm: return undef unless $link; > Bio/DB/Biblio/pdf.pm: return undef; > Bio/DB/Biblio/eutils.pm: return undef; > Bio/DB/WebDBSeqI.pm: return undef if ( !defined $self->ua || !defin > Bio/Tools/Run/RemoteBlast.pm: return undef if ( !defined $self->ua > Bio/FeatureIO/gff.pm: return undef if $self->fasta_mode(); > Bio/FeatureIO/gff.pm: # be graceful about empty lines or comments, an > Bio/FeatureIO/gff.pm:will return undef if not all features in the stre > Bio/Root/IOManager.pm: return undef unless -e $file; > Bio/Root/Object.pm: return undef unless defined $self->{'_err'}; > > "die" => "$self->throw" # use Bio::Perl exception handling > > Bio/Variation/IO.pm: $format2 = shift || die "Usage: reformat forma > Bio/DB/SeqFeature/Store/DBI/mysql.pm: $db->store($feature) or die "Co > Bio/DB/SeqFeature/Store/berkeleydb.pm: $db->store($feature) or die > Bio/DB/SeqFeature/Store.pm: $db->store($feature) or die "Couldn't sto > Bio/Graphics/Glyph.pm: my $feature = $arg{-feature} or die "No featur > Bio/Graphics/Glyph/image.pm: open F,$path or die "Can't open $path: > Bio/Graphics/Panel.pm: open (F,">$imagefile") || die("Can't open imag > Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined( $description ) ) > Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! 
defined( $mutation ) ) { > Bio/LiveSeq/Chain.pm: die "_praepostinsert_array: Something went ve > Bio/Tools/isPcr.pm: my $seq = $seqio->next_seq || die("cannot get a > Bio/Tools/Analysis/DNA/ESEfinder.pm: die "Could not get a result" > Bio/Tools/Analysis/Protein/NetPhos.pm: die "Could not get a result" u > Bio/Tools/Analysis/Protein/Mitoprot.pm: die "Could not get a result" > Bio/Tools/Analysis/Protein/Scansite.pm: die "Could not get a result" > Bio/Tools/dpAlign.pm: die("\nThe C-compiled engine for Smith Wa > Bio/Tools/ipcress.pm: my $seq = $seqio->next_seq || die("cannot get > Bio/Tools/EPCR.pm: my $seq = $seqio->next_seq || die("cannot get a > Bio/Tools/HMM.pm: die("\nThe C-compiled engine for Hidden Marko > Bio/Seq/PrimedSeq.pm: my $file = shift || die "need a file to rea > Bio/Seq/PrimedSeq.pm: my $file = shift || die "$0 "; > > "FIXME" > > Bio/AlignIO/po.pm > Bio/DB/Expression/geo.pm > Bio/FeatureIO/gff.pm > Bio/Ontology/RelationshipType.pm > Bio/SeqIO/kegg.pm > Bio/SeqIO/swiss.pm > Bio/Tools/ESTScan.pm > Bio/Tools/Est2Genome.pm > Bio/Tools/GuessSeqFormat.pm > Bio/Tools/HMMER/Results.pm > > "???" > > Bio/Variation/VariantI.pm: $self->allele_mut($value); #???? > Bio/DB/Biblio/soap.pm:use Bio::Biblio; # TBD: ?? WHY SHOULD I DO THIS > Bio/DB/GFF/Adaptor/dbi/oracleace.pm: # then generate a bogus Homolo > Bio/DB/GFF/Adaptor/dbi/mysqlace.pm: # then generate a bogus Homolog > Bio/SeqFeature/Annotated.pm: Args : ??? > Bio/SeqFeature/Tools/Unflattener.pm: # what should w > Bio/SeqFeature/Tools/IDHandler.pm: # warn?? > Bio/Root/IO.pm: $ROOTDIR = ""; # what is reasonable?? > Bio/LiveSeq/Chain.pm:# *??* create hash2dchain ???? (with hashkeys use > Bio/LiveSeq/Chain.pm:# **????** how about using array of arrays instea > Bio/LiveSeq/Chain.pm:# in verbose $string assignment around line 721 ? > > "TODO" > > Bio/Root/Root.pm: # TODO: Fix the MSG: line of the re-thrown err > Bio/Root/Storable.pm: # TODO: add cleanup and unlink methods. 
For n > Bio/Seq/EncodedSeq.pm: #TODO: finish all this > Bio/SeqFeature/Tools/Unflattener.pm: # TODO - we ignore this > Bio/SeqFeature/Tools/Unflattener.pm: # TODO - allow more > Bio/SeqFeature/Tools/Unflattener.pm: ## features. TODO: P > Bio/SeqIO/game/gameWriter.pm:#TODO: can't sequences also have database > Bio/SeqIO/chaos.pm: # TODO > Bio/SeqIO/pir.pm: # TODO - not processing SFS data > Bio/SeqIO/strider.pm: # TODO: determine 'DNA Degenerate > Bio/Search/BlastUtils.pm: # TODO: Account for strand/frame issue! > > "***" > > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Clone->new > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::FPCMarker-> > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Contig->new > Bio/Map/PositionI.pm:#*** should this be overridden from RangeI? > Bio/Matrix/PSM/SiteMatrix.pm: #*** IUPACp values not actually u > Bio/Tree/TreeFunctionsI.pm: #*** the algorithm here hasn't really b > Bio/SeqIO/agave.pm:***NOTE*** At the moment, not all of the tags are > Bio/LiveSeq/Chain.pm:# **** performance concerns > Bio/Map/LinkageMap.pm:#*** what is this? what calls it? note that it s > Bio/Map/Marker.pm: *** does not actually add this marker to the > Bio/Map/SimpleMap.pm: *** does not actually add the element to th > Bio/Map/FPCMarker.pm: *** This has nothing to do with an actual > Bio/Map/PositionI.pm:#*** should this be overridden from RangeI? > > "Why?" > > Bio/Search/Hit/PullHitI.pm: # why does this method even exist?! > Bio/Search/Hit/PullHitI.pm: # why does this method even exist?! > Bio/DB/Biblio/soap.pm:use Bio::Biblio; # TBD: ?? WHY SHOULD I DO THIS > Bio/Graphics/Glyph/dot.pm: # The can() method fails with GD::SVG. Why > Bio/SeqIO/bsml.pm: # Need to kill object for following code to work... 
> Bio/SeqIO/tinyseq.pm: foreach my $subatt(@$seqatt) { # why are there t > Bio/SeqIO/tinyseq.pm: # NCBI puts refseq ids in TSeq_sid, others in > Bio/SeqIO/game/featHandler.pm: # Why is CDS coordinate info saved a > Bio/SeqIO/swiss.pm: # Um, why would this be anything else but PRT? > Bio/Taxonomy.pm: # taxonomy - why would you be doing things this > Bio/Seq.pm: # I can't remember why not delegating was ever deemed > Bio/Cluster/UniGene.pm: # why does NCBI prepend a 'g' to its own > Bio/LiveSeq/IO/BioPerl.pm:# why array from each_tag_value($qual) ? Whe > Bio/SearchIO/Writer/HSPTableWriter.pm:# Don't know why this > Bio/SearchIO/Writer/ResultTableWriter.pm:# Don't know why t > Bio/Tools/Alignment/Consed.pm:Why was this developed like this? I was > Bio/Tools/Alignment/Consed.pm: # if there is a member array (why woul > io/Map/Physical.pm: #*** why doesn't it call Bio::Map::Clone->new ? > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::FPCMarker-> > Bio/Map/Physical.pm: #*** why doesn't it call Bio::Map::Contig->new > Bio/SeqFeature/Primer.pm: off from those of the idtdna web > Bio/SeqFeature/Primer.pm: as primer3 does. Don't ask why, I never f > Bio/Seq/PrimaryQual.pm: Returns : 1 for a valid sequence (WHY? Shouldn > Bio/Seq/PrimaryQual.pm: Args : a scalar (any scalar, why PrimarySeq > > "hack" > > Bio/DB/Flat/BinarySearch.pm:# is an awful hack - in reality Michele's > Bio/DB/WebDBSeqI.pm:# sorry, but this is hacked in because of BioFetch > Bio/DB/SeqFeature/Store/GFF3Loader.pm: # TEMPORARY HACKS TO SIMPLIFY > Bio/AlignIO/phylip.pm: #if you use a version of phylip (hacked) tha > Bio/SearchIO/blast.pm: # bl2seq hackiness... Not sure I lik > Bio/Tools/Sigcleave.pm:## a quick hack to make sure that we get the sc > Bio/Tools/Blast/HTML.pm: # This is fine for yeast but not worm. 
This
> Bio/Tools/Geneid.pm: # then need to perform the hack of extract
> Bio/Root/Utilities.pm: # this is a quick hack to check for availabili
> Bio/Root/Err.pm:objects more eval/die-savvy (but the current strategy
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From bix at sendu.me.uk Tue Sep 19 14:30:36 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Sep 2006 19:30:36 +0100
Subject: [Bioperl-l] Optional 'circular dependency' ok?
In-Reply-To: <000901c6dc11$15b84370$15327e82@pyrimidine>
References: <000901c6dc11$15b84370$15327e82@pyrimidine>
Message-ID: <4510374C.5030805@sendu.me.uk>

Chris Fields wrote:
> I think any dependencies are supposed to be listed in the Makefile and
> DEPENDENCIES regardless of how many times they are used or if the method is
> optional.

Well, the issue here is that doing that would create a circular
dependency; Ensembl Perl API requires Bioperl. Do the various ways of
installing Bioperl not 'care' about circular dependencies?

From johnsonm at gmail.com Tue Sep 19 11:33:22 2006
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 19 Sep 2006 10:33:22 -0500
Subject: [Bioperl-l] Bio::Tools::Glimmer
Message-ID:

The initial checkin comments (circa '03) for Bio::Tools::Glimmer
describe it as a 'GlimmerM 3.0' parser. The POD says '...a module for
parsing Glimmer predictions (currently GlimmerM 3.0 is all that has been
tested)...'.
However, the latest version of GlimmerM looks to be 2.5.1 (ftp://ftp.tigr.org/pub/software/GlimmerM), and there are multiple versions/flavors of Glimmer besides GlimmerM: Glimmer 2.X ( bacteria, archaea, and viruses): http://www.cbcb.umd.edu/software/glimmer/glimmer2.jun01.shtml Glimmer 3.X ( bacteria, archaea, and viruses): http://www.cbcb.umd.edu/software/glimmer/ GlimmerHMM ( eukaryotes ): http://www.cbcb.umd.edu/software/GlimmerHMM/ GlimmerM ( eukaryotes ): http://www.cbcb.umd.edu/software/glimmerm/index.shtml http://www.tigr.org/software/glimmerm/ I suspect Bio::Tools::Glimmer only parses GlimmerM, *maybe* GlimmerHMM, but not Glimmer 2.X or Glimmer 3.X. People do seem to be confused, see Michael Watson's post to bioperl-l on 10/18/2004: http://bioperl.org/pipermail/bioperl-l/2004-October/017112.html. It seems that Glimmer is really more of a 'family' of programs. Should there be one module that tries to parse all the different output formats, or should there be Bio::Tools::Glimmer2, Bio::Tools::Glimmer3, Bio::Tools::GlimmerHMM, Bio::Tools::GlimmerM? I'm presently leaning towards the latter, and would not be opposed to working on Glimmer2 and Glimmer3 myself, as I'm going to need them. Comments, suggestions, opposition, rotten fruit? From cjfields at uiuc.edu Tue Sep 19 14:50:52 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Sep 2006 13:50:52 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <4510374C.5030805@sendu.me.uk> Message-ID: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> > Chris Fields wrote: > > I think any dependencies are supposed to be listed in the Makefile and > > DEPENDENCIES regardless of how many times they are used or if the method > is > > optional. > > Well, the issue here is that doing that would create a circular > dependency; Ensembl Perl API requires Bioperl. Do the various ways of > installing Bioperl not 'care' about circular dependencies? 
Well, if there are circular dependencies that's a whole different ballgame. That's what I get for not reading the subject line! Could the module be added to bioperl-ext or bioperl-run? That would break the circular dependency since the Ensembl API doesn't require those; at least I think it doesn't. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cdavis at bcm.tmc.edu Tue Sep 19 15:09:34 2006 From: cdavis at bcm.tmc.edu (Caleb Davis) Date: Tue, 19 Sep 2006 14:09:34 -0500 Subject: [Bioperl-l] about PAML running within bioperl + bp_pairwise_kaks.pl In-Reply-To: <001901c6dbcf$9af4de50$0915020a@zchou> References: <001901c6dbcf$9af4de50$0915020a@zchou> Message-ID: <4510406E.3090908@bcm.tmc.edu> Hi Zhuocheng, This is the same error I was getting. What version of PAML are you using? I refreshed bioperl-live and bioperl-run to the CVS version and then bp_pairwise_kaks reported that it doesn't work for 3.15. The mlc file format must have changed and broke the parser. I switched to PAML 3.14 and then it worked great. Thanks guys! --Caleb zhuocheng Hou wrote: > Hello, every one, > > I use code in the PAML HOWTO (running PAML fom within Bioperl) on my Linux OS. And I set ENV as described by instructions. At the beginning, it seems that ClustalW run smoothly. However, when the programme run to call method "get_MLmatrix", somethign happened. The following information was listed as follows: (What reason or How to solve these problems?) > ........ > Sequences (2:3) Aligned. Score: 87 > Sequences (2:4) Aligned. Score: 88 > Sequences (2:5) Aligned. Score: 87 > Sequences (2:6) Aligned. Score: 87 > Sequences (2:7) Aligned. Score: 87 > Sequences (2:8) Aligned. Score: 87 > Sequences (3:4) Aligned. Score: 93 > Sequences (3:5) Aligned. Score: 93 > Sequences (3:6) Aligned. Score: 93 > Sequences (3:7) Aligned. Score: 92 > Sequences (3:8) Aligned. Score: 92 > Sequences (4:5) Aligned. 
Score: 99
> Sequences (4:6) Aligned. Score: 99
> Sequences (4:7) Aligned. Score: 98
> Sequences (4:8) Aligned. Score: 98
> Sequences (5:6) Aligned. Score: 100
> Sequences (5:7) Aligned. Score: 99
> Sequences (5:8) Aligned. Score: 99
> Sequences (6:7) Aligned. Score: 99
> Sequences (6:8) Aligned. Score: 99
> Sequences (7:8) Aligned. Score: 100
> Guide tree file created: [/home/zchou/TMPDIR/8QEqLivAKY/JU833u8OTP.dnd]
> Start of Multiple Alignment
> There are 7 groups
> Aligning...
> Group 1: Sequences: 2 Score:5875
> Group 2: Sequences: 2 Score:5877
> Group 3: Sequences: 4 Score:5864
> Group 4: Sequences: 5 Score:5537
> Group 5: Sequences: 6 Score:5727
> Group 6: Sequences: 7 Score:5608
> Group 7: Sequences: 8 Score:5607
> Alignment Score 43650
> GCG-Alignment file created [/home/zchou/TMPDIR/8QEqLivAKY/CussPD56rZ]
> aligned aa sequences were: Bio::SimpleAlign=HASH(0x87b93f4)
> Can't call method "get_MLmatrix" on an undefined value at originalpaml.pl line 57, line 332.
>
> Zhuocheng Hou
> Department of Animal Genetics and Breeding
> China Agricultural University

From hlapp at gmx.net Tue Sep 19 16:21:30 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 19 Sep 2006 16:21:30 -0400
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References:
Message-ID:

If the differences aren't too big, or if there are only specific sections for which you need to branch based on version, then handling all in a single module would be more cohesive and simpler for the user. For example, Bio::Tools::GFF supports different flavors of GFF all in one module; you specify the one you want as a named parameter to the constructor.
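
A rough sketch of that pattern in plain Perl, for illustration only: one parser class whose constructor takes a named -format parameter and dispatches to a per-flavor line parser. The package name My::PredictionParser, the flavor names, and both line layouts are made up for this example; they are not the interface of Bio::Tools::GFF and not real Glimmer output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::PredictionParser;    # hypothetical name, not a BioPerl module

# One line-parsing routine per flavor. Both layouts are invented for this sketch.
my %LINE_PARSER = (
    # flavor_a: "id<TAB>start<TAB>end"
    flavor_a => sub {
        my ($line) = @_;
        my ($id, $start, $end) = split /\t/, $line;
        return { id => $id, start => $start, end => $end };
    },
    # flavor_b: "start..end id"
    flavor_b => sub {
        my ($line) = @_;
        my ($start, $end, $id) = $line =~ /^(\d+)\.\.(\d+)\s+(\S+)$/
            or return;
        return { id => $id, start => $start, end => $end };
    },
);

# The flavor is chosen once, as a named constructor parameter.
sub new {
    my ($class, %args) = @_;
    my $format = $args{'-format'} || 'flavor_a';
    die "Unknown format '$format'\n" unless exists $LINE_PARSER{$format};
    return bless { format => $format }, $class;
}

# Callers use the same method regardless of flavor.
sub parse_line {
    my ($self, $line) = @_;
    chomp $line;
    return $LINE_PARSER{ $self->{format} }->($line);
}

package main;

my $parser_a = My::PredictionParser->new('-format' => 'flavor_a');
my $parser_b = My::PredictionParser->new('-format' => 'flavor_b');

# Both flavors yield the same record structure.
for my $rec ($parser_a->parse_line("orf1\t10\t250\n"),
             $parser_b->parse_line("10..250 orf1\n")) {
    printf "%s %d-%d\n", @{$rec}{qw(id start end)};
}
```

The point of the dispatch table is that adding a flavor only adds an entry; the calling code stays the same. Whether this is preferable to separate modules depends on how much the formats really share.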
-hilmar On Sep 19, 2006, at 11:33 AM, Mark Johnson wrote: > The initial checkin comments (circa '03) for Bio::Tools::Glimmer > describe it as a 'GlimmerM 3.0' parser. The POD says '...a module for > parsing Glimmer predictions (currently GlimmerM > 3.0 is all that has been tested)...'. However, the latest version of > GlimmerM looks to be 2.5.1 (ftp://ftp.tigr.org/pub/software/GlimmerM), > and there are multiple versions/flavors of Glimmer besides GlimmerM: > > Glimmer 2.X ( bacteria, archaea, and viruses): > http://www.cbcb.umd.edu/software/glimmer/glimmer2.jun01.shtml > Glimmer 3.X ( bacteria, archaea, and viruses): > http://www.cbcb.umd.edu/software/glimmer/ > GlimmerHMM ( eukaryotes ): > http://www.cbcb.umd.edu/software/GlimmerHMM/ > GlimmerM ( eukaryotes ): > http://www.cbcb.umd.edu/software/glimmerm/index.shtml > http://www.tigr.org/software/glimmerm/ > > I suspect Bio::Tools::Glimmer only parses GlimmerM, *maybe* > GlimmerHMM, but not Glimmer 2.X or Glimmer 3.X. > People do seem to be confused, see Michael Watson's post to > bioperl-l on 10/18/2004: > http://bioperl.org/pipermail/bioperl-l/2004-October/017112.html. > It seems that Glimmer is really more of a 'family' of programs. > Should there be one module that tries to parse all the different > output formats, or should there be Bio::Tools::Glimmer2, > Bio::Tools::Glimmer3, Bio::Tools::GlimmerHMM, Bio::Tools::GlimmerM? > I'm presently leaning towards the latter, and would not be opposed to > working on Glimmer2 and Glimmer3 myself, as I'm going to need them. > Comments, suggestions, opposition, rotten fruit? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From n.haigh at sheffield.ac.uk Tue Sep 19 17:11:34 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 19 Sep 2006 21:11:34 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 Prerequisites In-Reply-To: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> References: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> Message-ID: <45105D06.2070401@sheffield.ac.uk> Yesterday, I started having a look at installing the latest code on the 1.5.2 branch with the aim of running all the tests with Perl 5.6.1. However, I ran into a few hurdles including some regarding dependencies. Firstly, in order to run large test suites that break the command line length limit in windows (and possibly other OS's at different limits) ExtUtils::MakeMaker >= 6.06 is required. It may not be necessary for *nix but it is required if installing Bundle::BioPerl via CPAN and probably also Bioperl (however, I haven't got that far yet - this is tomorrows job), but it is required for testing under windows. It could be added as a prerequisite in a ppd, but most people installing it this way probably wouldn't run the test suite. How should this be dealt with? For prerequisites where there isn't a ppd available in either the 3 main repositories used for Windows install via PPM, how should these be dealt with? I could make the ppd's but should they go into Bundle::BioPerl? or just pop them in the Bioperl repository? What modules should go into Bundle::BioPerl? My feeling is that all non-essential modules should go into Bundle::BioPerl so that it's install with Bioperl would yield a fully functional Bioperl that is capable of running all the tests. 
This way, either 1) the Bioperl ppd could contain the minimal set of prerequisites and the user can optionally install Bundle::BioPerl or 2) the Bioperl ppd could contain Bundle::BioPerl as a prerequisite in order that PPM users would get a fully functional Bioperl from the off - which might be beneficial to more naive users/beginners. Anyway, I'll get back to the testing tomorrow. Nath From arareko at campus.iztacala.unam.mx Tue Sep 19 17:00:47 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Tue, 19 Sep 2006 16:00:47 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> References: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> Message-ID: <45105A7F.5010804@campus.iztacala.unam.mx> Chris Fields wrote: > Sendu Bala wrote: >> Chris Fields wrote: >>> I think any dependencies are supposed to be listed in the Makefile and >>> DEPENDENCIES regardless of how many times they are used or if the method >> is >>> optional. >> Well, the issue here is that doing that would create a circular >> dependency; Ensembl Perl API requires Bioperl. Do the various ways of >> installing Bioperl not 'care' about circular dependencies? > > Well, if there are circular dependencies that's a whole different ballgame. > That's what I get for not reading the subject line! > > Could the module be added to bioperl-ext or bioperl-run? That would break > the circular dependency since the Ensembl API doesn't require those; at > least I think it doesn't. What I remember (if my brain still works properly) is that the Ensembl Perl API requires Bioperl 1.2.3 to run properly (or at least to be installed). Ewan mentioned a couple of months ago in the ensembl-dev@ list that Ensembl doesn't make heavy use of Bioperl anymore, so for the moment they're not planning to move up to 1.5. AFAIK most people install the latest Ensembl code with Bioperl 1.2.3 (correct me if I'm wrong, I'm not really sure about this). 
The thread I mention was precisely about the Bioperl version used in Ensembl, you can see it here:

http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/threads.html#01744

My questions here would be:

1) Will your methods be fully functional/compatible with the current Bioperl branch? What I try to say here is that in your particular case, won't the use of the Ensembl Perl API introduce the need of older Bioperl code? Something like a 'backwards dependency' at code level (the arrows mean 'requires'):

bioperl-1-5-2 method -> ensembl-40 method -> bioperl-1-2-3 method

Here I'm assuming that you're prototyping this by using the latest Bioperl and Ensembl versions from CVS (*almost* every developer lives on the bleeding edge :)).

2) Depending on the amount of code you will use from Ensembl, why introducing its whole API into Bioperl? Maybe you can borrow only what you need from Ensembl and give credit for that.

On the other hand, I support Chris' idea of adding your module into the bioperl-ext or bioperl-run packages. To me it sounds good for avoiding a circular dependency problem.

Cheers,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Tue Sep 19 17:33:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Sep 2006 16:33:34 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Prerequisites
In-Reply-To: <45105D06.2070401@sheffield.ac.uk>
Message-ID: <000601c6dc33$43564d50$15327e82@pyrimidine>

...
> Yesterday, I started having a look at installing the latest code on the
> 1.5.2 branch with the aim of running all the tests with Perl 5.6.1.
>
> However, I ran into a few hurdles including some regarding dependencies.
>
> Firstly, in order to run large test suites that break the command line
> length limit in windows (and possibly other OS's at different limits)
> ExtUtils::MakeMaker >= 6.06 is required.
It may not be necessary for
> *nix but it is required if installing Bundle::BioPerl via CPAN and
> probably also Bioperl (however, I haven't got that far yet - this is
> tomorrows job), but it is required for testing under windows. It could
> be added as a prerequisite in a ppd, but most people installing it this
> way probably wouldn't run the test suite. How should this be dealt with?

The way ActiveState deals with their PPM releases is based on whether they pass an automated test run. If they don't pass everything (which assumes that they have all modules installed for everything to pass) they aren't converted to PPM. I don't think that PPM runs tests, though; that is run pre-build.

Technically, all you need to do for making a PPM is (from http://www.bioperl.org/wiki/Create_a_Bioperl_PPM_Package):

perl Makefile.PL
nmake
tar cvf bioperl-ppm.tar blib
gzip --best bioperl-ppm.tar
nmake ppd

Then modify the *.ppd file accordingly. You can use WinZip or 7-Zip to package up the blib directory.

We don't need to make the same demands with dependencies for a Bioperl PPM release. In fact, we could leave out dependencies completely and have a test script (run by PPM) check for missing dependencies and let the user know (warn) if they aren't present. The current Makefile.PL already does this; we could probably customize it for this and have PPM run the script. This could also be used to install the scripts, something which isn't done with a bare-bones PPM (like the above). Generic Genome Browser does something along those lines; here's the PPD file (note the INSTALL HREF tag):

Generic-Genome-Browser
A CGI-driven browser for genomic annotations.
Lincoln Stein (lstein at cshl.org)

> For prerequisites where there isn't a ppd available in either the 3 main
> repositories used for Windows install via PPM, how should these be dealt
> with? I could make the ppd's but should they go into Bundle::BioPerl? or
> just pop them in the Bioperl repository?
The main three repositories outside of BioPerl are Kobes, Bribes, and the normal ActiveState repository (which PPM comes configured with). Kobes - http://theoryx5.uwinnipeg.ca/ppms Bribes - http://www.Bribes.org/perl/ppm Several modules aren't present in those repositories, in fact GD::SVG, Text::ShellWords, and Bio::ASN1::EntrezGene come to mind. The first two are present in the Bioperl PPM directory and need to be added to package.lst. The last one builds easily into a PPM so could be added to the Bioperl repository. > What modules should go into Bundle::BioPerl? My feeling is that all > non-essential modules should go into Bundle::BioPerl so that it's > install with Bioperl would yield a fully functional Bioperl that is > capable of running all the tests. This way, either 1) the Bioperl ppd > could contain the minimal set of prerequisites and the user can > optionally install Bundle::BioPerl or 2) the Bioperl ppd could contain > Bundle::BioPerl as a prerequisite in order that PPM users would get a > fully functional Bioperl from the off - which might be beneficial to > more naive users/beginners. > > Anyway, I'll get back to the testing tomorrow. > Nath I like the first option (minimum set of prereqs in the PPD file) with the optional install of Bundle::Bioperl. This allows you to get the core PPM made and added, then add Bundle::Bioperl PPM later when it is ready. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From Kary at ioc.fiocruz.br Tue Sep 19 13:23:56 2006 From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana) Date: Tue, 19 Sep 2006 14:23:56 -0300 Subject: [Bioperl-l] The results of your email commands Message-ID: <29AC1A3F62AAF54BA71E367C6D62CEB096C4FB@alpha.ioc.fiocruz.br> Dear all, I need help with a consense module. 
I am using bioperl to run clustalw, seqboot, protdist (which is going to be replaced by puzzle), weighbor and consense, but for some reason I get these messages:

MESSAGE 1

Can't call method "get_root_node" on an undefined value at /usr/local/bioperl-1.5.0/Bio/TreeIO/newick.pm line 236.

MESSAGE 2

------------- EXCEPTION -------------
MSG: Expected a Bio::TreeI object
STACK Bio::Tools::Run::Phylo::Phylip::Consense::_setinput /usr/local/bioperl-run-1.4/Bio/Tools/Run/Phylo/Phylip/Consense.pm:448
STACK Bio::Tools::Run::Phylo::Phylip::Consense::run /usr/local/bioperl-run-1.4/Bio/Tools/Run/Phylo/Phylip/Consense.pm:331
STACK main::makePhylogenyPipeline filogenia_pipeline.pl:120
STACK main::main filogenia_pipeline.pl:61
STACK toplevel filogenia_pipeline.pl:42
--------------------------------------

I have no idea where to get started to try to solve this. Here is the main part of my script. Thank you for the help.

Regards
Kary

################################################################################
#!/usr/bin/perl -w

use lib "/usr/local/bioperl-1.5.0";
use lib "/usr/local/bioperl-run-1.4";
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::Tools::Run::Phylo::Phylip::SeqBoot;
use Bio::Tools::Run::Phylo::Phylip::ProtDist;
use Bio::Tools::Run::Phylo::Phylip::Neighbor;
use Bio::Tools::Run::AnalysisFactory::Pise;
use Bio::Tools::Run::Phylo::Phylip::Consense;
use Bio::Tools::Run::Phylo::Phylip::DrawTree;
use Bio::AlignIO;
use Bio::SimpleAlign;
use strict;

################## dir #########################
sub makePhylogenyPipeline{
    my $dirin_mafft = $_[0];
    my $length_weighbor = $_[1];
    my $inputfilename = "";

    &makeInvariantAwk($length_weighbor);
    open (READDIRMOD, "find $dirin_mafft |") or die "Cannot open $dirin_mafft: $!";

    while ($inputfilename = <READDIRMOD>){
        for ($inputfilename =~ /\.mafft$/) {
            $inputfilename =~ s/\n//;

            #Create a SimpleAlign object
            my @params_align = ( 'ktuple' => 2,
                                 'matrix' => 'BLOSUM',
                                 'output' => 'PHYLIP',
                                 'outfile' => $inputfilename.'.phy');
            my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params_align);
            my $aln = $factory->align($inputfilename); # $aln is a SimpleAlign object.

            #Use seqboot to generate bootstrap alignments
            my @params = ( 'datatype'=>'SEQUENCE',
#                          'replicates'=>1000);
                           'replicates'=>1);
            my $seqboot_factory = Bio::Tools::Run::Phylo::Phylip::SeqBoot->new(@params);
            my $aln_ref = $seqboot_factory->run($aln);

            #next build distance matrices
            my @params_protdist = ('MODEL' => 'PAM');
            my $protdist_factory = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(@params_protdist);

            #next construct trees
            #Build a Pise factory
            my $weighbor_factory = new Bio::Tools::Run::AnalysisFactory::Pise();
            #Then create an application object (Pise::Run::Tools::PiseApplication):
            my $weighbor = $weighbor_factory->program('weighbor');

            my @tree;
            foreach my $a (@{$aln_ref}){
                my $matrix = $protdist_factory->create_distance_matrix($a);
                push @tree, $weighbor->run('infile' => $matrix,
                                           'length' => 500,
                                           'size' => 3.85); # Size of the alphabet (-b)
            }

            #use consense to get a final tree
            my $consense_factory = Bio::Tools::Run::Phylo::Phylip::Consense->new();
            my ($tree) = $consense_factory->run(\@tree);

            #now draw the tree
            my $draw_factory = Bio::Tools::Run::Phylo::Phylip::DrawTree->new();
            my $image_filename = $draw_factory->draw_tree($tree);
        }
    }
    close (READDIRMOD);
}

-------------- next part --------------
An embedded message was scrubbed...
From: "Kary Ann Del Carmen Soriano Ocana" Subject: Help with consense module Date: Mon, 18 Sep 2006 15:29:59 -0300 Size: 4639 Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060919/00b28bb7/attachment-0001.mht From lmatukum at gmu.edu Tue Sep 19 12:41:09 2006 From: lmatukum at gmu.edu (Lakshmi K Matukumalli) Date: Tue, 19 Sep 2006 12:41:09 -0400 Subject: [Bioperl-l] Blast Parser Error In-Reply-To: <8273f6c20609190913oee9c992g3e7312a31366c949@mail.gmail.com> References: <451006BE.1090504@gmu.edu> <8273f6c20609190913oee9c992g3e7312a31366c949@mail.gmail.com> Message-ID: <45101DA5.9030104@gmu.edu> Jason Stajich wrote: > Forwarding your question to the bioperl mailing list where it can be > answered. Please consider making your BLAST file available one a > website - it is bad form to mail someone a very large email attachment > unsolicited. If you do not have access to a website where you can > make the report avaialble you can submit this a bug at > http://bugzilla.open-bio.org and attached a report to the bug. > > it is possible the PARACEL BLAST format is not parseable with latest > code - please let the list know what version of bioperl you are using > etc. Please test your question against the latest code in CVS as well > before you ask others to debug this problem. > > -jason > > On 9/19/06, *Lakshmi K Matukumalli* > wrote: > > Hi Jason, > > I am unable to get correct results with the following blast file. > My input sequence has a repeat element so it has a number of hits. > > But there is only one hit with complete identity across the length > of query. > > The Bioperl parser is not giving me the correct result for the > first hit > and first hsp. > > Can you please look into this and let me know if I am doing something > wrong or > if you have to fix the script. > > I am attaching the input file I used along with the script I used to > print out the top hit. 
> > Thank you, > > Lakshmi Kumar > > > > > use strict; > use Bio::SearchIO; > > my $searchio = Bio::SearchIO->new(-file => ' unmasked.blast', > -format => 'blast'); > > while ( my $result = $searchio->next_result() ) { > my $query_name = $result->query_name; > my ($str); > while( my $hit = $result->next_hit ) { > # process the Bio::Search::Hit::HitI object > while( my $hsp = $hit->next_hsp ) { > # process the Bio::Search::HSP::HSPI object > my ($qs,$qe,$hs,$he) = > ($hsp->query->start,$hsp->query->end,$hsp->subject->start,$hsp->subject->end); > > my ($chr,$Bts,$Bte) = ($hit->description =~ /Bos taurus > chromosome (.*)-FRAG\[(\d+)\,(\d+)\]/); > $str = $query_name."\t".$chr."\t".($Bts+$hs)."\t".($Bts+$he); > last; > } > last; > } > print $str,"\n"; > } > > > > BLASTN 1.5.4-Paracel [2003-06-05] > > [SNIP] > > > -- > Jason Stajich > jason at bioperl.org > http://www.duke.edu/~jes12/ Hi Jason, Thank you for forwarding the email. I apologize for sending the large attachment. I have placed the blast file here. http://mysite.verizon.net/lmatukum/blast/unmasked.blast You can right click and save the blast file. Thank you, Lakshmi Kumar From bix at sendu.me.uk Tue Sep 19 18:25:48 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Sep 2006 23:25:48 +0100 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <45105A7F.5010804@campus.iztacala.unam.mx> References: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> <45105A7F.5010804@campus.iztacala.unam.mx> Message-ID: <45106E6C.1000605@sendu.me.uk> Mauricio Herrera Cuadra wrote: > Chris Fields wrote: >> Sendu Bala wrote: > >>> Well, the issue here is that doing that would create a circular >>> dependency; Ensembl Perl API requires Bioperl. Do the various ways of >>> installing Bioperl not 'care' about circular dependencies? [snip] > My questions here would be: > > 1) Will your methods be fully functional/compatible with the current > Bioperl branch? 
What I try to say here is that in your particular case, > won't the use of the Ensembl Perl API introduce the need of older > Bioperl code? Something like a 'backwards dependency' at code level (the > arrows mean 'requires'): > > bioperl-1-5-2 method -> ensembl-40 method -> bioperl-1-2-3 method > > Here I'm assuming that you're prototyping this by using the latest > Bioperl and Ensembl versions from CVS (*almost* every developer lives on > the bleeding edge :)). Yes, so the method would work with latest bioperl and ensembl, no need for bioperl 1.2.3. > 2) Depending on the amount of code you will use from Ensembl, why > introducing its whole API into Bioperl? Maybe you can borrow only what > you need from Ensembl and give credit for that. I'm not really putting any of Ensembl in Bioperl, I'm just using it, in the same way methods are implemented using any other external module. It isn't feasible to extract the code that does the job from Ensembl, especially given that the underlying code could change on their next release; I need to make use of the API. > On the other hand, I support Chris' idea of adding your module into the > bioperl-ext or bioperl-run packages. To me it sounds good for avoiding a > circular dependency problem. Well, bioperl-ext is described as being for Bioperl C compiled extensions. I suppose that instead of being a simple method in my module, I could create a whole new module in bioperl-run that was essentially a simplified front-end to what I needed Ensembl for, treating it like an external application? I don't think either is really an appropriate 'fit'; what is wrong with simply not listing the Ensembl API as a dependency in Makefile.PL? Aren't there already optional things in Bioperl that only begin to work after you read the instructions and manually install something? Well, there must be, since I've had to do exactly that to get all tests in the suite to run (and not just skip). 
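
One common way to keep such a dependency optional is to load it at run time and degrade gracefully when it is missing, rather than listing it as a hard prerequisite. A minimal sketch in core Perl, assuming nothing about how Bioperl or Ensembl actually handle this: have_module and feature_using are hypothetical helper names, and File::Spec only stands in for the optional module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to load a module at run time; true if it loaded, false otherwise.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical feature that only works when its optional backend is installed.
sub feature_using {
    my ($backend) = @_;
    unless (have_module($backend)) {
        warn "$backend is not installed; skipping the optional feature\n";
        return;
    }
    return "feature enabled via $backend";
}

# File::Spec ships with Perl; the second name is assumed not to exist.
for my $module ('File::Spec', 'No::Such::Module::Hopefully') {
    my $result = feature_using($module);
    my $status = defined $result ? $result : "feature disabled";
    print "$status\n";
}
```

With this shape the installer never needs to know about the optional module, so no circular prerequisite is declared; the cost is that the feature fails only when it is first called.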
From simona_bazzocchi at yahoo.it Tue Sep 19 18:54:53 2006
From: simona_bazzocchi at yahoo.it (simona bazzocchi)
Date: Wed, 20 Sep 2006 00:54:53 +0200 (CEST)
Subject: [Bioperl-l] FASTA parsing
Message-ID: <20060919225453.72934.qmail@web86809.mail.ukl.yahoo.com>

Hi,
I have a problem. I need to filter just the query sequences that match with the first 4 bases vs library sequences. I used FASTA to get the output (FASTA_RESULTS104.txt) using my perl script below, but now I don't know how I can filter that output. I read the Bio::SearchIO manual but it's not so clear to me.

Do you have any idea?
Thank you very much
Simona

#!/usr/bin/perl -w
use strict;

my $location="/home/kei/fasta/fasta34";
my $library="DNAUS.txt";
my $query="mature2.txt";
my $ktup=1;
my $options="-r +5/-4 -H -w 100 -m 9 -f -12 -g -4 -q";
my $command="$location $options $query $library $ktup";

# Run FASTA and capture everything it prints
my $output = `$command`;

# Append the captured output to the results file
open(FASTA,">>","FASTA_RESULTS104.txt") or die "Cannot open FASTA_RESULTS104.txt: $!";
print FASTA $output;
close (FASTA);

From cjfields at uiuc.edu Tue Sep 19 19:35:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Sep 2006 18:35:32 -0500
Subject: [Bioperl-l] Blast Parser Error
In-Reply-To: <45101DA5.9030104@gmu.edu>
References: <451006BE.1090504@gmu.edu> <8273f6c20609190913oee9c992g3e7312a31366c949@mail.gmail.com> <45101DA5.9030104@gmu.edu>
Message-ID:

This is related to a bug that I'm working on (bug 1986 for the curious). There are problems with certain BLAST reports where the order of the hit alignments doesn't match the order in the hit table. Compounding that, the built-in event handler sorts repetitive hits, which throws the hit order off.
To solve this, change the event handler:

use Bio::SearchIO::SearchResultEventBuilder;

my $searchio = Bio::SearchIO->new(-file => 'unmasked.blast',
                                  -format => 'blast');

$searchio->attach_EventHandler
    (Bio::SearchIO::SearchResultEventBuilder->new());

When I tried this on your report the hit changes to the first one. Thank Jason S. for pointing this one out. I plan on changing the event handler to prevent this default behavior.

Chris

On Sep 19, 2006, at 11:41 AM, Lakshmi K Matukumalli wrote:

> Jason Stajich wrote:
>> Forwarding your question to the bioperl mailing list where it can be
>> answered. Please consider making your BLAST file available one a
>> website - it is bad form to mail someone a very large email attachment
>> unsolicited. If you do not have access to a website where you can
>> make the report avaialble you can submit this a bug at
>> http://bugzilla.open-bio.org and attached a report to the bug.
>>
>> it is possible the PARACEL BLAST format is not parseable with latest
>> code - please let the list know what version of bioperl you are using
>> etc. Please test your question against the latest code in CVS as well
>> before you ask others to debug this problem.
>>
>> -jason
>>
>> On 9/19/06, *Lakshmi K Matukumalli* wrote:
>>
>> Hi Jason,
>>
>> I am unable to get correct results with the following blast file.
>> My input sequence has a repeat element so it has a number of hits.
>>
>> But there is only one hit with complete identity across the length
>> of query.
>>
>> The Bioperl parser is not giving me the correct result for the
>> first hit and first hsp.
>>
>> Can you please look into this and let me know if I am doing something
>> wrong or if you have to fix the script.
>>
>> I am attaching the input file I used along with the script I used to
>> print out the top hit.
>> >> Thank you, >> >> Lakshmi Kumar >> >> >> >> >> use strict; >> use Bio::SearchIO; >> >> my $searchio = Bio::SearchIO->new(-file => ' unmasked.blast', >> -format => 'blast'); >> >> while ( my $result = $searchio->next_result() ) { >> my $query_name = $result->query_name; >> my ($str); >> while( my $hit = $result->next_hit ) { >> # process the Bio::Search::Hit::HitI object >> while( my $hsp = $hit->next_hsp ) { >> # process the Bio::Search::HSP::HSPI object >> my ($qs,$qe,$hs,$he) = >> ($hsp->query->start,$hsp->query->end,$hsp->subject->start,$hsp- >> >subject->end); >> >> my ($chr,$Bts,$Bte) = ($hit->description =~ /Bos taurus >> chromosome (.*)-FRAG\[(\d+)\,(\d+)\]/); >> $str = $query_name."\t".$chr."\t".($Bts+$hs)."\t".($Bts+ >> $he); >> last; >> } >> last; >> } >> print $str,"\n"; >> } >> >> >> >> BLASTN 1.5.4-Paracel [2003-06-05] >> >> [SNIP] >> >> >> -- >> Jason Stajich >> jason at bioperl.org >> http://www.duke.edu/~jes12/ > > Hi Jason, > > Thank you for forwarding the email. I apologize for sending the large > attachment. > > I have placed the blast file here. > > http://mysite.verizon.net/lmatukum/blast/unmasked.blast > > You can right click and save the blast file. > > Thank you, > > Lakshmi Kumar > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Sep 19 19:47:41 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Sep 2006 18:47:41 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <45106E6C.1000605@sendu.me.uk> References: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> <45105A7F.5010804@campus.iztacala.unam.mx> <45106E6C.1000605@sendu.me.uk> Message-ID: On Sep 19, 2006, at 5:25 PM, Sendu Bala wrote: ... 
>> On the other hand, I support Chris' idea of adding your module >> into the >> bioperl-ext or bioperl-run packages. To me it sounds good for >> avoiding a >> circular dependency problem. > > Well, bioperl-ext is described as being for Bioperl C compiled > extensions. I suppose that instead of being a simple method in my > module, I could create a whole new module in bioperl-run that was > essentially a simplified front-end to what I needed Ensembl for, > treating it like an external application? > > I don't think either is really an appropriate 'fit'; what is wrong > with > simply not listing the Ensembl API as a dependency in Makefile.PL? I think there are a few modules in bioperl-run which don't necessarily fit. Regardless, it's up to you. > Aren't there already optional things in Bioperl that only begin to > work > after you read the instructions and manually install something? Well, > there must be, since I've had to do exactly that to get all tests > in the > suite to run (and not just skip). There shouldn't be! All dependencies should be found in the Makefile.PL and listed in the INSTALL file dependencies. Using 'perl Makefile.PL' doesn't force you to install them, but it does warn you what Bioperl classes require them if they aren't present. I think the large dependency list is the reason there is a separate Bundle::Bioperl installation. And, even then, I don't get abi.t and other similar tests to work b/c they require bioperl-ext (which I find too much of a bother to install, really). Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Tue Sep 19 21:13:43 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 20 Sep 2006 11:13:43 +1000 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: Message-ID: <451095C7.3020905@infotech.monash.edu.au> Mark, I have added example output files for all 4 flavours of Glimmer to bioperl-live CVS as t/data/Glimmer*, described below: > The initial checkin comments (circa '03) for Bio::Tools::Glimmer > describe it as a 'GlimmerM 3.0' parser. The POD says '...a module for > parsing Glimmer predictions (currently GlimmerM > 3.0 is all that has been tested)...'. However, the latest version of > GlimmerM looks to be 2.5.1 (ftp://ftp.tigr.org/pub/software/GlimmerM), > and there are multiple versions/flavors of Glimmer besides GlimmerM: > > Glimmer 2.X ( bacteria, archaea, and viruses): > http://www.cbcb.umd.edu/software/glimmer/glimmer2.jun01.shtml A single two part output file. The first part has detailed information regarding all ORFs, while the second part has the putative genes. http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/t/data/Glimmer2.out?cvsroot=bioperl > Glimmer 3.X ( bacteria, archaea, and viruses): > http://www.cbcb.umd.edu/software/glimmer/ Glimmer3 produces two separate files: XXX.detail and XXX.predict. The Glimmer3 .detail file is similar to the first part of the Glimmer 2.x first part. The Glimmer3 .predict file conveys the same information as the second part of a Glimmer2 file, but in a totally different format! http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/t/data/Glimmer2.detail?cvsroot=bioperl http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/t/data/Glimmer2.predict?cvsroot=bioperl > GlimmerM ( eukaryotes ): > http://www.cbcb.umd.edu/software/glimmerm/index.shtml > http://www.tigr.org/software/glimmerm/ I used GlimmerM 2.5.1. 
The output matches the original "t/data/glimmer.out" test file in CVS. http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/t/data/GlimmerM.out?cvsroot=bioperl > GlimmerHMM ( eukaryotes ): > http://www.cbcb.umd.edu/software/GlimmerHMM/ This format is nearly identical to GlimmerM, only the first line header is different. I used version 2.2.0. http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/t/data/GlimmerHMM.out?cvsroot=bioperl > I suspect Bio::Tools::Glimmer only parses GlimmerM, *maybe* > GlimmerHMM, but not Glimmer 2.X or Glimmer 3.X. It doesn't currently work with my GlimmerHMM output, as the module expects a version number, which my output does not have - but I will fix that in CVS today. However it won't work with Glimmer 2.x and 3.x. And it probably shouldn't as the Eukaryotic stuff isn't relevant. New code has to be written. Most people only want the final gene predictions, which 2.x and 3.x use different formats and files for. I'm not sure whether to 1. parse them all under the same module, perhaps with a -format=>'glimmerXXX' parameter 2. create a single new module Glimmer2 and Glimmer3 3. create two new modules, one for Glimmer2 and one for Glimmer3, given they are different outputs both in syntax and number of output files Any advice from Bioperl 'old timers' appreciated ;-) -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From torsten.seemann at infotech.monash.edu.au Tue Sep 19 19:44:45 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 20 Sep 2006 09:44:45 +1000 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: Message-ID: <451080ED.5000908@infotech.monash.edu.au> Mark, > The initial checkin comments (circa '03) for Bio::Tools::Glimmer > describe it as a 'GlimmerM 3.0' parser. The POD says '...a module for > parsing Glimmer predictions (currently GlimmerM > 3.0 is all that has been tested)...'. 
However, the latest version of > GlimmerM looks to be 2.5.1 (ftp://ftp.tigr.org/pub/software/GlimmerM), > and there are multiple versions/flavors of Glimmer besides GlimmerM: > > Glimmer 2.X ( bacteria, archaea, and viruses): > http://www.cbcb.umd.edu/software/glimmer/glimmer2.jun01.shtml > Glimmer 3.X ( bacteria, archaea, and viruses): > http://www.cbcb.umd.edu/software/glimmer/ > GlimmerHMM ( eukaryotes ): > http://www.cbcb.umd.edu/software/GlimmerHMM/ > GlimmerM ( eukaryotes ): > http://www.cbcb.umd.edu/software/glimmerm/index.shtml > http://www.tigr.org/software/glimmerm/ > > I suspect Bio::Tools::Glimmer only parses GlimmerM, *maybe* > GlimmerHMM, but not Glimmer 2.X or Glimmer 3.X. I also noticed this last year some time but never got around to deducing the various flavours, and forgot about it... :-) I work with Bacteria and have only used "Glimmer 2.x" and "Glimmer 3.x". I will try and produce output files for all four variants today, then we can nut out a improvement path for Bio::Tools::Glimmer. -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From arareko at campus.iztacala.unam.mx Wed Sep 20 00:32:17 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Tue, 19 Sep 2006 23:32:17 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: References: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> <45105A7F.5010804@campus.iztacala.unam.mx> <45106E6C.1000605@sendu.me.uk> Message-ID: <4510C451.6010400@campus.iztacala.unam.mx> Chris Fields wrote: > On Sep 19, 2006, at 5:25 PM, Sendu Bala wrote: [snip] >> Aren't there already optional things in Bioperl that only begin to >> work >> after you read the instructions and manually install something? Well, >> there must be, since I've had to do exactly that to get all tests >> in the >> suite to run (and not just skip). > > There shouldn't be! 
All dependencies should be found in the > Makefile.PL and listed in the INSTALL file dependencies. Using 'perl > Makefile.PL' doesn't force you to install them, but it does warn you > what Bioperl classes require them if they aren't present. There shouldn't be, but they actually happen to exist. An example of this is the use of Regexp::Common in the bioperl-live/maintenance/check_URLs.pl script. Even though this script is supposed to be used only by developers, it introduces the condition that Sendu describes. Other examples are version and Class::Inspector which are used by the Deobfuscator. These 3 dependencies haven't been added to the main Makefile.PL due to the intended use of the scripts that require them. > I think the large dependency list is the reason there is a separate > Bundle::Bioperl installation. And, even then, I don't get abi.t and > other similar tests to work b/c they require bioperl-ext (which I > find too much of a bother to install, really). > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From torsten.seemann at infotech.monash.edu.au Wed Sep 20 00:38:18 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 20 Sep 2006 14:38:18 +1000 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: References: <001001c6dbf4$65437260$15327e82@pyrimidine> Message-ID: <4510C5BA.9080408@infotech.monash.edu.au> David, > From running the deobfuscator on bioperl-live, I've found some > formatting inconsistencies in the POD of some modules which are hard > for the deobfuscator to parse. I've corrected some of these in my > working (checked-out) copy of bioperl-live. Please check them in. I have written a script to audit all Perl modules to ensure the NAME in the POD matches the Perl module and has the correct capitalization. % cd bioperl-live/maintenance % ./check_NAMEs.pl Here are the current buggy ones, before your commits AFAIK.
Bio::IdCollectionI Bio::Location::SplitLocationI Bio::Search::Result::PullResultI Bio::Search::HSP::HmmpfamHSP Bio::Search::Hit::HmmpfamHit Bio::Expression::FeatureGroup::FeatureGroupMas50 Bio::Expression::FeatureSet::FeatureSetMas50 Bio::DB::EUtilities::Cookie Bio::DB::GFF::Adaptor::berkeleydb Bio::DB::GFF::Adaptor::biofetch_oracle Bio::DB::GFF::Adaptor::dbi::pg_fts Bio::DB::GFF::Adaptor::memory::feature_serializer Bio::DB::SeqFeature::Store::bdb Bio::DB::SeqFeature::Store::DBI::Iterator Bio::Matrix::PSM::SiteMatrixI Bio::Matrix::PSM::ProtPsm Bio::Matrix::PSM::IO::psiblast Bio::Matrix::PSM::IO::transfac Bio::Matrix::PSM::IO::meme Bio::Graphics::Util Bio::Graphics::Glyph::ex Bio::Graphics::Glyph::three_letters Bio::Graphics::Glyph::arrow Bio::Graphics::Glyph::ruler_arrow Bio::Graphics::Glyph::flag Bio::Phenotype::Measure Bio::Phenotype::PhenotypeI Bio::Phenotype::Phenotype Bio::Phenotype::Correlate Bio::Phenotype::OMIM::MiniMIMentry Bio::Phenotype::OMIM::OMIMparser Bio::Phenotype::OMIM::OMIMentry Bio::Phenotype::OMIM::OMIMentryAllelicVariant Bio::SeqIO::qual Bio::SeqIO::genbank Bio::OntologyIO::simplehierarchy Bio::OntologyIO::InterProParser Bio::OntologyIO::soflat Bio::OntologyIO::dagflat Bio::OntologyIO::obo Bio::OntologyIO::goflat Bio::OntologyIO::Handlers::InterProHandler Bio::Tools::pICalculator Bio::Tools::ECnumber Bio::SeqFeature::Gene::GeneStructureI Bio::SeqFeature::Gene::Poly_A_site Bio::Seq::SeqFastaSpeedFactory Bio::Ontology::RelationshipI Bio::Ontology::TermI Bio::Ontology::InterProTerm Bio::Ontology::RelationshipType Bio::Ontology::OBOterm Bio::Ontology::OBOEngine Bio::Ontology::PathI Bio::Ontology::OntologyEngineI Bio::Ontology::Path Bio::Ontology::GOterm Bio::Ontology::Term Bio::Ontology::SimpleGOEngine Bio::Ontology::Relationship Bio::Ontology::SimpleGOEngine::GraphAdaptor Bio::Ontology::SimpleGOEngine::GraphAdaptor02 -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From 
cjfields at uiuc.edu Wed Sep 20 00:44:02 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Sep 2006 23:44:02 -0500 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <4510C5BA.9080408@infotech.monash.edu.au> References: <001001c6dbf4$65437260$15327e82@pyrimidine> <4510C5BA.9080408@infotech.monash.edu.au> Message-ID: Ouch! That many? Chris On Sep 19, 2006, at 11:38 PM, Torsten Seemann wrote: > David, > >> From running the deobfuscator on bioperl-live, I've found some >> formatting inconsistencies in the POD of some modules which are >> hard for the deobfuscator to parse. I've corrected some of these >> in my working (checked-out) copy of bioperl-live. > > Please check them in. > > I have written a script to audit all Perl modules to ensure the > NAME in the POD matches the Perl module and has the correct > capitilization. > > % cd bioperl-live/maintenance > % ./check_NAMEs.pl > > Here's the current buggy ones, before your commits AFAIK. > > Bio::IdCollectionI > Bio::Location::SplitLocationI > Bio::Search::Result::PullResultI > Bio::Search::HSP::HmmpfamHSP > Bio::Search::Hit::HmmpfamHit > Bio::Expression::FeatureGroup::FeatureGroupMas50 > Bio::Expression::FeatureSet::FeatureSetMas50 > Bio::DB::EUtilities::Cookie > Bio::DB::GFF::Adaptor::berkeleydb > Bio::DB::GFF::Adaptor::biofetch_oracle > Bio::DB::GFF::Adaptor::dbi::pg_fts > Bio::DB::GFF::Adaptor::memory::feature_serializer > Bio::DB::SeqFeature::Store::bdb > Bio::DB::SeqFeature::Store::DBI::Iterator > Bio::Matrix::PSM::SiteMatrixI > Bio::Matrix::PSM::ProtPsm > Bio::Matrix::PSM::IO::psiblast > Bio::Matrix::PSM::IO::transfac > Bio::Matrix::PSM::IO::meme > Bio::Graphics::Util > Bio::Graphics::Glyph::ex > Bio::Graphics::Glyph::three_letters > Bio::Graphics::Glyph::arrow > Bio::Graphics::Glyph::ruler_arrow > Bio::Graphics::Glyph::flag > Bio::Phenotype::Measure > Bio::Phenotype::PhenotypeI > Bio::Phenotype::Phenotype > Bio::Phenotype::Correlate > 
Bio::Phenotype::OMIM::MiniMIMentry > Bio::Phenotype::OMIM::OMIMparser > Bio::Phenotype::OMIM::OMIMentry > Bio::Phenotype::OMIM::OMIMentryAllelicVariant > Bio::SeqIO::qual > Bio::SeqIO::genbank > Bio::OntologyIO::simplehierarchy > Bio::OntologyIO::InterProParser > Bio::OntologyIO::soflat > Bio::OntologyIO::dagflat > Bio::OntologyIO::obo > Bio::OntologyIO::goflat > Bio::OntologyIO::Handlers::InterProHandler > Bio::Tools::pICalculator > Bio::Tools::ECnumber > Bio::SeqFeature::Gene::GeneStructureI > Bio::SeqFeature::Gene::Poly_A_site > Bio::Seq::SeqFastaSpeedFactory > Bio::Ontology::RelationshipI > Bio::Ontology::TermI > Bio::Ontology::InterProTerm > Bio::Ontology::RelationshipType > Bio::Ontology::OBOterm > Bio::Ontology::OBOEngine > Bio::Ontology::PathI > Bio::Ontology::OntologyEngineI > Bio::Ontology::Path > Bio::Ontology::GOterm > Bio::Ontology::Term > Bio::Ontology::SimpleGOEngine > Bio::Ontology::Relationship > Bio::Ontology::SimpleGOEngine::GraphAdaptor > Bio::Ontology::SimpleGOEngine::GraphAdaptor02 > > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Sep 20 00:48:36 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Sep 2006 23:48:36 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <4510C451.6010400@campus.iztacala.unam.mx> References: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> <45105A7F.5010804@campus.iztacala.unam.mx> <45106E6C.1000605@sendu.me.uk> <4510C451.6010400@campus.iztacala.unam.mx> Message-ID: On Sep 19, 2006, at 11:32 PM, Mauricio Herrera Cuadra wrote: .. >> There shouldn't be! All dependencies should be found in the >> Makefile.PL and listed in the INSTALL file dependencies. 
Using >> 'perl Makefile.PL' doesn't force you to install them, but it does >> warn you what Bioperl classes require them if they aren't present. > > There shouldn't be, but they actually happen to exist. An example > of this is the use of Regexp::Common in the bioperl-live/ > maintenance/check_URLs.pl script. Even though this script is > supposed to be used only by developers, it introduces the condition > that Sendu describes. > > Other examples are version and Class::Inspector which are used by > the Deobfuscator. These 3 dependencies haven't been added to the > main Makefile.PL due to the intended use of the scripts that > require them. > ... ...which makes sense. For those scripts we could add README files stating their use and requirements, or run eval(require) blocks to catch and throw if a required module isn't present. We should probably be stricter with regards to the core modules ('core' being anything that resides in the Bio namespace in the distribution). > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Gen?tica > Unidad de Morfofisiolog?a y Funci?n > Facultad de Estudios Superiores Iztacala, UNAM > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Wed Sep 20 05:26:51 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Sep 2006 10:26:51 +0100 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <450F3537.5060605@infotech.monash.edu.au> References: <450F3537.5060605@infotech.monash.edu.au> Message-ID: <4511095B.6080002@sendu.me.uk> Torsten Seemann wrote: > > "return undef;" => "return;" # return undef intentional? 
> > Bio/DB/Biblio/pdf.pm: return undef; > Bio/DB/Biblio/pdf.pm: return undef unless $link; > Bio/DB/Biblio/pdf.pm: return undef; > Bio/DB/Biblio/eutils.pm: return undef; > Bio/DB/WebDBSeqI.pm: return undef if ( !defined $self->ua || !defin > Bio/Tools/Run/RemoteBlast.pm: return undef if ( !defined $self->ua > Bio/FeatureIO/gff.pm: return undef if $self->fasta_mode(); > Bio/FeatureIO/gff.pm: # be graceful about empty lines or comments, an > Bio/FeatureIO/gff.pm:will return undef if not all features in the stre > Bio/Root/IOManager.pm: return undef unless -e $file; > Bio/Root/Object.pm: return undef unless defined $self->{'_err'}; I've changed all those; none of them seemed intentional. > "die" => "$self->throw" # use Bio::Perl exception handling > > Bio/Variation/IO.pm: $format2 = shift || die "Usage: reformat forma > Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined( $description ) ) > Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined( $mutation ) ) { > Bio/LiveSeq/Chain.pm: die "_praepostinsert_array: Something went ve > Bio/Tools/isPcr.pm: my $seq = $seqio->next_seq || die("cannot get a > Bio/Tools/Analysis/DNA/ESEfinder.pm: die "Could not get a result" > Bio/Tools/Analysis/Protein/NetPhos.pm: die "Could not get a result" u > Bio/Tools/Analysis/Protein/Mitoprot.pm: die "Could not get a result" > Bio/Tools/Analysis/Protein/Scansite.pm: die "Could not get a result" > Bio/Tools/dpAlign.pm: die("\nThe C-compiled engine for Smith Wa > Bio/Tools/ipcress.pm: my $seq = $seqio->next_seq || die("cannot get > Bio/Tools/EPCR.pm: my $seq = $seqio->next_seq || die("cannot get a > Bio/Tools/HMM.pm: die("\nThe C-compiled engine for Hidden Marko > Bio/Seq/PrimedSeq.pm: my $file = shift || die "need a file to rea > Bio/Seq/PrimedSeq.pm: my $file = shift || die "$0 "; Of these, only Variation/IO.pm and Phenotype/OMIM/OMIMparser.pm needed changing (most are in the POD). 
I haven't investigated Lincoln's modules: > Bio/DB/SeqFeature/Store/DBI/mysql.pm: $db->store($feature) or die "Co > Bio/DB/SeqFeature/Store/berkeleydb.pm: $db->store($feature) or die > Bio/DB/SeqFeature/Store.pm: $db->store($feature) or die "Couldn't sto > Bio/Graphics/Glyph.pm: my $feature = $arg{-feature} or die "No featur > Bio/Graphics/Glyph/image.pm: open F,$path or die "Can't open $path: > Bio/Graphics/Panel.pm: open (F,">$imagefile") || die("Can't open imag From cjfields at uiuc.edu Wed Sep 20 07:48:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 06:48:13 -0500 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <4511095B.6080002@sendu.me.uk> References: <450F3537.5060605@infotech.monash.edu.au> <4511095B.6080002@sendu.me.uk> Message-ID: <019AC52D-F5B8-4D2D-8D0D-F5598FE29878@uiuc.edu> If you can, please update the Code optimization page so we can keep track. http://www.bioperl.org/wiki/BioPerl_code_optimization Thanks Sendu! Chris On Sep 20, 2006, at 4:26 AM, Sendu Bala wrote: > Torsten Seemann wrote: >> >> "return undef;" => "return;" # return undef intentional? >> >> Bio/DB/Biblio/pdf.pm: return undef; >> Bio/DB/Biblio/pdf.pm: return undef unless $link; >> Bio/DB/Biblio/pdf.pm: return undef; >> Bio/DB/Biblio/eutils.pm: return undef; >> Bio/DB/WebDBSeqI.pm: return undef if ( !defined $self->ua || ! >> defin >> Bio/Tools/Run/RemoteBlast.pm: return undef if ( !defined $self->ua >> Bio/FeatureIO/gff.pm: return undef if $self->fasta_mode(); >> Bio/FeatureIO/gff.pm: # be graceful about empty lines or >> comments, an >> Bio/FeatureIO/gff.pm:will return undef if not all features in the >> stre >> Bio/Root/IOManager.pm: return undef unless -e $file; >> Bio/Root/Object.pm: return undef unless defined $self->{'_err'}; > > I've changed all those; none of them seemed intentional. 
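For readers wondering why the audit flags "return undef;" at all, here is a minimal sketch of the difference. The two subs are made up for illustration and are not taken from any BioPerl module. In list context "return undef;" hands back a one-element list, which counts as true when the caller tests the array, while a bare "return;" hands back an empty list.

```perl
use strict;
use warnings;

# Hypothetical lookups that always fail; names are for illustration only.
sub find_with_undef { return undef; }   # explicit undef
sub find_with_bare  { return; }         # bare return

my @a = find_with_undef();   # list context: (undef), one element
my @b = find_with_bare();    # list context: (), no elements

printf "return undef gives %d element(s)\n", scalar @a;
printf "bare return gives %d element(s)\n",  scalar @b;

# In scalar context both forms yield undef, so scalar callers see no change.
my $s = find_with_bare();
print "scalar result is ", (defined $s ? "defined" : "undef"), "\n";
```

A caller writing "if (my @hits = find(...))" would treat the first form as a hit, which is presumably the kind of accident the audit is meant to catch.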
> > >> "die" => "$self->throw" # use Bio::Perl exception handling >> >> Bio/Variation/IO.pm: $format2 = shift || die "Usage: reformat >> forma >> Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined( $description ) ) >> Bio/Phenotype/OMIM/OMIMparser.pm: if ( ! defined >> ( $mutation ) ) { >> Bio/LiveSeq/Chain.pm: die "_praepostinsert_array: Something >> went ve >> Bio/Tools/isPcr.pm: my $seq = $seqio->next_seq || die("cannot >> get a >> Bio/Tools/Analysis/DNA/ESEfinder.pm: die "Could not get a result" >> Bio/Tools/Analysis/Protein/NetPhos.pm: die "Could not get a >> result" u >> Bio/Tools/Analysis/Protein/Mitoprot.pm: die "Could not get a result" >> Bio/Tools/Analysis/Protein/Scansite.pm: die "Could not get a result" >> Bio/Tools/dpAlign.pm: die("\nThe C-compiled engine for >> Smith Wa >> Bio/Tools/ipcress.pm: my $seq = $seqio->next_seq || die("cannot >> get >> Bio/Tools/EPCR.pm: my $seq = $seqio->next_seq || die("cannot get a >> Bio/Tools/HMM.pm: die("\nThe C-compiled engine for Hidden >> Marko >> Bio/Seq/PrimedSeq.pm: my $file = shift || die "need a file to >> rea >> Bio/Seq/PrimedSeq.pm: my $file = shift || die "$0 "; > > Of these, only Variation/IO.pm and Phenotype/OMIM/OMIMparser.pm needed > changing (most are in the POD). I haven't investigated Lincoln's > modules: > >> Bio/DB/SeqFeature/Store/DBI/mysql.pm: $db->store($feature) or die >> "Co >> Bio/DB/SeqFeature/Store/berkeleydb.pm: $db->store($feature) or die >> Bio/DB/SeqFeature/Store.pm: $db->store($feature) or die "Couldn't >> sto >> Bio/Graphics/Glyph.pm: my $feature = $arg{-feature} or die "No >> featur >> Bio/Graphics/Glyph/image.pm: open F,$path or die "Can't open >> $path: >> Bio/Graphics/Panel.pm: open (F,">$imagefile") || die("Can't open >> imag > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Wed Sep 20 10:05:32 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Sep 2006 15:05:32 +0100 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <4510C5BA.9080408@infotech.monash.edu.au> References: <001001c6dbf4$65437260$15327e82@pyrimidine> <4510C5BA.9080408@infotech.monash.edu.au> Message-ID: <45114AAC.90302@sendu.me.uk> Torsten Seemann wrote: > David, > >> From running the deobfuscator on bioperl-live, I've found some >> formatting inconsistencies in the POD of some modules which are hard >> for the deobfuscator to parse. I've corrected some of these in my >> working (checked-out) copy of bioperl-live. > > Please check them in. > > I have written a script to audit all Perl modules to ensure the NAME in > the POD matches the Perl module and has the correct capitilization. > > % cd bioperl-live/maintenance > % ./check_NAMEs.pl > > Here's the current buggy ones, before your commits AFAIK. [snip] David, have you/will you fix all of those ones? I can do my own and any others you haven't covered, so let me know. From kak28 at cam.ac.uk Wed Sep 20 09:49:49 2006 From: kak28 at cam.ac.uk (Krys Kelly) Date: Wed, 20 Sep 2006 14:49:49 +0100 Subject: [Bioperl-l] Conversion of EMBL format flat file to GFF format flat file Message-ID: I am completely new to Bioperl. I would like to convert an EMBL flat file into a GFF flat file. To make sure I could get a perl script to work I have used the following script to read in my EMBL file and write it out as FASTA: my $in = Bio::SeqIO->new(-file => "Toxo1b_080605_test.embl", -format => 'EMBL'); my $out = Bio::SeqIO->new(-file => ">ChrIb_new.fasta", -format => 'fasta'); while ( my $seq = $in->next_seq() ) {$out->write_seq($seq); } But the list of formats available (http://bioperl.open-bio.org/wiki/HOWTO:SeqIO#Formats ) does not contain gff. 
I have tried searching the documentation and the mail archives, but I have not found anything that would help me. Are there any existing bioperl modules for this conversion? I would be grateful for any help. Thanks Krys Dr Krystyna A Kelly (Krys) Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, 01223 333331, kak28 at cam.ac.uk and MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, 01223 767408, krystyna.kelly at mrc-bsu.cam.ac.uk From m.campitelli at repubblica.it Wed Sep 20 03:35:45 2006 From: m.campitelli at repubblica.it (Massimo Campitelli) Date: Wed, 20 Sep 2006 07:35:45 -0000 Subject: [Bioperl-l] Installation Failed Tests Message-ID: <878AAE1E-CCDA-46F4-95F7-D44F5E12EA28@repubblica.it> To the kind attention of Prof. Simon Wagstaff. Rome, 21 July 2006. Il Venerdì di Repubblica, the weekly supplement of the daily la Repubblica in Rome, Italy, needs a picture of Professor Simon Wagstaff to publish in our magazine for an article about research on snake venom. Our e-mail is fotoven at repubblica.it. Can you help us at short notice? Best regards, Massimo Campitelli, Il Venerdì di Repubblica From cjfields at uiuc.edu Wed Sep 20 10:52:28 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 09:52:28 -0500 Subject: [Bioperl-l] The results of your email commands In-Reply-To: <29AC1A3F62AAF54BA71E367C6D62CEB096C4FB@alpha.ioc.fiocruz.br> Message-ID: <002301c6dcc4$64d24380$15327e82@pyrimidine> It looks like no tree object is used as input for TreeIO::newick, which Phylo::Consense apparently needs (makes sense, as the command-line version also needs this). Strange that there isn't an error thrown, but you're using an older version of bioperl, so it may be fixed (I noticed that the line the error corresponds to for my local CVS copy of bioperl doesn't match up, so changes were obviously made at some point). This could occur at pretty much any step along your pipeline.
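One way to narrow this down might be a small guard run on the list of trees just before it goes to consense. The sketch below is only an illustration: bad_entries is a hypothetical helper, not part of BioPerl, and the stand-in objects exist only so that it runs without BioPerl installed. The class name Bio::TreeI is taken from the "Expected a Bio::TreeI object" message in the quoted exception.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: returns the indices of entries that are undefined
# or are not objects of the expected class.
sub bad_entries {
    my ($items, $class) = @_;
    my @bad;
    for my $i (0 .. $#{$items}) {
        my $item = $items->[$i];
        push @bad, $i unless blessed($item) && $item->isa($class);
    }
    return @bad;
}

# Stand-in data: one object blessed into the expected class, one undef,
# and one plain string.
my @trees = ( bless({}, 'Bio::TreeI'), undef, 'not a tree' );
my @bad   = bad_entries(\@trees, 'Bio::TreeI');
print "Entries that are not Bio::TreeI objects: @bad\n";
```

In the quoted pipeline the call would be something like bad_entries(\@tree, 'Bio::TreeI') right before $consense_factory->run(\@tree). A non-empty result would point at the tree-building step rather than at consense itself.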
Have you tried checking the output data from each step to make sure it's working correctly? I also noticed you have differing versions of Bioperl (1.5.0) and bioperl-run (1.4). You should always try to maintain the same versions for both to make sure they are compatible with one another. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kary Ann Del Carmen Soriano Ocana > Sent: Tuesday, September 19, 2006 12:24 PM > To: bioperl-l at lists.open-bio.org > Cc: karyanna at yahoo.com > Subject: [Bioperl-l] The results of your email commands > > Dear all, > I need help with a consense module. I am using bioperl to run clustalw, > seqboot, protdist(that are goingo to change by puzzle), weighbor and > consense, but for some reason I get this message: > MESSAGE 1 > Can't call method "get_root_node" on an undefined value at > /usr/local/bioperl-1.5.0/Bio/TreeIO/newick.pm line 236. > MESSAGE 2 > ------------- EXCEPTION ------------- > MSG: Expected a Bio::TreeI object > STACK Bio::Tools::Run::Phylo::Phylip::Consense::_setinput > /usr/local/bioperl-run-1.4/Bio/Tools/Run/Phylo/Phylip/Consense.pm:448 > STACK Bio::Tools::Run::Phylo::Phylip::Consense::run > /usr/local/bioperl-run-1.4/Bio/Tools/Run/Phylo/Phylip/Consense.pm:331 > STACK main::makePhylogenyPipeline filogenia_pipeline.pl:120 > STACK main::main filogenia_pipeline.pl:61 > STACK toplevel filogenia_pipeline.pl:42 > -------------------------------------- > > I have no idea where to get started to try to solve this. Here is my > object with the principal reference. > Thank you for the help. 
> Regards > Kary > > ########################################################################## > ###### > > > #!/usr/bin/perl -w > use lib "/usr/local/bioperl-1.5.0"; > use lib "/usr/local/bioperl-run-1.4"; > use Bio::Tools::Run::Alignment::Clustalw; > use Bio::Tools::Run::Phylo::Phylip::SeqBoot; > use Bio::Tools::Run::Phylo::Phylip::ProtDist; > use Bio::Tools::Run::Phylo::Phylip::Neighbor; > use Bio::Tools::Run::AnalysisFactory::Pise; > use Bio::Tools::Run::Phylo::Phylip::Consense; > use Bio::Tools::Run::Phylo::Phylip::DrawTree; > use Bio::AlignIO; > use Bio::SimpleAlign; > use strict; > > ################## dir ######################### > > sub makePhylogenyPipeline{ > my $dirin_mafft = $_[0]; > my $length_weighbor = $_[1]; > my $inputfilename = ""; > > &makeInvariantAwk($length_weighbor); > > open (READDIRMOD, "find $dirin_mafft |") or die "Cannot open > $dirin_mafft: $!"; > > while ($inputfilename = <READDIRMOD>){ > for ($inputfilename =~ /\.mafft$/) { > $inputfilename =~ s/\n//; > #Create a SimpleAlign object > my @params_align = ( 'ktuple' => 2, > 'matrix' => 'BLOSUM', > 'output' => 'PHYLIP', > 'outfile' => $inputfilename.'.phy'); > > my $factory = Bio::Tools::Run::Alignment::Clustalw- > >new(@params_align); > my $aln = $factory->align($inputfilename); # > $aln is a SimpleAlign object.
> > #Use seqboot to generate bootstrap alignments > my @params = ( 'datatype'=>'SEQUENCE', > # 'replicates'=>1000); > 'replicates'=>1); > my $seqboot_factory = Bio::Tools::Run::Phylo::Phylip::SeqBoot- > >new(@params); > my $aln_ref = $seqboot_factory->run($aln); > > #next build distance matrices > my @params_protdist = ('MODEL' => 'PAM'); > my $protdist_factory = Bio::Tools::Run::Phylo::Phylip::ProtDist- > >new(@params_protdist); > > #next construct trees > #Build a Pise factory > my $weighbor_factory = new Bio::Tools::Run::AnalysisFactory::Pise(); > > #Then create an application object > (Pise::Run::Tools::PiseApplication): > my $weighbor = $weighbor_factory->program('weighbor'); > > my @tree; > > foreach my $a (@{$aln_ref}){ > my $matrix = $protdist_factory->create_distance_matrix($a); > push @tree, $weighbor->run('infile' => $matrix, > 'length' => 500, > 'size' => 3.85); # Size of the > alphabet (-b) > > } > > #use consense to get a final tree > my $consense_factory = Bio::Tools::Run::Phylo::Phylip::Consense- > >new(); > my ($tree) = $consense_factory->run(\@tree); > > #now draw the tree > my $draw_factory = Bio::Tools::Run::Phylo::Phylip::DrawTree->new(); > my $image_filename = $draw_factory->draw_tree($tree); > > } > } > close (READDIRMOD); > > > > } > > > > > > > > > From cjfields at uiuc.edu Wed Sep 20 11:08:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 10:08:16 -0500 Subject: [Bioperl-l] Conversion of EMBL format flat file to GFF format flatfile In-Reply-To: Message-ID: <002501c6dcc6$99876360$15327e82@pyrimidine> It depends on what version of GFF you want. I have only seen GenBank to GFF; you could probably convert EMBL->GenBank->GFF. If so I would recommend updating your local bioperl installation from CVS to deal with recent changes to the EMBL and GenBank SeqIO modules which makes them a bit more compatible with one another. 
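As a possible shortcut that nobody in this thread has confirmed, the features of each EMBL entry can also be written out through Bio::Tools::GFF directly. The sketch below assumes Bio::SeqIO and Bio::Tools::GFF are installed. The sub name embl_to_gff and the fallback to GFF version 2 are choices made for this illustration. Only top-level features are written, and how well qualifiers and sequence IDs carry over depends on the BioPerl version, so the output should be checked by hand.

```perl
use strict;
use warnings;

# Sketch only: write the top-level features of each EMBL entry as GFF.
# BioPerl is loaded at run time, so this file compiles without it.
sub embl_to_gff {
    my ($embl_file, $out_fh, $gff_version) = @_;
    require Bio::SeqIO;
    require Bio::Tools::GFF;

    my $in  = Bio::SeqIO->new(-file => $embl_file, -format => 'EMBL');
    my $gff = Bio::Tools::GFF->new(-fh          => $out_fh,
                                   -gff_version => $gff_version || 2);

    my $count = 0;
    while (my $seq = $in->next_seq) {
        for my $feature ($seq->get_SeqFeatures) {
            $gff->write_feature($feature);
            $count++;
        }
    }
    return $count;    # number of features written
}

# Example call, using the file name from the question in this thread:
# embl_to_gff('Toxo1b_080605_test.embl', \*STDOUT, 3);
```

Loading the modules inside the sub with require is the same pattern suggested elsewhere in this digest for optional dependencies: the script still compiles when BioPerl is missing and only fails when the conversion is actually attempted.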
Here are some scripts, which can also be found in the scripts directory when you download the full bioperl core distribution. http://www.bioperl.org/wiki/Bioperl_scripts#Bio::DB::GFF Just a warning: I think Lincoln and Scott are both working in having better GFF3 integration with Bioperl. Hopefully he or Scott Cain will also answer this post to get you up-to-date, and maybe offer a few extra suggestions. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Krys Kelly > Sent: Wednesday, September 20, 2006 8:50 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Conversion of EMBL format flat file to GFF format > flatfile > > I am completely new to Bioperl. I would like to convert an EMBL flat file > into a GFF flat file. To make sure I could get a perl script to work I > have > used the following script to read in my EMBL file and write it out as > FASTA: > > > > my $in = Bio::SeqIO->new(-file => "Toxo1b_080605_test.embl", > > -format => 'EMBL'); > > my $out = Bio::SeqIO->new(-file => ">ChrIb_new.fasta", > > -format => 'fasta'); > > while ( my $seq = $in->next_seq() ) {$out->write_seq($seq); } > > > > But the list of formats available > (http://bioperl.open-bio.org/wiki/HOWTO:SeqIO#Formats ) does not contain > gff. > > > > I have tried searching the documentation and the mail archives, but I have > not found anything that would help me. Are there any existing bioperl > modules for this conversion? I would be grateful for any help. 
> > > > Thanks > > > > Krys > > > > > > Dr Krystyna A Kelly (Krys) > > > > > Department of Pathology > > and > > MRC Biostatistics Unit > > > University of Cambridge > > > > Institute of Public Health > > > Tennis Court Road > > > > Robinson Way > > > Cambridge CB2 1QP > > > > Cambridge CB2 2SR > > > 01223 333331 > > > > 01223 767408 > > > kak28 at cam.ac.uk > > > > krystyna.kelly at mrc-bsu.cam.ac.uk > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Sep 20 11:09:04 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 20 Sep 2006 11:09:04 -0400 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: <451095C7.3020905@infotech.monash.edu.au> References: <451095C7.3020905@infotech.monash.edu.au> Message-ID: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote: > I'm not sure whether to > > 1. parse them all under the same module, perhaps with a > -format=>'glimmerXXX' parameter > > 2. create a single new module Glimmer2 and Glimmer3 > > 3. create two new modules, one for Glimmer2 and one for Glimmer3, > given > they are different outputs both in syntax and number of output files > > Any advice from Bioperl 'old timers' appreciated ;-) > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an example for how this can work. If this would amount to basically 4 modules stringed together into one file (because the parsing code can't share much if anything between the flavors), it'd still be advantageous to have a single frontend module that would then dispatch. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Sep 20 11:10:39 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 20 Sep 2006 11:10:39 -0400 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: References: <001001c6dbf4$65437260$15327e82@pyrimidine> <4510C5BA.9080408@infotech.monash.edu.au> Message-ID: <3D1D6B67-7BFC-4C20-ABD4-1AF787E88EAE@gmx.net> I wasn't even aware that that was a 'requirement' I'll see to fix the Ontology modules ... On Sep 20, 2006, at 12:44 AM, Chris Fields wrote: > Ouch! That many? > > Chris > > On Sep 19, 2006, at 11:38 PM, Torsten Seemann wrote: > >> David, >> >>> From running the deobfuscator on bioperl-live, I've found some >>> formatting inconsistencies in the POD of some modules which are >>> hard for the deobfuscator to parse. I've corrected some of these >>> in my working (checked-out) copy of bioperl-live. >> >> Please check them in. >> >> I have written a script to audit all Perl modules to ensure the >> NAME in the POD matches the Perl module and has the correct >> capitilization. >> >> % cd bioperl-live/maintenance >> % ./check_NAMEs.pl >> >> Here's the current buggy ones, before your commits AFAIK. 
>> >> Bio::IdCollectionI >> Bio::Location::SplitLocationI >> Bio::Search::Result::PullResultI >> Bio::Search::HSP::HmmpfamHSP >> Bio::Search::Hit::HmmpfamHit >> Bio::Expression::FeatureGroup::FeatureGroupMas50 >> Bio::Expression::FeatureSet::FeatureSetMas50 >> Bio::DB::EUtilities::Cookie >> Bio::DB::GFF::Adaptor::berkeleydb >> Bio::DB::GFF::Adaptor::biofetch_oracle >> Bio::DB::GFF::Adaptor::dbi::pg_fts >> Bio::DB::GFF::Adaptor::memory::feature_serializer >> Bio::DB::SeqFeature::Store::bdb >> Bio::DB::SeqFeature::Store::DBI::Iterator >> Bio::Matrix::PSM::SiteMatrixI >> Bio::Matrix::PSM::ProtPsm >> Bio::Matrix::PSM::IO::psiblast >> Bio::Matrix::PSM::IO::transfac >> Bio::Matrix::PSM::IO::meme >> Bio::Graphics::Util >> Bio::Graphics::Glyph::ex >> Bio::Graphics::Glyph::three_letters >> Bio::Graphics::Glyph::arrow >> Bio::Graphics::Glyph::ruler_arrow >> Bio::Graphics::Glyph::flag >> Bio::Phenotype::Measure >> Bio::Phenotype::PhenotypeI >> Bio::Phenotype::Phenotype >> Bio::Phenotype::Correlate >> Bio::Phenotype::OMIM::MiniMIMentry >> Bio::Phenotype::OMIM::OMIMparser >> Bio::Phenotype::OMIM::OMIMentry >> Bio::Phenotype::OMIM::OMIMentryAllelicVariant >> Bio::SeqIO::qual >> Bio::SeqIO::genbank >> Bio::OntologyIO::simplehierarchy >> Bio::OntologyIO::InterProParser >> Bio::OntologyIO::soflat >> Bio::OntologyIO::dagflat >> Bio::OntologyIO::obo >> Bio::OntologyIO::goflat >> Bio::OntologyIO::Handlers::InterProHandler >> Bio::Tools::pICalculator >> Bio::Tools::ECnumber >> Bio::SeqFeature::Gene::GeneStructureI >> Bio::SeqFeature::Gene::Poly_A_site >> Bio::Seq::SeqFastaSpeedFactory >> Bio::Ontology::RelationshipI >> Bio::Ontology::TermI >> Bio::Ontology::InterProTerm >> Bio::Ontology::RelationshipType >> Bio::Ontology::OBOterm >> Bio::Ontology::OBOEngine >> Bio::Ontology::PathI >> Bio::Ontology::OntologyEngineI >> Bio::Ontology::Path >> Bio::Ontology::GOterm >> Bio::Ontology::Term >> Bio::Ontology::SimpleGOEngine >> Bio::Ontology::Relationship >> 
Bio::Ontology::SimpleGOEngine::GraphAdaptor >> Bio::Ontology::SimpleGOEngine::GraphAdaptor02 >> >> -- >> Dr Torsten Seemann http://www.vicbioinformatics.com >> Victorian Bioinformatics Consortium, Monash University, Australia >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Sep 20 11:19:54 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 20 Sep 2006 11:19:54 -0400 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <45106E6C.1000605@sendu.me.uk> References: <001b01c6dc1c$87dfe470$15327e82@pyrimidine> <45105A7F.5010804@campus.iztacala.unam.mx> <45106E6C.1000605@sendu.me.uk> Message-ID: <6F6967E9-6991-47B0-BDCC-5C982B33D4FD@gmx.net> My view is that fringe dependencies (as defined by those imposed solely by fringe modules) don't have to be listed in Makefile.PL and instead may just be documented in the module itself. The reason is that I think there is a balance that needs to be struck between easing the installation of a 'running' version of Bioperl by minimizing the attention necessary for installing dependencies on one hand, and not needlessly cluttering people's machines with software that will never get used on the other hand. -hilmar On Sep 19, 2006, at 6:25 PM, Sendu Bala wrote: > I don't think either is really an appropriate 'fit'; what is wrong > with > simply not listing the Ensembl API as a dependency in Makefile.PL? > Aren't there already optional things in Bioperl that only begin to > work > after you read the instructions and manually install something? 
Well, > there must be, since I've had to do exactly that to get all tests > in the > suite to run (and not just skip). -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Sep 20 11:21:45 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 20 Sep 2006 11:21:45 -0400 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <4510374C.5030805@sendu.me.uk> References: <000901c6dc11$15b84370$15327e82@pyrimidine> <4510374C.5030805@sendu.me.uk> Message-ID: <6FBB4340-2ABA-49FC-A391-AAD83F42141F@gmx.net> I guess not only is it circular, but also the version required isn't identical. This smells like something generating confusion and version clashes down the road, so I'm not sure I would want an 'installer' tool to take care of this automatically. -hilmar On Sep 19, 2006, at 2:30 PM, Sendu Bala wrote: > Chris Fields wrote: >> I think any dependencies are supposed to be listed in the Makefile >> and >> DEPENDENCIES regardless of how many times they are used or if the >> method is >> optional. > > Well, the issue here is that doing that would create a circular > dependency; Ensembl Perl API requires Bioperl. Do the various ways of > installing Bioperl not 'care' about circular dependencies? > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Sep 20 12:06:51 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 11:06:51 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? 
In-Reply-To: <6F6967E9-6991-47B0-BDCC-5C982B33D4FD@gmx.net> Message-ID: <002901c6dcce$ca5157a0$15327e82@pyrimidine> > My view is that fringe dependencies (as defined by those imposed > solely by fringe modules) don't have to be listed in Makefile.PL and > instead may just be documented in the module itself. > > The reason is that I think there is a balance that needs to be struck > between easing the installation of a 'running' version of Bioperl by > minimizing the attention necessary for installing dependencies on one > hand, and not needlessly cluttering people's machines with software > that will never get used on the other hand. > > -hilmar The Makefile.PL doesn't require that the dependencies are installed if using 'perl Makefile.PL', but it does warn which modules won't work for each uninstalled dependency, which I think is useful. It lets the user know, up front, what works and what doesn't. I'm not sure how it is handled when installing from CPAN; does installing Bioperl crash and burn if Bundle::Bioperl isn't installed first? I think there could be a nice compromise here. If we can separate modules based on how reliant normal Bioperl functionality is based on their presence, could we set up the Makefile to test those that are absolutely required (real dependencies) vs. those that are not (optional)? If we could do that, we could test for the optional Ensembl API (and whatever else falls into this category) but not have it installed automatically, yet one would still get the warning that Foo module with bar() method wouldn't work. BTW, I googled this to see how CPAN handles circular dependencies. Apparently not very well. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign .... 
From n.haigh at sheffield.ac.uk Wed Sep 20 12:16:06 2006 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Wed, 20 Sep 2006 17:16:06 +0100 Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XPPC using ActivePerl PPM In-Reply-To: <000001c6d909$086cf810$15327e82@pyrimidine> References: <000001c6d909$086cf810$15327e82@pyrimidine> Message-ID: <45116946.5030805@sheffield.ac.uk> Chris Fields wrote: >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of szhan at uoguelph.ca >> Sent: Friday, September 15, 2006 2:56 PM >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows >> XPPC using ActivePerl PPM >> >> Dear Bioperl users, >> I have downloaded ActivePerl-5.8.8.819 MSI (x86), and installed it on >> Windows XP PC successfully. I used GUI PPM to install Bioperl 1.4 by >> opening GUI PPM, choosing Bioperl1.4 then marking for install but got >> error as below: >> ERROR: Installing File-Spec-0.82 would downgrade File::Spec from >> version 3.12 to 0.82 and File::Spec::Functions from version 1.3 to 1.1 >> and File::Spec::Mac from version 1.4 to 1.2 and File::Spec::OS2 from >> version 1.2 to 1.1 and File::Spec::Unix from version 1.5 to 1.2 and >> File::Spec::VMS from version 1.4 to 1.1 and File::Spec::Win32 from >> version 1.6 to 1.2 >> Why did I get the error? Could you please help me out? >> Thanks! >> >> Josh >> > > I haven't used the new PPM GUI with ActivePerl 5.8.819 yet, but it's strange > that it says those will be downgraded. Have you tried the command line PPM > to install? I know it's still available with the distribution. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > I've been looking at this a little bit today. It is weird! Firstly, I thought new ppm4 can't actually find a File::Spec module other than version 0.82 and thought it might be due to the addition of new tags to ppd files ( and ) in order to better track dependencies etc. However, I don't have File::Spec installed, or PathTools so I have no idea where it gets the version numbers from! I would install a slightly earlier version of Perl which still uses PPM3. All versions >= 5.8.8.818 will only contain PPM4, so either download 5.6.1 or download an old release of 5.8.8 from here: http://downloads.activestate.com/ActivePerl/Windows/5.8/ Sorry I couldn't be too much help. Nathan From cjfields at uiuc.edu Wed Sep 20 12:40:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 11:40:16 -0500 Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XPPC using ActivePerl PPM In-Reply-To: <45116946.5030805@sheffield.ac.uk> Message-ID: <002b01c6dcd3$73b48b60$15327e82@pyrimidine> Nathan, I'll try upgrading to the latest ActivePerl to see what's going on. We probably need to be prepared for this issue if it pops up, as I'm sure it will. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign ... > I've been looking at this a little bit today. It is weird! Firstly, I > thought new ppm4 can't actually find a File::Spec module other than > version 0.82 and thought it might be due to the addition of new tags to > ppd files ( and ) in order to better track > dependencies etc. However, I don't have File::Spec installed, or > PathTools so I have no idea where it gets the version numbers from! > > I would install a slightly earlier version of Perl which still uses > PPM3. 
All versions >= 5.8.8.818 will only contain PPM4, so either > download 5.6.1 or download an old release of 5.8.8 from here: > http://downloads.activestate.com/ActivePerl/Windows/5.8/ > > Sorry I couldn't be too much help. > Nathan From hlapp at gmx.net Wed Sep 20 14:05:37 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 20 Sep 2006 14:05:37 -0400 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <002901c6dcce$ca5157a0$15327e82@pyrimidine> References: <002901c6dcce$ca5157a0$15327e82@pyrimidine> Message-ID: <07060BB1-23C5-47C2-8B1E-9CCF0E0B695D@gmx.net> On Sep 20, 2006, at 12:06 PM, Chris Fields wrote: > The Makefile.PL doesn't require that the dependencies are installed > if using > 'perl Makefile.PL', Right, I know. However, package maintainers may use this information to decide what to include in their pre-packaged dependencies and what not to. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From arareko at campus.iztacala.unam.mx Wed Sep 20 14:24:39 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 20 Sep 2006 13:24:39 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <07060BB1-23C5-47C2-8B1E-9CCF0E0B695D@gmx.net> References: <002901c6dcce$ca5157a0$15327e82@pyrimidine> <07060BB1-23C5-47C2-8B1E-9CCF0E0B695D@gmx.net> Message-ID: <45118767.9030909@campus.iztacala.unam.mx> Hilmar Lapp wrote: > On Sep 20, 2006, at 12:06 PM, Chris Fields wrote: > >> The Makefile.PL doesn't require that the dependencies are installed >> if using >> 'perl Makefile.PL', > > Right, I know. However, package maintainers may use this information > to decide what to include in their pre-packaged dependencies and what > not to. > Yeah, that's precisely how I track dependencies for the BioPerl FreeBSD ports. Mauricio.
-- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From dmessina at wustl.edu Wed Sep 20 14:51:43 2006 From: dmessina at wustl.edu (David Messina) Date: Wed, 20 Sep 2006 13:51:43 -0500 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <4510C5BA.9080408@infotech.monash.edu.au> References: <001001c6dbf4$65437260$15327e82@pyrimidine> <4510C5BA.9080408@infotech.monash.edu.au> Message-ID: On Sep 19, 2006, at 11:38 PM, Torsten Seemann wrote: > Please check them in. Looks like Chris beat me to it -- thanks Chris! Remaining are the Ontology and OntologyIO modules, which Hilmar will be doing. > I have written a script to audit all Perl modules to ensure the > NAME in the POD matches the Perl module and has the correct > capitalization. > > % cd bioperl-live/maintenance > % ./check_NAMEs.pl Very nice, that's really helpful. -Dave From cjfields at uiuc.edu Wed Sep 20 14:55:11 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 13:55:11 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <45118767.9030909@campus.iztacala.unam.mx> Message-ID: <000001c6dce6$4ce32060$15327e82@pyrimidine> My point was, is there any way we can have two sets of modules: one set that is 'required', and another that is 'optional' (for additional functionality, but not installed)? And have Makefile.PL test for all of them? If we can have both, Sendu could have the Ensembl API be 'optional' (not installed by default) but checked upon installation. There are a few other 'dependencies' which are critical for only one or two modules that could also be considered optional, such as Ace, Convert::Binary::C, Spreadsheet::ParseExcel, Bio::ASN1::EntrezGene, etc. If it's too much of a pain to worry about we can just have Sendu designate the Ensembl requirement in the POD like he suggests.
Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx] > Sent: Wednesday, September 20, 2006 1:25 PM > To: Hilmar Lapp > Cc: Chris Fields; 'Sendu Bala'; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] Optional 'circular dependency' ok? > > Hilmar Lapp wrote: > > On Sep 20, 2006, at 12:06 PM, Chris Fields wrote: > > > >> The Makefile.PL doesn't require that the dependencies are installed > >> if using > >> 'perl Makefile.PL', > > > > Right, I know. However, package maintainers may use this information > > to decide what to include in their pre-packaged dependencies and what > > not to. > > > > Yeah, that's precisely how I track dependencies for the BioPerl FreeBSD > ports. > > Mauricio. > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM From dmessina at wustl.edu Wed Sep 20 15:00:49 2006 From: dmessina at wustl.edu (David Messina) Date: Wed, 20 Sep 2006 14:00:49 -0500 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <3D1D6B67-7BFC-4C20-ABD4-1AF787E88EAE@gmx.net> References: <001001c6dbf4$65437260$15327e82@pyrimidine> <4510C5BA.9080408@infotech.monash.edu.au> <3D1D6B67-7BFC-4C20-ABD4-1AF787E88EAE@gmx.net> Message-ID: <47950AEC-A08E-44CF-AE33-248BF7C554E6@wustl.edu> On Sep 20, 2006, at 10:10 AM, Hilmar Lapp wrote: > I wasn't even aware that that was a 'requirement' AFAIK, there wasn't any previously agreed-upon set of standards for the BioPerl docs, although most modules follow a remarkably consistent layout. With the Deobfuscator, it helps to enforce a little more uniformity for ease of parsing. And since I haven't publicized any of these issues until yesterday, it's hardly your fault that you weren't aware.
:) It may not happen before 1.5.2, but at some point I intend to write up a little guide which describes the de facto standard that BioPerl modules follow (maybe Mauricio has already thought about this?). If there is agreement on these as a small set of guidelines, we can add it to the developer instructions so that people writing new modules will be aware. -Dave From hlapp at gmx.net Wed Sep 20 15:11:29 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 20 Sep 2006 15:11:29 -0400 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <000001c6dce6$4ce32060$15327e82@pyrimidine> References: <000001c6dce6$4ce32060$15327e82@pyrimidine> Message-ID: <7FF9797A-276D-4B3C-87D7-273F9A9764BF@gmx.net> On Sep 20, 2006, at 2:55 PM, Chris Fields wrote: > My point was, is there any way we can have two sets of modules: one > set that > is 'required', and another that is 'optional' (for additional > functionality, > but not installed)? And have Makefile.PL test for all of them? You can suggest one :-) The default should be though not to test for the 'optional' ones (since most people are never going to need them, and most package maintainers may not want to include them). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Sep 20 15:22:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 14:22:17 -0500 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <47950AEC-A08E-44CF-AE33-248BF7C554E6@wustl.edu> Message-ID: <000201c6dcea$15e699d0$15327e82@pyrimidine> Dave, There is a small bit about what to have in POD in biodesign.pod, which is no longer in the core. I think most of it can be found in the wiki under 'Advanced BioPerl'. 
http://www.bioperl.org/wiki/Advanced_BioPerl There is a link to the older version of it: http://bioperl.org/Core/Latest/biodesign.html There is also an Emacs template, which I haven't toyed around with. I use ActiveState Komodo, which runs on Mac OS X and WinXP. Yes, I am pathetic... Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: David Messina [mailto:dmessina at wustl.edu] > Sent: Wednesday, September 20, 2006 2:01 PM > To: Hilmar Lapp > Cc: Chris Fields; Sendu Bala; bioperl-l; Torsten Seemann > Subject: Re: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and > undef/die > > On Sep 20, 2006, at 10:10 AM, Hilmar Lapp wrote: > > > I wasn't even aware that that was a 'requirement' > > AFAIK, there wasn't any previously agreed upon set of standards for > the BioPerl docs, although most modules follow a remarkably > consistent layout. > > With the Deobfuscator, it helps to enforce a little more uniformity > for ease of parsing. And since I haven't publicized any of these > issues until yesterday, it's hardly your fault that you weren't > aware. :) > > It may not happen before 1.5.2, but at some point I intend to write > up a little guide which describes the de facto standard that BioPerl > modules follow (maybe Mauricio has already thought about this?). If > there is agreement on these as a small set of guidelines, we can add > it to the developer instructions so that people writing new modules > will be aware. > > -Dave From johnsonm at gmail.com Wed Sep 20 15:31:28 2006 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 20 Sep 2006 14:31:28 -0500 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> Message-ID: On 9/19/06, Torsten Seemann wrote: > Glimmer3 produces two separate files: XXX.detail and XXX.predict. 
> The Glimmer3 .detail file is similar to the first part of the Glimmer > 2.x first part. The Glimmer3 .predict file conveys the same information > as the second part of a Glimmer2 file, but in a totally different format! Also, Glimmer2 only analyzes the first sequence in the input fasta file, whereas Glimmer3 will happily take many sequences, and even tell you which predictions go with what sequence in the output, with markers that looks like fasta headers: > sequence_name ... Apart from the technical issues of parsing the different formats, there is an up-front design question? Should both the open reading frames and final gene predictions be processed into features, or just the final predictions? If the former, that is going to make a single abstract interface tricky, as Glimmer3 has two input files that need to be parsed, where the rest would have one. Though, if only the predictions get parsed, well, that solves that problem. 8) ... ... ... > It doesn't currently work with my GlimmerHMM output, as the module > expects a version number, which my output does not have - but I will fix > that in CVS today. Sadly, Glimmer3 seems to output no version info to either the .predict or .detail file. From johnsonm at gmail.com Wed Sep 20 15:47:54 2006 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 20 Sep 2006 14:47:54 -0500 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> Message-ID: I think it's going to be at least two modules, one for the prokaryotic stuff and one for the eukaryotic. And really, the prokaryotic stuff is different enough to warrant two modules. So three different parsers. Could do it in one, but it would be ugly and nasty. However, this does not preclude three parsers and one abstract interface, which is your excellent suggestion. 
Oh, and excuse me, but I have a bit of a rant here, after dealing with parsers and pipelines for the last few months. Parsers should not load the whole input file into RAM to parse it. And Pipelines using the parsers (Ensembl / biopipe) should not stuff the whole result set from the parser into a single array. When you're trying to annotate assemblies, it sucks to have to split up contigs/supercontigs because the whole result set won't fit into RAM on a 12 gig blade. Sheesh. Though this doesn't matter for bacterial genomes, as they're tiny (by comparison to vertebrates). There, sorry, been saving up that frustration for a while. No offense meant, hope I didn't tick anybody off. 8) Torsten: You sound like you know what you're doing with respect to Bioperl more than I do, and I know I don't have CVS access, so I'll defer to you. I'd be happy to help out, though. On 9/20/06, Hilmar Lapp wrote: > > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote: > > > I'm not sure whether to > > > > 1. parse them all under the same module, perhaps with a > > -format=>'glimmerXXX' parameter > > > > 2. create a single new module Glimmer2 and Glimmer3 > > > > 3. create two new modules, one for Glimmer2 and one for Glimmer3, > > given > > they are different outputs both in syntax and number of output files > > > > Any advice from Bioperl 'old timers' appreciated ;-) > > > > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an > example for how this can work. > > If this would amount to basically 4 modules stringed together into > one file (because the parsing code can't share much if anything > between the flavors), it'd still be advantageous to have a single > frontend module that would then dispatch. 
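To make the two points above concrete (a single front end that dispatches on -format, and parsers that hand back one record at a time instead of holding the whole file in RAM), here is a rough sketch in plain Perl. Everything in it is hypothetical: the My::Glimmer::* package names, the next_prediction method, and especially the two line layouts, which are invented for illustration and are not real Glimmer2 or Glimmer3 output.

```perl
use strict;
use warnings;

# Shared base class: keeps an open filehandle and returns one record per
# call, so the full result set never has to sit in memory.
package My::Glimmer::Base;
sub new {
    my ($class, %args) = @_;
    my $fh = $args{-fh} or die "need a -fh argument\n";
    return bless { fh => $fh }, $class;
}
sub next_prediction {
    my ($self) = @_;
    my $fh = $self->{fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my $rec = $self->_parse_line($line);
        return $rec if $rec;    # skip lines this flavour does not recognise
    }
    return;                     # end of input
}

# Invented layout for illustration: "id start end"
package My::Glimmer::V2;
our @ISA = ('My::Glimmer::Base');
sub _parse_line {
    my ($self, $line) = @_;
    return unless $line =~ /^(\S+)\s+(\d+)\s+(\d+)$/;
    return { id => $1, start => $2, end => $3 };
}

# Invented layout for illustration: "id,start,end"
package My::Glimmer::V3;
our @ISA = ('My::Glimmer::Base');
sub _parse_line {
    my ($self, $line) = @_;
    return unless $line =~ /^([^,]+),(\d+),(\d+)$/;
    return { id => $1, start => $2, end => $3 };
}

# Front end: callers only ever see this package; it picks the flavour.
package My::Glimmer;
my %PARSER_FOR = (
    glimmer2 => 'My::Glimmer::V2',
    glimmer3 => 'My::Glimmer::V3',
);
sub new {
    my ($class, %args) = @_;
    my $format = defined $args{-format} ? lc $args{-format} : 'glimmer2';
    my $impl = $PARSER_FOR{$format} or die "Unknown format '$format'\n";
    return $impl->new(%args);
}

package main;

# In-memory filehandle standing in for a real output file.
open my $fh, '<', \"orf1,10,90\norf2,120,400\n" or die $!;
my $parser = My::Glimmer->new(-format => 'glimmer3', -fh => $fh);
while (my $p = $parser->next_prediction) {
    print "$p->{id}\t$p->{start}\t$p->{end}\n";
}
```

The flavours share only the iterator in the base class, which seems to match the situation described in the thread where the formats have little parsing code in common.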
> > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > From cjfields at uiuc.edu Wed Sep 20 15:51:42 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 14:51:42 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <7FF9797A-276D-4B3C-87D7-273F9A9764BF@gmx.net> Message-ID: <000001c6dcee$327407f0$15327e82@pyrimidine> Sendu, Would this be pertinent for the 1.5.2 release? If not, it would give us a little time to hash out what to do here, as opposed to rushing to get this done before the 25th. Otherwise we probably won't worry about adding it to Makefile.PL but list it as an optional dependency, maybe in the INSTALL docs. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Wednesday, September 20, 2006 2:11 PM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'Sendu Bala'; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] Optional 'circular dependency' ok? > > > On Sep 20, 2006, at 2:55 PM, Chris Fields wrote: > > > My point was, is there any way we can have two sets of modules: one > > set that > > is 'required', and another that is 'optional' (for additional > > functionality, > > but not installed)? And have Makefile.PL test for all of them? > > You can suggest one :-) > > The default should be though not to test for the 'optional' ones > (since most people are never going to need them, and most package > maintainers may not want to include them). 
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Wed Sep 20 16:15:27 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 20 Sep 2006 15:15:27 -0500 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <000201c6dcea$15e699d0$15327e82@pyrimidine> References: <000201c6dcea$15e699d0$15327e82@pyrimidine> Message-ID: <4511A15F.8020504@campus.iztacala.unam.mx> Chris Fields wrote: > Dave, > > There is a small bit about what to have in POD in biodesign.pod, which is no > longer in the core. I think most of it can be found in the wiki under > 'Advanced BioPerl'. > > http://www.bioperl.org/wiki/Advanced_BioPerl > > There is a link to the older version of it: > > http://bioperl.org/Core/Latest/biodesign.html Supposedly, this is all what people may need to get started by using a consistent design. > There is also an Emacs template, which I haven't toyed around with. I use > ActiveState Komodo, which runs on Mac OS X and WinXP. Yes, I am pathetic... Komodo is a great IDE (I use it daily) and I'm aware that there's some way to create a template for Bioperl such as the Emacs one (maybe I'll put hands on this someday). It also runs on Linux (I haven't tried to use it in FreeBSD yet) so it could be a nice alternative for many people. BTW: Vi rulez! ;) Mauricio. 
-- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From bix at sendu.me.uk Wed Sep 20 16:50:00 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Sep 2006 21:50:00 +0100 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <000001c6dcee$327407f0$15327e82@pyrimidine> References: <000001c6dcee$327407f0$15327e82@pyrimidine> Message-ID: <4511A978.6010608@sendu.me.uk> Chris Fields wrote: > Sendu, > > Would this be pertinent for the 1.5.2 release? No, the thing I need Ensembl for won't make the 1.5.2 release, so there's no rush to sort this out. From bix at sendu.me.uk Wed Sep 20 16:58:30 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Sep 2006 21:58:30 +0100 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <4511A15F.8020504@campus.iztacala.unam.mx> References: <000201c6dcea$15e699d0$15327e82@pyrimidine> <4511A15F.8020504@campus.iztacala.unam.mx> Message-ID: <4511AB76.8080707@sendu.me.uk> Mauricio Herrera Cuadra wrote: > Chris Fields wrote: >> Dave, >> There is a small bit about what to have in POD in biodesign.pod, which >> is no longer in the core. I think most of it can be found in the wiki under >> 'Advanced BioPerl'. >> http://www.bioperl.org/wiki/Advanced_BioPerl >> >> There is a link to the older version of it: >> >> http://bioperl.org/Core/Latest/biodesign.html > > Supposedly, this is all what people may need to get started by using a > consistent design. There's also the Bioperl-independent need simply to write valid POD, and preferably POD that can be understood by all/many parsers. I used maintenance/pod.pl earlier today which shows you all the POD problems. The remaining warnings are 'ok'. >> There is also an Emacs template, which I haven't toyed around with. I >> use ActiveState Komodo, which runs on Mac OS X and WinXP. Yes, I am >> pathetic...
> > Komodo is a great IDE (I use it daily) Agree, and likewise. From cjfields at uiuc.edu Wed Sep 20 17:16:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Sep 2006 16:16:13 -0500 Subject: [Bioperl-l] bioperl-live: audit of FIXME/???/TODO/etc and undef/die In-Reply-To: <4511AB76.8080707@sendu.me.uk> References: <000201c6dcea$15e699d0$15327e82@pyrimidine> <4511A15F.8020504@campus.iztacala.unam.mx> <4511AB76.8080707@sendu.me.uk> Message-ID: <6F87DE2B-0502-4F0C-B8E2-DA67AC348076@uiuc.edu> If we need to, we can upload a Komodo Bioperl template. I have a very old one that I passed on to Mauricio. We could clean it up and add it to the core distribution or elsewhere if needed. Chris On Sep 20, 2006, at 3:58 PM, Sendu Bala wrote: > Mauricio Herrera Cuadra wrote: >> Chris Fields wrote: >>> Dave, >>> There is a small bit about what to have in POD in biodesign.pod, >>> which >>> is no longer in the core. I think most of it can be found in the >>> wiki under >>> 'Advanced BioPerl'. >>> http://www.bioperl.org/wiki/Advanced_BioPerl >>> >>> There is a link to the older version of it: >>> >>> http://bioperl.org/Core/Latest/biodesign.html >> >> Supposedly, this is all what people may need to get started by >> using a >> consistent design. > > There's also the Bioperl-independent need simply to write valid > POD, and > preferably POD that can be understood by all/many parsers. I used > maintenance/pod.pl earlier today which shows you all the POD problems. > The remaining warnings are 'ok'. > > >>> There is also an Emacs template, which I haven't toyed around >>> with. I >>> use ActiveState Komodo, which runs on Mac OS X and WinXP. Yes, I am >>> pathetic... >> >> Komodo is a great IDE (I use it daily) > > Agree, and likewise. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Thu Sep 21 03:20:12 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 21 Sep 2006 08:20:12 +0100 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <000001c6dce6$4ce32060$15327e82@pyrimidine> References: <000001c6dce6$4ce32060$15327e82@pyrimidine> Message-ID: <45123D2C.4040901@sheffield.ac.uk> Chris Fields wrote: > My point was, is there any way we can have two sets of modules: one set that > is 'required', and another that is 'optional' (for additional functionality, > but not installed)? And have Makefile.PL test for all of them? > > Something like this would definitely be my preference. It makes sense to document any/all dependencies in one place but to make a distinction between 2 or 3 "types" of dependencies. 1) core dependencies which are absolutely required for the functioning of the main modules in Bioperl; 2) Optional dependencies which enhance the basic installation and 3) The dependencies which are only present in the odd module. It may be more straightforward to deal with 2 and 3 as one set of optional dependencies. I know very little about Makefiles, but from my experience of installation of dependencies, it would be excellent if, at the end of "perl Makefile.PL", a summary table was produced, either instead of, or in addition to, the current verbose list of dependencies that are not met. This could make it much clearer as to which type of dependencies are missing and whether they need to be installed. A table such as this comes to mind:

Module              Dependency  Status
IO::String          Required    Installed
File::Temp          Required
GD                  Optional
Convert::Binary::C  Optional

If this information was coded into the Makefile.PL, much like the verbose description of the requirements currently is, you could just test that all "required" modules are installed before allowing the installation to proceed.
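A rough sketch of what such a check might look like, using only core Perl. The module names are the ones from the table above; the data structure and the helper names (module_installed, dependency_report, missing_required) are made up for illustration and are not taken from the actual BioPerl Makefile.PL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical dependency list: [ module name => 'Required' or 'Optional' ].
# Module names come from the table in this thread; everything else here
# is an assumption, not the layout of BioPerl's real Makefile.PL.
my @DEPENDENCIES = (
    [ 'IO::String'         => 'Required' ],
    [ 'File::Temp'         => 'Required' ],
    [ 'GD'                 => 'Optional' ],
    [ 'Convert::Binary::C' => 'Optional' ],
);

# Return 1 if the named module can be loaded on this system, 0 otherwise.
sub module_installed {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Build one row per dependency: [ module, type, status ].
sub dependency_report {
    my @deps = @_;
    return map {
        my ($module, $type) = @$_;
        [ $module, $type, module_installed($module) ? 'Installed' : 'Missing' ];
    } @deps;
}

# Names of the required modules that are not installed.
sub missing_required {
    my @rows = @_;
    return map  { $_->[0] }
           grep { $_->[1] eq 'Required' && $_->[2] eq 'Missing' } @rows;
}

# Print the summary table.
my @rows = dependency_report(@DEPENDENCIES);
printf "%-20s %-11s %s\n", 'Module', 'Dependency', 'Status';
printf "%-20s %-11s %s\n", @$_ for @rows;

# A real Makefile.PL would probably stop here; the sketch only warns.
my @missing = missing_required(@rows);
warn "Missing required modules: @missing\n" if @missing;
```

Keeping the type next to the module name in one list means the same data could drive both the summary table and the decision whether to proceed.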
All "optional" modules could also be added to Bundle::BioPerl for easy installation. This would also help with making a ppd by having all this info in one place. Like I said, I don't know much about Makefiles, but that's my thoughts anyway. Nathan From n.haigh at sheffield.ac.uk Thu Sep 21 03:44:19 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 21 Sep 2006 08:44:19 +0100 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <000001c6dce6$4ce32060$15327e82@pyrimidine> References: <000001c6dce6$4ce32060$15327e82@pyrimidine> Message-ID: <451242D3.50102@sheffield.ac.uk> Chris Fields wrote: > My point was, is there any way we can have two sets of modules: one set that > is 'required', and another that is 'optional' (for additional functionality, > but not installed)? And have Makefile.PL test for all of them? > I've just seen Module::Install that may well do this type of checking of dependencies. See http://search.cpan.org/~audreyt/Module-Install-0.64/lib/Module/Install.pod#SYNOPSIS Might be worth a look? Nathan > If we can have both, Sendu could have the Ensembl API be 'optional' (not > installed by default) but checked upon installation. There are a few others > 'dependencies' which are critical for only one or two modules that could > also be considered optional, such as Ace, Convert::Binary::C, > Spreadsheet::ParseExcel, Bio::ASN1::EntrezGene, etc. > > If it's too much of a pain to worry about we can just have Sendu designate > the Ensembl requirement in the POD like he suggests. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > From cjfields at uiuc.edu Thu Sep 21 09:39:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Sep 2006 08:39:24 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? 
In-Reply-To: <45123D2C.4040901@sheffield.ac.uk> Message-ID: <000301c6dd83$5e2057b0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Nathan S. Haigh > Sent: Thursday, September 21, 2006 2:20 AM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'Hilmar Lapp'; bioperl-l at bioperl.org; > 'Sendu Bala' > Subject: Re: [Bioperl-l] Optional 'circular dependency' ok? > > Chris Fields wrote: > > My point was, is there any way we can have two sets of modules: one set > that > > is 'required', and another that is 'optional' (for additional > functionality, > > but not installed)? And have Makefile.PL test for all of them? > > > > > Something like this would definitely be my preference. It makes sense to > document any/all dependencies in one place but to make a distinction > between 2 or 3 "types" of dependencies. 1) core dependencies which are > absolutely required for the functioning of the main modules in Bioperl; > 2) Optional dependencies which enhance the basic installation and 3) The > dependencies which are only present in the odd module. It may be more > straightforward to deal with 2 and 3 as one set of optional dependencies. > > I know very little about Makefiles, but from my experience of > installation of dependencies, it would be excellent if during the "perl > Makefile.PL" that a summary table was produced at the end, either > instead of, or in addition to the current verbose list of dependencies > that are not met. This could make it much clearer as to which type of > dependencies are missing and whether they need to be installed. 
A table > such as this comes to mind: > > Module Dependency Status > > IO::String Required Installed > File::Temp Required > GD Optional > Convert::Binary::C Optional > > If this information was coded into the Makefile.PL, much like the > verbose description of the requirements currently is, you could just > test that all "required" modules are installed before allowing the > installation to proceed. All "optional" modules could also be added to > Bundle::BioPerl for easy installation. This would also help with making > a ppd by having all this info in one place. > > Like I said, I don't know much about Makefiles, but that's my thoughts > anyway. > Nathan We're in a bit of a tricky situation here. What we probably should do is look at CPAN to see what other large packages do (like LWP or similar), whether they allow differentiation between 'optional' vs. 'required.' CPANPLUS supposedly allows some circular dependency (it checks for them before installing, then breaks them if they occur) but they aren't commonly used. Since we don't have a newer version available on CPAN (the last one is v 1.4 I think), it's not a pressing issue, but definitely one to think about as we approach v 1.6. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Sep 21 09:45:41 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Sep 2006 08:45:41 -0500 Subject: [Bioperl-l] Optional 'circular dependency' ok? In-Reply-To: <451242D3.50102@sheffield.ac.uk> Message-ID: <000401c6dd84$3afae2e0$15327e82@pyrimidine> Ah, but would we then have a dependency for Module::Install? ;> Kidding, kidding. We should work out viable options. If we could lower the number of 'required' modules w/o wreaking havoc on Makefile.PL, yet allow testing for 'optional' modules, that would be great. This is probably something to think about long term (before v 1.6) vs short term (v. 1.5.2). 
As Hilmar stated, though, we may need to have testing for the 'optional' modules be optional itself. Maybe have a query for optional module tests like we do for Bio::DB::GFF tests? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Nathan S. Haigh > Sent: Thursday, September 21, 2006 2:44 AM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'Hilmar Lapp'; bioperl-l at bioperl.org; > 'Sendu Bala' > Subject: Re: [Bioperl-l] Optional 'circular dependency' ok? > > Chris Fields wrote: > > My point was, is there any way we can have two sets of modules: one set > that > > is 'required', and another that is 'optional' (for additional > functionality, > > but not installed)? And have Makefile.PL test for all of them? > > > I've just seen Module::Install that may well do this type of checking of > dependencies. See > http://search.cpan.org/~audreyt/Module-Install- > 0.64/lib/Module/Install.pod#SYNOPSIS > > Might be worth a look? > Nathan > > If we can have both, Sendu could have the Ensembl API be 'optional' (not > > installed by default) but checked upon installation. There are a few > others > > 'dependencies' which are critical for only one or two modules that could > > also be considered optional, such as Ace, Convert::Binary::C, > > Spreadsheet::ParseExcel, Bio::ASN1::EntrezGene, etc. > > > > If it's too much of a pain to worry about we can just have Sendu > designate > > the Ensembl requirement in the POD like he suggests. > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. 
of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From szhan at uoguelph.ca Thu Sep 21 09:15:07 2006 From: szhan at uoguelph.ca (szhan at uoguelph.ca) Date: Thu, 21 Sep 2006 09:15:07 -0400 Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XPPC using ActivePerl PPM In-Reply-To: <002b01c6dcd3$73b48b60$15327e82@pyrimidine> References: <002b01c6dcd3$73b48b60$15327e82@pyrimidine> Message-ID: <20060921091507.i9s8x192wwswoso8@webmail.uoguelph.ca> Hello, Chris and Nathan, Thank you very much for your help! I downloaded an old release of 5.8.816 and installed successfully. Joshua Quoting Chris Fields : > Nathan, > > I'll try upgrading to the latest ActivePerl to see what's going on. We > probably need to be prepared for this issue if it pops up, as I'm sure it > will. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > ... >> I've been looking at this a little bit today. It is weird! Firstly, I >> thought new ppm4 can't actually find a File::Spec module other than >> version 0.82 and thought it might be due to the addition of new tags to >> ppd files ( and ) in order to better track >> dependencies etc. However, I don't have File::Spec installed, or >> PathTools so I have no idea where it gets the version numbers from! >> >> I would install a slightly earlier version of Perl which still uses >> PPM3. All versions >= 5.8.8.818 will only contain PPM4, so either >> download 5.6.1 or download an old release of 5.8.8 from here: >> http://downloads.activestate.com/ActivePerl/Windows/5.8/ >> >> Sorry I couldn't be too much help. 
>> Nathan > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Sep 21 16:43:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Sep 2006 15:43:27 -0500 Subject: [Bioperl-l] Bio::Location::Split question In-Reply-To: <30D63782-E5EC-494C-A42E-3D1AC29043D8@gmx.net> Message-ID: <000001c6ddbe$97fe0690$15327e82@pyrimidine> Hilmar, Here's a question which I can't quite find the answer for. The current behavior of Bio::Location::Split is to propagate strand information (using $loc->strand()) for a Split location object to the various sublocations it contains. In other words, it isn't just a get/set, but has a direct effect on the sublocation objects and assumes that all sublocations have the same strand as the Split location container object. Would you know of any rationale for this? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Monday, September 18, 2006 5:27 PM > To: Chris Fields > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::Location::Split question > > On Sep 18, 2006, at 5:55 PM, Chris Fields wrote: > > > However, if I take the two examples above, run them through > > FTLocationFactory, then use to_FTstring() to get the feature > > string, this is > > what I get: > > > > complement(join(2691..4571,4918..5163)) > > > > complement(join(4918..5163,2691..4571)) > > So this looks like a bug, right? The correct result would be if both > yielded the same strings, or syntactically equivalent strings. The > two above are neither identical nor syntactically equivalent. 
> > Another test is if you set a feature location from either string and > then request the sub-sequence, the resulting sequence should be > identical given syntactically equivalent location specifications. > > Do you want to file (and possibly address?) this? > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Thu Sep 21 19:16:04 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 22 Sep 2006 09:16:04 +1000 Subject: [Bioperl-l] Questions about doc.bioperl.org PDOC Message-ID: <45131D34.40604@infotech.monash.edu.au> Some questions regarding online docs: How often is the online PDOC updated? ie. http://doc.bioperl.org/releases/bioperl-current/bioperl-live/ How is it generated? ie. are all the scripts/CSS/templates in CVS somewhere? Is Raphael Leplae the correct person to send suggestions to? ie. 
raphael at scmbb.ulb.ac.be Thank you for any help, -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From torsten.seemann at infotech.monash.edu.au Thu Sep 21 19:51:34 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 22 Sep 2006 09:51:34 +1000 Subject: [Bioperl-l] Unused or incomplete modules in bioperl-live Message-ID: <45132586.6030103@infotech.monash.edu.au> Hello all, While auditing bioperl-live recently I noticed a module Bio/Search/Processor.pm (Aaron Mackey) which is unreferenced anywhere else, and it looks like a deployer module, but there is no directory Bio/Search/Processor/ The last real CVS entry was date: 2000/11/20 17:10:57; author: jason; state: Exp; lines: +10 -11 "likely meaningless changes as we will probably chuck these modules" What's the correct procedure to deal with this? Should it be deleted from CVS? Or schedule for removal in the 1.6 branch? If there are any more modules to be retired etc, should we have a Wiki page about it? -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From cjfields at uiuc.edu Thu Sep 21 20:12:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Sep 2006 19:12:40 -0500 Subject: [Bioperl-l] Unused or incomplete modules in bioperl-live In-Reply-To: <45132586.6030103@infotech.monash.edu.au> Message-ID: <000001c6dddb$d193ab90$15327e82@pyrimidine> I have a place on the wiki where I have been tracking deprecated and potentially deprecated modules: http://www.bioperl.org/wiki/Deprecated_modules I would say add anything you think may apply there. Or if you think it's worth saving, add it to the project priority list, maybe? A few more Jason pointed out which we could deprecate (which I haven't added yet) were the Bio::Symbol modules, though they may still be of some use to someone out there willing to take care of them.
Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > Sent: Thursday, September 21, 2006 6:52 PM > To: 'bioperl-l' > Subject: [Bioperl-l] Unused or incomplete modules in bioperl-live > > Hello all, > > While auditing bioperl-live recently I noticed a module > Bio/Search/Processor.pm (Aaron Mackey) > which is unreferenced anywhere else, and it looks like a deployer > module, but there is no directory > Bio/Search/Processor/ > > The last real CVS entry was > date: 2000/11/20 17:10:57; author: jason; state: Exp; lines: +10 -11 > "likely meaningless changes as we will probably chuck these modules" > > What's the correct procedure to deal with this? > Should it be deleted from CVS? > Or schedule for removal in the 1.6 branch? > > If there are any more modules to be retired etc, should we have a Wiki > page about it? > > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Thu Sep 21 21:52:37 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 21 Sep 2006 20:52:37 -0500 Subject: [Bioperl-l] Questions about doc.bioperl.org PDOC In-Reply-To: <45131D34.40604@infotech.monash.edu.au> References: <45131D34.40604@infotech.monash.edu.au> Message-ID: <451341E5.2040104@campus.iztacala.unam.mx> Torsten Seemann wrote: > Some questions regarding online docs: > > How often is the online PDOC updated? > ie. http://doc.bioperl.org/releases/bioperl-current/bioperl-live/ A cron job updates it every night. > How is it generated? > ie. 
are all the scripts/CSS/templates in CVS somewhere? We have a local installation of pdoc, no code in our CVS. You can check its SourceForge page: http://sourceforge.net/projects/pdoc > Is Raphael Leplae the correct person to send suggestions to? > ie. raphael at scmbb.ulb.ac.be Yes. > Thank you for any help, > Sure, no problem :) Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From osborne1 at optonline.net Thu Sep 21 22:20:37 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 21 Sep 2006 22:20:37 -0400 Subject: [Bioperl-l] Unused or incomplete modules in bioperl-live In-Reply-To: <45132586.6030103@infotech.monash.edu.au> Message-ID: Torsten, Without talking to the author one can't distinguish a module that may be a true orphan, unsupported and unreferenced, from a module that the author may want to build or elaborate on. Talk to the author, and if he or she doesn't respond then proceed as if it has no value and mark it as DEPRECATED, scheduled to be deleted in some future release. Brian O. On 9/21/06 7:51 PM, "Torsten Seemann" wrote: > Hello all, > > While auditing bioperl-live recently I noticed a module > Bio/Search/Processor.pm (Aaron Mackey) > which is unreferenced anywhere else, and it looks like a deployer > module, but there is no directory > Bio/Search/Processor/ > > The last real CVS entry was > date: 2000/11/20 17:10:57; author: jason; state: Exp; lines: +10 -11 > "likely meaningless changes as we will probably chuck these modules" > > What's the correct procedure to deal with this? > Should it be deleted from CVS? > Or schedule for removal in the 1.6 branch? > > If there are any more modules to be retired etc, should we have a Wiki > page about it?
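For what it's worth, one possible way to "mark as DEPRECATED" in code as well as in the POD is to have the constructor warn. This is only a sketch using core Carp: the package name My::Search::Processor and the wording of the notice are invented for illustration, and BioPerl may well prefer its own mechanism for this.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module being retired; the package name and
# the message text are assumptions made for this example only.
package My::Search::Processor;

use Carp qw(carp);

our $DEPRECATION_NOTICE =
    __PACKAGE__ . ' is DEPRECATED and scheduled for removal in a future release';

sub new {
    my ($class, %args) = @_;
    # carp reports the warning from the caller's point of view, so users
    # can see which of their scripts still rely on the module.
    carp $DEPRECATION_NOTICE;
    return bless {%args}, $class;
}

package main;

# Collect warnings instead of printing them straight to STDERR, so the
# effect of constructing the object is easy to inspect.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $obj = My::Search::Processor->new(name => 'example');
print "Got a ", ref($obj), " object and ", scalar(@warnings), " warning(s)\n";
print $warnings[0];
```

A warning at construction time keeps existing scripts running while giving users notice before the module is actually removed.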
From cjfields at uiuc.edu Thu Sep 21 23:02:54 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Sep 2006 22:02:54 -0500 Subject: [Bioperl-l] Unused or incomplete modules in bioperl-live In-Reply-To: References: Message-ID: <53E22B74-5054-45A5-8DE9-84A94EF33256@uiuc.edu> Speaking of, did we ever come to a determination on Bio::Graph? I think we were waiting for Nat... Chris On Sep 21, 2006, at 9:20 PM, Brian Osborne wrote: > Torsten, > > Without talking to the author one can't distinguish a module that > may be a > true orphan, unsupported and unreferenced, from a module that the > author may > want to build or elaborate on. > > Talk to the author, and if he or she doesn't respond then proceed > as if it > has no value and mark it as DEPRECATED, scheduled to be deleted in > some > future release. > > Brian O. > > > On 9/21/06 7:51 PM, "Torsten Seemann" > wrote: > >> Hello all, >> >> While auditing bioperl-live recently I noticed a module >> Bio/Search/Processor.pm (Aaron Mackey) >> which is unreferenced anywhere else, and it looks like a deployer >> module, but there is no directory >> Bio/Search/Processor/ >> >> The last real CVS entry was >> date: 2000/11/20 17:10:57; author: jason; state: Exp; lines: >> +10 -11 >> "likely meaningless changes as we will probably chuck these modules" >> >> What's the correct procedure to deal with this? >> Should it be deleted from CVS? >> Or schedule for removal in the 1.6 branch? >> >> If there are any more modules to be retired etc, should we have a >> Wiki >> page about it? > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Thu Sep 21 23:18:44 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 21 Sep 2006 23:18:44 -0400 Subject: [Bioperl-l] Unused or incomplete modules in bioperl-live In-Reply-To: <53E22B74-5054-45A5-8DE9-84A94EF33256@uiuc.edu> Message-ID: Chris, Yes, good question. Removing these modules is fine with Richard Adams, but I've been unable to contact Nat, the address 'natg at shore.net' just bounces when I try it. Does Nat subscribe here? Or is there a better address? Brian O. On 9/21/06 11:02 PM, "Chris Fields" wrote: > Speaking of, did we ever come to a determination on Bio::Graph? I > think we were waiting for Nat... > > Chris > > On Sep 21, 2006, at 9:20 PM, Brian Osborne wrote: > >> Torsten, >> >> Without talking to the author one can't distinguish a module that >> may be a >> true orphan, unsupported and unreferenced, from a module that the >> author may >> want to build or elaborate on. >> >> Talk to the author, and if he or she doesn't respond then proceed >> as if it >> has no value and mark it as DEPRECATED, scheduled to be deleted in >> some >> future release. >> >> Brian O. >> >> >> On 9/21/06 7:51 PM, "Torsten Seemann" >> wrote: >> >>> Hello all, >>> >>> While auditing bioperl-live recently I noticed a module >>> Bio/Search/Processor.pm (Aaron Mackey) >>> which is unreferenced anywhere else, and it looks like a deployer >>> module, but there is no directory >>> Bio/Search/Processor/ >>> >>> The last real CVS entry was >>> date: 2000/11/20 17:10:57; author: jason; state: Exp; lines: >>> +10 -11 >>> "likely meaningless changes as we will probably chuck these modules" >>> >>> What's the correct procedure to deal with this? >>> Should it be deleted from CVS? >>> Or schedule for removal in the 1.6 branch? >>> >>> If there are any more modules to be retired etc, should we have a >>> Wiki >>> page about it? 
>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > From hlapp at gmx.net Thu Sep 21 23:29:39 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 21 Sep 2006 23:29:39 -0400 Subject: [Bioperl-l] Bio::Location::Split question In-Reply-To: <000001c6ddbe$97fe0690$15327e82@pyrimidine> References: <000001c6ddbe$97fe0690$15327e82@pyrimidine> Message-ID: The idea is that those sub-locations are 'owned' by the container location unless they are remote. The motivation for Location::Split was not to have an arbitrary container which could have first-class locations added to it, but rather its purpose is to represent the location of a dis-contiguous feature transparently in a way that's compatible with Bio::LocationI. So if you call $loc->strand(-$loc->strand) and $loc happens to be a split location but doesn't propagate the location change to the sub- locations, you have a situation which is ambiguous and inconsistent. Obviously, if you assume that the sub-locations are first-class locations and you permit those to zig-zag between strands then propagating a new strand value would clearly lead to an incorrect result (namely the same strand for all sublocs when they did not have the same strand before). You could change that to only propagate the direction of the change, not the new value itself. -hilmar On Sep 21, 2006, at 4:43 PM, Chris Fields wrote: > Hilmar, > > Here's a question which I can't quite find the answer for. The > current > behavior of Bio::Location::Split is to propagate strand information > (using > $loc->strand()) for a Split location object to the various > sublocations it > contains. 
In other words, it isn't just a get/set, but has a > direct effect > on the sublocation objects and assumes that all sublocations have > the same > strand as the Split location container object. Would you know of any > rationale for this? > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >> Sent: Monday, September 18, 2006 5:27 PM >> To: Chris Fields >> Cc: Bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::Location::Split question >> >> On Sep 18, 2006, at 5:55 PM, Chris Fields wrote: >> >>> However, if I take the two examples above, run them through >>> FTLocationFactory, then use to_FTstring() to get the feature >>> string, this is >>> what I get: >>> >>> complement(join(2691..4571,4918..5163)) >>> >>> complement(join(4918..5163,2691..4571)) >> >> So this looks like a bug, right? The correct result would be if both >> yielded the same strings, or syntactically equivalent strings. The >> two above are neither identical nor syntactically equivalent. >> >> Another test is if you set a feature location from either string and >> then request the sub-sequence, the resulting sequence should be >> identical given syntactically equivalent location specifications. >> >> Do you want to file (and possibly address?) this? 
>> >> -hilmar >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From torsten.seemann at infotech.monash.edu.au Thu Sep 21 23:34:37 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 22 Sep 2006 13:34:37 +1000 Subject: [Bioperl-l] Unused or incomplete modules in bioperl-live In-Reply-To: References: Message-ID: <451359CD.1010509@infotech.monash.edu.au> > Yes, good question. Removing these modules is fine with Richard Adams, but > I've been unable to contact Nat, the address 'natg at shore.net' just bounces > when I try it. > > Does Nat subscribe here? Or is there a better address? I couldn't find any emails in all bioperl-l that look like they come from him. His home page http://home.comcast.net/~natgoodman/ is still "up" but the HTTP header says: Last-Modified: Tue, 15 Feb 2005 17:31:24 GMT http://www.genome-technology.com/view-itguy.htm suggests goodman at genomeweb.com as an alternative, but it looks like it's from 2004. 
-- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From cjfields at uiuc.edu Thu Sep 21 23:45:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Sep 2006 22:45:56 -0500 Subject: [Bioperl-l] Unused or incomplete modules in bioperl-live In-Reply-To: <451359CD.1010509@infotech.monash.edu.au> References: <451359CD.1010509@infotech.monash.edu.au> Message-ID: He also has a page on the wiki: http://www.bioperl.org/wiki/Nat_Goodman Chris On Sep 21, 2006, at 10:34 PM, Torsten Seemann wrote: >> Yes, good question. Removing these modules is fine with Richard >> Adams, but >> I've been unable to contact Nat, the address 'natg at shore.net' just >> bounces >> when I try it. >> Does Nat subscribe here? Or is there a better address? > > I couldn't find any emails in all bioperl-l that look like they > come from him. > > His home page http://home.comcast.net/~natgoodman/ is still "up" > but the HTTP header says: Last-Modified: Tue, 15 Feb 2005 17:31:24 GMT > > http://www.genome-technology.com/view-itguy.htm suggests > goodman at genomeweb.com as an alternative, but it looks like it's > from 2004. > > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Sep 22 01:27:00 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 22 Sep 2006 00:27:00 -0500 Subject: [Bioperl-l] Bio::Location::Split question In-Reply-To: References: <000001c6ddbe$97fe0690$15327e82@pyrimidine> Message-ID: On Sep 21, 2006, at 10:29 PM, Hilmar Lapp wrote: > The idea is that those sub-locations are 'owned' by the container > location unless they are remote. 
> > The motivation for Location::Split was not to have an arbitrary > container which could have first-class locations added to it, but > rather its purpose is to represent the location of a dis-contiguous > feature transparently in a way that's compatible with Bio::LocationI. > > So if you call $loc->strand(-$loc->strand) and $loc happens to be a > split location but doesn't propagate the location change to the sub- > locations, you have a situation which is ambiguous and inconsistent. > > Obviously, if you assume that the sub-locations are first-class > locations and you permit those to zig-zag between strands then > propagating a new strand value would clearly lead to an incorrect > result (namely the same strand for all sublocs when they did not have > the same strand before). > > You could change that to only propagate the direction of the change, > not the new value itself. > > -hilmar There are two relatively serious problems that I have found, which are outlined in bugzilla (and which we have talked about): http://bugzilla.open-bio.org/show_bug.cgi?id=2101 Data::Dumper output shows that the objects are different in order and strand. I'm running more tests when I have time to check out the subsequences but they seem out-of-order, so using each_sub_Location will likely get the subsequences out of order as well. I'll see what I can work out. I know that, if we treat remote locations similarly to regular locations in strand(), then one bug is fixed. The reason I ask about the use of strand() is I found it much easier to fix some of these bugs when assuming the split location has a strand, if using it as nothing more than a point of reference for the sublocations (i.e. the strand doesn't mean anything except internally). I noticed that is done somewhat, but only with complemented strands. Chris > On Sep 21, 2006, at 4:43 PM, Chris Fields wrote: > >> Hilmar, >> >> Here's a question which I can't quite find the answer for. 
The >> current >> behavior of Bio::Location::Split is to propagate strand information >> (using >> $loc->strand()) for a Split location object to the various >> sublocations it >> contains. In other words, it isn't just a get/set, but has a >> direct effect >> on the sublocation objects and assumes that all sublocations have >> the same >> strand as the Split location container object. Would you know of any >> rationale for this? >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >>> Sent: Monday, September 18, 2006 5:27 PM >>> To: Chris Fields >>> Cc: Bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::Location::Split question >>> >>> On Sep 18, 2006, at 5:55 PM, Chris Fields wrote: >>> >>>> However, if I take the two examples above, run them through >>>> FTLocationFactory, then use to_FTstring() to get the feature >>>> string, this is >>>> what I get: >>>> >>>> complement(join(2691..4571,4918..5163)) >>>> >>>> complement(join(4918..5163,2691..4571)) >>> >>> So this looks like a bug, right? The correct result would be if both >>> yielded the same strings, or syntactically equivalent strings. The >>> two above are neither identical nor syntactically equivalent. >>> >>> Another test is if you set a feature location from either string and >>> then request the sub-sequence, the resulting sequence should be >>> identical given syntactically equivalent location specifications. >>> >>> Do you want to file (and possibly address?) this? 
>>> >>> -hilmar >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Sep 22 02:41:57 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 22 Sep 2006 07:41:57 +0100 Subject: [Bioperl-l] Questions about doc.bioperl.org PDOC In-Reply-To: <451341E5.2040104@campus.iztacala.unam.mx> References: <45131D34.40604@infotech.monash.edu.au> <451341E5.2040104@campus.iztacala.unam.mx> Message-ID: <451385B5.6030408@sendu.me.uk> Mauricio Herrera Cuadra wrote: > Torsten Seemann wrote: >> Some questions regarding online docs: >> >> How often is the online PDOC updated? >> ie. http://doc.bioperl.org/releases/bioperl-current/bioperl-live/ > > A cron job updates it every night. How do you get new modules to appear on it? For instance, Bio::PullParserI doesn't seem to be there. From n.haigh at sheffield.ac.uk Fri Sep 22 06:06:16 2006 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh)
Date: Fri, 22 Sep 2006 11:06:16 +0100
Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XPPC using ActivePerl PPM
In-Reply-To: <002b01c6dcd3$73b48b60$15327e82@pyrimidine>
References: <002b01c6dcd3$73b48b60$15327e82@pyrimidine>
Message-ID: <4513B598.9030803@sheffield.ac.uk>

Did you get round to trying this? Did you find anything?

I'm still working on getting Bioperl 1.5.2 installed on ActivePerl 5.6.1,
with as many dependencies as I can, in order to run the test suite.
However, I'm having a lot of headaches! I'm about to work through it a bit
more systematically and note the problems I'm having. One that comes to
mind straight away is:

To run the tests, I need nmake. However, nmake fails if commands exceed
the command-line length limit, which is shorter on Windows than on other
OSes. This was solved by the authors of ExtUtils::MakeMaker when I
informed them some time ago. So, to run the tests, I need
ExtUtils::MakeMaker >= 6.06 installed. However, I'm having problems
installing this! I've informed the authors, but they resorted to
manually copying over the files and suggested using ActivePerl 5.8.8!
This is not good as far as making testing easy, although I hope that
once I've installed the latest version of ExtUtils::MakeMaker and the
Bioperl dependencies, I'll be able to run the tests.

Another thing is the lack of the correct implementation of some ppd
modules, but that should hopefully be reasonably straightforward.

Nathan

Chris Fields wrote:
> Nathan,
>
> I'll try upgrading to the latest ActivePerl to see what's going on. We
> probably need to be prepared for this issue if it pops up, as I'm sure it
> will.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> ...
>
>> I've been looking at this a little bit today. It is weird!
Firstly, I >> thought new ppm4 can't actually find a File::Spec module other than >> version 0.82 and thought it might be due to the addition of new tags to >> ppd files ( and ) in order to better track >> dependencies etc. However, I don't have File::Spec installed, or >> PathTools so I have no idea where it gets the version numbers from! >> >> I would install a slightly earlier version of Perl which still uses >> PPM3. All versions >= 5.8.8.818 will only contain PPM4, so either >> download 5.6.1 or download an old release of 5.8.8 from here: >> http://downloads.activestate.com/ActivePerl/Windows/5.8/ >> >> Sorry I couldn't be too much help. >> Nathan >> > > > > From cjfields at uiuc.edu Fri Sep 22 12:23:44 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 22 Sep 2006 11:23:44 -0500 Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XPPC using ActivePerl PPM In-Reply-To: <4513B598.9030803@sheffield.ac.uk> Message-ID: <000601c6de63$7970c500$15327e82@pyrimidine> > Did you get round to trying this? Did you find anything? Yes I have. PPM4 is interesting, but I find the lack of a full command-line version a bit frustrating (though I can see why they are moving in this direction). It doesn't require the use of a package.lst in the ppm directory, but it looks for it first. If it isn't present it looks for the ppd files. This would be nice for Bioperl except we want to continue support for Perl 5.6.1, so we must include package.lst in the repository. 
I can definitely replicate the error reported before when adding the repository using ppm4: ERROR: Installing File-Spec-0.82 would downgrade File::Spec from version 3.12 to 0.82 and File::Spec::Functions from version 1.3 to 1.1 and File::Spec::Mac from version 1.4 to 1.2 and File::Spec::OS2 from version 1.2 to 1.1 and File::Spec::Unix from version 1.5 to 1.2 and File::Spec::VMS from version 1.4 to 1.1 and File::Spec::Win32 from version 1.6 to 1.2 I think this comes from the various versioning requirements for those modules, which probably arise from the v 1.4 Makefile.PL. The new version doesn't have those requirements but a few others are present. When a new PPM is made we probably need to take that into consideration. I noticed that the GMOD repository (http://www.gmod.org/ggb/ppm/) carries a newer bioperl version (1.512) which is needed for GBrowse. This is a minimal installation w/o any requirements added and is accessible from PPM4; it installs w/o a hitch. > I'm still working on getting Bioperl 1.5.2 installed on ActivePerl 5.6.1 > and as many dependencies as I can in order to run the test suit - > however, I'm have a lot of headaches! I'm about to work through it a bit > more systematically, and note the problems I'm having. One that comes to > mind straight away is: > > To run the tests, I need nmake. However, nmake fails if commands break > the command line length limit - which is shorter on Windows than other > OS's. This was solved by the authors of ExtUtils::MakeMaker when I > informed them, some time ago. So, to run the tests, I need > ExtUtils::MakeMaker >=6.06 installed. However, I'm having problems with > installing this! I've informed the authors, but they resorted to > manually copying over the files and suggested using ActivePerl 5.8.8! > This is not good as far as making testing easy - although I hope that > once i've installed the latest version of ExtUtils::MakeMaker and the > Bioperl dependencies, I'll be able to run the tests. 
If you download the free MS Visual C++ Express Edition, it comes with a much
newer version of nmake:

G:\Program Files\Microsoft Visual Studio 8\VC>nmake -help

Microsoft (R) Program Maintenance Utility Version 8.00.50727.42
...

It's available here:

http://msdn.microsoft.com/vstudio/express/

That may resolve the issue.

> Another thing is the lack of the correct implementation of some ppd
> modules, but that hopefully, should be reasonable straight forward.
> Nathan

Which ones were you thinking of? GD::SVG and Text::Shellwords are both
available on the bioperl site; we could add others as needed if they aren't
available through ActiveState or the other repositories.

The critical issue is that of binary compatibility between 5.6.1 and the
various 5.8.x versions of perl. However, there are no binary components for
Bioperl. One could probably modify the ppd file to deal with both versions
(similar to the older ppd files in bioperl.org/DIST).

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

...

From skumagai at life.bio.sunysb.edu Fri Sep 22 14:13:21 2006
From: skumagai at life.bio.sunysb.edu (Seiji Kumagai)
Date: Fri, 22 Sep 2006 14:13:21 -0400 (EDT)
Subject: [Bioperl-l] Inconsitency in return value of Bio::SimpleAlinment::remove_columns()
Message-ID:

Hi list,

Yesterday, I reported a small bug relating to
Bio::SimpleAlign::remove_columns(). My patch was committed to CVS
immediately thanks to Chris.

At the same time, I noticed another issue with the method. According to
the POD, the method returns a new alignment. However, if (and only if) no
argument is passed to it, the return value is $self. This may be a problem
because modifications to the returned value affect the original object
only when the method is called without an argument. Secondly, I find it
confusing that the two kinds of return value differ but look superficially
identical (both are Bio::SimpleAlign objects).
I want to make a modification on this issue in one of the following ways:

1) Make the argument mandatory. If the argument is missing, throw an
exception. It is the user's responsibility to handle it properly.

2) Always return a new SimpleAlign object. If the argument is not passed,
return a clone of $self. This may make scripts run slower and use more
memory when no argument is given, but modifications to the new object are
guaranteed not to affect the original object.

3) Do not modify the current version.

3.a) Do not modify the current version, but state the difference in the POD.

What do you think? I personally think the first is best, since calling
the method but doing nothing is not what users want in most cases.
Those rare cases deserve to be handled by an exception.

Thanks,

From arareko at campus.iztacala.unam.mx Fri Sep 22 14:52:27 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 22 Sep 2006 13:52:27 -0500
Subject: [Bioperl-l] Questions about doc.bioperl.org PDOC
In-Reply-To: <451385B5.6030408@sendu.me.uk>
References: <45131D34.40604@infotech.monash.edu.au> <451341E5.2040104@campus.iztacala.unam.mx> <451385B5.6030408@sendu.me.uk>
Message-ID: <451430EB.90901@campus.iztacala.unam.mx>

Sendu Bala wrote:
> Mauricio Herrera Cuadra wrote:
>> Torsten Seemann wrote:
>>> Some questions regarding online docs:
>>>
>>> How often is the online PDOC updated?
>>> ie. http://doc.bioperl.org/releases/bioperl-current/bioperl-live/
>> A cron job updates it every night.
>
> How do you get new modules to appear on it? For instance,
> Bio::PullParserI doesn't seem to be there.

There was an access error to the CVS repository. I've updated the rendered
docs and it shows now. Thanks for pointing this out! :)

Mauricio.
--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Fri Sep 22 14:56:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 22 Sep 2006 13:56:32 -0500
Subject: [Bioperl-l] Inconsitency in return value ofBio::SimpleAlinment::remove_columns()
In-Reply-To:
Message-ID: <000001c6de78$d2651700$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Seiji Kumagai
> Sent: Friday, September 22, 2006 1:13 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Inconsitency in return value
> ofBio::SimpleAlinment::remove_columns()
>
> Hi list,
>
> Yesterday, I reported a small bug relating to
> Bio::SimpleAlign::remove_columns(). My patch was committed to CVS
> immediately thanks to Chris.
>
> I, at the same time, noticed another issue with the method. According to
> the POD, the method returns a new alignment. However, iff argument is not
> passed to it, the return value is $self. I see this may be a problem
> because modifications on the returned value affect the original object
> only if this method is called in the way I mentioned. Secondly, it is just
> confusing to me that returned value is different but superficially
> identical (both are Bio::SimpleAlignment object).
>
> I want to make a modification on this issue by one of the following ways:
>
> 1) Make an argument mandatory. If argument is missing, throw an
> exception. It is user's responsibility to handle it properly.
>
> 2) Always return new SimpleAlign object. If the argument is not passed,
> return a clone of $self. This may make the scripts run slower and use more
> memory when no argument is given, but modification on new object is
> guaranteed not to affect original object.
>
> 3) Do not modify current version.
> > 3.a) Do not modify current version, but state the difference in POD. > > What do you think? I personally think the first is the best since calling > the method but doing nothing is not what users want to do in most of the > cases. Those rare cases are deserved to be handled by an exception. > > Thanks, The first option sounds appropriate considering the use of the method in POD. I also agree that returning $self isn't the best practice either, since it would not be a true clone of the object (unless there is some deep copy magic going on that I don't see). What would you suggest? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From skumagai at life.bio.sunysb.edu Fri Sep 22 16:03:36 2006 From: skumagai at life.bio.sunysb.edu (Seiji Kumagai) Date: Fri, 22 Sep 2006 16:03:36 -0400 (EDT) Subject: [Bioperl-l] Inconsitency in return value ofBio::SimpleAlinment::remove_columns() In-Reply-To: <000001c6de78$d2651700$15327e82@pyrimidine> References: <000001c6de78$d2651700$15327e82@pyrimidine> Message-ID: On Fri, 22 Sep 2006, Chris Fields wrote: >> I, at the same time, noticed another issue with the method. According to >> the POD, the method returns a new alignment. However, iff argument is not >> passed to it, the return value is $self. I see this may be a problem >> because modifications on the returned value affect the original object >> only if this method is called in the way I mentioned. Secondly, it is just >> confusing to me that returned value is different but superficially >> identical (both are Bio::SimpleAlignment object). >> >> I want to make a modification on this issue by one of the following ways: >> >> 1) Make an argument mandatory. If argument is missing, throw an >> exception. It is user's responsibility to handle it properly. >> >> 2) Always return new SimpleAlign object. If the argument is not passed, >> return a clone of $self. 
This may make the scripts run slower and use more
>> memory when no argument is given, but modification on new object is
>> guaranteed not to affect original object.
>>
>> 3) Do not modify current version.
>>
>> 3.a) Do not modify current version, but state the difference in POD.
>>
>> What do you think? I personally think the first is the best since calling
>> the method but doing nothing is not what users want to do in most of the
>> cases. Those rare cases are deserved to be handled by an exception.
>>
>> Thanks,
>
> The first option sounds appropriate considering the use of the method in
> POD. I also agree that returning $self isn't the best practice either,
> since it would not be a true clone of the object (unless there is some deep
> copy magic going on that I don't see).
>
> What would you suggest?
>

Unless someone says otherwise, I think the following is enough for the
first option.

Index: SimpleAlign.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SimpleAlign.pm,v
retrieving revision 1.110
diff -B -b -u -r1.110 SimpleAlign.pm
--- SimpleAlign.pm 21 Sep 2006 19:20:12 -0000 1.110
+++ SimpleAlign.pm 22 Sep 2006 23:17:23 -0000
@@ -987,7 +987,10 @@
 sub remove_columns {
     my ($self,@args) = @_;
-    @args || return $self;
+    @args || $self->throw
+      (-class => 'Bio::Root::BadParameter',
+       -text  => 'Mandatory parameter is missing. see documentation',
+       -value => 'array reference');
     my $aln;
     if ($args[0][0] =~ /^[a-z_]+$/i) {

Also, I noticed that remove_columns() uses, or claims to use, a coordinate
system starting from 0, whereas slice() uses one starting from 1. Wouldn't
it be better to have the same scheme?
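
For illustration only, the behaviour of the first option can be sketched
outside Bioperl. The ToyAlign package below is a hypothetical stand-in, not
Bio::SimpleAlign, and it uses a plain die where the patch uses
$self->throw; the 0-based inclusive [start, end] ranges are likewise an
assumption made for the sketch.

```perl
use strict;
use warnings;

# Toy stand-in for an alignment object; this is NOT Bio::SimpleAlign.
package ToyAlign;

sub new {
    my ($class, @cols) = @_;
    return bless { cols => [@cols] }, $class;
}

# Option 1: the argument is mandatory, so a missing argument is an error.
# Ranges are 0-based and inclusive, passed as array refs: [start, end].
sub remove_columns {
    my ($self, @ranges) = @_;
    @ranges or die "remove_columns: mandatory parameter is missing\n";

    # Collect every column index that falls inside a requested range.
    my %drop;
    for my $range (@ranges) {
        my ($start, $end) = @$range;
        $drop{$_} = 1 for $start .. $end;
    }

    my @cols = @{ $self->{cols} };
    my @keep = grep { !$drop{$_} } 0 .. $#cols;

    # Always build a new object, so the caller never gets an alias of $self.
    return ToyAlign->new( @cols[@keep] );
}

package main;

my $aln  = ToyAlign->new(qw(A C G T A));
my $trim = $aln->remove_columns( [1, 2] );
print join('', @{ $trim->{cols} }), "\n";    # columns 1..2 removed

# Calling without an argument now fails loudly instead of returning $self.
my $ok = eval { $aln->remove_columns(); 1 };
print "no-argument call rejected: $@" unless $ok;
```

With this shape the caller can rely on the return value always being a
separate object, which is the property the POD already promises.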
Thanks, From N.Haigh at sheffield.ac.uk Fri Sep 22 17:23:15 2006 From: N.Haigh at sheffield.ac.uk (Nathan Haigh) Date: Fri, 22 Sep 2006 22:23:15 +0100 Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XPPC using ActivePerl PPM In-Reply-To: <000601c6de63$7970c500$15327e82@pyrimidine> References: <000601c6de63$7970c500$15327e82@pyrimidine> Message-ID: <1158960195.451454438a2d2@webmail.shef.ac.uk> Quoting Chris Fields : > > Did you get round to trying this? Did you find anything? > > Yes I have. PPM4 is interesting, but I find the lack of a full command-line > version a bit frustrating (though I can see why they are moving in this > direction). It doesn't require the use of a package.lst in the ppm > directory, but it looks for it first. If it isn't present it looks for the > ppd files. This would be nice for Bioperl except we want to continue > support for Perl 5.6.1, so we must include package.lst in the repository. > > I can definitely replicate the error reported before when adding the > repository using ppm4: > > ERROR: Installing File-Spec-0.82 would downgrade File::Spec from version > 3.12 to 0.82 and File::Spec::Functions from version 1.3 to 1.1 and > File::Spec::Mac from version 1.4 to 1.2 and File::Spec::OS2 from version 1.2 > to 1.1 and File::Spec::Unix from version 1.5 to 1.2 and File::Spec::VMS from > version 1.4 to 1.1 and File::Spec::Win32 from version 1.6 to 1.2 > > I think this comes from the various versioning requirements for those > modules, which probably arise from the v 1.4 Makefile.PL. The new version > doesn't have those requirements but a few others are present. When a new > PPM is made we probably need to take that into consideration. Not sure I understand :o( > > I noticed that the GMOD repository (http://www.gmod.org/ggb/ppm/) carries a > newer bioperl version (1.512) which is needed for GBrowse. This is a > minimal installation w/o any requirements added and is accessible from PPM4; > it installs w/o a hitch. 
> > > I'm still working on getting Bioperl 1.5.2 installed on ActivePerl 5.6.1 > > and as many dependencies as I can in order to run the test suit - > > however, I'm have a lot of headaches! I'm about to work through it a bit > > more systematically, and note the problems I'm having. One that comes to > > mind straight away is: > > > > To run the tests, I need nmake. However, nmake fails if commands break > > the command line length limit - which is shorter on Windows than other > > OS's. This was solved by the authors of ExtUtils::MakeMaker when I > > informed them, some time ago. So, to run the tests, I need > > ExtUtils::MakeMaker >=6.06 installed. However, I'm having problems with > > installing this! I've informed the authors, but they resorted to > > manually copying over the files and suggested using ActivePerl 5.8.8! > > This is not good as far as making testing easy - although I hope that > > once i've installed the latest version of ExtUtils::MakeMaker and the > > Bioperl dependencies, I'll be able to run the tests. > > If you download the free MS Visual C++ Express Edition, it comes with a much > newer version of nmake: > > G:\Program Files\Microsoft Visual Studio 8\VC>nmake -help > > Microsoft (R) Program Maintenance Utility Version 8.00.50727.42 > ... > > It's available here: > > http://msdn.microsoft.com/vstudio/express/ > > That may resolve the issue. Thanks - I'll give it a try! I did download the Borland compilier v5.5 (i think) and I didn't seem to have much luck with that either - I may have become delierious by then though! > > > Another thing is the lack of the correct implementation of some ppd > > modules, but that hopefully, should be reasonable straight forward. > > Nathan > > Which ones were you thinking of? GD::SVG and Text::Shellwords are both > available on the bioperl site; we could add others as needed if they aren't > available through ActiveState or the other repositories. 
> > The critical issue is that of binary compatibility between 5.6.1 and the > various 5.8.x versions of perl. However, there are no binary components for > Bioperl. One could probably modify the ppd file to deal with both versions > (similar to the older ppd files in bioperl.org/DIST. There were several modules but can't remember which off hand. Some were not direct dependents of Bioperl, but dependencies of denpendencies! Most (i think) were because only 1 implementation tag was in the ppd file - for Perl 5.8 but non for 5.6. Most of these should be easy to rectify, but others may not be. I'll have another bash after the weekend! > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > ... > > From aaron.j.mackey at gsk.com Fri Sep 22 08:07:44 2006 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Fri, 22 Sep 2006 08:07:44 -0400 Subject: [Bioperl-l] Bioperl - deprecate Bio/Search/Processor ? In-Reply-To: <451355FC.7020006@infotech.monash.edu.au> Message-ID: I don't know that it even needs to be deprecated - no code uses it, it should just (finally) be chucked (it was the predecessor of the SearchIO engine). -Aaron "Torsten Seemann" wrote on 09/21/2006 11:18:20 PM: > Aaron, > > While auditing bioperl-live recently I noticed your module > Bio/Search/Processor.pm which is unreferenced anywhere else, and it > looks like a deployer module, but there is no directory > Bio/Search/Processor/ > > The last real CVS entry was > date: 2000/11/20 17:10:57; author: jason; state: Exp; lines: +10 -11 > "likely meaningless changes as we will probably chuck these modules" > > Are you happy for this to be scheduled for deprecation? 
> eg bioperl-1.6 release > > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > > From cjfields at uiuc.edu Fri Sep 22 17:54:11 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 22 Sep 2006 16:54:11 -0500 Subject: [Bioperl-l] problem with installation of Bioperl1.4 on Windows XPPC using ActivePerl PPM In-Reply-To: <1158960195.451454438a2d2@webmail.shef.ac.uk> Message-ID: <000601c6de91$a3a03e90$15327e82@pyrimidine> ... > > ERROR: Installing File-Spec-0.82 would downgrade File::Spec from version > > 3.12 to 0.82 and File::Spec::Functions from version 1.3 to 1.1 and > > File::Spec::Mac from version 1.4 to 1.2 and File::Spec::OS2 from version > 1.2 > > to 1.1 and File::Spec::Unix from version 1.5 to 1.2 and File::Spec::VMS > from > > version 1.4 to 1.1 and File::Spec::Win32 from version 1.6 to 1.2 > > > > I think this comes from the various versioning requirements for those > > modules, which probably arise from the v 1.4 Makefile.PL. The new > version > > doesn't have those requirements but a few others are present. When a > new > > PPM is made we probably need to take that into consideration. > > Not sure I understand :o( I think PPM4 has stricter requirements that previous versions, so if a version is stipulated in the PPD file, then it will try to install it, regardless on whether a newer version is available. I think this is bad bad bad on ActiveState's part. Grr.... Due to this alone I would recommend not having any versioning requirements in ppd files until this gets sorted out. I did manage to get everything installed and working, though. I haven't tried Perl 5.6.1 myself. ... > > If you download the free MS Visual C++ Express Edition, it comes with a > much > > newer version of nmake: > > > > G:\Program Files\Microsoft Visual Studio 8\VC>nmake -help > > > > Microsoft (R) Program Maintenance Utility Version 8.00.50727.42 > > ... 
> > > > It's available here: > > > > http://msdn.microsoft.com/vstudio/express/ > > > > That may resolve the issue. > > Thanks - I'll give it a try! I did download the Borland compilier v5.5 (i > think) and I didn't seem to have much luck with that either - I may have > become delierious by then though! I think the Makefile generated using 'perl Makefile.PL' only works with nmake. I tried regular gnu-make and it doesn't work either. ... > > The critical issue is that of binary compatibility between 5.6.1 and the > > various 5.8.x versions of perl. However, there are no binary components > for > > Bioperl. One could probably modify the ppd file to deal with both > versions > > (similar to the older ppd files in bioperl.org/DIST. > > There were several modules but can't remember which off hand. Some were > not direct dependents of Bioperl, but dependencies of denpendencies! Most > (i > think) were because only 1 implementation tag was in the ppd file - for > Perl 5.8 but non for 5.6. Most of these should be easy to rectify, but > others > may not be. > > I'll have another bash after the weekend! I managed to get everything installed and working for Perl 5.8.8; I hate to say it, but we may need to recommend Windows users install that version if there are major problems with Perl 5.6.1. The only tests that didn't work were the ones that required bioperl-ext (which I can pretty much guarantee won't work under native Windows conditions). Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From arareko at campus.iztacala.unam.mx Sat Sep 23 15:41:52 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sat, 23 Sep 2006 14:41:52 -0500 Subject: [Bioperl-l] Are Bio::DB::XEMBL and Bio::DB::XEMBLService still valid? 
Message-ID: <45158E00.1050503@campus.iztacala.unam.mx>

Folks,

While validating URLs in the source tree I've found that these 2 modules
make use of addresses that are no longer valid:

http://www.ebi.ac.uk/XEMBL
http://www.ebi.ac.uk:80/cgi-bin/xembl/XEMBL-SOAP.pl

Searching for the XEMBL service to update them, I've noticed that the
service itself has been changed to offer new functionality:

http://www.ebi.ac.uk/xembl/
http://www.ebi.ac.uk/xembl/oldindex.html

Browsing through the commit history for these 2 modules, I've found that
the last *real* changes to them were made 3-4 years ago. Their test
output confirms my idea that these modules no longer work:

[bioperl at nordwand] ~/src/bioperl-live % perl -I. -w t/XEMBL_DB.t
1..9
ok 1
ok 2 # server may be down
ok 3 # Cannot run XEMBL_DB tests
ok 4 # Cannot run XEMBL_DB tests
ok 5 # Cannot run XEMBL_DB tests
ok 6 # Cannot run XEMBL_DB tests
ok 7 # Cannot run XEMBL_DB tests
ok 8 # Cannot run XEMBL_DB tests
ok 9 # Cannot run XEMBL_DB tests

I've searched through the mailing list archives and nobody has reported
them as non-functional. Does anybody have the time to look into this?

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Sat Sep 23 16:26:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 23 Sep 2006 15:26:12 -0500
Subject: [Bioperl-l] Are Bio::DB::XEMBL and Bio::DB::XEMBLService still valid?
In-Reply-To: <45158E00.1050503@campus.iztacala.unam.mx>
References: <45158E00.1050503@campus.iztacala.unam.mx>
Message-ID:

Mauricio,

There were similar issues with the Biblio_biofetch.t tests recently,
which Brian fixed. It has to do with the way testing for remote
databases is set up for most test cases, which requires BIOPERLDEBUG=1.
We should probably have tests in the suite that check the URL, have the
test actually fail if the URL can't be found, then skip subsequent tests
that rely on the returned results.

You can do this with Test::More relatively easily by using skip blocks,
which allow you to conditionally skip tests if something bad happens.
You could do something like this:

    SKIP: {
        my $db = Bio::DB::GenBank->new();
        my $seq;
        eval { $seq = $db->get_Seq_by_acc('ABC123') };
        ok(!$@, 'get_Seq_by_acc() URL passes');
        skip('Bio::DB::GenBank failure', 4) if $@;
        ... # four more tests based on $seq
    }

This way you could run sets of tests that may rely on different URLs in
the same test suite; just wrap each one in a skip block and test using
eval{};. All of this, however, relies on the fact that an error is
actually thrown by the module being tested. Tests could be run or
skipped based on what setting BIOPERLDEBUG has early on, maybe in
BEGIN{}.

I plan on adding similar tests to EUtilities.t at some point.

Chris

On Sep 23, 2006, at 2:41 PM, Mauricio Herrera Cuadra wrote:

> Folks,
>
> While validating URLs in the source tree I've found that these 2 modules
> make use of addresses that are no longer valid:
>
> http://www.ebi.ac.uk/XEMBL
> http://www.ebi.ac.uk:80/cgi-bin/xembl/XEMBL-SOAP.pl
>
> Searching for the XEMBL service to update them, I've noticed that the
> service itself has been changed to offer new functionality:
>
> http://www.ebi.ac.uk/xembl/
> http://www.ebi.ac.uk/xembl/oldindex.html
>
> Browsing through the commit history for this 2 modules, I've found that
> the last *real* changes for them were made 3-4 years ago. Its test
> output confirms my idea that this modules no longer work:
>
> [bioperl at nordwand] ~/src/bioperl-live % perl -I.
-w t/XEMBL_DB.t > 1..9 > ok 1 > ok 2 # server may be down > ok 3 # Cannot run XEMBL_DB tests > ok 4 # Cannot run XEMBL_DB tests > ok 5 # Cannot run XEMBL_DB tests > ok 6 # Cannot run XEMBL_DB tests > ok 7 # Cannot run XEMBL_DB tests > ok 8 # Cannot run XEMBL_DB tests > ok 9 # Cannot run XEMBL_DB tests > > I've searched through the mailing lists archives and nobody has > reported > them as non functional. Does anybody have the time to look into this? > > Regards, > Mauricio. > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Gen?tica > Unidad de Morfofisiolog?a y Funci?n > Facultad de Estudios Superiores Iztacala, UNAM Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Sat Sep 23 21:09:29 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 24 Sep 2006 11:09:29 +1000 Subject: [Bioperl-l] New Bioperl minimum Perl version -- 5.6.1 or 5.8 ? Message-ID: <4515DAC9.4000108@infotech.monash.edu.au> From a previous thread it was unclear whether it had been decided to move the minimum Perl version for Bioperl to 5.6.1 or 5.8 ? http://news.gmane.org/find-root.php?group=gmane.comp.lang.perl.bio.general&article=12304 The "INSTALL" file says 5.005, the wiki says "5.6, prefer 5.8", recent MAC OS X users only have 5.6, a lot of modules already need 5.6 anyway, and Sendu says his new pull-parsers need 5.8. My impression was that it was informally decided to require 5.6.1, but there appears to be a good case to move to 5.8 (now 3 years old). Obviously there are still people who are unable to migrate to Perl 5.8 yet, but are they the same users who will be using the latest Bioperl version anyway? 
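For code that genuinely needs 5.8 (such as the pull-parsers mentioned above), one option is a per-file version guard rather than a project-wide minimum. The sketch below is only an illustration: `perl_at_least` is a hypothetical helper name, and the 5.008 threshold is taken from the discussion, not from any actual BioPerl module.

```perl
#!/usr/bin/perl
# Sketch: let a test script bow out gracefully on an older perl.
use strict;
use warnings;

# Hypothetical helper: true if the running perl is at least $min,
# where $min is a numeric version such as 5.008 (meaning 5.8.0).
sub perl_at_least {
    my ($min) = @_;
    return $] >= $min ? 1 : 0;
}

# In a module that needs 5.8, a compile-time guard would simply be
#     use 5.008;
# which makes perl refuse to compile the file on anything older.

# In a test script, a run-time check lets the script report a skip
# instead of a failure when the perl version falls short.
if (!perl_at_least(5.008)) {
    print "1..0 # Skip perl 5.8 or later required\n";
    exit 0;
}

print "1..1\n";
print "ok 1 - running on perl $]\n";
```

The `1..0 # Skip ...` plan line is how a test script tells the harness that the whole file was skipped, so an old perl shows up as a skip rather than as a broken test.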
-- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia From hlapp at gmx.net Sat Sep 23 23:48:49 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 23 Sep 2006 23:48:49 -0400 Subject: [Bioperl-l] New Bioperl minimum Perl version -- 5.6.1 or 5.8 ? In-Reply-To: <4515DAC9.4000108@infotech.monash.edu.au> References: <4515DAC9.4000108@infotech.monash.edu.au> Message-ID: They may well be. I'd recommend against requiring a version that you don't have to require. Much (most) of Bioperl will run just fine in 5.6.1. Modules that need Perl 5.8.x and higher should check that (via 'use' at compile or 'require' at runtime), and it is easy enough to test for the version of perl and exit gracefully in a test script if the perl version falls short. -hilmar On Sep 23, 2006, at 9:09 PM, Torsten Seemann wrote: > > From a previous thread it was unclear whether it had been decided > to move the > minimum Perl version for Bioperl to 5.6.1 or 5.8 ? > > http://news.gmane.org/find-root.php? > group=gmane.comp.lang.perl.bio.general&article=12304 > > The "INSTALL" file says 5.005, the wiki says "5.6, prefer 5.8", > recent MAC OS X > users only have 5.6, a lot of modules already need 5.6 anyway, and > Sendu says > his new pull-parsers need 5.8. > > My impression was that it was informally decided to require 5.6.1, > but there > appears to be a good case to move to 5.8 (now 3 years old). > Obviously there are > still people who are unable to migrate to Perl 5.8 yet, but are > they the same > users who will be using the latest Bioperl version anyway? 
> > -- > Torsten Seemann > Victorian Bioinformatics Consortium, Monash University, Australia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sun Sep 24 00:36:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 23 Sep 2006 23:36:14 -0500 Subject: [Bioperl-l] New Bioperl minimum Perl version -- 5.6.1 or 5.8 ? In-Reply-To: References: <4515DAC9.4000108@infotech.monash.edu.au> Message-ID: <2374D5D5-1F71-4FFE-AF16-5BDCD784B254@uiuc.edu> Agreed. There are classes in Bioperl that no longer support versions older than 5.6.1 (FTLocationFactory, for instance), so that's the minimum. We could always require 5.6.1 but recommend 5.8 or higher, but I can't think of any that absolutely require v. 5.8. If so, we can always follow Hilmar's suggestion. We also should change the INSTALL and the wiki docs to reflect this. Chris On Sep 23, 2006, at 10:48 PM, Hilmar Lapp wrote: > They may well be. I'd recommend against requiring a version that you > don't have to require. Much (most) of Bioperl will run just fine in > 5.6.1. Modules that need Perl 5.8.x and higher should check that (via > 'use' at compile or 'require' at runtime), and it is easy enough to > test for the version of perl and exit gracefully in a test script if > the perl version falls short. > > -hilmar > > On Sep 23, 2006, at 9:09 PM, Torsten Seemann wrote: > >> >> From a previous thread it was unclear whether it had been decided >> to move the >> minimum Perl version for Bioperl to 5.6.1 or 5.8 ? >> >> http://news.gmane.org/find-root.php? 
>> group=gmane.comp.lang.perl.bio.general&article=12304 >> >> The "INSTALL" file says 5.005, the wiki says "5.6, prefer 5.8", >> recent MAC OS X >> users only have 5.6, a lot of modules already need 5.6 anyway, and >> Sendu says >> his new pull-parsers need 5.8. >> >> My impression was that it was informally decided to require 5.6.1, >> but there >> appears to be a good case to move to 5.8 (now 3 years old). >> Obviously there are >> still people who are unable to migrate to Perl 5.8 yet, but are >> they the same >> users who will be using the latest Bioperl version anyway? >> >> -- >> Torsten Seemann >> Victorian Bioinformatics Consortium, Monash University, Australia >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Sun Sep 24 08:08:00 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 24 Sep 2006 13:08:00 +0100 Subject: [Bioperl-l] Package status for 1.5.2 Message-ID: <45167520.6060803@sendu.me.uk> To what extent do we want a unified release for all the various packages? As I see it, here is the current status: Core ---- P1 and P2 bugs remain in bugzilla, but nothing that need prevent the release. These only need to be resolved before 1.6. Test suite passes satisfactorily, though see http://www.bioperl.org/wiki/Release_1.5.2 for some minor issues (again, nothing that prevents release). Installation issues (ie. 
PPM, Windows) are being investigated by Nathan and Chris. Are they serious enough to prevent release? (Minimal Perl version will be 5.6.1)

Conclusion: ok to be tagged bioperl-release-1-5-2-rc1 and released for testing.


Run
---
Has no INSTALL file.

I don't have most of the programs installed, so I skip most tests. However I still manage to fail 2 tests:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
t/Analysis_soap.t                17    1   5.88%  17
t/Hmmer.t                        20    8  40.00%  5-8 12-15

Should an attempt be made to get things in the Run package to work? Does anyone have a comprehensive selection of programs installed for better testing?

Conclusion: NOT ok to be tagged.


Ext
---
Has no INSTALL file.

I can't install the staden stuff, so can't even run the test suite.

Conclusion: ??


GUI
---
Has no INSTALL file.

Passes its single test.

Conclusion: ok to tag, but do we want to? (It doesn't seem to have been part of a unified release in the past.)


DB
--
Test suite passes first time, but subsequently warnings and failures seem to arise because things stored in the DB by test scripts are not removed from it (even though the test script tries to).

eg. perl t/04swiss.t
1..52
ok 1
[...]
ok 51
not ok 52
# Test 52 got: (t/04swiss.t at line 119)
# Expected: '1'

This may be a problem particular to me.

Conclusion: ??, do we even want a unified release with this (haven't in the past)?


Pedigree
--------
Has no README or INSTALL file.

Test suite passes, though t/PedIO.t warns about uninitialized values in ped.pm line 291.

Conclusion: ok to tag, but do we want to? (It doesn't seem to have been part of a unified release in the past.)


Microarray
----------
Has no README or INSTALL file.

All tests fail because they can't find Bio/Expression/FeatureSet.pm. I can't find it either.

Conclusion: NOT ok to tag, do we even want a unified release with this (haven't in the past)?


Network
-------
Has no README.
Test suite has big problems:

Failed Test            Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Graph-Articulation.t    9  2304    47    0   0.00%  ??
t/Graph-MD5.t             2   512    19    0   0.00%  ??
t/Graph-Seq.t           255 65280    18    0   0.00%  ??
t/IO_dip_tab.t            9  2304    16    0   0.00%  ??
t/IO_psi.t              255 65280    21    0   0.00%  ??
t/Interaction.t         255 65280    17    0   0.00%  ??
t/ProteinNet.t            9  2304   168    0   0.00%  ??
297 subtests skipped.
Failed 7/9 test scripts, 22.22% okay. 0/343 subtests failed, 100.00% okay.

eg.
t/Graph-Articulation....ok 1/47Can't locate object method "get_nodes_by_id" via package "Graph::Undirected" at [...]Bio/Network/IO/dip_tab.pm line 141, line 1.

t/Graph-MD5.............Can't locate object method "new" via package "Graph::Undirected" (perhaps you forgot to load "Graph::Undirected"?) at t/Graph-MD5.t line 44.

Conclusion: NOT ok to tag, do we even want a unified release with this (haven't in the past)?


Pipeline
--------
Is there a config file? Many tests fail for me because they try a database connection with user root, but with no password this fails.

Conclusion: ??, do we even want a unified release with this (haven't in the past)?

From dr.hogart at gmail.com Sun Sep 24 07:50:40 2006
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Sun, 24 Sep 2006 15:50:40 +0400
Subject: [Bioperl-l] using the libgd library
Message-ID:

Hello everybody,

My question may be very easy for programmers, but nevertheless: I want to use Bio::Graphics, and according to HOWTO:Graphics the libgd C library is necessary for it to work correctly. I installed it successfully, but I don't know how to use it for scripting :( After installation I tried to call the Bio::Graphics example scripts (http://www.bioperl.org/wiki/Bioperl_scripts) but got the following message:

Usage: lots_of_glyphs IMAGE_CLASS
- where IMAGE_CLASS is one of GD or GD::SVG
- GD generate png output; GD::SVG generates SVG.

My platform is WinXP. Thanks in advance.
-- From cjfields at uiuc.edu Sun Sep 24 10:03:46 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 24 Sep 2006 09:03:46 -0500 Subject: [Bioperl-l] Package status for 1.5.2 In-Reply-To: <45167520.6060803@sendu.me.uk> References: <45167520.6060803@sendu.me.uk> Message-ID: Past releases of bioperl-run used to coincide with core, if I remember correctly. For the last release, I think Jason set up tar- balls for all of them at roughly the same time; I'm sure there were bugs present in them at the time. If there are any bugs (tests that don't pass) they should probably be submitted to bugzilla, unless you can fix them. The ones in hmmer may arise from recent changes in SearchIO. Chris On Sep 24, 2006, at 7:08 AM, Sendu Bala wrote: > To what extent do we want a unified release for all the various > packages? As I see it, here is the current status: > > Core > ---- > P1 and P2 bugs remain in bugzilla, but nothing that need prevent the > release. These only need to be resolved before 1.6. > > Test suite passes satisfactorily, though see > http://www.bioperl.org/wiki/Release_1.5.2 for some minor issues > (again, > nothing that prevents release). > > Installation issues (ie. PPM, Windows) are being investigated by > Nathan > and Chris. Are they serious enough to prevent release? (Minimal Perl > version will be 5.6.1) > > Conclusion: ok to be tagged bioperl-release-1-5-2-rc1 and released for > testing. > > > Run > --- > Has no INSTALL file. > > I don't have most of the programs installed, so I skip most tests. > However I still manage to fail 2 tests: > > Failed Test Stat Wstat Total Fail Failed List of Failed > t/Analysis_soap.t 17 1 5.88% 17 > t/Hmmer.t 20 8 40.00% 5-8 12-15 > > Should an attempt be made to get things in the Run package to work? > Does > anyone have a comprehensive selection of programs installed for better > testing? > > Conclusion: NOT ok to be tagged. > > > Ext > --- > Has no INSTALL file. 
> > I can't install the staden stuff, so can't even run the test suite. > > Conclusion: ?? > > > GUI > --- > Has no INSTALL file. > > Passes its single test. > > Conclusion: ok to tag, but do we want to? (It doesn't seem to have > been > part of a unified release in the past.) > > > DB > -- > Test suite passes first time, but subsequently warnings and failures > seem to arrise due to things stored in DB by test scritps not being > removed from DB (even though the test script tries to). > > eg. perl t/04swiss.t > 1..52 > ok 1 > [...] > ok 51 > not ok 52 > # Test 52 got: (t/04swiss.t at line 119) > # Expected: '1' > > This may be a problem particular to me. > > Conclusion: ??, do we even want a unified release with this > (haven't in > the past)? > > > Pedigree > -------- > Has no README or INSTALL file. > > Test suite passes, though t/PedIO.t warns about uninitialized > values in > ped.pm line 291. > > Conclusion: ok to tag, but do we want to? (It doesn't seem to have > been > part of a unified release in the past.) > > > Microarray > ---------- > Has no README or INSTALL file. > > All tests fail because they can't find Bio/Expression/FeatureSet.pm. I > can't find it either. > > Conclusion: NOT ok to tag, do we even want a unified release with this > (haven't in the past)? > > > Network > ------- > Has no README. > > Test suite has big problems: > > Failed Test Stat Wstat Total Fail Failed List of Failed > ---------------------------------------------------------------------- > --------- > t/Graph-Articulation.t 9 2304 47 0 0.00% ?? > t/Graph-MD5.t 2 512 19 0 0.00% ?? > t/Graph-Seq.t 255 65280 18 0 0.00% ?? > t/IO_dip_tab.t 9 2304 16 0 0.00% ?? > t/IO_psi.t 255 65280 21 0 0.00% ?? > t/Interaction.t 255 65280 17 0 0.00% ?? > t/ProteinNet.t 9 2304 168 0 0.00% ?? > 297 subtests skipped. > Failed 7/9 test scripts, 22.22% okay. 0/343 subtests failed, > 100.00% okay. > > eg. 
> t/Graph-Articulation....ok 1/47Can't locate object method > "get_nodes_by_id" via package "Graph::Undirected" at > [...]Bio/Network/IO/dip_tab.pm line 141, > line 1. > > t/Graph-MD5.............Can't locate object method "new" via package > "Graph::Undirected" (perhaps you forgot to load > "Graph::Undirected"?) at > t/Graph-MD5.t line 44. > > Conclusion: NOT ok to tag, do we even want a unified release with this > (haven't in the past)? > > > Pipeline > -------- > Is there a config file? Many tests fail for me because they try a > database connection with user root, but with no password this fails. > > Conclusion: ??, do we even want a unified release with this > (haven't in > the past)? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Sun Sep 24 11:32:40 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sun, 24 Sep 2006 11:32:40 -0400 Subject: [Bioperl-l] Package status for 1.5.2 In-Reply-To: <45167520.6060803@sendu.me.uk> Message-ID: Sendu, Did you test the latest version of bioperl-network? Perhaps part of the issue is that one doesn't just require Graph, you have to have at least v. 70 of Graph. Let me amend the *t's... Brian O. On 9/24/06 8:08 AM, "Sendu Bala" wrote: > Network > ------- > Has no README. > > Test suite has big problems: > > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------------> - > t/Graph-Articulation.t 9 2304 47 0 0.00% ?? > t/Graph-MD5.t 2 512 19 0 0.00% ?? > t/Graph-Seq.t 255 65280 18 0 0.00% ?? > t/IO_dip_tab.t 9 2304 16 0 0.00% ?? > t/IO_psi.t 255 65280 21 0 0.00% ?? > t/Interaction.t 255 65280 17 0 0.00% ?? > t/ProteinNet.t 9 2304 168 0 0.00% ?? > 297 subtests skipped. 
> Failed 7/9 test scripts, 22.22% okay. 0/343 subtests failed, 100.00% okay. > > eg. > t/Graph-Articulation....ok 1/47Can't locate object method > "get_nodes_by_id" via package "Graph::Undirected" at > [...]Bio/Network/IO/dip_tab.pm line 141, > line 1. > > t/Graph-MD5.............Can't locate object method "new" via package > "Graph::Undirected" (perhaps you forgot to load "Graph::Undirected"?) at > t/Graph-MD5.t line 44. > > Conclusion: NOT ok to tag, do we even want a unified release with this > (haven't in the past)? From bix at sendu.me.uk Sun Sep 24 11:55:08 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 24 Sep 2006 16:55:08 +0100 Subject: [Bioperl-l] Package status for 1.5.2 In-Reply-To: References: Message-ID: <4516AA5C.8010600@sendu.me.uk> Brian Osborne wrote: > Sendu, > > Did you test the latest version of bioperl-network? Perhaps part of the > issue is that one doesn't just require Graph, you have to have at least v. > 70 of Graph. Let me amend the *t's... Yes, the latest network from cvs. I only installed Graph today - built from J/JH/JHI/Graph-0.78.tar.gz by cpan, which tests all ok and seems to install fine (though ends up with CPAN_VERSION 0.78 vs INST_VERSION 0.20101 when I look at 'i Graph' - is that expected?). After your recent commits I get: make test PERL_DL_NONLAZY=1 /usr/bin/perl5.8.8 "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/Edge...........ok t/Graph-MD5......Can't locate object method "new" via package "Graph::Undirected" (perhaps you forgot to load "Graph::Undirected"?) at t/Graph-MD5.t line 44. t/Graph-MD5......dubious Test returned status 2 (wstat 512, 0x200) after all the subtests completed successfully t/Graph-Seq......ok 1/18Use of uninitialized value in hash slice at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 66. Use of uninitialized value in exists at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 332. 
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 338. Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 339. Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 342. Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 310. Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 315. Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 316. Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 360. Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 361. Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. Can't call method "seq" on an undefined value at t/Graph-Seq.t line 56. t/Graph-Seq......dubious Test returned status 255 (wstat 65280, 0xff00) after all the subtests completed successfully t/Interaction....ok 1/17Can't locate object method "add_interaction" via package "Graph::Undirected" at t/Interaction.t line 59. t/Interaction....dubious Test returned status 255 (wstat 65280, 0xff00) after all the subtests completed successfully t/IO_dip_tab.....ok 1/16Can't locate object method "get_nodes_by_id" via package "Graph::Undirected" at /mnt/data_local/os_homes/linux/src/bioperl/network/blib/lib/Bio/Network/IO/dip_tab.pm line 141, line 1. 
t/IO_dip_tab.....dubious Test returned status 9 (wstat 2304, 0x900) after all the subtests completed successfully t/IO_psi.........ok 2/21Can't locate object method "add_node" via package "Graph::Undirected" at /mnt/data_local/os_homes/linux/src/bioperl/network/blib/lib/Bio/Network/IO/psi.pm line 346. t/IO_psi.........dubious Test returned status 255 (wstat 65280, 0xff00) after all the subtests completed successfully t/Node...........ok t/ProteinNet.....ok 1/168Can't locate object method "get_nodes_by_id" via package "Graph::Undirected" at /mnt/data_local/os_homes/linux/src/bioperl/network/blib/lib/Bio/Network/IO/dip_tab.pm line 141, line 1. t/ProteinNet.....dubious Test returned status 9 (wstat 2304, 0x900) after all the subtests completed successfully Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------------- t/Graph-MD5.t 2 512 19 0 0.00% ?? t/Graph-Seq.t 255 65280 18 0 0.00% ?? t/IO_dip_tab.t 9 2304 16 0 0.00% ?? t/IO_psi.t 255 65280 21 0 0.00% ?? t/Interaction.t 255 65280 17 0 0.00% ?? t/ProteinNet.t 9 2304 168 0 0.00% ?? 252 subtests skipped. Failed 6/8 test scripts, 25.00% okay. 0/296 subtests failed, 100.00% okay. make: *** [test_dynamic] Error 9 From osborne1 at optonline.net Sun Sep 24 16:15:59 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sun, 24 Sep 2006 16:15:59 -0400 Subject: [Bioperl-l] Package status for 1.5.2 In-Reply-To: <4516AA5C.8010600@sendu.me.uk> Message-ID: Sendu, Let's take these one at a time, starting with t/Graph-MD5 (I'm not seeing any of these errors so you'll have to test, sorry about that). I've changed 'eval { require Graph; };' to 'eval { require Graph::Undirected; };'. Does this help? Brian O. On 9/24/06 11:55 AM, "Sendu Bala" wrote: > Brian Osborne wrote: >> Sendu, >> >> Did you test the latest version of bioperl-network? Perhaps part of the >> issue is that one doesn't just require Graph, you have to have at least v. >> 70 of Graph. 
Let me amend the *t's... > > Yes, the latest network from cvs. I only installed Graph today - built > from J/JH/JHI/Graph-0.78.tar.gz by cpan, which tests all ok and seems to > install fine (though ends up with CPAN_VERSION 0.78 vs INST_VERSION > 0.20101 when I look at 'i Graph' - is that expected?). > > After your recent commits I get: > > make test > PERL_DL_NONLAZY=1 /usr/bin/perl5.8.8 "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > t/Edge...........ok > t/Graph-MD5......Can't locate object method "new" via package > "Graph::Undirected" (perhaps you forgot to load "Graph::Undirected"?) at > t/Graph-MD5.t line 44. > t/Graph-MD5......dubious > Test returned status 2 (wstat 512, 0x200) > after all the subtests completed successfully > t/Graph-Seq......ok 1/18Use of uninitialized value in hash slice at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 66. > Use of uninitialized value in exists at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 332. > Use of uninitialized value in hash element at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 338. > Use of uninitialized value in hash element at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 339. > Use of uninitialized value in hash element at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 342. > Use of uninitialized value in hash element at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 310. > Use of uninitialized value in hash element at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 315. > Use of uninitialized value in hash element at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 316. > Use of uninitialized value in hash element at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 360. > Use of uninitialized value in hash element at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 361. > Use of uninitialized value in sort at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. 
> Use of uninitialized value in sort at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. > Use of uninitialized value in sort at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. > Use of uninitialized value in sort at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. > Use of uninitialized value in sort at > /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96. > Can't call method "seq" on an undefined value at t/Graph-Seq.t line 56. > t/Graph-Seq......dubious > Test returned status 255 (wstat 65280, 0xff00) > after all the subtests completed successfully > t/Interaction....ok 1/17Can't locate object method "add_interaction" via > package "Graph::Undirected" at t/Interaction.t line 59. > t/Interaction....dubious > Test returned status 255 (wstat 65280, 0xff00) > after all the subtests completed successfully > t/IO_dip_tab.....ok 1/16Can't locate object method "get_nodes_by_id" via > package "Graph::Undirected" at > /mnt/data_local/os_homes/linux/src/bioperl/network/blib/lib/Bio/Network/IO/dip > _tab.pm > line 141, line 1. > t/IO_dip_tab.....dubious > Test returned status 9 (wstat 2304, 0x900) > after all the subtests completed successfully > t/IO_psi.........ok 2/21Can't locate object method "add_node" via > package "Graph::Undirected" at > /mnt/data_local/os_homes/linux/src/bioperl/network/blib/lib/Bio/Network/IO/psi > .pm > line 346. > t/IO_psi.........dubious > Test returned status 255 (wstat 65280, 0xff00) > after all the subtests completed successfully > t/Node...........ok > t/ProteinNet.....ok 1/168Can't locate object method "get_nodes_by_id" > via package "Graph::Undirected" at > /mnt/data_local/os_homes/linux/src/bioperl/network/blib/lib/Bio/Network/IO/dip > _tab.pm > line 141, line 1. 
> t/ProteinNet.....dubious > Test returned status 9 (wstat 2304, 0x900) > after all the subtests completed successfully > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------------> - > t/Graph-MD5.t 2 512 19 0 0.00% ?? > t/Graph-Seq.t 255 65280 18 0 0.00% ?? > t/IO_dip_tab.t 9 2304 16 0 0.00% ?? > t/IO_psi.t 255 65280 21 0 0.00% ?? > t/Interaction.t 255 65280 17 0 0.00% ?? > t/ProteinNet.t 9 2304 168 0 0.00% ?? > 252 subtests skipped. > Failed 6/8 test scripts, 25.00% okay. 0/296 subtests failed, 100.00% okay. > make: *** [test_dynamic] Error 9 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Sun Sep 24 16:59:53 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sun, 24 Sep 2006 15:59:53 -0500 Subject: [Bioperl-l] Are Bio::DB::XEMBL and Bio::DB::XEMBLService still valid? In-Reply-To: References: <45158E00.1050503@campus.iztacala.unam.mx> Message-ID: <4516F1C9.7060407@campus.iztacala.unam.mx> Chris, I have the same opinion about the testing for these type of modules. My point here actually was that the XEMBL modules *could be* no longer compatible with the new XEMBL service and should be audited to see if they need changes. Cheers, Mauricio. Chris Fields wrote: > Mauricio, > > There were similar issues with the Biblio_biofetch.t tests recently, > which Brian fixed. It has to do with the way testing for remote > databases is set up for most test cases, which requires BIOPERLDEBUG=1. > > We should probably have tests in the suite that check the URL, have > the test actually fail if the URL can't be found, then skip > subsequent tests that rely on the returned results. You can do this > with Test::More relatively easily by using skip blocks, which allow > you to conditionally skip tests if something bad happens. 
You could > do something like this: > > SKIP:{ > my $db = Bio::DB::GenBank->new(); > my $seq; > eval { $seq = $db->get_Seq_by_acc('ABC123')}; > ok(!$@, 'get_Seq_by_acc() URL passes'); > skip('Bio::DB::GenBank failure', 4) if $@; > ... # four more tests based on $seq > } > > This way you could run sets of tests that may rely on different URLs > in the same test suite; just wrap each one in a skip block and test > using eval{};. All of this, however, relies on the fact that an > error is actually thrown by the module being tested. Tests could be > run or skipped based on what setting BIOPERLDEBUG has early on, maybe > in BEGIN{}. > > I plan on adding similar tests to EUtilities.t at some point. > > Chris > > On Sep 23, 2006, at 2:41 PM, Mauricio Herrera Cuadra wrote: > >> Folks, >> >> While validating URLs in the source tree I've found that these 2 >> modules >> make use of addresses that are no longer valid: >> >> http://www.ebi.ac.uk/XEMBL >> http://www.ebi.ac.uk:80/cgi-bin/xembl/XEMBL-SOAP.pl >> >> Searching for the XEMBL service to update them, I've noticed that the >> service itself has been changed to offer new functionality: >> >> http://www.ebi.ac.uk/xembl/ >> http://www.ebi.ac.uk/xembl/oldindex.html >> >> Browsing through the commit history for this 2 modules, I've found >> that >> the last *real* changes for them were made 3-4 years ago. Its test >> output confirms my idea that this modules no longer work: >> >> [bioperl at nordwand] ~/src/bioperl-live % perl -I. -w t/XEMBL_DB.t >> 1..9 >> ok 1 >> ok 2 # server may be down >> ok 3 # Cannot run XEMBL_DB tests >> ok 4 # Cannot run XEMBL_DB tests >> ok 5 # Cannot run XEMBL_DB tests >> ok 6 # Cannot run XEMBL_DB tests >> ok 7 # Cannot run XEMBL_DB tests >> ok 8 # Cannot run XEMBL_DB tests >> ok 9 # Cannot run XEMBL_DB tests >> >> I've searched through the mailing lists archives and nobody has >> reported >> them as non functional. Does anybody have the time to look into this? >> >> Regards, >> Mauricio. 
>> >> -- >> MAURICIO HERRERA CUADRA >> arareko at campus.iztacala.unam.mx >> Laboratorio de Gen?tica >> Unidad de Morfofisiolog?a y Funci?n >> Facultad de Estudios Superiores Iztacala, UNAM > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From lzhtom at hotmail.com Sun Sep 24 23:46:28 2006 From: lzhtom at hotmail.com (zhihua li) Date: Mon, 25 Sep 2006 03:46:28 +0000 Subject: [Bioperl-l] HELP: script cannot find primer3, which has been properly installed Message-ID: hi, netters i installed primer3_1.0b and bioperl-run-1.4 following the instructions. the source of primer3 executables is in /home/zz/setup/primer3_1.0b/src/primer3_core. then i tried the script in the documentation for bio::tool::run::primer3 use Bio::Tools::Run::Primer3; use Bio::SeqIO; my $seqio = Bio::SeqIO->new(-file=>'input.1'); my $seq = $seqio->next_seq; my $primer3 = Bio::Tools::Run::Primer3->new(-seq => $seq, -outfile => "temp.out", -path => "/home/zz/setup/primer3_1.0b/src/primer3_core"); $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); my $results = $primer3->run; but when i ran it, there was an error message saying that primer3 couldn't be found. i'm pretty sure primer3 has been properly installed. 'cause when i went to the source directory /home/zz/setup/primer3_1.0b/src/, i could run the program there using: ./primer3_core References: <45158E00.1050503@campus.iztacala.unam.mx> <4516F1C9.7060407@campus.iztacala.unam.mx> Message-ID: <5123A5D2-8F82-4FB7-A023-C8E99590A08C@uiuc.edu> Mauricio, I'll try to take a look at it. 
There may not be much we can do before the release candidate. If it is
a more complicated fix, we may need to file a bug to remind us of its
presence.

Chris

On Sep 24, 2006, at 3:59 PM, Mauricio Herrera Cuadra wrote:

> Chris,
>
> I have the same opinion about the testing for these types of modules. My
> point here actually was that the XEMBL modules *could be* no longer
> compatible with the new XEMBL service and should be audited to see if
> they need changes.
>
> Cheers,
> Mauricio.
>
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Sep 25 00:31:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 24 Sep 2006 23:31:38 -0500
Subject: [Bioperl-l] Are Bio::DB::XEMBL and Bio::DB::XEMBLService still valid?
In-Reply-To: <4516F1C9.7060407@campus.iztacala.unam.mx>
References: <45158E00.1050503@campus.iztacala.unam.mx> <4516F1C9.7060407@campus.iztacala.unam.mx>
Message-ID:

Mauricio,

I take that back! After a quick look at your link, it looks like XEMBL
is essentially to be discontinued (well, merged with DBFetch). I'm
cc'ing this to Lincoln to get his thoughts (looks like he is the
designated maintainer in the POD), but I think they probably could be
deprecated if they have no function anymore.

Lincoln, in short, Mauricio has found that the XEMBL-related URLs used by
Bio::DB::XEMBL and Bio::DB::XEMBLService no longer work. Do these
modules have any value beyond their utility for accessing XEMBL?

Chris

On Sep 24, 2006, at 3:59 PM, Mauricio Herrera Cuadra wrote:

> Chris,
>
> I have the same opinion about the testing for these types of modules. My
> point here actually was that the XEMBL modules *could be* no longer
> compatible with the new XEMBL service and should be audited to see if
> they need changes.
>
> Cheers,
> Mauricio.
>
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Sep 25 00:50:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 24 Sep 2006 23:50:20 -0500
Subject: [Bioperl-l] Failed tests for FeatureIO
Message-ID:

I am getting failed tests for FeatureIO.t on make test using
bioperl-live on Mac OS X (10.4.7, perl 5.8.6):

Failed Test    Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/FeatureIO.t   255 65280    33    0  ??
5 subtests skipped.
Failed 1/238 test scripts. 0/12392 subtests failed.
Files=238, Tests=12392, 402 wallclock secs (99.35 cusr + 14.14 csys = 113.49 CPU)
Failed 1/238 test programs. 0/12392 subtests failed.

Here's the direct run:

pyr:~/src/bioperl-live cjfields$ perl -I. -w t/FeatureIO.t
1..33
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
ok 13
ok 14
ok 15

-------------------- WARNING ---------------------
MSG: '##feature-ontology' directive handling not yet implemented
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: '##attribute-ontology' directive handling not yet implemented
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: '##source-ontology' directive handling not yet implemented
---------------------------------------------------
ok 16
ok 17
ok 18
ok 19
ok 20
ok 21
ok 22

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Invalid protein count
STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:331
STACK: Bio::FeatureIO::ptt::_initialize Bio/FeatureIO/ptt.pm:147
STACK: Bio::FeatureIO::new Bio/FeatureIO.pm:266
STACK: Bio::FeatureIO::new Bio/FeatureIO.pm:286
STACK: t/FeatureIO.t:189
-----------------------------------------------------------
ok 23 # Cannot complete FeatureIO tests
ok 24 # Cannot complete FeatureIO tests
ok 25 # Cannot complete FeatureIO tests
ok 26 # Cannot complete FeatureIO tests
ok 27 # Cannot complete FeatureIO tests
ok 28 # Cannot complete FeatureIO tests
ok 29 # Cannot complete FeatureIO tests
ok 30 # Cannot complete FeatureIO tests
ok 31 # Cannot complete FeatureIO tests
ok 32 # Cannot complete FeatureIO tests
ok 33 # Cannot complete FeatureIO tests

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Mon Sep 25 04:31:09 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 25 Sep 2006 09:31:09 +0100
Subject: [Bioperl-l] Package status for 1.5.2
In-Reply-To:
References:
Message-ID: <451793CD.3060504@sendu.me.uk>

Brian Osborne wrote:
> Let's take these one at a time, starting with t/Graph-MD5 (I'm not seeing
> any of these errors so you'll have to test, sorry about that). I've changed
> 'eval { require Graph; };' to 'eval { require Graph::Undirected; };'. Does
> this help?

I compared my copy of Graph.pm (my command-line CPAN claims that 0.78 is
the latest version) to the one I could find via search.cpan.org (v0.80)
and they're nothing alike. Mine is just

$VERSION = 0.20101;
@ISA = qw(Graph::Directed Graph::Base);

And nothing else, which is obviously why Graph::Undirected wasn't being
loaded.

How do I install whatever variant of Graph you expect? The network
INSTALL file is pretty vague on that front.

"bioperl-network also depends on Perl's Graph package. See CPAN at
www.perl.org for instructions on downloading and installing Graph, use
Graph version .70 or greater."

CPAN is at www.cpan.org or cpan.perl.org, and there are no download or
install 'instructions' for things on CPAN (just a download link as far
as I can see).

Anyway, new results following your commit:

perl -w t/Graph-MD5.t
1..19
# Running under perl version 5.008008 for linux
# Current time local: Mon Sep 25 09:00:31 2006
# Current time GMT: Mon Sep 25 08:00:31 2006
# Using Test.pm version 1.25
ok 1
Use of uninitialized value in hash slice at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 66.
Use of uninitialized value in exists at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 332.
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 338.
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 339.
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 342.
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 310.
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 315.
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 316.
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 360.
Use of uninitialized value in hash element at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 361.
Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96.
Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96.
Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96.
Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96.
Use of uninitialized value in sort at /usr/lib64/perl5/vendor_perl/5.8.8/Graph/Base.pm line 96.
Can't call method "add" on an undefined value at t/Graph-MD5.t line 59.
ok 2 # skip Missing dependencies. Skipping tests
ok 3 # skip Missing dependencies. Skipping tests
ok 4 # skip Missing dependencies. Skipping tests
ok 5 # skip Missing dependencies. Skipping tests
ok 6 # skip Missing dependencies. Skipping tests
ok 7 # skip Missing dependencies. Skipping tests
ok 8 # skip Missing dependencies. Skipping tests
ok 9 # skip Missing dependencies. Skipping tests
ok 10 # skip Missing dependencies. Skipping tests
ok 11 # skip Missing dependencies. Skipping tests
ok 12 # skip Missing dependencies. Skipping tests
ok 13 # skip Missing dependencies. Skipping tests
ok 14 # skip Missing dependencies. Skipping tests
ok 15 # skip Missing dependencies. Skipping tests
ok 16 # skip Missing dependencies. Skipping tests
ok 17 # skip Missing dependencies. Skipping tests
ok 18 # skip Missing dependencies. Skipping tests
ok 19 # skip Missing dependencies. Skipping tests

(That's a bad skip message btw, given that it isn't true)

From torsten.seemann at infotech.monash.edu.au Mon Sep 25 05:43:36 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 25 Sep 2006 19:43:36 +1000
Subject: [Bioperl-l] Failed tests for FeatureIO
In-Reply-To:
References:
Message-ID: <4517A4C8.50004@infotech.monash.edu.au>

Chris,

> I am getting failed tests for FeatureIO.t on make test using
> bioperl-live on Mac OS X (10.4.7, perl 5.8.6):
> pyr:~/src/bioperl-live cjfields$ perl -I. -w t/FeatureIO.t
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Invalid protein count
> STACK: Error::throw
> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:331
> STACK: Bio::FeatureIO::ptt::_initialize Bio/FeatureIO/ptt.pm:147
> STACK: Bio::FeatureIO::new Bio/FeatureIO.pm:266
> STACK: Bio::FeatureIO::new Bio/FeatureIO.pm:286
> STACK: t/FeatureIO.t:189
> -----------------------------------------------------------

Hmmm, I committed that module last week. I just did a fresh complete
checkout of current CVS, and all the tests pass fine for me on
Linux 2.6 + Perl 5.8.5 ?

Do you have t/data/test.ptt ?
(the .pm isn't completely robust yet...)
I can't think of any other issues - here's the offending code:

sub _initialize {
  my($self,%arg) = @_;
  $self->SUPER::_initialize(%arg);
  if ($self->mode eq 'r') {
    # Line 1
    my $desc = $self->_readline();
    chomp $desc;
    $self->description($desc);
    # Line 2
    my $line = $self->_readline();
    $line =~ m/^(\d+) proteins/ or $self->throw("Invalid protein count");
    $self->protein_count($1);
    # Line 3
    $self->_readline();
  }
}

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia

From hlapp at gmx.net Mon Sep 25 08:25:51 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 25 Sep 2006 08:25:51 -0400
Subject: [Bioperl-l] Package status for 1.5.2
In-Reply-To:
References:
Message-ID: <2D3CE620-90DD-492F-A2CE-340A0B7D7259@gmx.net>

On Sep 24, 2006, at 11:32 AM, Brian Osborne wrote:

> Did you test the latest version of bioperl-network? Perhaps part of the
> issue is that one doesn't just require Graph, you have to have at least
> v. 70 of Graph. Let me amend the *t's...

BTW if you have a minimum version requirement that should go into the
module 'use' as well, not just the test script. Maybe you had done
that anyway already?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Mon Sep 25 08:30:01 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 25 Sep 2006 08:30:01 -0400
Subject: [Bioperl-l] Package status for 1.5.2
In-Reply-To: <45167520.6060803@sendu.me.uk>
References: <45167520.6060803@sendu.me.uk>
Message-ID: <91604478-4451-4C3F-976A-8926B245CA6C@gmx.net>

On Sep 24, 2006, at 8:08 AM, Sendu Bala wrote:

>
> DB
> --
> Test suite passes first time, but subsequently warnings and failures
> seem to arise due to things stored in DB by test scripts not being
> removed from DB (even though the test script tries to).
This most likely means you are testing against a MySQL database and
you do not have innodb enabled (and hence have no transactions).

(Note: once you do get InnoDB enabled you will have to recreate the
schema. Note also that MySQL does not tell you when it silently
converts the InnoDB table handler to the MyISAM table handler because
the former one isn't fully enabled.)

>
> eg. perl t/04swiss.t
> 1..52
> ok 1
> [...]
> ok 51
> not ok 52
> # Test 52 got: (t/04swiss.t at line 119)
> # Expected: '1'
>
> This may be a problem particular to me.
>
> Conclusion: ??, do we even want a unified release with this (haven't in
> the past)?

We have not in the past but in general I think it would be desirable.
There is an outstanding bug I have to investigate (it is not tested
for yet) but that would only prevent a 1.6 co-release, not really 1.5.2.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From bix at sendu.me.uk Mon Sep 25 08:52:38 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 25 Sep 2006 13:52:38 +0100
Subject: [Bioperl-l] Failed tests for FeatureIO
In-Reply-To: <4517A4C8.50004@infotech.monash.edu.au>
References: <4517A4C8.50004@infotech.monash.edu.au>
Message-ID: <4517D116.5080308@sendu.me.uk>

Torsten Seemann wrote:
> Chris,
>
>> I am getting failed tests for FeatureIO.t on make test using
>> bioperl-live on Mac OS X (10.4.7, perl 5.8.6):
>> pyr:~/src/bioperl-live cjfields$ perl -I. -w t/FeatureIO.t
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Invalid protein count
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:331
>> STACK: Bio::FeatureIO::ptt::_initialize Bio/FeatureIO/ptt.pm:147
>> STACK: Bio::FeatureIO::new Bio/FeatureIO.pm:266
>> STACK: Bio::FeatureIO::new Bio/FeatureIO.pm:286
>> STACK: t/FeatureIO.t:189
>> -----------------------------------------------------------
>
> Hmmm, I committed that module last week. I just did a fresh complete checkout
> of current CVS, and all the tests pass fine for me on Linux 2.6 + Perl 5.8.5 ?

They also pass for me under linux, but I confirm the problem under Mac
OS X 10.4.7 perl 5.8.6.

From n.haigh at sheffield.ac.uk Mon Sep 25 08:56:34 2006
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Mon, 25 Sep 2006 13:56:34 +0100
Subject: [Bioperl-l] Package status for 1.5.2
In-Reply-To: <2D3CE620-90DD-492F-A2CE-340A0B7D7259@gmx.net>
References: <2D3CE620-90DD-492F-A2CE-340A0B7D7259@gmx.net>
Message-ID: <4517D202.6030308@sheffield.ac.uk>

Hilmar Lapp wrote:
> On Sep 24, 2006, at 11:32 AM, Brian Osborne wrote:
>
>
>> Did you test the latest version of bioperl-network? Perhaps part of the
>> issue is that one doesn't just require Graph, you have to have at least
>> v. 70 of Graph. Let me amend the *t's...
>>
>
> BTW if you have a minimum version requirement that should go into the
> module 'use' as well, not just the test script. Maybe you had done
> that anyway already?
>
> -hilmar
>

And also specify the minimum version in Makefile.PL

Nath

From bix at sendu.me.uk Mon Sep 25 08:47:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 25 Sep 2006 13:47:44 +0100
Subject: [Bioperl-l] Package status for 1.5.2
In-Reply-To: <451793CD.3060504@sendu.me.uk>
References: <451793CD.3060504@sendu.me.uk>
Message-ID: <4517CFF0.9070904@sendu.me.uk>

Sendu Bala wrote:
> Brian Osborne wrote:
>> Let's take these one at a time, starting with t/Graph-MD5 (I'm not seeing
>> any of these errors so you'll have to test, sorry about that). I've changed
>> 'eval { require Graph; };' to 'eval { require Graph::Undirected; };'. Does
>> this help?
>
> I compared my copy of Graph.pm (my command-line CPAN claims that 0.78 is
> the latest version) to the one I could find via search.cpan.org (v0.80)
> and they're nothing alike. Mine is just

Well, clearly 0.78 has issues so you might want to mention that in the
documentation. Apparently the UK CPAN mirrors at least are out of date;
I've removed mirrors from my CPAN urllist to get things direct from
perl.org and now have 0.80 installed:

make test
PERL_DL_NONLAZY=1 /usr/bin/perl5.8.8 "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/Edge...........ok
t/Graph-MD5......ok
t/Graph-Seq......ok
t/Interaction....ok
t/IO_dip_tab.....ok
t/IO_psi.........ok 1/21# Test 7 got: "pylori 26695" (t/IO_psi.t at line 60)
# Expected: "Helicobacter pylori 26695"
# t/IO_psi.t line 60 is: ok $proteins[0]->species->binomial,"Helicobacter pylori 26695";
t/IO_psi.........NOK 7# Test 16 got: "virus 40" (t/IO_psi.t at line 81)
# Expected: "Simian virus 40"
# t/IO_psi.t line 81 is: ok $proteins[0]->species->binomial,"Simian virus 40";
t/IO_psi.........FAILED tests 7, 16
        Failed 2/21 tests, 90.48% okay
t/Node...........ok
t/ProteinNet.....ok
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/IO_psi.t                21    2   9.52%  7 16
Failed 1/8 test scripts, 87.50% okay. 2/296 subtests failed, 99.32% okay.
make: *** [test_dynamic] Error 255

So this is 'just' another taxonomy issue. I'll look into it. Do
you/anyone want to see network tagged and distributed alongside core
1.5.2?

From n.haigh at sheffield.ac.uk Mon Sep 25 09:47:53 2006
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Mon, 25 Sep 2006 14:47:53 +0100
Subject: [Bioperl-l] Bio::SEarchIO::blastxml
Message-ID: <4517DE09.8090808@sheffield.ac.uk>

I'm trying to parse a blastxml file I've generated locally. I used a
FASTA file containing multiple sequences as input to blastall 2.2.14
to output the results to an xml file. When using SearchIO to parse the
results, I can't seem to get just hit data pertaining to a single query
seq, I just seem to get all the hits from all the queries without
knowing which hit is for which query seq.

Any ideas what might be going on? My code is below.

Thanks
Nathan

#!/usr/bin/perl -w

use strict;

use Bio::SearchIO;
use Data::Dumper;

my $in = new Bio::SearchIO(
    -format => 'blastxml',
    -file   => $ARGV[0],
);

while( my $result = $in->next_result ) {
    #
    while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            print "Hit= ", $hit->name,
                  ",E-Value=", $hsp->evalue,
                  ",Length=", $hsp->length('total'),
                  ",Percent_id=", $hsp->percent_identity, "\n";
        }
        #last; # only look at first hit for each query
    }
}

From osborne1 at optonline.net Mon Sep 25 10:09:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 25 Sep 2006 10:09:07 -0400
Subject: [Bioperl-l] Package status for 1.5.2
In-Reply-To: <2D3CE620-90DD-492F-A2CE-340A0B7D7259@gmx.net>
Message-ID:

Hilmar,

Yes I have but I'm puzzled since it doesn't seem to make a difference.
Example, I put this into the key module Bio::Network::ProteinNet:

use Graph .90;

Now there is no version .90, the latest is .80.
I've even run 'make install' to ensure that all versions of this module
on my machine bear this statement yet still all the tests run fine. Same
thing with the test code, it doesn't seem to matter what version I
require, existing or non-existent. How is this supposed to work?

And it's not that Graph.pm doesn't state the version:

~/bioperl-network>grep -i version /Library/Perl/5.8.6/Graph.pm
use vars qw($VERSION);
$VERSION = '0.80';

Brian O.

On 9/25/06 8:25 AM, "Hilmar Lapp" wrote:

> BTW if you have a minimum version requirement that should go into the
> module 'use' as well, not just the test script. Maybe you had done
> that anyway already?

From osborne1 at optonline.net Mon Sep 25 10:19:45 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 25 Sep 2006 10:19:45 -0400
Subject: [Bioperl-l] Package status for 1.5.2
In-Reply-To: <91604478-4451-4C3F-976A-8926B245CA6C@gmx.net>
Message-ID:

Hilmar,

I suspect this is the consequence of the test script I added,
t/16obda.t. In order for the OBDA queries to work you must insert some
records, then you must commit. Run this script twice and you'll see
warnings about unique constraints violated. All the other bioperl-db
tests will insert but never commit since these other tests only need to
query the object layer. In addition, removing the persisting objects
does not remove the records (these are my general impressions, I'll be
corrected if I'm wrong).

So, is there a way to delete records using the elegant bioperl-db API?
Or shall I mess with raw SQL?

Brian O.

On 9/25/06 8:30 AM, "Hilmar Lapp" wrote:

>> Test suite passes first time, but subsequently warnings and failures
>> seem to arise due to things stored in DB by test scripts not being
>> removed from DB (even though the test script tries to).

From n.haigh at sheffield.ac.uk Mon Sep 25 10:36:19 2006
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Mon, 25 Sep 2006 15:36:19 +0100
Subject: [Bioperl-l] Circular dependency?
Message-ID: <4517E963.20303@sheffield.ac.uk>

I've just come across this circular dependency:

Bioperl optionally requires Bio::ASN1::EntrezGene
and Bio::ASN1::EntrezGene::Indexer requires Bio::Index::AbstractSeq

I don't know much about circular dependencies so thought I'd post it to
the list, could this be a problem at all?

I'm generating PPD files, so this would cause a circular dependency.
Therefore, I excluded Bio::Index::AbstractSeq as a dependency for the
Bio::ASN1::EntrezGene PPD.

Cheers,
Nathan

From cjfields at uiuc.edu Mon Sep 25 10:46:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 25 Sep 2006 09:46:08 -0500
Subject: [Bioperl-l] Circular dependency?
In-Reply-To: <4517E963.20303@sheffield.ac.uk>
Message-ID: <001701c6e0b1$56e6fe50$15327e82@pyrimidine>

I never saw it cause issues when installing under Mac OS X or Windows
(using nmake, not PPM). I know that Bio::ASN1::EntrezGene is one of the
modules tested for when 'perl Makefile.PL' is run, but I'm not sure it is
required.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nathan Haigh
> Sent: Monday, September 25, 2006 9:36 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Circular dependency?
>
> I've just come across this circular dependency:
>
> Bioperl optionally requires Bio::ASN1::EntrezGene
> and Bio::ASN1::EntrezGene::Indexer requires Bio::Index::AbstractSeq
>
> I don't know much about circular dependencies so thought I'd post it to
> the list, could this be a problem at all?
>
> I'm generating PPD files, so this would cause a circular dependency.
> Therefore, I excluded Bio::Index::AbstractSeq as a dependency for the
> Bio::ASN1::EntrezGene PPD.
>
> Cheers,
> Nathan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bix at sendu.me.uk Mon Sep 25 11:02:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 25 Sep 2006 16:02:25 +0100
Subject: [Bioperl-l] Failed tests for FeatureIO
In-Reply-To: <4517A4C8.50004@infotech.monash.edu.au>
References: <4517A4C8.50004@infotech.monash.edu.au>
Message-ID: <4517EF81.5010106@sendu.me.uk>

Torsten Seemann wrote:
> Chris,
>
>> I am getting failed tests for FeatureIO.t on make test using
>> bioperl-live on Mac OS X (10.4.7, perl 5.8.6):
>> pyr:~/src/bioperl-live cjfields$ perl -I. -w t/FeatureIO.t
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Invalid protein count
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:331
>> STACK: Bio::FeatureIO::ptt::_initialize Bio/FeatureIO/ptt.pm:147
>> STACK: Bio::FeatureIO::new Bio/FeatureIO.pm:266
>> STACK: Bio::FeatureIO::new Bio/FeatureIO.pm:286
>> STACK: t/FeatureIO.t:189
>> -----------------------------------------------------------
>
> Hmmm, I committed that module last week. I just did a fresh complete checkout
> of current CVS, and all the tests pass fine for me on Linux 2.6 + Perl 5.8.5 ?

The problem is your use of Bio::Root::IO::mode. Yours is the only module
in bioperl that seems to use it, and for good reason: it was broken.
I've fixed it and tested under Linux and Mac OS X, not tried Windows but
hopefully that's ok too.

(Briefly, mode() would call IO::Handle::getline which would return a
seemingly random (but consistent per platform) line from the file, and
then mode() would happily push that random line into the buffer. Worse,
under Mac OS X, getline