From p.j.a.cock at googlemail.com Wed Jul 1 03:44:12 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Jul 2009 08:44:12 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
Message-ID: <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>

Hi all (BioPerl and Biopython),

This is a continuation of a long thread on the BioPerl mailing
list, which I have now CC'd to the Biopython mailing list. See:
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030265.html

On this thread we have been discussing next gen sequencing
tools and coordinating things like consistent file format
naming between Biopython, BioPerl and EMBOSS. I've been
chatting to Peter Rice (EMBOSS) while at BOSC/ISMB 2009,
and he will look into setting up a cross project mailing list for
this kind of discussion in future.

In the meantime, my replies to Giles below cover both BioPerl
and Biopython (and EMBOSS). Giles' original email is here:
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030398.html

Peter

On 6/30/09, Giles Weaver wrote:
>
> I'm developing a transcriptomics database for use with next-gen data, and
> have found processing the raw data to be a big hurdle.
>
> I'm a bit late in responding to this thread, so most issues have already
> been discussed. One thing that hasn't been mentioned is removal of adapters
> from raw Illumina sequence. This is a PITA, and I'm not aware of any well
> developed and documented open source software for removal of adapters
> (and poor quality sequence) from Illumina reads.
>
> My current Illumina sequence processing pipeline is an unholy mix of
> biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting
> the Illumina fastq to Sanger fastq, bioperl to read the quality values,
> pure perl to trim the poor quality sequence from each read, and bioperl
> with emboss to remove the adapter sequence. I'm aware that the pipeline
> contains bugs and would like to simplify it, but at least it does work...
>
> Ideally I'd like to replace as much of the pipeline as possible with
> bioperl/bioperl-run, but this isn't currently possible due to both a lack
> of features and poor performance. I'm sure the features will come with
> time, but the performance is more of a concern to me. ..

I gather you would rather work with (Bio)Perl, but since you are
already using Biopython to do the FASTQ conversion, you could
also use it for more of your pipeline. Our tutorial includes examples
of simple FASTQ quality filtering, and trimming of primer sequences
(something like this might be helpful for removing adaptors). See:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Alternatively, with the new release of EMBOSS this July, you will
also be able to do the Illumina FASTQ to Sanger standard FASTQ
conversion with EMBOSS, and I'm sure BioPerl will offer this soon too.

> Regarding trimming bad quality bases (see comments from
> Tristan Lefebure) from Solexa/Illumina reads, I did find a mixed
> pure/bioperl solution to be much faster than a primarily bioperl
> based implementation. I found Bio::Seq->subseq(a,b) and
> Bio::Seq->subqual(a,b) to be far too slow. My current code trims
> ~1300 sequences/second, including unzipping the raw data and
> converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.

There are several ways of doing quality trimming, and it would
make an excellent cookbook example (both for BioPerl and
Biopython).

Could you go into a bit more detail about your trimming
algorithm? e.g. Do you just trim any bases on the right below
a certain threshold, perhaps with a minimum length to retain
the trimmed read afterwards?

> Hope this looooong post was of interest to someone!

I was interested at least ;)

Peter

From cjfields at illinois.edu Wed Jul 1 08:35:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 1 Jul 2009 07:35:14 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
 <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
Message-ID: <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>

Peter,

I just committed a fix to FASTQ parsing last night to support read/write
for Sanger/Solexa/Illumina following the biopython convention; the
only thing needed is more extensive testing for the quality scores.
There are a few other oddities with it I intend to address soon, but it
appears to be working.

The Seq instance iterator actually calls a raw data iterator (hash
refs of named arguments to the class constructor). That should act as
a decent filtering step if needed.

We have automated EMBOSS wrapping but I'm not sure how intuitive it
is; we can probably reconfigure some of that.

chris

On Jul 1, 2009, at 2:44 AM, Peter Cock wrote:

> Hi all (BioPerl and Biopython),
>
> This is a continuation of a long thread on the BioPerl mailing
> list, which I have now CC'd to the Biopython mailing list. See:
> http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030265.html
>
> On this thread we have been discussing next gen sequencing
> tools and co-coordinating things like consistent file format
> naming between Biopython, BioPerl and EMBOSS.
>
> I was interested at least ;)
>
> Peter
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Jonathan_Epstein at nih.gov Wed Jul 1 09:20:50 2009
From: Jonathan_Epstein at nih.gov (Jonathan Epstein)
Date: Wed, 01 Jul 2009 09:20:50 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
Message-ID: <4A4B62B2.3090502@nih.gov>

I too am interested in these topics. In particular, I would like to
learn more about "sequencing adapter removal," i.e. what these adapters
look like, and what strategies you've employed for finding and removing
them.

Jonathan

Giles Weaver wrote:
> I'm developing a transcriptomics database for use with next-gen data, and
> have found processing the raw data to be a big hurdle.
>
> I'm a bit late in responding to this thread, so most issues have already
> been discussed. One thing that hasn't been mentioned is removal of adapters
> from raw Illumina sequence. This is a PITA, and I'm not aware of any well
> developed and documented open source software for removal of adapters (and
> poor quality sequence) from Illumina reads.
>
> My current Illumina sequence processing pipeline is an unholy mix of
> biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting
> the Illumina fastq to Sanger fastq, bioperl to read the quality values, pure
> perl to trim the poor quality sequence from each read, and bioperl with
> emboss to remove the adapter sequence. I'm aware that the pipeline contains
> bugs and would like to simplify it, but at least it does work...
>
> Ideally I'd like to replace as much of the pipeline as possible with
> bioperl/bioperl-run, but this isn't currently possible due to both a lack of
> features and poor performance. I'm sure the features will come with time,
> but the performance is more of a concern to me. I wonder if Bio::Moose might
> be used to alleviate some of the performance issues? Might next-gen modules
> be an ideal guinea pig for Bio::Moose?
>
> For my purposes the tools that I would love to see supported in
> bioperl/bioperl-run are:
>
>    - next-gen sequence quality parsing (to output phred scores)
>    - sequence quality based trimming
>    - sequencing adapter removal
>    - filtering based on sequence complexity (repeats, entropy etc)
>    - bioperl-run modules for bowtie etc.
>
> Obviously all of these need to be fast!
> I'd love to muck in, but I doubt I'll contribute much before
> Bio::Moose/bioperl6, as the (bio)perl object system gives me nightmares!
>
> Regarding trimming bad quality bases (see comments from Tristan Lefebure)
> from Solexa/Illumina reads, I did find a mixed pure/bioperl solution to be
> much faster than a primarily bioperl based implementation. I found
> Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. My
> current code trims ~1300 sequences/second, including unzipping the raw data
> and converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.
>
> Hope this looooong post was of interest to someone!
>
> Giles
>
> 2009/6/17 Tristan Lefebure
>
>> Hello,
>> Regarding next-gen sequences and bioperl, following my
>> experience, another issue is bioperl speed. For example, if
>> you want to trim bad quality bases at ends of 1E6 Solexa
>> reads using Bio::SeqIO::fastq and some methods in
>> Bio::Seq::Quality, well, you've got to be patient (but may
>> be I missed some shortcuts...).
>>
>> A pure perl solution will be between 100 to 1000x faster...
>> Would it be possible to have an ultra-light quality object
>> with few simple methods for next-gen reads?
>>
>> I can contribute some tests if that sounds like an important
>> point.
>>
>> -Tristan

From maj at fortinbras.us Wed Jul 1 09:42:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 1 Jul 2009 09:42:23 -0400
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To:
References:
Message-ID: <0CBBA963E35D4D218D9512B4C4671BE1@NewLife>

You guys earned your scrap:
http://www.bioperl.org/wiki/Random_sequence_generation

cheers and thanks!
MAJ

----- Original Message -----
From: "Roger Hall"
To:
Sent: Friday, June 26, 2009 2:28 AM
Subject: [Bioperl-l] Random nucleotide string generator?

> All,
>
> Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>
> I noticed a thread about it from 2000 and nothing since (searching for "random
> sequence").
>
> If not - what should the namespace be for such a module should it be undone
> and desirable?
>
> TIA!
>
> Roger

From ocarnorsk138 at gmail.com Wed Jul 1 10:30:47 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Wed, 1 Jul 2009 10:30:47 -0400
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <0CBBA963E35D4D218D9512B4C4671BE1@NewLife>
References: <0CBBA963E35D4D218D9512B4C4671BE1@NewLife>
Message-ID:

Thanks for the add to the wiki Mark.

Cheers.

O'car Campos C.
Bioinformatics Engineering Student.
University of Talca.

2009/7/1 Mark A. Jensen

> You guys earned your scrap:
> http://www.bioperl.org/wiki/Random_sequence_generation
>
> cheers and thanks!
> MAJ
> ----- Original Message -----
> From: "Roger Hall"
> To:
> Sent: Friday, June 26, 2009 2:28 AM
> Subject: [Bioperl-l] Random nucleotide string generator?
>
>> All,
>>
>> Is there a random generator for creating nucleotides (of length l with
>> composition frequencies a, c, g, and t) in there somewhere?
>>
>> I noticed a thread about it from 2000 and nothing since (searching for
>> "random sequence").
>>
>> If not - what should the namespace be for such a module should it be
>> undone and desirable?
>>
>> TIA!
>>
>> Roger

From giles.weaver at googlemail.com Wed Jul 1 12:27:22 2009
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Wed, 1 Jul 2009 17:27:22 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
 <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
 <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
Message-ID: <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>

Peter, the trimming algorithm I use employs a sliding window, as follows:

   - For each sequence position calculate the mean phred quality score for a
   window around that position.
   - Record whether the mean score is above or below a threshold as an array
   of zeros and ones.
   - Use a regular expression on the joined array to find the start and end
   of the good quality sequence(s).
   - Extract the quality sequence(s) and replace any bases below the quality
   threshold with N.
   - Trim any Ns from the ends.

A refinement would be to weight the scores from positions in the window, but
this could give a performance hit, and the method seems to work well enough
as is.

Chris, thanks for committing the fix, I'll give bioperl illumina fastq
parsing a workout soon. Peter, as much as I'd love to help out with
biopython, I'm under too much time pressure right now!

Jonathan, some of the Illumina sequencing adapters are listed at
http://intron.ccam.uchc.edu/groups/tgcore/wiki/013c0/Solexa_Library_Primer_Sequences.html
and
http://seqanswers.com/forums/showthread.php?t=198
Adapter sequence typically appears towards the end of the read, though the
latter part of it is often misread as the sequencing quality drops off.
I abuse needle (EMBOSS) into aligning the adapter sequence with each read. I
then use Bio::AlignIO, Bio::Range and a custom scoring scheme to identify
real alignments and trim the sequence. This is not the ideal way of doing
things, but it's fast enough, and does seem to work. The adapter sequence
shouldn't be gapped, so I'm sure there is a lot of scope for optimising the
adapter removal.

I'll happily share some code once I've got it to the stage where I'm not
embarrassed by it!

Giles

2009/7/1 Chris Fields

> Peter,
>
> I just committed a fix to FASTQ parsing last night to support read/write
> for Sanger/Solexa/Illumina following the biopython convention; the only
> thing needed is more extensive testing for the quality scores. There are a
> few other oddities with it I intend to address soon, but it appears to be
> working.
>
> The Seq instance iterator actually calls a raw data iterator (hash refs of
> named arguments to the class constructor). That should act as a decent
> filtering step if needed.
>
> We have automated EMBOSS wrapping but I'm not sure how intuitive it is; we
> can probably reconfigure some of that.
>> I was interested at least ;)
>>
>> Peter

From cjfields at illinois.edu Wed Jul 1 12:46:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 1 Jul 2009 11:46:49 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
 <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
 <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
 <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>
Message-ID: <6CAF4023-7D04-4B56-839F-E587A00DEEEA@illinois.edu>

On Jul 1, 2009, at 11:27 AM, Giles Weaver wrote:

...

> Peter, the trimming algorithm I use employs a sliding window, as
> follows:
>
>   - For each sequence position calculate the mean phred quality
>   score for a window around that position.
>   - Record whether the mean score is above or below a threshold as
>   an array of zeros and ones.
>   - Use a regular expression on the joined array to find the start
>   and end of the good quality sequence(s).
>   - Extract the quality sequence(s) and replace any bases below the
>   quality threshold with N.
>   - Trim any Ns from the ends.
>
> A refinement would be to weight the scores from positions in the
> window, but this could give a performance hit, and the method seems
> to work well enough as is.
>
> Chris, thanks for committing the fix, I'll give bioperl illumina fastq
> parsing a workout soon. Peter, as much as I'd love to help out with
> biopython, I'm under too much time pressure right now!

Just let me know if the qual values match up with what is expected.
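
The sliding-window steps quoted above can be sketched in plain Python
(no Biopython or BioPerl needed). This is only an illustration of the
five listed steps, not Giles' actual code: the function names, the
window size of 5, the threshold of 20, the centred window that is
truncated at the read ends, and the min_length parameter are all
assumptions, since the thread does not give those details.

```python
import re


def window_means(quals, window=5):
    """Mean quality in a window centred on each position.

    Assumption: the window is simply truncated at the ends of the read.
    """
    half = window // 2
    means = []
    for i in range(len(quals)):
        lo = max(0, i - half)
        hi = min(len(quals), i + half + 1)
        means.append(sum(quals[lo:hi]) / (hi - lo))
    return means


def trim_read(seq, quals, window=5, threshold=20, min_length=1):
    """Return a list of (sequence, qualities) for each good region.

    window, threshold and min_length are placeholder values,
    not figures taken from the thread.
    """
    # Steps 1-2: one flag per position, "1" if the window mean passes.
    flags = "".join(
        "1" if m >= threshold else "0" for m in window_means(quals, window)
    )
    regions = []
    # Step 3: a regular expression finds each run of good positions.
    for match in re.finditer(r"1+", flags):
        start, end = match.span()
        # Step 4: mask individual low-quality bases inside the region.
        masked = "".join(
            base if q >= threshold else "N"
            for base, q in zip(seq[start:end], quals[start:end])
        )
        # Step 5: trim Ns from both ends, tracking the left offset so
        # the quality values stay aligned with the trimmed sequence.
        left_trimmed = masked.lstrip("N")
        offset = len(masked) - len(left_trimmed)
        trimmed = left_trimmed.rstrip("N")
        if len(trimmed) >= min_length:
            first = start + offset
            regions.append((trimmed, quals[first:first + len(trimmed)]))
    return regions
```

A read that goes good - poor - good comes back as two regions, so a
caller can decide whether to keep both or only the longest one.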
You can also iterate through the data with hashrefs using next_dataset
(faster than objects). This is from the fastq tests in core:

-----------------------------------------
$in_qual  = Bio::SeqIO->new(-file    => test_input_file('fastq','test3_illumina.fastq'),
                            -variant => 'illumina',
                            -format  => 'fastq');

$qual = $in_qual->next_dataset();

isa_ok($qual, 'HASH');
is($qual->{-seq}, 'GTTAGCTCCCACCTTAAGATGTTTA');
is($qual->{-raw_quality}, 'SXXTXXXXXXXXXTTSUXSSXKTMQ');
is($qual->{-id}, 'FC12044_91407_8_200_406_24');
is($qual->{-desc}, '');
is($qual->{-descriptor}, 'FC12044_91407_8_200_406_24');
is(join(',',@{$qual->{-qual}}[0..10]), '19,24,24,20,24,24,24,24,24,24,24');
-----------------------------------------

So one could check those values directly and then filter them through
as needed directly into Bio::Seq::Quality if necessary (note some of
the key values are constructor args):

my $qualobj = Bio::Seq::Quality->new(%$qual);

chris

From gmodhelp at googlemail.com Wed Jul 1 13:38:14 2009
From: gmodhelp at googlemail.com (Dave Clements, GMOD Help Desk)
Date: Wed, 1 Jul 2009 10:38:14 -0700
Subject: [Bioperl-l] August 2009 GMOD Meeting
In-Reply-To: <71ee57c70907011037o574666f9k8af120c04b2ea54c@mail.gmail.com>
References: <71ee57c70907011032k25daa9cche0f4778e1c2c0093@mail.gmail.com>
 <71ee57c70907011036w49b9c144qbe04fcd8d8d1d7d0@mail.gmail.com>
 <71ee57c70907011037o574666f9k8af120c04b2ea54c@mail.gmail.com>
Message-ID: <71ee57c70907011038u7bf75f00x7e486cb1b8a00e35@mail.gmail.com>

Hello all,

The next GMOD meeting will be held 6-7 August, at the University of
Oxford, in Oxford, United Kingdom. Registration is now open. Space
is available on a first come, first served basis and there is room for
55 attendees. The meeting cost is £50. See
 http://gmod.org/wiki/August_2009_GMOD_Meeting
to register.

As with previous GMOD meetings, this meeting will have a mixture of
project, component, and user talks.
The agenda is driven by attendee
suggestions, and you are encouraged to add your suggestions now (see
http://gmod.org/wiki/August_2009_GMOD_Meeting#Agenda_Suggestions).
For examples of what happens at a GMOD meeting, see the writeups of
the January 2009, July 2008, or any other previous meeting (see
http://gmod.org/wiki/Meetings).

GMOD meetings are an excellent way to meet other GMOD developers and
users and to learn (and affect) what's coming in the project.

Please join us in Oxford this August,

Dave Clements
GMOD Help Desk

Note: Unless you have applied to and been admitted to the Summer
School, don't you dare register for it. The registration web site
will let you do this, but bureaucratic hellishness will ensue.

--
* Learn more about GMOD at:
  ISMB/ECCB: http://www.iscb.org/ismbeccb2009/
  (BioMart, Chado, Galaxy, InterMine)
* Please keep responses on the list!
* Was this helpful? Let us know at
  http://gmod.org/wiki/Help_Desk_Feedback

From p.j.a.cock at googlemail.com Thu Jul 2 03:20:07 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 2 Jul 2009 08:20:07 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
 <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
 <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
 <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>
Message-ID: <320fb6e00907020020o2fa686d2yab6f185785ad8a08@mail.gmail.com>

On 7/1/09, Giles Weaver wrote:
> Peter, the trimming algorithm I use employs a sliding window, as follows:
>
>   - For each sequence position calculate the mean phred quality score for a
>   window around that position.
>   - Record whether the mean score is above or below a threshold as an array
>   of zeros and ones.
>   - Use a regular expression on the joined array to find the start and end
>   of the good quality sequence(s).
>   - Extract the quality sequence(s) and replace any bases below the quality
>   threshold with N.
>   - Trim any Ns from the ends.
>
> A refinement would be to weight the scores from positions in the window, but
> this could give a performance hit, and the method seems to work well enough
> as is.

Thanks for the details - that is a bit more complex than what I had
been thinking. Do you have any favoured window size and quality
threshold, or does this really depend on the data itself?

Also, if you find a sequence read that goes "good - poor - good" for
example, do you extract the two good regions as two sub reads
(presumably with a minimum length)? This may be silly for Illumina
where the reads are very short, but might make sense for Roche 454.

> Chris, thanks for committing the fix, I'll give bioperl illumina fastq
> parsing a workout soon. Peter, as much as I'd love to help out with
> biopython, I'm under too much time pressure right now!

Even use cases are useful - so thank you.

> Jonathan, some of the Illumina sequencing adapters are listed at
> http://intron.ccam.uchc.edu/groups/tgcore/wiki/013c0/Solexa_Library_Primer_Sequences.html
> and
> http://seqanswers.com/forums/showthread.php?t=198
> Adapter sequence typically appears towards the end of the read, though the
> latter part of it is often misread as the sequencing quality drops off.
> I abuse needle (EMBOSS) into aligning the adapter sequence with each read. I
> then use Bio::AlignIO, Bio::Range and a custom scoring scheme to identify
> real alignments and trim the sequence. This is not the ideal way of doing
> things, but it's fast enough, and does seem to work. The adapter sequence
> shouldn't be gapped, so I'm sure there is a lot of scope for optimising the
> adapter removal.
>
> I'll happily share some code once I've got it to the stage where I'm not
> embarrassed by it!
>
> Giles

Cheers,

Peter

From florian.mittag at uni-tuebingen.de Thu Jul 2 05:28:21 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 2 Jul 2009 11:28:21 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
Message-ID: <200907021128.21239.florian.mittag@uni-tuebingen.de>

Hi!

I previously posted a message on the BioSQL mailing list regarding a BioSQL
schema for DB2 and we are several steps closer to completion now.

We were able to adapt the "load_ncbi_taxonomy.pl" script from BioSQL to fill
our DB2 database with taxonomy data, but loading the gene ontology with
BioPerl's "load_ontology.pl" is somewhat harder.

We created the package Bio::DB::BioSQL::DB2 and copy-pasted the contents of
the Oracle package into it. Then we changed the (what we thought) appropriate
methods whenever we encountered an error, but now we are a bit frustrated.

We execute the command:
perl load_ontology.pl --driver DB2 --dbname bioseqdb --dbuser user
                      --dbpass passwd --namespace "Gene Ontology"
                      --format obo --debug gene_ontology.1_2.obo

It first ran for a few minutes processing the file and then died after the
following SQL command was prepared and executed:

"SELECT term.term_id, term.identifier, term.name, term.definition,
term.is_obsolete, NULL, term.ontology_id FROM term WHERE identifier = ?"

I don't know if the "NULL" column is supposed to be there, but DB2 doesn't
like it. After hours of digging into the code, I gave up and simply commented
out the line that added the NULL column in
Bio::DB::BioSQL::BaseDriver::_build_select_list

...
if((! $attr) || (! $entitymap->{$tbl}) ||
   $dont_select_attrs->{$tbl .".". $attr}) {
#    push(@attrs, "NULL");
} else {
...

The script completed with a few warnings, like:
"no adaptor found for class Bio::Annotation::TypeManager"
or
"-------------------- WARNING ---------------------
MSG: PMID:15012271 exists in the dblink of _default"

so we don't know if it really worked.
Since removing this one line will probably break compatibility with other
databases, it is not a real solution and we would appreciate any hints
pointing us to the real cause.

We would really like to contribute to the BioPerl project by adding DB2
support, but we need some help here, since none of us has experience with
either Perl or BioPerl ;-)

Keep up the good work!

Regards,
Florian

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091

From jonathancrabtree at gmail.com Thu Jul 2 09:23:54 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Thu, 2 Jul 2009 09:23:54 -0400
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907021128.21239.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
Message-ID: <8e5b8bf80907020623p4f13e218pf824ba7b4a55bb19@mail.gmail.com>

Hi Florian,

Just based on what's in your e-mail, it looks as though BioSQL *wants* a
NULL value to come back as the 6th column of every row in the result of that
query. So by removing it you run the risk that BioSQL is going to retrieve
the wrong values from the query result, at least for those columns after the
6th (and assuming all the columns are retrieved by position--it's not
entirely clear if this is the case.)

I'd be inclined to throw in a test here to see if the backend is DB2 and, if
so, substitute the appropriate syntax instead of "NULL". I'm not sure what
that syntax is, but a bit of web searching suggests that you might be able to
select the value from a dummy table (this might be more difficult because it
would require non-local code changes -- this method is only for the select
list) or use a function called "nullif" with appropriately-chosen arguments.
Another comment I saw suggested that using "NULL" was OK but it has to be
coerced/typecast into the right type.
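
To make the typecast idea concrete, here is a rough sketch in plain Perl.
It is only an illustration and not the actual BioSQL driver code: the helper
names null_expr and build_select_list are hypothetical, and VARCHAR(1) is an
assumed placeholder type, since the right type for the missing column is not
known here. The point is that the select list keeps the same number and order
of columns, and only the spelling of the empty column changes per driver.

```perl
use strict;
use warnings;

# Hypothetical per-driver spelling of an "empty" select-list column.
# CAST(NULL AS VARCHAR(1)) is an assumed placeholder type for DB2;
# the correct type would have to come from the schema.
my %null_expr_for = (
    DB2 => 'CAST(NULL AS VARCHAR(1))',
);

# Return the SQL expression to emit where a column has no mapping.
sub null_expr {
    my ($driver) = @_;
    return exists $null_expr_for{$driver} ? $null_expr_for{$driver} : 'NULL';
}

# Build a select list; undef entries stand for unmapped attributes,
# so the column positions stay the same for every driver.
sub build_select_list {
    my ($driver, $table, @columns) = @_;
    my @parts;
    for my $col (@columns) {
        push @parts, defined $col ? "$table.$col" : null_expr($driver);
    }
    return join ', ', @parts;
}

my @cols = (qw(term_id identifier name definition is_obsolete),
            undef, 'ontology_id');
print build_select_list('DB2',   'term', @cols), "\n";
print build_select_list('mysql', 'term', @cols), "\n";
```

Keeping the choice in one lookup table means other backends would still get
a bare NULL, so nothing changes for them.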
Jonathan

From florian.mittag at uni-tuebingen.de Thu Jul 2 10:52:27 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 2 Jul 2009 16:52:27 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <8e5b8bf80907020623p4f13e218pf824ba7b4a55bb19@mail.gmail.com>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <8e5b8bf80907020623p4f13e218pf824ba7b4a55bb19@mail.gmail.com>
Message-ID: <200907021652.27472.florian.mittag@uni-tuebingen.de>

Hi Jonathan,

thanks for your quick answer.

On Thursday 02 July 2009 15:23, Jonathan Crabtree wrote:
> Just based on what's in your e-mail, it looks as though BioSQL *wants* a
> NULL value to come back as the 6th column of every row in the result of
> that query. So by removing it you run the risk that BioSQL is going to
> retrieve the wrong values from the query result, at least for those columns
> after the 6th (and assuming all the columns are retrieved by position--it's
> not entirely clear if this is the case.)

Well, what made me suspicious was that the returned columns were exactly the
ones from the term table plus the NULL column. One way to verify this would
be to look whether the same thing happens with other tables as well.

> I'd be inclined to throw in a
> test here to see if the backend is DB2 and, if so, substitute the
> appropriate syntax instead of "NULL". I'm not sure what that syntax is,
> but a bit of web searching suggests that you might be able to select the
> value from a dummy table (this might be more difficult because it would
> require non-local code changes -- this method is only for the select list)
> or use a function called "nullif" with appropriately-chosen arguments.
> Another comment I saw suggested that using "NULL" was OK but it has to be
> coerced/typecast into the right type.

Yeah, this was what I've found, too, but I couldn't figure out what was the
right type to cast to.

Unfortunately, now that the database is filled (hopefully correctly), the
script gives me a different error message and I don't know if it is because
of a change I made or because the database is not empty.

Originally, I was struggling with Hibernate and I'm back to it again (damn
CLOBs...), so I am happy to have a seemingly correct database to work with.
I'm pretty confident that I can write a working DB2 driver for BioPerl, but
for that I should start from scratch instead of copying the MySQL one and
modifying it until all error messages disappear. And this would take far too
much time, if I'm doing this by trial and error.

Is there any developers guide that would help to find out what methods I have
to override to implement database specific stuff?

Thanks,
Florian

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091

From jonathancrabtree at gmail.com Thu Jul 2 11:39:32 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Thu, 2 Jul 2009 11:39:32 -0400
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907021652.27472.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <8e5b8bf80907020623p4f13e218pf824ba7b4a55bb19@mail.gmail.com> <200907021652.27472.florian.mittag@uni-tuebingen.de>
Message-ID: <8e5b8bf80907020839y15e97aedldd0ebadd1fddae69@mail.gmail.com>

Hi Florian,

On Thu, Jul 2, 2009 at 10:52 AM, Florian Mittag wrote:
> Well, what made me suspicious was that the returned columns were exactly the
> ones from the term table plus the NULL column. One way to verify this would
> be to look whether the same thing happens with other tables as well.

Others may disagree, but I think it's fairly clear (from just looking at the subroutine you mentioned) that the inclusion of the NULL value is most definitely deliberate; note that it is only done if $entitymap doesn't have a value for the table in question, and $entitymap is described as follows: A reference to a hash table mapping entity names to aliases (if omitted, aliases will not be used, and SELECT columns can only be from one table) So I suspect what we're seeing here is a select in which aliases _aren't_ being used and therefore the order of the returned values is significant, and the NULL value is needed to keep everything in the right order for whatever piece of code is reading the result. But never having worked much with BioSQL I don't know where you'd go to find the type information needed to determine what type the NULL value needs to be coerced into... Jonathan From gummyduk at gmail.com Thu Jul 2 14:50:29 2009 From: gummyduk at gmail.com (John Tyree) Date: Thu, 2 Jul 2009 14:50:29 -0400 Subject: [Bioperl-l] Bio::DB::GenBank batch mode usage In-Reply-To: <459dd5330907011236y31fea4fey8dc20e5274e94d1a@mail.gmail.com> References: <459dd5330907011236y31fea4fey8dc20e5274e94d1a@mail.gmail.com> Message-ID: <459dd5330907021150xaf9caabvd160cbd781cf904e@mail.gmail.com> I'm trying to use Bio::DB::GenBank to download a large number of files by accession number. The docs say not to do this in normal mode to reduce server load. There is some kind of helper function associated with this. %params = Bio::DB::GenBank->get_params('batch'); But I don't understand how to use it. 
If you pass the hash using:

Bio::DB::GenBank->new(%params);

it raises the following and dies:

--------------------- WARNING ---------------------
MSG: invalid retrieval type tool must be one of
(pipeline,io_string,tempfile
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: seq_start() must be integer value if set
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib64/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::DB::NCBIHelper::seq_start
/usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:416
STACK: Bio::DB::NCBIHelper::new
/usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:117
STACK: Find_Patient_By_AccNo.pl:93

There is a deprecated method called get_Stream_by_batch() but how does
one achieve batch mode using the proper get_Stream_by_id() ?

Thanks,
John

From cjfields at illinois.edu Thu Jul 2 15:29:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Jul 2009 14:29:29 -0500
Subject: [Bioperl-l] Bio::DB::GenBank batch mode usage
In-Reply-To: <459dd5330907021150xaf9caabvd160cbd781cf904e@mail.gmail.com>
References: <459dd5330907011236y31fea4fey8dc20e5274e94d1a@mail.gmail.com> <459dd5330907021150xaf9caabvd160cbd781cf904e@mail.gmail.com>
Message-ID: <49458034-D329-4953-883B-298355513D35@illinois.edu>

If you are just downloading the records to a file it might be better
to retrieve the raw records using EUtilities, providing you have
either the accession number or the GI. If downloading files via
Bio::DB::GenBank, it requires a preparse and write to file via
Bio::SeqIO.

---------------------------
use Bio::DB::EUtilities;
use Bio::SeqIO;

my @ids = (); # your GI/acc here

my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -rettype => 'genbank',
                                       -id      => \@ids);

$factory->get_Response(-file => "records.gb");
---------------------------

If you have a long list of IDs you can use epost first, then efetch
using the search history.
This page has a few recipe scripts:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

chris

From wrp at virginia.edu Thu Jul 2 19:56:07 2009
From: wrp at virginia.edu (William Pearson)
Date: Thu, 2 Jul 2009 19:56:07 -0400
Subject: [Bioperl-l] Course Announcement: 2009 CSHL Computational and Comparative Genomics Deadline
References:
Message-ID: <610369A4-28CA-476C-A641-654A63D1FBEE@virginia.edu>

Course announcement - Application deadline, July 15, 2009

Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS
November 4 - 10, 2009
Application Deadline: July 15, 2009

INSTRUCTORS:
Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA
Lisa Stubbs, Ph.D., University of Illinois, Urbana, IL

Beyond BLAST and FASTA - Alignment: from proteins to genomes - This
course presents a comprehensive overview of the theory and practice of
computational methods for extracting the maximum amount of information
from protein and DNA sequence similarity through sequence database
searches, statistical analysis, and multiple sequence alignment, and
genome scale alignment. Additional topics include identifying signals
in unaligned sequences, integration of genetic and sequence
information in biological databases. This year, there will be a
special focus on metagenomics and functional prediction.

The course combines lectures with hands-on exercises; students are
encouraged to pose challenging sequence analysis problems using their
own data. The course makes extensive use of local WWW pages to present
problem sets and the computing tools to solve them. Students use
Windows and Mac workstations attached to a UNIX server.

The course is designed for biologists seeking advanced training in biological sequence analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis and comparative genomics. The primary focus of the Computational and Comparative Genomics Course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and developing new algorithms. Cold Spring Harbor also offers a "Programming for Biology" course, which focuses more on software development. For additional information and the lecture schedule and problem sets for the 2008 course, see: http://fasta.bioch.virginia.edu/cshl/ To apply to the course, fill out and send in the form at: http://meetings.cshl.edu/course/courseapp_instr.shtml Bill Pearson wrp at virginia.edu From giles.weaver at googlemail.com Fri Jul 3 11:35:00 2009 From: giles.weaver at googlemail.com (Giles Weaver) Date: Fri, 3 Jul 2009 16:35:00 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00907020020o2fa686d2yab6f185785ad8a08@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com> <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu> <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com> <320fb6e00907020020o2fa686d2yab6f185785ad8a08@mail.gmail.com> Message-ID: <1d06cd5d0907030835w14407249l5b47db8893820816@mail.gmail.com> Regarding the trimming algorithm, I've been using a window size of 5, a minimum score of 20 and a minimum length of 15 with the Illumina data. 
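
In plain Perl, the sliding-window trimming discussed in this thread could be
sketched roughly as below, using those values as defaults. This is only an
illustration and not the actual pipeline code: the function name trim_read
and its option names are made up for the example, the window is assumed to be
centred on each position and clipped at the read ends, and it works on a
sequence string plus an array of phred scores rather than on Bio::Seq objects.

```perl
use strict;
use warnings;
use List::Util qw(sum max min);

# Hypothetical helper: return the good-quality sub-reads of one read.
#   $seq  - sequence string
#   $qual - array ref of phred scores, one per base
sub trim_read {
    my ($seq, $qual, %opt) = @_;
    my $window  = $opt{window}     // 5;    # window size
    my $min_q   = $opt{min_score}  // 20;   # quality threshold
    my $min_len = $opt{min_length} // 15;   # shortest sub-read to keep
    my $half    = int($window / 2);
    my $last    = $#{$qual};

    # Mean score of the window around each position, recorded as a
    # string of ones (good) and zeros (poor).
    my $mask = '';
    for my $i (0 .. $last) {
        my $from = max(0, $i - $half);
        my $to   = min($last, $i + $half);
        my $mean = sum(@{$qual}[$from .. $to]) / ($to - $from + 1);
        $mask .= $mean >= $min_q ? '1' : '0';
    }

    # Each run of ones in the mask is one candidate good region.
    my @reads;
    while ($mask =~ /(1+)/g) {
        my ($start, $len) = ($-[1], $+[1] - $-[1]);
        my @bases = split //, substr($seq, $start, $len);

        # Replace individual bases below the threshold with N.
        for my $j (0 .. $#bases) {
            $bases[$j] = 'N' if $qual->[$start + $j] < $min_q;
        }

        # Trim any Ns from the ends, then apply the minimum length.
        (my $sub = join '', @bases) =~ s/^N+|N+$//g;
        push @reads, $sub if length($sub) >= $min_len;
    }
    return @reads;
}

# Example: 20 good bases followed by 10 poor ones.
my @good = trim_read('ACGT' x 5 . 'T' x 10, [ (30) x 20, (2) x 10 ]);
print "$_\n" for @good;
```

Because every run of ones is returned, a "good - poor - good" read yields two
sub-reads as long as each passes the minimum length.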
In the past I have used a similar algorithm with a larger window size and
much longer minimum length with sequence from ABI 3XXX machines. I imagine
that the ideal parameters for ABI SOLiD and Roche 454 would likely be similar
to those for Illumina and Sanger sequencing respectively. Window size doesn't
appear to affect performance much, if at all.

For sequences with multiple good regions, I do extract all good regions. Even
with the Illumina data there are sometimes two good regions, but usually the
second is adapter or junk and gets filtered out later. I haven't seen quality
data from a 454 machine recently, and would be interested to know if multiple
good regions are commonplace in 454 data. Can anyone with access to 454 data
comment on this?

Giles

2009/7/2 Peter Cock

> On 7/1/09, Giles Weaver wrote:
> > Peter, the trimming algorithm I use employs a sliding window, as follows:
> >
> > - For each sequence position calculate the mean phred quality score for a
> >   window around that position.
> > - Record whether the mean score is above or below a threshold as an array
> >   of zeros and ones.
> > - Use a regular expression on the joined array to find the start and end
> >   of the good quality sequence(s).
> > - Extract the quality sequence(s) and replace any bases below the quality
> >   threshold with N.
> > - Trim any Ns from the ends.
> >
> > A refinement would be to weight the scores from positions in the window, but
> > this could give a performance hit, and the method seems to work well enough
> > as is.
>
> Thanks for the details - that is a bit more complex than what I had been
> thinking. Do you have any favoured window size and quality threshold,
> or does this really depend on the data itself?
>
> Also, if you find a sequence read that goes "good - poor - good" for
> example, do you extract the two good regions as two sub reads
> (presumably with a minimum length)? This may be silly for Illumina
> where the reads are very short, but might make sense for Roche 454.
>
> > Chris, thanks for committing the fix, I'll give bioperl illumina fastq
> > parsing a workout soon. Peter, as much as I'd love to help out with
> > biopython, I'm under too much time pressure right now!
>
> Even use cases are useful - so thank you.
>
> > Jonathan, some of the Illumina sequencing adapters are listed at
> > http://intron.ccam.uchc.edu/groups/tgcore/wiki/013c0/Solexa_Library_Primer_Sequences.html and
> > http://seqanswers.com/forums/showthread.php?t=198
> > Adapter sequence typically appears towards the end of the read, though the
> > latter part of it is often misread as the sequencing quality drops off.
> > I abuse needle (EMBOSS) into aligning the adapter sequence with each read. I
> > then use Bio::AlignIO, Bio::Range and a custom scoring scheme to identify
> > real alignments and trim the sequence. This is not the ideal way of doing
> > things, but it's fast enough, and does seem to work. The adapter sequence
> > shouldn't be gapped, so I'm sure there is a lot of scope for optimising the
> > adapter removal.
> >
> > I'll happily share some code once I've got it to the stage where I'm not
> > embarrassed by it!
> > > > Giles > > Cheers, > > Peter > From giles.weaver at googlemail.com Fri Jul 3 11:35:20 2009 From: giles.weaver at googlemail.com (Giles Weaver) Date: Fri, 3 Jul 2009 16:35:20 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6CAF4023-7D04-4B56-839F-E587A00DEEEA@illinois.edu> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com> <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu> <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com> <6CAF4023-7D04-4B56-839F-E587A00DEEEA@illinois.edu> Message-ID: <1d06cd5d0907030835h351b96ccif14b192b2e0b132c@mail.gmail.com> Chris, I've just tested your Illumina/Solexa fastq parsing code and am pleased to report that I haven't encountered any issues thus far. To give an idea of the processing overhead of object instantiation, fastq parsing performance on a lowly 3GHz Core 2 Duo (using one core) is as follows: Illumina fastq with next_dataset: ~1 million sequences/minute Solexa fastq with next_dataset: ~500000 sequences/minute Illumina fastq with next_seq: ~215000 sequences/minute Solexa fastq with next_seq: ~175000 sequences/minute My quality trimming script does about 300000 sequences/minute with next_dataset, up from ~130000 sequences/minute with next_seq, so it shaves hours off the run time, thanks! Giles 2009/7/1 Chris Fields > On Jul 1, 2009, at 11:27 AM, Giles Weaver wrote: > > ... > > Peter, the trimming algorithm I use employs a sliding window, as follows: >> >> - For each sequence position calculate the mean phred quality score for a >> window around that position. >> - Record whether the mean score is above or below a threshold as an array >> of zeros and ones. >> - Use a regular expression on the joined array to find the start and end >> of the good quality sequence(s). 
>> - Extract the quality sequence(s) and replace any bases below the quality >> threshold with N. >> - Trim any Ns from the ends. >> >> A refinement would be to weight the scores from positions in the window, >> but >> this could give a performance hit, and the method seems to work well >> enough >> as is. >> >> Chris, thanks for committing the fix, I'll give bioperl illumina fastq >> parsing a workout soon. Peter, as much as I'd love to help out with >> biopython, I'm under too much time pressure right now! >> > > Just let me know if the qual values match up with what is expected. You > can also iterate through the data with hashrefs using next_dataset (faster > than objects). This is from the fastq tests in core: > > ----------------------------------------- > $in_qual = Bio::SeqIO->new(-file => > test_input_file('fastq','test3_illumina.fastq'), > -variant => 'illumina', > -format => 'fastq'); > > $qual = $in_qual->next_dataset(); > > isa_ok($qual, 'HASH'); > is($qual->{-seq}, 'GTTAGCTCCCACCTTAAGATGTTTA'); > is($qual->{-raw_quality}, 'SXXTXXXXXXXXXTTSUXSSXKTMQ'); > is($qual->{-id}, 'FC12044_91407_8_200_406_24'); > is($qual->{-desc}, ''); > is($qual->{-descriptor}, 'FC12044_91407_8_200_406_24'); > is(join(',',@{$qual->{-qual}}[0..10]), '19,24,24,20,24,24,24,24,24,24,24'); > ----------------------------------------- > > So one could check those values directly and then filter them through as > needed directly into Bio::Seq::Quality if necessary (note some of the key > values are constructor args): > > my $qualobj = Bio::Seq::Quality->new(%$qual); > > chris > From Xianjun.Dong at bccs.uib.no Fri Jul 3 12:22:01 2009 From: Xianjun.Dong at bccs.uib.no (Xianjun Dong) Date: Fri, 03 Jul 2009 18:22:01 +0200 Subject: [Bioperl-l] [Bio::Graphics::Panel] code reference cannot pass to -link, why? 
Message-ID: <4A4E3029.4020109@ii.uib.no>

Hi,

I have a problem while using the -link option in Bio::Graphics (version 1.96).

As the POD of Bio::Graphics describes
(http://search.cpan.org/~lds/Bio-Graphics-1.96/lib/Bio/Graphics/Panel.pm#Creating_Imagemaps),
a link format like:

    -link => 'http://www.google.com/search?q=$description'

works well in my code, but the format like

    -link => sub {
        my ($feature,$panel) = @_;
        my $type = $feature->primary_tag;
        my $name = $feature->display_name;
        if ($type eq 'clone') {
            return "http://www.google.com/search?q=$name";
        } else {
            return "http://www.yahoo.com/search?p=$name";
        }
    }

does not output the image map as expected.

Here I attach a simple script as an example for anyone who is willing to test
it for me:

#!/usr/bin/perl

use strict;
use Bio::Graphics;
use Bio::Graphics::Feature;

my $ftr = 'Bio::Graphics::Feature';

# processed_transcript
my $trans1 = $ftr->new(-start=>50,-end=>10,-display_name=>'ZK154.1',-type=>'UTR');
my $trans2 = $ftr->new(-start=>100,-end=>50,-display_name=>'ZK154.2',-type=>'CDS');
my $trans3 = $ftr->new(-start=>350,-end=>225,-display_name=>'ZK154.3',-type=>'CDS',
                       -source=>'a');
my $trans4 = $ftr->new(-start=>700,-end=>650,-display_name=>'ZK154.4',-type=>'UTR');

my @trans = ($trans1,$trans2,$trans3,$trans4);

my $panel = Bio::Graphics::Panel->new(-start =>0,-length=>1050);

$panel->add_track(\@trans,
    -glyph => 'transcript2',
    # This works well!
    #-link => 'http://www.google.com/search?q=$name',
    # while, the following code does not work as expected.
    -link => sub {
        my ($feature,$panel) = @_;
        my $type = $feature->primary_tag;
        my $name = $feature->display_name;
        if ($type eq 'CDS') {
            return "http://www.google.com/search?q=$name";
        } else {
            return "http://www.yahoo.com/search?p=$name";
        }
    }
);

my $map = $panel->create_web_map("mapname");
print $map;
$panel->finished();

In my test (Bioperl 1.6.0), its output is:

It seems $feature->primary_tag returns 'track' (I don't know where this comes
from...), rather than the type of the feature.
Anyone has clue for this problem? Thanks -- ========================================== Xianjun Dong PhD student, Lenhard group Computational Biology Unit Bergen Center for Computational Science University of Bergen Hoyteknologisenteret, Thormohlensgate 55 N-5008 Bergen, Norway E-mail: xianjun.dong at bccs.uib.no Tel.: +47 555 84022 Fax : +47 555 84295 ========================================== From cjfields at illinois.edu Fri Jul 3 13:34:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 3 Jul 2009 12:34:25 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0907030835h351b96ccif14b192b2e0b132c@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com> <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu> <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com> <6CAF4023-7D04-4B56-839F-E587A00DEEEA@illinois.edu> <1d06cd5d0907030835h351b96ccif14b192b2e0b132c@mail.gmail.com> Message-ID: <8A756A09-8E26-4B6A-9390-151533EDB48A@illinois.edu> No problem. Scary to see that creating an instance is 2-4x slower than a simple hash ref. Not sure there is an easy way around that; maybe we need a direct_new? The next step is to ensure this works cross-platform and get indexing (via Bio::Index::Fastq) optimized. Would be nice to get output working with the hash refs as well. chris On Jul 3, 2009, at 10:35 AM, Giles Weaver wrote: > Chris, I've just tested your Illumina/Solexa fastq parsing code and > am pleased to report that I haven't encountered any issues thus far. 
> > To give an idea of the processing overhead of object instantiation, > fastq parsing performance on a lowly 3GHz Core 2 Duo (using one > core) is as follows: > Illumina fastq with next_dataset: ~1 million sequences/minute > Solexa fastq with next_dataset: ~500000 sequences/minute > Illumina fastq with next_seq: ~215000 sequences/minute > Solexa fastq with next_seq: ~175000 sequences/minute > > My quality trimming script does about 300000 sequences/minute with > next_dataset, up from ~130000 sequences/minute with next_seq, so it > shaves hours off the run time, thanks! > > Giles > > 2009/7/1 Chris Fields > On Jul 1, 2009, at 11:27 AM, Giles Weaver wrote: > > ... > > > Peter, the trimming algorithm I use employs a sliding window, as > follows: > > - For each sequence position calculate the mean phred quality score > for a > window around that position. > - Record whether the mean score is above or below a threshold as an > array > of zeros and ones. > - Use a regular expression on the joined array to find the start > and end > of the good quality sequence(s). > - Extract the quality sequence(s) and replace any bases below the > quality > threshold with N. > - Trim any Ns from the ends. > > A refinement would be to weight the scores from positions in the > window, but > this could give a performance hit, and the method seems to work well > enough > as is. > > Chris, thanks for committing the fix, I'll give bioperl illumina fastq > parsing a workout soon. Peter, as much as I'd love to help out with > biopython, I'm under too much time pressure right now! > > Just let me know if the qual values match up with what is expected. > You can also iterate through the data with hashrefs using > next_dataset (faster than objects). 
This is from the fastq tests in > core: > > ----------------------------------------- > $in_qual = Bio::SeqIO->new(-file => > test_input_file('fastq','test3_illumina.fastq'), > -variant => 'illumina', > -format => 'fastq'); > > $qual = $in_qual->next_dataset(); > > isa_ok($qual, 'HASH'); > is($qual->{-seq}, 'GTTAGCTCCCACCTTAAGATGTTTA'); > is($qual->{-raw_quality}, 'SXXTXXXXXXXXXTTSUXSSXKTMQ'); > is($qual->{-id}, 'FC12044_91407_8_200_406_24'); > is($qual->{-desc}, ''); > is($qual->{-descriptor}, 'FC12044_91407_8_200_406_24'); > is(join(',',@{$qual->{-qual}}[0..10]), > '19,24,24,20,24,24,24,24,24,24,24'); > ----------------------------------------- > > So one could check those values directly and then filter them > through as needed directly into Bio::Seq::Quality if necessary (note > some of the key values are constructor args): > > my $qualobj = Bio::Seq::Quality->new(%$qual); > > chris > From lskatz at gatech.edu Fri Jul 3 18:08:43 2009 From: lskatz at gatech.edu (Lee Katz) Date: Fri, 3 Jul 2009 22:08:43 +0000 (UTC) Subject: [Bioperl-l] chromatogram References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com> <1195136486.2785.12.camel@localhost.localdomain> Message-ID: Thank you Scott. I know that this message is really late, but I got really side tracked and want to follow through with this. My code so far is a mutt between everything I found online. It only produces a generic track though; however, I want to produce a chromatogram image, as shown on the gbrowse tutorial I found at http://wheat.pw.usda.gov/gbrowse/tutorial/tutorial.html (section 15: Displaying Trace Data, where a semantic zoom is shown). Can you guys help me finish it off? Thanks. 
use Bio::Graphics; use Bio::Seq; use Bio::SeqFeature::Generic; my @scfFile=qw(1.scf 2.scf); my $bsg = 'Bio::SeqFeature::Generic'; my $seq = Bio::Seq->new(-length=>900); my $whole = $bsg->new(-display_name => 'Clone82', -start => 1, -end => $seq->length); my $trace1 = $bsg->new(-start => 1, -end => 500, -display_name => 'Trace', -tag=>{ trace=>"$scfFile[0]" } ); my $panel = Bio::Graphics::Panel->new(-length => $seq->length, -width => 800, -truecolor => 1, -key_style => 'between', -pad_left => 10, -pad_right => 10, ); $panel->add_track($whole, -glyph => 'arrow', -double => 1, -tick => 2, -label => 1, ); $panel->add_track([$trace1], -feature=>'read', -strand_arrow=>1, -glyph => 'trace', -a_color=>'green', -c_color=>'blue', -g_color=>'black', -t_color=>'red', -trace_height=>80, -description=>1, -label => 1, -key => 'Traces'); binmode STDOUT; print $panel->png; From hlapp at gmx.net Sat Jul 4 06:39:37 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 4 Jul 2009 12:39:37 +0200 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <200907021128.21239.florian.mittag@uni-tuebingen.de> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> Message-ID: <5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net> Hi Florian: On Jul 2, 2009, at 11:28 AM, Florian Mittag wrote: > Hi! > > I previously posted a message on the BioSQL mailinglist regarding a > BioSQL > schema for DB2 and we are several steps closer to completion now. Good to hear! > We were able to adapt the "load_ncbi_taxonomy.pl" script from BioSQL > to fill > our DB2 database with taxonomy data Would you mind posting to the BioSQL list which changes you had to make to make the script work with DB2? More generally, is there some kind of comprehensive documentation on what is different in DB2 from standard SQL92? The load_ncbi_taxonomy.pl script should in principle work with any SQL92- compliant RDBMS ... Have you found that not to be the case (which would be a bug), or is DB2 in some ways not SQL92-compliant? 
> , but loading the gene ontology with BioPerl's "load_ontology.pl" is > somewhat harder. The ontology as well as the sequence loader are really just front-ends to the Bioperl-db object-relational mappers (ORMs). So I would start there, rather than looking at errors the script does or does not throw (you don't want to run all combinations of command line parameters that would exercise each and every feature of the script). In order to create DB2 driver support in Bioperl-db, you need to add two things. First, you need to create a module Bio/DB/DBI/DB2.pm that overrides the methods from base.pm according to DB2. The fact that you didn't report any errors about that module not having been found suggests that you've done this already. The second step is as you say to create a package Bio/DB/BioSQL/DB2 with at least BasePersistenceAdaptorDriver.pm as module in it, and starting with a copy of the existing ones is indeed the best way to get started on this. Unless you also created the DB2 database DDL scripts from the Oracle ones, I wouldn't necessarily copy from Oracle though, but maybe rather from Pg. And rather than looking for errors of one of the scripts, I'd just go systematically through the files and make sure the SQL in there is DB2 compliant. > [...] > It first ran a few minutes processing the file and then died after the > following SQL-command was prepared and executed: > > "SELECT term.term_id, term.identifier, term.name, term.definition, > term.is_obsolete, NULL, term.ontology_id FROM term WHERE identifier > = ?" Could you post the full error message? It is rather difficult to diagnose what's going on w/o the error message and stack trace. I'd be surprised BTW if DB2 were indeed offended by the NULL in the above statement - I'm pretty sure that "SELECT NULL FROM sometable" (or "SELECT 1 FROM sometable") is standard SQL. Are you sure that if you execute such a statement at a SQL prompt it results in an error? 
Since I can hardly believe that DB2 doesn't support selecting constants (NULL is as much a constant as 1 is), maybe what it wants though is aliasing the column. So if SELECT NULL FROM bioentry; yields an error, does SELECT NULL AS colAlias FROM bioentry; work fine? > I don't know if the "NULL" column is supposed to be there It is. The code in BaseDriver.pm that you were looking at should not need to be modified. (Rather, DB2/BasePersistenceAdaptorDriver.pm is supposed to override any method that needs to be adapted to DB2.) The way the ORM works is by trying to map all properties of a BioPerl object that are persistent to a column of a table in the database. If it can't map a property (for whatever reason) its value is simply always undef (or NULL in SQL). I.e., NULL columns are the placeholder for a column that failed to be mapped to a property. You can't simply remove them or all subsequent columns are shifted. Hth, -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Jul 4 08:02:33 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 4 Jul 2009 08:02:33 -0400 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: <59386.10.2.4.168.1245679938.squirrel@webmail.istge.it> References: <59386.10.2.4.168.1245679938.squirrel@webmail.istge.it> Message-ID: Hi Achille, according to Chris Mungall from the GO Consortium, the .ontology files have been deprecated by GO. You should use the .obo files instead, and BioPerl has a parser for that (and load_ontology.pl supports all formats that BioPerl supports). There has been a near identical issue report earlier (April 20 - I don't have the thread from the archives at hand). 
According to Chris, the BioPerl parser for the .ontology files appears to fail to deal with the new relations in GO, and so with the obsoletion of the .ontology format we have scheduled the respective parser for deprecation. -hilmar On Jun 22, 2009, at 10:12 AM, Achille Zappa wrote: > Hi guys > > I'm working with biosql and I try to figure out how to load ontologies > into biosql. > > I've tried to load the flat files gene ontologies : > > load_ontology.pl --driver mysql --dbuser xxx --dbpass xxx --host > localhost --dbname biosql --namespace "Gene Ontology" --format goflat > --fmtargs "-defs_file,GO.defs" function.ontology process.ontology > component.ontology > > as in the script info but I have an error, > > a lot of ------------ WARNING --------------------- > MSG: DBLink exists in the dblink of _default > --------------------------------------------------- > and at the end > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: format error (file /home/user/Download/process.ontology) > offending line: > -negative regulation of angiogenesis ; GO:0016525 ; synonym:down > regulation of angiogenesis ; synonym:down\-regulation of angiogenesis > ; synonym:downregulation of angiogenesis ; synonym:inhibition of > angiogenesis % negative regulation of developmental process ; > GO:0051093 % regulation of angiogenesis ; GO:0045765 > > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/vendor_perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::OntologyIO::dagflat::_parse_flat_file > /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 > STACK: Bio::OntologyIO::dagflat::parse > /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:284 > STACK: Bio::OntologyIO::dagflat::next_ontology > /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:317 > STACK: load_ontology.pl:604 > ----------------------------------------------------------- > > could you help me? > is it possible to use the OBO format with the loader? 
> those GO flat files are deprecated by the Gene Ontology site > is there a list of format to use with the biosql perl scripts? > > thank you > regards > Achille > > > > > > -- > Achille Zappa > -Bioinformatics > National Cancer Research Institute (IST) > Largo Benzi 10 > 16132 Genova - ITALY > Tel. 010 5737288 > -IEIIT - Sezione di Genova > National Research Council (CNR) > via De Marini, 6 > 16149 Genova - ITALY > > > Aiutaci TU ad aiutare TANTI: Il tuo 5 per MILLE a sostegno della > nostra RICERCA. > Come fare: > Nella prossima dichiarazione dei redditi metti la firma > nell'apposito riquadro del 5 per mille, > scrivendo anche il codice fiscale dell'Istituto Nazionale per la > Ricerca sul Cancro di Genova : > c.f. 80 100 850 108 > Istituto Nazionale per la Ricerca sul Cancro > L.go R. Benzi, 10 -16132 Genova > http://www.istge.it > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Jul 4 09:49:39 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 4 Jul 2009 09:49:39 -0400 Subject: [Bioperl-l] FASTQ output In-Reply-To: References: Message-ID: On Jul 1, 2009, at 5:48 AM, Chris Fields wrote: > I am working on FASTQ output and noticed a real oddity. Apparently, > there are three write_* methods for this module, with the odd choice > of write_seq for Bio::SeqIO::fastq writing FASTA, not FASTQ. > write_qual() writes Qual format: Maybe the motivating thought was that a SeqIO module ought to write sequences when write_seq() is called. I agree though that a writer for a format ought to write that format and not something else. > [...] is there a reason for duplicating output code for qual and > FASTA output within Bio::SeqIO::fastq Hopefully not. > [...] 
Anyone have problems with me changing that up a bit? Go ahead. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From yewcoccus at gmail.com Sat Jul 4 22:39:23 2009 From: yewcoccus at gmail.com (yewcoccus) Date: Sun, 5 Jul 2009 10:39:23 +0800 Subject: [Bioperl-l] Bio::SeqIO::swiss.pm module help Message-ID: <200907051039202349328@gmail.com> Hi all, I want to parse uniprot_sprot.dat and get each of the features, but I find it hard to understand how to use the Bio::SeqIO::swiss.pm module. I would appreciate it if anyone could help. ID 002R_IIV3 Reviewed; 458 AA. AC Q197F8; DT 16-JUN-2009, integrated into UniProtKB/Swiss-Prot. DT 11-JUL-2006, sequence version 1. DT 16-JUN-2009, entry version 10. DE RecName: Full=Uncharacterized protein 002R; GN ORFNames=IIV3-002R; OS Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus). OC Viruses; dsDNA viruses, no RNA stage; Iridoviridae; Chloriridovirus. OX NCBI_TaxID=345201; OH NCBI_TaxID=7163; Aedes vexans (Inland floodwater mosquito) (Culex vexans). OH NCBI_TaxID=42431; Culex territans. OH NCBI_TaxID=332058; Culiseta annulata. OH NCBI_TaxID=310513; Ochlerotatus sollicitans (eastern saltmarsh mosquito). OH NCBI_TaxID=329105; Ochlerotatus taeniorhynchus. OH NCBI_TaxID=7183; Psorophora ferox. RN [1] RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. RX PubMed=16912294; DOI=10.1128/JVI.00464-06; RA Delhon G., Tulman E.R., Afonso C.L., Lu Z., Becnel J.J., Moser B.A., RA Kutish G.F., Rock D.L.; RT "Genome of invertebrate iridescent virus type 3 (mosquito iridescent RT virus)."; RL J. Virol. 80:8439-8449(2006).
CC ----------------------------------------------------------------------- CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms CC Distributed under the Creative Commons Attribution-NoDerivs License CC ----------------------------------------------------------------------- DR EMBL; DQ643392; ABF82032.1; -; Genomic_DNA. DR RefSeq; YP_654574.1; -. DR GeneID; 4156251; -. PE 4: Predicted; FT CHAIN 1 458 Uncharacterized protein 002R. FT /FTId=PRO_0000377938. SQ SEQUENCE 458 AA; 53921 MW; E46E5C85D7ACA139 CRC64; MASNTVSAQG GSNRPVRDFS NIQDVAQFLL FDPIWNEQPG SIVPWKMNRE QALAERYPEL QTSEPSEDYS GPVESLELLP LEIKLDIMQY LSWEQISWCK HPWLWTRWYK DNVVRVSAIT FEDFQREYAF PEKIQEIHFT DTRAEEIKAI LETTPNVTRL VIRRIDDMNY NTHGDLGLDD LEFLTHLMVE DACGFTDFWA PSLTHLTIKN LDMHPRWFGP VMDGIKSMQS TLKYLYIFET YGVNKPFVQW CTDNIETFYC TNSYRYENVP RPIYVWVLFQ EDEWHGYRVE DNKFHRRYMY STILHKRDTD WVENNPLKTP AQVEMYKFLL RISQLNRDGT GYESDSDPEN EHFDDESFSS GEEDSSDEDD PTWAPDSDDS DWETETEEEP SVAARILEKG KLTITNLMKS LGFKPKPKKI QSIDRYFCSL DSNYNSEDED FEYDSDSEDD DSDSEDDC // 2009-07-05 yewcoccus From bosborne11 at verizon.net Sat Jul 4 22:50:40 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 04 Jul 2009 22:50:40 -0400 Subject: [Bioperl-l] Bio::SeqIO::swiss.pm module help In-Reply-To: <200907051039202349328@gmail.com> References: <200907051039202349328@gmail.com> Message-ID: <3B3CB9A4-4C89-4730-B0E7-52D62DFB1BF0@verizon.net> yewcoccus, You took a look at the Feature-Annotation HOWTO? http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Brian O. On Jul 4, 2009, at 10:39 PM, yewcoccus wrote: > Hi all, > > I want to parse uniprot_sprot.dat, get each of the features. > but I found it is hard to understand how to use the > Bio::SeqIO::swiss.pm module. I will be appreciate if there is > anyone who can help. > > > > ID 002R_IIV3 Reviewed; 458 AA. > AC Q197F8; > DT 16-JUN-2009, integrated into UniProtKB/Swiss-Prot. > DT 11-JUL-2006, sequence version 1. > DT 16-JUN-2009, entry version 10. 
> DE RecName: Full=Uncharacterized protein 002R; > GN ORFNames=IIV3-002R; > OS Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent > virus). > OC Viruses; dsDNA viruses, no RNA stage; Iridoviridae; > Chloriridovirus. > OX NCBI_TaxID=345201; > OH NCBI_TaxID=7163; Aedes vexans (Inland floodwater mosquito) > (Culex vexans). > OH NCBI_TaxID=42431; Culex territans. > OH NCBI_TaxID=332058; Culiseta annulata. > OH NCBI_TaxID=310513; Ochlerotatus sollicitans (eastern saltmarsh > mosquito). > OH NCBI_TaxID=329105; Ochlerotatus taeniorhynchus. > OH NCBI_TaxID=7183; Psorophora ferox. > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. > RX PubMed=16912294; DOI=10.1128/JVI.00464-06; > RA Delhon G., Tulman E.R., Afonso C.L., Lu Z., Becnel J.J., Moser > B.A., > RA Kutish G.F., Rock D.L.; > RT "Genome of invertebrate iridescent virus type 3 (mosquito > iridescent > RT virus)."; > RL J. Virol. 80:8439-8449(2006). > CC > ----------------------------------------------------------------------- > CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs > License > CC > ----------------------------------------------------------------------- > DR EMBL; DQ643392; ABF82032.1; -; Genomic_DNA. > DR RefSeq; YP_654574.1; -. > DR GeneID; 4156251; -. > PE 4: Predicted; > FT CHAIN 1 458 Uncharacterized protein 002R. > FT /FTId=PRO_0000377938. 
> SQ SEQUENCE 458 AA; 53921 MW; E46E5C85D7ACA139 CRC64; > MASNTVSAQG GSNRPVRDFS NIQDVAQFLL FDPIWNEQPG SIVPWKMNRE QALAERYPEL > QTSEPSEDYS GPVESLELLP LEIKLDIMQY LSWEQISWCK HPWLWTRWYK DNVVRVSAIT > FEDFQREYAF PEKIQEIHFT DTRAEEIKAI LETTPNVTRL VIRRIDDMNY NTHGDLGLDD > LEFLTHLMVE DACGFTDFWA PSLTHLTIKN LDMHPRWFGP VMDGIKSMQS TLKYLYIFET > YGVNKPFVQW CTDNIETFYC TNSYRYENVP RPIYVWVLFQ EDEWHGYRVE DNKFHRRYMY > STILHKRDTD WVENNPLKTP AQVEMYKFLL RISQLNRDGT GYESDSDPEN EHFDDESFSS > GEEDSSDEDD PTWAPDSDDS DWETETEEEP SVAARILEKG KLTITNLMKS LGFKPKPKKI > QSIDRYFCSL DSNYNSEDED FEYDSDSEDD DSDSEDDC > // > > 2009-07-05 > > > > yewcoccus > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Jonathan.Moore at warwick.ac.uk Wed Jul 1 06:04:24 2009 From: Jonathan.Moore at warwick.ac.uk (Moore, Jonathan) Date: Wed, 1 Jul 2009 11:04:24 +0100 Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk> Message-ID: <7BEB494D4E69964C8292CE4EDA2B9811BBB95E@LAUREL.ads.warwick.ac.uk> Thanks for the suggestion Jason. There is a bit of a gulf between the tigrxml test file and the TAIR9 Arabidopsis release in TIGR XML format. BioPerl's tigrxml test file's top-level object is ASSEMBLY, whereas in the TAIR file ASSEMBLY is already two levels deep in the object hierarchy inside TIGR and PSEUDOCHROMOSOME. In addition, the two main objects within the TAIR ASSEMBLY object, GENE_LIST and ASSEMBLY_SEQUENCE, don't get a mention in our test file. Looks like a bit of work would be needed to map this. Jay >There are several flavors of TIGR XML for rice and Arabidopsis, and >other projects etc. Unfortunately I don't know which is tracked by the current >tigrxml version, but one can compare the test files in t/ >data to the versions downloaded to see what is currently supported.
>Usually the gbk will be more consistently parseable but we can try and >work it out if it is a sensible transformation. > > >> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML >> files at the TAIR FTP site. >> >> I've tried SeqIO with both tigr and tigrxml formats but both are >> giving errors in 1.6.0. Has anyone advice on whether it's likely to >> be doable, or should I wait til the .gb files are available? >> >> Jay Moore > > >-- >Jason Stajich >jason at bioperl.org From johntyree at gmail.com Wed Jul 1 15:36:33 2009 From: johntyree at gmail.com (John Tyree) Date: Wed, 1 Jul 2009 15:36:33 -0400 Subject: [Bioperl-l] Bio::DB::GenBank batch mode usage Message-ID: <459dd5330907011236y31fea4fey8dc20e5274e94d1a@mail.gmail.com> I'm trying to use Bio::DB::GenBank to download a large number of files by accession number. The docs say not to do this in normal mode to reduce server load. There is some kind of helper function associated with this. %params = Bio::DB::GenBank->get_params('batch'); But I don't understand how to use it. If you pass the hash using: Bio::DB::GenBank->new(%params); it raises the following and dies: --------------------- WARNING --------------------- MSG: invalid retrieval type tool must be one of (pipeline,io_string,tempfile --------------------------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: seq_start() must be integer value if set STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib64/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::DB::NCBIHelper::seq_start /usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:416 STACK: Bio::DB::NCBIHelper::new /usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:117 STACK: Find_Patient_By_AccNo.pl:93 There is a deprecated method called get_Stream_by_batch() but how does one achieve batch mode using the proper get_Stream_by_id() ? 
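For what it is worth, one pattern that may sidestep the get_params('batch') question is to pass get_Stream_by_id() an array reference of accessions, one modest chunk at a time. A minimal sketch of just the chunking follows; chunk_ids, the chunk size and the placeholder IDs are assumptions for illustration, and the fetch itself is left as a comment because it needs BioPerl and network access. Whether this fully replaces batch mode is not something this thread confirms.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a list of accessions
# into chunks of at most $size entries, returned as a list of array refs.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be a positive integer\n" unless $size && $size > 0;
    my @chunks;
    push @chunks, [ splice @ids, 0, $size ] while @ids;
    return @chunks;
}

my @accessions = map { "ACC$_" } 1 .. 7;    # placeholder IDs, not real accessions
for my $chunk (chunk_ids(3, @accessions)) {
    # With BioPerl installed and network access, each chunk could be handed on:
    #   my $stream = $gb->get_Stream_by_id($chunk);   # $gb: a Bio::DB::GenBank object
    #   while (my $seq = $stream->next_seq) { ... }
    print scalar(@$chunk), " ids: @$chunk\n";
}
```

Small chunks with a pause between requests keep the load on the server down, which is the concern the docs raise about bulk downloads in normal mode.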
Thanks, John From lacava at gmail.com Fri Jul 3 14:07:50 2009 From: lacava at gmail.com (John LaCava) Date: Fri, 3 Jul 2009 14:07:50 -0400 Subject: [Bioperl-l] Question regarding BioPerl / BioSQL - InterPro Optional IDs Message-ID: <48FCB39E-5CA8-4BE9-825D-49CFB14FDBB7@gmail.com> Hi all, I am trying to use the BioPerl-db script load_seqdatabase.pl to parse a SwissProt.dat file (Yeast.dat, this is the yeast proteome with annotations etc.). The particular entry I am interested in is the InterPro optional ID, which is the domain name. I have put a short stub up which displays the 4 pieces of info I want to parse into my database. That can be found here: http://github.com/johnraekwon/BioPerl---BioSQL---InterPro-Optional-IDs/tree/master You can see that near the bottom, we get the optional ID: $protein_ids->{interpro_domain} = $dblink->{optional_id}; I do not think the bioperl script load_seqdatabase.pl retrieves this information. At least, I cannot find it in the db built from parsing a test .dat file. I would like some help figuring out: 1) WHY doesn't it retrieve this information, since it seems to be parsing "all" annotations... 2) HOW might I edit the script to include this particular annotation of interest in the info it passes to my db (biosql) I am a bit out of my depth on this, so any help is appreciated. Cheers, John From Russell.Smithies at agresearch.co.nz Sun Jul 5 17:00:16 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 6 Jul 2009 09:00:16 +1200 Subject: [Bioperl-l] different results with remote-blast skript In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> I'd guess it's a difference in the parameters used.
Interesting that both have the number of letters in the db as "-1,125,070,205", I assume that's a bug :-) Stats from your remote_blast: 'stats' => { 'S1' => '42', 'S1_bits' => '20.8', 'lambda' => '0.309', 'entropy' => '0.345', 'kappa_gapped' => '0.0410', 'T' => '11', 'kappa' => '0.122', 'X3_bits' => '24.7', 'X1' => '16', 'lambda_gapped' => '0.267', 'X2' => '38', 'S2' => '74', 'seqs_better_than_cutoff' => '0', 'posted_date' => 'Jul 4, 2009 4:41 AM', 'Hits_to_DB' => '60102303', 'dbletters' => '-1125070205', 'A' => '40', 'num_successful_extensions' => '2004', 'num_extensions' => '1436892', 'X1_bits' => '7.1', 'X3' => '64', 'entropy_gapped' => '0.140', 'dbentries' => '9252258', 'X2_bits' => '14.6', 'S2_bits' => '33.1' } Stats from a blast done on the NCBI webpage: Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects Posted date: Jul 4, 2009 4:41 AM Number of letters in database: -1,125,070,205 Number of sequences in database: 9,252,258 Lambda K H 0.309 0.124 0.340 Gapped Lambda K H 0.267 0.0410 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Sequences: 9252258 Number of Hits to DB: 86493230 Number of extensions: 3101413 Number of successful extensions: 9001 Number of sequences better than 100: 65 Number of HSP's better than 100 without gapping: 0 Number of HSP's gapped: 9000 Number of HSP's successfully gapped: 66 Length of query: 150 Length of database: 3169897087 Length adjustment: 113 Effective length of query: 37 Effective length of database: 2124391933 Effective search space: 78602501521 Effective search space used: 78602501521 T: 11 A: 40 X1: 16 (7.1 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 42 (20.8 bits) S2: 65 (29.6 bits) > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > Sent: Sunday, 28 June 2009 10:15 p.m. 
> To: BioPerl List > Subject: [Bioperl-l] different results with remote-blast skript > > Hi again :) > please, I only have this little question: > why do I get different results with my remote::blast perl skript then on the > ncbi blast homepage? > I am using blastp, the query is an amino-sequence (different results with any > sequence, differences not only in number of hits but even in e-values, scores > etc...), the database is 'nr'. > PLEASE help me, > thank you in advance, > Jonas > > ps: my skript: > ############################################################################## > ## > use Bio::Seq::SeqFactory; > use Bio::Tools::Run::RemoteBlast; > use strict; > my @blast_report; > my $prog = 'blastp'; > my $db = 'nr'; > my $e_val= '1e-10'; > #my $e_val= '10'; > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1'; > > my > $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD > PDDEYE'; > #$v is just to turn on and off the messages > my $v = 1; > my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq'); > my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => > "$blast_seq"); > my $filename='temp2.out'; > my $r = $factory->submit_blast($seq); > print STDERR "waiting..." if( $v > 0 ); > while ( my @rids = $factory->each_rid ) > { > foreach my $rid ( @rids ) > { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 ); > } > else > { > my $result = $rc->next_result(); > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) > { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) > { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > > > } > @blast_report = get_file_data ($filename); > return @blast_report; > ############################################################################## > #### > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From jason at bioperl.org Sun Jul 5 17:40:41 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 5 Jul 2009 14:40:41 -0700 Subject: [Bioperl-l] different results with remote-blast skript In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> Message-ID: integer overflow in blast.... On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > I'd guess it's a difference in the parameters used. 
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org

From cjfields at illinois.edu Sun Jul 5 18:51:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 5 Jul 2009 17:51:39 -0500
Subject: [Bioperl-l] different results with remote-blast skript
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
Message-ID:

That inspires confidence ;>

chris

On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:

> integer overflow in blast....
>
> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>
>> I'd guess it's a difference in the parameters used.
>> Interesting that both have the number of letters in the db as
>> "-1,125,070,205", I assume that's a bug :-)
>>
>> Stats from your remote_blast:
>>
>> 'stats' => {
>>     'S1' => '42',
>>     'S1_bits' => '20.8',
>>     'lambda' => '0.309',
>>     'entropy' => '0.345',
>>     'kappa_gapped' => '0.0410',
>>     'T' => '11',
>>     'kappa' => '0.122',
>>     'X3_bits' => '24.7',
>>     'X1' => '16',
>>     'lambda_gapped' => '0.267',
>>     'X2' => '38',
>>     'S2' => '74',
>>     'seqs_better_than_cutoff' => '0',
>>     'posted_date' => 'Jul 4, 2009 4:41 AM',
>>     'Hits_to_DB' => '60102303',
>>     'dbletters' => '-1125070205',
>>     'A' => '40',
>>     'num_successful_extensions' => '2004',
>>     'num_extensions' => '1436892',
>>     'X1_bits' => '7.1',
>>     'X3' => '64',
>>     'entropy_gapped' => '0.140',
>>     'dbentries' => '9252258',
>>     'X2_bits' => '14.6',
>>     'S2_bits' => '33.1'
>> }
>>
>>
>> Stats from a blast done on the NCBI webpage:
>>
>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
>> excluding environmental samples from WGS projects
>> Posted date: Jul 4, 2009 4:41 AM
>> Number of letters in database: -1,125,070,205
>> Number of sequences in database: 9,252,258
>>
>> Lambda     K        H
>> 0.309      0.124    0.340
>> Gapped
>> Lambda     K        H
>> 0.267      0.0410   0.140
>> Matrix: BLOSUM62
>> Gap Penalties: Existence: 11, Extension: 1
>> Number of Sequences: 9252258
>> Number of Hits to DB: 86493230
>> Number of extensions: 3101413
>> Number of successful extensions: 9001
>> Number of sequences better than 100: 65
>> Number of HSP's better than 100 without gapping: 0
>> Number of HSP's gapped: 9000
>> Number of HSP's successfully gapped: 66
>> Length of query: 150
>> Length of database: 3169897087
>> Length adjustment: 113
>> Effective length of query: 37
>> Effective length of database: 2124391933
>> Effective search space: 78602501521
>> Effective search space used: 78602501521
>> T: 11
>> A: 40
>> X1: 16 (7.1 bits)
>> X2: 38 (14.6 bits)
>> X3: 64 (24.7 bits)
>> S1: 42 (20.8 bits)
>> S2: 65 (29.6 bits)
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>> To: BioPerl List
>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>
>>> Hi again :)
>>> please, I only have this little question:
>>> why do I get different results with my remote::blast perl skript then on the
>>> ncbi blast homepage?
>>> I am using blastp, the query is an amino-sequence (different results with any
>>> sequence, differences not only in number of hits but even in e-values, scores
>>> etc...), the database is 'nr'.
>> =====================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hlapp at gmx.net Sun Jul 5 18:00:42 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 5 Jul 2009 18:00:42 -0400
Subject: [Bioperl-l] Syntax for load_interpro.pl?
In-Reply-To:
References:
Message-ID:

Hi John,

I presume you mean the scripts/biosql/load_interpro.pl script in
Bioperl-db. It has indeed been obsoleted for a long time and I guess
I should remove it because the functionality is now in
load_ontology.pl. This is b/c InterPro for the purposes of BioPerl
is an ontology.

Have you found it not to work with load_ontology.pl?

-hilmar

On Jul 5, 2009, at 4:30 PM, John LaCava wrote:

> Greetings,
>
> I am attempting to use this script, but I don't seem to be able to
> determine the appropriate syntax.
> Documentation on this script seems minimal. Moreover, I am not yet
> terribly experienced with these endeavors.
>
> Could someone possibly provide me with an example syntax?
>
> e.g. > load_interpro.pl ...
>
> then what?
>
> I must specify -db -file -version ?
>
> I tried a couple of ways, but I get the similar errors each time:
>
> e.g.
> /usr/local/bin/bp_load_interpro.pl: line 29: syntax error near
> unexpected token `$file,'
> /usr/local/bin/bp_load_interpro.pl: line 29: `my ($file, $version);'
>
> Also, from reading the comments, it appears this is supposed to be
> made obsolete or superseded by
> the script load_ontologies.pl. Why is this?
>
> Sorry to bother, and thanks in advance.
>
> John
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lacava at gmail.com Sun Jul 5 23:12:18 2009
From: lacava at gmail.com (John LaCava)
Date: Sun, 5 Jul 2009 23:12:18 -0400
Subject: [Bioperl-l] Syntax for load_interpro.pl?
In-Reply-To:
References:
Message-ID: <24A78389-1707-4D86-90E8-9C7F6B33AC0B@gmail.com>

Hi again,

Thanks for the response.

Actually, I felt I did not need the full functionality of the ontology
script, and thought I might see what the interpro script would generate,
at the potential benefit of lower complexity to me, the novice programmer.
But I was having trouble getting the interpro script to run, since I
couldn't work out the syntax.

Anyway, I have since started writing my own script and will not
pursue this matter further.

Best wishes,
John

On Jul 5, 2009, at 6:00 PM, Hilmar Lapp wrote:

> Hi John,
>
> I presume you mean the scripts/biosql/load_interpro.pl script in
> Bioperl-db. It has indeed been obsoleted for a long time and I guess
> I should remove it because the functionality is now in
> load_ontology.pl. This is b/c InterPro for the purposes of BioPerl
> is an ontology.
>
> Have you found it not to work with load_ontology.pl?
>
> -hilmar
>
> On Jul 5, 2009, at 4:30 PM, John LaCava wrote:
>
>> Greetings,
>>
>> I am attempting to use this script, but I don't seem to be able to
>> determine the appropriate syntax.
>> Documentation on this script seems minimal. Moreover, I am not yet
>> terribly experienced with these endeavors.
>>
>> Could someone possibly provide me with an example syntax?
>>
>> e.g. > load_interpro.pl ...
>>
>> then what?
>>
>> I must specify -db -file -version ?
>>
>> I tried a couple of ways, but I get the similar errors each time:
>>
>> e.g.
>> /usr/local/bin/bp_load_interpro.pl: line 29: syntax error near
>> unexpected token `$file,'
>> /usr/local/bin/bp_load_interpro.pl: line 29: `my ($file, $version);'
>>
>> Also, from reading the comments, it appears this is supposed to be
>> made obsolete or superseded by
>> the script load_ontologies.pl. Why is this?
>>
>> Sorry to bother, and thanks in advance.
>>
>> John
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>

From hlapp at gmx.net Sun Jul 5 23:30:34 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 5 Jul 2009 23:30:34 -0400
Subject: [Bioperl-l] Syntax for load_interpro.pl?
In-Reply-To: <24A78389-1707-4D86-90E8-9C7F6B33AC0B@gmail.com>
References: <24A78389-1707-4D86-90E8-9C7F6B33AC0B@gmail.com>
Message-ID: <7C8AB20D-FDFE-41B4-BB96-C93A1013B35D@gmx.net>

Hi John - the load_ontology.pl script is oriented towards the end-user
(it runs off-the-bat), indeed not the novice programmer. Was there
something that the script doesn't do that you wanted to program into it?

-hilmar

On Jul 5, 2009, at 11:12 PM, John LaCava wrote:

> Hi again,
>
> Thanks for the response.
>
> Actually, I felt I did not need the entire functions of the ontology
> script, and thought I might see what
> the interpro script would generate, at the potential benefit of
> lower complexity to me, the novice programmer.
> But I was having trouble getting the interpro script to run, since I
> couldnt land the syntax.
>
> Anyway, I have since started writing my own script and will not
> pursue this matter further.
>>>
>>> John
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lacava at gmail.com Mon Jul 6 00:10:01 2009
From: lacava at gmail.com (John LaCava)
Date: Mon, 6 Jul 2009 00:10:01 -0400
Subject: [Bioperl-l] Syntax for load_interpro.pl?
In-Reply-To: <7C8AB20D-FDFE-41B4-BB96-C93A1013B35D@gmx.net>
References: <24A78389-1707-4D86-90E8-9C7F6B33AC0B@gmail.com> <7C8AB20D-FDFE-41B4-BB96-C93A1013B35D@gmx.net>
Message-ID: <0258CC4C-1CED-41DC-A637-025E37C7D913@gmail.com>

Hi again,

No, mainly I was scanning the different scripts to see if any of them
were friendly towards capturing the InterPro domain-type info that we
discussed in relation to load_seqdatabase.pl. I plan to explore the
options you mentioned in your other reply to that thread, so that I can
take full advantage of this script and the BioSQL schema...

However, in the meantime I have authored a small and somewhat crappy
script that captures the SwissProt protein ID and accession number as
well as the InterPro accession number and domain type, and passes these
into the bioentry and dbxref tables of the BioSQL schema. I had to add an
additional column to the dbxref table to hold the InterPro domain type
(the optional ID), since I couldn't get it into the dbxref_qualifier_value
table without upsetting the MySQL foreign key settings.
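
For readers following along, the capture step John describes can be sketched in plain Perl. This is only an illustration, not John's script: the helper name parse_swissprot_record and the sample record are made up, and it assumes the usual Swiss-Prot flat-file layout in which a cross-reference line reads "DR   InterPro; <accession>; <name>.". The third field is what the message above calls the domain type (the optional ID).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or BioSQL): pull the entry name,
# the primary accession and the InterPro cross-references out of one
# Swiss-Prot flat-file record passed in as a single string.
sub parse_swissprot_record {
    my ($record) = @_;
    my %entry = ( id => undef, accession => undef, interpro => [] );

    for my $line ( split /\n/, $record ) {
        if ( $line =~ /^ID\s+(\S+)/ ) {
            $entry{id} = $1;
        }
        elsif ( $line =~ /^AC\s+(\w+);/ ) {
            # Keep only the first accession seen; it is the primary one.
            $entry{accession} //= $1;
        }
        elsif ( $line =~ /^DR\s+InterPro;\s*(IPR\d+);\s*(.+?)\.\s*$/ ) {
            # $1 is the InterPro accession, $2 the name that the thread
            # refers to as the "optional ID".
            push @{ $entry{interpro} }, { accession => $1, name => $2 };
        }
    }
    return \%entry;
}

# Made-up record, for illustration only.
my $record = <<'END';
ID   TEST_HUMAN              Reviewed;         150 AA.
AC   P00000; Q00000;
DR   InterPro; IPR000001; Kringle.
DR   InterPro; IPR013806; Kringle-like.
DR   Pfam; PF00051; Kringle; 1.
//
END

my $entry = parse_swissprot_record($record);
printf "%s\t%s\n", $entry->{id}, $entry->{accession};
printf "\t%s\t%s\n", $_->{accession}, $_->{name} for @{ $entry->{interpro} };
```

How the extracted pairs are then written to bioentry and dbxref is left out on purpose, since the thread does not show John's schema change or his insert statements.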

Cheers again,
John

On Jul 5, 2009, at 11:30 PM, Hilmar Lapp wrote:

> Hi John - the load_ontology.pl script is oriented towards the end-user
> (it runs off-the-bat), indeed not the novice programmer. Was
> there something that the script doesn't do that you wanted to
> program into it?
>
> -hilmar
>
> On Jul 5, 2009, at 11:12 PM, John LaCava wrote:
>
>> Hi again,
>>
>> Thanks for the response.
>>
>> Actually, I felt I did not need the entire functions of the
>> ontology script, and thought I might see what
>> the interpro script would generate, at the potential benefit of
>> lower complexity to me, the novice programmer.
>> But I was having trouble getting the interpro script to run, since
>> I couldnt land the syntax.
>>
>> Anyway, I have since started writing my own script and will not
>> pursue this matter further.
>>
>> Best wishes,
>> John
>>
>>
>> On Jul 5, 2009, at 6:00 PM, Hilmar Lapp wrote:
>>
>>> Hi John,
>>>
>>> I presume you mean the scripts/biosql/load_interpro.pl script in
>>> Bioperl-db. It has indeed been obsoleted for a long time and I
>>> guess I should remove it because the functionality is now in
>>> load_ontology.pl. This is b/c InterPro for the purposes of BioPerl
>>> is an ontology.
>>>
>>> Have you found it not to work with load_ontology.pl?
>>>
>>> -hilmar
>>>
>>> On Jul 5, 2009, at 4:30 PM, John LaCava wrote:
>>>
>>>> Greetings,
>>>>
>>>> I am attempting to use this script, but I don't seem to be able
>>>> to determine the appropriate syntax.
>>>> Documentation on this script seems minimal. Moreover, I am not
>>>> yet terribly experienced with
>>>> these endeavors.
>>>>
>>>> Could someone possibly provide me with an example syntax?
>>>>
>>>> e.g. > load_interpro.pl ...
>>>>
>>>> then what?
>>>>
>>>> I must specify -db -file -version ?
>>>>
>>>> I tried a couple of ways, but I get the similar errors each time:
>>>>
>>>> e.g.
>>>> /usr/local/bin/bp_load_interpro.pl: line 29: syntax error near
>>>> unexpected token `$file,'
>>>> /usr/local/bin/bp_load_interpro.pl: line 29: `my ($file, $version);'
>>>>
>>>> Also, from reading the comments, it appears this is supposed to
>>>> be made obsolete or superseded by
>>>> the script load_ontologies.pl. Why is this?
>>>>
>>>> Sorry to bother, and thanks in advance.
>>>>
>>>> John
>>>> _______________________________________________
>>>> BioSQL-l mailing list
>>>> BioSQL-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>

From pmr at ebi.ac.uk Mon Jul 6 10:09:21 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 06 Jul 2009 15:09:21 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
Message-ID: <4A520591.3070407@ebi.ac.uk>

Giles Weaver wrote:
> I'm developing a transcriptomics database for use with next-gen data, and
> have found processing the raw data to be a big hurdle.
>
> I'm a bit late in responding to this thread, so most issues have already
> been discussed. One thing that hasn't been mentioned is removal of adapters
> from raw Illumina sequence. This is a PITA, and I'm not aware of any well
> developed and documented open source software for removal of adapters (and
> poor quality sequence) from Illumina reads.

We would like to add this to EMBOSS. Can you describe the method you
would like to use? (I see you currently use a combination of bioperl and
emboss for this.)

> For my purposes the tools that would love to see supported in
> bioperl/bioperl-run are:
>
> - next-gen sequence quality parsing (to output phred scores)
> - sequence quality based trimming
> - sequencing adapter removal
> - filtering based on sequence complexity (repeats, entropy etc)
> - bioperl-run modules for bowtie etc.

We would like to see these supported in all the Open-Bio projects, and
they are a priority for EMBOSS.

Can you suggest quality filters, trimming methods, adaptor removal
methods, sequence filters and any other applications we could provide in
EMBOSS? We hope to keep in line with what the other projects do so that
EMBOSS, bioperl, biopython etc. can be used interchangeably in pipelines.

> Obviously all of these need to be fast! .... My
> current code trims ~1300 sequences/second, including unzipping the raw data
> and converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.

OK, we will see what speed we can reach.

> Hope this looooong post was of interest to someone!

Very interesting!

regards,

Peter Rice

From Jonas_Schaer at gmx.de Sun Jul 5 11:46:52 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Sun, 5 Jul 2009 17:46:52 +0200
Subject: [Bioperl-l] bioperl 1.6??
Message-ID: <51AF33DA19004A7B891743D1B094F4B1@jonas>

what is the difference between bioperl 1.6 and 1.5.2???
which one should i use???
thx, jonas

From Brotelzwieb at gmx.de Mon Jul 6 08:14:18 2009
From: Brotelzwieb at gmx.de (Jonas Schaer)
Date: Mon, 6 Jul 2009 14:14:18 +0200
Subject: [Bioperl-l] different results with remote-blast skript
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
Message-ID: <46A05E0132144D73A0F805953B580B2F@jonas>

Hi guys, thanks for your answers so far.

@jason: integer overflow in blast.... sorry, but what do you mean by that?
How can I fix it...?

Since I never really changed any parameters, I assumed they were all at
their defaults. Anyway, I tried to get "better" results with my program by
changing these:

$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
$Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';

with no effect... I guess these were the default values anyway.
So could you please tell me all the other parameters I can change with my
Perl script AND how to do that?
Unfortunately both Perl and the BLAST algorithm are pretty much new to me,
maybe that's why I just cannot find out how to do that on my own... :/

Here is the output I get with my remote-blast script:
#################################################################################################################
Query Name: MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL L
        hit name is ref|XP_001702807.1|
                score is 442

BLASTP 2.2.21+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of
PSI-BLAST protein database searches with composition-based statistics and
other refinements", Nucleic Acids Res. 29:2994-3005.

RID: 53STX5G2013

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
           9,252,587 sequences; 3,169,972,781 total letters

Query= MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
ATGPDPDDEYE
Length=150

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   174    2e-42

ALIGNMENTS
>ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
 gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
Length=303

 Score = 174 bits (442), Expect = 2e-42, Method: Composition-based stats.
 Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%)

Query  1    MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds  60
            MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
Sbjct  154  MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS  213

Query  61   dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR  120
            DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
Sbjct  214  DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR  273

Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
            AWHERDDNAFRQAHQNTAMATGPDPDDEYE
Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
Posted date: Jul 5, 2009 4:41 AM
Number of letters in database: -1,124,994,511
Number of sequences in database: 9,252,587

Lambda     K        H
0.309      0.122    0.345
Gapped
Lambda     K        H
0.267      0.0410   0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 9252587
Number of Hits to DB: 60273703
Number of extensions: 1448367
Number of successful extensions: 2103
Number of sequences better than 10: 0
Number of HSP's better than 10 without gapping: 0
Number of HSP's gapped: 2113
Number of HSP's successfully gapped: 0
Length of query: 150
Length of database: 3169972781
Length adjustment: 113
Effective length of query: 37
Effective length of database: 2124430450
Effective search space: 78603926650
Effective search space used: 78603926650
T: 11
A: 40
X1: 16 (7.1 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (20.8 bits)
S2: 74 (33.1 bits)
#################################################################################################################

and here are the hits (?) of the BLAST algorithm on the NCBI homepage, with
the same query of course:

ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   300    3e-80
ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA [Acyrtho...  36.2    1.1
ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880 [Blautia...  35.4    1.8
ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania brazil...  34.3    4.2
ref|XP_680841.1|  hypothetical protein AN7572.2 [Aspergillus n...  33.5    6.0
ref|YP_001768110.1|  hypothetical protein M446_1150 [Methyloba...  33.5    7.0
#################################################################################################################

At least the first hit is the same, but even there the score and e-value
are different.
Thanks so much for any help :)
regards, jonas

----- Original Message -----
From: "Chris Fields"
To: "Jason Stajich"
Cc: "Smithies, Russell" ; "'BioPerl List'" ; "'Jonas Schaer'"
Sent: Monday, July 06, 2009 12:51 AM
Subject: Re: [Bioperl-l] different results with remote-blast skript

> That inspires confidence ;>
>
> chris
>
> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>
>> integer overflow in blast....
>>
>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>
>>> I'd guess it's a difference in the parameters used.
>>> = >>> = >>> ===================================================================== >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason at bioperl.org >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From s_oheigeartaigh at yahoo.co.uk Mon Jul 6 11:01:04 2009 From: s_oheigeartaigh at yahoo.co.uk (Sean ohEigeartaigh) Date: Mon, 6 Jul 2009 15:01:04 +0000 (GMT) Subject: [Bioperl-l] bioperl BLAST question Message-ID: <626933.96171.qm@web27405.mail.ukl.yahoo.com> Hi, I'm trying to use bioperl to limit the number of BLAST results. However, when I use the following bit of code, it limits to less than the cutoff number, and excludes BLAST results that are halfway up the BLAST results page (without the limit) which it shouldn't exclude. $blast = Bio::Tools::Run::StandAloneBlast ->new(program => 'tblastn', database =>$blastdb, b =>100, v =>100, F=>$fil, outfile=>$out) ->blastall($seq1); } Using this bit of code, I get 60 results for my query (out of 173 with no hit limit and an e-value cutoff of e=10). If I use b =>150, v=>150, I get 85 results, and some BLAST results appear halfway up the results page.
In other words, the limit seems to be removing results at random throughout the file, and is also not giving me enough results. Am I using the b and v parameters (to limit blast results and blast one-line summaries) incorrectly? Thanks very much for your help, Seán Ó hÉigeartaigh From cjfields at illinois.edu Mon Jul 6 11:24:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 6 Jul 2009 10:24:39 -0500 Subject: [Bioperl-l] bioperl 1.6?? In-Reply-To: <51AF33DA19004A7B891743D1B094F4B1@jonas> References: <51AF33DA19004A7B891743D1B094F4B1@jonas> Message-ID: <17AE2E44-66F3-49D5-AECE-D016FE66BD66@illinois.edu> The latest one (1.6). chris On Jul 5, 2009, at 10:46 AM, Jonas Schaer wrote: > what is the difference between bioperl 1.6 and 1.5.2??? > which one should i use??? > thx, jonas > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florian.mittag at uni-tuebingen.de Mon Jul 6 12:08:18 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Mon, 6 Jul 2009 18:08:18 +0200 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net> Message-ID: <200907061808.18651.florian.mittag@uni-tuebingen.de> Hi! On Saturday 04 July 2009 12:39, Hilmar Lapp wrote: > On Jul 2, 2009, at 11:28 AM, Florian Mittag wrote: > > We were able to adapt the "load_ncbi_taxonomy.pl" script from BioSQL > > to fill > > our DB2 database with taxonomy data > > Would you mind posting to the BioSQL list which changes you had to > make to make the script work with DB2? No problem, I will post the diff sometime this week, since there are a few changes not necessary anymore, e.g., the new DB2 Express-C version 9.7 supports the "TRUNCATE TABLE" command, which it previously didn't.
> More generally, is there some kind of comprehensive documentation on > what is different in DB2 from standard SQL92? The > load_ncbi_taxonomy.pl script should in principle work with any SQL92- > compliant RDBMS ... Have you found that not to be the case (which > would be a bug), or is DB2 in some ways not SQL92-compliant? I don't know, I haven't looked for this kind of documentation, but the two things that annoyed me most were: 1) DB2 doesn't support UNIQUE on columns that allow for NULL values. Solution: create triggers that ensure UNIQUEness and create an INDEX. 2) Columns of type CLOB do not allow to be compared through "=", but only through "LIKE", which leads to problems with BioJava's Hibernate queries. Solution: currently none I want to discuss these problems in more detail on the other mailinglists, since they do not really belong here. > > , but loading the gene ontology with BioPerl's "load_ontology.pl" is > > somewhat harder. > > The ontology as well as the sequence loader are really just front-ends > to the Bioperl-db object-relational mappers (ORMs). So I would start > there, rather than looking at errors the script does or does not throw > (you don't want to run all combinations of command line parameters > that would exercise each and every feature of the script). > > In order to create DB2 driver support in Bioperl-db, you need to add > two things. First, you need to create a module Bio/DB/DBI/DB2.pm that > overrides the methods from base.pm according to DB2. The fact that you > didn't report any errors about that module not having been found > suggests that you've done this already. Correct ;-) > The second step is as you say to create a package Bio/DB/BioSQL/DB2 > with at least BasePersistenceAdaptorDriver.pm as module in it, and > starting with a copy of the existing ones is indeed the best way to > get started on this. 
Unless you also created the DB2 database DDL > scripts from the Oracle ones, I wouldn't necessarily copy from Oracle > though, but maybe rather from Pg. And rather than looking for errors > of one of the scripts, I'd just go systematically through the files > and make sure the SQL in there is DB2 compliant. Okay, I'll do that, but that will take some time and I'll probably turn to this mailing list for further assistance with more specific questions. > > [...] > > It first ran a few minutes processing the file and then died after the > > following SQL-command was prepared and executed: > > > > "SELECT term.term_id, term.identifier, term.name, term.definition, > > term.is_obsolete, NULL, term.ontology_id FROM term WHERE identifier > > = ?" > > Could you post the full error message? It is rather difficult to > diagnose what's going on w/o the error message and stack trace. Right now, unfortunately not, because this error message won't appear again. I'm not sure if this is because of the database now containing data or because of some other changes I've made, but I will see this in the process of rewriting the DDL scripts. > I'd be surprised BTW if DB2 were indeed offended by the NULL in the > above statement - I'm pretty sure that "SELECT NULL FROM > sometable" (or "SELECT 1 FROM sometable") is standard SQL. Are you > sure that if you execute such a statement at a SQL prompt it results > in an error? > > Since I can hardly believe that DB2 doesn't support selecting > constants (NULL is as much a constant as 1 is), maybe what it wants > though is aliasing the column. So if > > SELECT NULL FROM bioentry; > > yields an error, does > > SELECT NULL AS colAlias FROM bioentry; > > work fine? Well, it is like this with version 9.5 of DB2 Express-C: SELECT NULL FROM bioentry; yields: SQL0206N "NULL" is not valid in the context where it is used. SQLSTATE=42703 SQLCODE=-206 But if I do: SELECT cast(NULL AS VARCHAR(255)) FROM bioentry; it returns the correct result without error.
The new version 9.7 claims to have changed this behavior, so that the first query would run fine, but I haven't had time to test the new version yet. http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.wn.doc/doc/i0054263.html > > > I don't know if the "NULL" column is supposed to be there > > It is. The code in BaseDriver.pm that you were looking at should not > need to be modified. (Rather, DB2/BasePersistenceAdaptorDriver.pm is > supposed to override any method that needs to be adapted to DB2.) The > way the ORM works is by trying to map all properties of a BioPerl > object that are persistent to a column of a table in the database. If > it can't map a property (for whatever reason) its value is simply > always undef (or NULL in SQL). I.e., NULL columns are the placeholder > for a column that failed to be mapped to a property. You can't simply > remove them or all subsequent columns are shifted. It ran fine without the NULL column, but that isn't necessarily a sign of correctness. My problem was that (as stated above) the old version of DB2 requires you to cast the NULL value to a data type, which I wasn't able to determine from the code. With the new version, it should work, so I'll have to rerun my tests and see if the problem is still there. I will keep you updated on the Perl issues and hope to have some useful results by the end of the week.
And I hope you excuse me for posting things here that are hardly related to BioPerl, but some of the problems are a complex entanglement of issues with BioSQL, BioPerl and BioJava, so it's hard to decide where to post them ;-) Regards, Florian From biopython at maubp.freeserve.co.uk Mon Jul 6 12:19:54 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 17:19:54 +0100 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <200907061808.18651.florian.mittag@uni-tuebingen.de> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net> <200907061808.18651.florian.mittag@uni-tuebingen.de> Message-ID: <320fb6e00907060919w1ce69284r30fede63ec05adbb@mail.gmail.com> On BioPerl-l, July 2009, Florian Mittag wrote: > ... > Okay, I'll do that, but that will take some time and I'll probably turn to > this mailings for further assistance with more specific questions. > ... > I will keep you updated on the Perl issues and hope to have some useful > results by the end of the week. And I hope you excuse me for posting things > here that are hardly related to BioPerl, but the some problems are a complex > entanglement of issues with BioSQL, BioPerl and BioJava, so it's hard to > decide where to post it ;-) You may want to cross post some things (e.g. the Hibernate issue to the BioSQL and BioJava lists). I've CC'd this reply to BioSQL-l for example. I think some guidance from Hilmar on this etiquette would help ;) I would not expect BioJava people to follow BioPerl-l for example. (Although here I am as a Biopython person keeping an eye on BioPerl-l sometimes). I assume (hope?) that people from all the Bio* projects with BioSQL bindings will be following the BioSQL-l mailing list - so for anything clearly cross project like the schemas themselves, at the very least please CC the BioSQL-l mailing list.
Peter (Biopython) From cjfields at illinois.edu Mon Jul 6 12:42:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 6 Jul 2009 11:42:10 -0500 Subject: [Bioperl-l] bioperl BLAST question In-Reply-To: <626933.96171.qm@web27405.mail.ukl.yahoo.com> References: <626933.96171.qm@web27405.mail.ukl.yahoo.com> Message-ID: <9441E227-2833-4B3D-AA15-5C5797D7F88F@illinois.edu> On Jul 6, 2009, at 10:01 AM, Sean ohEigeartaigh wrote: > > Hi, > > I'm trying to use bioperl to limit the number of BLAST results. > However, when I use the following bit of code, it limits to less > than the cutoff number, and excludes BLAST results that are halfway > up the BLAST results page (without the limit) which it shouldn't > exclude. > > $blast = Bio::Tools::Run::StandAloneBlast > ->new(program => 'tblastn', database =>$blastdb, b =>100, v > =>100, F=>$fil, outfile=>$out) > ->blastall($seq1); > } > > Using this bit of code, I get 60 results for my query (out of 173 > with no hit limit and an e-value cutoff of e=10). If I use b =>150, > v=>150, I get 85 results, and some BLAST results appear halfway up > the results page. In other words, the limit seems to be removing > results at random throughout the file, and is also not giving me > enough results. We can't adequately diagnose the problem from the script segment alone, without an example report and a description of what you expect. The best way to handle this is to file a bug report so we can look things over: http://www.bioperl.org/wiki/Bugs > Am I using the b and v parameters (to limit blast results and blast > one-line summaries) incorrectly? > Thanks very much for your help, > Seán Ó hÉigeartaigh chris From stevey_mac2k2 at hotmail.com Mon Jul 6 14:31:26 2009 From: stevey_mac2k2 at hotmail.com (stephenmcgowan1) Date: Mon, 6 Jul 2009 11:31:26 -0700 (PDT) Subject: [Bioperl-l] Bioperl Installation Message-ID: <24360594.post@talk.nabble.com> Hi, I seem to be having trouble with Installing Bioperl 1.6 in CPAN.
I have attached a log of the install, i just can't see why it seems to be falling over. Thanks, Stephen http://www.nabble.com/file/p24360594/BioPerl%2BInstall.rtf BioPerl+Install.rtf http://www.nabble.com/file/p24360594/BioPerl%2BInstall.doc BioPerl+Install.doc -- View this message in context: http://www.nabble.com/Bioperl-Installation-tp24360594p24360594.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From David.Messina at sbc.su.se Mon Jul 6 15:38:09 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 21:38:09 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: <24360594.post@talk.nabble.com> References: <24360594.post@talk.nabble.com> Message-ID: <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> Hi Stephen, This is on a Mac, correct? You need to install the developer tools first. The key line in your log is: Can't test without successful make Admittedly, that's cryptic. What it means is that it needs the program called make. That program is installed when you install the developer tools. Go to developer.apple.com and create an account if you don't already have one. Go to the Mac Dev Center, and click on "Xcode 3". This should be the right link: Xcode 3 You'll need to login to get to it, and then you'll get to the download page for the massive 986 MB Xcode 3.1.3 download. After you run the Xcode installer, you can check in Terminal that you've got 'make' installed by typing: which make on the command line. It should give you the answer make is /usr/bin/make If it does, then you're good to try again with the bioperl install. 
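For what it's worth, the check above can be sketched as a small shell function, so it can be rerun before retrying the install. This is only a sketch under assumptions: the function name `have_tool` is made up for illustration and is not part of CPAN or BioPerl, and it uses the POSIX `command -v` lookup rather than `which`. Depending on your shell, `which make` may print just `/usr/bin/make` rather than "make is /usr/bin/make"; either form means make was found.

```shell
#!/bin/sh
# have_tool is a hypothetical helper name used only for this sketch.
# It succeeds (exit status 0) when the named program is found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether make is available before retrying the BioPerl install.
if have_tool make; then
    echo "make found at: $(command -v make)"
else
    echo "make not found; install the developer tools first" >&2
fi
```

If the second message is printed, the developer tools are still missing (or make is not on your PATH), and the cpan install will most likely stop at the same point again.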
Dave From Kevin.M.Brown at asu.edu Mon Jul 6 15:28:48 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 6 Jul 2009 12:28:48 -0700 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: <24360594.post@talk.nabble.com> References: <24360594.post@talk.nabble.com> Message-ID: <1A4207F8295607498283FE9E93B775B406130F1C@EX02.asurite.ad.asu.edu> Well, without the error messages that should have been printed out not sure how much help we can be. No idea what OS you're running, perl version, etc... > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > stephenmcgowan1 > Sent: Monday, July 06, 2009 11:31 AM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bioperl Installation > > > Hi, > > I seem to be having trouble with Installing Bioperl 1.6 in CPAN. > > I have attached a log of the install, i just can't see why it > seems to be > falling over. > > Thanks, > > Stephen > > http://www.nabble.com/file/p24360594/BioPerl%2BInstall.rtf > BioPerl+Install.rtf > http://www.nabble.com/file/p24360594/BioPerl%2BInstall.doc > BioPerl+Install.doc > -- > View this message in context: > http://www.nabble.com/Bioperl-Installation-tp24360594p24360594.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cain.cshl at gmail.com Mon Jul 6 15:48:21 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Mon, 6 Jul 2009 15:48:21 -0400 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> Message-ID: <72D5229E-A0A8-480A-AC87-6CEE0F1067B7@gmail.com> After you get make installed, you may need to reconfigure cpan so it knows where to find it. Do this: sudo cpan (Assuming you want the libraries installed in the system paths) cpan> o conf init You can probably answer yes to the "do you want me to automatically configure" question, and it should sense that make is now present. If not, do it again and answer "no" and accept all of the defaults until it gets to the part about where make is. Scott On Jul 6, 2009, at 3:38 PM, Dave Messina wrote: > Hi Stephen, > This is on a Mac, correct? > > You need to install the developer tools first. The key line in your > log is: > > Can't test without successful make > > > Admittedly, that's cryptic. What it means is that it needs the program > called make. That program is installed when you install the > developer tools. > > > Go to developer.apple.com and create an account if you don't already > have > one. > > > Go to the Mac Dev Center, and click on "Xcode 3". > > > This should be the right link: > > Xcode 3 > > > You'll need to login to get to it, and then you'll get to the > download page > for the massive 986 MB Xcode 3.1.3 download. > > After you run the Xcode installer, you can check in Terminal that > you've got > 'make' installed by typing: > > which make > > on the command line. 
It should give you the answer > make is /usr/bin/make > > If it does, then you're good to try again with the bioperl install. > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Mon Jul 6 16:16:49 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 6 Jul 2009 16:16:49 -0400 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <72D5229E-A0A8-480A-AC87-6CEE0F1067B7@gmail.com> Message-ID: <724E02A7-2A73-422D-9A33-FF945547A777@scottcain.net> Always, always, always reply to the list, as the original author of the email that you are replying to doesn't always know the answer, like now. I don't recall how I installed libxslt. In fact, I don't remember doing it. Have you searched your hard drive? I think it gets installed with the developers tools. Scott On Jul 6, 2009, at 4:01 PM, Steven McGowan wrote: > Hi Scott, > > I removed my previous install with rm -rf ~/.cpan/build/* > > I've tried re-installing (install C/CJ/CJFIELDS/BioPerl- > db-1.6.0.tar.gz), and upon installing have noticed the error: > > Checking prerequisites... > - ERROR: Data::Stag is not installed > > so i have then quit out of the install, and entered "install > Data::Stag" in>CPAN > > but receive the following error messages: > > External Module XML::LibXSLT, XSLT, > is not installed on this computer. > Data::Stag::XSLTHandler in Data::Stag needs it for XSLT > Transformations > > External Module XML::Parser::PerlSAX, SAX Handler, > is not installed on this computer. 
> Data::Stag::XMLParser in Data::Stag needs it for parsing XML > > External Module GD, Graphical Drawing Toolkit, > is not installed on this computer. > stag-drawtree.pl in Data::Stag needs it for drawing trees > > External Module Graph::Directed, Generic Graph data stucture and > algorithms, > is not installed on this computer. > Data::Stag::GraphHandler in Data::Stag needs it for transforming > stag trees to graphs > > External Module Tk, Tk, > is not installed on this computer. > stag-view.pl in Data::Stag needs it for tree viewer > > ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's > lacking Data::Stag who's install is lacking the list above. How > would i go about installing the above list? is there an easier way > or something i'm doing wrong? > > Thanks, > > Stephen > > > From: cain.cshl at gmail.com > > To: David.Messina at sbc.su.se > > Subject: Re: [Bioperl-l] Bioperl Installation > > Date: Mon, 6 Jul 2009 15:48:21 -0400 > > CC: stevey_mac2k2 at hotmail.com; Bioperl-l at lists.open-bio.org > > > > After you get make installed, you may need to reconfigure cpan so it > > knows where to find it. Do this: > > > > sudo cpan > > > > (Assuming you want the libraries installed in the system paths) > > > > cpan> o conf init > > > > You can probably answer yes to the "do you want me to automatically > > configure" question, and it should sense that make is now present. > If > > not, do it again and answer "no" and accept all of the defaults > until > > it gets to the part about where make is. > > > > Scott > > > > On Jul 6, 2009, at 3:38 PM, Dave Messina wrote: > > > >> Hi Stephen, > >> This is on a Mac, correct? > >> > >> You need to install the developer tools first. The key line in your > >> log is: > >> > >> Can't test without successful make > >> > >> > >> Admittedly, that's cryptic. What it means is that it needs the > program > >> called make. That program is installed when you install the > >> developer tools. 
> >> > >> > >> Go to developer.apple.com and create an account if you don't > already > >> have > >> one. > >> > >> > >> Go to the Mac Dev Center, and click on "Xcode 3". > >> > >> > >> This should be the right link: > >> > >> Xcode 3 >>> > >> > >> You'll need to login to get to it, and then you'll get to the > >> download page > >> for the massive 986 MB Xcode 3.1.3 download. > >> > >> After you run the Xcode installer, you can check in Terminal that > >> you've got > >> 'make' installed by typing: > >> > >> which make > >> > >> on the command line. It should give you the answer > >> make is /usr/bin/make > >> > >> If it does, then you're good to try again with the bioperl install. > >> > >> Dave > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > ----------------------------------------------------------------------- > > Scott Cain, Ph. D. scott at scottcain dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > > > > > > > View your Twitter and Flickr updates from one place ? Learn more! ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From David.Messina at sbc.su.se Mon Jul 6 16:26:55 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 22:26:55 +0200 Subject: [Bioperl-l] Fwd: Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> Message-ID: <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> Hi Steven, Forwarding this to the list so that everyone can follow along...please keep the list on any replies. 
Don't quit out of the install -- cpan can automatically detect required dependencies and will try to install them first. Amidst all of the kerfuffle in your previous install there was this bit: ---- Unsatisfied dependencies detected during [C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz] ----- Test::Harness Data::Stag CPAN Shall I follow them and prepend them to the queue of modules we are processing right now? [yes] So if you go back into cpan, try the Bioperl-1.6 install again, you should be prompted again about those missing dependencies. A note to the Bioperl core-devs: Data::Stag seems to have a couple of tricky dependencies of its own, namely GD and Tk, and it looks like they're for a couple of included scripts which I'm guessing Bioperl doesn't use. Perhaps we should send a request to the Data::Stag author to make GD and Tk optional instead of required? Dave ---------- Forwarded message ---------- From: Steven McGowan Date: Mon, Jul 6, 2009 at 22:02 Subject: RE: [Bioperl-l] Bioperl Installation To: david.messina at sbc.su.se Hi Dave, I managed to sort it and have had a go at installing: (install C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz), but upon installing have noticed the error: Checking prerequisites... - ERROR: Data::Stag is not installed so i have then quit out of the install, and entered "install Data::Stag" in>CPAN but receive the following error messages: External Module XML::LibXSLT, XSLT, is not installed on this computer. Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations External Module XML::Parser::PerlSAX, SAX Handler, is not installed on this computer. Data::Stag::XMLParser in Data::Stag needs it for parsing XML External Module GD, Graphical Drawing Toolkit, is not installed on this computer. stag-drawtree.pl in Data::Stag needs it for drawing trees External Module Graph::Directed, Generic Graph data stucture and algorithms, is not installed on this computer. 
Data::Stag::GraphHandler in Data::Stag needs it for transforming stag trees to graphs External Module Tk, Tk, is not installed on this computer. stag-view.pl in Data::Stag needs it for tree viewer ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's lacking Data::Stag who's install is lacking the list above. How would i go about installing the above list? is there an easier way or something i'm doing wrong? Thanks, Stephen ------------------------------ From: David.Messina at sbc.su.se Date: Mon, 6 Jul 2009 21:38:09 +0200 Subject: Re: [Bioperl-l] Bioperl Installation To: stevey_mac2k2 at hotmail.com CC: Bioperl-l at lists.open-bio.org Hi Stephen, This is on a Mac, correct? You need to install the developer tools first. The key line in your log is: Can't test without successful make Admittedly, that's cryptic. What it means is that it needs the program called make. That program is installed when you install the developer tools. Go to developer.apple.com and create an account if you don't already have one. Go to the Mac Dev Center, and click on "Xcode 3". This should be the right link: Xcode 3 You'll need to login to get to it, and then you'll get to the download page for the massive 986 MB Xcode 3.1.3 download. After you run the Xcode installer, you can check in Terminal that you've got 'make' installed by typing: which make on the command line. It should give you the answer make is /usr/bin/make If it does, then you're good to try again with the bioperl install. Dave
From stevey_mac2k2 at hotmail.com Mon Jul 6 16:19:37 2009 From: stevey_mac2k2 at hotmail.com (Steven McGowan) Date: Mon, 6 Jul 2009 20:19:37 +0000 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: <72D5229E-A0A8-480A-AC87-6CEE0F1067B7@gmail.com> References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <72D5229E-A0A8-480A-AC87-6CEE0F1067B7@gmail.com> Message-ID: I managed to sort it and have had a go at installing: (install C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz), but upon installing have noticed the error: Checking prerequisites... - ERROR: Data::Stag is not installed so i have then quit out of the install, and entered "install Data::Stag" in>CPAN but receive the following error messages: External Module XML::LibXSLT, XSLT, is not installed on this computer. Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations External Module XML::Parser::PerlSAX, SAX Handler, is not installed on this computer. Data::Stag::XMLParser in Data::Stag needs it for parsing XML External Module GD, Graphical Drawing Toolkit, is not installed on this computer. stag-drawtree.pl in Data::Stag needs it for drawing trees External Module Graph::Directed, Generic Graph data stucture and algorithms, is not installed on this computer. Data::Stag::GraphHandler in Data::Stag needs it for transforming stag trees to graphs External Module Tk, Tk, is not installed on this computer. stag-view.pl in Data::Stag needs it for tree viewer ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's lacking Data::Stag who's install is lacking the list above. How would i go about installing the above list? is there an easier way or something i'm doing wrong? Thanks, Stephen
From David.Messina at sbc.su.se Mon Jul 6 16:47:06 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 22:47:06 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> Message-ID: <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> > do i now want to Install [a]ll optional external modules, [n]one, or choose > [i]nteractively? [n] > > Data::Stag, Test::Harness, and CPAN are required, not optional. So I think they'll be installed even if you answer n to the question about the optional external modules. Dave From scott at scottcain.net Mon Jul 6 16:50:41 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 6 Jul 2009 16:50:41 -0400 Subject: [Bioperl-l] Fwd: Bioperl Installation In-Reply-To: <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> Message-ID: <536f21b00907061350l1e6882a4v597629ebc1aafc5b@mail.gmail.com> Hi Dave, I think you are confused about the prereqs for Data::Stag: I have it installed and working and don't have Tk. cpantesters.org also thinks that IO::String is the only dependency: http://deps.cpantesters.org/?module=Data::Stag;perl=latest Scott On Mon, Jul 6, 2009 at 4:26 PM, Dave Messina wrote: > Hi Steven, > Forwarding this to the list so that everyone can follow along...please keep > the list on any replies. > > Don't quit out of the install -- cpan can automatically detect required > dependencies and will try to install them first. > > Amidst all of the kerfuffle in your previous install there was this bit: > > ---- Unsatisfied dependencies detected during > [C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz] ----- > >
Test::Harness > > Data::Stag > > CPAN > > Shall I follow them and prepend them to the queue > > of modules we are processing right now? [yes] > > > So if you go back into cpan, try the Bioperl-1.6 install again, you should > be prompted again about those missing dependencies. > > > > A note to the Bioperl core-devs: > > Data::Stag seems to have a couple of tricky dependencies of its own, namely > GD and Tk, and it looks like they're for a couple of included scripts which > I'm guessing Bioperl doesn't use. > > Perhaps we should send a request to the Data::Stag author to make GD and Tk > optional instead of required? > > > Dave > > > > > ---------- Forwarded message ---------- > From: Steven McGowan > Date: Mon, Jul 6, 2009 at 22:02 > Subject: RE: [Bioperl-l] Bioperl Installation > To: david.messina at sbc.su.se > > > Hi Dave, > > I managed to sort it and have had a go at > installing: (install C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz), but upon > installing have noticed the error: > > Checking prerequisites... > - ERROR: Data::Stag is not installed > > so i have then quit out of the install, and entered "install Data::Stag" > in>CPAN > > but receive the following error messages: > > External Module XML::LibXSLT, XSLT, > is not installed on this computer. > Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations > > External Module XML::Parser::PerlSAX, SAX Handler, > is not installed on this computer. > Data::Stag::XMLParser in Data::Stag needs it for parsing XML > > External Module GD, Graphical Drawing Toolkit, > is not installed on this computer. > stag-drawtree.pl in Data::Stag needs it for drawing trees > > External Module Graph::Directed, Generic Graph data stucture and algorithms, > is not installed on this computer. > Data::Stag::GraphHandler in Data::Stag needs it for transforming stag > trees to graphs > > External Module Tk, Tk, > is not installed on this computer.
> stag-view.pl in Data::Stag needs it for tree viewer > > ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's lacking > Data::Stag who's install is lacking the list above. How would i go about > installing the above list? is there an easier way or something i'm doing > wrong? > > Thanks, > > Stephen > > ------------------------------ > From: David.Messina at sbc.su.se > Date: Mon, 6 Jul 2009 21:38:09 +0200 > Subject: Re: [Bioperl-l] Bioperl Installation > To: stevey_mac2k2 at hotmail.com > CC: Bioperl-l at lists.open-bio.org > > > Hi Stephen, > This is on a Mac, correct? > > You need to install the developer tools first. The key line in your log is: > > Can't test without successful make > > > Admittedly, that's cryptic. What it means is that it needs the program > called make. That program is installed when you install the developer tools. > > > Go to developer.apple.com and create an account if you don't already have > one. > > > Go to the Mac Dev Center, and click on "Xcode 3". > > > This should be the right link: > > Xcode 3 > > You'll need to login to get to it, and then you'll get to the download page > for the massive 986 MB Xcode 3.1.3 download. > > After you run the Xcode installer, you can check in Terminal that you've got > 'make' installed by typing: > > which make > > on the command line. It should give you the answer > make is /usr/bin/make > > If it does, then you're good to try again with the bioperl install. > > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From Russell.Smithies at agresearch.co.nz Mon Jul 6 16:56:41 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 7 Jul 2009 08:56:41 +1200
Subject: [Bioperl-l] different results with remote-blast skript
In-Reply-To: <46A05E0132144D73A0F805953B580B2F@jonas>
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
 <46A05E0132144D73A0F805953B580B2F@jonas>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz>

Hi Jonas,

You can't just play with the BLAST parameters and hope for a "better" result. I'd suggest that if you aren't sure what they do, you should leave them alone, as small changes can make huge differences in the output - it's quite possible to miss finding what you're looking for by using the wrong parameters.

If all else fails, read the blast manual:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html
http://www.ncbi.nlm.nih.gov/blast/tutorial/

Or read Ian Korf's excellent book:
http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3

Don't worry about the integer overflow bug as there's nothing you can do about it. If you're interested, Google and Wikipedia are your friends:
http://en.wikipedia.org/wiki/Integer_overflow

Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
> Sent: Tuesday, 7 July 2009 12:14 a.m.
> To: BioPerl List; Chris Fields
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
> Hi guys, thanks for your answers so far.
> @jason: integer overflow in blast.... sorry, but what do you mean by that?
> how can I fix it...?
>
> Since I never really changed any parameters I thought them all to be default.
> whatever, I tried to get "better" results with my prog by changing
> these:
> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
> with no effect...I guess these were default values anyway.
>
> So please maybe you can tell me all the other parameters I can change with my
> perl-skript AND how to do that?
> Unfortunately both, perl and the blast-algorithm are pretty much new to me,
> maybe thats why I just cannot find out how to do that on my own... :/
>
> Here is the output I get with my remote-blast skript:
> ##############################################################################
> Query Name: MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> hit name is ref|XP_001702807.1|
> score is 442
> BLASTP 2.2.21+
> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped
> BLAST and PSI-BLAST: a new generation of protein database search programs",
> Nucleic Acids Res. 25:3389-3402.
>
> Reference for composition-based statistics: Alejandro A.
> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the
> accuracy of PSI-BLAST protein database searches with composition-based
> statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>
> RID: 53STX5G2013
>
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects
> 9,252,587 sequences; 3,169,972,781 total letters
>
> Query= MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
> ATGPDPDDEYE
> Length=150
>
>                                                                  Score     E
> Sequences producing significant alignments:                      (Bits)  Value
>
> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard...   174    2e-42
>
> ALIGNMENTS
> >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> Length=303
>
> Score = 174 bits (442), Expect = 2e-42, Method: Composition-based stats.
> Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%)
>
> Query  1    MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds  60
>             MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> Sbjct  154  MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS  213
>
> Query  61   dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR  120
>             DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> Sbjct  214  DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR  273
>
> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>             AWHERDDNAFRQAHQNTAMATGPDPDDEYE
> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>
> Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
> excluding environmental samples from WGS projects
> Posted date: Jul 5, 2009 4:41 AM
> Number of letters in database: -1,124,994,511
> Number of sequences in database: 9,252,587
>
> Lambda K H
> 0.309 0.122 0.345
> Gapped
> Lambda K H
> 0.267 0.0410 0.140
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 9252587
> Number of Hits to DB: 60273703
> Number of extensions: 1448367
> Number of successful extensions: 2103
> Number of sequences better than 10: 0
> Number of HSP's better than 10 without gapping: 0
> Number of HSP's gapped: 2113
> Number of HSP's successfully gapped: 0
> Length of query: 150
> Length of database: 3169972781
> Length adjustment: 113
> Effective length of query: 37
> Effective length of database: 2124430450
> Effective search space: 78603926650
> Effective search space used: 78603926650
> T: 11
> A: 40
> X1: 16 (7.1 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 42 (20.8 bits)
> S2: 74 (33.1 bits)
>
> ##############################################################################
> and here are the hits (?) of the blast-algorithm on the ncbi-homepage with
> the same query of course:
> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard...  300   3e-80
> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho...  36.2  1.1
> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia...  35.4  1.8
> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil...  34.3  4.2
> ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n...  33.5  6.0
> ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba...  33.5  7.0
> ##############################################################################
> at least the first hit is the same, but even there there is a different score
> and e-value.
>
> thanks so much for any help :)
> regards, jonas
>
>
> ----- Original Message -----
> From: "Chris Fields"
> To: "Jason Stajich"
> Cc: "Smithies, Russell" ; "'BioPerl List'" ; "'Jonas Schaer'"
> Sent: Monday, July 06, 2009 12:51 AM
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
>
> > That inspires confidence ;>
> >
> > chris
> >
> > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> >
> >> integer overflow in blast....
> >>
> >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> >>
> >>> I'd guess it's a difference in the parameters used.
> >>> Interesting that both have the number of letters in the db as
> >>> "-1,125,070,205", I assume that's a bug :-)
> >>>
> >>> Stats from your remote_blast:
> >>>
> >>> 'stats' => {
> >>>     'S1' => '42',
> >>>     'S1_bits' => '20.8',
> >>>     'lambda' => '0.309',
> >>>     'entropy' => '0.345',
> >>>     'kappa_gapped' => '0.0410',
> >>>     'T' => '11',
> >>>     'kappa' => '0.122',
> >>>     'X3_bits' => '24.7',
> >>>     'X1' => '16',
> >>>     'lambda_gapped' => '0.267',
> >>>     'X2' => '38',
> >>>     'S2' => '74',
> >>>     'seqs_better_than_cutoff' => '0',
> >>>     'posted_date' => 'Jul 4, 2009 4:41 AM',
> >>>     'Hits_to_DB' => '60102303',
> >>>     'dbletters' => '-1125070205',
> >>>     'A' => '40',
> >>>     'num_successful_extensions' => '2004',
> >>>     'num_extensions' => '1436892',
> >>>     'X1_bits' => '7.1',
> >>>     'X3' => '64',
> >>>     'entropy_gapped' => '0.140',
> >>>     'dbentries' => '9252258',
> >>>     'X2_bits' => '14.6',
> >>>     'S2_bits' => '33.1'
> >>> }
> >>>
> >>>
> >>> Stats from a blast done on the NCBI webpage:
> >>>
> >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
> >>> excluding environmental samples from WGS projects
> >>> Posted date: Jul 4, 2009 4:41 AM
> >>> Number of letters in database: -1,125,070,205
> >>> Number of sequences in database: 9,252,258
> >>>
> >>> Lambda K H
> >>> 0.309 0.124 0.340
> >>> Gapped
> >>> Lambda K H
> >>> 0.267 0.0410 0.140
> >>> Matrix: BLOSUM62
> >>> Gap Penalties: Existence: 11, Extension: 1
> >>> Number of Sequences: 9252258
> >>> Number of Hits to DB: 86493230
> >>> Number of extensions: 3101413
> >>> Number of successful extensions: 9001
> >>> Number of sequences better than 100: 65
> >>> Number of HSP's better than 100 without gapping: 0
> >>> Number of HSP's gapped: 9000
> >>> Number of HSP's successfully gapped: 66
> >>> Length of query: 150
> >>> Length of database: 3169897087
> >>> Length adjustment: 113
> >>> Effective length of query: 37
> >>> Effective length of database: 2124391933
> >>> Effective search space: 78602501521
> >>> Effective search space used: 78602501521
> >>> T: 11
> >>> A: 40
> >>> X1: 16 (7.1 bits)
> >>> X2: 38 (14.6 bits)
> >>> X3: 64 (24.7 bits)
> >>> S1: 42 (20.8 bits)
> >>> S2: 65 (29.6 bits)
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
> >>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>> To: BioPerl List
> >>>> Subject: [Bioperl-l] different results with remote-blast skript
> >>>>
> >>>> Hi again :)
> >>>> please, I only have this little question:
> >>>> why do I get different results with my remote::blast perl skript then on the
> >>>> ncbi blast homepage?
> >>>> I am using blastp, the query is an amino-sequence (different results with any
> >>>> sequence, differences not only in number of hits but even in e-values, scores
> >>>> etc...), the database is 'nr'.
> >>>> PLEASE help me,
> >>>> thank you in advance,
> >>>> Jonas
> >>>>
> >>>> ps: my skript:
> >>>> ################################################################################
> >>>> use Bio::Seq::SeqFactory;
> >>>> use Bio::Tools::Run::RemoteBlast;
> >>>> use strict;
> >>>> my @blast_report;
> >>>> my $prog = 'blastp';
> >>>> my $db = 'nr';
> >>>> my $e_val= '1e-10';
> >>>> #my $e_val= '10';
> >>>> my @params = ( '-prog' => $prog,
> >>>>                '-data' => $db,
> >>>>                '-expect' => $e_val,
> >>>>                '-readmethod' => 'SearchIO' );
> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
> >>>>
> >>>> my $blast_seq ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
> >>>> #$v is just to turn on and off the messages
> >>>> my $v = 1;
> >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
> >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
> >>>> my $filename='temp2.out';
> >>>> my $r = $factory->submit_blast($seq);
> >>>> print STDERR "waiting..." if( $v > 0 );
> >>>> while ( my @rids = $factory->each_rid )
> >>>> {
> >>>>     foreach my $rid ( @rids )
> >>>>     {
> >>>>         my $rc = $factory->retrieve_blast($rid);
> >>>>         if( !ref($rc) )
> >>>>         {
> >>>>             if( $rc < 0 )
> >>>>             {
> >>>>                 $factory->remove_rid($rid);
> >>>>             }
> >>>>             print STDERR "." if ( $v > 0 );
> >>>>         }
> >>>>         else
> >>>>         {
> >>>>             my $result = $rc->next_result();
> >>>>             $factory->save_output($filename);
> >>>>             $factory->remove_rid($rid);
> >>>>             print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>             while ( my $hit = $result->next_hit )
> >>>>             {
> >>>>                 next unless ( $v > 0);
> >>>>                 print "\thit name is ", $hit->name, "\n";
> >>>>                 while( my $hsp = $hit->next_hsp )
> >>>>                 {
> >>>>                     print "\t\tscore is ", $hsp->score, "\n";
> >>>>                 }
> >>>>             }
> >>>>         }
> >>>>     }
> >>>>
> >>>>
> >>>> }
> >>>> @blast_report = get_file_data ($filename);
> >>>> return @blast_report;
> >>>> ################################################################################
> >>
> >> --
> >> Jason Stajich
> >> jason at bioperl.org

From stevey_mac2k2 at hotmail.com Mon Jul 6 16:39:08 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 20:39:08 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
References: <24360594.post@talk.nabble.com>
 <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com>
 <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
Message-ID:

I have since initialised the install and receive the message:

Checking prerequisites...
 - ERROR: Data::Stag is not installed
   (I think I'm being run by CPAN, so will rely on CPAN to handle prerequisite installation)
   I'll get CPAN to prepend the installation of this
 - ERROR: Test::Harness (2.56) is installed, but we need version>= 2.62
   I'll get CPAN to prepend the installation of this
 - ERROR: CPAN (1.7602) is installed, but we need version>= 1.81
   I'll get CPAN to prepend the installation of this

Install [a]ll optional external modules, [n]one, or choose [i]nteractively? [n]

do i now want to Install [a]ll optional external modules, [n]one, or choose [i]nteractively? [n]

i'm guessing installing all external modules will include Data::Stag?

Stephen

From: David.Messina at sbc.su.se
Date: Mon, 6 Jul 2009 22:26:55 +0200
Subject: Fwd: [Bioperl-l] Bioperl Installation
To: bioperl-l at lists.open-bio.org
CC: stevey_mac2k2 at hotmail.com

Hi Steven,
Forwarding this to the list so that everyone can follow along...please keep the list on any replies.

Don't quit out of the install -- cpan can automatically detect required dependencies and will try to install them first.

Amidst all of the kerfuffle in your previous install there was this bit:

---- Unsatisfied dependencies detected during [C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz] -----

   Test::Harness

   Data::Stag

   CPAN

Shall I follow them and prepend them to the queue

of modules we are processing right now? [yes]

So if you go back into cpan, try the Bioperl-1.6 install again, you should be prompted again about those missing dependencies.

A note to the Bioperl core-devs:

Data::Stag seems to have a couple of tricky dependencies of its own, namely GD and Tk, and it looks like they're for a couple of included scripts which I'm guessing Bioperl doesn't use.

Perhaps we should send a request to the Data::Stag author to make GD and Tk optional instead of required?

Dave

---------- Forwarded message ----------
From: Steven McGowan
Date: Mon, Jul 6, 2009 at 22:02
Subject: RE: [Bioperl-l] Bioperl Installation
To: david.messina at sbc.su.se

Hi Dave,

I managed to sort it and have had a go at installing: (install C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz), but upon installing have noticed the error:

Checking prerequisites...
 - ERROR: Data::Stag is not installed

so i have then quit out of the install, and entered "install Data::Stag" in>CPAN

but receive the following error messages:

External Module XML::LibXSLT, XSLT,
 is not installed on this computer.
 Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations

External Module XML::Parser::PerlSAX, SAX Handler,
 is not installed on this computer.
 Data::Stag::XMLParser in Data::Stag needs it for parsing XML

External Module GD, Graphical Drawing Toolkit,
 is not installed on this computer.
 stag-drawtree.pl in Data::Stag needs it for drawing trees

External Module Graph::Directed, Generic Graph data stucture and algorithms,
 is not installed on this computer.
 Data::Stag::GraphHandler in Data::Stag needs it for transforming stag trees to graphs

External Module Tk, Tk,
 is not installed on this computer.
 stag-view.pl in Data::Stag needs it for tree viewer

ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's lacking Data::Stag who's install is lacking the list above. How would i go about installing the above list? is there an easier way or something i'm doing wrong?

Thanks,

Stephen

From: David.Messina at sbc.su.se
Date: Mon, 6 Jul 2009 21:38:09 +0200
Subject: Re: [Bioperl-l] Bioperl Installation
To: stevey_mac2k2 at hotmail.com
CC: Bioperl-l at lists.open-bio.org

Hi Stephen,
This is on a Mac, correct?

You need to install the developer tools first. The key line in your log is:

 Can't test without successful make

Admittedly, that's cryptic. What it means is that it needs the program called make. That program is installed when you install the developer tools.

Go to developer.apple.com and create an account if you don't already have one.

Go to the Mac Dev Center, and click on "Xcode 3".

This should be the right link:

Xcode 3

You'll need to login to get to it, and then you'll get to the download page for the massive 986 MB Xcode 3.1.3 download.

After you run the Xcode installer, you can check in Terminal that you've got 'make' installed by typing:

which make

on the command line. It should give you the answer
make is /usr/bin/make

If it does, then you're good to try again with the bioperl install.

Dave

From stevey_mac2k2 at hotmail.com Mon Jul 6 16:52:46 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 20:52:46 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com>
References: <24360594.post@talk.nabble.com>
 <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com>
 <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
 <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com>
Message-ID:

ok i've hit [n]

the next bypasses a list of optional prerequisites...apart from:

 * XML::SAX (0.14) is installed, but we prefer to have 0.15
   (wanted for parsing xml, used by Bio::SearchIO::blastxml, Bio::SeqIO::tigrxml and Bio::SeqIO::bsml_sax)

this does not seem to be an optional prerequisite but seems to be bypassed?

and then i receive:

ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the versions
of the modules indicated above before proceeding with this installation

Checking features:
Network..................enabled
BioDBSeqFeature_mysql....enabled
BioDBGFF.................enabled
BioDBSeqFeature_BDB......enabled

Do you want to run the Bio::DB::GFF or Bio::DB::SeqFeature::Store live database tests?
y/n [n] n
 - will not run the BioDBGFF or BioDBSeqFeature live database tests

Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively? [a]

From: David.Messina at sbc.su.se
Date: Mon, 6 Jul 2009 22:47:06 +0200
Subject: Re: [Bioperl-l] Bioperl Installation
To: stevey_mac2k2 at hotmail.com
CC: bioperl-l at lists.open-bio.org

do i now want to Install [a]ll optional external modules, [n]one, or choose [i]nteractively? [n]

Data::Stag, Test::Harness, and CPAN are required, not optional. So I think they'll be installed even if you answer n to the question about the optional external modules.

Dave

From cjfields at illinois.edu Mon Jul 6 17:04:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Jul 2009 16:04:29 -0500
Subject: [Bioperl-l] Fwd: Bioperl Installation
In-Reply-To: <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
References: <24360594.post@talk.nabble.com>
 <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com>
 <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
Message-ID:

(cc'ing Chris M about this)

The Tk and GD dependencies should probably be optional if they are only required for the Data::Stag scripts. As for the libxslt, I'm not sure but I believe that's available with the dev kit; if not it's available via fink/macports.

The inclusion of those as a requirement is a bit troubling for me, but this is the first time I've seen issues with it pop up.

chris

On Jul 6, 2009, at 3:26 PM, Dave Messina wrote:

> Hi Steven,
> Forwarding this to the list so that everyone can follow along...please keep
> the list on any replies.
>
> Don't quit out of the install -- cpan can automatically detect required
> dependencies and will try to install them first.
From cjfields at illinois.edu Mon Jul 6 17:06:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Jul 2009 16:06:56 -0500
Subject: [Bioperl-l] Fwd: Bioperl Installation
In-Reply-To: <536f21b00907061350l1e6882a4v597629ebc1aafc5b@mail.gmail.com>
References: <24360594.post@talk.nabble.com>
 <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com>
 <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
 <536f21b00907061350l1e6882a4v597629ebc1aafc5b@mail.gmail.com>
Message-ID:

Okay, that makes more sense to me (and also makes sense looking at the Data::Stag makefile). None of those additional modules are required for bioperl core functionality.

chris

On Jul 6, 2009, at 3:50 PM, Scott Cain wrote:

> Hi Dave,
>
> I think you are confused about the prereqs for Data::Stag: I have it
> installed and working and don't have Tk. cpantesters.org also thinks
> that IO::String is the only dependency:
>
> http://deps.cpantesters.org/?module=Data::Stag;perl=latest
>
> Scott
>
>
> On Mon, Jul 6, 2009 at 4:26 PM, Dave Messina wrote:
>> Hi Steven,
>> Forwarding this to the list so that everyone can follow along...please keep
>> the list on any replies.
>>
>> Don't quit out of the install -- cpan can automatically detect required
>> dependencies and will try to install them first.
>>
>> Amidst all of the kerfuffle in your previous install there was this bit:
>>
>> ---- Unsatisfied dependencies detected during
>> [C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz] -----
>>
>>    Test::Harness
>>
>>    Data::Stag
>>
>>    CPAN
>>
>> Shall I follow them and prepend them to the queue
>>
>> of modules we are processing right now? [yes]
>>
>>
>> So if you go back into cpan, try the Bioperl-1.6 install again, you should
>> be prompted again about those missing dependencies.
scott at > scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Mon Jul 6 17:09:05 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 23:09:05 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> Message-ID: <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com> > > * XML::SAX (0.14) is installed, but we prefer to have 0.15 > (wanted for parsing xml, used by Bio::SearchIO::blastxml, > Bio::SeqIO::tigrxml and Bio::SeqIO::bsml_sax) > > [snip] > > ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the > versions > of the modules indicated above before proceeding with this installation > This "error/warning" refers to XML::SAX. I'm pretty sure that's optional. Not exactly sure why it's getting called out specifically here, but I think you can safely ignore it. Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively? [a] > > You went ahead and answered this question, right? The installation should have started at this point. 
D From David.Messina at sbc.su.se Mon Jul 6 17:16:00 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 23:16:00 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com> Message-ID: <628aabb70907061416y1f8eb5d6j2da372d115456ffa@mail.gmail.com> > > I'm just hanging on the question at the moment.. not sure whether to > install all [a] or [n]one. I'm probably going to go with [a]ll > Yes, you'll probably want all the scripts. From David.Messina at sbc.su.se Mon Jul 6 17:29:21 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 23:29:21 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com> <628aabb70907061416y1f8eb5d6j2da372d115456ffa@mail.gmail.com> Message-ID: <628aabb70907061429h1d10623dxdca6239ecfe66c29@mail.gmail.com> Did you confirm that make is available to cpan before you started, by following Scott's earlier instructions? 
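The check Dave asks about can be scripted. The following is only a minimal sketch: the helper name `check_tool` is made up for illustration, and it tests whether `make` is on the current shell's PATH, which is assumed (not guaranteed) to be the same PATH that cpan sees.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not part of CPAN or BioPerl.
# It reports whether a program can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found"
        return 1
    fi
}

# Without make, CPAN prints "Can't test without successful make".
check_tool make || echo "Install the developer tools (Xcode on a Mac), then retry."
```

If `make` is reported as found, it is reasonable to retry the install from a fresh cpan session.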
From stevey_mac2k2 at hotmail.com Mon Jul 6 17:11:14 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 21:11:14 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com>
References: <24360594.post@talk.nabble.com>
 <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com>
 <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
 <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com>
 <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com>
Message-ID:

I'm just hanging on the question at the moment.. not sure whether to install all [a] or [n]one. I'm probably going to go with [a]ll

From: David.Messina at sbc.su.se
Date: Mon, 6 Jul 2009 23:09:05 +0200
Subject: Re: [Bioperl-l] Bioperl Installation
To: stevey_mac2k2 at hotmail.com
CC: bioperl-l at lists.open-bio.org

* XML::SAX (0.14) is installed, but we prefer to have 0.15
  (wanted for parsing xml, used by Bio::SearchIO::blastxml, Bio::SeqIO::tigrxml and Bio::SeqIO::bsml_sax)

[snip]

ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the versions
of the modules indicated above before proceeding with this installation

This "error/warning" refers to XML::SAX. I'm pretty sure that's optional. Not exactly sure why it's getting called out specifically here, but I think you can safely ignore it.

Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively? [a]

You went ahead and answered this question, right? The installation should have started at this point.

D

From stevey_mac2k2 at hotmail.com Mon Jul 6 17:23:41 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 21:23:41 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <628aabb70907061416y1f8eb5d6j2da372d115456ffa@mail.gmail.com>
References: <24360594.post@talk.nabble.com>
 <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com>
 <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
 <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com>
 <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com>
 <628aabb70907061416y1f8eb5d6j2da372d115456ffa@mail.gmail.com>
Message-ID:

ok... after choosing to install all scripts, i receive:

Creating new 'Build' script for 'BioPerl' version '1.006000'
Warning: PREREQ_PM mentions Test::Harness more than once, last mention wins at /System/Library/Perl/5.8.8/CPAN.pm line 4689, line 1.
Warning: PREREQ_PM mentions CPAN more than once, last mention wins at /System/Library/Perl/5.8.8/CPAN.pm line 4689, line 1.
---- Unsatisfied dependencies detected during [C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz] -----
    Test::Harness
    Data::Stag
    CPAN
Shall I follow them and prepend them to the queue
of modules we are processing right now? [yes] y

-------------TEST HARNESS-------------

Running install for module Test::Harness

CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz

Checking if your kit is complete...
Looks good
Writing Makefile for Test::Harness
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

---------DATA::STAG----------

Running install for module Data::Stag
Running make for C/CM/CMUNGALL/Data-Stag-0.11.tar.gz

CPAN.pm: Going to build C/CM/CMUNGALL/Data-Stag-0.11.tar.gz

External Module XML::LibXSLT, XSLT,
is not installed on this computer.
  Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations

External Module XML::Parser::PerlSAX, SAX Handler,
is not installed on this computer.
  Data::Stag::XMLParser in Data::Stag needs it for parsing XML

External Module GD, Graphical Drawing Toolkit,
is not installed on this computer.
  stag-drawtree.pl in Data::Stag needs it for drawing trees

External Module Graph::Directed, Generic Graph data stucture and algorithms,
is not installed on this computer.
  Data::Stag::GraphHandler in Data::Stag needs it for transforming stag trees to graphs

External Module Tk, Tk,
is not installed on this computer.
  stag-view.pl in Data::Stag needs it for tree viewer

There are some external packages and perl modules, listed above, which
stag uses. This only effects the functionality which is listed above:
the rest of stag will work fine, which includes nearly all of the core
functionality.

Enjoy the rest of stag, which you can use after going 'make install'

Checking if your kit is complete...
Looks good
Writing Makefile for Data
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

-------------CPAN--------------

CPAN.pm: Going to build A/AN/ANDK/CPAN-1.9402.tar.gz

Checking if your kit is complete...
Looks good
Warning: prerequisite File::HomeDir 0.69 not found.
Warning: prerequisite Test::Harness 2.62 not found. We have 2.56.
Writing Makefile for CPAN
---- Unsatisfied dependencies detected during [A/AN/ANDK/CPAN-1.9402.tar.gz] -----
    Test::Harness
    File::HomeDir
Shall I follow them and prepend them to the queue
of modules we are processing right now? [yes] y
Running make test
  Delayed until after prerequisites
Running make install
  Delayed until after prerequisites
Running install for module Test::Harness
Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz
  Is already unwrapped into directory /Users/stevey_mac2k2/.cpan/build/Test-Harness-3.17
  Has already been processed within this session
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

-------Further On------->

CPAN.pm: Going to build A/AD/ADAMK/File-HomeDir-0.86.tar.gz

Checking if your kit is complete...
Looks good
Writing Makefile for File::HomeDir
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible
Running make for A/AN/ANDK/CPAN-1.9402.tar.gz
  Is already unwrapped into directory /Users/stevey_mac2k2/.cpan/build/CPAN-1.9402

CPAN.pm: Going to build A/AN/ANDK/CPAN-1.9402.tar.gz
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible
Running make for C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz
  Is already unwrapped into directory /Users/stevey_mac2k2/.cpan/build/BioPerl-1.6.0

CPAN.pm: Going to build C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

From: David.Messina at sbc.su.se
Date: Mon, 6 Jul 2009 23:16:00 +0200
Subject: Re: [Bioperl-l] Bioperl Installation
To: stevey_mac2k2 at hotmail.com
CC: bioperl-l at lists.open-bio.org

I'm just hanging on the question at the moment.. not sure whether to install all [a] or [n]one. I'm probably going to go with [a]ll

Yes, you'll probably want all the scripts.

From rmb32 at cornell.edu Mon Jul 6 14:59:38 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 06 Jul 2009 11:59:38 -0700
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <24360594.post@talk.nabble.com>
References: <24360594.post@talk.nabble.com>
Message-ID: <4A52499A.7010208@cornell.edu>

Hi Stephen,

It looks to me like your CPAN installation has gotten a bit confused, possibly from it getting stopped in the middle of doing something.

Try doing

rm -rf ~/.cpan/build/*

and trying the installation again.

Also, it's usually best to just cut and paste logs like this into the body of an email, but try to paste only the most relevant parts.

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

stephenmcgowan1 wrote:
> Hi,
>
> I seem to be having trouble with Installing Bioperl 1.6 in CPAN.
>
> I have attached a log of the install, i just can't see why it seems to be
> falling over.
>
> Thanks,
>
> Stephen
>
> http://www.nabble.com/file/p24360594/BioPerl%2BInstall.rtf
> BioPerl+Install.rtf
> http://www.nabble.com/file/p24360594/BioPerl%2BInstall.doc
> BioPerl+Install.doc

From manchunjohn-ma at uiowa.edu Mon Jul 6 18:10:17 2009
From: manchunjohn-ma at uiowa.edu (John M.C. Ma)
Date: Mon, 6 Jul 2009 17:10:17 -0500
Subject: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds?
Message-ID: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com>

We have told the guys at RepeatMasker that RM-3.1.6 have a problem causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug 2138). And as of today, they are now at 3.2.8, and the problem is not fixed. And I don't want my project to be stalled-- any tips for a workaround?
From David.Messina at sbc.su.se Mon Jul 6 18:32:55 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Jul 2009 00:32:55 +0200
Subject: [Bioperl-l] Fwd: Bioperl Installation
Message-ID: <628aabb70907061532t3805e1bar3a02e6328f1f5b6c@mail.gmail.com>

---------- Forwarded message ----------
From: Steven McGowan
Date: Tue, Jul 7, 2009 at 00:01
Subject: RE:
To: david.messina at sbc.su.se

> I think it's done the trick! although if i want to be 100% sure it's
> installed ok is there a command i can type to make sure it's installed ok?

Yep, try this on the command line:

perl -e 'use Bio::SeqIO; print "Success!\n";'

If you see

Success!

then you're good to go.

> Now that bioperl is installed, i will now install the bioperl-db-.

Okay. I'm going to bed now. :)

> Thanks for all your time and help Dave.

You're welcome!

Dave

From koenvanderdrift at gmail.com Mon Jul 6 18:41:38 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Mon, 6 Jul 2009 18:41:38 -0400
Subject: [Bioperl-l] Bioperl Installation
Message-ID:

Hi,

Installation problems on a Mac seem to be a recurring question on this mailing list. Just as a reminder, besides CPAN, one of the easiest ways to install bioperl on a Mac is through fink. The instructions are available on the bioperl website here: http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink , but are rather hidden.

Maybe http://www.bioperl.org/wiki/Installing_BioPerl can be edited to state "Installing Bioperl for Unix (including Mac OS X)"? I don't seem to have privileges to edit that page, so I'll leave that up to the team.

Also, the file PACKAGES contains a link about installation on Mac OS X that is *very* outdated. Can this be removed from the package? I think it only creates confusion.

Cheers,

- Koen.

From bix at sendu.me.uk Mon Jul 6 19:43:23 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 07 Jul 2009 00:43:23 +0100
Subject: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds?
In-Reply-To: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com> References: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com> Message-ID: <4A528C1B.2030506@sendu.me.uk> John M.C. Ma wrote: > We have told the guys at RepeatMasker that RM-3.1.6 have a problem > causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug 2138). > And as of today, they are now at 3.2.8, and the problem is not fixed. > And I don't want my project to be stalled-- any tips for a workaround? Here's my mail to some RepeatMasker devs that they never replied to: ----- Hi, Perhaps you already know about this, but in RepeatMasker 3.1.6 -noint cannot be used because of error 'Unknown option: noint-species'. This is caused by line 1131 having no space after the "-noint". Likewise, -lcambig on 1128 would probably suffer a similar problem. Will this be fixed in the next version, and how often do you release new versions? ----- If it really is the same bug, it should be easy to fix the latest version in the same way yourself. From Russell.Smithies at agresearch.co.nz Mon Jul 6 20:06:54 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 7 Jul 2009 12:06:54 +1200 Subject: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds? In-Reply-To: <4A528C1B.2030506@sendu.me.uk> References: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com> <4A528C1B.2030506@sendu.me.uk> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A1B8697D0@exchsth.agresearch.co.nz> Is it the "-noint" bug causing the crash? We had major problems (with version 3.2.8) where it would stack-dump which I worked around by running it with the "-no_is" option so it doesn't check for bacterial insertion elements. We've never had a crash after that :-) Also, it is open-source so you could fix your own copy if you know what the bugs are. 
--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Tuesday, 7 July 2009 11:43 a.m.
> To: manchunjohn-ma at uiowa.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds?
>
> John M.C. Ma wrote:
> > We have told the guys at RepeatMasker that RM-3.1.6 have a problem
> > causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug 2138).
> > And as of today, they are now at 3.2.8, and the problem is not fixed.
> > And I don't want my project to be stalled-- any tips for a workaround?
>
> Here's my mail to some RepeatMasker devs that they never replied to:
>
> -----
> Hi,
>
> Perhaps you already know about this, but in RepeatMasker 3.1.6 -noint
> cannot be used because of error 'Unknown option: noint-species'.
> This is caused by line 1131 having no space after the "-noint".
> Likewise, -lcambig on 1128 would probably suffer a similar problem.
>
> Will this be fixed in the next version, and how often do you release new
> versions?
> -----
>
> If it really is the same bug, it should be easy to fix the latest
> version in the same way yourself.

From rmb32 at cornell.edu Mon Jul 6 20:13:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 06 Jul 2009 17:13:06 -0700
Subject: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds?
In-Reply-To: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com>
References: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com>
Message-ID: <4A529312.9040905@cornell.edu>

John M.C. Ma wrote:
> And as of today, they are now at 3.2.8, and the problem is not fixed.
> And I don't want my project to be stalled-- any tips for a workaround?

FORK!

Just kidding. Mostly.

Actually, svn vendor branches or something similar can be a good option for unpleasant things like this, see http://svnbook.red-bean.com/en/1.5/svn.advanced.vendorbr.html

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From rhubley at systemsbiology.org Tue Jul 7 09:58:01 2009
From: rhubley at systemsbiology.org (Robert Hubley)
Date: Tue, 07 Jul 2009 06:58:01 -0700
Subject: [Bioperl-l] RepeatMasker
Message-ID: <4A535469.4060603@systemsbiology.org>

This list email as forwarded to us by a colleague. I fixed this bug awhile back and I just double checked 3.2.8 and don't see any problems with the options -noint or -lcambig. Could someone help us determine how this is breaking bio-perl?

Thanks,

-Robert

|We have told the guys at RepeatMasker that RM-3.1.6 have a problem
|causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug 2138).
|And as of today, they are now at 3.2.8, and the problem is not fixed.
|And I don't want my project to be stalled-- any tips for a workaround?
||
||Hi,
||
||Perhaps you already know about this, but in RepeatMasker 3.1.6 -noint
||cannot be used because of error 'Unknown option: noint-species'.
||This is caused by line 1131 having no space after the "-noint".
||Likewise, -lcambig on 1128 would probably suffer a similar problem.
||
||Will this be fixed in the next version, and how often do you release new
||versions?

From manchunjohn-ma at uiowa.edu Tue Jul 7 13:17:40 2009
From: manchunjohn-ma at uiowa.edu (John M.C. Ma)
Date: Tue, 7 Jul 2009 12:17:40 -0500
Subject: [Bioperl-l] RepeatMasker Re: Bioperl-l Digest, Vol 75, Issue 10
Message-ID: <5486b2980907071017o24a6c186paefdef0bcbfe6ecc@mail.gmail.com>

Hi,

Sorry that I thought it was the same as 2138, as I never used -noint. I used -species and -noisy-- but it does not matter any more. I tried to run it without parameters and it crashed in the same way as 2138.

John Ma

From cjfields at illinois.edu Tue Jul 7 13:30:23 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Jul 2009 12:30:23 -0500
Subject: [Bioperl-l] YAPC::NA hackathon
Message-ID: <37D8DDC8-358F-4E56-9C49-C21281735A3A@illinois.edu>

On behalf of the bioperl core devs I want to thank the participants of the YAPC::NA 2009 BioPerl hackathon. Robert Buels, Jay Hannah, and Bruno Vecchi managed to squash several bugs in the process; Robert recently merged these back to trunk. Great work!

chris

From cjfields at illinois.edu Tue Jul 7 13:23:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Jul 2009 12:23:56 -0500
Subject: [Bioperl-l] RepeatMasker
In-Reply-To: <4A535469.4060603@systemsbiology.org>
References: <4A535469.4060603@systemsbiology.org>
Message-ID: <870B41F0-A31B-44FF-B44F-2957ECDB6F9E@illinois.edu>

Robert, the best way to handle this is to file a bug report indicating all the specifics as well as some example code demonstrating the problem.

http://www.bioperl.org/wiki/Bugs

chris

On Jul 7, 2009, at 8:58 AM, Robert Hubley wrote:

> This list email as forwarded to us by a colleague. I fixed this bug
> awhile back and I just double checked 3.2.8 and don't see any
> problems with the options -noint or -lcambig. Could someone help us
> determine how this is breaking bio-perl?
>
> Thanks,
>
> -Robert
>
> |We have told the guys at RepeatMasker that RM-3.1.6 have a problem
> |causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug 2138).
> |And as of today, they are now at 3.2.8, and the problem is not fixed.
> |And I don't want my project to be stalled-- any tips for a
> workaround?
> || > ||Hi, > || > ||Perhaps you already know about this, but in RepeatMasker 3.1.6 - > noint ||cannot be used because of error 'Unknown option: noint- > species'. > ||This is caused by line 1131 having no space after the "-noint". || > Likewise, -lcambig on 1128 would probably suffer a similar problem. > || > ||Will this be fixed in the next version, and how often do you > release new ||versions? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jul 7 13:52:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 7 Jul 2009 12:52:44 -0500 Subject: [Bioperl-l] RepeatMasker In-Reply-To: <4A535469.4060603@systemsbiology.org> References: <4A535469.4060603@systemsbiology.org> Message-ID: <3E4C0788-8B44-4408-BB26-FA9F48133948@illinois.edu> Robert, Sorry about that last post, thought you were reporting a problem not inquiring about one. Here's what we have: http://bugzilla.open-bio.org/show_bug.cgi?id=2138 Not sure but from the last few reports this is still a problem with RepeatMasker and bioperl. I'll try looking into it from our end. chris On Jul 7, 2009, at 8:58 AM, Robert Hubley wrote: > This list email as forwarded to us by a colleague. I fixed this bug > awhile back and I just double checked 3.2.8 and don't see any > problems with the options -noint or -lcambig. Could someone help us > determine how this is breaking bio-perl? > > Thanks, > > -Robert > > |We have told the guys at RepeatMasker that RM-3.1.6 have a problem > |causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug > 2138). > |And as of today, they are now at 3.2.8, and the problem is not fixed. > |And I don't want my project to be stalled-- any tips for a > workaround? > || > ||Hi, > || > ||Perhaps you already know about this, but in RepeatMasker 3.1.6 - > noint ||cannot be used because of error 'Unknown option: noint- > species'. 
> ||This is caused by line 1131 having no space after the "-noint". || > Likewise, -lcambig on 1128 would probably suffer a similar problem. > || > ||Will this be fixed in the next version, and how often do you > release new ||versions? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gowthaman.ramasamy at sbri.org Tue Jul 7 13:59:41 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Tue, 7 Jul 2009 10:59:41 -0700 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... Message-ID: Hi All, I am trying to use bp_genbank2gff.pl script to convert a locally downloaded genbank file. It is throwing stack errors. But, the script works beautifully when I use --accession option to download and convert. Any suggestions? Thanks very much for checking this. the command i use: perl bp_genbank2gff.pl --stdout --file NC_004329.nb and i am getting the following exception message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: EMBL stream with no ID. Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/Root/Root.pm:359 STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/embl.pm:189 STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/biofetch.pm:163 STACK: bp_genbank2gff.pl:274 ----------------------------------------------------------- Many thanks in advance, Gowtham From cain.cshl at gmail.com Tue Jul 7 15:18:50 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 7 Jul 2009 15:18:50 -0400 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... 
In-Reply-To:
References:
Message-ID: <26CA9DCC-ABA6-46A9-AE8D-DBD116CFB055@gmail.com>

Hi Gowtham,

I was going to send you an email to complain to the author, until I realized that it was me :-)

It has been quite a while since I looked at the code for this script, as the one I typically use these days is bp_genbank2gff3.pl, but I think I have a "fix". Try changing the name of the file to NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic that the script uses to determine if a file is GenBank formatted versus EMBL (note that the error message says it's not an embl file--that's why). If that doesn't do it, let me (and the mailing list) know.

Scott

PS: I wasn't really going to say to complain to the author directly--that was just me trying to be funny.

PPS: As another side note, it is fairly funny to me that the code that this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in the documentation that it is proof-of-principle and should not be used in production.

On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote:

>
> Hi All,
> I am trying to use bp_genbank2gff.pl script to convert a locally
> downloaded genbank file. It is throwing stack errors. But, the
> script works beautifully when I use --accession option to download
> and convert.
>
> Any suggestions? Thanks very much for checking this.
>
> the command i use:
> perl bp_genbank2gff.pl --stdout --file NC_004329.nb
>
> and i am getting the following exception message:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: EMBL stream with no ID.
Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ > Root/Root.pm:359 > STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/ > SeqIO/embl.pm:189 > STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ > perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ > biofetch.pm:163 > STACK: bp_genbank2gff.pl:274 > ----------------------------------------------------------- > > > Many thanks in advance, > Gowtham > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From gowthaman.ramasamy at sbri.org Tue Jul 7 16:06:16 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Tue, 07 Jul 2009 13:06:16 -0700 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... In-Reply-To: <26CA9DCC-ABA6-46A9-AE8D-DBD116CFB055@gmail.com> Message-ID: Hi Scott, Thanks for the mail. Its funny. Knowing you (at GMOD meeting) I wouldn't mistake it in any other ways. I was rushing back to email the list to tell I got it working. There is NOTHING wrong with script. Its perfectly good. (not many scripts stands the time. This one does). Its the genbank record. Some of the genbank records I tried did produce that error, while others did not. I'll dig into those files to see if anything is obvious (to my eyes) that causes this error. PS1: I tried changing the name, and that did NOT solve the problem. 
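Since renaming the file made no difference here, one way to take the extension out of the picture is to look at the first record line instead. GenBank flat files open with a LOCUS line and EMBL entries with an ID line. What follows is only a minimal sketch of that idea, not what bp_genbank2gff.pl actually does; the helper name guess_flatfile_format and the '.embl' extension fallback are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bp_genbank2gff.pl): guess whether a flat
# file is GenBank or EMBL from its first non-blank line, and only fall back
# to the file extension when the content gives no answer.
sub guess_flatfile_format {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;              # skip leading blank lines
        return 'genbank' if $line =~ /^LOCUS\s/;
        return 'embl'    if $line =~ /^ID\s/;
        last;                                   # first real line was neither
    }
    close $fh;
    # Extension fallback; .gb/.gbk/.genbank are the ones named in this thread,
    # .embl is an assumption.
    return 'genbank' if $path =~ /\.(?:gb|gbk|genbank)$/i;
    return 'embl'    if $path =~ /\.embl$/i;
    return 'unknown';
}

# Example call (file name taken from the original post):
# print guess_flatfile_format('NC_004329.nb'), "\n";
```

Running this over the records that fail and the ones that work might show quickly whether the failing files start with something other than a LOCUS line.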
Thanks once again, Gowtham On 7/7/09 12:18 PM, "Scott Cain" wrote: > Hi Gotham, > > I was going to send you an email to complain to the author, until I > realized that it was me :-) > > It has been quite a while since I looked at the code for this script, > as the one I typically use these days is bp_genbank2gff3.pl, but I > think I have a "fix". Try changing the name of the file to > NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic > that the script uses to determine if a file is genbank formated versus > embl (note that the error message says it's not an embl file--that's > why). If that doesn't do it, let me (and the mailing list) know. > > Scott > > PS: I wasn't really going to say to complain to the author directly-- > that was just me trying to be funny. > > PPS: As another side note, it is fairly funny to me that the code that > this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in the > documentation that it is proof-of-principle and should not be used in > production. > > > On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote: > >> >> Hi All, >> I am trying to use bp_genbank2gff.pl script to convert a locally >> downloaded genbank file. It is throwing stack errors. But, the >> script works beautifully when I use --accession option to download >> and convert. >> >> Any suggestions? Thanks very much for checking this. >> >> the command i use: >> perl bp_genbank2gff.pl --stdout --file NC_004329.nb >> >> and i am getting the following exception message: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: EMBL stream with no ID. 
Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ >> Root/Root.pm:359 >> STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/ >> SeqIO/embl.pm:189 >> STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ >> perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ >> biofetch.pm:163 >> STACK: bp_genbank2gff.pl:274 >> ----------------------------------------------------------- >> >> >> Many thanks in advance, >> Gowtham >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > From gowthaman.ramasamy at sbri.org Tue Jul 7 16:12:51 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Tue, 07 Jul 2009 13:12:51 -0700 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... In-Reply-To: <26CA9DCC-ABA6-46A9-AE8D-DBD116CFB055@gmail.com> Message-ID: And bp_genbank2gff3.pl script handled them very well....... Thanks again, gowtham On 7/7/09 12:18 PM, "Scott Cain" wrote: > Hi Gotham, > > I was going to send you an email to complain to the author, until I > realized that it was me :-) > > It has been quite a while since I looked at the code for this script, > as the one I typically use these days is bp_genbank2gff3.pl, but I > think I have a "fix". Try changing the name of the file to > NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic > that the script uses to determine if a file is genbank formated versus > embl (note that the error message says it's not an embl file--that's > why). If that doesn't do it, let me (and the mailing list) know. 
> > Scott > > PS: I wasn't really going to say to complain to the author directly-- > that was just me trying to be funny. > > PPS: As another side note, it is fairly funny to me that the code that > this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in the > documentation that it is proof-of-principle and should not be used in > production. > > > On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote: > >> >> Hi All, >> I am trying to use bp_genbank2gff.pl script to convert a locally >> downloaded genbank file. It is throwing stack errors. But, the >> script works beautifully when I use --accession option to download >> and convert. >> >> Any suggestions? Thanks very much for checking this. >> >> the command i use: >> perl bp_genbank2gff.pl --stdout --file NC_004329.nb >> >> and i am getting the following exception message: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: EMBL stream with no ID. Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ >> Root/Root.pm:359 >> STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/ >> SeqIO/embl.pm:189 >> STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ >> perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ >> biofetch.pm:163 >> STACK: bp_genbank2gff.pl:274 >> ----------------------------------------------------------- >> >> >> Many thanks in advance, >> Gowtham >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. 
scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > From scott at scottcain.net Tue Jul 7 16:17:58 2009 From: scott at scottcain.net (Scott Cain) Date: Tue, 7 Jul 2009 16:17:58 -0400 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... In-Reply-To: References: Message-ID: <12B256A9-A227-4234-B1D3-2ED4216CB745@scottcain.net> Hi Gowthaman, I thought I knew you but wasn't sure; hi again. About the problematic genbank files: was it something that the genbank parser should have handled but did not? If so, we should get that fixed anyway. Scott On Jul 7, 2009, at 4:06 PM, Gowthaman Ramasamy wrote: > Hi Scott, > Thanks for the mail. Its funny. Knowing you (at GMOD meeting) I > wouldn't > mistake it in any other ways. > > I was rushing back to email the list to tell I got it working. There > is > NOTHING wrong with script. Its perfectly good. (not many scripts > stands the > time. This one does). > > Its the genbank record. Some of the genbank records I tried did > produce that > error, while others did not. I'll dig into those files to see if > anything is > obvious (to my eyes) that causes this error. > > PS1: I tried changing the name, and that did NOT solve the problem. > > Thanks once again, > Gowtham > > > On 7/7/09 12:18 PM, "Scott Cain" wrote: > >> Hi Gotham, >> >> I was going to send you an email to complain to the author, until I >> realized that it was me :-) >> >> It has been quite a while since I looked at the code for this script, >> as the one I typically use these days is bp_genbank2gff3.pl, but I >> think I have a "fix". Try changing the name of the file to >> NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic >> that the script uses to determine if a file is genbank formated >> versus >> embl (note that the error message says it's not an embl file--that's >> why). If that doesn't do it, let me (and the mailing list) know. 
>> >> Scott >> >> PS: I wasn't really going to say to complain to the author directly-- >> that was just me trying to be funny. >> >> PPS: As another side note, it is fairly funny to me that the code >> that >> this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in >> the >> documentation that it is proof-of-principle and should not be used in >> production. >> >> >> On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote: >> >>> >>> Hi All, >>> I am trying to use bp_genbank2gff.pl script to convert a locally >>> downloaded genbank file. It is throwing stack errors. But, the >>> script works beautifully when I use --accession option to download >>> and convert. >>> >>> Any suggestions? Thanks very much for checking this. >>> >>> the command i use: >>> perl bp_genbank2gff.pl --stdout --file NC_004329.nb >>> >>> and i am getting the following exception message: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: EMBL stream with no ID. Not embl in my book >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ >>> Root/Root.pm:359 >>> STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/ >>> Bio/ >>> SeqIO/embl.pm:189 >>> STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ >>> perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ >>> biofetch.pm:163 >>> STACK: bp_genbank2gff.pl:274 >>> ----------------------------------------------------------- >>> >>> >>> Many thanks in advance, >>> Gowtham >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> ----------------------------------------------------------------------- >> Scott Cain, Ph. D. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> > ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Jul 7 18:30:28 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 7 Jul 2009 17:30:28 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... In-Reply-To: References: Message-ID: I may take a look at that one as well (it would be nice to know if this is something that's popping up in newer records). chris On Jul 7, 2009, at 3:06 PM, Gowthaman Ramasamy wrote: > Hi Scott, > Thanks for the mail. Its funny. Knowing you (at GMOD meeting) I > wouldn't > mistake it in any other ways. > > I was rushing back to email the list to tell I got it working. There > is > NOTHING wrong with script. Its perfectly good. (not many scripts > stands the > time. This one does). > > Its the genbank record. Some of the genbank records I tried did > produce that > error, while others did not. I'll dig into those files to see if > anything is > obvious (to my eyes) that causes this error. > > PS1: I tried changing the name, and that did NOT solve the problem. > > Thanks once again, > Gowtham > > > On 7/7/09 12:18 PM, "Scott Cain" wrote: > >> Hi Gotham, >> >> I was going to send you an email to complain to the author, until I >> realized that it was me :-) >> >> It has been quite a while since I looked at the code for this script, >> as the one I typically use these days is bp_genbank2gff3.pl, but I >> think I have a "fix". Try changing the name of the file to >> NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic >> that the script uses to determine if a file is genbank formated >> versus >> embl (note that the error message says it's not an embl file--that's >> why). 
If that doesn't do it, let me (and the mailing list) know. >> >> Scott >> >> PS: I wasn't really going to say to complain to the author directly-- >> that was just me trying to be funny. >> >> PPS: As another side note, it is fairly funny to me that the code >> that >> this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in >> the >> documentation that it is proof-of-principle and should not be used in >> production. >> >> >> On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote: >> >>> >>> Hi All, >>> I am trying to use bp_genbank2gff.pl script to convert a locally >>> downloaded genbank file. It is throwing stack errors. But, the >>> script works beautifully when I use --accession option to download >>> and convert. >>> >>> Any suggestions? Thanks very much for checking this. >>> >>> the command i use: >>> perl bp_genbank2gff.pl --stdout --file NC_004329.nb >>> >>> and i am getting the following exception message: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: EMBL stream with no ID. Not embl in my book >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ >>> Root/Root.pm:359 >>> STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/ >>> Bio/ >>> SeqIO/embl.pm:189 >>> STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ >>> perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ >>> biofetch.pm:163 >>> STACK: bp_genbank2gff.pl:274 >>> ----------------------------------------------------------- >>> >>> >>> Many thanks in advance, >>> Gowtham >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> ----------------------------------------------------------------------- >> Scott Cain, Ph. D. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abhishek.vit at gmail.com Wed Jul 8 10:24:05 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 8 Jul 2009 10:24:05 -0400 Subject: [Bioperl-l] Classifying SNPs Message-ID: Hi All This might seem to be an old track question. However I was not able to find a good answer in the many diff mailing list archives. For all our SNP predictions we would like to know whether they are synonymous / non-synonymous. If Non-synonymous/Exonic? then find the position on the gene where amino acid is getting changed and to what ?...Also info about indels will help. I am not sure if something like this already exists. If not even some pointers on how to move forward will help. Thanks, -Abhi From Xianjun.Dong at bccs.uib.no Wed Jul 8 11:04:15 2009 From: Xianjun.Dong at bccs.uib.no (Xianjun Dong) Date: Wed, 08 Jul 2009 17:04:15 +0200 Subject: [Bioperl-l] [Bio::Graphics::Panel] code reference cannot pass to -link, why? In-Reply-To: <4A4E3029.4020109@ii.uib.no> References: <4A4E3029.4020109@ii.uib.no> Message-ID: <4A54B56F.6050204@ii.uib.no> Hi, Scott Thanks for your help to my previous question about background layer. It works well! Now, I have another question regarding the -link function in imagemap. I post to Bioperl mailist. It seems to detail to get much attention. I followed the code in the Bio::Graphics POD, but it does not work. Could you pls take a look? 
Thanks again Xianjun Xianjun Dong wrote: > Hi, > > I have a problem while using the -link in Bio::Graphics (version 1.96): > > As the POD of Bio::Graphics described > (http://search.cpan.org/~lds/Bio-Graphics-1.96/lib/Bio/Graphics/Panel.pm#Creating_Imagemaps), > > > link format like: > > -link => 'http://www.google.com/search?q=$description' > > > works well in my code, but the format like > > -link => sub { > my ($feature,$panel) = @_; > my $type = $feature->primary_tag; > my $name = $feature->display_name; > if ($primary_tag eq 'clone') { > return "http://www.google.com/search?q=$name"; > } else { > return "http://www.yahoo.com/search?p=$name"; > } > > > does not output image map as expected. > > Here I attached a simple code as example for anyone who is willing to > test for me: > > #!/usr/bin/perl > use strict; > use Bio::Graphics; > use Bio::Graphics::Feature; > my $ftr= 'Bio::Graphics::Feature'; > # processed_transcript > my $trans1 = > > $ftr->new(-start=>50,-end=>10,-display_name=>'ZK154.1',-type=>'UTR'); > my $trans2 = > > $ftr->new(-start=>100,-end=>50,-display_name=>'ZK154.2',-type=>'CDS'); > my $trans3 = > > $ftr->new(-start=>350,-end=>225,-display_name=>'ZK154.3',-type=>'CDS', > -source=>'a'); > my $trans4 = > > $ftr->new(-start=>700,-end=>650,-display_name=>'ZK154.4',-type=>'UTR'); > my @trans = ($trans1,$trans2,$trans3,$trans4); > > my $panel= Bio::Graphics::Panel->new(-start =>0,-length=>1050); > > $panel->add_track(\@trans, > -glyph => 'transcript2', > # This works well! > #-link => > 'http://www.google.com/search?q=$name', > # while, the following code does not work as > expected. 
> -link => sub { > my ($feature,$panel) = @_; > my $type = $feature->primary_tag; > my $name = $feature->display_name; > if ($type eq 'CDS') { > return > "http://www.google.com/search?q=$name"; > } else { > return > "http://www.yahoo.com/search?p=$name"; > } > } > ); > my $map = $panel->create_web_map("mapname"); > print $map; > $panel->finished(); > > In my test (Bioperl 1.6.0), its output is: > > > href="http://www.yahoo.com/search?p=" /> > href="http://www.yahoo.com/search?p=" /> > href="http://www.yahoo.com/search?p=" /> > href="http://www.yahoo.com/search?p=" /> > > > > It seems $feature->primary_tag returns 'track' (I don't know where > this come from...), but not the type of features. Anyone has clue for > this problem? > > Thanks > -- ========================================== Xianjun Dong PhD student, Lenhard group Computational Biology Unit Bergen Center for Computational Science University of Bergen Hoyteknologisenteret, Thormohlensgate 55 N-5008 Bergen, Norway E-mail: xianjun.dong at bccs.uib.no Tel.: +47 555 84022 Fax : +47 555 84295 ========================================== From giles.weaver at googlemail.com Wed Jul 8 11:26:54 2009 From: giles.weaver at googlemail.com (Giles Weaver) Date: Wed, 8 Jul 2009 16:26:54 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A520591.3070407@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> Message-ID: <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> I've just added a sequence adapter removal implementation to the bioperl scrapbook at http://www.bioperl.org/wiki/Removing_sequencing_adapters. I think the basic method is sound, but the implementation is ugly. Performance wise, it currently takes around 80 minutes to remove adapters from a ~3.2 million read Illumina run. 
This includes quality trimming and grouping the sequences to reduce processing time. The quality trimming (described earlier in this thread) takes about 15 minutes, so adapter removal is definitely the bottleneck. I'm confident that some relatively simple developments in Bioperl and/or EMBOSS will yield some big performance improvements - if you see my sample code in the scrapbook you'll understand why!

I've also been experimenting with sequence entropy calculations for filtering out junk sequence. I used Mark Jensen's code at http://www.bioperl.org/wiki/Site_entropy_in_an_alignment for inspiration. Here is my current entropy calculation code:

use List::Util qw(sum);

# Shannon entropy (in bits) of the overlapping words of length $word_size
# found in $seq_str; words containing N are skipped.
sub entropy {
    my ($seq_str, $word_size) = @_;
    my %res_counts;
    for (my $i = 0; $i <= ((length $seq_str) - $word_size); $i ++) {
        my $word = substr $seq_str, $i, $word_size;
        if ($word !~ /N/) {
            $res_counts{$word} ++;
        }
    }
    #~ print STDERR join (" ", keys %res_counts), "\n";
    #~ print STDERR join (" ", values %res_counts), "\n";
    my @counts = values %res_counts;
    my $word_count = sum @counts;
    # turn the raw counts into word frequencies
    $_ /= $word_count for @counts;
    return sum map {-$_*log2($_)} @counts;
}

# base-2 logarithm
sub log2 {
    my $n = shift;
    return log($n)/log(2);
}

I don't know if this does "the right thing", and have yet to determine a suitable word size and entropy threshold for sequence filtering, so feel free to comment/test away.

Giles

2009/7/6 Peter Rice

> Giles Weaver wrote:
> > I'm developing a transcriptomics database for use with next-gen data, and
> > have found processing the raw data to be a big hurdle.
> >
> > I'm a bit late in responding to this thread, so most issues have already
> > been discussed. One thing that hasn't been mentioned is removal of adapters
> > from raw Illumina sequence. This is a PITA, and I'm not aware of any well
> > developed and documented open source software for removal of adapters (and
> > poor quality sequence) from Illumina reads.
>
> We would like to add this to EMBOSS.
Can you describe the method you > would like to use (I see you currently use a combination of bioperl and > emboss for this). > > > For my purposes the tools that would love to see supported in > > bioperl/bioperl-run are: > > > > - next-gen sequence quality parsing (to output phred scores) > > - sequence quality based trimming > > - sequencing adapter removal > > - filtering based on sequence complexity (repeats, entropy etc) > > - bioperl-run modules for bowtie etc. > > We would like to see these supported in all the Open-Bio Projects and > they are a priority for EMBOSS. > > Can you suggest quality filters, trimming methods, adaptor removal > methods, sequence filters and any other applications we could provide in > EMBOSS. > > We hope to keep in line with what the other projects do so that EMBOSS, > bioperl, biopython etc. can be used interchangeably in pipelines. > > > Obviously all of these need to be fast! .... My > > current code trims ~1300 sequences/second, including unzipping the raw > data > > and converting it to sanger fastq with biopython. Processing an entire > > sequencing run with the whole pipeline takes in the region of 6-12h. > > OK, we will see what speed we can reach. > > > Hope this looooong post was of interest to someone! > > Very interesting! > > regards, > > Peter Rice > From maj at fortinbras.us Wed Jul 8 11:23:54 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Jul 2009 11:23:54 -0400 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: Message-ID: <6269F0005AD041A69233C82E9BE1E776@NewLife> Hey Abhishek- You might root around in Bio::PopGen. Here's a script to get stuff from raw fasta data--see comments within. 
cheers
Mark

use Bio::AlignIO;
use Bio::PopGen::Utilities;

$file = "your_raw_file.fas";
my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;

# get the alignment into a Bio::PopGen::Population format, with codons
# as the marker sites
my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
                                                    -site_model=>'cod');

# here are your variable codons...
my @cdnpos = $pop->get_marker_names;
# here are your individuals represented in the alignment
my @inds = $pop->get_Individuals;

# which have names like "Codon-3-9", "Codon-4-12", etc
foreach my $cdn (@cdnpos) {
    # calculate the unique codons represented at this codon position
    my (%ucdns, @ucdns);
    @genos = $pop->get_Genotypes(-marker=>$cdn);
    $ucdns{$_->get_Alleles}++ for @genos;
    @ucdns = sort keys %ucdns;
    #
    # here, use translate or something faster to identify syn/non-syn
    # check out code in Bio::Align::DNAStatistics for various methods
}

# relate back to individuals with this
foreach my $ind (@inds) {
    print "Individual ".$ind->unique_id."\n";
    print "Site\tAllele\n";
    foreach my $cdn (@cdnpos) {
        print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
    }
}
1;

----- Original Message -----
From: "Abhishek Pratap"
To:
Sent: Wednesday, July 08, 2009 10:24 AM
Subject: [Bioperl-l] Classifying SNPs

Hi All

This might seem to be an old track question. However I was not able to find a good answer in the many diff mailing list archives.

For all our SNP predictions we would like to know whether they are synonymous / non-synonymous. If Non-synonymous/Exonic then find the position on the gene where amino acid is getting changed and to what ...Also info about indels will help.

I am not sure if something like this already exists. If not even some pointers on how to move forward will help.
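For the syn/non-syn step that the script above leaves as a comment, a minimal pure-Perl sketch of comparing two codons might look like the following. It assumes the standard genetic code (NCBI translation table 1) and in-frame codons; the helper name classify_codon_change is made up for illustration and is not a BioPerl function. In real code the translation could probably be delegated to BioPerl instead of a hand-built table.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1), with the bases of each
# codon position enumerated in T, C, A, G order.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %aa_for;
my $idx = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $aa_for{"$b1$b2$b3"} = substr($aas, $idx++, 1);
        }
    }
}

# Hypothetical helper: compare a reference codon with a variant codon and
# report the kind of change. Codons with ambiguity codes give 'unknown'.
sub classify_codon_change {
    my ($ref, $alt) = map { my $c = uc; $c =~ tr/U/T/; $c } @_;
    my ($ref_aa, $alt_aa) = @aa_for{$ref, $alt};
    return 'unknown' unless defined $ref_aa && defined $alt_aa;
    return 'synonymous' if $ref_aa eq $alt_aa;
    return sprintf('nonsense (%s to %s)', $ref_aa, $alt_aa) if $alt_aa eq '*';
    return sprintf('non-synonymous (%s to %s)', $ref_aa, $alt_aa);
}

# Example calls:
# print classify_codon_change('GAA', 'GAG'), "\n";   # both code for E
# print classify_codon_change('GAG', 'GTG'), "\n";   # E changes to V
```

Indels are not covered by this sketch; a frameshift changes every downstream codon, so they would need to be handled before any codon-by-codon comparison.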
Thanks, -Abhi _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From pmr at ebi.ac.uk Wed Jul 8 11:57:47 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 08 Jul 2009 16:57:47 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> Message-ID: <4A54C1FB.8050708@ebi.ac.uk> Giles Weaver wrote: > I've just added a sequence adapter removal implementation to the bioperl > scrapbook at http://www.bioperl.org/wiki/Removing_sequencing_adapters. I > think the basic method is sound, but the implementation is ugly. Ugly perhaps, but I'll look anyway :-) I see you don't use needle because it creates gapped alignments, but that can be fixed with a sufficiently high gap penalty (just to see if it works - it won't be fast). We also have word-based matching methods in EMBOSS but they would not allow mismatches. I will play with alternatives and see what works best. Some word-based seed should allow for a faster solution. 
The provisional EMBOSS name for a quality filter and adaptor removal application is "quaffle" regards, Peter Rice From cjfields at illinois.edu Wed Jul 8 12:24:27 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Jul 2009 11:24:27 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A54C1FB.8050708@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> Message-ID: On Jul 8, 2009, at 10:57 AM, Peter Rice wrote: > Giles Weaver wrote: >> I've just added a sequence adapter removal implementation to the >> bioperl >> scrapbook at http://www.bioperl.org/wiki/ >> Removing_sequencing_adapters. I >> think the basic method is sound, but the implementation is ugly. > > Ugly perhaps, but I'll look anyway :-) > > I see you don't use needle because it creates gapped alignments, but > that can be fixed with a sufficiently high gap penalty (just to see if > it works - it won't be fast). > > We also have word-based matching methods in EMBOSS but they would not > allow mismatches. I will play with alternatives and see what works > best. > Some word-based seed should allow for a faster solution. > > The provisional EMBOSS name for a quality filter and adaptor removal > application is "quaffle" > > regards, > > Peter Rice In the meantime, we can probably add this in to Bio::SeqUtils for general use as an exported method. It would be nice to get some regression tests going for this to make sure it does what we expect, so maybe some test data and expected results? 
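As a starting point for the kind of test data and expected results asked about here, a minimal sketch of exact-match 3' adapter trimming is given below. It is not the scrapbook implementation and, like the word-based matching mentioned earlier in the thread, it allows no mismatches. The function name trim_adapter, the example adapter string and the default minimum overlap of 3 are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a 3' adapter from a read using exact matching.
# Handles a full adapter anywhere in the read, or a read that ends in a
# prefix of the adapter at least $min_overlap bases long.
sub trim_adapter {
    my ($read, $adapter, $min_overlap) = @_;
    $min_overlap = 3 unless defined $min_overlap;

    # Case 1: the whole adapter occurs in the read; keep what precedes it.
    my $pos = index($read, $adapter);
    return substr($read, 0, $pos) if $pos >= 0;

    # Case 2: the read ends with a partial adapter; try the longest
    # candidate overlap first.
    my $max = length($adapter) - 1;
    $max = length($read) if length($read) < $max;
    for (my $len = $max; $len >= $min_overlap; $len--) {
        if (substr($read, -$len) eq substr($adapter, 0, $len)) {
            return substr($read, 0, length($read) - $len);
        }
    }
    return $read;    # no adapter found
}

# Example regression-style checks with a made-up adapter:
# my $adapter = 'AGATCGGAAGAGC';
# trim_adapter('ACGTACGT' . $adapter . 'TTT', $adapter)   gives 'ACGTACGT'
# trim_adapter('ACGTACGTAGATC', $adapter)                 gives 'ACGTACGT'
# trim_adapter('ACGTACGT', $adapter)                      gives 'ACGTACGT'
```

Cases like these (full adapter, partial adapter at the read end, no adapter) could be turned into Test::More assertions, with mismatch-tolerant cases added once a real implementation settles on how many mismatches to allow.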
chris From IRytsareva at dow.com Wed Jul 8 15:42:54 2009 From: IRytsareva at dow.com (Rytsareva, Inna (I)) Date: Wed, 8 Jul 2009 15:42:54 -0400 Subject: [Bioperl-l] While loop - SearchIO for BioPerl Message-ID: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> Hello, I have a follow script to parse the BLAST report: my $in = Bio::SearchIO->new ( -file =>$out_file, -format =>'blast') or die $!; while (my $result = $in->next_result) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp) { $qhit = $hit->name; $start = $hsp->hit->start; $end = $hsp->hit->end; } } print "Hit= ", $qhit, ",Start = ", $start, ",End = ", $end,"\n"; } Usually, the report has a number of the same hsp for each hit. Using "print" command it gives me a hit name, start and end positions for each hit, except last on. For last one it prints all the hsps. Something like this: Hit= gnl|DAS|22386,Start = 7578,End = 7601 Hit= gnl|DAS|25627,Start = 2824,End = 2863 Hit= gnl|DAS|25328,Start = 8864,End = 8887 Hit= gnl|DAS|4890,Start = 1896,End = 1919 Hit= gnl|DAS|12191,Start = 1898,End = 1921 Hit= gnl|DAS|4276,Start = 557,End = 580 Hit= gnl|DAS|12959,Start = 801,End = 824 Hit= gnl|DAS|4092,Start = 2266,End = 2304 Hit= gnl|DAS|19740,Start = 13572,End = 13610 Hit= gnl|DAS|12393,Start = 3901,End = 3924 Hit= gnl|DAS|25687,Start = 10415,End = 10438 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 
7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. I don't need these duplicates. How can I fix that? Thanks, Inna Rytsareva Discovery Information Management Dow AgroSciences Indianapolis, IN 317-337-4716 From rmb32 at cornell.edu Wed Jul 8 18:45:09 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 08 Jul 2009 15:45:09 -0700 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> Message-ID: <4A552175.70009@cornell.edu> Giles Weaver wrote: > takes about 15 minutes, so adapter removal is definitely the bottleneck. I'm > confident that some relatively simple developments in Bioperl and/or EMBOSS > will yield some big performance improvements - if you see my sample code in Apropos this kind of thing, have you guys already discussed using lazy object creation for objects returned from bioperl parsers? Not really relevant in the short term, but it could be a useful avenue to pursue for addressing some performance concerns people (like ebi) have. 
In very vague terms, one would probably implement this by defining a very light-weight role/class called something like Bio::LazyInflator, that would provide only an `inflate` method. Parsers would parse into lightweight structures (probably arrayrefs) that implement LazyInflator and users could choose between grabbing data out of the uninflated arrayref directly, or they could call inflate() on it to transform it into a real object (like a Bio::Annotation or Bio::Seq or something). The exact implementation of this would vary depending on whether Moose is being used. This could potentially also be compatible with having some of the tight parsing loops be implemented in XS. Rob From torsten.seemann at infotech.monash.edu.au Wed Jul 8 20:25:34 2009 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 9 Jul 2009 10:25:34 +1000 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> Message-ID: Inna, > Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. > I don't need these duplicates. > How can I fix that? > $start = $hsp->hit->start; > $end = $hsp->hit->end; Are you sure you mean $hsp->hit->start ? Perhaps you mean $hsp->start() or $hsp->start('hit') ? --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA From jason at bioperl.org Wed Jul 8 20:50:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 8 Jul 2009 17:50:54 -0700 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> Message-ID: <465D31E0-BBFE-4C73-A5E8-2CA9C0DF6DE9@bioperl.org> Both work... TMTOWTDI. $hsp->query->start and $hsp->start('query') are equivalent, as are $hsp->hit->start and $hsp->start('hit').
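[Editorial sketch, not part of the original replies.] One possible reading of the script as posted: the print statement sits outside the hit and HSP loops, and $qhit, $start and $end do not appear to be declared inside them, so each result prints only whatever was assigned last, and a result with no hits at all would repeat the previous values. That is an assumption, not a confirmed diagnosis. The sketch below collects one record per HSP inside the innermost loop and then keeps the lowest e-value HSP per query/hit pair. The helper names best_hsp_per_hit and records_from_blast are hypothetical; the Bio::SearchIO calls are the ones already used in this thread, plus query_name, end('hit') and evalue.

```perl
use strict;
use warnings;

# Keep one record per query/hit pair: the HSP with the lowest e-value.
# Each record is a plain hashref, so this part needs no BioPerl at all.
# Assumes the e-values are numeric strings such as "5e-06".
sub best_hsp_per_hit {
    my @records = @_;
    my (%best, @order);
    for my $rec (@records) {
        my $key = join "\t", $rec->{query}, $rec->{hit};
        if (!exists $best{$key}) {
            push @order, $key;          # remember first-seen order
            $best{$key} = $rec;
        }
        elsif ($rec->{evalue} < $best{$key}{evalue}) {
            $best{$key} = $rec;         # a better HSP for the same pair
        }
    }
    return map { $best{$_} } @order;
}

# Collect one record per HSP inside the innermost loop, using lexical
# variables so nothing leaks from one result or hit into the next.
sub records_from_blast {
    my ($file) = @_;
    require Bio::SearchIO;              # loaded only when a report is parsed
    my $in = Bio::SearchIO->new(-file => $file, -format => 'blast');
    my @records;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                push @records, {
                    query  => $result->query_name,
                    hit    => $hit->name,
                    start  => $hsp->start('hit'),
                    end    => $hsp->end('hit'),
                    evalue => $hsp->evalue,
                };
            }
        }
    }
    return @records;
}

# Usage: perl script.pl report.blast
if (@ARGV) {
    for my $rec (best_hsp_per_hit(records_from_blast($ARGV[0]))) {
        print "Query= $rec->{query},Hit= $rec->{hit},",
              "Start = $rec->{start},End = $rec->{end}\n";
    }
}
```

Printing the query name alongside the hit makes it visible whether repeated lines come from different queries hitting the same subject. Whether the lowest e-value is the right criterion depends on what you expect from the report.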
On Jul 8, 2009, at 5:25 PM, Torsten Seemann wrote: > Inna, > >> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. >> I don't need these duplicates. >> How can I fix that? > >> $start = $hsp->hit->start; >> $end = $hsp->hit->end; > > Are you sure you mean $hsp->hit->start ? > Perhaps you mean $hsp->start() or $hsp->start('hit') ? > > > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From maj at fortinbras.us Wed Jul 8 21:00:19 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 8 Jul 2009 21:00:19 -0400 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> Message-ID: <165113A382BE4F20AAF10157EDF8E3FE@NewLife> My guess would be you have multiple query sequences (27, to be exact) that hit the same subject, viz. 12277 MAJ ----- Original Message ----- From: "Rytsareva, Inna (I)" To: Sent: Wednesday, July 08, 2009 3:42 PM Subject: [Bioperl-l] While loop - SearchIO for BioPerl > Hello, > > I have a follow script to parse the BLAST report: > > my $in = Bio::SearchIO->new ( -file =>$out_file, > -format =>'blast') or die $!; > > while (my $result = $in->next_result) { > while (my $hit = $result->next_hit) > { > while (my $hsp = $hit->next_hsp) { > $qhit = $hit->name; > $start = $hsp->hit->start; > $end = $hsp->hit->end; > } > > > } print "Hit= ", $qhit, > ",Start = ", $start, > ",End = ", $end,"\n"; > } > > Usually, the report has a number of the same hsp for each hit. > Using "print" command it gives me a hit name, start and end positions > for each hit, except last on. For last one it prints all the hsps. 
> Something like this: > > Hit= gnl|DAS|22386,Start = 7578,End = 7601 > Hit= gnl|DAS|25627,Start = 2824,End = 2863 > Hit= gnl|DAS|25328,Start = 8864,End = 8887 > Hit= gnl|DAS|4890,Start = 1896,End = 1919 > Hit= gnl|DAS|12191,Start = 1898,End = 1921 > Hit= gnl|DAS|4276,Start = 557,End = 580 > Hit= gnl|DAS|12959,Start = 801,End = 824 > Hit= gnl|DAS|4092,Start = 2266,End = 2304 > Hit= gnl|DAS|19740,Start = 13572,End = 13610 > Hit= gnl|DAS|12393,Start = 3901,End = 3924 > Hit= gnl|DAS|25687,Start = 10415,End = 10438 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > > Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. > I don't need these duplicates. > How can I fix that? 
> > Thanks, > Inna Rytsareva > Discovery Information Management > Dow AgroSciences > Indianapolis, IN > 317-337-4716 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed Jul 8 21:08:33 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Jul 2009 20:08:33 -0500 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> Message-ID: <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> I'm curious as to what this report looks like. The example report you posted to the gbrowse list had serious issues (different problem, 'No midline' error which I replicated); mainly there were no blank lines making it pretty much invalid, so the parser had issues with it. Example lines from one HSP: > gnl|DAS|24699 pDAB101580 Length = 12942 Score = 50.1 bits (25), Expect = 5e-06 Identities = 37/41 (90%) Strand = Plus / Plus Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50 ||||||||||||||| ||| |||||||| ||||| |||||| Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 Score = 46.1 bits (23), Expect = 8e-05 Identities = 35/39 (89%) Strand = Plus / Plus Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51 ||||||||||||| ||| |||||||| ||||| |||||| Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 Score = 46.1 bits (23), Expect = 8e-05 Identities = 35/39 (89%) Strand = Plus / Plus Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52 ||||||||||||| ||| |||||||| ||||| |||||| Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 ... 
chris On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote: > Hello, > > I have a follow script to parse the BLAST report: > > my $in = Bio::SearchIO->new ( -file =>$out_file, > -format =>'blast') or die $!; > > while (my $result = $in->next_result) { > while (my $hit = $result->next_hit) > { > while (my $hsp = $hit->next_hsp) { > $qhit = $hit->name; > $start = $hsp->hit->start; > $end = $hsp->hit->end; > } > > > } print "Hit= ", $qhit, > ",Start = ", $start, > ",End = ", $end,"\n"; > } > > Usually, the report has a number of the same hsp for each hit. > Using "print" command it gives me a hit name, start and end positions > for each hit, except last on. For last one it prints all the hsps. > Something like this: > > Hit= gnl|DAS|22386,Start = 7578,End = 7601 > Hit= gnl|DAS|25627,Start = 2824,End = 2863 > Hit= gnl|DAS|25328,Start = 8864,End = 8887 > Hit= gnl|DAS|4890,Start = 1896,End = 1919 > Hit= gnl|DAS|12191,Start = 1898,End = 1921 > Hit= gnl|DAS|4276,Start = 557,End = 580 > Hit= gnl|DAS|12959,Start = 801,End = 824 > Hit= gnl|DAS|4092,Start = 2266,End = 2304 > Hit= gnl|DAS|19740,Start = 13572,End = 13610 > Hit= gnl|DAS|12393,Start = 3901,End = 3924 > Hit= gnl|DAS|25687,Start = 10415,End = 10438 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= 
gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > > Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. > I don't need these duplicates. > How can I fix that? > > Thanks, > Inna Rytsareva > Discovery Information Management > Dow AgroSciences > Indianapolis, IN > 317-337-4716 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jul 8 21:41:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Jul 2009 20:41:01 -0500 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> Message-ID: Yep, that's what I was thinking. The fragment in question is fairly short. Inna, if you want the best HSP you could just grab the one that best fits what you expect (best eval, score, whatever). chris On Jul 8, 2009, at 8:31 PM, Mark A. Jensen wrote: > A lack of low-complexity filtering (as seems apparent from this > report snippet, if > I understand that concept correctly) could explain the multiple > query hits on a > short (24bp) region of the same subject... > ----- Original Message ----- From: "Chris Fields" > > To: "Rytsareva, Inna (I)" > Cc: > Sent: Wednesday, July 08, 2009 9:08 PM > Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl > > >> I'm curious as to what this report looks like. 
The example report >> you posted to the gbrowse list had serious issues (different >> problem, 'No midline' error which I replicated); mainly there were >> no blank lines making it pretty much invalid, so the parser had >> issues with it. Example lines from one HSP: >> >> > gnl|DAS|24699 pDAB101580 >> Length = 12942 >> Score = 50.1 bits (25), Expect = 5e-06 >> Identities = 37/41 (90%) >> Strand = Plus / Plus >> Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50 >> ||||||||||||||| ||| |||||||| ||||| |||||| >> Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >> Score = 46.1 bits (23), Expect = 8e-05 >> Identities = 35/39 (89%) >> Strand = Plus / Plus >> Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51 >> ||||||||||||| ||| |||||||| ||||| |||||| >> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >> Score = 46.1 bits (23), Expect = 8e-05 >> Identities = 35/39 (89%) >> Strand = Plus / Plus >> Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52 >> ||||||||||||| ||| |||||||| ||||| |||||| >> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >> >> ... >> >> chris >> >> >> >> >> On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote: >> >>> Hello, >>> >>> I have a follow script to parse the BLAST report: >>> >>> my $in = Bio::SearchIO->new ( -file =>$out_file, >>> -format =>'blast') or die $!; >>> >>> while (my $result = $in->next_result) { >>> while (my $hit = $result->next_hit) >>> { >>> while (my $hsp = $hit->next_hsp) { >>> $qhit = $hit->name; >>> $start = $hsp->hit->start; >>> $end = $hsp->hit->end; >>> } >>> >>> >>> } print "Hit= ", $qhit, >>> ",Start = ", $start, >>> ",End = ", $end,"\n"; } >>> >>> Usually, the report has a number of the same hsp for each hit. >>> Using "print" command it gives me a hit name, start and end >>> positions >>> for each hit, except last on. For last one it prints all the hsps. 
>>> Something like this: >>> >>> Hit= gnl|DAS|22386,Start = 7578,End = 7601 >>> Hit= gnl|DAS|25627,Start = 2824,End = 2863 >>> Hit= gnl|DAS|25328,Start = 8864,End = 8887 >>> Hit= gnl|DAS|4890,Start = 1896,End = 1919 >>> Hit= gnl|DAS|12191,Start = 1898,End = 1921 >>> Hit= gnl|DAS|4276,Start = 557,End = 580 >>> Hit= gnl|DAS|12959,Start = 801,End = 824 >>> Hit= gnl|DAS|4092,Start = 2266,End = 2304 >>> Hit= gnl|DAS|19740,Start = 13572,End = 13610 >>> Hit= gnl|DAS|12393,Start = 3901,End = 3924 >>> Hit= gnl|DAS|25687,Start = 10415,End = 10438 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> >>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. >>> I don't need these duplicates. >>> How can I fix that? 
>>> >>> Thanks, >>> Inna Rytsareva >>> Discovery Information Management >>> Dow AgroSciences >>> Indianapolis, IN >>> 317-337-4716 >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From maj at fortinbras.us Wed Jul 8 21:31:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Jul 2009 21:31:27 -0400 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> Message-ID: A lack of low-complexity filtering (as seems apparent from this report snippet, if I understand that concept correctly) could explain the multiple query hits on a short (24bp) region of the same subject... ----- Original Message ----- From: "Chris Fields" To: "Rytsareva, Inna (I)" Cc: Sent: Wednesday, July 08, 2009 9:08 PM Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl > I'm curious as to what this report looks like. The example report you posted > to the gbrowse list had serious issues (different problem, 'No midline' error > which I replicated); mainly there were no blank lines making it pretty much > invalid, so the parser had issues with it. 
Example lines from one HSP: > > > gnl|DAS|24699 pDAB101580 > Length = 12942 > Score = 50.1 bits (25), Expect = 5e-06 > Identities = 37/41 (90%) > Strand = Plus / Plus > Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50 > ||||||||||||||| ||| |||||||| ||||| |||||| > Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 > Score = 46.1 bits (23), Expect = 8e-05 > Identities = 35/39 (89%) > Strand = Plus / Plus > Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51 > ||||||||||||| ||| |||||||| ||||| |||||| > Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 > Score = 46.1 bits (23), Expect = 8e-05 > Identities = 35/39 (89%) > Strand = Plus / Plus > Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52 > ||||||||||||| ||| |||||||| ||||| |||||| > Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 > > ... > > chris > > > > > On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote: > >> Hello, >> >> I have a follow script to parse the BLAST report: >> >> my $in = Bio::SearchIO->new ( -file =>$out_file, >> -format =>'blast') or die $!; >> >> while (my $result = $in->next_result) { >> while (my $hit = $result->next_hit) >> { >> while (my $hsp = $hit->next_hsp) { >> $qhit = $hit->name; >> $start = $hsp->hit->start; >> $end = $hsp->hit->end; >> } >> >> >> } print "Hit= ", $qhit, >> ",Start = ", $start, >> ",End = ", $end,"\n"; } >> >> Usually, the report has a number of the same hsp for each hit. >> Using "print" command it gives me a hit name, start and end positions >> for each hit, except last on. For last one it prints all the hsps. 
>> Something like this: >> >> Hit= gnl|DAS|22386,Start = 7578,End = 7601 >> Hit= gnl|DAS|25627,Start = 2824,End = 2863 >> Hit= gnl|DAS|25328,Start = 8864,End = 8887 >> Hit= gnl|DAS|4890,Start = 1896,End = 1919 >> Hit= gnl|DAS|12191,Start = 1898,End = 1921 >> Hit= gnl|DAS|4276,Start = 557,End = 580 >> Hit= gnl|DAS|12959,Start = 801,End = 824 >> Hit= gnl|DAS|4092,Start = 2266,End = 2304 >> Hit= gnl|DAS|19740,Start = 13572,End = 13610 >> Hit= gnl|DAS|12393,Start = 3901,End = 3924 >> Hit= gnl|DAS|25687,Start = 10415,End = 10438 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> >> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. >> I don't need these duplicates. >> How can I fix that? 
>> >> Thanks, >> Inna Rytsareva >> Discovery Information Management >> Dow AgroSciences >> Indianapolis, IN >> 317-337-4716 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed Jul 8 21:54:16 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Jul 2009 20:54:16 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A552175.70009@cornell.edu> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A552175.70009@cornell.edu> Message-ID: On Jul 8, 2009, at 5:45 PM, Robert Buels wrote: > Giles Weaver wrote: >> takes about 15 minutes, so adapter removal is definitely the >> bottleneck. I'm >> confident that some relatively simple developments in Bioperl and/ >> or EMBOSS >> will yield some big performance improvements - if you see my sample >> code in > > Apropos this kind of thing, have you guys already discussed using > lazy object creation for objects returned from bioperl parsers? Not > really relevant in the short term, but it could be a useful avenue > to pursue for addressing some performance concerns people (like ebi) > have. There are some lazy parsers for SearchIO, but each of those has specific classes geared towards the SearchIO format, an issue I worry about. I'm not sure about going down the path of having a Bio::Search::Result::FooResult, Bio::Search::Hit::FooHit, and Bio::Search::HSP::FooHSP for each 'Foo' format. The same thing could occur with SeqIO, TreeIO, etc. 
A possible maintenance nightmare. What I would like to see are generic lazy implementations for some of the various classes, primarily Seq, AnnotationCollection, FeatureHolder/Collection, etc., and to have parsers pass in just the necessary data (lazy implies file points or stream points). This may not be terribly hard to do if using iterators, but (as you may have seen) many of the current methods are greedily defined, so new interface methods would need to be drawn up (and older ones refactored to work with newer ones). > In very vague terms, one would probably implement this by defining a > very light-weight role/class called something like > Bio::LazyInflator, that would provide only an `inflate` method. > Parsers would parse into lightweight structures (probably arrayrefs) > that implement LazyInflator and users could choose between grabbing > data out of the uninflated arrayref directly, or they could call > inflate() on it to transform it into a real object (like a > Bio::Annotation or Bio::Seq or something). I would go one step further and reimplement the various AnnotationCollection/FeatureHolder methods in terms of a completely lazy implementation (i.e. one that parses the file or stream into a lazy Seq). See SwissKnife for instance. > The exact implementation of this would vary depending on whether > Moose is being used. This may be an area where optimization via Moose may not matter as much. It would be best to attempt some of this initially in bioperl, then port to Moose/Bio::Moose. > This could potentially also be compatible with having some of the > tight parsing loops be implemented in XS. > > Rob That's where it'll get a little trickier; you would probably need a decent grammar to get everything out the way you want it, or at least parse everything event-based, and other grammars would have to have similarly named rules/tokens so the same action could be tied to the data being parsed.
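[Editorial sketch, not part of the original message.] The LazyInflator idea quoted above could look roughly like this in plain Perl. Everything here is hypothetical: My::LazySeq and My::FullSeq are stand-in names invented for illustration (no Bio::LazyInflator exists in BioPerl as far as this thread says), and the FASTA reader is deliberately minimal. The parser returns blessed arrayrefs; callers either read fields straight out of the arrayref or call inflate() to get a heavier object.

```perl
use strict;
use warnings;

# Hypothetical lightweight record: a blessed arrayref holding raw fields.
package My::LazySeq;
use constant { ID => 0, DESC => 1, SEQ => 2 };

sub new {
    my ($class, $id, $desc, $seq) = @_;
    return bless [ $id, $desc, $seq ], $class;
}

# The only method the lazy role promises: build the heavier object on request.
sub inflate {
    my ($self) = @_;
    return My::FullSeq->new(
        id   => $self->[ID],
        desc => $self->[DESC],
        seq  => $self->[SEQ],
    );
}

# Hypothetical stand-in for a "real" object such as Bio::Seq.
package My::FullSeq;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub id  { return $_[0]{id} }
sub seq { return $_[0]{seq} }

package main;

# Minimal FASTA reader that returns uninflated records only.
sub parse_fasta {
    my ($fh) = @_;
    my (@records, $id, $desc, $seq);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)\s*(.*)$/) {
            # A new header closes the previous record, if there was one.
            push @records, My::LazySeq->new($id, $desc, $seq) if defined $id;
            ($id, $desc, $seq) = ($1, $2, '');
        }
        elsif (defined $id) {
            $seq .= $line;
        }
    }
    push @records, My::LazySeq->new($id, $desc, $seq) if defined $id;
    return @records;
}

# Demo on an in-memory file.
my $fasta = ">seq1 test read\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my @records = parse_fasta($fh);

# Cheap path: read a field directly from the uninflated arrayref.
print $records[0][ My::LazySeq::ID() ], "\n";

# Expensive path: inflate only the record that is actually needed.
my $obj = $records[1]->inflate;
print $obj->id, " ", $obj->seq, "\n";
```

The trade-off is the one raised in the thread: direct arrayref access is fast but exposes the field layout, so the constants (or a thin accessor layer) become part of the interface.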
I had a first go at generic parsing in the gbdriver/embldriver/swissdriver modules, which just pass data chunks to the handler object (which could do anything it wants with the data). The only thing not passed in yet are file points. That needs to be fleshed out more when I have the tuits, but you are more than welcome to look. Also, just to note (and something to think about): Perl6 has this 'solved' to a large degree with grammar/action combinations, where you define a grammar for a particular format and attach an Action class to process everything: my $action = MyActionClass.new(); while Bio::Grammar::Fasta.parse($filehandle, :action($action)) { # do interesting things with data from $action } In this case the Action class could create a Seq out of all the data, or possibly create something much more lightweight and lazily evaluated (for instance, use the file points instead of the actual text). The grammar in this case would essentially be C- or PIR-based I believe. Note the quotes above with 'solved'; with Rakudo you can almost do this now, however some of the Perl 6 specification needs to be fleshed out re: Grammars, and the grammar engine for Parrot (PGE) needs to be properly set up for iteration through a stream. There is enough interest that I think things could be worked out fairly quickly (e.g. months, not years). chris From maj at fortinbras.us Wed Jul 8 21:48:39 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Jul 2009 21:48:39 -0400 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: Message-ID: <5EFB3F2EAA9E486E8B0C248180EA0E34@NewLife> Hi Koen- I've put the link on Installing BioPerl (tho it seems bizarre that you weren't able to make that mod). Thanks! MAJ ----- Original Message ----- From: "Koen van der Drift" To: "BioPerl List" Cc: Sent: Monday, July 06, 2009 6:41 PM Subject: Re: [Bioperl-l] Bioperl Installation > Hi, > > Installation problems on a Mac seems to be a recurring question on this > mailing list. 
Just as a reminder, besides CPAN, one of the easiest ways to > install bioperl on a Mac is through fink. The instructions are available on > the bioperl website here: > http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink , but are > rather hidden. > > Maybe http://www.bioperl.org/wiki/Installing_BioPerl can be edited to state > Installing Bioperl for Unix (including Mac OS X)? I don't seem to have > privileges to edit that page, so I'll leave that up to the team. > > Also, the file PACKAGES contains a link about installation on Mac OS X that > is *very* outdated. Can this be removed from the package, I think it only > creates confusion? > > Cheers, > > - Koen. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed Jul 8 21:51:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Jul 2009 21:51:40 -0400 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> Message-ID: <3E96958FF18E4BA5A6446AC690DEC0C3@NewLife> Allow me to shamelessly plug the following: http://www.bioperl.org/wiki/HOWTO:Tiling#Quick_and_Dirty_.22Tiling.22 MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Rytsareva, Inna (I)" ; Sent: Wednesday, July 08, 2009 9:41 PM Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl > Yep, that's what I was thinking. The fragment in question is fairly > short. > > Inna, if you want the best HSP you could just grab the one that best > fits what you expect (best eval, score, whatever). > > chris > > On Jul 8, 2009, at 8:31 PM, Mark A. 
Jensen wrote: > >> A lack of low-complexity filtering (as seems apparent from this >> report snippet, if >> I understand that concept correctly) could explain the multiple >> query hits on a >> short (24bp) region of the same subject... >> ----- Original Message ----- From: "Chris Fields" > > >> To: "Rytsareva, Inna (I)" >> Cc: >> Sent: Wednesday, July 08, 2009 9:08 PM >> Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl >> >> >>> I'm curious as to what this report looks like. The example report >>> you posted to the gbrowse list had serious issues (different >>> problem, 'No midline' error which I replicated); mainly there were >>> no blank lines making it pretty much invalid, so the parser had >>> issues with it. Example lines from one HSP: >>> >>> > gnl|DAS|24699 pDAB101580 >>> Length = 12942 >>> Score = 50.1 bits (25), Expect = 5e-06 >>> Identities = 37/41 (90%) >>> Strand = Plus / Plus >>> Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50 >>> ||||||||||||||| ||| |||||||| ||||| |||||| >>> Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >>> Score = 46.1 bits (23), Expect = 8e-05 >>> Identities = 35/39 (89%) >>> Strand = Plus / Plus >>> Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51 >>> ||||||||||||| ||| |||||||| ||||| |||||| >>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >>> Score = 46.1 bits (23), Expect = 8e-05 >>> Identities = 35/39 (89%) >>> Strand = Plus / Plus >>> Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52 >>> ||||||||||||| ||| |||||||| ||||| |||||| >>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >>> >>> ... 
>>> >>> chris >>> >>> >>> >>> >>> On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote: >>> >>>> Hello, >>>> >>>> I have a follow script to parse the BLAST report: >>>> >>>> my $in = Bio::SearchIO->new ( -file =>$out_file, >>>> -format =>'blast') or die $!; >>>> >>>> while (my $result = $in->next_result) { >>>> while (my $hit = $result->next_hit) >>>> { >>>> while (my $hsp = $hit->next_hsp) { >>>> $qhit = $hit->name; >>>> $start = $hsp->hit->start; >>>> $end = $hsp->hit->end; >>>> } >>>> >>>> >>>> } print "Hit= ", $qhit, >>>> ",Start = ", $start, >>>> ",End = ", $end,"\n"; } >>>> >>>> Usually, the report has a number of the same hsp for each hit. >>>> Using "print" command it gives me a hit name, start and end >>>> positions >>>> for each hit, except last on. For last one it prints all the hsps. >>>> Something like this: >>>> >>>> Hit= gnl|DAS|22386,Start = 7578,End = 7601 >>>> Hit= gnl|DAS|25627,Start = 2824,End = 2863 >>>> Hit= gnl|DAS|25328,Start = 8864,End = 8887 >>>> Hit= gnl|DAS|4890,Start = 1896,End = 1919 >>>> Hit= gnl|DAS|12191,Start = 1898,End = 1921 >>>> Hit= gnl|DAS|4276,Start = 557,End = 580 >>>> Hit= gnl|DAS|12959,Start = 801,End = 824 >>>> Hit= gnl|DAS|4092,Start = 2266,End = 2304 >>>> Hit= gnl|DAS|19740,Start = 13572,End = 13610 >>>> Hit= gnl|DAS|12393,Start = 3901,End = 3924 >>>> Hit= gnl|DAS|25687,Start = 10415,End = 10438 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= 
gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>>> >>>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. >>>> I don't need these duplicates. >>>> How can I fix that? >>>> >>>> Thanks, >>>> Inna Rytsareva >>>> Discovery Information Management >>>> Dow AgroSciences >>>> Indianapolis, IN >>>> 317-337-4716 >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > > From Brotelzwieb at gmx.de Thu Jul 9 06:16:06 2009 From: Brotelzwieb at gmx.de (Jonas Schaer) Date: Thu, 9 Jul 2009 12:16:06 +0200 Subject: [Bioperl-l] cdd-search with remoteblast? References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> Message-ID: <426C1893A5AD499DB4DBFEEBD257B254@jonas> Hi guys, Thank you all so much for your help and patience :). Of course you were right and I finaly found the right put-parameter to get exactly the same hits as on the homepage. I do have an other question though :)... 
I now want to include a search for conserved domains, but when I try to
use the CDD_SEARCH-parameter
(http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH)
like the other put-parameters the way chris once told me (works fine with
the other params):

    my %put = (
        WORD_SIZE    => 3,
        HITLIST_SIZE => 100,
        THRESHOLD    => 11,
        FILTER       => 'R',
        GENETIC_CODE => 1,
        CDD_SEARCH   => 'on'    ### I tried it with 'true' and '1', too.
    );

    for my $putName (keys %put) {
        $factory->submit_parameter($putName, $put{$putName});
    }

...an exception is thrown:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: CDD_SEARCH is not a valid PUT parameter.
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:325
STACK: main::blast_a_sequence firsteval0.8.pm:383
STACK: main::blast_it firsteval0.8.pm:288
STACK: firsteval0.8.pm:35
-----------------------------------------------------------

I guess somehow this could be the solution to my problem:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous
, but unfortunately I don't understand what to do.
I'm so sorry to bother you with this but please help me once more...:)

Best regards and thanks in advance,
Jonas

----- Original Message -----
From: "Smithies, Russell"
To: "'Jonas Schaer'"
Cc: "'Chris Fields'" ; "'BioPerl List'"
Sent: Monday, July 06, 2009 10:56 PM
Subject: RE: [Bioperl-l] different results with remote-blast skript

Hi Jonas,
You can't just play with the BLAST parameters and hope for a "better"
result. I'd suggest that if you aren't sure what they do, you should leave
them alone as small changes can make huge differences in the output - it's
quite possible to miss finding what you're looking for by using the wrong
parameters.
If all else fails, read the blast manual:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html
http://www.ncbi.nlm.nih.gov/blast/tutorial/
Or read Ian Korf's excellent book:
http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3

Don't worry about the integer overflow bug as there's nothing you can do
about it. If you're interested, Google and Wikipedia are your friends:
http://en.wikipedia.org/wiki/Integer_overflow

Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
> Sent: Tuesday, 7 July 2009 12:14 a.m.
> To: BioPerl List; Chris Fields
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
> Hi guys, thanks for your answers so far.
> @jason: integer overflow in blast.... sorry, but what do you mean by that?
> how can I fix it...?
>
> Since I never really changed any parameters I thought them all to be
> default.
> whatever, I tried to get "better" results with my prog by changing these:
> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
> with no effect...I guess these were default values anyway.
>
> So please maybe you can tell me all the other parameters I can change with
> my perl-skript AND how to do that?
> Unfortunately both, perl and the blast-algorithm are pretty much new to me,
> maybe thats why I just cannot find out how to do that on my own...
:/ > > Here is the output I get with my remote-blast skript: > ############################################################################## > ################################### > Query Name: > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > L > hit name is ref|XP_001702807.1| > score is 442 > BLASTP 2.2.21+ > Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped > BLAST and PSI-BLAST: a new generation of protein database search > programs", > Nucleic Acids Res. 25:3389-3402. > > > Reference for composition-based statistics: Alejandro A. > Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, > Yuri > I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the > accuracy of PSI-BLAST protein database searches with composition-based > statistics and other refinements", Nucleic Acids Res. 29:2994-3005. > > > RID: 53STX5G2013 > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > 9,252,587 sequences; 3,169,972,781 total letters Query= > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > ATGPDPDDEYE > Length=150 > > > Score > E > Sequences producing significant alignments: (Bits) > Value > > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 174 > 2e-42 > > > ALIGNMENTS > >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] > gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > Length=303 > > Score = 174 bits (442), Expect = 2e-42, Method: Composition-based > stats. 
> Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 > (0%) > > Query 1 MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > 60 > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > Sbjct 154 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > 213 > > Query 61 dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > 120 > DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > Sbjct 214 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > 273 > > Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > AWHERDDNAFRQAHQNTAMATGPDPDDEYE > Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF > excluding environmental samples from WGS projects > Posted date: Jul 5, 2009 4:41 AM > Number of letters in database: -1,124,994,511 > Number of sequences in database: 9,252,587 > > Lambda K H > 0.309 0.122 0.345 > Gapped > Lambda K H > 0.267 0.0410 0.140 > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Sequences: 9252587 > Number of Hits to DB: 60273703 > Number of extensions: 1448367 > Number of successful extensions: 2103 > Number of sequences better than 10: 0 > Number of HSP's better than 10 without gapping: 0 > Number of HSP's gapped: 2113 > Number of HSP's successfully gapped: 0 > Length of query: 150 > Length of database: 3169972781 > Length adjustment: 113 > Effective length of query: 37 > Effective length of database: 2124430450 > Effective search space: 78603926650 > Effective search space used: 78603926650 > T: 11 > A: 40 > X1: 16 (7.1 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 42 (20.8 bits) > S2: 74 (33.1 bits) > > ############################################################################## > ################################### > and here are the hits (?) 
of the blast-algorithm on the ncbi-homepage with > the same query of course: > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 300 > 3e-80 > ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... 36.2 > 1.1 > ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... 35.4 > 1.8 > ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... 34.3 > 4.2 > ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... 33.5 > 6.0 > ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... 33.5 > 7.0 > ############################################################################## > ###################################at > least the first hit is the same, but even there there is a different score > and e-value. > > thanks so much for any help :) > regards, jonas > > > ----- Original Message ----- > From: "Chris Fields" > To: "Jason Stajich" > Cc: "Smithies, Russell" ; "'BioPerl > List'" ; "'Jonas Schaer'" > > Sent: Monday, July 06, 2009 12:51 AM > Subject: Re: [Bioperl-l] different results with remote-blast skript > > > > That inspires confidence ;> > > > > chris > > > > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > > > >> integer overflow in blast.... > >> > >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >> > >>> I'd guess it's a difference in the parameters used. 
> >>> Interesting that both have the number of letters in the db as > >>> "-1,125,070,205", I assume that's a bug :-) > >>> > >>> Stats from your remote_blast: > >>> > >>> 'stats' => { > >>> 'S1' => '42', > >>> 'S1_bits' => '20.8', > >>> 'lambda' => '0.309', > >>> 'entropy' => '0.345', > >>> 'kappa_gapped' => '0.0410', > >>> 'T' => '11', > >>> 'kappa' => '0.122', > >>> 'X3_bits' => '24.7', > >>> 'X1' => '16', > >>> 'lambda_gapped' => '0.267', > >>> 'X2' => '38', > >>> 'S2' => '74', > >>> 'seqs_better_than_cutoff' => '0', > >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>> 'Hits_to_DB' => '60102303', > >>> 'dbletters' => '-1125070205', > >>> 'A' => '40', > >>> 'num_successful_extensions' => '2004', > >>> 'num_extensions' => '1436892', > >>> 'X1_bits' => '7.1', > >>> 'X3' => '64', > >>> 'entropy_gapped' => '0.140', > >>> 'dbentries' => '9252258', > >>> 'X2_bits' => '14.6', > >>> 'S2_bits' => '33.1' > >>> } > >>> > >>> > >>> Stats from a blast done on the NCBI webpage: > >>> > >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt > >>> +PIR+PRF > >>> excluding environmental samples from WGS projects > >>> Posted date: Jul 4, 2009 4:41 AM > >>> Number of letters in database: -1,125,070,205 > >>> Number of sequences in database: 9,252,258 > >>> > >>> Lambda K H > >>> 0.309 0.124 0.340 > >>> Gapped > >>> Lambda K H > >>> 0.267 0.0410 0.140 > >>> Matrix: BLOSUM62 > >>> Gap Penalties: Existence: 11, Extension: 1 > >>> Number of Sequences: 9252258 > >>> Number of Hits to DB: 86493230 > >>> Number of extensions: 3101413 > >>> Number of successful extensions: 9001 > >>> Number of sequences better than 100: 65 > >>> Number of HSP's better than 100 without gapping: 0 > >>> Number of HSP's gapped: 9000 > >>> Number of HSP's successfully gapped: 66 > >>> Length of query: 150 > >>> Length of database: 3169897087 > >>> Length adjustment: 113 > >>> Effective length of query: 37 > >>> Effective length of database: 2124391933 > >>> Effective search space: 78602501521 
> >>> Effective search space used: 78602501521 > >>> T: 11 > >>> A: 40 > >>> X1: 16 (7.1 bits) > >>> X2: 38 (14.6 bits) > >>> X3: 64 (24.7 bits) > >>> S1: 42 (20.8 bits) > >>> S2: 65 (29.6 bits) > >>> > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > >>>> Sent: Sunday, 28 June 2009 10:15 p.m. > >>>> To: BioPerl List > >>>> Subject: [Bioperl-l] different results with remote-blast skript > >>>> > >>>> Hi again :) > >>>> please, I only have this little question: > >>>> why do I get different results with my remote::blast perl skript > >>>> then on the > >>>> ncbi blast homepage? > >>>> I am using blastp, the query is an amino-sequence (different > >>>> results with any > >>>> sequence, differences not only in number of hits but even in e- > >>>> values, scores > >>>> etc...), the database is 'nr'. > >>>> PLEASE help me, > >>>> thank you in advance, > >>>> Jonas > >>>> > >>>> ps: my skript: > >>>> > ############################################################################## > >>>> ## > >>>> use Bio::Seq::SeqFactory; > >>>> use Bio::Tools::Run::RemoteBlast; > >>>> use strict; > >>>> my @blast_report; > >>>> my $prog = 'blastp'; > >>>> my $db = 'nr'; > >>>> my $e_val= '1e-10'; > >>>> #my $e_val= '10'; > >>>> my @params = ( '-prog' => $prog, > >>>> '-data' => $db, > >>>> '-expect' => $e_val, > >>>> '-readmethod' => 'SearchIO' ); > >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>> $ > >>>> Bio > >>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} > >>>> = '1'; > >>>> > >>>> my > >>>> $ > >>>> blast_seq > >>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > >>>> > 
SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD > >>>> PDDEYE'; > >>>> #$v is just to turn on and off the messages > >>>> my $v = 1; > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > >>>> 'Bio::PrimarySeq'); > >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => > >>>> "$blast_seq"); > >>>> my $filename='temp2.out'; > >>>> my $r = $factory->submit_blast($seq); > >>>> print STDERR "waiting..." if( $v > 0 ); > >>>> while ( my @rids = $factory->each_rid ) > >>>> { > >>>> foreach my $rid ( @rids ) > >>>> { > >>>> my $rc = $factory->retrieve_blast($rid); > >>>> if( !ref($rc) ) > >>>> { > >>>> if( $rc < 0 ) > >>>> { > >>>> $factory->remove_rid($rid); > >>>> } > >>>> print STDERR "." if ( $v > 0 ); > >>>> } > >>>> else > >>>> { > >>>> my $result = $rc->next_result(); > >>>> $factory->save_output($filename); > >>>> $factory->remove_rid($rid); > >>>> print "\nQuery Name: ", $result->query_name(), > >>>> "\n"; > >>>> while ( my $hit = $result->next_hit ) > >>>> { > >>>> next unless ( $v > 0); > >>>> print "\thit name is ", $hit->name, "\n"; > >>>> while( my $hsp = $hit->next_hsp ) > >>>> { > >>>> print "\t\tscore is ", $hsp->score, "\n"; > >>>> } > >>>> } > >>>> } > >>>> } > >>>> > >>>> > >>>> } > >>>> @blast_report = get_file_data ($filename); > >>>> return @blast_report; > >>>> > ############################################################################## > >>>> #### > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> = > >>> = > >>> ===================================================================== > >>> Attention: The information contained in this message and/or > >>> attachments > >>> from AgResearch Limited is intended only for the persons or entities > >>> to which it is addressed and may contain confidential and/or > >>> privileged > >>> material. 
> >>> Any review, retransmission, dissemination or other use of, or
> >>> taking of any action in reliance upon, this information by persons or
> >>> entities other than the intended recipients is prohibited by AgResearch
> >>> Limited. If you have received this message in error, please notify the
> >>> sender immediately.
> >>> =====================================================================
> >>
> >> --
> >> Jason Stajich
> >> jason at bioperl.org

From cjfields at illinois.edu Thu Jul 9 11:08:53 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 9 Jul 2009 10:08:53 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To: <426C1893A5AD499DB4DBFEEBD257B254@jonas> References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas> Message-ID: I'm not sure, but I think adding this in will take a little work (we'll need to catch the RID returned, which I'm fairly sure will require some modifications to checking the returned output). I would also have to look at the RemoteBlast API to see how this would fit in (I'm assuming we could either lump it in with other returned RIDs or create a new method for that). You are more than welcome to add this in as an enhancement request to bugzilla for BioPerl: http://bugzilla.open-bio.org/ chris On Jul 9, 2009, at 5:16 AM, Jonas Schaer wrote: > Hi guys, > Thank you all so much for your help and patience :). Of course you > were right and I finaly found the right put-parameter to get exactly > the same hits as on the homepage. > I do have an other question though :)... > I now want to include a search for conserved domains, but when I try > to use the CDD_SEARCH-parameter (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub > :CDD_SEARCH) like the other put-parameters the way chris once told > me(works fine with the other params): > > my %put = ( > WORD_SIZE => 3, > HITLIST_SIZE => 100, > THRESHOLD => 11, > FILTER => 'R', > GENETIC_CODE => 1, > CDD_SEARCH => 'on' ###I tried > it with 'true' and '1', too. > > ); > > for my $putName (keys %put) { > $factory->submit_parameter($putName,$put{$putName}); > } > > > ...an exception is thrown: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: CDD_SEARCH is not a valid PUT parameter. 
> STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/ > lib/Bio/Tools > /Run/RemoteBlast.pm:325 > STACK: main::blast_a_sequence firsteval0.8.pm:383 > STACK: main::blast_it firsteval0.8.pm:288 > STACK: firsteval0.8.pm:35 > ----------------------------------------------------------- . > I guess somehow this could be the solution to my problem: > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID- > for-Simultaneous > , but unfortunately I don't understand what to do. > I'm so sorry to bother you with this but please help me once more...:) > > Best regards and thanks in advance, > Jonas > > ----- Original Message ----- From: "Smithies, Russell" > > To: "'Jonas Schaer'" > Cc: "'Chris Fields'" ; "'BioPerl List'" > > Sent: Monday, July 06, 2009 10:56 PM > Subject: RE: [Bioperl-l] different results with remote-blast skript > > > Hi Jonas, > You can't just play with the BLAST parameters and hope for a > "better" result. > I'd suggest that if you aren't sure what they do, you should leave > them alone as small changes can make huge differences in the output > - it's quite possible to miss finding what you're looking for by > using the wrong parameters. > If all else fails, read the blast manual: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html > http://www.ncbi.nlm.nih.gov/blast/tutorial/ > Or Read Ian Korfs' excellent book: http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 > > Don't worry about the integer overflow bug as there's nothing you > can do about it. 
If you're interested, Google and Wikipedia are your > friends: http://en.wikipedia.org/wiki/Integer_overflow > > > Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >> Sent: Tuesday, 7 July 2009 12:14 a.m. >> To: BioPerl List; Chris Fields >> Subject: Re: [Bioperl-l] different results with remote-blast skript >> >> Hi guys, thanks for your answers so far. >> @jason: integer overflow in blast.... sorry, but what do you mean >> by that? >> how can I fix it...? >> >> Since I never really changed any parameters I thought them all to >> be default. >> whatever, I tried to get "better" results with my prog by changing >> these: >> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >> $ >> Bio >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = >> '1'; >> with no effect...I guess these were default values anyway. >> >> So please maybe you can tell me all the other parameters I can >> change with my >> perl-skript AND how to do that? >> Unfortunately both, perl and the blast-algorithm are pretty much >> new to me, >> maybe thats why I just cannot find out how to do that on my own... :/ >> >> Here is the output I get with my remote-blast skript: >> ############################################################################## >> ################################### >> Query Name: >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >> L >> hit name is ref|XP_001702807.1| >> score is 442 >> BLASTP 2.2.21+ >> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. >> Schaffer, >> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman >> (1997), "Gapped >> BLAST and PSI-BLAST: a new generation of protein database search >> programs", >> Nucleic Acids Res. 25:3389-3402. 
>> >> >> Reference for composition-based statistics: Alejandro A. >> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. >> Spouge, Yuri >> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >> "Improving the >> accuracy of PSI-BLAST protein database searches with composition- >> based >> statistics and other refinements", Nucleic Acids Res. 29:2994-3005. >> >> >> RID: 53STX5G2013 >> >> >> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >> from WGS projects >> 9,252,587 sequences; 3,169,972,781 total letters Query= >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >> ATGPDPDDEYE >> Length=150 >> >> >> >> Score >> E >> Sequences producing significant alignments: >> (Bits) >> Value >> >> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... >> 174 >> 2e-42 >> >> >> ALIGNMENTS >> >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] >> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >> Length=303 >> >> Score = 174 bits (442), Expect = 2e-42, Method: Composition-based >> stats. 
>> Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = >> 0/150 (0%) >> >> Query 1 >> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds 60 >> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >> Sbjct 154 >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >> 213 >> >> Query 61 >> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >> 120 >> >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >> Sbjct 214 >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >> 273 >> >> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >> >> >> >> Database: All non-redundant GenBank CDS translations+PDB+SwissProt >> +PIR+PRF >> excluding environmental samples from WGS projects >> Posted date: Jul 5, 2009 4:41 AM >> Number of letters in database: -1,124,994,511 >> Number of sequences in database: 9,252,587 >> >> Lambda K H >> 0.309 0.122 0.345 >> Gapped >> Lambda K H >> 0.267 0.0410 0.140 >> Matrix: BLOSUM62 >> Gap Penalties: Existence: 11, Extension: 1 >> Number of Sequences: 9252587 >> Number of Hits to DB: 60273703 >> Number of extensions: 1448367 >> Number of successful extensions: 2103 >> Number of sequences better than 10: 0 >> Number of HSP's better than 10 without gapping: 0 >> Number of HSP's gapped: 2113 >> Number of HSP's successfully gapped: 0 >> Length of query: 150 >> Length of database: 3169972781 >> Length adjustment: 113 >> Effective length of query: 37 >> Effective length of database: 2124430450 >> Effective search space: 78603926650 >> Effective search space used: 78603926650 >> T: 11 >> A: 40 >> X1: 16 (7.1 bits) >> X2: 38 (14.6 bits) >> X3: 64 (24.7 bits) >> S1: 42 (20.8 bits) >> S2: 74 (33.1 bits) >> >> ############################################################################## >> ################################### >> and here are the hits (?) 
of the blast-algorithm on the ncbi- >> homepage with >> the same query of course: >> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... >> 300 >> 3e-80 >> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... >> 36.2 >> 1.1 >> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... >> 35.4 >> 1.8 >> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... >> 34.3 >> 4.2 >> ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... >> 33.5 >> 6.0 >> ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... >> 33.5 >> 7.0 >> ############################################################################## >> ###################################at >> least the first hit is the same, but even there there is a >> different score >> and e-value. >> >> thanks so much for any help :) >> regards, jonas >> >> >> ----- Original Message ----- >> From: "Chris Fields" >> To: "Jason Stajich" >> Cc: "Smithies, Russell" ; >> "'BioPerl >> List'" ; "'Jonas Schaer'" >> >> Sent: Monday, July 06, 2009 12:51 AM >> Subject: Re: [Bioperl-l] different results with remote-blast skript >> >> >> > That inspires confidence ;> >> > >> > chris >> > >> > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >> > >> >> integer overflow in blast.... >> >> >> >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >> >> >> >>> I'd guess it's a difference in the parameters used. 
>> >>> Interesting that both have the number of letters in the db as >> >>> "-1,125,070,205", I assume that's a bug :-) >> >>> >> >>> Stats from your remote_blast: >> >>> >> >>> 'stats' => { >> >>> 'S1' => '42', >> >>> 'S1_bits' => '20.8', >> >>> 'lambda' => '0.309', >> >>> 'entropy' => '0.345', >> >>> 'kappa_gapped' => '0.0410', >> >>> 'T' => '11', >> >>> 'kappa' => '0.122', >> >>> 'X3_bits' => '24.7', >> >>> 'X1' => '16', >> >>> 'lambda_gapped' => '0.267', >> >>> 'X2' => '38', >> >>> 'S2' => '74', >> >>> 'seqs_better_than_cutoff' => '0', >> >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >> >>> 'Hits_to_DB' => '60102303', >> >>> 'dbletters' => '-1125070205', >> >>> 'A' => '40', >> >>> 'num_successful_extensions' => '2004', >> >>> 'num_extensions' => '1436892', >> >>> 'X1_bits' => '7.1', >> >>> 'X3' => '64', >> >>> 'entropy_gapped' => '0.140', >> >>> 'dbentries' => '9252258', >> >>> 'X2_bits' => '14.6', >> >>> 'S2_bits' => '33.1' >> >>> } >> >>> >> >>> >> >>> Stats from a blast done on the NCBI webpage: >> >>> >> >>> Database: All non-redundant GenBank CDS translations+PDB >> +SwissProt >> >>> +PIR+PRF >> >>> excluding environmental samples from WGS projects >> >>> Posted date: Jul 4, 2009 4:41 AM >> >>> Number of letters in database: -1,125,070,205 >> >>> Number of sequences in database: 9,252,258 >> >>> >> >>> Lambda K H >> >>> 0.309 0.124 0.340 >> >>> Gapped >> >>> Lambda K H >> >>> 0.267 0.0410 0.140 >> >>> Matrix: BLOSUM62 >> >>> Gap Penalties: Existence: 11, Extension: 1 >> >>> Number of Sequences: 9252258 >> >>> Number of Hits to DB: 86493230 >> >>> Number of extensions: 3101413 >> >>> Number of successful extensions: 9001 >> >>> Number of sequences better than 100: 65 >> >>> Number of HSP's better than 100 without gapping: 0 >> >>> Number of HSP's gapped: 9000 >> >>> Number of HSP's successfully gapped: 66 >> >>> Length of query: 150 >> >>> Length of database: 3169897087 >> >>> Length adjustment: 113 >> >>> Effective length of query: 37 >> >>> Effective length 
of database: 2124391933 >> >>> Effective search space: 78602501521 >> >>> Effective search space used: 78602501521 >> >>> T: 11 >> >>> A: 40 >> >>> X1: 16 (7.1 bits) >> >>> X2: 38 (14.6 bits) >> >>> X3: 64 (24.7 bits) >> >>> S1: 42 (20.8 bits) >> >>> S2: 65 (29.6 bits) >> >>> >> >>> >> >>>> -----Original Message----- >> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> >>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >> >>>> Sent: Sunday, 28 June 2009 10:15 p.m. >> >>>> To: BioPerl List >> >>>> Subject: [Bioperl-l] different results with remote-blast skript >> >>>> >> >>>> Hi again :) >> >>>> please, I only have this little question: >> >>>> why do I get different results with my remote::blast perl skript >> >>>> then on the >> >>>> ncbi blast homepage? >> >>>> I am using blastp, the query is an amino-sequence (different >> >>>> results with any >> >>>> sequence, differences not only in number of hits but even in e- >> >>>> values, scores >> >>>> etc...), the database is 'nr'. 
>> >>>> PLEASE help me, >> >>>> thank you in advance, >> >>>> Jonas >> >>>> >> >>>> ps: my skript: >> >>>> >> ############################################################################## >> >>>> ## >> >>>> use Bio::Seq::SeqFactory; >> >>>> use Bio::Tools::Run::RemoteBlast; >> >>>> use strict; >> >>>> my @blast_report; >> >>>> my $prog = 'blastp'; >> >>>> my $db = 'nr'; >> >>>> my $e_val= '1e-10'; >> >>>> #my $e_val= '10'; >> >>>> my @params = ( '-prog' => $prog, >> >>>> '-data' => $db, >> >>>> '-expect' => $e_val, >> >>>> '-readmethod' => 'SearchIO' ); >> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >> >>>> $ >> >>>> Bio >> > >> >>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >> >>>> = '1'; >> >>>> >> >>>> my >> >>>> $ >> >>>> blast_seq >> >>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >> >>>> >> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD >> >>>> PDDEYE'; >> >>>> #$v is just to turn on and off the messages >> >>>> my $v = 1; >> >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >> >>>> 'Bio::PrimarySeq'); >> >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >> >>>> "$blast_seq"); >> >>>> my $filename='temp2.out'; >> >>>> my $r = $factory->submit_blast($seq); >> >>>> print STDERR "waiting..." if( $v > 0 ); >> >>>> while ( my @rids = $factory->each_rid ) >> >>>> { >> >>>> foreach my $rid ( @rids ) >> >>>> { >> >>>> my $rc = $factory->retrieve_blast($rid); >> >>>> if( !ref($rc) ) >> >>>> { >> >>>> if( $rc < 0 ) >> >>>> { >> >>>> $factory->remove_rid($rid); >> >>>> } >> >>>> print STDERR "." 
if ( $v > 0 ); >> >>>> } >> >>>> else >> >>>> { >> >>>> my $result = $rc->next_result(); >> >>>> $factory->save_output($filename); >> >>>> $factory->remove_rid($rid); >> >>>> print "\nQuery Name: ", $result->query_name(), >> >>>> "\n"; >> >>>> while ( my $hit = $result->next_hit ) >> >>>> { >> >>>> next unless ( $v > 0); >> >>>> print "\thit name is ", $hit->name, "\n"; >> >>>> while( my $hsp = $hit->next_hsp ) >> >>>> { >> >>>> print "\t\tscore is ", $hsp->score, >> "\n"; >> >>>> } >> >>>> } >> >>>> } >> >>>> } >> >>>> >> >>>> >> >>>> } >> >>>> @blast_report = get_file_data ($filename); >> >>>> return @blast_report; >> >>>> >> ############################################################################## >> >>>> #### >> >>>> _______________________________________________ >> >>>> Bioperl-l mailing list >> >>>> Bioperl-l at lists.open-bio.org >> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> = >> >>> = >> >>> >> ===================================================================== >> >>> Attention: The information contained in this message and/or >> >>> attachments >> >>> from AgResearch Limited is intended only for the persons or >> entities >> >>> to which it is addressed and may contain confidential and/or >> >>> privileged >> >>> material. Any review, retransmission, dissemination or other use >> >>> of, or >> >>> taking of any action in reliance upon, this information by >> persons or >> >>> entities other than the intended recipients is prohibited by >> >>> AgResearch >> >>> Limited. If you have received this message in error, please >> notify >> >>> the >> >>> sender immediately. 
From tristan.lefebure at gmail.com Thu Jul 9 11:50:20 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 9 Jul 2009 11:50:20 -0400
Subject: [Bioperl-l] Bootstrap, root, reroot...
Message-ID: <200907091150.20729.tristan.lefebure@gmail.com>

Hello,

I have been running into problems while rerooting trees that contain bootstrap scores. Basically, after rerooting the tree, some scores end up at the wrong place (i.e. the wrong node) and some nodes lose their score.
I found this thread from Bank Beszteri, back in 2007, that describes exactly the same problem:

http://lists.open-bio.org/pipermail/bioperl-l/2007-May/025599.html

I attach a script that reproduces the bug and implements the fix that Bank described (at least this is my understanding, and it works on this example):

#! /usr/bin/perl

use strict;
use warnings;
use Bio::TreeIO;

my $in = Bio::TreeIO->new(-format => 'newick',
                          -fh => \*DATA,
                          -internal_node_id => 'bootstrap');

my $out = Bio::TreeIO->new(-format => 'newick', -file => ">out.tree");

while( my $t = $in->next_tree ){
    my $old_root = $t->get_root_node();
    my ($b) = $t->find_node(-id =>"B");
    my $b_anc = $b->ancestor;
    $out->write_tree($t);

    #reroot with B -> wrong, and the tree is kind of weird
    $t->reroot($b);
    $out->write_tree($t);

    #reroot with B ancestor -> wrong
    $t->reroot($b_anc);
    $out->write_tree($t);

    #a fix, following Bank Beszteri description
    my $node = $old_root;
    while (my $anc_node = $node->ancestor) {
        $node->bootstrap($anc_node->bootstrap());
        $anc_node->bootstrap('');
        $node = $anc_node;
    }
    $out->write_tree($t); #->good this time
}

__DATA__
(A:52,(B:46,C:50)68:11,D:70);

Here is the output:

(A:52,(B:46,C:50)68:11,D:70);
((C:50,(A:52,D:70):11)68:46)B;
(B:46,C:50,(A:52,D:70):11)68;
(B:46,C:50,(A:52,D:70)68:11);

Trees #2 and #3 have the score 68 moved to the wrong node, while tree #4 is OK. (BTW tree #2 is really weird: unless B is the real ancestor (a fossil?), it does not make much sense to me.)

My understanding is that the problem is linked to the well-known difficulty of telling node labels from branch labels in newick trees. Bootstrap scores are branch attributes, not node attributes, but since Bio::TreeI has no branch/edge/bipartition object they are attached to a node, and in fact reflect the bootstrap score of the ancestral branch leading to that node.
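
To make the "branch attribute stored on a node" point concrete, here is a minimal sketch that does not use BioPerl at all. It assumes a toy tree made of plain hashes; the names %tree, reroot and to_newick are invented for this illustration and are not part of any BioPerl API. It only shows the idea that, when the parent pointers along a path are reversed, the bootstrap score has to travel with the branch exactly as the branch length does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy tree for (A:52,(B:46,C:50)68:11,D:70); built from plain hashes.
# 'len' and 'boot' are stored on the child node but describe the branch
# between that node and its parent. R and X are made-up internal node ids.
my %tree = (
    R => { parent => undef, len => undef, boot => undef },  # old root
    A => { parent => 'R',   len => 52,    boot => undef },
    X => { parent => 'R',   len => 11,    boot => 68    },  # ancestor of B and C
    D => { parent => 'R',   len => 70,    boot => undef },
    B => { parent => 'X',   len => 46,    boot => undef },
    C => { parent => 'X',   len => 50,    boot => undef },
);

# Hypothetical helper: make $new_root the root by reversing the parent
# pointers on the path up to the old root. Each reversed branch keeps its
# length and its bootstrap, so both move to the node that is now the
# child end of that branch.
sub reroot {
    my ($t, $new_root) = @_;
    my ($node, $new_parent, $len, $boot) = ($new_root, undef, undef, undef);
    while (defined $node) {
        my $rec = $t->{$node};
        my ($next, $next_len, $next_boot) = @{$rec}{qw(parent len boot)};
        @{$rec}{qw(parent len boot)} = ($new_parent, $len, $boot);
        ($new_parent, $len, $boot, $node) = ($node, $next_len, $next_boot, $next);
    }
    return $t;
}

# Hypothetical helper: write the subtree below $node in newick notation.
# Children are sorted by id; internal node ids are not printed.
sub to_newick {
    my ($t, $node) = @_;
    my @kids = sort { $a cmp $b }
               grep { defined $t->{$_}{parent} && $t->{$_}{parent} eq $node }
               keys %$t;
    my $s = @kids ? '(' . join(',', map { to_newick($t, $_) } @kids) . ')' : $node;
    $s .= $t->{$node}{boot}      if defined $t->{$node}{boot};
    $s .= ':' . $t->{$node}{len} if defined $t->{$node}{len};
    return $s;
}

print to_newick(\%tree, 'R'), ";\n";   # (A:52,D:70,(B:46,C:50)68:11);
reroot(\%tree, 'X');
print to_newick(\%tree, 'X'), ";\n";   # (B:46,C:50,(A:52,D:70)68:11);
```

The second line printed is the same tree as tree #4 above: the score 68 now sits on the (A,D) side of the branch it belongs to, next to the length 11.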

Trouble naturally comes when you are dealing with an unrooted tree or when you reroot a tree: a child can become an ancestor and, if the bootstrap score is not moved from the old child to the new child, it will end up attached to the wrong node.

I see several fixes for that:

1- Incorporate Bank's fix into the reroot() method, i.e. if there are bootstrap scores, after rerooting, the ones on the path from the old root to the new root should be moved to the right nodes.

2- Modify the way trees are stored in bioperl to incorporate a branch/edge/bipartition object, and move the bootstrap scores to it. That won't be easy and will break many things...

What do you think?

--Tristan

From MEC at stowers.org Thu Jul 9 11:56:25 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 9 Jul 2009 10:56:25 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To: <426C1893A5AD499DB4DBFEEBD257B254@jonas>
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas>
Message-ID:

Jonas,

If you want to continue to use the bioperl remoteblast interface, probably what you should do is simply call it twice.

First, run the search as you already know how to do, which will return results without CDD hits.

Then, to get the CDD results, call remoteblast a second time, this time using

    -database => 'CDD'
    -program => 'rpsblast'

However, the wrapper may object to the 'rpsblast' program, since it is not listed in the POD (http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm).

If so, my guess is that changing the perl wrapper to allow rpsblast will "just work" (tm). I've cc:ed cjfields at bioperl.org for his opinion on this.

Also, you might want to perform the CDD search first, especially if you are streaming results to an eyeball that might like something to look at while the second (presumably longer) search is running.

Cheers,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Jonas Schaer
> Sent: Thursday, July 09, 2009 5:16 AM
> To: BioPerl List; Smithies, Russell
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Hi guys,
> Thank you all so much for your help and patience :). Of
> course you were right and I finaly found the right
> put-parameter to get exactly the same hits as on the homepage.
> I do have an other question though :)...
> I now want to include a search for conserved domains, but
> when I try to use the CDD_SEARCH-parameter
> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH)
> like the other put-parameters the way chris once told
> me(works fine with the other params):
>
> my %put = (
>     WORD_SIZE => 3,
>     HITLIST_SIZE => 100,
>     THRESHOLD => 11,
>     FILTER => 'R',
>     GENETIC_CODE => 1,
>     CDD_SEARCH => 'on'    ###I tried it with 'true' and '1', too.
> );
>
> for my $putName (keys %put) {
>     $factory->submit_parameter($putName,$put{$putName});
> }
>
> ...an exception is thrown:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: CDD_SEARCH is not a valid PUT parameter.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:325
> STACK: main::blast_a_sequence firsteval0.8.pm:383
> STACK: main::blast_it firsteval0.8.pm:288
> STACK: firsteval0.8.pm:35
> -----------------------------------------------------------
>
> I guess somehow this could be the solution to my problem:
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous
> , but unfortunately I don't understand what to do.
> I'm so sorry to bother you with this but please help me once more...:)
>
> Best regards and thanks in advance,
> Jonas

From maj at fortinbras.us Thu Jul 9 14:02:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Jul 2009 14:02:01 -0400
Subject: [Bioperl-l] Bootstrap, root, reroot...
In-Reply-To: <200907091150.20729.tristan.lefebure@gmail.com>
References: <200907091150.20729.tristan.lefebure@gmail.com>
Message-ID:

Hi Tristan--
Would you enter this in bugzilla? I did an overhaul of the root/reroot a while back, and maybe you're running into some stuff I need to check out.
Thanks a lot-
Mark

From tristan.lefebure at gmail.com Thu Jul 9 14:30:57 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 9 Jul 2009 14:30:57 -0400
Subject: [Bioperl-l] Bootstrap, root, reroot...
In-Reply-To:
References: <200907091150.20729.tristan.lefebure@gmail.com>
Message-ID: <200907091430.58284.tristan.lefebure@gmail.com>

Done. bug #2877.

-Tristan

On Thursday 09 July 2009 14:02:01 Mark A. Jensen wrote:
> Hi Tristan--
> Would you enter this in bugzilla? I did an overhaul of
> the root/reroot a while back, and maybe you're running
> into some stuff I need to check out. Thanks a lot-
> Mark

From tristan.lefebure at gmail.com Thu Jul 9 15:18:39 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 9 Jul 2009 15:18:39 -0400
Subject: [Bioperl-l] Bootstrap, root, reroot...
In-Reply-To: <200907091430.58284.tristan.lefebure@gmail.com>
References: <200907091150.20729.tristan.lefebure@gmail.com> <200907091430.58284.tristan.lefebure@gmail.com>
Message-ID:

I just had a quick look at the reroot() function of TreeFunctionsI, and it looks like what should be done for the bootstrap scores is what is already done for the branch lengths.
See this loop starting line 954:

    # reverse the ancestor & children pointers
    my $former_anc = $tmp_node->ancestor;
    my @path_from_oldroot = ($self->get_lineage_nodes($tmp_node), $tmp_node);
    for (my $i = 0; $i < @path_from_oldroot - 1; $i++) {
        my $current = $path_from_oldroot[$i];
        my $next = $path_from_oldroot[$i + 1];
        $current->remove_Descendent($next);
        $current->branch_length($next->branch_length);
        $next->add_Descendent($current);
    }

It makes sense to me to treat bootstrap and branch length in a similar way:
the branch lengths are stored inside the node object but, like the bootstrap
scores, they really are branch attributes... no?

-Tristan

On Thu, Jul 9, 2009 at 2:30 PM, Tristan Lefebure wrote:

> Done. bug #2877.
> -Tristan
>
> On Thursday 09 July 2009 14:02:01 Mark A. Jensen wrote:
> > Hi Tristan--
> > Would you enter this in bugzilla? I did an overhaul of
> > the root/reroot a while back, and maybe you're running
> > into some stuff I need to check out. Thanks a lot-
> > Mark
> > ----- Original Message -----
> > From: "Tristan Lefebure"
> > To: "BioPerl List"
> > Sent: Thursday, July 09, 2009 11:50 AM
> > Subject: [Bioperl-l] Bootstrap, root, reroot...


From cjfields1 at gmail.com Thu Jul 9 15:19:15 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Thu, 9 Jul 2009 14:19:15 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas>
Message-ID: <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com>

I've scheduled this tentatively for the 1.6 release series (just not
sure when yet). It may work as is, but I haven't tried it out yet (and
am hazarding to guess it only retrieves the single main RID at the
moment).

chris

On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:

> Jonas,
>
> If you want to continue to use the bioperl remoteblast interface,
> probably what you should do is simply call it twice.
> > Once, as you already know how to do, which will return without CDD > results. > > Secondly, to get the CDD results, call remoteblast a second time. > This time, using > -database => 'CDD' > -program => 'rpsblast' > > However, the wrapper may object to the 'rpsblast' program. It is > not listed in the POD - http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm) > If so, my guess is that changing the perl wrapper to allow > rpsblast will "just work" (tm). I've cc:ed cjfields at bioperl.org for > his opinion on this. > > Also, you might want to perform the CDD search first, especially if > you are streaming results to eyeball that might like something to > look at while the second (presumably longer) search is running. > > Cheers, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jonas Schaer >> Sent: Thursday, July 09, 2009 5:16 AM >> To: BioPerl List; Smithies, Russell >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >> >> Hi guys, >> Thank you all so much for your help and patience :). Of >> course you were right and I finaly found the right >> put-parameter to get exactly the same hits as on the homepage. >> I do have an other question though :)... >> I now want to include a search for conserved domains, but >> when I try to use the CDD_SEARCH-parameter >> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >> sub:CDD_SEARCH) >> like the other put-parameters the way chris once told >> me(works fine with the other params): >> >> my %put = ( >> WORD_SIZE => 3, >> HITLIST_SIZE => 100, >> THRESHOLD => 11, >> FILTER => 'R', >> GENETIC_CODE => 1, >> CDD_SEARCH => 'on' >> ###I tried it >> with 'true' and '1', too. 
>> >> ); >> >> for my $putName (keys %put) { >> $factory->submit_parameter($putName,$put{$putName}); >> } >> >> >> ...an exception is thrown: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: CDD_SEARCH is not a valid PUT parameter. >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >> C:/Perl/site/lib/Bio/Tools >> /Run/RemoteBlast.pm:325 >> STACK: main::blast_a_sequence firsteval0.8.pm:383 >> STACK: main::blast_it firsteval0.8.pm:288 >> STACK: firsteval0.8.pm:35 >> ----------------------------------------------------------- . >> I guess somehow this could be the solution to my problem: >> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >> ub:RID-for-Simultaneous >> , but unfortunately I don't understand what to do. >> I'm so sorry to bother you with this but please help me once >> more...:) >> >> Best regards and thanks in advance, >> Jonas >> >> ----- Original Message ----- >> From: "Smithies, Russell" >> To: "'Jonas Schaer'" >> Cc: "'Chris Fields'" ; "'BioPerl List'" >> >> Sent: Monday, July 06, 2009 10:56 PM >> Subject: RE: [Bioperl-l] different results with remote-blast skript >> >> >> Hi Jonas, >> You can't just play with the BLAST parameters and hope for a "better" >> result. >> I'd suggest that if you aren't sure what they do, you should >> leave them >> alone as small changes can make huge differences in the >> output - it's quite >> possible to miss finding what you're looking for by using the wrong >> parameters. 
>> If all else fails, read the blast manual: >> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >> _all.html >> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >> Or Read Ian Korfs' excellent book: >> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >> >> Don't worry about the integer overflow bug as there's nothing >> you can do >> about it. If you're interested, Google and Wikipedia are your >> friends: >> http://en.wikipedia.org/wiki/Integer_overflow >> >> >> Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>> To: BioPerl List; Chris Fields >>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>> >>> Hi guys, thanks for your answers so far. >>> @jason: integer overflow in blast.... sorry, but what do >> you mean by that? >>> how can I fix it...? >>> >>> Since I never really changed any parameters I thought them >> all to be >>> default. >>> whatever, I tried to get "better" results with my prog by changing >>> these: >>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>> >> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >> STICS'} = >>> '1'; >>> with no effect...I guess these were default values anyway. >>> >>> So please maybe you can tell me all the other parameters I >> can change with >>> my >>> perl-skript AND how to do that? >>> Unfortunately both, perl and the blast-algorithm are pretty >> much new to >>> me, >>> maybe thats why I just cannot find out how to do that on my >> own... 
:/ >>> >>> Here is the output I get with my remote-blast skript: >>> >> ############################################################## >> ################ >>> ################################### >>> Query Name: >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>> L >>> hit name is ref|XP_001702807.1| >>> score is 442 >>> BLASTP 2.2.21+ >>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >> A. Schaffer, >>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >> Lipman (1997), >>> "Gapped >>> BLAST and PSI-BLAST: a new generation of protein database search >>> programs", >>> Nucleic Acids Res. 25:3389-3402. >>> >>> >>> Reference for composition-based statistics: Alejandro A. >>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >> John L. Spouge, >>> Yuri >>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >> "Improving the >>> accuracy of PSI-BLAST protein database searches with >> composition-based >>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005. >>> >>> >>> RID: 53STX5G2013 >>> >>> >>> Database: All non-redundant GenBank CDS >>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>> from WGS projects >>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>> ATGPDPDDEYE >>> Length=150 >>> >>> >>> >> Score >>> E >>> Sequences producing significant alignments: >> (Bits) >>> Value >>> >>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhard... 174 >>> 2e-42 >>> >>> >>> ALIGNMENTS >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] >>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>> Length=303 >>> >>> Score = 174 bits (442), Expect = 2e-42, Method: >> Composition-based >>> stats. 
>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >> Gaps = 0/150 >>> (0%) >>> >>> Query 1 >> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>> 60 >>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>> Sbjct 154 >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>> 213 >>> >>> Query 61 >> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>> 120 >>> >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>> Sbjct 214 >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>> 273 >>> >>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>> >>> >>> >>> Database: All non-redundant GenBank CDS >>> translations+PDB+SwissProt+PIR+PRF >>> excluding environmental samples from WGS projects >>> Posted date: Jul 5, 2009 4:41 AM >>> Number of letters in database: -1,124,994,511 >>> Number of sequences in database: 9,252,587 >>> >>> Lambda K H >>> 0.309 0.122 0.345 >>> Gapped >>> Lambda K H >>> 0.267 0.0410 0.140 >>> Matrix: BLOSUM62 >>> Gap Penalties: Existence: 11, Extension: 1 >>> Number of Sequences: 9252587 >>> Number of Hits to DB: 60273703 >>> Number of extensions: 1448367 >>> Number of successful extensions: 2103 >>> Number of sequences better than 10: 0 >>> Number of HSP's better than 10 without gapping: 0 >>> Number of HSP's gapped: 2113 >>> Number of HSP's successfully gapped: 0 >>> Length of query: 150 >>> Length of database: 3169972781 >>> Length adjustment: 113 >>> Effective length of query: 37 >>> Effective length of database: 2124430450 >>> Effective search space: 78603926650 >>> Effective search space used: 78603926650 >>> T: 11 >>> A: 40 >>> X1: 16 (7.1 bits) >>> X2: 38 (14.6 bits) >>> X3: 64 (24.7 bits) >>> S1: 42 (20.8 bits) >>> S2: 74 (33.1 bits) >>> >>> >> ############################################################## >> ################ >>> ################################### >>> and here are 
the hits (?) of the blast-algorithm on the >> ncbi-homepage with >>> the same query of course: >>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhard... 300 >>> 3e-80 >>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >> [Acyrtho... 36.2 >>> 1.1 >>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >> [Blautia... 35.4 >>> 1.8 >>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >> brazil... 34.3 >>> 4.2 >>> ref|XP_680841.1| hypothetical protein AN7572.2 >> [Aspergillus n... 33.5 >>> 6.0 >>> ref|YP_001768110.1| hypothetical protein M446_1150 >> [Methyloba... 33.5 >>> 7.0 >>> >> ############################################################## >> ################ >>> ###################################at >>> least the first hit is the same, but even there there is a >> different score >>> and e-value. >>> >>> thanks so much for any help :) >>> regards, jonas >>> >>> >>> ----- Original Message ----- >>> From: "Chris Fields" >>> To: "Jason Stajich" >>> Cc: "Smithies, Russell" >> ; "'BioPerl >>> List'" ; "'Jonas Schaer'" >>> >>> Sent: Monday, July 06, 2009 12:51 AM >>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>> >>> >>>> That inspires confidence ;> >>>> >>>> chris >>>> >>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>> >>>>> integer overflow in blast.... >>>>> >>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>> >>>>>> I'd guess it's a difference in the parameters used. 
>>>>>> Interesting that both have the number of letters in the db as >>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>> >>>>>> Stats from your remote_blast: >>>>>> >>>>>> 'stats' => { >>>>>> 'S1' => '42', >>>>>> 'S1_bits' => '20.8', >>>>>> 'lambda' => '0.309', >>>>>> 'entropy' => '0.345', >>>>>> 'kappa_gapped' => '0.0410', >>>>>> 'T' => '11', >>>>>> 'kappa' => '0.122', >>>>>> 'X3_bits' => '24.7', >>>>>> 'X1' => '16', >>>>>> 'lambda_gapped' => '0.267', >>>>>> 'X2' => '38', >>>>>> 'S2' => '74', >>>>>> 'seqs_better_than_cutoff' => '0', >>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>> 'Hits_to_DB' => '60102303', >>>>>> 'dbletters' => '-1125070205', >>>>>> 'A' => '40', >>>>>> 'num_successful_extensions' => '2004', >>>>>> 'num_extensions' => '1436892', >>>>>> 'X1_bits' => '7.1', >>>>>> 'X3' => '64', >>>>>> 'entropy_gapped' => '0.140', >>>>>> 'dbentries' => '9252258', >>>>>> 'X2_bits' => '14.6', >>>>>> 'S2_bits' => '33.1' >>>>>> } >>>>>> >>>>>> >>>>>> Stats from a blast done on the NCBI webpage: >>>>>> >>>>>> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt >>>>>> +PIR+PRF >>>>>> excluding environmental samples from WGS projects >>>>>> Posted date: Jul 4, 2009 4:41 AM >>>>>> Number of letters in database: -1,125,070,205 >>>>>> Number of sequences in database: 9,252,258 >>>>>> >>>>>> Lambda K H >>>>>> 0.309 0.124 0.340 >>>>>> Gapped >>>>>> Lambda K H >>>>>> 0.267 0.0410 0.140 >>>>>> Matrix: BLOSUM62 >>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>> Number of Sequences: 9252258 >>>>>> Number of Hits to DB: 86493230 >>>>>> Number of extensions: 3101413 >>>>>> Number of successful extensions: 9001 >>>>>> Number of sequences better than 100: 65 >>>>>> Number of HSP's better than 100 without gapping: 0 >>>>>> Number of HSP's gapped: 9000 >>>>>> Number of HSP's successfully gapped: 66 >>>>>> Length of query: 150 >>>>>> Length of database: 3169897087 >>>>>> Length adjustment: 113 >>>>>> Effective length of query: 37 >>>>>> Effective length 
of database: 2124391933 >>>>>> Effective search space: 78602501521 >>>>>> Effective search space used: 78602501521 >>>>>> T: 11 >>>>>> A: 40 >>>>>> X1: 16 (7.1 bits) >>>>>> X2: 38 (14.6 bits) >>>>>> X3: 64 (24.7 bits) >>>>>> S1: 42 (20.8 bits) >>>>>> S2: 65 (29.6 bits) >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>> To: BioPerl List >>>>>>> Subject: [Bioperl-l] different results with remote-blast skript >>>>>>> >>>>>>> Hi again :) >>>>>>> please, I only have this little question: >>>>>>> why do I get different results with my remote::blast >> perl skript >>>>>>> then on the >>>>>>> ncbi blast homepage? >>>>>>> I am using blastp, the query is an amino-sequence (different >>>>>>> results with any >>>>>>> sequence, differences not only in number of hits but even in e- >>>>>>> values, scores >>>>>>> etc...), the database is 'nr'. 
>>>>>>> PLEASE help me, >>>>>>> thank you in advance, >>>>>>> Jonas >>>>>>> >>>>>>> ps: my skript: >>>>>>> >>> >> ############################################################## >> ################ >>>>>>> ## >>>>>>> use Bio::Seq::SeqFactory; >>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>> use strict; >>>>>>> my @blast_report; >>>>>>> my $prog = 'blastp'; >>>>>>> my $db = 'nr'; >>>>>>> my $e_val= '1e-10'; >>>>>>> #my $e_val= '10'; >>>>>>> my @params = ( '-prog' => $prog, >>>>>>> '-data' => $db, >>>>>>> '-expect' => $e_val, >>>>>>> '-readmethod' => 'SearchIO' ); >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>> $ >>>>>>> Bio >>>>>>> >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>> = '1'; >>>>>>> >>>>>>> my >>>>>>> $ >>>>>>> blast_seq >>>>>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>> >>> >> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >> AFRQAHQNTAMATGPD >>>>>>> PDDEYE'; >>>>>>> #$v is just to turn on and off the messages >>>>>>> my $v = 1; >>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>> 'Bio::PrimarySeq'); >>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >>>>>>> "$blast_seq"); >>>>>>> my $filename='temp2.out'; >>>>>>> my $r = $factory->submit_blast($seq); >>>>>>> print STDERR "waiting..." if( $v > 0 ); >>>>>>> while ( my @rids = $factory->each_rid ) >>>>>>> { >>>>>>> foreach my $rid ( @rids ) >>>>>>> { >>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>> if( !ref($rc) ) >>>>>>> { >>>>>>> if( $rc < 0 ) >>>>>>> { >>>>>>> $factory->remove_rid($rid); >>>>>>> } >>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>> } >>>>>>> else >>>>>>> { >>>>>>> my $result = $rc->next_result(); >>>>>>> $factory->save_output($filename); >>>>>>> $factory->remove_rid($rid); >>>>>>> print "\nQuery Name: ", >> $result->query_name(), >>>>>>> "\n"; >>>>>>> while ( my $hit = $result->next_hit ) >>>>>>> { >>>>>>> next unless ( $v > 0); >>>>>>> print "\thit name is ", $hit->name, "\n"; >>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>> { >>>>>>> print "\t\tscore is ", >> $hsp->score, "\n"; >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> >>>>>>> } >>>>>>> @blast_report = get_file_data ($filename); >>>>>>> return @blast_report; >>>>>>> >>> >> ############################################################## >> ################ >>>>>>> #### >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> = >>>>>> = >>>>>> >> ===================================================================== >>>>>> Attention: The information contained in this message and/or >>>>>> attachments >>>>>> from AgResearch Limited is intended only for the >> persons or entities >>>>>> to which it is addressed and may contain confidential and/or >>>>>> privileged >>>>>> material. Any review, retransmission, dissemination or other use >>>>>> of, or >>>>>> taking of any action in reliance upon, this information >> by persons or >>>>>> entities other than the intended recipients is prohibited by >>>>>> AgResearch >>>>>> Limited. If you have received this message in error, >> please notify >>>>>> the >>>>>> sender immediately. 
>>>>>> = >>>>>> = >>>>>> >> ===================================================================== >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> -- >>>>> Jason Stajich >>>>> jason at bioperl.org >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> -------------------------------------------------------------- >> ---------------- >>> -- >>> >>> >>> >>> No virus found in this incoming message. >>> Checked by AVG - www.avg.com >>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >> Date: 07/05/09 >>> 05:53:00 >> >> >> -------------------------------------------------------------- >> ------------------ >> >> >> >> No virus found in this incoming message. >> Checked by AVG - www.avg.com >> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release >> Date: 07/05/09 >> 17:54:00 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From jay at jays.net Thu Jul 9 15:47:02 2009 From: jay at jays.net (Jay Hannah) Date: Thu, 09 Jul 2009 14:47:02 -0500 Subject: [Bioperl-l] [patch] Bio/TreeIO.pm POD patch Message-ID: <4A564936.2070909@jays.net> Hello, $tree->size throws this error: Can't locate object method "size" via package "Bio::Tree::Tree" at conv.pl line 17, line 1. Below, a POD patch to Bio::TreeIO to fix (sidestep) that problem and make podchecker happier. 
Thanks,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


Index: Bio/TreeIO.pm
===================================================================
--- Bio/TreeIO.pm (revision 15841)
+++ Bio/TreeIO.pm (working copy)
@@ -18,13 +18,11 @@

 =head1 SYNOPSIS

-  {
-      use Bio::TreeIO;
-      my $treeio = Bio::TreeIO->new('-format' => 'newick',
-                                    '-file' => 'globin.dnd');
-      while( my $tree = $treeio->next_tree ) {
-          print "Tree is ", $tree->size, "\n";
-      }
+  use Bio::TreeIO;
+  my $treeio = Bio::TreeIO->new('-format' => 'newick',
+                                '-file' => 'globin.dnd');
+  while( my $tree = $treeio->next_tree ) {
+      print "Tree has ", $tree->number_nodes, " nodes.\n";
   }

 =head1 DESCRIPTION
@@ -45,11 +43,11 @@
 http://bioperl.org/wiki/Mailing_lists - About the mailing lists

 =head2 Support
-
+
 Please direct usage questions or support issues to the mailing list:
-
+
 L
-
+
 rather than to the module maintainer directly. Many experienced and
 reponsive experts will be able look at the problem and quickly
 address it. Please include a thorough description of the problem


From vaughn at cshl.edu Thu Jul 9 16:42:53 2009
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Thu, 9 Jul 2009 16:42:53 -0400
Subject: [Bioperl-l] Next-gen modules
Message-ID: <1051DB29-0A4D-4A5A-A163-B698AFB97FFA@cshl.edu>

A lot of what is being discussed is handled very elegantly by Assaf
Gordon's FASTX toolkit. I spent a lot of time trying to roll my own
solutions for basic Illumina processing, and I've found his utilities
to work much more reliably and much faster (almost real-time) than
anything I could design in Perl. They are also the basis for Illumina
handling in Galaxy, which is a second vote of confidence. They've got
clean CLI interfaces and should be very easy to wrap in Bio::SeqUtils
or Bio::Run packages.

Matt

--
Matthew W. Vaughn, Ph.D.
Research Assistant Professor
Cold Spring Harbor Laboratory
1 Bungtown Road
Williams #5
Cold Spring Harbor, NY 11724 USA
tel: (516) 367-8808
cell: (516) 353-7055
google-talk: matt.vaughn at gmail.com


From IRytsareva at dow.com Thu Jul 9 16:33:01 2009
From: IRytsareva at dow.com (Rytsareva, Inna (I))
Date: Thu, 9 Jul 2009 16:33:01 -0400
Subject: [Bioperl-l] Modify wwwBLAST html report
Message-ID: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDE46C@USMDLMDOWX028.dow.com>

Hello.
Thanks so much for your help!!

I need some ideas on how to do the following:
There is a wwwBLAST HTML report from blast.REAL, and I have references
(maybe I'll place them in an array of strings). Each string is a
reference to GBrowse for one HSP, like:
HSP 188396 189355
So, I'd need to modify this HTML page and "push" a reference between Sbjct and Score or between tags and
 for each HSP.
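
One possible way to do the splicing described above, as a minimal sketch in plain Perl: it assumes that every HSP block in the report text starts with a line beginning with "Score =", and that the links are collected in the same order as the HSPs appear in the report. The helper name add_hsp_links, the toy report lines and the example.org URLs are all made up for illustration; none of them come from wwwBLAST, GBrowse or BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or wwwBLAST): walk the report
# line by line and emit one link in front of every HSP header, i.e. in
# front of every line that starts with "Score =".
sub add_hsp_links {
    my ($report_lines, $links) = @_;
    my @queue = @$links;    # work on a copy; the caller's array is untouched
    my @out;
    for my $line (@$report_lines) {
        # Assumption: one "Score =" line per HSP, in the same order as @queue.
        if ($line =~ /^\s*Score\s*=/ && @queue) {
            push @out, shift(@queue) . "\n";
        }
        push @out, $line;
    }
    return @out;
}

# Toy data standing in for the real report and the real GBrowse links.
my @report = (
    ">chr1 example subject\n",
    " Score = 100 bits (50), Expect = 1e-20\n",
    "Query  1    ACGT  4\n",
    "Sbjct  101  ACGT  104\n",
    " Score = 80 bits (40), Expect = 1e-10\n",
    "Query  10   ACGT  13\n",
    "Sbjct  201  ACGT  204\n",
);
my @links = (
    '<a href="http://example.org/gbrowse?name=chr1:101..104">HSP chr1 101 104</a>',
    '<a href="http://example.org/gbrowse?name=chr1:201..204">HSP chr1 201 204</a>',
);

my @annotated = add_hsp_links(\@report, \@links);
print @annotated;
```

With this shape the report would be buffered first, the links built from the parsed HSPs, and only the annotated lines printed, instead of printing the report and then the links.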
 
For now my script is:
####################################################################################################
#!/usr/bin/perl

#
# $Id: blast.cgi,v 1.1 2002/08/06 19:03:51 dondosha Exp $
#

$|=1;
use CGI::Pretty qw (:standard);
use CGI::Carp qw (fatalsToBrowser);
use CGI;


use HTML::Strip;

use IO::String;
use List::Util qw (min max);
use Switch;
use File::Temp qw/ tempfile tempdir /;


use Data::Dumper;

use Bio::SearchIO;
use Bio::SearchIO::blast;


print "Content-type: text/html \n\n";

$ENV{DEBUG_COMMAND_LINE} = 'TRUE';
$ENV{BLASTDB} = "db";

open (BLAST,"cat $blast_form_data |./blast.REAL|");
@blast = <BLAST>;
my $hs = HTML::Strip->new();

my ($o_f,$out_file) = tempfile();
open (OUTFILE,">$out_file") or die "Cannot write $out_file: $!";

foreach $blast (@blast)
{
print $blast; # printing BLAST 
my $text=$hs->parse($blast);
print OUTFILE $text;
}
close OUTFILE;
$hs->eof;

my $q = new CGI;

my $in = Bio::SearchIO->new(-file   => $out_file,
                            -format => 'blast') or die $!;

while (my $result = $in->next_result) 
{
	while (my $hit = $result->next_hit)
	{
		while (my $hsp = $hit->next_hsp) 	
		{
			$qname = $hit->name;
			$qstart = $hsp->hit->start;
			$qend = $hsp->hit->end;

			print" HSP $qname $qstart $qend
\n";
		}
	}
}
# remove the temporary file once, after all hits have been read
unlink $out_file;
################################################################################################

It prints the BLAST report and then the links.

Thanks,
Inna Rytsareva
Discovery Information Management
Dow AgroSciences
Indianapolis, IN
317-337-4716


From cjfields at illinois.edu Thu Jul 9 16:47:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 9 Jul 2009 15:47:07 -0500
Subject: [Bioperl-l] [patch] Bio/TreeIO.pm POD patch
In-Reply-To: <4A564936.2070909@jays.net>
References: <4A564936.2070909@jays.net>
Message-ID:

committed in r15842. thanks!

chris

On Jul 9, 2009, at 2:47 PM, Jay Hannah wrote:

> Hello,
>
> $tree->size throws this error:
>
> Can't locate object method "size" via package "Bio::Tree::Tree" at
> conv.pl line 17, line 1.
>
> Below, a POD patch to Bio::TreeIO to fix (sidestep) that problem and
> make podchecker happier.
>
> Thanks,
>
> j
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net Thu Jul 9 17:03:52 2009
From: jay at jays.net (Jay Hannah)
Date: Thu, 09 Jul 2009 16:03:52 -0500
Subject: [Bioperl-l] X-Greylist: Delayed
Message-ID: <4A565B38.1090408@jays.net>

(Thanks for committing r15842 Chris!!)

I noticed this header in my last post (the copy MailMan sent me):

X-Greylist: Delayed for 00:29:57 by milter-greylist-2.0.2
(portal.open-bio.org [207.154.17.70]);

My post was, indeed, delayed by ~30 minutes. Is that intentional? And/or
is there something I can do differently?

Full headers of that email: http://scsys.co.uk:8001/30919

Thanks,

j


From maj at fortinbras.us Thu Jul 9 17:55:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Jul 2009 17:55:23 -0400
Subject: [Bioperl-l] Fw: Bootstrap, root, reroot...
Message-ID: <999D8B7079824883AF63627CAF819614@NewLife>

up to the list, too--
----- Original Message -----
From: "Mark A. Jensen"
To: "Tristan Lefebure"
Sent: Thursday, July 09, 2009 3:37 PM
Subject: Re: [Bioperl-l] Bootstrap, root, reroot...

> I'll bet you're right-- I'll put this in as a comment-- thanks!
> ----- Original Message -----
> From: "Tristan Lefebure"
> To: "BioPerl List"
> Sent: Thursday, July 09, 2009 3:18 PM
> Subject: Re: [Bioperl-l] Bootstrap, root, reroot...
>
>
>>I just add a quick look at the reroot() function of TreeFunctionsI, and it
>> looks like that what should be done for the bootstrap scores is what is
>> already done for the branch lengths.
>> See this loop starting line 954:
>>
>>   # reverse the ancestor & children pointers
>>   my $former_anc = $tmp_node->ancestor;
>>   my @path_from_oldroot = ($self->get_lineage_nodes($tmp_node),
>>                            $tmp_node);
>>   for (my $i = 0; $i < @path_from_oldroot - 1; $i++) {
>>       my $current = $path_from_oldroot[$i];
>>       my $next = $path_from_oldroot[$i + 1];
>>       $current->remove_Descendent($next);
>>       $current->branch_length($next->branch_length);
>>       $next->add_Descendent($current);
>>   }
>>
>> It makes sense to me to treat bootstrap and branch length in a similar way:
>> the branch lengths are stored inside the node object but, like the bootstrap,
>> they really are branch attributes... No?
>>
>> -Tristan
>>
>> On Thu, Jul 9, 2009 at 2:30 PM, Tristan Lefebure wrote:
>>
>>> Done. bug #2877.
>>> -Tristan
>>>
>>> On Thursday 09 July 2009 14:02:01 Mark A. Jensen wrote:
>>> > Hi Tristan--
>>> > Would you enter this in bugzilla? I did an overhaul of
>>> > the root/reroot a while back, and maybe you're running
>>> > into some stuff I need to check out. Thanks a lot-
>>> > Mark
>>> > ----- Original Message -----
>>> > From: "Tristan Lefebure"
>>> > To: "BioPerl List"
>>> > Sent: Thursday, July 09, 2009 11:50 AM
>>> > Subject: [Bioperl-l] Bootstrap, root, reroot...
>>> >
>>> > > Hello,
>>> > >
>>> > > I have been bumping into problems while rerooting trees
>>> > > that contain bootstrap scores. Basically, after
>>> > > re-rooting the tree, some scores end up at the wrong
>>> > > place (i.e. node) and some nodes lose their score. I
>>> > > found this thread from Bank Beszteri, back in 2007, that
>>> > > explains exactly the same problems:
>>> > >
>>> > > http://lists.open-bio.org/pipermail/bioperl-l/2007-May/025599.html
>>> > >
>>> > > I attach a script that reproduces the bug and
>>> > > implements the fix that Bank described (at least this
>>> > > is my understanding, and it works on this example):
>>> > >
>>> > >
>>> > > #! /usr/bin/perl
>>> > >
>>> > > use strict;
>>> > > use warnings;
>>> > > use Bio::TreeIO;
>>> > >
>>> > >
>>> > > my $in = Bio::TreeIO->new(-format => 'newick',
>>> > >                           -fh => \*DATA,
>>> > >                           -internal_node_id => 'bootstrap');
>>> > >
>>> > > my $out = Bio::TreeIO->new(-format => 'newick',
>>> > >                            -file => ">out.tree");
>>> > >
>>> > > while( my $t = $in->next_tree ){
>>> > >     my $old_root = $t->get_root_node();
>>> > >     my ($b) = $t->find_node(-id =>"B");
>>> > >     my $b_anc = $b->ancestor;
>>> > >     $out->write_tree($t);
>>> > >
>>> > >     #reroot with B -> wrong, and the tree is kind of weird
>>> > >     $t->reroot($b);
>>> > >     $out->write_tree($t);
>>> > >
>>> > >     #reroot with B ancestor -> wrong
>>> > >     $t->reroot($b_anc);
>>> > >     $out->write_tree($t);
>>> > >
>>> > >     #a fix, following Bank Beszteri's description
>>> > >     my $node = $old_root;
>>> > >     while (my $anc_node = $node->ancestor) {
>>> > >         $node->bootstrap($anc_node->bootstrap());
>>> > >         $anc_node->bootstrap('');
>>> > >         $node = $anc_node;
>>> > >     }
>>> > >     $out->write_tree($t); #->good this time
>>> > > }
>>> > >
>>> > >
>>> > > __DATA__
>>> > > (A:52,(B:46,C:50)68:11,D:70);
>>> > >
>>> > >
>>> > > Here is the output:
>>> > >
>>> > > (A:52,(B:46,C:50)68:11,D:70);
>>> > > ((C:50,(A:52,D:70):11)68:46)B;
>>> > > (B:46,C:50,(A:52,D:70):11)68;
>>> > > (B:46,C:50,(A:52,D:70)68:11);
>>> > >
>>> > >
>>> > > Trees #2 and #3 have the score 68 moved to the wrong
>>> > > node, while tree #4 is OK. (BTW tree #2 is really
>>> > > weird: unless B is the real ancestor (a fossil?),
>>> > > it does not make much sense to me.)
>>> > >
>>> > > My understanding here is that the problem is linked to
>>> > > the well-known difficulty of differentiating node labels
>>> > > from branch labels in newick trees.
>>> > > Bootstrap scores are branch attributes, not node
>>> > > attributes, but since Bio::TreeI has no
>>> > > branch/edge/bipartition object they are attached to a
>>> > > node, and in fact reflect the bootstrap score of the
>>> > > ancestral branch leading to that node. Trouble naturally
>>> > > comes when you are dealing with an unrooted tree or
>>> > > reroot a tree: a child can become an ancestor, and, if
>>> > > the bootstrap score is not moved from the old child to
>>> > > the new child, it will end up attached at the wrong
>>> > > place (i.e. wrong node).
>>> > >
>>> > > I see several fixes for that:
>>> > >
>>> > > 1- Incorporate Bank's fix into the root() method. I.e.
>>> > > if there are bootstrap scores, after re-rooting, the ones
>>> > > on the path from the old to the new ancestor should be
>>> > > moved to the right node.
>>> > >
>>> > > 2- Modify the way trees are stored in bioperl to
>>> > > incorporate a branch/edge/bipartition object, and move
>>> > > the bootstrap scores to it. That won't be easy and
>>> > > will break many things...
>>> > >
>>> > >
>>> > > What do you think?
>>> > >
>>> > > --Tristan
>>> > >
>>> > > _______________________________________________
>>> > > Bioperl-l mailing list
>>> > > Bioperl-l at lists.open-bio.org
>>> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

From cjfields at illinois.edu Thu Jul 9 18:48:13 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 9 Jul 2009 17:48:13 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1051DB29-0A4D-4A5A-A163-B698AFB97FFA@cshl.edu>
References: <1051DB29-0A4D-4A5A-A163-B698AFB97FFA@cshl.edu>
Message-ID:

Looks very promising. Do you know if it's capable of reporting back indices, e.g. for building flat-file databases?
chris On Jul 9, 2009, at 3:42 PM, Matthew Vaughn wrote: > A lot of what is being discussed is handled very elegantly by Assaf > Gordon's FASTX toolkit . I > spent a lot of time trying to roll my own solutions for basic > Illumina processing and I've found his utilities to work much more > reliably and very fast (almost real-time) than anything I could > design in Perl. They are also the basis for Illumina handling in > Galaxy, which is a second vote of confidence. > > They've got clean CLI interfaces and should be very easy to wrap in > Bio::SeqUtils or Bio::Run packages. > > Matt > > -- > Matthew W. Vaughn, Ph.D. > Research Assistant Professor > Cold Spring Harbor Laboratory > 1 Bungtown Road > Williams #5 > Cold Spring Harbor, NY 11724 USA > > tel: (516) 367-8808 > cell: (516) 353-7055 > google-talk: matt.vaughn at gmail.com > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields1 at gmail.com Thu Jul 9 21:44:12 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Thu, 9 Jul 2009 20:44:12 -0500 Subject: [Bioperl-l] update PLATFORMS file In-Reply-To: <4A569AC4.5080200@cornell.edu> References: <4A569AC4.5080200@cornell.edu> Message-ID: <08A95736-7BD5-48A9-9786-D3D1A3520EDE@gmail.com> Beat me to it! So, what does everyone think? chris On Jul 9, 2009, at 8:35 PM, Robert Buels wrote: > Taking this to bioperl-l: > > koenvanderdrift at gmail.com said: > > The PLATFORMS document contains a *very* outdated link on how to > install > bioperl on Macs. Please remove this link: "Steve Cannon has made > available > Bioperl OS X installation directions and notes online at the > following URL: http://www.tc.umn.edu/~cann0010/Bioperl_OSX_install.html > " > > ------- Comment #1 from cjfields at bioperl.org 2009-07-09 21:18 EST > ------- > I think we could actually remove this file completely. 
It hasn't > been updated > in quite a while and any information it contains would probably > serve a better > purpose elsewhere. > > > So, remove the PLATFORMS file? Is all of the stuff in there on the > wiki? > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > From rbuels at gmail.com Thu Jul 9 21:35:00 2009 From: rbuels at gmail.com (Robert Buels) Date: Thu, 09 Jul 2009 18:35:00 -0700 Subject: [Bioperl-l] update PLATFORMS file Message-ID: <4A569AC4.5080200@cornell.edu> Taking this to bioperl-l: koenvanderdrift at gmail.com said: The PLATFORMS document contains a *very* outdated link on how to install bioperl on Macs. Please remove this link: "Steve Cannon has made available Bioperl OS X installation directions and notes online at the following URL: http://www.tc.umn.edu/~cann0010/Bioperl_OSX_install.html" ------- Comment #1 from cjfields at bioperl.org 2009-07-09 21:18 EST ------- I think we could actually remove this file completely. It hasn't been updated in quite a while and any information it contains would probably serve a better purpose elsewhere. So, remove the PLATFORMS file? Is all of the stuff in there on the wiki? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From maj.fortinbras at gmail.com Thu Jul 9 23:28:52 2009 From: maj.fortinbras at gmail.com (Mark Jensen) Date: Thu, 9 Jul 2009 23:28:52 -0400 Subject: [Bioperl-l] X-Greylist: Delayed Message-ID: <4239c0bb0907092028w1a321724jadd3fe6e4960b47a@mail.gmail.com> This is a test. MAJ From maj at fortinbras.us Thu Jul 9 23:38:58 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 9 Jul 2009 23:38:58 -0400 Subject: [Bioperl-l] X-Greylist: Delayed In-Reply-To: <4A565B38.1090408@jays.net> References: <4A565B38.1090408@jays.net> Message-ID: Good eye, Jay. Poking around, I find that some DNS names are more equal than others. My test post from my gmail account maj.fortinbras -at- gmail -dot- com had header X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 23:29:00 -0400 (EDT) X-Greylist: Sender DNS name whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); Thu, 09 Jul 2009 23:28:53 -0400 (EDT) while the domain with less cachet, maj -at- fortinbras -dot- us, has X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 14:25:45 -0400 (EDT) X-Greylist: Delayed for 00:16:28 by milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); Thu, 09 Jul 2009 14:18:37 -0400 (EDT) and has forever; this explains the infinite waiting time I typically also experience. Some fortunate posters even obtain the coveted X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Tue, 07 Jul 2009 13:30:29 -0400 (EDT) X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); This may be even more stupendous than a commit bit. cheers, Mark ----- Original Message ----- From: "Jay Hannah" To: Sent: Thursday, July 09, 2009 5:03 PM Subject: [Bioperl-l] X-Greylist: Delayed > (Thanks for committing r15842 Chris!!) > > > I noticed this header in my last post (the copy MailMan sent me): > > X-Greylist: Delayed for 00:29:57 by milter-greylist-2.0.2 > (portal.open-bio.org [207.154.17.70]); > > My post was, indeed, delayed by ~30 minutes. > > > Is that intentional? And/or is there something I can do differently? 
> > Full headers of that email: http://scsys.co.uk:8001/30919
> >
> > Thanks,
> >
> > j
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From jason at bioperl.org Fri Jul 10 01:25:27 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 9 Jul 2009 22:25:27 -0700
Subject: [Bioperl-l] X-Greylist: Delayed
In-Reply-To:
References: <4A565B38.1090408@jays.net>
Message-ID: <0D6E20B7-47A8-4E64-B973-35618300D246@bioperl.org>

The IP your mail comes from is initially greylisted (hence the 30 min delay, which requires the host to resend) and is whitelisted after that, so a frequent poster's originating IP will end up cached. So it depends on whether your IP is dynamic, how often you are emailing the list, etc.

All this was discussed at least once a while ago.
http://portal.open-bio.org/pipermail/bioperl-l/2006-April/021340.html

Mailing list problems should probably go to root-l at open-bio.org if you want specific help too.

-jason

On Jul 9, 2009, at 8:38 PM, Mark A. Jensen wrote:

> Good eye, Jay. Poking around, I find that some DNS names are
> more equal than others.
My test post from my gmail account > maj.fortinbras -at- gmail -dot- com had header > > X-Greylist: Sender IP whitelisted, not delayed by milter- > greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 > 23:29:00 -0400 (EDT) > X-Greylist: Sender DNS name whitelisted, not delayed by milter- > greylist-2.0.2 > (portal.open-bio.org [207.154.17.70]); > Thu, 09 Jul 2009 23:28:53 -0400 (EDT) > > while the domain with less cachet, > maj -at- fortinbras -dot- us, > has > > X-Greylist: Sender IP whitelisted, not delayed by milter- > greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 > 14:25:45 -0400 (EDT) > X-Greylist: Delayed for 00:16:28 by milter-greylist-2.0.2 > (portal.open-bio.org > [207.154.17.70]); Thu, 09 Jul 2009 14:18:37 -0400 (EDT) > > and has forever; this explains the infinite waiting time I typically > also experience. > > Some fortunate posters even obtain the coveted > > X-Greylist: Sender IP whitelisted, not delayed by milter- > greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Tue, 07 Jul 2009 > 13:30:29 -0400 (EDT) > X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by > milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); > > This may be even more stupendous than a commit bit. > > cheers, > Mark > ----- Original Message ----- From: "Jay Hannah" > To: > Sent: Thursday, July 09, 2009 5:03 PM > Subject: [Bioperl-l] X-Greylist: Delayed > > >> (Thanks for committing r15842 Chris!!) >> >> >> I noticed this header in my last post (the copy MailMan sent me): >> >> X-Greylist: Delayed for 00:29:57 by milter-greylist-2.0.2 >> (portal.open-bio.org [207.154.17.70]); >> >> My post was, indeed, delayed by ~30 minutes. >> >> >> Is that intentional? And/or is there something I can do differently? 
>> >> Full headers of that email: http://scsys.co.uk:8001/30919 >> >> Thanks, >> >> j >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From maj at fortinbras.us Fri Jul 10 08:43:04 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 10 Jul 2009 08:43:04 -0400 Subject: [Bioperl-l] X-Greylist: Delayed In-Reply-To: <0D6E20B7-47A8-4E64-B973-35618300D246@bioperl.org> References: <4A565B38.1090408@jays.net> <0D6E20B7-47A8-4E64-B973-35618300D246@bioperl.org> Message-ID: <48C03241903F4C14A9CB6676B82362B7@NewLife> The problem doesn't seem to be the IP, which is whitelisted for Jay and me, but the DNS name, which is evidently not added to the whitelist automatically for frequent posters. It would be great if this could be automatically handled as well. ----- Original Message ----- From: "Jason Stajich" To: "Jay Hannah" Cc: "BioPerl List" ; "Mark A. Jensen" Sent: Friday, July 10, 2009 1:25 AM Subject: Re: [Bioperl-l] X-Greylist: Delayed > The IP your mail comes from is initially greylisted (hence the 30 min delay > which requires the host to resend) and then after it is whitelisted so > frequent posters's originating IP is will end up and be cached. So it depends > on if your IP is dynamic, how often you are emailing the list, etc. > > All this was discussed at least once a while ago. > http://portal.open-bio.org/pipermail/bioperl-l/2006-April/021340.html > > Mailing list problems should probably go to root-l at open-bio.org if you want > specific help too. > > -jason > On Jul 9, 2009, at 8:38 PM, Mark A. Jensen wrote: > >> Good eye, Jay. Poking around, I find that some DNS names are >> more equal than others. 
My test post from my gmail account >> maj.fortinbras -at- gmail -dot- com had header >> >> X-Greylist: Sender IP whitelisted, not delayed by milter- greylist-2.0.2 >> (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 23:29:00 -0400 (EDT) >> X-Greylist: Sender DNS name whitelisted, not delayed by milter- >> greylist-2.0.2 >> (portal.open-bio.org [207.154.17.70]); >> Thu, 09 Jul 2009 23:28:53 -0400 (EDT) >> >> while the domain with less cachet, >> maj -at- fortinbras -dot- us, >> has >> >> X-Greylist: Sender IP whitelisted, not delayed by milter- greylist-2.0.2 >> (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 14:25:45 -0400 (EDT) >> X-Greylist: Delayed for 00:16:28 by milter-greylist-2.0.2 >> (portal.open-bio.org >> [207.154.17.70]); Thu, 09 Jul 2009 14:18:37 -0400 (EDT) >> >> and has forever; this explains the infinite waiting time I typically >> also experience. >> >> Some fortunate posters even obtain the coveted >> >> X-Greylist: Sender IP whitelisted, not delayed by milter- greylist-2.0.2 >> (portal.open-bio.org [127.0.0.1]); Tue, 07 Jul 2009 13:30:29 -0400 (EDT) >> X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by >> milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); >> >> This may be even more stupendous than a commit bit. >> >> cheers, >> Mark >> ----- Original Message ----- From: "Jay Hannah" >> To: >> Sent: Thursday, July 09, 2009 5:03 PM >> Subject: [Bioperl-l] X-Greylist: Delayed >> >> >>> (Thanks for committing r15842 Chris!!) >>> >>> >>> I noticed this header in my last post (the copy MailMan sent me): >>> >>> X-Greylist: Delayed for 00:29:57 by milter-greylist-2.0.2 >>> (portal.open-bio.org [207.154.17.70]); >>> >>> My post was, indeed, delayed by ~30 minutes. >>> >>> >>> Is that intentional? And/or is there something I can do differently? 
>>> >>> Full headers of that email: http://scsys.co.uk:8001/30919
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> j
>>> >>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>
>
>

From Brotelzwieb at gmx.de Fri Jul 10 05:18:12 2009
From: Brotelzwieb at gmx.de (Jonas Schaer)
Date: Fri, 10 Jul 2009 11:18:12 +0200
Subject: [Bioperl-l] cdd-search with remoteblast?
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas> <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com>
Message-ID:

Hi,
I tried to do what Malcolm proposed (my $prog = 'rpsblast'; my $db = 'CDD';) but that didn't work.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Value rpsblast for PUT parameter PROGRAM does not match expression t?blast[pnx]. Rejecting.
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:329
STACK: Bio::Tools::Run::RemoteBlast::new C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:257
STACK: blast_a_seq2.pm:14
-----------------------------------------------------------

So I should try to "change the wrapper to allow 'rpsblast'", right? Could you tell me how to do that, please? So sorry, but I have no idea yet... :)
If that doesn't work, is there any other way to run cdd-searches with perl?
Thank you so much!
Regards,
Jonas

----- Original Message -----
From: "Chris Fields"
To: "Cook, Malcolm"
Cc: "'Jonas Schaer'"; "'BioPerl List'"; "'Smithies, Russell'"
Sent: Thursday, July 09, 2009 9:19 PM
Subject: Re: [Bioperl-l] cdd-search with remoteblast?

> I've scheduled this tentatively for the 1.6 release series (just not
> sure when yet). It may work as is, but I haven't tried it out yet
> (and am hazarding to guess it only retrieves the single main RID at
> the moment).
>
> chris
>
> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
>
>> Jonas,
>>
>> If you want to continue to use the bioperl remoteblast interface,
>> probably what you should do is simply call it twice.
>>
>> Once, as you already know how to do, which will return without CDD
>> results.
>>
>> Secondly, to get the CDD results, call remoteblast a second time.
>> This time, using
>> -database => 'CDD'
>> -program => 'rpsblast'
>>
>> However, the wrapper may object to the 'rpsblast' program. It is
>> not listed in the POD
>> (http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm).
>> If so, my guess is that changing the perl wrapper to allow
>> rpsblast will "just work" (tm). I've cc:ed cjfields at bioperl.org for
>> his opinion on this.
>>
>> Also, you might want to perform the CDD search first, especially if
>> you are streaming results to eyeball that might like something to
>> look at while the second (presumably longer) search is running.
>>
>> Cheers,
>>
>> Malcolm Cook
>> Stowers Institute for Medical Research - Kansas City, Missouri
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Jonas Schaer
>>> Sent: Thursday, July 09, 2009 5:16 AM
>>> To: BioPerl List; Smithies, Russell
>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>
>>> Hi guys,
>>> Thank you all so much for your help and patience :).
>>> Of course you were right and I finally found the right
>>> put-parameter to get exactly the same hits as on the homepage.
>>> I do have another question though :)...
>>> I now want to include a search for conserved domains, but
>>> when I try to use the CDD_SEARCH-parameter
>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH)
>>> like the other put-parameters, the way chris once told
>>> me (works fine with the other params):
>>>
>>> my %put = (
>>>     WORD_SIZE    => 3,
>>>     HITLIST_SIZE => 100,
>>>     THRESHOLD    => 11,
>>>     FILTER       => 'R',
>>>     GENETIC_CODE => 1,
>>>     CDD_SEARCH   => 'on'   ### I tried it with 'true' and '1', too.
>>> );
>>>
>>> for my $putName (keys %put) {
>>>     $factory->submit_parameter($putName,$put{$putName});
>>> }
>>>
>>>
>>> ...an exception is thrown:
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: CDD_SEARCH is not a valid PUT parameter.
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:325
>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>>> STACK: main::blast_it firsteval0.8.pm:288
>>> STACK: firsteval0.8.pm:35
>>> -----------------------------------------------------------
>>>
>>> I guess somehow this could be the solution to my problem:
>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous
>>> but unfortunately I don't understand what to do.
>>> I'm so sorry to bother you with this but please help me once >>> more...:) >>> >>> Best regards and thanks in advance, >>> Jonas >>> >>> ----- Original Message ----- >>> From: "Smithies, Russell" >>> To: "'Jonas Schaer'" >>> Cc: "'Chris Fields'" ; "'BioPerl List'" >>> >>> Sent: Monday, July 06, 2009 10:56 PM >>> Subject: RE: [Bioperl-l] different results with remote-blast skript >>> >>> >>> Hi Jonas, >>> You can't just play with the BLAST parameters and hope for a "better" >>> result. >>> I'd suggest that if you aren't sure what they do, you should >>> leave them >>> alone as small changes can make huge differences in the >>> output - it's quite >>> possible to miss finding what you're looking for by using the wrong >>> parameters. >>> If all else fails, read the blast manual: >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>> _all.html >>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>> Or Read Ian Korfs' excellent book: >>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>> >>> Don't worry about the integer overflow bug as there's nothing >>> you can do >>> about it. If you're interested, Google and Wikipedia are your >>> friends: >>> http://en.wikipedia.org/wiki/Integer_overflow >>> >>> >>> Russell >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>> To: BioPerl List; Chris Fields >>>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>>> >>>> Hi guys, thanks for your answers so far. >>>> @jason: integer overflow in blast.... sorry, but what do >>> you mean by that? >>>> how can I fix it...? >>>> >>>> Since I never really changed any parameters I thought them >>> all to be >>>> default. 
>>>> whatever, I tried to get "better" results with my prog by changing >>>> these: >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>> >>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>> STICS'} = >>>> '1'; >>>> with no effect...I guess these were default values anyway. >>>> >>>> So please maybe you can tell me all the other parameters I >>> can change with >>>> my >>>> perl-skript AND how to do that? >>>> Unfortunately both, perl and the blast-algorithm are pretty >>> much new to >>>> me, >>>> maybe thats why I just cannot find out how to do that on my >>> own... :/ >>>> >>>> Here is the output I get with my remote-blast skript: >>>> >>> ############################################################## >>> ################ >>>> ################################### >>>> Query Name: >>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>> L >>>> hit name is ref|XP_001702807.1| >>>> score is 442 >>>> BLASTP 2.2.21+ >>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>> A. Schaffer, >>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>> Lipman (1997), >>>> "Gapped >>>> BLAST and PSI-BLAST: a new generation of protein database search >>>> programs", >>>> Nucleic Acids Res. 25:3389-3402. >>>> >>>> >>>> Reference for composition-based statistics: Alejandro A. >>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>> John L. Spouge, >>>> Yuri >>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >>> "Improving the >>>> accuracy of PSI-BLAST protein database searches with >>> composition-based >>>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005. 
>>>> >>>> >>>> RID: 53STX5G2013 >>>> >>>> >>>> Database: All non-redundant GenBank CDS >>>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>> from WGS projects >>>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>>> >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>> >>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>>> ATGPDPDDEYE >>>> Length=150 >>>> >>>> >>>> >>> Score >>>> E >>>> Sequences producing significant alignments: >>> (Bits) >>>> Value >>>> >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>> reinhard... 174 >>>> 2e-42 >>>> >>>> >>>> ALIGNMENTS >>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>> Length=303 >>>> >>>> Score = 174 bits (442), Expect = 2e-42, Method: >>> Composition-based >>>> stats. >>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>> Gaps = 0/150 >>>> (0%) >>>> >>>> Query 1 >>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>> 60 >>>> >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>> Sbjct 154 >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>> 213 >>>> >>>> Query 61 >>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> 120 >>>> >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> Sbjct 214 >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> 273 >>>> >>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>> >>>> >>>> >>>> Database: All non-redundant GenBank CDS >>>> translations+PDB+SwissProt+PIR+PRF >>>> excluding environmental samples from WGS projects >>>> Posted date: Jul 5, 2009 4:41 AM >>>> Number of letters in database: -1,124,994,511 >>>> Number of sequences in database: 9,252,587 >>>> >>>> Lambda K H >>>> 0.309 0.122 0.345 >>>> Gapped >>>> Lambda K H 
>>>> 0.267 0.0410 0.140 >>>> Matrix: BLOSUM62 >>>> Gap Penalties: Existence: 11, Extension: 1 >>>> Number of Sequences: 9252587 >>>> Number of Hits to DB: 60273703 >>>> Number of extensions: 1448367 >>>> Number of successful extensions: 2103 >>>> Number of sequences better than 10: 0 >>>> Number of HSP's better than 10 without gapping: 0 >>>> Number of HSP's gapped: 2113 >>>> Number of HSP's successfully gapped: 0 >>>> Length of query: 150 >>>> Length of database: 3169972781 >>>> Length adjustment: 113 >>>> Effective length of query: 37 >>>> Effective length of database: 2124430450 >>>> Effective search space: 78603926650 >>>> Effective search space used: 78603926650 >>>> T: 11 >>>> A: 40 >>>> X1: 16 (7.1 bits) >>>> X2: 38 (14.6 bits) >>>> X3: 64 (24.7 bits) >>>> S1: 42 (20.8 bits) >>>> S2: 74 (33.1 bits) >>>> >>>> >>> ############################################################## >>> ################ >>>> ################################### >>>> and here are the hits (?) of the blast-algorithm on the >>> ncbi-homepage with >>>> the same query of course: >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>> reinhard... 300 >>>> 3e-80 >>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>> [Acyrtho... 36.2 >>>> 1.1 >>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>> [Blautia... 35.4 >>>> 1.8 >>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>> brazil... 34.3 >>>> 4.2 >>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>> [Aspergillus n... 33.5 >>>> 6.0 >>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>> [Methyloba... 33.5 >>>> 7.0 >>>> >>> ############################################################## >>> ################ >>>> ###################################at >>>> least the first hit is the same, but even there there is a >>> different score >>>> and e-value. 
>>>> >>>> thanks so much for any help :) >>>> regards, jonas >>>> >>>> >>>> ----- Original Message ----- >>>> From: "Chris Fields" >>>> To: "Jason Stajich" >>>> Cc: "Smithies, Russell" >>> ; "'BioPerl >>>> List'" ; "'Jonas Schaer'" >>>> >>>> Sent: Monday, July 06, 2009 12:51 AM >>>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>>> >>>> >>>>> That inspires confidence ;> >>>>> >>>>> chris >>>>> >>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>> >>>>>> integer overflow in blast.... >>>>>> >>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>> >>>>>>> I'd guess it's a difference in the parameters used. >>>>>>> Interesting that both have the number of letters in the db as >>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>> >>>>>>> Stats from your remote_blast: >>>>>>> >>>>>>> 'stats' => { >>>>>>> 'S1' => '42', >>>>>>> 'S1_bits' => '20.8', >>>>>>> 'lambda' => '0.309', >>>>>>> 'entropy' => '0.345', >>>>>>> 'kappa_gapped' => '0.0410', >>>>>>> 'T' => '11', >>>>>>> 'kappa' => '0.122', >>>>>>> 'X3_bits' => '24.7', >>>>>>> 'X1' => '16', >>>>>>> 'lambda_gapped' => '0.267', >>>>>>> 'X2' => '38', >>>>>>> 'S2' => '74', >>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>> 'Hits_to_DB' => '60102303', >>>>>>> 'dbletters' => '-1125070205', >>>>>>> 'A' => '40', >>>>>>> 'num_successful_extensions' => '2004', >>>>>>> 'num_extensions' => '1436892', >>>>>>> 'X1_bits' => '7.1', >>>>>>> 'X3' => '64', >>>>>>> 'entropy_gapped' => '0.140', >>>>>>> 'dbentries' => '9252258', >>>>>>> 'X2_bits' => '14.6', >>>>>>> 'S2_bits' => '33.1' >>>>>>> } >>>>>>> >>>>>>> >>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>> >>>>>>> Database: All non-redundant GenBank CDS >>> translations+PDB+SwissProt >>>>>>> +PIR+PRF >>>>>>> excluding environmental samples from WGS projects >>>>>>> Posted date: Jul 4, 2009 4:41 AM >>>>>>> Number of letters in database: -1,125,070,205 >>>>>>> Number of sequences in 
database: 9,252,258 >>>>>>> >>>>>>> Lambda K H >>>>>>> 0.309 0.124 0.340 >>>>>>> Gapped >>>>>>> Lambda K H >>>>>>> 0.267 0.0410 0.140 >>>>>>> Matrix: BLOSUM62 >>>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>>> Number of Sequences: 9252258 >>>>>>> Number of Hits to DB: 86493230 >>>>>>> Number of extensions: 3101413 >>>>>>> Number of successful extensions: 9001 >>>>>>> Number of sequences better than 100: 65 >>>>>>> Number of HSP's better than 100 without gapping: 0 >>>>>>> Number of HSP's gapped: 9000 >>>>>>> Number of HSP's successfully gapped: 66 >>>>>>> Length of query: 150 >>>>>>> Length of database: 3169897087 >>>>>>> Length adjustment: 113 >>>>>>> Effective length of query: 37 >>>>>>> Effective length of database: 2124391933 >>>>>>> Effective search space: 78602501521 >>>>>>> Effective search space used: 78602501521 >>>>>>> T: 11 >>>>>>> A: 40 >>>>>>> X1: 16 (7.1 bits) >>>>>>> X2: 38 (14.6 bits) >>>>>>> X3: 64 (24.7 bits) >>>>>>> S1: 42 (20.8 bits) >>>>>>> S2: 65 (29.6 bits) >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>> To: BioPerl List >>>>>>>> Subject: [Bioperl-l] different results with remote-blast skript >>>>>>>> >>>>>>>> Hi again :) >>>>>>>> please, I only have this little question: >>>>>>>> why do I get different results with my remote::blast >>> perl skript >>>>>>>> then on the >>>>>>>> ncbi blast homepage? >>>>>>>> I am using blastp, the query is an amino-sequence (different >>>>>>>> results with any >>>>>>>> sequence, differences not only in number of hits but even in e- >>>>>>>> values, scores >>>>>>>> etc...), the database is 'nr'. 
>>>>>>>> PLEASE help me,
>>>>>>>> thank you in advance,
>>>>>>>> Jonas
>>>>>>>>
>>>>>>>> ps: my skript:
>>>>>>>>
>>>>>>>> ################################################################################
>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use strict;
>>>>>>>> my @blast_report;
>>>>>>>> my $prog = 'blastp';
>>>>>>>> my $db = 'nr';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> #my $e_val= '10';
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>                '-data' => $db,
>>>>>>>>                '-expect' => $e_val,
>>>>>>>>                '-readmethod' => 'SearchIO' );
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
>>>>>>>>
>>>>>>>> my $blast_seq ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
>>>>>>>> #$v is just to turn on and off the messages
>>>>>>>> my $v = 1;
>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
>>>>>>>> my $filename='temp2.out';
>>>>>>>> my $r = $factory->submit_blast($seq);
>>>>>>>> print STDERR "waiting..." if( $v > 0 );
>>>>>>>> while ( my @rids = $factory->each_rid )
>>>>>>>> {
>>>>>>>>     foreach my $rid ( @rids )
>>>>>>>>     {
>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
>>>>>>>>         if( !ref($rc) )
>>>>>>>>         {
>>>>>>>>             if( $rc < 0 )
>>>>>>>>             {
>>>>>>>>                 $factory->remove_rid($rid);
>>>>>>>>             }
>>>>>>>>             print STDERR "." if ( $v > 0 );
>>>>>>>>         }
>>>>>>>>         else
>>>>>>>>         {
>>>>>>>>             my $result = $rc->next_result();
>>>>>>>>             $factory->save_output($filename);
>>>>>>>>             $factory->remove_rid($rid);
>>>>>>>>             print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>             while ( my $hit = $result->next_hit )
>>>>>>>>             {
>>>>>>>>                 next unless ( $v > 0);
>>>>>>>>                 print "\thit name is ", $hit->name, "\n";
>>>>>>>>                 while( my $hsp = $hit->next_hsp )
>>>>>>>>                 {
>>>>>>>>                     print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>                 }
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }
>>>>>>>> @blast_report = get_file_data ($filename);
>>>>>>>> return @blast_report;
>>>>>>>>
>>>>>>>> ################################################################################
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>> =====================================================================
>>>>>>> Attention: The information contained in this message and/or
>>>>>>> attachments from AgResearch Limited is intended only for the
>>>>>>> persons or entities to which it is addressed and may contain
>>>>>>> confidential and/or privileged material. Any review,
>>>>>>> retransmission, dissemination or other use of, or taking of any
>>>>>>> action in reliance upon, this information by persons or entities
>>>>>>> other than the intended recipients is prohibited by AgResearch
>>>>>>> Limited. If you have received this message in error, please
>>>>>>> notify the sender immediately.
>>>>>>> =====================================================================
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>> --
>>>>>> Jason Stajich
>>>>>> jason at bioperl.org
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --------------------------------------------------------------------------------
>>>>
>>>> No virus found in this incoming message.
>>>> Checked by AVG - www.avg.com
>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09 05:53:00
>>>
>>> --------------------------------------------------------------------------------
>>>
>>> No virus found in this incoming message.
>>> Checked by AVG - www.avg.com
>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release Date: 07/05/09 17:54:00
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
--------------------------------------------------------------------------------

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release Date: 07/09/09 05:55:00

From bosborne11 at verizon.net Fri Jul 10 08:58:40 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 10 Jul 2009 08:58:40 -0400
Subject: [Bioperl-l] update PLATFORMS file
In-Reply-To: <4A569AC4.5080200@cornell.edu>
References: <4A569AC4.5080200@cornell.edu>
Message-ID:

Robert,

This file can be removed, certainly.

BIO

On Jul 9, 2009, at 9:35 PM, Robert Buels wrote:

> Taking this to bioperl-l:
>
> koenvanderdrift at gmail.com said:
>
> The PLATFORMS document contains a *very* outdated link on how to
> install bioperl on Macs. Please remove this link: "Steve Cannon has
> made available Bioperl OS X installation directions and notes online
> at the following URL:
> http://www.tc.umn.edu/~cann0010/Bioperl_OSX_install.html"
>
> ------- Comment #1 from cjfields at bioperl.org 2009-07-09 21:18 EST -------
> I think we could actually remove this file completely. It hasn't
> been updated in quite a while and any information it contains would
> probably serve a better purpose elsewhere.
>
>
> So, remove the PLATFORMS file? Is all of the stuff in there on the
> wiki?
>
> Rob
>
> --
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From MEC at stowers.org Fri Jul 10 11:45:13 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 10 Jul 2009 10:45:13 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas> <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com>
Message-ID:

Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
the following. Is it appropriate to submit it?

Jonas, OK, I was a little quick on the gun... but I've got it now.

You don't need to change the wrapper. Here is what you need to do:

# 1) set your database like this:

  -database => 'cdsearch/cdd',
  # cf. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
  # for other cdd database options

# 2) add this line before submitting the job:

  $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';

You're in - no other changes needed.

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de]
> Sent: Friday, July 10, 2009 4:18 AM
> To: BioPerl List; Cook, Malcolm; Chris Fields
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Hi,
> I tried to do what Malcolm proposed (my $prog = 'rpsblast';
> my $db = 'CDD';) but that didn't work.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Value rpsblast for PUT parameter PROGRAM does not match
> expression t?blast[ pnx]. Rejecting.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:329
> STACK: Bio::Tools::Run::RemoteBlast::new C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:257
> STACK: blast_a_seq2.pm:14
> -----------------------------------------------------------
> So I should try to "change the wrapper to allow 'rpsblast'",
> right? Could you tell me how to do that, please? So sorry, but
> I have no idea yet...:) If that doesn't work, is there any
> other way to run cdd-searches with perl?
> Thank you so much!
> Regards, Jonas
>
> ----- Original Message -----
> From: "Chris Fields"
> To: "Cook, Malcolm"
> Cc: "'Jonas Schaer'" ; "'BioPerl List'" ; "'Smithies, Russell'" ;
> Sent: Thursday, July 09, 2009 9:19 PM
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
>
> > I've scheduled this tentatively for the 1.6 release series (just not
> > sure when yet). It may work as is, but I haven't tried it out yet
> > (and am hazarding to guess it only retrieves the single main RID at
> > the moment).
> > > > chris > > > > On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > > > >> Jonas, > >> > >> If you want to continue to use the bioperl remoteblast interface, > >> probably what you should do is simply call it twice. > >> > >> Once, as you already know how to do, which will return without CDD > >> results. > >> > >> Secondly, to get the CDD results, call remoteblast a second time. > >> This time, using > >> -database => 'CDD' > >> -program => 'rpsblast' > >> > >> However, the wrapper may object to the 'rpsblast' program. It is > >> not listed in the POD - > >> > http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R > emoteBlast.pm) > >> If so, my guess is that changing the perl wrapper to allow > >> rpsblast will "just work" (tm). I've cc:ed > cjfields at bioperl.org for > >> his opinion on this. > >> > >> Also, you might want to perform the CDD search first, especially if > >> you are streaming results to eyeball that might like something to > >> look at while the second (presumably longer) search is running. > >> > >> Cheers, > >> > >> Malcolm Cook > >> Stowers Institute for Medical Research - Kansas City, Missouri > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org > >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > >>> Jonas Schaer > >>> Sent: Thursday, July 09, 2009 5:16 AM > >>> To: BioPerl List; Smithies, Russell > >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >>> > >>> Hi guys, > >>> Thank you all so much for your help and patience :). Of > >>> course you were right and I finaly found the right > >>> put-parameter to get exactly the same hits as on the homepage. > >>> I do have an other question though :)... 
> >>> I now want to include a search for conserved domains, but > >>> when I try to use the CDD_SEARCH-parameter > >>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# > >>> sub:CDD_SEARCH) > >>> like the other put-parameters the way chris once told > >>> me(works fine with the other params): > >>> > >>> my %put = ( > >>> WORD_SIZE => 3, > >>> HITLIST_SIZE => 100, > >>> THRESHOLD => 11, > >>> FILTER => 'R', > >>> GENETIC_CODE => 1, > >>> CDD_SEARCH => 'on' > >>> ###I tried it > >>> with 'true' and '1', too. > >>> > >>> ); > >>> > >>> for my $putName (keys %put) { > >>> $factory->submit_parameter($putName,$put{$putName}); > >>> } > >>> > >>> > >>> ...an exception is thrown: > >>> > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>> MSG: CDD_SEARCH is not a valid PUT parameter. > >>> STACK: Error::throw > >>> STACK: Bio::Root::Root::throw > C:/Perl/site/lib/Bio/Root/Root.pm:359 > >>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >>> C:/Perl/site/lib/Bio/Tools > >>> /Run/RemoteBlast.pm:325 > >>> STACK: main::blast_a_sequence firsteval0.8.pm:383 > >>> STACK: main::blast_it firsteval0.8.pm:288 > >>> STACK: firsteval0.8.pm:35 > >>> ----------------------------------------------------------- . > >>> I guess somehow this could be the solution to my problem: > >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s > >>> ub:RID-for-Simultaneous > >>> , but unfortunately I don't understand what to do. 
> >>> I'm so sorry to bother you with this but please help me once > >>> more...:) > >>> > >>> Best regards and thanks in advance, > >>> Jonas > >>> > >>> ----- Original Message ----- > >>> From: "Smithies, Russell" > >>> To: "'Jonas Schaer'" > >>> Cc: "'Chris Fields'" ; "'BioPerl List'" > >>> > >>> Sent: Monday, July 06, 2009 10:56 PM > >>> Subject: RE: [Bioperl-l] different results with > remote-blast skript > >>> > >>> > >>> Hi Jonas, > >>> You can't just play with the BLAST parameters and hope > for a "better" > >>> result. > >>> I'd suggest that if you aren't sure what they do, you should > >>> leave them > >>> alone as small changes can make huge differences in the > >>> output - it's quite > >>> possible to miss finding what you're looking for by using > the wrong > >>> parameters. > >>> If all else fails, read the blast manual: > >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall > >>> _all.html > >>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ > >>> Or Read Ian Korfs' excellent book: > >>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > >> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 > >>> > >>> Don't worry about the integer overflow bug as there's nothing > >>> you can do > >>> about it. If you're interested, Google and Wikipedia are your > >>> friends: > >>> http://en.wikipedia.org/wiki/Integer_overflow > >>> > >>> > >>> Russell > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > >>>> Sent: Tuesday, 7 July 2009 12:14 a.m. > >>>> To: BioPerl List; Chris Fields > >>>> Subject: Re: [Bioperl-l] different results with > remote-blast skript > >>>> > >>>> Hi guys, thanks for your answers so far. > >>>> @jason: integer overflow in blast.... sorry, but what do > >>> you mean by that? > >>>> how can I fix it...? 
> >>>> > >>>> Since I never really changed any parameters I thought them > >>> all to be > >>>> default. > >>>> whatever, I tried to get "better" results with my prog > by changing > >>>> these: > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>> > >>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI > >>> STICS'} = > >>>> '1'; > >>>> with no effect...I guess these were default values anyway. > >>>> > >>>> So please maybe you can tell me all the other parameters I > >>> can change with > >>>> my > >>>> perl-skript AND how to do that? > >>>> Unfortunately both, perl and the blast-algorithm are pretty > >>> much new to > >>>> me, > >>>> maybe thats why I just cannot find out how to do that on my > >>> own... :/ > >>>> > >>>> Here is the output I get with my remote-blast skript: > >>>> > >>> ############################################################## > >>> ################ > >>>> ################################### > >>>> Query Name: > >>>> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > >>>> L > >>>> hit name is ref|XP_001702807.1| > >>>> score is 442 > >>>> BLASTP 2.2.21+ > >>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro > >>> A. Schaffer, > >>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. > >>> Lipman (1997), > >>>> "Gapped > >>>> BLAST and PSI-BLAST: a new generation of protein database search > >>>> programs", > >>>> Nucleic Acids Res. 25:3389-3402. > >>>> > >>>> > >>>> Reference for composition-based statistics: Alejandro A. > >>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, > >>> John L. Spouge, > >>>> Yuri > >>>> I. Wolf, Eugene V. Koonin, and Stephen F. 
Altschul (2001), > >>> "Improving the > >>>> accuracy of PSI-BLAST protein database searches with > >>> composition-based > >>>> statistics and other refinements", Nucleic Acids Res. > 29:2994-3005. > >>>> > >>>> > >>>> RID: 53STX5G2013 > >>>> > >>>> > >>>> Database: All non-redundant GenBank CDS > >>>> translations+PDB+SwissProt+PIR+PRF excluding > environmental samples > >>>> from WGS projects > >>>> 9,252,587 sequences; 3,169,972,781 total letters Query= > >>>> > >>> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > >>>> > >>> > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > >>>> ATGPDPDDEYE > >>>> Length=150 > >>>> > >>>> > >>>> > >>> Score > >>>> E > >>>> Sequences producing significant alignments: > >>> (Bits) > >>>> Value > >>>> > >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>> reinhard... 174 > >>>> 2e-42 > >>>> > >>>> > >>>> ALIGNMENTS > >>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > reinhardtii] > >>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > >>>> Length=303 > >>>> > >>>> Score = 174 bits (442), Expect = 2e-42, Method: > >>> Composition-based > >>>> stats. 
> >>>> Identities = 150/150 (100%), Positives = 150/150 (100%), > >>> Gaps = 0/150 > >>>> (0%) > >>>> > >>>> Query 1 > >>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > >>>> 60 > >>>> > >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>> Sbjct 154 > >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>> 213 > >>>> > >>>> Query 61 > >>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>> 120 > >>>> > >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>> Sbjct 214 > >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>> 273 > >>>> > >>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > >>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE > >>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > >>>> > >>>> > >>>> > >>>> Database: All non-redundant GenBank CDS > >>>> translations+PDB+SwissProt+PIR+PRF > >>>> excluding environmental samples from WGS projects > >>>> Posted date: Jul 5, 2009 4:41 AM > >>>> Number of letters in database: -1,124,994,511 > >>>> Number of sequences in database: 9,252,587 > >>>> > >>>> Lambda K H > >>>> 0.309 0.122 0.345 > >>>> Gapped > >>>> Lambda K H > >>>> 0.267 0.0410 0.140 > >>>> Matrix: BLOSUM62 > >>>> Gap Penalties: Existence: 11, Extension: 1 > >>>> Number of Sequences: 9252587 > >>>> Number of Hits to DB: 60273703 > >>>> Number of extensions: 1448367 > >>>> Number of successful extensions: 2103 > >>>> Number of sequences better than 10: 0 > >>>> Number of HSP's better than 10 without gapping: 0 > >>>> Number of HSP's gapped: 2113 > >>>> Number of HSP's successfully gapped: 0 > >>>> Length of query: 150 > >>>> Length of database: 3169972781 > >>>> Length adjustment: 113 > >>>> Effective length of query: 37 > >>>> Effective length of database: 2124430450 > >>>> Effective search space: 78603926650 > >>>> Effective search space used: 78603926650 > >>>> T: 11 > >>>> A: 40 > >>>> X1: 16 (7.1 bits) > >>>> X2: 38 (14.6 bits) > >>>> X3: 64 (24.7 bits) > >>>> S1: 
42 (20.8 bits) > >>>> S2: 74 (33.1 bits) > >>>> > >>>> > >>> ############################################################## > >>> ################ > >>>> ################################### > >>>> and here are the hits (?) of the blast-algorithm on the > >>> ncbi-homepage with > >>>> the same query of course: > >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>> reinhard... 300 > >>>> 3e-80 > >>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA > >>> [Acyrtho... 36.2 > >>>> 1.1 > >>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 > >>> [Blautia... 35.4 > >>>> 1.8 > >>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania > >>> brazil... 34.3 > >>>> 4.2 > >>>> ref|XP_680841.1| hypothetical protein AN7572.2 > >>> [Aspergillus n... 33.5 > >>>> 6.0 > >>>> ref|YP_001768110.1| hypothetical protein M446_1150 > >>> [Methyloba... 33.5 > >>>> 7.0 > >>>> > >>> ############################################################## > >>> ################ > >>>> ###################################at > >>>> least the first hit is the same, but even there there is a > >>> different score > >>>> and e-value. > >>>> > >>>> thanks so much for any help :) > >>>> regards, jonas > >>>> > >>>> > >>>> ----- Original Message ----- > >>>> From: "Chris Fields" > >>>> To: "Jason Stajich" > >>>> Cc: "Smithies, Russell" > >>> ; "'BioPerl > >>>> List'" ; "'Jonas Schaer'" > >>>> > >>>> Sent: Monday, July 06, 2009 12:51 AM > >>>> Subject: Re: [Bioperl-l] different results with > remote-blast skript > >>>> > >>>> > >>>>> That inspires confidence ;> > >>>>> > >>>>> chris > >>>>> > >>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > >>>>> > >>>>>> integer overflow in blast.... > >>>>>> > >>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >>>>>> > >>>>>>> I'd guess it's a difference in the parameters used. 
> >>>>>>> Interesting that both have the number of letters in the db as > >>>>>>> "-1,125,070,205", I assume that's a bug :-) > >>>>>>> > >>>>>>> Stats from your remote_blast: > >>>>>>> > >>>>>>> 'stats' => { > >>>>>>> 'S1' => '42', > >>>>>>> 'S1_bits' => '20.8', > >>>>>>> 'lambda' => '0.309', > >>>>>>> 'entropy' => '0.345', > >>>>>>> 'kappa_gapped' => '0.0410', > >>>>>>> 'T' => '11', > >>>>>>> 'kappa' => '0.122', > >>>>>>> 'X3_bits' => '24.7', > >>>>>>> 'X1' => '16', > >>>>>>> 'lambda_gapped' => '0.267', > >>>>>>> 'X2' => '38', > >>>>>>> 'S2' => '74', > >>>>>>> 'seqs_better_than_cutoff' => '0', > >>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>>>>>> 'Hits_to_DB' => '60102303', > >>>>>>> 'dbletters' => '-1125070205', > >>>>>>> 'A' => '40', > >>>>>>> 'num_successful_extensions' => '2004', > >>>>>>> 'num_extensions' => '1436892', > >>>>>>> 'X1_bits' => '7.1', > >>>>>>> 'X3' => '64', > >>>>>>> 'entropy_gapped' => '0.140', > >>>>>>> 'dbentries' => '9252258', > >>>>>>> 'X2_bits' => '14.6', > >>>>>>> 'S2_bits' => '33.1' > >>>>>>> } > >>>>>>> > >>>>>>> > >>>>>>> Stats from a blast done on the NCBI webpage: > >>>>>>> > >>>>>>> Database: All non-redundant GenBank CDS > >>> translations+PDB+SwissProt > >>>>>>> +PIR+PRF > >>>>>>> excluding environmental samples from WGS projects > >>>>>>> Posted date: Jul 4, 2009 4:41 AM > >>>>>>> Number of letters in database: -1,125,070,205 > >>>>>>> Number of sequences in database: 9,252,258 > >>>>>>> > >>>>>>> Lambda K H > >>>>>>> 0.309 0.124 0.340 > >>>>>>> Gapped > >>>>>>> Lambda K H > >>>>>>> 0.267 0.0410 0.140 > >>>>>>> Matrix: BLOSUM62 > >>>>>>> Gap Penalties: Existence: 11, Extension: 1 > >>>>>>> Number of Sequences: 9252258 > >>>>>>> Number of Hits to DB: 86493230 > >>>>>>> Number of extensions: 3101413 > >>>>>>> Number of successful extensions: 9001 > >>>>>>> Number of sequences better than 100: 65 > >>>>>>> Number of HSP's better than 100 without gapping: 0 > >>>>>>> Number of HSP's gapped: 9000 > >>>>>>> Number of HSP's 
successfully gapped: 66 > >>>>>>> Length of query: 150 > >>>>>>> Length of database: 3169897087 > >>>>>>> Length adjustment: 113 > >>>>>>> Effective length of query: 37 > >>>>>>> Effective length of database: 2124391933 > >>>>>>> Effective search space: 78602501521 > >>>>>>> Effective search space used: 78602501521 > >>>>>>> T: 11 > >>>>>>> A: 40 > >>>>>>> X1: 16 (7.1 bits) > >>>>>>> X2: 38 (14.6 bits) > >>>>>>> X3: 64 (24.7 bits) > >>>>>>> S1: 42 (20.8 bits) > >>>>>>> S2: 65 (29.6 bits) > >>>>>>> > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > >>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. > >>>>>>>> To: BioPerl List > >>>>>>>> Subject: [Bioperl-l] different results with > remote-blast skript > >>>>>>>> > >>>>>>>> Hi again :) > >>>>>>>> please, I only have this little question: > >>>>>>>> why do I get different results with my remote::blast > >>> perl skript > >>>>>>>> then on the > >>>>>>>> ncbi blast homepage? > >>>>>>>> I am using blastp, the query is an amino-sequence (different > >>>>>>>> results with any > >>>>>>>> sequence, differences not only in number of hits but > even in e- > >>>>>>>> values, scores > >>>>>>>> etc...), the database is 'nr'. 
> >>>>>>>> PLEASE help me,
> >>>>>>>> thank you in advance,
> >>>>>>>> Jonas
> >>>>>>>>
> >>>>>>>> ps: my skript:
> >>>>>>>>
> >>>>>>>> ################################################################################
> >>>>>>>> use Bio::Seq::SeqFactory;
> >>>>>>>> use Bio::Tools::Run::RemoteBlast;
> >>>>>>>> use strict;
> >>>>>>>> my @blast_report;
> >>>>>>>> my $prog = 'blastp';
> >>>>>>>> my $db = 'nr';
> >>>>>>>> my $e_val= '1e-10';
> >>>>>>>> #my $e_val= '10';
> >>>>>>>> my @params = ( '-prog' => $prog,
> >>>>>>>>                '-data' => $db,
> >>>>>>>>                '-expect' => $e_val,
> >>>>>>>>                '-readmethod' => 'SearchIO' );
> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
> >>>>>>>>
> >>>>>>>> my $blast_seq ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
> >>>>>>>> #$v is just to turn on and off the messages
> >>>>>>>> my $v = 1;
> >>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
> >>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
> >>>>>>>> my $filename='temp2.out';
> >>>>>>>> my $r = $factory->submit_blast($seq);
> >>>>>>>> print STDERR "waiting..." if( $v > 0 );
> >>>>>>>> while ( my @rids = $factory->each_rid )
> >>>>>>>> {
> >>>>>>>>     foreach my $rid ( @rids )
> >>>>>>>>     {
> >>>>>>>>         my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>         if( !ref($rc) )
> >>>>>>>>         {
> >>>>>>>>             if( $rc < 0 )
> >>>>>>>>             {
> >>>>>>>>                 $factory->remove_rid($rid);
> >>>>>>>>             }
> >>>>>>>>             print STDERR "." if ( $v > 0 );
> >>>>>>>>         }
> >>>>>>>>         else
> >>>>>>>>         {
> >>>>>>>>             my $result = $rc->next_result();
> >>>>>>>>             $factory->save_output($filename);
> >>>>>>>>             $factory->remove_rid($rid);
> >>>>>>>>             print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>>>>             while ( my $hit = $result->next_hit )
> >>>>>>>>             {
> >>>>>>>>                 next unless ( $v > 0);
> >>>>>>>>                 print "\thit name is ", $hit->name, "\n";
> >>>>>>>>                 while( my $hsp = $hit->next_hsp )
> >>>>>>>>                 {
> >>>>>>>>                     print "\t\tscore is ", $hsp->score, "\n";
> >>>>>>>>                 }
> >>>>>>>>             }
> >>>>>>>>         }
> >>>>>>>>     }
> >>>>>>>> }
> >>>>>>>> @blast_report = get_file_data ($filename);
> >>>>>>>> return @blast_report;
> >>>>>>>>
> >>>>>>>> ################################################################################
> >>>>>>>> _______________________________________________
> >>>>>>>> Bioperl-l mailing list
> >>>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>> =====================================================================
> >>>>>>> Attention: The information contained in this message and/or
> >>>>>>> attachments from AgResearch Limited is intended only for the
> >>>>>>> persons or entities to which it is addressed and may contain
> >>>>>>> confidential and/or privileged material.
Any review, retransmission, dissemination > or other use > >>>>>>> of, or > >>>>>>> taking of any action in reliance upon, this information > >>> by persons or > >>>>>>> entities other than the intended recipients is prohibited by > >>>>>>> AgResearch > >>>>>>> Limited. If you have received this message in error, > >>> please notify > >>>>>>> the > >>>>>>> sender immediately. > >>>>>>> = > >>>>>>> = > >>>>>>> > >>> > ===================================================================== > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>>> -- > >>>>>> Jason Stajich > >>>>>> jason at bioperl.org > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>>> > >>> -------------------------------------------------------------- > >>> ---------------- > >>>> -- > >>>> > >>>> > >>>> > >>>> No virus found in this incoming message. > >>>> Checked by AVG - www.avg.com > >>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release > >>> Date: 07/05/09 > >>>> 05:53:00 > >>> > >>> > >>> -------------------------------------------------------------- > >>> ------------------ > >>> > >>> > >>> > >>> No virus found in this incoming message. > >>> Checked by AVG - www.avg.com > >>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release > >>> Date: 07/05/09 > >>> 17:54:00 > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > > > -------------------------------------------------------------- > ------------------ > > > > No virus found in this incoming message. 
> Checked by AVG - www.avg.com > Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release > Date: 07/09/09 > 05:55:00 > > From clarsen at vecna.com Fri Jul 10 12:41:37 2009 From: clarsen at vecna.com (Chris Larsen) Date: Fri, 10 Jul 2009 12:41:37 -0400 Subject: [Bioperl-l] Mac platform instructions Message-ID: Brian, I too am on a Mac now. However the 'getting bioperl' MacOs link on: "http://www.bioperl.org/wiki/Getting_BioPerl" which loads the URL: "http://www.bioperl.org/wiki/Getting_BioPerl#MacOS_X_using_fink" does nothing but reload the same page...it took a bit to figure out how to begin install, scroll around etc. since it doesnt behave as do the other platforms links. (FIrefox 3.0.11, OS X 10.5.7). Think I have it now. The rest of the install instructions seem straightforward and should behave as well as the Fedora tarball did, thanks for that documentation. Cheers Chris -- Christopher Larsen, Ph.D. Sr. Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From cjfields1 at gmail.com Fri Jul 10 14:04:43 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Fri, 10 Jul 2009 13:04:43 -0500 Subject: [Bioperl-l] cdd-search with remoteblast? In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas> <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com> Message-ID: <7BBF64FF-F531-4F7C-8A31-BD04FCE1BF1A@gmail.com> Malcolm, Nice! Go ahead and add the test in; we can look at trying to get CDD_SEARCH working at some point but this is a nice workaround. chris On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote: > Chris, I've added a test to bioperl RemoteBlast.t that demonstrates > the following. Is it appropriate to submit it? > > Jonas, OK, I was a little quick on the gun... but I've got it now. 
>
> You don't need to change the wrapper. Here is what you need to do:
>
> # 1) set your database like this:
>
> -database => 'cdsearch/cdd', # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
>                              # for other cdd database options
>
> # 2) add this line before submitting the job:
> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
>
> You're in - No other changes needed.
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de]
>> Sent: Friday, July 10, 2009 4:18 AM
>> To: BioPerl List; Cook, Malcolm; Chris Fields
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Hi,
>> I tried to do what Malcolm proposed (my $prog = 'rpsblast';
>> my $db = 'CDD';) but that didn't work.
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Value rpsblast for PUT parameter PROGRAM does not match
>> expression t?blast[ pnx]. Rejecting.
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:329
>> STACK: Bio::Tools::Run::RemoteBlast::new C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:257
>> STACK: blast_a_seq2.pm:14
>> -----------------------------------------------------------
>> So I should try to "change the wrapper to allow 'rpsblast'",
>> right? Could you tell me how to do that, please? So sorry, but
>> I have no idea yet...:) If that doesn't work, is there any
>> other way to run cdd-searches with perl?
>> Thank you so much!
>> Regards, Jonas
>>
>> ----- Original Message -----
>> From: "Chris Fields"
>> To: "Cook, Malcolm"
>> Cc: "'Jonas Schaer'" ; "'BioPerl List'" ; "'Smithies, Russell'" ;
>> Sent: Thursday, July 09, 2009 9:19 PM
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>> >> >>> I've scheduled this tentatively for the 1.6 release series (just not >>> sure when yet). It may work as is, but I haven't tried it out yet >>> (and am hazarding to guess it only retrieves the single main RID at >>> the moment). >>> >>> chris >>> >>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: >>> >>>> Jonas, >>>> >>>> If you want to continue to use the bioperl remoteblast interface, >>>> probably what you should do is simply call it twice. >>>> >>>> Once, as you already know how to do, which will return without CDD >>>> results. >>>> >>>> Secondly, to get the CDD results, call remoteblast a second time. >>>> This time, using >>>> -database => 'CDD' >>>> -program => 'rpsblast' >>>> >>>> However, the wrapper may object to the 'rpsblast' program. It is >>>> not listed in the POD - >>>> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R >> emoteBlast.pm) >>>> If so, my guess is that changing the perl wrapper to allow >>>> rpsblast will "just work" (tm). I've cc:ed >> cjfields at bioperl.org for >>>> his opinion on this. >>>> >>>> Also, you might want to perform the CDD search first, especially if >>>> you are streaming results to eyeball that might like something to >>>> look at while the second (presumably longer) search is running. >>>> >>>> Cheers, >>>> >>>> Malcolm Cook >>>> Stowers Institute for Medical Research - Kansas City, Missouri >>>> >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>> Jonas Schaer >>>>> Sent: Thursday, July 09, 2009 5:16 AM >>>>> To: BioPerl List; Smithies, Russell >>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>>> >>>>> Hi guys, >>>>> Thank you all so much for your help and patience :). Of >>>>> course you were right and I finaly found the right >>>>> put-parameter to get exactly the same hits as on the homepage. >>>>> I do have an other question though :)... 
>>>>> I now want to include a search for conserved domains, but >>>>> when I try to use the CDD_SEARCH-parameter >>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >>>>> sub:CDD_SEARCH) >>>>> like the other put-parameters the way chris once told >>>>> me(works fine with the other params): >>>>> >>>>> my %put = ( >>>>> WORD_SIZE => 3, >>>>> HITLIST_SIZE => 100, >>>>> THRESHOLD => 11, >>>>> FILTER => 'R', >>>>> GENETIC_CODE => 1, >>>>> CDD_SEARCH => 'on' >>>>> ###I tried it >>>>> with 'true' and '1', too. >>>>> >>>>> ); >>>>> >>>>> for my $putName (keys %put) { >>>>> $factory->submit_parameter($putName,$put{$putName}); >>>>> } >>>>> >>>>> >>>>> ...an exception is thrown: >>>>> >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: CDD_SEARCH is not a valid PUT parameter. >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw >> C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>>> C:/Perl/site/lib/Bio/Tools >>>>> /Run/RemoteBlast.pm:325 >>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 >>>>> STACK: main::blast_it firsteval0.8.pm:288 >>>>> STACK: firsteval0.8.pm:35 >>>>> ----------------------------------------------------------- . >>>>> I guess somehow this could be the solution to my problem: >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >>>>> ub:RID-for-Simultaneous >>>>> , but unfortunately I don't understand what to do. 
>>>>> I'm so sorry to bother you with this but please help me once >>>>> more...:) >>>>> >>>>> Best regards and thanks in advance, >>>>> Jonas >>>>> >>>>> ----- Original Message ----- >>>>> From: "Smithies, Russell" >>>>> To: "'Jonas Schaer'" >>>>> Cc: "'Chris Fields'" ; "'BioPerl List'" >>>>> >>>>> Sent: Monday, July 06, 2009 10:56 PM >>>>> Subject: RE: [Bioperl-l] different results with >> remote-blast skript >>>>> >>>>> >>>>> Hi Jonas, >>>>> You can't just play with the BLAST parameters and hope >> for a "better" >>>>> result. >>>>> I'd suggest that if you aren't sure what they do, you should >>>>> leave them >>>>> alone as small changes can make huge differences in the >>>>> output - it's quite >>>>> possible to miss finding what you're looking for by using >> the wrong >>>>> parameters. >>>>> If all else fails, read the blast manual: >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>>>> _all.html >>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>>>> Or Read Ian Korfs' excellent book: >>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>>>> >>>>> Don't worry about the integer overflow bug as there's nothing >>>>> you can do >>>>> about it. If you're interested, Google and Wikipedia are your >>>>> friends: >>>>> http://en.wikipedia.org/wiki/Integer_overflow >>>>> >>>>> >>>>> Russell >>>>> >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>>>> To: BioPerl List; Chris Fields >>>>>> Subject: Re: [Bioperl-l] different results with >> remote-blast skript >>>>>> >>>>>> Hi guys, thanks for your answers so far. >>>>>> @jason: integer overflow in blast.... sorry, but what do >>>>> you mean by that? >>>>>> how can I fix it...? 
>>>>>> >>>>>> Since I never really changed any parameters I thought them >>>>> all to be >>>>>> default. >>>>>> whatever, I tried to get "better" results with my prog >> by changing >>>>>> these: >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>> >>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>>>> STICS'} = >>>>>> '1'; >>>>>> with no effect...I guess these were default values anyway. >>>>>> >>>>>> So please maybe you can tell me all the other parameters I >>>>> can change with >>>>>> my >>>>>> perl-skript AND how to do that? >>>>>> Unfortunately both, perl and the blast-algorithm are pretty >>>>> much new to >>>>>> me, >>>>>> maybe thats why I just cannot find out how to do that on my >>>>> own... :/ >>>>>> >>>>>> Here is the output I get with my remote-blast skript: >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ################################### >>>>>> Query Name: >>>>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>>>> L >>>>>> hit name is ref|XP_001702807.1| >>>>>> score is 442 >>>>>> BLASTP 2.2.21+ >>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>>>> A. Schaffer, >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>> Lipman (1997), >>>>>> "Gapped >>>>>> BLAST and PSI-BLAST: a new generation of protein database search >>>>>> programs", >>>>>> Nucleic Acids Res. 25:3389-3402. >>>>>> >>>>>> >>>>>> Reference for composition-based statistics: Alejandro A. >>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>>>> John L. Spouge, >>>>>> Yuri >>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. 
Altschul (2001), >>>>> "Improving the >>>>>> accuracy of PSI-BLAST protein database searches with >>>>> composition-based >>>>>> statistics and other refinements", Nucleic Acids Res. >> 29:2994-3005. >>>>>> >>>>>> >>>>>> RID: 53STX5G2013 >>>>>> >>>>>> >>>>>> Database: All non-redundant GenBank CDS >>>>>> translations+PDB+SwissProt+PIR+PRF excluding >> environmental samples >>>>>> from WGS projects >>>>>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>>>>> >>>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>>>> >>>>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>>>>> ATGPDPDDEYE >>>>>> Length=150 >>>>>> >>>>>> >>>>>> >>>>> Score >>>>>> E >>>>>> Sequences producing significant alignments: >>>>> (Bits) >>>>>> Value >>>>>> >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>> reinhard... 174 >>>>>> 2e-42 >>>>>> >>>>>> >>>>>> ALIGNMENTS >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhardtii] >>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>>>> Length=303 >>>>>> >>>>>> Score = 174 bits (442), Expect = 2e-42, Method: >>>>> Composition-based >>>>>> stats. 
>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>>>> Gaps = 0/150 >>>>>> (0%) >>>>>> >>>>>> Query 1 >>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>>>> 60 >>>>>> >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>> Sbjct 154 >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>> 213 >>>>>> >>>>>> Query 61 >>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> 120 >>>>>> >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> Sbjct 214 >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> 273 >>>>>> >>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>>>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>>>> >>>>>> >>>>>> >>>>>> Database: All non-redundant GenBank CDS >>>>>> translations+PDB+SwissProt+PIR+PRF >>>>>> excluding environmental samples from WGS projects >>>>>> Posted date: Jul 5, 2009 4:41 AM >>>>>> Number of letters in database: -1,124,994,511 >>>>>> Number of sequences in database: 9,252,587 >>>>>> >>>>>> Lambda K H >>>>>> 0.309 0.122 0.345 >>>>>> Gapped >>>>>> Lambda K H >>>>>> 0.267 0.0410 0.140 >>>>>> Matrix: BLOSUM62 >>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>> Number of Sequences: 9252587 >>>>>> Number of Hits to DB: 60273703 >>>>>> Number of extensions: 1448367 >>>>>> Number of successful extensions: 2103 >>>>>> Number of sequences better than 10: 0 >>>>>> Number of HSP's better than 10 without gapping: 0 >>>>>> Number of HSP's gapped: 2113 >>>>>> Number of HSP's successfully gapped: 0 >>>>>> Length of query: 150 >>>>>> Length of database: 3169972781 >>>>>> Length adjustment: 113 >>>>>> Effective length of query: 37 >>>>>> Effective length of database: 2124430450 >>>>>> Effective search space: 78603926650 >>>>>> Effective search space used: 78603926650 >>>>>> T: 11 >>>>>> A: 40 >>>>>> X1: 16 (7.1 bits) >>>>>> X2: 38 (14.6 bits) >>>>>> X3: 64 (24.7 bits) >>>>>> S1: 
42 (20.8 bits) >>>>>> S2: 74 (33.1 bits) >>>>>> >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ################################### >>>>>> and here are the hits (?) of the blast-algorithm on the >>>>> ncbi-homepage with >>>>>> the same query of course: >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>> reinhard... 300 >>>>>> 3e-80 >>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>>>> [Acyrtho... 36.2 >>>>>> 1.1 >>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>>>> [Blautia... 35.4 >>>>>> 1.8 >>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>>>> brazil... 34.3 >>>>>> 4.2 >>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>>>> [Aspergillus n... 33.5 >>>>>> 6.0 >>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>>>> [Methyloba... 33.5 >>>>>> 7.0 >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ###################################at >>>>>> least the first hit is the same, but even there there is a >>>>> different score >>>>>> and e-value. >>>>>> >>>>>> thanks so much for any help :) >>>>>> regards, jonas >>>>>> >>>>>> >>>>>> ----- Original Message ----- >>>>>> From: "Chris Fields" >>>>>> To: "Jason Stajich" >>>>>> Cc: "Smithies, Russell" >>>>> ; "'BioPerl >>>>>> List'" ; "'Jonas Schaer'" >>>>>> >>>>>> Sent: Monday, July 06, 2009 12:51 AM >>>>>> Subject: Re: [Bioperl-l] different results with >> remote-blast skript >>>>>> >>>>>> >>>>>>> That inspires confidence ;> >>>>>>> >>>>>>> chris >>>>>>> >>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>>>> >>>>>>>> integer overflow in blast.... >>>>>>>> >>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>>>> >>>>>>>>> I'd guess it's a difference in the parameters used. 
>>>>>>>>> Interesting that both have the number of letters in the db as >>>>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>>>> >>>>>>>>> Stats from your remote_blast: >>>>>>>>> >>>>>>>>> 'stats' => { >>>>>>>>> 'S1' => '42', >>>>>>>>> 'S1_bits' => '20.8', >>>>>>>>> 'lambda' => '0.309', >>>>>>>>> 'entropy' => '0.345', >>>>>>>>> 'kappa_gapped' => '0.0410', >>>>>>>>> 'T' => '11', >>>>>>>>> 'kappa' => '0.122', >>>>>>>>> 'X3_bits' => '24.7', >>>>>>>>> 'X1' => '16', >>>>>>>>> 'lambda_gapped' => '0.267', >>>>>>>>> 'X2' => '38', >>>>>>>>> 'S2' => '74', >>>>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>>>> 'Hits_to_DB' => '60102303', >>>>>>>>> 'dbletters' => '-1125070205', >>>>>>>>> 'A' => '40', >>>>>>>>> 'num_successful_extensions' => '2004', >>>>>>>>> 'num_extensions' => '1436892', >>>>>>>>> 'X1_bits' => '7.1', >>>>>>>>> 'X3' => '64', >>>>>>>>> 'entropy_gapped' => '0.140', >>>>>>>>> 'dbentries' => '9252258', >>>>>>>>> 'X2_bits' => '14.6', >>>>>>>>> 'S2_bits' => '33.1' >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>>>> >>>>>>>>> Database: All non-redundant GenBank CDS >>>>> translations+PDB+SwissProt >>>>>>>>> +PIR+PRF >>>>>>>>> excluding environmental samples from WGS projects >>>>>>>>> Posted date: Jul 4, 2009 4:41 AM >>>>>>>>> Number of letters in database: -1,125,070,205 >>>>>>>>> Number of sequences in database: 9,252,258 >>>>>>>>> >>>>>>>>> Lambda K H >>>>>>>>> 0.309 0.124 0.340 >>>>>>>>> Gapped >>>>>>>>> Lambda K H >>>>>>>>> 0.267 0.0410 0.140 >>>>>>>>> Matrix: BLOSUM62 >>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>>>>> Number of Sequences: 9252258 >>>>>>>>> Number of Hits to DB: 86493230 >>>>>>>>> Number of extensions: 3101413 >>>>>>>>> Number of successful extensions: 9001 >>>>>>>>> Number of sequences better than 100: 65 >>>>>>>>> Number of HSP's better than 100 without gapping: 0 >>>>>>>>> Number of HSP's gapped: 9000 >>>>>>>>> Number of HSP's 
successfully gapped: 66 >>>>>>>>> Length of query: 150 >>>>>>>>> Length of database: 3169897087 >>>>>>>>> Length adjustment: 113 >>>>>>>>> Effective length of query: 37 >>>>>>>>> Effective length of database: 2124391933 >>>>>>>>> Effective search space: 78602501521 >>>>>>>>> Effective search space used: 78602501521 >>>>>>>>> T: 11 >>>>>>>>> A: 40 >>>>>>>>> X1: 16 (7.1 bits) >>>>>>>>> X2: 38 (14.6 bits) >>>>>>>>> X3: 64 (24.7 bits) >>>>>>>>> S1: 42 (20.8 bits) >>>>>>>>> S2: 65 (29.6 bits) >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>>>> To: BioPerl List >>>>>>>>>> Subject: [Bioperl-l] different results with >> remote-blast skript >>>>>>>>>> >>>>>>>>>> Hi again :) >>>>>>>>>> please, I only have this little question: >>>>>>>>>> why do I get different results with my remote::blast >>>>> perl skript >>>>>>>>>> then on the >>>>>>>>>> ncbi blast homepage? >>>>>>>>>> I am using blastp, the query is an amino-sequence (different >>>>>>>>>> results with any >>>>>>>>>> sequence, differences not only in number of hits but >> even in e- >>>>>>>>>> values, scores >>>>>>>>>> etc...), the database is 'nr'. 
>>>>>>>>>> PLEASE help me,
>>>>>>>>>> thank you in advance,
>>>>>>>>>> Jonas
>>>>>>>>>>
>>>>>>>>>> ps: my skript:
>>>>>>>>>> ################################################################################
>>>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>>>> use strict;
>>>>>>>>>> my @blast_report;
>>>>>>>>>> my $prog = 'blastp';
>>>>>>>>>> my $db = 'nr';
>>>>>>>>>> my $e_val= '1e-10';
>>>>>>>>>> #my $e_val= '10';
>>>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>>>                '-data' => $db,
>>>>>>>>>>                '-expect' => $e_val,
>>>>>>>>>>                '-readmethod' => 'SearchIO' );
>>>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
>>>>>>>>>>
>>>>>>>>>> my $blast_seq ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
>>>>>>>>>> #$v is just to turn on and off the messages
>>>>>>>>>> my $v = 1;
>>>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
>>>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
>>>>>>>>>> my $filename='temp2.out';
>>>>>>>>>> my $r = $factory->submit_blast($seq);
>>>>>>>>>> print STDERR "waiting..." if( $v > 0 );
>>>>>>>>>> while ( my @rids = $factory->each_rid )
>>>>>>>>>> {
>>>>>>>>>>     foreach my $rid ( @rids )
>>>>>>>>>>     {
>>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>         if( !ref($rc) )
>>>>>>>>>>         {
>>>>>>>>>>             if( $rc < 0 )
>>>>>>>>>>             {
>>>>>>>>>>                 $factory->remove_rid($rid);
>>>>>>>>>>             }
>>>>>>>>>>             print STDERR "." if ( $v > 0 );
>>>>>>>>>>         }
>>>>>>>>>>         else
>>>>>>>>>>         {
>>>>>>>>>>             my $result = $rc->next_result();
>>>>>>>>>>             $factory->save_output($filename);
>>>>>>>>>>             $factory->remove_rid($rid);
>>>>>>>>>>             print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>>>             while ( my $hit = $result->next_hit )
>>>>>>>>>>             {
>>>>>>>>>>                 next unless ( $v > 0);
>>>>>>>>>>                 print "\thit name is ", $hit->name, "\n";
>>>>>>>>>>                 while( my $hsp = $hit->next_hsp )
>>>>>>>>>>                 {
>>>>>>>>>>                     print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>>>                 }
>>>>>>>>>>             }
>>>>>>>>>>         }
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>> @blast_report = get_file_data ($filename);
>>>>>>>>>> return @blast_report;
>>>>>>>>>> ##################################################################################
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>> =======================================================================
>>>>>>>>> Attention: The information contained in this message and/or attachments
>>>>>>>>> from AgResearch Limited is intended only for the persons or entities
>>>>>>>>> to which it is addressed and may contain confidential and/or privileged
>>>>>>>>> material.
From maj at fortinbras.us Fri Jul 10 15:12:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 10 Jul 2009 15:12:35 -0400
Subject: [Bioperl-l] Mac platform instructions
In-Reply-To:
References:
Message-ID:

should be
http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink
--
----- Original Message -----
From: "Chris Larsen"
To:
Sent: Friday, July 10, 2009 12:41 PM
Subject: Re: [Bioperl-l] Mac platform instructions


> Brian,
>
> I too am on a Mac now. However the 'getting bioperl' MacOs link on:
> "http://www.bioperl.org/wiki/Getting_BioPerl"
>
> which loads the URL:
> "http://www.bioperl.org/wiki/Getting_BioPerl#MacOS_X_using_fink"
>
> does nothing but reload the same page...it took a bit to figure out
> how to begin install, scroll around etc. since it doesn't behave as do
> the other platforms links. (Firefox 3.0.11, OS X 10.5.7). Think I have
> it now.
>
> The rest of the install instructions seem straightforward and should
> behave as well as the Fedora tarball did, thanks for that documentation.
>
> Cheers
>
> Chris
>
>
> --
>
> Christopher Larsen, Ph.D.
> Sr. Scientist / Grants Manager
> Vecna Technologies
> 6404 Ivy Lane #500
> Greenbelt, MD 20770
> Phone: (240) 965-4525
> Fax: (240) 547-6133
> 240-737-4525

From plantboy at gmail.com Fri Jul 10 19:33:25 2009
From: plantboy at gmail.com (cody h)
Date: Fri, 10 Jul 2009 16:33:25 -0700
Subject: [Bioperl-l] Trouble installing bioperl-db on MacOS X... Help?
Message-ID: <320708320907101633w69be0a18vd533727bf3e2b4bb@mail.gmail.com>

Hi,

I'm trying to install bioperl-db 1.5.2 on an intel mac running os 10.5.7.
The Build.PL file executes fine, but the test suite fails dramatically,
returning the error "No database selected" for many of the tests. All the
error calls seem to be originating from line 852 in
BasePersistenceAdaptor.pm. I took a look at the code but I could not figure
out why it wasn't working.

I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql
server. The dependencies all seem to be working, but I haven't used them
enough to completely verify this, so that could be part of the problem. I
don't know which ones to check though. Does anyone have any idea why I might
be getting these "No database selected" errors? Here is a sample of the
error messages given by the ./Build test command (note, this same error is
generated by 15/16 test files)

I am new to Perl and would really appreciate any help or guidance at all!
Thanks!

Cody

t/12ontology.t .... 1/738
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: No database selected
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/12ontology.t:44
-----------------------------------------------------------
t/12ontology.t .... Dubious, test returned 255 (wstat 65280, 0xff00)

From hlapp at gmx.net Sat Jul 11 07:32:11 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 11 Jul 2009 07:32:11 -0400
Subject: [Bioperl-l] Trouble installing bioperl-db on MacOS X... Help?
In-Reply-To: <320708320907101633w69be0a18vd533727bf3e2b4bb@mail.gmail.com>
References: <320708320907101633w69be0a18vd533727bf3e2b4bb@mail.gmail.com>
Message-ID: <7F2442F5-2224-405C-92A0-97E34FDFC2F9@gmx.net>

Hi Cody,

it seems like bioperl-db fails to connect to your database, or it
connects but doesn't have a database selected (in MySQL connecting
without setting the database is legitimate) and so as soon as it wants
to execute a statement it fails. Have you set your connection
parameters in t/DBTestHarness.conf?

-hilmar

On Jul 10, 2009, at 7:33 PM, cody h wrote:

> Hi,
>
> I'm trying to install bioperl-db 1.5.2 on an intel mac running os 10.5.7.
> The Build.PL file executes fine, but the test suite fails dramatically,
> returning the error "No database selected" for many of the tests. All the
> error calls seem to be originating from line 852 in
> BasePersistenceAdaptor.pm. I took a look at the code but I could not figure
> out why it wasn't working.
>
> I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql
> server. The dependencies all seem to be working, but I haven't used them
> enough to completely verify this, so that could be part of the problem. I
> don't know which ones to check though. Does anyone have any idea why I might
> be getting these "No database selected" errors? Here is a sample of the
> error messages given by the ./Build test command (note, this same error is
> generated by 15/16 test files)
>
> I am new to Perl and would really appreciate any help or guidance at all!
> Thanks!
>
> Cody
>
> t/12ontology.t ....
1/738 > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: error while executing statement in > Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: > No database selected > STACK: Error::throw > STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB > /BioSQL/BasePersistenceAdaptor.pm:948 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB > /BioSQL/BasePersistenceAdaptor.pm:852 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/ > Desktop/ > bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:182 > STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/ > Desktop/ > bioperl-db-1.5.2_1