From jrm at compbio.dundee.ac.uk Wed Mar 1 10:19:19 2006 From: jrm at compbio.dundee.ac.uk (Jon Manning) Date: Wed, 01 Mar 2006 15:19:19 +0000 Subject: [Bioperl-l] pSW question- hanging tails of alignments Message-ID: <4405BB77.2090402@compbio.dundee.ac.uk> Hi all, I'm using the Bio::Tools::pSW module to align two very similar sequences. But one is a little longer at the start than the other. Rather than showing this, pSW simply trims it off and it's not present in the resulting Bio::SimpleAlign object- a sort of BLAST-like behaviour I suppose, showing only aligned regions. But is there a parameter I can pass to allow these regions to be kept? Or am I doing something wrong. Incidentally, the behavior is the same in dpAlign. Thanks, Jon Manning From haralds_listen at gmx.de Wed Mar 1 10:44:18 2006 From: haralds_listen at gmx.de (Harald) Date: Wed, 01 Mar 2006 16:44:18 +0100 Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant open BLOSUM62" Message-ID: <4405C152.7070507@gmx.de> Hi all. I am a Bioinformatics Newbie and want to use BioPerl for doing BLASTs on a local protein-sequence file. Unfortunately something went wrong, which gives me the error message: "...Unable to open BLOSUM62...". I am using Win2000 on a normal Desktop PC with 1.4 GHz AMD and 256 MB RAM (I know, this is not much, but my sequence database is not so big). 
My perl-interpreter is from ActiveState and in the version 5.8.7 built
for MSWin32-x86-multi-thread
BioPerl is version 1.4
Blast has version 2.2.13

(1) I have downloaded the blast distribution from
http://www.ncbi.nlm.nih.gov/blast/download.shtml

(2) I have installed it and put the file ncbi.ini into my
WINNT-directory, which is pointing to blast-directory/data
==============C:/WINNT/ncbi.ini========================
[NCBI]
Data="D:/BLAST/BLAST-2.2.13/data"
===================================================

(3) I downloaded a FASTA-formatted protein-sequence database file
(ftp://ftp.ncbi.nih.gov/pub/COG/KOG/kyva) [48MB]

(4) I used formatdb -i kyva and received the files kyva.phr, kyva.pin
and kyva which I put into the directory, where my perl-script resides.

(5) Now I tried to BLAST this database for a sequence I have copied out
of it. Therefore I just modified the BLAST-script from the Beginner-HowTo:
==================D:/BLAST/StandAloneBlast.pl======================
use strict;

use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;

my @params = (program => 'blastp', database => 'kyva' );

my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);

my $seq_obj = Bio::Seq->new(-id =>"test_query", -seq =>"SYFICPISQEVMREPRVAADGFTYEAESLREWLDNGHETSPMTNLKLAHNNLVPNHALRSAIQEWLQRNS");

my $report_obj = $blast_obj->blastall($seq_obj);

my $result_obj = $report_obj->next_result;

print $result_obj->num_hits;
=============================================================

(6) But if I trigger this script, I receive:
==============================================================
[NULL_Caption] WARNING: [000.000] test_query: Unable to open BLOSUM62
[NULL_Caption] WARNING: [000.000] test_query: BlastScoreBlkMatFill returned non-zero status
[NULL_Caption] WARNING: [000.000] test_query: SetUpBlastSearch failed.
0
==============================================================

Does anyone have a clue what is going wrong?
Since my ncbi.ini file points to the directory where blosum62 is found,
I cannot understand this error message.

With kind regards,
frustrated Harald

From cjfields at uiuc.edu Wed Mar 1 11:34:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 1 Mar 2006 10:34:51 -0600
Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant openBLOSUM62"
In-Reply-To: <4405C152.7070507@gmx.de>
Message-ID: <000a01c63d4e$1007fc20$15327e82@pyrimidine>

You're running a pretty old version of bioperl. Bioperl 1.5.1 is the latest
and should work with BLAST output from v 2.2.13 (NCBI never upgraded the
formatting for netblast client and local blast, so its text output is
exactly like v. 2.2.12; only web output has changed).

I suggest upgrading to the latest CVS using the following instructions:

http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Beyond_the_Core

Don't worry if 'nmake test' fails a number of times; it's a bit flaky with
bioperl-live. This should install over the old version.

If you really want to stick with PPM and are especially brave, I put up a
VERY preliminary online method of making a custom PPM version of bioperl:

http://www.bioperl.org/wiki/Create_a_Bioperl_PPM_Package

It's a bit longish. But the nice thing about this is you get very pretty,
browsable autogenerated HTML docs from POD for all of the installed modules,
including bioperl! New versions of PPM do this automagically.

If installing from CVS doesn't fix it, drop a line here again.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Harald > Sent: Wednesday, March 01, 2006 9:44 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant > openBLOSUM62" > > Hi all. > > I am a Bioinformatics Newbie and want to use BioPerl for doing BLASTs on > a local protein-sequence file. Unfortunately something went wrong, which > gives me the error message: "...Unable to open BLOSUM62...". > > I am using Win2000 on a normal Desktop PC with 1.4 GHz AMD and 256 MB > RAM (I know, this is not much, but my sequence database is not so big). > My perl-interpreter is from ActiveState and in the version 5.8.7 built > for MSWin32-x86-multi-thread > BioPerl is version 1.4 > Blast has version 2.2.13 > > > (1) I have downloaded the blast distribution from > http://www.ncbi.nlm.nih.gov/blast/download.shtml > > (2) I have installed it and put the file ncbi.ini into my > WINNT-directory, which is pointing to blast-directory/data > ==============C:/WINNT/ncbi.ini======================== > [NCBI] > Data="D:/BLAST/BLAST-2.2.13/data" > =================================================== > > (3) I downloaded a FASTA-formatted protein-sequence database file > (ftp://ftp.ncbi.nih.gov/pub/COG/KOG/kyva) [48MB] > > (4) I used formatdb -i kyva and received the files kyva.phr, kyva.pin > and kyva which I put into the directory, where my perl-script resides. > > (5) Now I tried to BLAST this database for a sequence, I have copied out > of it. 
Therefore I just modified the BLAST-script from the Beginner-HowTo:
> ==================D:/BLAST/StandAloneBlast.pl======================
> use strict;
>
> use Bio::Seq;
> use Bio::Tools::Run::StandAloneBlast;
>
> my @params = (program => 'blastp', database => 'kyva' );
>
> my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
>
> my $seq_obj = Bio::Seq->new(-id =>"test_query", -seq
> =>"SYFICPISQEVMREPRVAADGFTYEAESLREWLDNGHETSPMTNLKLAHNNLVPNHALRSAIQEWLQRNS"
> );
>
>
> my $report_obj = $blast_obj->blastall($seq_obj);
>
> my $result_obj = $report_obj->next_result;
>
> print $result_obj->num_hits;
>
> =============================================================
>
> (6) But if I trigger this script, I receive:
> ==============================================================
> [NULL_Caption] WARNING: [000.000] test_query: Unable to open BLOSUM62
> [NULL_Caption] WARNING: [000.000] test_query: BlastScoreBlkMatFill
> returned non-zero status
> [NULL_Caption] WARNING: [000.000] test_query: SetUpBlastSearch failed.
> 0
> ==============================================================
>
> Does anyone has a clue, what is going wrong? Since my ncbi.ini - file
> points to the directory, where blosum62 is found, I can not understand
> this error-message.
>
> With kind regards,
> frustrated Harald
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason.stajich at duke.edu Wed Mar 1 11:36:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 1 Mar 2006 11:36:07 -0500
Subject: [Bioperl-l] pSW question- hanging tails of alignments
In-Reply-To: <4405BB77.2090402@compbio.dundee.ac.uk>
References: <4405BB77.2090402@compbio.dundee.ac.uk>
Message-ID: <98AE803A-AC72-42F9-A640-17C4401FF936@duke.edu>

That is actually local alignment behavior - look into a global alignment
like Needleman-Wunsch, implemented in EMBOSS in the tool 'needle'.
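A minimal sketch of the local-versus-global point, assuming a simple match/mismatch/gap scoring scheme. The scores and the name global_align are made up for illustration; this is neither the pSW algorithm nor EMBOSS needle. Because a global alignment is traced from one corner of the score matrix to the other, the overhanging start of the longer sequence stays in the result, padded with gaps:

```perl
use strict;
use warnings;

# Global (Needleman-Wunsch style) alignment of two strings.
# Scores are illustrative assumptions, not the defaults of any real tool.
sub global_align {
    my ($s, $t) = @_;
    my ($match, $mismatch, $gap) = (1, -1, -2);
    my @x = split //, $s;
    my @y = split //, $t;
    my ($n, $m) = (scalar @x, scalar @y);

    # Row 0 and column 0 charge for leading gaps; this is what keeps the
    # unmatched tail in the alignment instead of trimming it.
    my @score;
    $score[$_][0] = $_ * $gap for 0 .. $n;
    $score[0][$_] = $_ * $gap for 0 .. $m;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $sub  = $x[$i - 1] eq $y[$j - 1] ? $match : $mismatch;
            my $best = $score[$i - 1][$j - 1] + $sub;
            my $up   = $score[$i - 1][$j] + $gap;
            my $left = $score[$i][$j - 1] + $gap;
            $best = $up   if $up > $best;
            $best = $left if $left > $best;
            $score[$i][$j] = $best;
        }
    }

    # Trace back from the bottom-right corner to the top-left corner.
    my ($i, $j) = ($n, $m);
    my ($row_x, $row_y) = ('', '');
    while ($i > 0 || $j > 0) {
        my $sub = ($i > 0 && $j > 0 && $x[$i - 1] eq $y[$j - 1]) ? $match : $mismatch;
        if ($i > 0 && $j > 0 && $score[$i][$j] == $score[$i - 1][$j - 1] + $sub) {
            $row_x = $x[--$i] . $row_x;
            $row_y = $y[--$j] . $row_y;
        }
        elsif ($i > 0 && $score[$i][$j] == $score[$i - 1][$j] + $gap) {
            $row_x = $x[--$i] . $row_x;
            $row_y = '-' . $row_y;
        }
        else {
            $row_x = '-' . $row_x;
            $row_y = $y[--$j] . $row_y;
        }
    }
    return ($row_x, $row_y);
}

my ($long, $short) = global_align('TTTACGTACGT', 'ACGTACGT');
print "$long\n$short\n";
# TTTACGTACGT
# ---ACGTACGT
```

A local method would report only the ACGTACGT region here; for real sequences a tested implementation such as 'needle' is the better choice.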
On Mar 1, 2006, at 10:19 AM, Jon Manning wrote:

> Hi all,
>
> I'm using the Bio::Tools::pSW module to align two very similar
> sequences. But one is a little longer at the start than the other.
> Rather than showing this, pSW simply trims it off and it's not present
> in the resulting Bio::SimpleAlign object- a sort of BLAST-like
> behaviour
> I suppose, showing only aligned regions. But is there a parameter I
> can
> pass to allow these regions to be kept? Or am I doing something wrong.
>
> Incidentally, the behavior is the same in dpAlign.
>
> Thanks,
>
> Jon Manning
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cain at cshl.edu Wed Mar 1 22:26:45 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 01 Mar 2006 22:26:45 -0500
Subject: [Bioperl-l] Problem with bp_pg_bulk_load_gff
In-Reply-To:
References:
Message-ID: <1141270005.8373.107.camel@localhost.localdomain>

Hi Marco,

Please always send questions about bioperl to both the bioperl list and
the author so that the questions (and answers) can be archived.

The Postgres bulk loader tries to use a temporary directory to write
files to; it tries to use these directories in order:

$TMPDIR (environment variables)
$TMP
/usr/tmp

So if you don't have either TMPDIR or TMP environment variables set, it
will try to use /usr/tmp. So, to fix this, either set one of those
variables, or create /usr/tmp and make it world readable and writable.

Scott

On Wed, 2006-03-01 at 16:23 -0800, Marco Blanchette wrote:
> Scott--
>
> I am trying to use the pg_bulk_load_gff.pl script you wrote without
> any success. I have PostgreSQL up and running and I can use:
> $ bp_load_gff.pl --adaptor dbi::pg -d chr4 --create
> dmel-4-r4.2.1.gff
> to load a gff file to the chr4 PostgresSQL database.
> > Here is the output I get when I run your script: > > $ bp_pg_bulk_load_gff.pl -d chr4 dmel-4-r4.2.1.gff > > This operation will delete all existing data in database chr4. > Continue? y > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "pk_fdna" for table "fdna" > NOTICE: CREATE TABLE will create implicit sequence "fdata_fid_seq" > for serial column "fdata.fid" > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "pk_fdata" for table "fdata" > NOTICE: CREATE TABLE will create implicit sequence > "fattribute_fattribute_id_seq" for serial column > "fattribute.fattribute_id" > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "pk_fattribute" for table "fattribute" > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "pk_fmeta" for table "fmeta" > NOTICE: CREATE TABLE will create implicit sequence > "ftype_ftypeid_seq" for serial column "ftype.ftypeid" > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "pk_ftype" for table "ftype" > NOTICE: CREATE TABLE / UNIQUE will create implicit index > "ftype_ftype" for table "ftype" > NOTICE: CREATE TABLE will create implicit sequence "fgroup_gid_seq" > for serial column "fgroup.gid" > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "pk_fgroup" for table "fgroup" > fdata: No such file or directory at /usr/bin/bp_pg_bulk_load_gff.pl > line 213. > > Somehow it breaks at line 213 within the foreach block: > > 212 foreach (@files) { > 213 $FH{$_} = IO::File->new("$tmpdir/$_.$$",">") or die $_,": $!"; > 214 $FH{$_}->autoflush; > 215 } > > Any idea what?s the problem?? > > Many tx > > Marco > ______________________________ > Marco Blanchette, Ph.D. > > mblanche at uclink.berkeley.edu > > Donald C. 
Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062
> --
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Wed Mar 1 23:04:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 1 Mar 2006 22:04:08 -0600
Subject: [Bioperl-l] WGS sequences through Bio::DB::GenBank
In-Reply-To:
Message-ID: <000301c63dae$5aca0270$15327e82@pyrimidine>

Thanks, Brian. I was actually typing this up when you responded.

Okay, to answer my own question somewhat (and to confirm your answer), there
IS no direct way; efetch doesn't complete these files, so the best way is
with a query. I'm posting this so anybody searching the mail list with the
same question will maybe find this. The NCBI help desk basically told me to
use a query like so:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=AAOH00000000[accn]+AND+wgs_contig[prop]

which needs to be parsed for the individual contigs. I tried the same query
using Bio::DB::Query::GenBank and got it to work.

As for NCBIHelper, I'll give it a look and try adding this in but it won't
be until next week.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Wednesday, March 01, 2006 9:55 PM
> To: Chris Fields
> Subject: Re: [Bioperl-l] WGS sequences through Bio::DB::GenBank
>
> Chris,
>
> No, NCBIHelper.pm doesn't handle the WGS block, presumably this is where
> it
> should be coded. The approach would be very similar to that used for the
> CONTIG block, piece the sequence together by retrieving the CONTIG
> information specified by the WGS_SCAFLD entries.
> > Brian O. > > > On 2/28/06 9:41 PM, "Chris Fields" wrote: > > > I know that a recent post showed that you could retrieve CONTIG > sequences > > from GenBank files fairly easily: > > > > http://bioperl.org/pipermail/bioperl-l/2006-February/020891.html > > > > I'm driving myself a bit buggy looking for this, and I may be blind to > it, > > but can the same be done with WGS files? I've tried Bio::DB::GenBank > and a > > few other Bio::DB* modules to see if it's been implemented but haven't > had > > any luck yet. I may try getting around it using > Bio::DB::Query::GenBank, > > but just trying to find a more direct route. > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rvosa at sfu.ca Thu Mar 2 03:53:17 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Thu, 02 Mar 2006 00:53:17 -0800 Subject: [Bioperl-l] Bio::Tools::Run::* extending Message-ID: <4406B27D.1000103@sfu.ca> Hi all, I'd like to write some more Bio::Tools::Run::* wrappers, so I'm wondering what the canonical approach is. Which libraries *should* I look at (as examples of the bioperl received wisdom and coding standards) and which ones *shouldn't* I look at? I'm thinking that paup, mrbayes and modeltest need wrappers. Thanks! Rutger -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. 
candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From torsten.seemann at infotech.monash.edu.au Thu Mar 2 05:50:44 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 02 Mar 2006 21:50:44 +1100 Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant open BLOSUM62" In-Reply-To: <4405C152.7070507@gmx.de> References: <4405C152.7070507@gmx.de> Message-ID: <4406CE04.1010807@infotech.monash.edu.au> Harald, > I am a Bioinformatics Newbie and want to use BioPerl for doing BLASTs on > a local protein-sequence file. Unfortunately something went wrong, which > gives me the error message: "...Unable to open BLOSUM62...". > ==============C:/WINNT/ncbi.ini======================== > [NCBI] > Data="D:/BLAST/BLAST-2.2.13/data" > =================================================== > [NULL_Caption] WARNING: [000.000] test_query: Unable to open BLOSUM62 > [NULL_Caption] WARNING: [000.000] test_query: BlastScoreBlkMatFill > returned no > n-zero status > [NULL_Caption] WARNING: [000.000] test_query: SetUpBlastSearch failed. > 0 > Does anyone has a clue, what is going wrong? Since my ncbi.ini - file > points to the directory, where blosum62 is found, I can not understand > this error-message. It really looks like it can't find BLOSUM62, which indicates that the ncbi.ini file is not being read, or has the wrong syntax. Those "/" forward slashes look potentially problematic on a Windows platform - perhaps try using "\" backslashes. ie. [NCBI] Data="D:\BLAST\BLAST-2.2.13\data" Just an idea, --Torsten Seemann --Victorian Bioinformatics Consortium, Australia. 
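A quick way to test the "file is not being read, or has the wrong syntax" idea is to parse the file yourself and look for the matrix. This is only a sketch in core Perl: the helper names ncbi_data_dir and matrix_present are hypothetical, and the parsing is a plain guess at the [NCBI] / Data= layout shown in the thread, not the way blastall itself reads ncbi.ini.

```perl
use strict;
use warnings;
use File::Spec;

# Return the Data= value from the [NCBI] section of an ncbi.ini-style file,
# with surrounding double quotes stripped, or nothing if it is not found.
sub ncbi_data_dir {
    my ($ini_path) = @_;
    open my $fh, '<', $ini_path or return;
    my $section = '';
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        if ($line =~ /^\s*\[(.+?)\]\s*$/) {
            $section = uc $1;
        }
        elsif ($section eq 'NCBI' && $line =~ /^\s*Data\s*=\s*"?(.*?)"?\s*$/i) {
            close $fh;
            return $1;
        }
    }
    close $fh;
    return;
}

# True (1) if a file with the given matrix name exists in the directory.
sub matrix_present {
    my ($dir, $matrix) = @_;
    return 0 unless defined $dir;
    my $file  = File::Spec->catfile($dir, $matrix);
    my $found = -f $file;
    return $found ? 1 : 0;
}

# Example use; the path is the one from the original post.
my $dir = ncbi_data_dir('C:/WINNT/ncbi.ini');
if (defined $dir) {
    print "Data directory: $dir\n";
    print matrix_present($dir, 'BLOSUM62') ? "BLOSUM62 found\n" : "BLOSUM62 missing\n";
}
else {
    print "No [NCBI] Data entry could be read\n";
}
```

If this reports the matrix as found, the ini file itself is probably fine and the cause lies elsewhere.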
From cjfields at uiuc.edu Thu Mar 2 07:38:46 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 2 Mar 2006 06:38:46 -0600 Subject: [Bioperl-l] Bio::Tools::Run::* extending In-Reply-To: <4406B27D.1000103@sfu.ca> References: <4406B27D.1000103@sfu.ca> Message-ID: <03890EA2-9B73-4626-8DF6-0208B02BB186@uiuc.edu> Look at bioperl-run: http://www.bioperl.org/wiki/Run_package You can look over these modules for ideas, hints, etc. Chris On Mar 2, 2006, at 2:53 AM, Rutger Vos wrote: > Hi all, > > I'd like to write some more Bio::Tools::Run::* wrappers, so I'm > wondering what the canonical approach is. Which libraries *should* I > look at (as examples of the bioperl received wisdom and coding > standards) and which ones *shouldn't* I look at? I'm thinking that > paup, > mrbayes and modeltest need wrappers. > > Thanks! > > Rutger > > -- > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD. candidate > Department of Biological Sciences > Simon Fraser University > 8888 University Drive > Burnaby, BC, V5A1S6 > Phone: 604-291-5625 > Fax: 604-291-3496 > Personal site: http://www.sfu.ca/~rvosa > FAB* lab: http://www.sfu.ca/~fabstar > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Mar 2 09:08:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 2 Mar 2006 08:08:24 -0600 Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant open BLOSUM62" In-Reply-To: <4406CE04.1010807@infotech.monash.edu.au> Message-ID: <000701c63e02$c52b8c20$15327e82@pyrimidine> Torsten's right. 
This is what I have for BLASTDIR: C:\Documents and Settings\Administrator>echo %BLASTDIR% C:\Research\blast Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > Sent: Thursday, March 02, 2006 4:51 AM > To: bioperl-l at lists.open-bio.org > Cc: Harald > Subject: Re: [Bioperl-l] newbie tries StandAloneBlast and receives "cant > open BLOSUM62" > > Harald, > > > I am a Bioinformatics Newbie and want to use BioPerl for doing BLASTs on > > a local protein-sequence file. Unfortunately something went wrong, which > > gives me the error message: "...Unable to open BLOSUM62...". > > > ==============C:/WINNT/ncbi.ini======================== > > [NCBI] > > Data="D:/BLAST/BLAST-2.2.13/data" > > =================================================== > > > [NULL_Caption] WARNING: [000.000] test_query: Unable to open BLOSUM62 > > [NULL_Caption] WARNING: [000.000] test_query: BlastScoreBlkMatFill > > returned no > > n-zero status > > [NULL_Caption] WARNING: [000.000] test_query: SetUpBlastSearch failed. > > 0 > > > Does anyone has a clue, what is going wrong? Since my ncbi.ini - file > > points to the directory, where blosum62 is found, I can not understand > > this error-message. > > It really looks like it can't find BLOSUM62, which indicates that the > ncbi.ini file is not being read, or has the wrong syntax. > > Those "/" forward slashes look potentially problematic on a Windows > platform - perhaps try using "\" backslashes. ie. > > [NCBI] > Data="D:\BLAST\BLAST-2.2.13\data" > > Just an idea, > > --Torsten Seemann > --Victorian Bioinformatics Consortium, Australia. 
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From praveecbt at yahoo.co.in Thu Mar 2 00:31:05 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Thu, 2 Mar 2006 05:31:05 +0000 (GMT)
Subject: [Bioperl-l] (no subject)
Message-ID: <20060302053105.92192.qmail@web8713.mail.in.yahoo.com>

Dear Sir,

I am Praveen Raj, doing a project as a part of my Master's degree
(Bioinformatics) at the National Institute of Virology (Virus Research
Centre of the Govt. of India).

Sir, I have one question about BioPerl. I have generated a phylogenetic
tree object from a ClustalW alignment object (SimpleAlign) using the
method 'make_tree()' of 'Bio::Tree::DistanceFactory'. The tree was built
successfully and I have generated the image as well.

Sir, now the problem is that I want to generate a bootstrapped tree from
the SimpleAlign object. If the tree is bootstrapped, it is easy to measure
the reliability of the tree. Sir, how can I generate such a tree with
bootstrap values at the nodes?

I hope you understand the problem. Sir, I am waiting for your advice.

Thanking you,

Praveen Raj,
Project Student,
National Institute of Virology,
Pune, INDIA.

From haralds_listen at gmx.de Thu Mar 2 09:09:09 2006
From: haralds_listen at gmx.de (Harald)
Date: Thu, 02 Mar 2006 15:09:09 +0100
Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant open BLOSUM62"
In-Reply-To: <000701c63e02$c52b8c20$15327e82@pyrimidine>
References: <000701c63e02$c52b8c20$15327e82@pyrimidine>
Message-ID: <4406FC85.90704@gmx.de>

The cause of my problem was that I had no BLASTDIR set. Now that this
path is in my PATH environment variable all is fine, although my
ncbi.ini points to the data directory in a UNIX-like format.

Regards,
Harald

>Torsten's right.
This is what I have for BLASTDIR:
>
>C:\Documents and Settings\Administrator>echo %BLASTDIR%
>C:\Research\blast
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign
>
>

From osborne1 at optonline.net Thu Mar 2 09:27:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 02 Mar 2006 09:27:02 -0500
Subject: [Bioperl-l] Bio::Tools::Run::* extending
In-Reply-To: <4406B27D.1000103@sfu.ca>
Message-ID:

Rutger,

For canonical approaches I'd start by looking at modules that Jason's
written, Bio::Tools::Run::Phylo::Molphy::ProtML for example, but not limited
to that example since there's plenty of variation possible.

Brian O.

On 3/2/06 3:53 AM, "Rutger Vos" wrote:

> Hi all,
>
> I'd like to write some more Bio::Tools::Run::* wrappers, so I'm
> wondering what the canonical approach is. Which libraries *should* I
> look at (as examples of the bioperl received wisdom and coding
> standards) and which ones *shouldn't* I look at? I'm thinking that paup,
> mrbayes and modeltest need wrappers.
>
> Thanks!
>
> Rutger

From cjfields at uiuc.edu Thu Mar 2 11:20:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 2 Mar 2006 10:20:45 -0600
Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant open BLOSUM62"
In-Reply-To: <4406FC85.90704@gmx.de>
Message-ID: <001601c63e15$42c722e0$15327e82@pyrimidine>

Sounds good! I made changes to the wiki install file to show examples of
setting up env variables in Windows to clear up any confusion:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Setting_environment_variables

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Harald
> Sent: Thursday, March 02, 2006 8:09 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org; 'Torsten Seemann'
> Subject: Re: [Bioperl-l] newbie tries StandAloneBlast and receives "cant
> open BLOSUM62"
>
> The cause for my problem was that I had no BLASTDIR set. Now that this
> path is on my PATH-environment variable all is fine, although my
> ncbi.ini points to the data-directory in a UNIX-like format.
>
> Regards,
> Harald
>
>
> >Torsten's right. This is what I have for BLASTDIR:
> >
> >C:\Documents and Settings\Administrator>echo %BLASTDIR%
> >C:\Research\blast
> >
> >Christopher Fields
> >Postdoctoral Researcher - Switzer Lab
> >Dept. of Biochemistry
> >University of Illinois Urbana-Champaign
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Thu Mar 2 12:07:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 2 Mar 2006 11:07:09 -0600
Subject: [Bioperl-l] WGS sequences through Bio::DB::GenBank
In-Reply-To: <000301c63dae$5aca0270$15327e82@pyrimidine>
Message-ID: <001701c63e1b$bdbbabf0$15327e82@pyrimidine>

Brian,

I'm working out some of the WGS subsequence parsing and it's actually pretty
simple (much more so than CONTIG). The WGS tag just gives the sequence
range and the WGS_SCAFLD tag is a list of scaffolds, each of which can be
chromosomal (CM*) supercontigs or smaller subchromosomal chunks (I think,
CH*) and is a contig of shorter sequences. Essentially, CM* files are
contigs of CH* files which are contigs of the base WGS files. In many cases
there are only WGS files (no scaffolds), while a few have chromosomal
scaffolds and a smaller number have multiple scaffold types.
I'm starting simple (WGS files only) and working my way to the more complex
types before I commit anything.

An issue I can foresee is that many WGS file ranges are huge (the O. sativa
WGS master file lists ~52000 WGS subfiles, 12 chromosomal supercontigs, and
~3000 subchromosomal scaffold contigs). So, which to choose from, or set a
default (I'm guessing largest, using recursion to piece everything else
together)? We'll also run into an issue with the max # of ids for many of
these.

Also, in relation to the contig, I found this blurb in the eutils document
in NCBI short courses
(http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.sample-apps)
which I found very interesting:

Application 4: Downloading Contigs
I want to download a flatfile with the full sequence of an assembly (eg. a
contig).
Solution: Use EFetch with &rettype=gbwithparts
URL:efetch.fcgi?db=nucleotide&id=27479347&rettype=gbwithparts

I tried it out and it works well. Should we be using this for contig
building instead of the loop built into NCBIHelper? It seems much more
direct/quicker. I really haven't tried messing with it until I have WGS
figured out.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Wednesday, March 01, 2006 10:04 PM
> To: 'Brian Osborne'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] WGS sequences through Bio::DB::GenBank
>
> Thanks, Brian. I was actually typing this up when you responded.
>
> Okay, to answer my own question somewhat (and to confirm your answer),
> there
> IS no direct way; efetch doesn't complete these files, so the best way is
> with a query. I'm posting this so anybody searching the mail list with
> the
> same question will maybe find this.
The NCBI help desk basically told me > to > use a query like so: > > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&te > rm > =AAOH00000000[accn]+AND+wgs_contig[prop] > > which needs to be parsed for the individual contigs. I tried the same > query > using Bio::DB::Query::GenBank and got it to work. > > As for NCBIHelper, I'll give it a look and try adding this in but it won't > be until next week. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > > -----Original Message----- > > From: Brian Osborne [mailto:osborne1 at optonline.net] > > Sent: Wednesday, March 01, 2006 9:55 PM > > To: Chris Fields > > Subject: Re: [Bioperl-l] WGS sequences through Bio::DB::GenBank > > > > Chris, > > > > No, NCBIHelper.pm doesn't handle the WGS block, presumably this is where > > it > > should be coded. The approach would be very similar to that used for the > > CONTIG block, piece the sequence together by retrieving the CONTIG > > information specified by the WGS_SCAFLD entries. > > > > Brian O. > > > > > > On 2/28/06 9:41 PM, "Chris Fields" wrote: > > > > > I know that a recent post showed that you could retrieve CONTIG > > sequences > > > from GenBank files fairly easily: > > > > > > http://bioperl.org/pipermail/bioperl-l/2006-February/020891.html > > > > > > I'm driving myself a bit buggy looking for this, and I may be blind to > > it, > > > but can the same be done with WGS files? I've tried Bio::DB::GenBank > > and a > > > few other Bio::DB* modules to see if it's been implemented but haven't > > had > > > any luck yet. I may try getting around it using > > Bio::DB::Query::GenBank, > > > but just trying to find a more direct route. > > > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. 
of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From haralds_listen at gmx.de Thu Mar 2 15:48:12 2006 From: haralds_listen at gmx.de (Harald) Date: Thu, 02 Mar 2006 21:48:12 +0100 Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant open BLOSUM62" In-Reply-To: <001601c63e15$42c722e0$15327e82@pyrimidine> References: <001601c63e15$42c722e0$15327e82@pyrimidine> Message-ID: <44075A0C.6050504@gmx.de> Good idea! By the way, the instructions are the same for Windows 2000 and the SET command is also present under NT4 and Windows 2000. And I would skip the sentences "Some versions of Windows may have problems differentiating forward and back slashes used for directories. In general, always use backslashes (\). If something isn't working properly try reversing the slashes to see if it helps.". This is not true. In Windows the '\' is the one and only directory separator known from the system. On Unix/Linux it is '/' I can not add it to the Wiki entry because, I have no account ( and doubt that a newbie like me could be of any further help :-/ ). But this BLASTDIR variable confuses me. I have not set this variable, I have just set the bin-directory of my blast installation into the PATH environment variable. And my blastall and blastpgp seem to work probably with BioPerl. Besides, the info given in http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#BLAST differs from my experience. The only installing instructions I found reside in the doc-directory of my blast 2.2.13 install and there is no file called README.bls. 
And the problem with the docs is, that it is not mentioned (as far as I can see), that you have to put the bin-directory onto the PATH variable. This is bad, since copying e.g. blastall.exe into the current working directory (thus circumventing the need of the PATH variable to start blastall) leads to the problem I mentioned initially. Is this BLASTDIR only relevant for BioPerl-1.5 ? (I am using 1.4) Regards, Harald From cjfields at uiuc.edu Thu Mar 2 17:58:31 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 2 Mar 2006 16:58:31 -0600 Subject: [Bioperl-l] newbie tries StandAloneBlast and receives "cant open BLOSUM62" In-Reply-To: <44075A0C.6050504@gmx.de> Message-ID: <000301c63e4c$d38e6900$15327e82@pyrimidine> > -----Original Message----- > From: Harald [mailto:haralds_listen at gmx.de] > Sent: Thursday, March 02, 2006 2:48 PM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] newbie tries StandAloneBlast and receives "cant > open BLOSUM62" > > Good idea! > > By the way, the instructions are the same for Windows 2000 and the SET > command is also present under NT4 and Windows 2000. And I would skip the > sentences "Some versions of Windows may have problems differentiating > forward and back slashes used for directories. In general, always use > backslashes (\). If something isn't working properly try reversing the > slashes to see if it helps.". This is not true. In Windows the '\' is > the one and only directory separator known from the system. On > Unix/Linux it is '/' Really it's differences in the way certain programs look for a file. For instance, you can use either \ or / when setting PERL5LIB (I have done this w/o a problem). You also mention that the ncbi.ini you set has Unix slashes and the executable works fine so ''something'' is working. According to the blast docs it needs this file to be set to the data directory. 
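The slash question above can be looked at from Perl itself. This is only a sketch: it uses the core File::Spec::Win32 module (which loads on any platform) to show that the forward-slash path from the ncbi.ini quoted earlier and its backslash form describe the same directory. It says nothing about how blastall itself reads ncbi.ini.

```perl
use strict;
use warnings;
use File::Spec::Win32;    # core module; can be loaded on non-Windows systems too

# Path exactly as written in the ncbi.ini shown earlier in the thread.
my $ini_path = 'D:/BLAST/BLAST-2.2.13/data';

# canonpath() does a purely logical cleanup, including the separators.
my $native = File::Spec::Win32->canonpath($ini_path);

# Turn the backslashes back into forward slashes for comparison.
(my $unified = $native) =~ tr{\\}{/};

print "ini:    $ini_path\n";
print "native: $native\n";
print "same directory\n" if $unified eq $ini_path;
```

Whether a particular program accepts both forms still depends on that program, which is the point made above about PERL5LIB.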
> I can not add it to the Wiki entry because, I have no account ( and > doubt that a newbie like me could be of any further help :-/ ). Not true. A program (or suite of modules) is only as good as its documentation. If it's wrong it should be corrected! > But this BLASTDIR variable confuses me. I have not set this variable, I > have just set the bin-directory of my blast installation into the PATH > environment variable. And my blastall and blastpgp seem to work probably > with BioPerl. As for BLASTDIR, 'perldoc Bio::Tools::Run::StandAloneBlast': ... Before running StandAloneBlast it is necessary: to install BLAST on your system, to edit set the environmental variable $BLASTDIR or your $PATH variable to point to the BLAST directory, and to ensure that users have execute privileges for the BLAST program. If the databases which will be searched by BLAST are located in the data subdirectory of the blast program directory (the default installation location), StandAloneBlast will find them; however, if the database files are located in any other location, environmental variable $BLASTDATADIR will need to be set to point to that directory. ... AFAIK, the BLASTDIR variable is used mainly when the data directory lies elsewhere. If the blast executables, matrices, and database files are all in separate directories (somewhat common with shared directories) then BLASTDATADIR and BLASTMAT should be set to reflect where everything is. I believe that all of these are necessary for StandAloneBlast to work under these particular conditions, which are all UNIX, BTW. That doesn't mean the docs are up to date or don't quite reflect what happens with Windows though; this is a new area for me as well (I just update the wiki if I find something that needs to be added). In fact, looking through the mail list archives, it looks like most people use Cygwin when using StandAloneBlast, so I may have jumped the gun when I used this as an example of setting env. variables. 
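A minimal sketch, in plain Perl, of how one might check which of the variables discussed above are visible to a script before involving StandAloneBlast at all. The helper names env_report and find_in_path are made up for illustration; only %ENV and the core File::Spec module are used, and nothing here reflects how StandAloneBlast does its own lookup.

```perl
use strict;
use warnings;
use File::Spec;

# Variable names as mentioned in this thread and in the perldoc excerpt.
my @vars = qw(BLASTDIR BLASTDATADIR BLASTMAT);

# Hypothetical helper: map each name to its value, or '(not set)'.
sub env_report {
    my ($env, @names) = @_;
    return map {
        ($_ => (defined $env->{$_} && length $env->{$_} ? $env->{$_} : '(not set)'))
    } @names;
}

# Hypothetical helper: look for a program (with or without .exe)
# in a list of directories; returns the full path or nothing.
sub find_in_path {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        for my $candidate ($name, "$name.exe") {
            my $full = File::Spec->catfile($dir, $candidate);
            return $full if -f $full;
        }
    }
    return;
}

my %report = env_report(\%ENV, @vars);
printf "%-13s %s\n", $_, $report{$_} for @vars;

# File::Spec->path returns the directories listed in PATH.
my $blastall = find_in_path('blastall', File::Spec->path);
print 'blastall on PATH: ', (defined $blastall ? $blastall : 'no'), "\n";
```

Running this once under cmd.exe and once under Cygwin would show whether the two environments see the same settings.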
You could try testing out whether using the environment variables works with Windows by setting them. I may try it tomorrow to see what happens. > Besides, the info given in > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#BLAST differs > from my experience. The only installing instructions I found reside in > the doc-directory of my blast 2.2.13 install and there is no file called > README.bls. And the problem with the docs is, that it is not mentioned Ah, an area in which someone could correct a wrong! ;} > (as far as I can see), that you have to put the bin-directory onto the > PATH variable. This is bad, since copying e.g. blastall.exe into the > current working directory (thus circumventing the need of the PATH > variable to start blastall) leads to the problem I mentioned initially. > > Is this BLASTDIR only relevant for BioPerl-1.5 ? (I am using 1.4) The changes that you mention regard the ncbi.ini file. This tells the blastall executable the directory for databases and matrices, somewhat like BLASTDIR. BLASTDIR, on the other hand, helps StandAloneBlast.pm find the blast executables and data directory. Two different things. It may make no difference when using Windows though. I'll let you know soon; I'm also working on other things at the moment so I have my hands a bit full. As for using bioperl 1.4, you really should upgrade; bioperl 1.4 was released Dec. 2003 and many bugfixes have been added since, particularly ones that may affect blast parsing and so on. There have been several bugfixes relevant to recent blast text output changes and added xml support. > Regards, > Harald Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign From admin at unleashedinformatics.com Thu Mar 2 18:14:05 2006 From: admin at unleashedinformatics.com (Unleashed Informatics Administration) Date: Thu, 02 Mar 2006 18:14:05 -0500 Subject: [Bioperl-l] SeqHound User Support Message-ID: <44077C3D.2040308@unleashedinformatics.com> As announced on 2 March 2006, SeqHound has been replaced by "DogBox Online". From 3 April 2006, users of the SeqHound API will be required to provide their e-mail address when beginning a block of SeqHound (now DogBox Online) calls. Users who fail to provide a valid address will not have access to the API, and will not have access to user support. The following FAQ has been posted to the DogBox Online website. FAQ: Q. What is DogBox Online? DogBox Online is a powerful integrated data service for the life science community, and represents the new name for the SeqHound service offered by The Blueprint Initiative. The new service is located at: http://dogboxonline.unleashedinformatics.com Q. What happened to SeqHound? The SeqHound service is being phased out. Please change to DogBox Online. Q. How will DogBox Online differ? DogBox Online includes several new features previously available only to DogBox customers. Q. Will my use of the SeqHound API be affected? Yes. On 3 April 2006, Unleashed Informatics will require SeqHound users to submit a valid e-mail address as part of the initial SHoundInit call. If you were not using a SHoundInit call or similar in your code, ensure that you now begin your scripts with this call. Q. Can you give me an example? Using the Perl API, you need to begin your series of SeqHound queries like so: > use SeqHound; > SHoundInit('Program Name'); Replace the 'Program Name' text with your valid e-mail address: > use SeqHound; > SHoundInit('joe.bloggs at blogme.com'); To avoid disappointment, we recommend you change your scripts now to employ the new URL, http://dogboxonline.unleashedinformatics.com. Q. 
What happens if I don't provide a valid email address? Your use of the API will be blocked until you do so. Q. Why is this change happening? 1. To obtain feedback from users regarding API use and improvement. 2. To notify users of future developments and new features. Q. Where do I sign up? https://secure.unleashedinformatics.com/index.php?pg=support.register Thank you for your co-operation. From jason at bioperl.org Thu Mar 2 09:22:08 2006 From: jason at bioperl.org (Jason Stajich) Date: Thu, 2 Mar 2006 09:22:08 -0500 Subject: [Bioperl-l] (no subject) In-Reply-To: <20060302053105.92192.qmail@web8713.mail.in.yahoo.com> References: <20060302053105.92192.qmail@web8713.mail.in.yahoo.com> Message-ID: <6F533D45-DDBD-4030-9802-25A4CAECAC8F@bioperl.org> you can just send your msg to the mailing list rather than emailing people who have contributed. Try seqboot from PHYLIP and read the instructions in the PHYLIP package. BioPerl isn't really a phylogenetics package, it is more focused on manipulating the alignment and tree data that goes in and comes out from these programs -jason On Mar 2, 2006, at 12:31 AM, Praveen Raj wrote: > Dear Sir, > > > I am Praveen Raj, doing a project as a part > of my Master degree(Bioinformatics) at > National Institute of Virology(Virus Research Centre > of Govt. of India). > > Sir, I have one question in BioPerl. I have generated > a Phylogeney tree object from a Clustalw alignment > object(SimpleAlign) using the method 'make_tree()' of > 'Bio::Tree::DistanceFactory'. > > The tree is successfull and I have generated the image > also. > > Sir, Now the problem is , I want to generate a > boot-strapped tree from the SimpleAlign > object.If the tree is boot-strapped, it is easy to > measure the reliability of the tree. > > Sir, How can I generate such a tree with bootstapping > values in the node? > > Hope you u got the problem. > > Sir, I am waiting for a great advice from you. 
> > Thanking you, > Praveen Raj, > roject Student, > National Institute of Virology, > Pune, INDIA. > > > > __________________________________________________________ > Yahoo! India Matrimony: Find your partner now. Go to http:// > yahoo.shaadi.com -- Jason Stajich Duke University http://www.duke.edu/~jes12 From mblanche at berkeley.edu Thu Mar 2 10:51:49 2006 From: mblanche at berkeley.edu (Marco Blanchette) Date: Thu, 02 Mar 2006 07:51:49 -0800 Subject: [Bioperl-l] Problem with bp_pg_bulk_load_gff In-Reply-To: <1141270005.8373.107.camel@localhost.localdomain> Message-ID: Many tx Scott-- I created the environment variable $TMPDIR pointing to a directory in my home by: export TMPDIR=$HOME/Library/tmp I included that line in my .bash_profile Now when I run: $ bp_pg_bulk_load_gff.pl -d chr4 -c dmel-4-r4.2.1.gff Everything is perfect Many thanks Marco On 3/1/06 7:26 PM, "Scott Cain" wrote: > Hi Marco, > > Please always send questions about bioperl to both the bioperl list and > the author so that the questions (and answers) can be archived. > > The Postgres bulk loader tries to use a temporary directory to write > files to; it tries to use these directories in order: > > $TMPDIR (environment variables) > $TMP > /usr/tmp > > So if you don't have either TMPDIR or TMP environment variables set, it > will try to use /usr/tmp. So, to fix this, either set one of those > variables, or create /usr/tmp and make it world readable and writable. > > Scott > > > On Wed, 2006-03-01 at 16:23 -0800, Marco Blanchette wrote: >> Scott-- >> >> I am trying to use the pg_bulk_load_gff.pl script you wrote without >> any success. I have PostgreSQL up and running and I can use: >> $ bp_load_gff.pl --adaptor dbi::pg -d chr4 --create >> dmel-4-r4.2.1.gff >> to load a gff file to the chr4 PostgresSQL database. >> >> Here is the output I get when I run your script: >> >> $ bp_pg_bulk_load_gff.pl -d chr4 dmel-4-r4.2.1.gff >> >> This operation will delete all existing data in database chr4. 
>> Continue? y >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "pk_fdna" for table "fdna" >> NOTICE: CREATE TABLE will create implicit sequence "fdata_fid_seq" >> for serial column "fdata.fid" >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "pk_fdata" for table "fdata" >> NOTICE: CREATE TABLE will create implicit sequence >> "fattribute_fattribute_id_seq" for serial column >> "fattribute.fattribute_id" >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "pk_fattribute" for table "fattribute" >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "pk_fmeta" for table "fmeta" >> NOTICE: CREATE TABLE will create implicit sequence >> "ftype_ftypeid_seq" for serial column "ftype.ftypeid" >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "pk_ftype" for table "ftype" >> NOTICE: CREATE TABLE / UNIQUE will create implicit index >> "ftype_ftype" for table "ftype" >> NOTICE: CREATE TABLE will create implicit sequence "fgroup_gid_seq" >> for serial column "fgroup.gid" >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "pk_fgroup" for table "fgroup" >> fdata: No such file or directory at /usr/bin/bp_pg_bulk_load_gff.pl >> line 213. >> >> Somehow it breaks at line 213 within the foreach block: >> >> 212 foreach (@files) { >> 213 $FH{$_} = IO::File->new("$tmpdir/$_.$$",">") or die $_,": $!"; >> 214 $FH{$_}->autoflush; >> 215 } >> >> Any idea what's the problem? >> >> Many tx >> >> Marco >> ______________________________ >> Marco Blanchette, Ph.D. >> >> mblanche at uclink.berkeley.edu >> >> Donald C. Rio's lab >> Department of Molecular and Cell Biology >> 16 Barker Hall >> University of California >> Berkeley, CA 94720-3204 >> >> Tel: (510) 642-1084 >> Cell: (510) 847-0996 >> Fax: (510) 642-6062 >> -- >> Marco Blanchette, Ph.D. mblanche at berkeley.edu Donald C.
Rio's lab Department of Molecular and Cell Biology 16 Barker Hall University of California Berkeley, CA 94720-3204 Tel: (510) 642-1084 Cell: (510) 847-0996 Fax: (510) 642-6062 From admin at unleashedinformatics.com Thu Mar 2 18:08:41 2006 From: admin at unleashedinformatics.com (Unleashed Informatics Administration) Date: Thu, 02 Mar 2006 18:08:41 -0500 Subject: [Bioperl-l] Unleashed Informatics Supports DogBox Online Community Message-ID: <44077AF9.9080503@unleashedinformatics.com> In December 2005, Unleashed Informatics acquired commercial rights to Blueprint Initiative intellectual property from Mount Sinai Hospital. Spun-off from The Blueprint Initiative public research program at Toronto's Mount Sinai Hospital, Unleashed Informatics provides integrated hardware and software products designed to harness the power of increasingly complex scientific data. On 22 February, Unleashed Informatics released DogBox Online as an open access product to the life science community. DogBox Online is an integrated, online data retrieval service and represents the new, re-named SeqHound service previously offered by the Blueprint Initiative. The new service is located at http://dogboxonline.unleashedinformatics.com, and requires a free Unleashed Informatics account for unrestricted access. The DogBox Online registration process will help Unleashed Informatics better understand the resource user base, and ultimately help us improve our open access offerings in line with the needs of the life sciences community. Importantly, the collection of such user feedback is essential for the preparation of planned public good research grant applications aimed at funding the ongoing provision of open source and freely available bioinformatics resources. Unleashed Informatics is making a concerted effort to develop, maintain and improve open access resources for global researchers.
The release this past week of the freely accessible DogBox Online reaffirms the company's commitment to open access resources. Specific support documentation for new DogBox Online service can be found in the Help section. From cjfields at uiuc.edu Thu Mar 2 20:20:35 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 2 Mar 2006 19:20:35 -0600 Subject: [Bioperl-l] Unleashed Informatics Supports DogBox Online Community In-Reply-To: <44077AF9.9080503@unleashedinformatics.com> References: <44077AF9.9080503@unleashedinformatics.com> Message-ID: Ha! Looks like Bio::DB::Seqhound will need some fixing very soon! Anybody using this module currently? On Mar 2, 2006, at 5:08 PM, Unleashed Informatics Administration wrote: > In December 2005, Unleashed Informatics acquired commercial rights to > Blueprint Initiative intellectual property from Mount Sinai Hospital. > > Spun-off from The Blueprint Initiative public research program at > Toronto's Mount Sinai Hospital, Unleashed Informatics provides > integrated hardware and software products designed to harness the > power > of increasingly complex scientific data. > > On 22 February, Unleashed Informatics released DogBox Online as an > open > access product to the life science community. > > DogBox Online is an integrated, online data retrieval service and > represents the new, re-named SeqHound service previously offered by > the > Blueprint Initiative. > > The new service is located at > http://dogboxonline.unleashedinformatics.com, and requires a free > Unleashed Informatics account for unrestricted access. > > The DogBox Online registration process will help Unleashed Informatics > better understand the resource user base, and ultimately help us > improve > our open access offerings in line with the needs of the life sciences > community.
> > Importantly, the collection of such user feedback is essential for the > preparation of planned public good research grant applications > aimed at > funding the ongoing provision of open source and freely available > bioinformatics resources. > > Unleashed Informatics is making a concerted effort to develop, > maintain > and improve open access resources for global researchers. The release > this past week of the freely accessible DogBox Online reaffirms the > company's commitment to open access resources. > > Specific support documentation for new DogBox Online service can be > found in the Help section. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
From rvosa at sfu.ca Fri Mar 3 02:10:36 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Thu, 02 Mar 2006 23:10:36 -0800 Subject: [Bioperl-l] Substitution models Message-ID: <4407EBEC.7060901@sfu.ca> Hi all, unless I've missed it, there doesn't seem to be an object for nucleotide substitution models. I'd like to implement one. I've attached an example implementation (which needs further methods, tests and pod, obviously). I'd like to solicit the input of the bioperl-architecture gurus on whether the general design of this sample is sane. The idea is that you would use this to serialize model descriptions between e.g. MrBayes and Paup (which describe the same concepts, but with slightly different syntax).
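For readers without the attachment, here is a small stand-alone sketch of one piece of bookkeeping such a model object has to do: changing one base frequency while keeping the four frequencies summing to 1. It follows the idea described in the comments of the sample module quoted later in this thread, but set_freq is a hypothetical helper written for this illustration, not the attached code, and it assumes equal starting frequencies.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Equal base frequencies as the starting point (an assumption).
my %freq = map { ($_ => 0.25) } qw(A C G T);

# Hypothetical helper: set one frequency and spread the difference
# evenly over the other three bases so the total stays at 1.
sub set_freq {
    my ($freq, $nuc, $new) = @_;
    die "Frequency must be between 0 and 1\n" unless $new > 0 && $new < 1;
    my $diff = $freq->{$nuc} - $new;
    $freq->{$nuc} = $new;
    $freq->{$_} += $diff / 3 for grep { $_ ne $nuc } keys %$freq;
    return $freq->{$nuc};
}

set_freq(\%freq, 'A', 0.5);
printf "%s %.4f\n", $_, $freq{$_} for sort keys %freq;
printf "total %.4f\n", sum(values %freq);
```

With unequal starting frequencies an even split could push a small frequency below zero, so a real implementation would probably need a proportional rescale instead.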
Best wishes, Rutger -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: NucSubstModel.pm Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060302/6d0ad77e/attachment-0001.pl From jason.stajich at duke.edu Fri Mar 3 10:41:33 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 3 Mar 2006 10:41:33 -0500 Subject: [Bioperl-l] Substitution models In-Reply-To: <4407EBEC.7060901@sfu.ca> References: <4407EBEC.7060901@sfu.ca> Message-ID: Great - this can probably handle a better object representation of rates we are parsing out of PAML too, not sure the same level of detail is available there. I think that Bio::Phylo is already used by another perl project or I'd suggest we start that directory, but we'll have to start to put some more of that stuff into a good namespace. The simple summary stats distance models are implemented in Bio::Align::DNAStatistics since they are pairwise distances from alignments but we may want to unify more of this into a Phylo/MolEvol namespace at some point. I'm happy for someone else to spearhead if they are interested.... -jason On Mar 3, 2006, at 2:10 AM, Rutger Vos wrote: > Hi all, > > unless I've missed it, there doesn't seem to be an object for > nucleotide substitution models. I'd like to implement one. I've > attached an example implementation (which needs further methods, > tests and pod, obviously). I'd like to solicit the input of the > bioperl-architecture gurus on whether the general design of this > sample is sane. 
> > The idea is that you would use this to serialize model descriptions > between e.g. MrBayes and Paup (which describe the same concepts, > but with slightly different syntax). > > Best wishes, > > Rutger > > -- > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD. candidate > Department of Biological Sciences > Simon Fraser University > 8888 University Drive > Burnaby, BC, V5A1S6 > Phone: 604-291-5625 Fax: 604-291-3496 > Personal site: http://www.sfu.ca/~rvosa > FAB* lab: http://www.sfu.ca/~fabstar > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > package NucSubstModel; > use strict; > use warnings; > use List::Util qw(sum); > use Bio::Root::Root; > use Bio::Tools::IUPAC; > use vars qw(@ISA); > @ISA = qw(Bio::Root::Root); > > sub new { > my $class = shift; > my $self = { > 'matrix' => [ > # TO: A C G T > [ 0.25, 0.25, 0.25, 0.25 ], # FROM: A > [ 0.25, 0.25, 0.25, 0.25 ], # FROM: C > [ 0.25, 0.25, 0.25, 0.25 ], # FROM: G > [ 0.25, 0.25, 0.25, 0.25 ], # FROM: T > ], > 'transl' => { > 'A' => { 'id' => 0, 'freq' => 0.25 }, # ids are for > matrix lookup > 'C' => { 'id' => 1, 'freq' => 0.25 }, > 'G' => { 'id' => 2, 'freq' => 0.25 }, > 'T' => { 'id' => 3, 'freq' => 0.25 }, > }, > 'gshape' => 0, # gamma shap parameter alpha > 'pinvar' => 0, # proportion of invariant sites > 'mu' => 1, # mutation rate paramter mu > }; > bless $self, $class; > return $self; > } > > # gets/sets nucleotide frequencies, one at a time. Adjusts matrix. > # example: $model->freq( 'A' ) returns 0.25 > # example: $model->freq( 'N' ) returns 1 > # example: $model->freq( 'A' => 0.5 ) scales C, G, T downward. > # does not adjust when ambiguous symbols used, e.g. 
> # $model->freq( 'N' => 0.5 ) doesn't work > sub freq { > my ( $self, $nuc, $freq ) = ( $_[0], uc $_[1], $_[2] ); > if ( exists $Bio::Tools::IUPAC::IUB{$nuc} ) { > if ( exists $self->{'transl'}->{$nuc} ) { > if ( $freq and $freq > 0 and $freq < 1 ) { > my %tmp = ( $nuc => 1 ); > my $diff = $self->{'transl'}->{$nuc}->{'freq'} - > $freq; > $self->{'transl'}->{$nuc}->{'freq'} = $freq; > for ( 'A', 'C', 'G', 'T' ) { > next if $tmp{$_}; > $self->{'transl'}->{$_}->{'freq'} += ( $diff / > 3 ); > } > } > elsif ( $freq and ( $freq <= 0 or $freq >= 1 ) ) { > $self->throw( "Frequency must be between 0 and 1" ); > } > return $self->{'transl'}->{$nuc}->{'freq'}; > } > else { > my $disambiguated_freq = 0; > foreach ( @{ $Bio::Tools::IUPAC::IUB{$nuc} } ) { > $disambiguated_freq += $self->{'transl'}->{$_}-> > {'freq'}; > } > return $disambiguated_freq; > } > } > else { > $self->throw( "Nucleotide \"$nuc\" does not exist" ); > } > } > > # gets/sets substitution probabilities. > # example: $model->prob( 'A' => 'C' ) returns 0.25 > # example: $model->prob( 'A' => 'N' ) returns 1 > # example: $model->prob( 'A' => 'C', 0.5 ) scales others accordingly > # does not adjust when ambiguous symbols used, e.g. > # $model->prob( 'A' => 'N', 0.5 ) doesn't work > sub prob { > my ( $self, $from, $to, $p ) = ( $_[0], uc $_[1], uc $_[2], $_ > [3] ); > if ( exists $self->{'transl'}{$from} and exists $self-> > {'transl'}{$to} ) { > my ( $i, $j ) = ( $self->{'transl'}{$from}{'id'}, $self-> > {'transl'}{$to}{'id'} ); > if ( $p and $p > 0 and $p < 1 ) { > my $mat = $self->{'matrix'}; > my $scale = $mat->[$i][$j] - $p; > $mat->[$i][$j] = $p; > for my $f ( 0 .. 3 ) { > for my $t ( 0 .. 3 ) { > next if $f == $i and $t == $j; > $mat->[$f][$t] += ( $scale / 3 ) if $f == $i or > $t == $j; > } > } > for my $f ( 0 .. 3 ) { > next if $f == $i; > my $rowscale = 1 - sum @{ $mat->[$f] }; > for my $t ( 0 .. 
3 ) { > next if $t == $j; > $mat->[$f][$t] += ( $rowscale / 3 ); > } > } > } > elsif ( $p and ( $p <= 0 or $p >= 1 ) ) { > $self->throw( "Probability must be between 0 and 1" ); > } > return $self->{'matrix'}[$i][$j]; > } > else { > my $disambiguated_p = 0; > if ( exists $Bio::Tools::IUPAC::IUB{$from} and exists > $Bio::Tools::IUPAC::IUB{$to} ) { > my $from_array = $Bio::Tools::IUPAC::IUB{$from}; > my $to_array = $Bio::Tools::IUPAC::IUB{$to}; > foreach my $from_sym ( @{ $from_array } ) { > my $i = $self->{'transl'}{$from_sym}{'id'}; > foreach my $to_sym ( @{ $to_array } ) { > my $j = $self->{'transl'}{$to_sym}{'id'}; > $disambiguated_p += $self->{'matrix'}[$i][$j] > } > } > return $disambiguated_p / scalar @{ $from_array }; > } > else { > $self->throw( "Nucleotide \"$from\" and/or \"$to\" not > in IUPAC::IUB" ); > } > } > } > > # gets/sets mutation rate. > # example: $model->mu() returns 1; > # example: $model->mu( 0.5 ) doubles diagonal, scales others > accordingly > sub mu { > my ( $self, $mu ) = @_; > if ( $mu ) { > $mu = 1 / $mu; > my $scale = $mu - $self->{'mu'}; > my $mat = $self->{'matrix'}; > for my $i ( 0 .. $#{ $mat } ) { > for my $j ( 0 .. 
$#{ $mat->[$i] } ) { > if ( $i == $j ) { > $mat->[$i][$j] = $mat->[$i][$j] + $mat->[$i] > [$j] * $scale; > } > else { > $mat->[$i][$j] = $mat->[$i][$j] - ( $mat->[$i] > [$j] * $scale ) / 3; > } > } > } > $self->{'mu'} = $mu; > } > return $self->{'mu'}; > } > > # gets/sets gamma shape parameter alpha > sub gshape { > my ( $self, $gshape ) = @_; > $self->{'gshape'} = $gshape if $gshape; > return $self->{'gshape'}; > } > > # gets/sets proportion of invariant sites > sub pinvar { > my ( $self, $pinvar ) = @_; > $self->{'pinvar'} = $pinvar if $pinvar; > return $self->{'pinvar'}; > } > > # other methods: raw transition matrix input, raw frequency input, > export to > # and import from MrBayes/Paup/whatever model syntax, also needs > pod and tests > > 1;_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From mblanche at berkeley.edu Fri Mar 3 14:52:11 2006 From: mblanche at berkeley.edu (Marco Blanchette) Date: Fri, 03 Mar 2006 11:52:11 -0800 Subject: [Bioperl-l] Feature extraction using Bio::DB::GFF Message-ID: Dear all-- I have been trying to find a way to rapidly retrieve feature data from genes to pass to Bio::Graphics using a local version of the drosophila gff annotation loaded on a PostgreSQL database using the bp_pg_bulk_load_gff.pl script and the Bio::DB::GFF module. My goal is to pass to a script a series of gene id and get a graphical representation of the exon-intron gene structure where I could add additional tracts of information (custom micro-array probes position, binding sites for protein etc...). My current problem is that I can't seems to be able to retrieve the group of features associated with a given gene. 
I've been experimenting with a simple script like: #!/usr/bin/perl use strict; use warnings; use Bio::DB::GFF; # Open the sequence database my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg', -dsn => 'dbi:Pg:dbname=chr4', -aggregator => ['processed_transcript'], ); my $feats = $db->features( -types => 'processed_transcript', -merge => 1, -iterator => 1, ); while (my $feat = $feats -> next_seq() ){ print "$feat\n"; print $feat -> sub_SeqFeature(), "\n"; } Without any success. Any help to get me started would be greatly appreciated. Marco ______________________________ Marco Blanchette, Ph.D. mblanche at uclink.berkeley.edu Donald C. Rio's lab Department of Molecular and Cell Biology 16 Barker Hall University of California Berkeley, CA 94720-3204 Tel: (510) 642-1084 Cell: (510) 847-0996 Fax: (510) 642-6062 -- From saldroubi at yahoo.com Fri Mar 3 15:15:59 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Fri, 3 Mar 2006 12:15:59 -0800 (PST) Subject: [Bioperl-l] How to print a generic matrix? Message-ID: <20060303201559.20973.qmail@web34306.mail.mud.yahoo.com> Hi Everyone, I am using the Bio::Matrix::Generic module to create my own matrices (not reading from a file) but I am having trouble printing them with the Bio::Matrix::IO module. I tried using both the write_matrix and print functions without success. I get the following error when I use print. saldroubi at ude:~/Documents/p2020> perl matrix_print.pl Can't locate object method "write_tree" via package "Bio::Matrix::IO::scoring" at /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix/IO.pm line 262. The sample code is below. Ideally I want to print to STDOUT, not to a file, but I was trying to follow the example in the documentation. Any idea what I am doing wrong? I am using bioperl 1.5.1. Of course, I could write my own print function, but why should I reinvent the wheel? Thank you.
-------------------------------
use strict;
use Bio::Matrix::Generic;
use Bio::Matrix::IO;
my $raw = [ [ 0, 0, 0, 0, 0, 0, 0 ],
            [ 0, 0, 0, 0, 0, 0, 0 ],
            [ 0, 0, 0, 0, 0, 0, 0 ],
            [ 0, 0, 0, 0, 0, 0, 0 ] ];
my $matrix = new Bio::Matrix::Generic(-values => $raw,
                                      -matrix_id => "my_matrix_id",
                                      -matrix_name=> "my_matrix_name",
                                      -rownames => [qw(A C T G)],
                                      -colnames => [qw(1 2 3 4 5 6 7 )]);
my $fh = Bio::Matrix::IO->newFh(-file=>'>my_file.txt',-format=>'scoring');
print $fh $matrix; # write a matrix object

Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From Marc.Logghe at DEVGEN.com Fri Mar 3 16:30:40 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri, 3 Mar 2006 22:30:40 +0100
Subject: [Bioperl-l] Feature extraction using Bio::DB::GFF
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B87@ANTARESIA.be.devgen.com>

> > my $feats = $db->features( -types => 'processed_transcript',
> -merge => 1,
> -iterator => 1,
> );
>
> while (my $feat = $feats -> next_seq() ){
> print "$feat\n";
> print $feat -> sub_SeqFeature(), "\n";
>
> }
>

Hi Marco,
It is because you are calling $db->features in scalar context. In that way $feats contains the number of returned feature objects and not the objects themselves.
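The scalar-versus-list distinction mentioned above can be seen in a few lines of plain Perl. This is a generic sketch with a made-up fetch_features sub standing in for any call that returns a list; it makes no claim about what Bio::DB::GFF's features() actually returns when -iterator is set.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of objects.
sub fetch_features {
    my @found = ('exon_1', 'intron_1', 'exon_2');
    return @found;    # an array evaluated in scalar context gives its size
}

my $scalar  = fetch_features();    # scalar context: the element count
my @list    = fetch_features();    # list context: the elements themselves
my ($first) = fetch_features();    # list assignment: the first element

print "scalar context: $scalar\n";
print "list context:   @list\n";
print "first element:  $first\n";
```

The parentheses around the variable on the left-hand side are what switch the call from scalar to list context.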
You should do something like: my @feats = $db->features( -types => 'processed_transcript', -merge => 1, -iterator => 1, ); foreach my $feat (@feats){ # do something with $feat } HTH, Marc From rvosa at sfu.ca Fri Mar 3 16:47:51 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Fri, 03 Mar 2006 13:47:51 -0800 Subject: [Bioperl-l] bioperl API conventions Message-ID: <4408B987.1060107@sfu.ca> Hi all, I've noticed that many bioperl objects don't separate getters from setters, e.g.: my $branchlength = $node->branchlength; # now it's a getter $node->branchlength($branchlength); # now it's a setter Is this approach carved in stone? Or could one contribute objects to the project that do: my $branchlength = $node->get_branchlength; $node->set_branchlength($branchlength); To me (and, apparently, to Damian Conway, see "perl best practices" :) the latter approach is better, as it takes away some ambiguity, especially w.r.t setting false-but-defined values (bugs could emerge where arguments are erroneously tested for truth rather than definedness) and resetting fields to undef. Also, is it okay to have separate getters and setters in your own objects, but implement interfaces that do the combined get/setter thing using aliasing? Thanks, Rutger -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From swansonj at email.arizona.edu Fri Mar 3 16:40:09 2006 From: swansonj at email.arizona.edu (Jordan Mark Swanson) Date: Fri, 03 Mar 2006 14:40:09 -0700 Subject: [Bioperl-l] How to print a generic matrix? 
In-Reply-To: <20060303211821.1279.qmail@web34314.mail.mud.yahoo.com>
References: <20060303211821.1279.qmail@web34314.mail.mud.yahoo.com>
Message-ID: <1141422009.15496.255782840@webmail.messagingengine.com>

On Fri, 3 Mar 2006 13:18:20 -0800 (PST), "Sam Al-Droubi" said:
> Jordan,
>
> Thank you for your response. I am relatively new to informatics and I
> don't know what the different matrix formats are. Where is this
> documented? Would it be a good idea for me to complete these functions
> and submit them to be included in MatrixIO?

I haven't used the modules we are discussing, but to find information about them I changed to the directory Bio/Matrix/IO and noted which files were in there. It appears that there is a scoring matrix and a phylip matrix. The phylip matrix seems to have an output method written, but I don't know if it would be appropriate for your needs. Perhaps someone else could help you more.

To find different module formats for Bio::Matrix::IO :

[jswanson at localhost bioperl-live]$ cd Bio/Matrix/IO
[jswanson at localhost IO]$ ls
CVS/ phylip.pm scoring.pm

This is analogous to what you would do to find SeqIO modules : ...
[jswanson at localhost bioperl-live]$ cd Bio/SeqIO [jswanson at localhost SeqIO]$ ls abi.pm bsml.pm ctf.pm fasta.pm gcg.pm locuslink.pm pln.pm tab.pm ztr.pm ace.pm bsml_sax.pm CVS/ fastq.pm genbank.pm metafasta.pm qual.pm tigr.pm agave.pm chadoxml.pm embl.pm FTHelper.pm interpro.pm MultiFile.pm raw.pm tigrxml.pm alf.pm chaos.pm entrezgene.pm game/ kegg.pm phd.pm scf.pm tinyseq/ asciitree.pm chaosxml.pm exp.pm game.pm largefasta.pm pir.pm swiss.pm tinyseq.pm -- Jordan Swanson swansonj at email.arizona.edu Genetics Graduate Interdisciplinary Program University of Arizona From cjfields at uiuc.edu Fri Mar 3 17:19:00 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 3 Mar 2006 16:19:00 -0600 Subject: [Bioperl-l] bioperl API conventions In-Reply-To: <4408B987.1060107@sfu.ca> Message-ID: <000001c63f10$78c57170$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Rutger Vos > Sent: Friday, March 03, 2006 3:48 PM > To: bioperl list > Subject: [Bioperl-l] bioperl API conventions > > Hi all, > > I've noticed that many bioperl objects don't separate getters from > setters, e.g.: > > my $branchlength = $node->branchlength; # now it's a getter > $node->branchlength($branchlength); # now it's a setter > > Is this approach carved in stone? Or could one contribute objects to the > project that do: > > my $branchlength = $node->get_branchlength; > $node->set_branchlength($branchlength); > > To me (and, apparently, to Damian Conway, see "perl best practices" :) > the latter approach is better, as it takes away some ambiguity, > especially w.r.t setting false-but-defined values (bugs could emerge > where arguments are erroneously tested for truth rather than > definedness) and resetting fields to undef. Careful, don't open up that can of worms. There was talk of this in the mail list archives. 
These threads cover much of the discussion; to get the gist, trace back to the beginning and read through. The second thread is more fun: http://portal.open-bio.org/pipermail/bioperl-l/2003-January/010863.html http://portal.open-bio.org/pipermail/bioperl-l/2003-December/014374.html The last one is from around the time bioperl 1.4 was released. I believe it is enforced somewhat but not religiously. I know AUTOLOAD is not supposed to be in code but I have seen it in core (Bio::Tools::Run::StandAloneBlast has it, I believe). I believe it has more to do with having similar methods in all the modules, and AUTOLOAD is avoided b/c explicit get/setters make understanding code a lot easier. I personally am looking forward to seeing how everybody here deals with Pugs/Perl6 (yet another can of worms); OOP in Perl6 looks quite different from Perl5! > Also, is it okay to have separate getters and setters in your own > objects, but implement interfaces that do the combined get/setter thing > using aliasing? > > Thanks, > > Rutger > > -- > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD. candidate > Department of Biological Sciences > Simon Fraser University > 8888 University Drive > Burnaby, BC, V5A1S6 > Phone: 604-291-5625 > Fax: 604-291-3496 > Personal site: http://www.sfu.ca/~rvosa > FAB* lab: http://www.sfu.ca/~fabstar > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Fri Mar 3 18:13:11 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 3 Mar 2006 15:13:11 -0800 Subject: [Bioperl-l] bioperl API conventions In-Reply-To: <4408B987.1060107@sfu.ca> References: <4408B987.1060107@sfu.ca> Message-ID: Using combined getter/setters to me is primarily a style issue, although the argument evaluation problem is a real one and in fact we've been there. Having said that, my point of view is basically, 1) we've learned the lesson so might as well continue now that we deal with it, 2) perl is not java (which is good and bad) and one might as well exploit the difference as opposed to ignoring it, and 3) consistent API design is better than inconsistent even if some of the 'new' flavors better concur with Damian's or anybody else's style guides. So, I like your last suggestion: stick with the consistent combined getter/setter API for the interface and so long as you implement the interface your implementation can delegate to whatever style model you want. -hilmar On 3/3/06, Rutger Vos wrote: > Hi all, > > I've noticed that many bioperl objects don't separate getters from > setters, e.g.: > > my $branchlength = $node->branchlength; # now it's a getter > $node->branchlength($branchlength); # now it's a setter > > Is this approach carved in stone? Or could one contribute objects to the > project that do: > > my $branchlength = $node->get_branchlength; > $node->set_branchlength($branchlength); > > To me (and, apparently, to Damian Conway, see "perl best practices" :) > the latter approach is better, as it takes away some ambiguity, > especially w.r.t setting false-but-defined values (bugs could emerge > where arguments are erroneously tested for truth rather than > definedness) and resetting fields to undef. > > Also, is it okay to have separate getters and setters in your own > objects, but implement interfaces that do the combined get/setter thing > using aliasing? 
> > Thanks, > > Rutger > > -- > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD. candidate > Department of Biological Sciences > Simon Fraser University > 8888 University Drive > Burnaby, BC, V5A1S6 > Phone: 604-291-5625 > Fax: 604-291-3496 > Personal site: http://www.sfu.ca/~rvosa > FAB* lab: http://www.sfu.ca/~fabstar > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From saldroubi at gmail.com Fri Mar 3 11:11:04 2006 From: saldroubi at gmail.com (Sam Al-Droubi) Date: Fri, 3 Mar 2006 11:11:04 -0500 Subject: [Bioperl-l] How to print a generic matrix? Message-ID: Hi Everyone, I am using the using Bio::Matrix::Generic module to create my own matrices (not reading from a file) but I am having troubles printing them with Bio::Matrix::IO module. I tried using both write_matrix and print function without success. I get the following error when I use print. saldroubi at ude:~/Documents/p2020> perl matrix_print.pl Can't locate object method "write_tree" via package "Bio::Matrix::IO::scoring" at /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix/IO.pm line 262. The sample code is below. Ideally I want to print to STDOUT not to a file but I was trying to follow the example in the documentation. Any idea what I am doing wrong. I using bioperl 1.5.1. Of course, I could write my own print function but should I reinvent the wheel. Thank you. 
------------------------------- use strict; use Bio::Matrix::Generic; use Bio::Matrix::IO; my $raw = [ [ 0, 0, 0, 0, 0, 0, 0 ], [ 0, 0, 0, 0, 0, 0, 0 ], [ 0, 0, 0, 0, 0, 0, 0 ], [ 0, 0, 0, 0, 0, 0, 0 ] ]; my $matrix = new Bio::Matrix::Generic(-values => $raw, -matrix_id => "my_matrix_id", -matrix_name=> "my_matrix_name", -rownames => [qw(A C T G)], -colnames => [qw(1 2 3 4 5 6 7 )]); my $fh = Bio::Matrix::IO->newFh(-file=>'>my_file.txt',-format=>'scoring'); print $fh $matrix; # write a matrix object -- Sincerely, Sam Al-Droubi, M.S. From torsten.seemann at infotech.monash.edu.au Sat Mar 4 06:22:18 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sat, 04 Mar 2006 22:22:18 +1100 Subject: [Bioperl-l] seq_word and pattern counts In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08523@NIHCESMLBX6.nih.gov> References: <7930EE6CD7CA354D93B444D0433C061101D08523@NIHCESMLBX6.nih.gov> Message-ID: <4409786A.1070306@infotech.monash.edu.au> Staffa, Nick (NIH/NIEHS) [C] wrote: > You asked: > Did this work? Yes I did; I was hoping the thread in bioperl-l would have a happy ending that others could benefit from :-) And I've never used the module myself. > gave this sort of result. > Maybe the problem is all the NNNNNN that's in the mouse chromosome, > or maybe I don't have Bio::Seq object but a Bio::DB object and wouldn't that be a bummer if they weren't compatible? I see: http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Tools/RestrictionEnzyme.html#CODE8 Argument : Reference to a Bio::PrimarySeq.pm-derived object. I guess the Bio::DB objects are Bio::SeqI derived, not Bio::PrimarySeqI derived, so incompatible. Perhaps convert your $subseq to a PrimarySeq before passing it to cut_seq, something like $subseq2 = Bio::PrimarySeq->new(-id=>'????', seq=>$subseq->seq); ? And yes, your pattern "TT^CGAA" will naturally match MANY times in sections containing lots of Ns. I guess you'll need to mask them out somehow. 
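
One way to read the "mask them out" suggestion above is to skip the runs of N before counting sites. The following is only a minimal pure-Perl sketch under that assumption: count_sites() is a hypothetical helper, not part of Bio::Tools::RestrictionEnzyme or any other BioPerl module, and it counts literal, non-overlapping matches of the recognition site rather than performing a real digest.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_sites() is a hypothetical helper written for this illustration only.
# It counts literal occurrences of a recognition site in a plain sequence
# string while ignoring every run of N.
sub count_sites {
    my ($seq, $site) = @_;

    # Drop the cut-position marker, e.g. TT^CGAA becomes TTCGAA.
    $site =~ s/\^//g;

    my $count = 0;

    # Splitting on runs of N means no match can fall inside or span a gap.
    for my $segment (split /N+/i, $seq) {
        $count++ while $segment =~ /\Q$site\E/gi;
    }
    return $count;
}

# Toy sequence with three sites between the N runs.
my $seq = 'NNNNNNTTCGAAACGTNNNNNNNNGGTTCGAATTCGAANNNN';
print count_sites($seq, 'TT^CGAA'), "\n";
```

For a sequence held in a BioPerl object, the plain string would presumably come from its seq() method; window handling and ambiguity codes other than N are left out of this sketch.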
> gir.niehs.nih.gov> Torsten.pl TT^CGAA > mm_ref_chr10.fa 83274084 > filename=mm_ref_chr10.fa sequenceID=83274084 > start=0 window=10003 string=TT^CGAA overlap=10000 length=130066766 > out=chr10_TT^CGAA.count > > ------------- EXCEPTION ------------- > MSG: Can't cut sequence. Missing or invalid objectseqObj: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN > ... > ... > ... > NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN > STACK Bio::Tools::RestrictionEnzyme::cut_seq /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/RestrictionEnzyme.pm:1001 > STACK toplevel Torsten.pl:42 -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ Phone: +61 3 9905 9010 From jay at jays.net Sat Mar 4 18:52:16 2006 From: jay at jays.net (Jay Hannah) Date: Sat, 04 Mar 2006 17:52:16 -0600 Subject: [Bioperl-l] Bio::DB::BioDB - insert failed. Dupllicate entry '' for key 2? Message-ID: <440A2830.8080805@jays.net> Greetings -- I'm trying to load my bioperl-db database (mySQL on Linux). 
I seem to be able to load 1 sequence at a time, but whenever I try to load the 2nd sequence, it fails with the output below...

Here's my script:
---------------------------------------
#!/usr/bin/perl

use strict;
use Bio::SeqIO;
use Bio::DB::BioDB;

my $db = Bio::DB::BioDB->new(
   -database => "biosql",
   -host     => 'localhost',
   -port     => 3306,
   -dbname   => 'VIRUS',
   -driver   => 'mysql',
   -user     => 'dbastola',
   -pass     => '-----------',
);

my $file = "/home/dbastola/genbankSequences/GBVRL/gbvrl_2006_Jan/GB_Sequences/gbvrl1.seq";
my $infile = Bio::SeqIO->new(-file => $file, -format => 'GenBank');

my $seq = $infile->next_seq();
my $species = $seq->species;
print join " | ", $species->classification;
print "\n";

my $pseq = $db->create_persistent($seq);
$pseq->create() or die "create failed";
$pseq->commit;
exit;
------------------------------------

Output:

$ perl j2.pl
Human adenovirus type 15 | Mastadenovirus | Adenoviridae | dsDNA viruses, no RNA stage | Viruses

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::BioNamespaceAdaptor (driver) failed, values were ("","") FKs ()
Duplicate entry '' for key 2
---------------------------------------------------

------------- EXCEPTION -------------
MSG: create: object (Bio::DB::Persistent::BioNamespace) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.3/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:208
STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/site_perl/5.8.3/Bio/DB/Persistent/PersistentObject.pm:245
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.3/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:171
STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/site_perl/5.8.3/Bio/DB/Persistent/PersistentObject.pm:245
STACK toplevel j2.pl:26

--------------------------------------

At this point I can see my first sequence loaded into mySQL.
biodatabase and biosequence have stuff in them. e.g.: mysql> select * from biodatabase; +----------------+------+-----------+-------------+ | biodatabase_id | name | authority | description | +----------------+------+-----------+-------------+ | 23 | | NULL | NULL | +----------------+------+-----------+-------------+ 1 row in set (0.00 sec) And I noticed that if I delete that sequence from the database, I can once again load it, and the number biodatabase_id (23) increments. 20, 21, 22, 23, etc. I'm trying to end up with a loop something like this: while (my $seq = $infile->next_seq()) { print " " . $seq->display_id . "\n"; my $adp = $db->get_object_adaptor($seq); my $lseq = $adp->find_by_unique_key($seq); if ($lseq) { print "Already loaded! Deleting.\n"; print " Primary key: " . $lseq->primary_key . "\n"; $lseq->remove; $lseq->commit; } print "Loading...\n"; my $pseq = $db->create_persistent($seq); $pseq->create or die "create failed"; $pseq->commit or die "commit failed"; $adp->commit; # ??? } But I can't seem to get more than 1 to load. Ever. Is the bioperl-db code not reading/incrementing the biodatabase_id correctly? Am I skipping a step that makes that increment occur? Am I messing up the PK/FK somehow? phpMyAdmin says that the Next AutoIndex for the bioentry table is 24... So that's good? Thanks, j Omaha Perl Mongers http://omaha.pm.org From cjfields at uiuc.edu Sat Mar 4 23:10:44 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 4 Mar 2006 22:10:44 -0600 Subject: [Bioperl-l] Bio::DB::BioDB - insert failed. Dupllicate entry '' for key 2? In-Reply-To: <440A2830.8080805@jays.net> References: <440A2830.8080805@jays.net> Message-ID: <0A7AA07F-6FC0-4F98-B729-DBE66AFE247B@uiuc.edu> Sorry if I'm a bit off (pub you know) but have you tried the bioperl- db script load_seqdatabase.pl (scripts dir)? Have you loaded taxonomy? Chris On Mar 4, 2006, at 5:52 PM, Jay Hannah wrote: > Greetings -- > > I'm trying to load my bioperl-db database (mySQL on Linux). 
I seem > to be able to load 1 sequence at a time, but whenever I try to load > the 2nd sequence, if fails with the output below... > > > Here's my script: > --------------------------------------- > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > use Bio::DB::BioDB; > > my $db = Bio::DB::BioDB->new( > -database => "biosql", > -host => 'localhost', > -port => 3306, > -dbname => 'VIRUS', > -driver => 'mysql', > -user => 'dbastola', > -pass => '-----------', > ); > > my $file = "/home/dbastola/genbankSequences/GBVRL/gbvrl_2006_Jan/ > GB_Sequences/gbvrl1.seq"; > my $infile = Bio::SeqIO->new(-file => $file, -format => 'GenBank'); > > my $seq = $infile->next_seq(); > my $species = $seq->species; > print join " | ", $species->classification; > print "\n"; > > my $pseq = $db->create_persistent($seq); > $pseq->create() or die "create failed"; > $pseq->commit; > exit; > ------------------------------------ > > > Output: > > $ perl j2.pl > Human adenovirus type 15 | Mastadenovirus | Adenoviridae | dsDNA > viruses, no RNA stage | Viruses > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::BioNamespaceAdaptor (driver) > failed, values were ("","") FKs () > Duplicate entry '' for key 2 > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::DB::Persistent::BioNamespace) failed to > insert or to be found by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ > perl5/site_perl/5.8.3/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:208 > STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/ > site_perl/5.8.3/Bio/DB/Persistent/PersistentObject.pm:245 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ > perl5/site_perl/5.8.3/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:171 > STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/ > site_perl/5.8.3/Bio/DB/Persistent/PersistentObject.pm:245 > STACK toplevel j2.pl:26 > > 
-------------------------------------- > > > At this point I can see my first sequence loaded into mySQL. > biodatabase and biosequence have stuff in them. e.g.: > > > mysql> select * from biodatabase; > +----------------+------+-----------+-------------+ > | biodatabase_id | name | authority | description | > +----------------+------+-----------+-------------+ > | 23 | | NULL | NULL | > +----------------+------+-----------+-------------+ > 1 row in set (0.00 sec) > > > > And I noticed that if I delete that sequence from the database, I > can once again load it, and the number biodatabase_id (23) > increments. 20, 21, 22, 23, etc. > > I'm trying to end up with a loop something like this: > > while (my $seq = $infile->next_seq()) { > print " " . $seq->display_id . "\n"; > my $adp = $db->get_object_adaptor($seq); > my $lseq = $adp->find_by_unique_key($seq); > if ($lseq) { > print "Already loaded! Deleting.\n"; > print " Primary key: " . $lseq->primary_key . "\n"; > $lseq->remove; > $lseq->commit; > } > print "Loading...\n"; > my $pseq = $db->create_persistent($seq); > $pseq->create or die "create failed"; > $pseq->commit or die "commit failed"; > $adp->commit; # ??? > } > > But I can't seem to get more than 1 to load. Ever. > > Is the bioperl-db code not reading/incrementing the biodatabase_id > correctly? Am I skipping a step that makes that increment occur? Am > I messing up the PK/FK somehow? > > phpMyAdmin says that the Next AutoIndex for the bioentry table is > 24... So that's good? > > Thanks, > > j > Omaha Perl Mongers > http://omaha.pm.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Marc.Logghe at DEVGEN.com Sun Mar 5 09:26:26 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Sun, 5 Mar 2006 15:26:26 +0100 Subject: [Bioperl-l] Bio::DB::BioDB - insert failed. Dupllicate entry ''for key 2? Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B88@ANTARESIA.be.devgen.com> Hi Jay, Yes, I agree with Chris. I also think you'd be better off with load_seqdatabase.pl. > > > > At this point I can see my first sequence loaded into mySQL. > > biodatabase and biosequence have stuff in them. e.g.: > > > > > > mysql> select * from biodatabase; > > +----------------+------+-----------+-------------+ > > | biodatabase_id | name | authority | description | > > +----------------+------+-----------+-------------+ > > | 23 | | NULL | NULL | > > +----------------+------+-----------+-------------+ > > 1 row in set (0.00 sec) > > > > > > > > And I noticed that if I delete that sequence from the database, I > > can once again load it, and the number biodatabase_id (23) > > increments. 20, 21, 22, 23, etc. BTW, here you actually did not delete your sequence but the namespace. If you want to check 'sequences' you should look into the bioentry table. Using load_seqdatabase.pl the namespace is set automatically to the default ('bioperl') but you can set it as well with the --namespace option. HTH, Marc From jay at jays.net Sun Mar 5 10:36:40 2006 From: jay at jays.net (Jay Hannah) Date: Sun, 05 Mar 2006 09:36:40 -0600 Subject: [Bioperl-l] Bio::DB::BioDB - insert failed. Dupllicate entry '' for key 2? In-Reply-To: <0A7AA07F-6FC0-4F98-B729-DBE66AFE247B@uiuc.edu> References: <440A2830.8080805@jays.net> <0A7AA07F-6FC0-4F98-B729-DBE66AFE247B@uiuc.edu> Message-ID: <440B0588.4060607@jays.net> Chris Fields wrote: > Sorry if I'm a bit off (pub you know) but have you tried the bioperl- db > script load_seqdatabase.pl (scripts dir)? 
I poked around in the scripts directory, but am trying to learn the guts well enough to roll my own since I have some point-and-click CGI interfacing in mind. (I'll be posting about the project to this list once we get our thoughts together). > Have you loaded taxonomy? No, I'm not familiar with that. I'll read up on it. Marc Logghe wrote: > Yes, I agree with Chris. I also think you'd be better off with > load_seqdatabase.pl. I'm sure I would be for general loading. I'm sure the scripts are far more robust than my little piecemeal stab at it, but I'm not sure I'll learn the guts if I just use scripts. Reading the code there are many nuances I don't understand so I'm trying to learn from the ground up, and I'm not sure what I'm doing wrong in my first baby steps. :) >>>mysql> select * from biodatabase; >>>+----------------+------+-----------+-------------+ >>>| biodatabase_id | name | authority | description | >>>+----------------+------+-----------+-------------+ >>>| 23 | | NULL | NULL | >>>+----------------+------+-----------+-------------+ > > BTW, here you actually did not delete your sequence but the namespace. > If you want to check 'sequences' you should look into the bioentry > table. The data also disappeared out of the biosequence table. That indicates I deleted the sequence, right? (I didn't check bioentry at the time.) I have a question out to the BioSQL-l mailing list about the purpose of the biodatabase table. (I assume this mailing list isn't the right forum for that question.) I've been poking around in the BioSQL ERD, trying to understand the purpose of each of the tables. > Using load_seqdatabase.pl the namespace is set automatically to the > default ('bioperl') but you can set it as well with the --namespace > option. Am I foolhardy to think that I can roll my own simplistic load via the code I posted? If I do get it working should I write up a HOWTO? I can put a big "For robust file loading, please see load_seqdatabase.pl" warning at the top. 
But in our case, we're using Bio::SeqIO to walk through tens of thousands of flat file sequences to find the hundred or so we're interested in, and are trying to store only the ones we want into mySQL. (And we're trying to automate this process for rapid subsequent runs: Load my database w/ only those sequences that X.) Thanks for the quick help! j Omaha Perl Mongers http://omaha.pm.org From cjfields at uiuc.edu Sun Mar 5 15:07:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 5 Mar 2006 14:07:38 -0600 Subject: [Bioperl-l] Bio::DB::BioDB - insert failed. Dupllicate entry '' for key 2? In-Reply-To: <440B0588.4060607@jays.net> References: <440A2830.8080805@jays.net> <0A7AA07F-6FC0-4F98-B729-DBE66AFE247B@uiuc.edu> <440B0588.4060607@jays.net> Message-ID: <5D2CAA28-836D-4DF3-8974-C332F1518F4D@uiuc.edu> Start looking through load_seqdatabase.pl, the other scripts, and the test suite to get an idea of the internals. It looks like you've loaded the biosql schema, so you must have read the INSTALL instructions to get that far. This is from the INSTALL file: With bioperl and bioperl-db installed you are ready to load some data. It is advisable to pre-load the NCBI taxonomy database (use scripts/load_taxonomy.pl in the biosql-schema package, the details are in its documentation). Otherwise you'll see errors from misparsed organisms. The actual script is load_ncbi_taxonomy.pl and is located with biosql- schema (the INSTALL needs to be updated), but everything else is the same. On Mar 5, 2006, at 9:36 AM, Jay Hannah wrote: > Chris Fields wrote: >> Sorry if I'm a bit off (pub you know) but have you tried the >> bioperl- db >> script load_seqdatabase.pl (scripts dir)? > > I poked around in the scripts directory, but am trying to learn the > guts well enough to roll my own since I have some point-and-click > CGI interfacing in mind. (I'll be posting about the project to this > list once we get our thoughts together). > >> Have you loaded taxonomy? 
> > No, I'm not familiar with that. I'll read up on it. > > Marc Logghe wrote: >> Yes, I agree with Chris. I also think you'd be better off with >> load_seqdatabase.pl. > > I'm sure I would be for general loading. I'm sure the scripts are > far more robust than my little piecemeal stab at it, but I'm not > sure I'll learn the guts if I just use scripts. Reading the code > there are many nuances I don't understand so I'm trying to learn > from the ground up, and I'm not sure what I'm doing wrong in my > first baby steps. :) > >>>> mysql> select * from biodatabase; >>>> +----------------+------+-----------+-------------+ >>>> | biodatabase_id | name | authority | description | >>>> +----------------+------+-----------+-------------+ >>>> | 23 | | NULL | NULL | >>>> +----------------+------+-----------+-------------+ >> >> BTW, here you actually did not delete your sequence but the >> namespace. >> If you want to check 'sequences' you should look into the bioentry >> table. > > The data also disappeared out of the biosequence table. That > indicates I deleted the sequence, right? (I didn't check bioentry > at the time.) I have a question out to the BioSQL-l mailing list > about the purpose of the biodatabase table. (I assume this mailing > list isn't the right forum for that question.) I've been poking > around in the BioSQL ERD, trying to understand the purpose of each > of the tables. > >> Using load_seqdatabase.pl the namespace is set automatically to the >> default ('bioperl') but you can set it as well with the --namespace >> option. > > Am I foolhardy to think that I can roll my own simplistic load via > the code I posted? > > If I do get it working should I write up a HOWTO? I can put a big > "For robust file loading, please see load_seqdatabase.pl" warning > at the top. 
But in our case, we're using Bio::SeqIO to walk through
> tens of thousands of flat file sequences to find the hundred or so
> we're interested in, and are trying to store only the ones we want
> into mySQL. (And we're trying to automate this process for rapid
> subsequent runs: Load my database w/ only those sequences that X.)
>
> Thanks for the quick help!
>
> j
> Omaha Perl Mongers
> http://omaha.pm.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Marc.Logghe at DEVGEN.com Sun Mar 5 15:39:41 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Sun, 5 Mar 2006 21:39:41 +0100
Subject: [Bioperl-l] Bio::DB::BioDB - insert failed. Dupllicate entry ''for key 2?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B8A@ANTARESIA.be.devgen.com>

> >>>> mysql> select * from biodatabase;
> >>>> +----------------+------+-----------+-------------+
> >>>> | biodatabase_id | name | authority | description |
> >>>> +----------------+------+-----------+-------------+
> >>>> | 23 | | NULL | NULL |
> >>>> +----------------+------+-----------+-------------+
> >>

This is odd. The name of the namespace is missing; it should read 'bioperl'. Have you tried explicitly setting the namespace?
my $db = Bio::DB::BioDB->new( -database => "biosql", -host => 'localhost', -port => 3306, -dbname => 'BIOSQL', # not sure, but changed it anyhow -driver => 'mysql', -user => 'dbastola', -pass => '-----------', ); my $file = "/home/dbastola/genbankSequences/GBVRL/gbvrl_2006_Jan/GB_Sequences/gbvrl 1.seq"; my $infile = Bio::SeqIO->new(-file => $file, -format => 'GenBank'); my $seq = $infile->next_seq(); #set namespace explicitely $seq->namespace('VIRUS'); my $species = $seq->species; print join " | ", $species->classification; print "\n"; my $pseq = $db->create_persistent($seq); $pseq->create() or die "create failed"; $pseq->commit; exit; If you execute your select statement again, you should see VIRUS appearing in the name field. Don't forget to set the namespace as well in your lookup script. HTH, Marc From yezhiqiang at gmail.com Sun Mar 5 15:57:01 2006 From: yezhiqiang at gmail.com (Zhiqiang Ye) Date: Mon, 6 Mar 2006 04:57:01 +0800 Subject: [Bioperl-l] Suggestion about Bio::Structure::SecStr::DSSP::Res Message-ID: <34198fe40603051257l102cf1fo@mail.gmail.com> Dear developers, I have a suggestion about Bio::Structure::SecStr::DSSP::Res. The dsspcmbi program has a argument -ssa, if we use this argument, the dsspcmbi program will change the Cys's name from 'C' to 'a', 'b', 'c'... if this Cys has formed disulfide bridge. So, I would suggest that this module's construction method could have anothger argument to indicate with -ssa or not. If with -ssa, this module can provide another method about disulfide bridge information. Best regards! Zhiqiang Ye From pterry2 at unlnotes.unl.edu Sun Mar 5 20:15:32 2006 From: pterry2 at unlnotes.unl.edu (Philip M Terry) Date: Sun, 5 Mar 2006 19:15:32 -0600 Subject: [Bioperl-l] bptutorial.pl won't run Message-ID: Hello, Would anyone be able to help with the following. version of bioperl: Distribution B/BI/BIRNEY/bioperl-1.4.tar.gz platform: OS X 10.4.5 What trying to do: Install core bioperl modules, then run bptutorial.pl. 
Code that gives the error: Following installation steps:

sudo perl -MCPAN -e shell;
cpan> install Bundle::CPAN
cpan> q
sudo perl -MCPAN -e shell;
cpan> install B/BI/BIRNEY/bioperl-1.4.tar.gz
cpan> q
sudo perl -MCPAN -e shell;
cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz
cpan> q

Tried to run bptutorial.pl from its directory,
/opt/local/lib/perl5/site_perl/5.8.7 directory

Note: first changed two lines in bptutorial.pl
#!/usr/bin/perl
to
#!/opt/local/bin/perl

philip-terrys-power-mac-g5:/opt/local/lib/perl5/site_perl/5.8.7 mterry$ perl -w bptutorial.pl 1
Example 1 uses files in t/data
Directory t/data not found
philip-terrys-power-mac-g5:/opt/local/lib/perl5/site_perl/5.8.7 mterry$

Note: permissions are 444 for bptutorial.pl
Reset to 555, got same output, still won't run.

Question:
What to do to get bptutorial.pl to run on this system?

Thanks,
Philip M. Terry, Ph.D.
University of Nebraska-Lincoln

From osborne1 at optonline.net Sun Mar 5 21:26:26 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 05 Mar 2006 21:26:26 -0500
Subject: [Bioperl-l] Suggestion about Bio::Structure::SecStr::DSSP::Res
In-Reply-To: <34198fe40603051257l102cf1fo@mail.gmail.com>
Message-ID:

Zhiqiang Ye,

This sounds like a request for an enhancement. The best way to make sure that it gets fulfilled is to submit it to bugzilla.bioperl.org.

Brian O.

From osborne1 at optonline.net Sun Mar 5 21:22:33 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 05 Mar 2006 21:22:33 -0500
Subject: [Bioperl-l] bptutorial.pl won't run
In-Reply-To:
Message-ID:

Philip,

You need to execute bptutorial in the directory that contains the t/ directory. If you've installed BioPerl using CPAN this may be something like ~/.cpan/build/bioperl-1.4.

Brian O.

From saldroubi at yahoo.com Sun Mar 5 21:32:22 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Sun, 5 Mar 2006 18:32:22 -0800 (PST)
Subject: [Bioperl-l] bptutorial.pl won't run
In-Reply-To:
Message-ID: <20060306023222.69702.qmail@web34313.mail.mud.yahoo.com>

Philip,

I am relatively new to Bioperl, but I remember having problems installing it using CPAN, so I installed it using make as described at
http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
under the section INSTALLING BIOPERL THE EASY WAY USING 'make'.

I installed version 1.5.1 and completed an entire project with it without any problems. I also remember installing Bundle::Bioperl. I think you should install the bundle first. I also think it is ok to force the install: a few tests will fail with Bioperl, but that's ok in most cases. The book Mastering Perl for Bioinformatics talks about installing Bioperl, if you have it.

You can get version 1.5.1 from here:
http://bioperl.org/DIST/
The filename is bioperl-1.5.1.tar.gz

Hope this helps.

Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From torsten.seemann at infotech.monash.edu.au Sun Mar 5 22:06:08 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 06 Mar 2006 14:06:08 +1100
Subject: [Bioperl-l] Suggestion about Bio::Structure::SecStr::DSSP::Res
In-Reply-To: <34198fe40603051257l102cf1fo@mail.gmail.com>
References: <34198fe40603051257l102cf1fo@mail.gmail.com>
Message-ID: <1141614368.1636.20.camel@chauvel.csse.monash.edu.au>

Zhiqiang Ye,

> I have a suggestion about Bio::Structure::SecStr::DSSP::Res.
> The dsspcmbi program has a argument -ssa, if we use this argument,
> the dsspcmbi program will change the Cys's name from 'C' to 'a', 'b',
> 'c'... if this Cys has formed disulfide bridge. So, I would suggest
> that this module's construction method could have another argument to
> indicate with -ssa or not.
If with -ssa, this module can provide
> another method about disulfide bridge information.

The Bio::Structure::SecStr::DSSP::Res module is for reading and parsing a DSSP output file.
It does not run dsspcmbi for you, therefore any "-ssa" option for the constructor of that module is not meaningful.

--
Torsten Seemann
Victorian Bioinformatics Consortium

From mblanche at berkeley.edu Mon Mar 6 00:02:14 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Sun, 05 Mar 2006 21:02:14 -0800
Subject: [Bioperl-l]
Message-ID:

Dear all--

I am trying to forge my first bioperl weapons with the Bio::DB::GFF and Bio::Graphics modules. My goal is to display genes with their underlying mRNAs and later on add additional useful info (i.e. binding sites for our preferred proteins).

I loaded the GadFly gff3 annotation into a mysql database using bulk_load_gff.pl and I am trying to pass a Bio::SeqFeatureI to the Bio::Graphics::add_feature method.

My understanding is that:

my $tcs = $tg->features(-types      => 'processed_transcript',
                        -attributes => {Parent => $gene},
                        -iterator   => 1);

produces a Bio::SeqIO object that can be iterated through with the next_seq method to get a Bio::Seq object, which could then be used to extract a Bio::SeqFeatureI via the get_SeqFeatures method.

Somehow, my script does not produce the expected results. Could somebody put me back on the right track?

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;

my $dmdb = Bio::DB::GFF->new(
    -adaptor => 'dbi::mysql',
    -dsn     => "chr4",
);

my @genes = ('CG2041');    ## a gene on the fourth chromosome

foreach my $gene (@genes) {

    my $geneseg = $dmdb->segment(-name => $gene, -merge);

    if ($geneseg) {

        my @tgs = $geneseg->features(-types => 'gene');

        for my $tg (@tgs) {

            my $length = $tg->length();

            my $panel = Bio::Graphics::Panel->new(-length => $length, -width => 800);

            my $track = $panel->add_track(-glyph => 'generic',
                                          -label => 1);

            my $tcs = $tg->features(-types      => 'processed_transcript',
                                    -attributes => {Parent => $gene},
                                    -iterator   => 1);

            while (my $tc = $tcs->next_seq) {
                $track->add_feature($tc->get_SeqFeatures);
            }

            print $panel->png;
        }
    }
}

Many thanks

Marco Blanchette, Ph.D.

mblanche at berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062

From yezhiqiang at gmail.com Mon Mar 6 03:13:14 2006
From: yezhiqiang at gmail.com (Zhiqiang Ye)
Date: Mon, 6 Mar 2006 16:13:14 +0800
Subject: [Bioperl-l] Suggestion about Bio::Structure::SecStr::DSSP::Res
In-Reply-To: <1141614368.1636.20.camel@chauvel.csse.monash.edu.au>
References: <34198fe40603051257l102cf1fo@mail.gmail.com> <1141614368.1636.20.camel@chauvel.csse.monash.edu.au>
Message-ID: <34198fe40603060013h66a12889s@mail.gmail.com>

Hi Torsten Seemann,

2006/3/6, Torsten Seemann:
> The Bio::Structure::SecStr::DSSP::Res module is for reading and parsing
> a DSSP output file.
> It does not run dsspcmbi for you, therefore any "-ssa" option for the
> constructor of that module is not meaningful.

I know this. Sorry, I expressed myself poorly.
For example, take 1crn.pdb, which contains lines like these:

SSBOND 1 CYS 3 CYS 40 1CRN 60
SSBOND 2 CYS 4 CYS 32 1CRN 61
SSBOND 3 CYS 16 CYS 26 1CRN 62

Run this:

dsspcmbi 1crn.pdb 1crn.dssp

then use this script:

use Bio::Structure::SecStr::DSSP::Res;

my $dssp_obj = Bio::Structure::SecStr::DSSP::Res->new('-file' => "/tmp/1crn.dssp");
print $dssp_obj->resAA(3);
print "\n";

The result is 'a', not 'C'. (You can see in the pdb file that the 3rd residue is CYS.)

btw: $dssp_obj->getSeq() gives a wrong result, and so does $dssp_obj->getSeq(' ')

If you remove the 3 SSBOND lines from the pdb file and then run

dsspcmbi 1crn.pdb 1crn.dssp

$dssp_obj->resAA(3) is 'C'. But if you run it like this:

dsspcmbi -ssa 1crn.pdb 1crn.dssp

$dssp_obj->resAA(3) is 'a' again.

That is to say:

1. If the pdb file has SSBOND defined explicitly, dsspcmbi will change the CYS residue character from 'C' to 'a', 'b' or some other lowercase letter.
2. If the pdb file doesn't have SSBOND defined explicitly, dsspcmbi with '-ssa' will calculate the disulfide bonds itself and then change the CYS residue character from 'C' to 'a', 'b' or some other lowercase letter.
3. If the pdb file doesn't have SSBOND defined explicitly, dsspcmbi without '-ssa' will treat all CYS as 'C'. No modification.

That's why I think this module would need some modification. :)

Best Regards,
--
Zhiqiang Ye

From lstein at cshl.edu Mon Mar 6 10:26:33 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 6 Mar 2006 10:26:33 -0500
Subject: [Bioperl-l]
In-Reply-To:
References:
Message-ID: <200603061026.35465.lstein@cshl.edu>

You are trying too hard. You just fetch each gene by name. No need to iterate over Parent, as that is not formally an attribute.

for my $gene (@genes) {
    # segment() is called on the database handle ($dmdb in your script)
    my $g = $dmdb->segment(-name => $gene);
    my @transcripts = $g->features(-type => 'processed_transcript');
}

Note that you must be sure that GFF3 name munging is turned *off* in order to work with the flybase gff3 files.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu Mon Mar 6 10:47:21 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 6 Mar 2006 10:47:21 -0500
Subject: [Bioperl-l]
In-Reply-To:
References:
Message-ID: <200603061047.22119.lstein@cshl.edu>

By the way, you'll want to CVS update to the latest bioperl live code in order to parse the flybase gff3 files correctly.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From golharam at umdnj.edu Mon Mar 6 09:58:54 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 06 Mar 2006 09:58:54 -0500
Subject: [Bioperl-l] NCBI's seq_gene.md file
Message-ID: <009d01c6412e$7d5fa8c0$2f01a8c0@GOLHARMOBILE1>

There is a file NCBI has for every organism called seq_gene.md. It contains a list of all the gene names, chromosome locations, exons, introns, protein, strand, contig, etc.

I can parse this easily, but was wondering if there is a bioperl module for this?

Ryan

From lstein at cshl.edu Mon Mar 6 11:31:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 6 Mar 2006 11:31:47 -0500
Subject: [Bioperl-l]
In-Reply-To:
References:
Message-ID: <200603061131.48014.lstein@cshl.edu>

Hi,

Since I wrote the last message I have done some more testing and have determined that the flybase GFF3 files cannot be stored in Bio::DB::GFF due to limitations in the Bio::DB::GFF data model. The issue is that Bio::DB::GFF can only store one level of parentage, and not the two levels needed by flybase genes.

Here is a quick fix to preprocess the gff3 files so that they can be used by Bio::DB::GFF:

while (<>) {
    my @fields = split "\t";
    # only mRNA lines are rewritten; comment/directive lines have no third column
    next unless defined $fields[2] && $fields[2] eq 'mRNA';
    s/Parent=([^;]+)/Gene=$1/;
} continue {
    print;    # every line is printed, rewritten or not
}

This turns the "Parent" field of mRNA lines into a "Gene" attribute. You can then find all transcripts corresponding to a particular gene in much the way you tried earlier:

my $tcs = $tg->features(-types      => 'processed_transcript',
                        -attributes => {Gene => $gene},
                        -iterator   => 1);

I am going back to work on Bio::DB::GFF3, which will fix this problem.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From osborne1 at optonline.net Mon Mar 6 11:39:53 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 06 Mar 2006 11:39:53 -0500
Subject: [Bioperl-l] NCBI's seq_gene.md file
In-Reply-To: <009d01c6412e$7d5fa8c0$2f01a8c0@GOLHARMOBILE1>
Message-ID:

Ryan,

No, I don't think so.

Brian O.

From sdavis2 at mail.nih.gov Mon Mar 6 12:15:52 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 06 Mar 2006 12:15:52 -0500
Subject: [Bioperl-l] NCBI's seq_gene.md file
In-Reply-To:
Message-ID:

Ryan,

I don't remember the exact details of the .md format, but it might be straightforward to parse it into gff format. There are tools for dealing with gff files in bioperl as well as in many other toolkits, as the gff format has become pretty standard.

Sean

From hlapp at gmx.net Mon Mar 6 12:44:37 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 6 Mar 2006 09:44:37 -0800
Subject: [Bioperl-l] Bio::DB::BioDB - insert failed. Dupllicate entry '' for key 2?
In-Reply-To: <440A2830.8080805@jays.net>
References: <440A2830.8080805@jays.net>
Message-ID: <3b430daad14330266d15b30c9a70e4b2@gmx.net>

On Mar 4, 2006, at 3:52 PM, Jay Hannah wrote:

> $ perl j2.pl
> Human adenovirus type 15 | Mastadenovirus | Adenoviridae | dsDNA
> viruses, no RNA stage | Viruses
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::BioNamespaceAdaptor (driver) failed,
> values were ("","") FKs ()
> Duplicate entry '' for key 2
> ---------------------------------------------------
>

This means the namespace wasn't set. Within Bioperl you rarely need to deal with the namespace, but in BioSQL you do (as you generally do whenever you need to uniquely identify a sequence; see e.g. LSID). Bio::PrimarySeqI has a $seq->namespace() method; just set it to whatever you'd like. load_seqdatabase.pl does that automatically for you (and there is a command line option to provide it). Most (I believe in fact all) Bio::SeqIO parsers do not set the namespace because there is no universal standard that would dictate or suggest the "right" value.

>
>
> mysql> select * from biodatabase;
> +----------------+------+-----------+-------------+
> | biodatabase_id | name | authority | description |
> +----------------+------+-----------+-------------+
> |             23 |      | NULL      | NULL        |
> +----------------+------+-----------+-------------+
> 1 row in set (0.00 sec)
>

Well, that's one of the more notorious MySQL artifacts - biodatabase.name is NOT NULLable, so MySQL silently converts a NULL value (undef attribute value) to an empty string instead of throwing an error - which probably also would have told you much more directly what is going wrong.

Hth, and thanks Marc & Chris for chiming in, I saw you were on the right path.
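
A minimal sketch of the namespace point above, under stated assumptions: ensure_namespace() is a hypothetical helper, not part of Bioperl or bioperl-db, and it relies only on the $seq->namespace() accessor mentioned in this message. The My::FakeSeq class is a stand-in so that the example runs without Bioperl or a database; in a real loader $seq would come from Bio::SeqIO.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (NOT a Bioperl/bioperl-db function): make sure a
# sequence object carries a non-empty namespace before it is persisted.
sub ensure_namespace {
    my ($seq, $default) = @_;
    die "no default namespace given\n"
        unless defined $default && length $default;
    my $ns = $seq->namespace;
    # An undef or empty namespace is what shows up as '' in biodatabase.name.
    $seq->namespace($default) unless defined $ns && length $ns;
    return $seq->namespace;
}

# Stand-in class for illustration only; it mimics just the namespace()
# get/set accessor.
package My::FakeSeq;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub namespace {
    my $self = shift;
    $self->{namespace} = shift if @_;
    return $self->{namespace};
}

package main;

my $seq = My::FakeSeq->new;                  # namespace not set
print ensure_namespace($seq, 'VIRUS'), "\n"; # falls back to the default
```

In a real loading script, a call like this would sit just before $db->create_persistent($seq), so that an unset namespace is caught in Perl rather than surfacing later as a duplicate-key error from MySQL.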

If you're dealing with viral sequences, then you should definitely consider pre-loading the taxonomy as Marc & Chris suggested, because virus canonical names often (if not always) don't follow the standard binomial convention, so Bioperl may frequently fail at parsing them correctly. If the NCBI taxon ID is in the feature table, Bioperl-db will first look up by taxon ID instead of by a possibly mis-parsed name.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From cjfields at uiuc.edu Mon Mar 6 13:03:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 6 Mar 2006 12:03:28 -0600
Subject: [Bioperl-l] contigs in NCBIHelper (RE: WGS sequences through Bio::DB::GenBank)
Message-ID: <001e01c64148$45ad8bd0$15327e82@pyrimidine>

I noticed this morning, while looking into ways of retrieving WGS sequences from WGS master files with Bio::DB::GenBank, that NCBIHelper post-processes all files to check for CONTIG lines (I believe Brian pointed this out to me last week). I found a blurb in the eutils course file saying that this can be done directly at NCBI, using rettype = gbwithparts, which I mentioned previously:

Application 4: Downloading Contigs
I want to download a flatfile with the full sequence of an assembly (eg. a contig).
Solution: Use EFetch with &rettype=gbwithparts
URL: efetch.fcgi?db=nucleotide&id=27479347&rettype=gbwithparts

I changed %FORMATMAP in the NCBIHelper BEGIN block to include this return type and it seems to catch these files w/o problems (i.e. passes through postprocessing w/o a hitch). This, of course, doesn't work with WGS files, my original intent.
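
For reference, a small sketch of how the EFetch URL quoted above could be assembled. The base URL is an assumption on my part (the course blurb only gives the relative efetch.fcgi?... part), and efetch_url() is a hypothetical helper, not something in NCBIHelper. It only builds the string; it does not contact NCBI.

```perl
use strict;
use warnings;

# Assumed base URL for NCBI E-utilities; the blurb above only shows the
# relative "efetch.fcgi?..." part.
my $EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Hypothetical helper: build an EFetch URL for one nucleotide record.
# $rettype defaults to 'gbwithparts', the return type discussed above.
sub efetch_url {
    my ($id, $rettype) = @_;
    $rettype = 'gbwithparts' unless defined $rettype;
    return $EUTILS_BASE
         . "efetch.fcgi?db=nucleotide&id=$id&rettype=$rettype";
}

print efetch_url(27479347), "\n";
```

Passing 'gb' as the second argument gives the plain GenBank request for comparison.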
oh well ;{

This seems to speed up the process tremendously as well, considering all the work is done on NCBI's end. A quick test using the same file (CH398084) and the following:

my $gb = Bio::DB::GenBank->new(-verbose => $v, -format => 'gbwithparts');

took ~10-15 secs, most of it retrieval time, while this:

my $gb = Bio::DB::GenBank->new(-verbose => $v, -format => 'gb');

took ~45-55 seconds on my 2GHz computer with ~1 Gb RAM running WinXP.

There are substantial differences between the files, which seem to come from 'n' padding between joined segments. NCBI's version had varying numbers of n's as padding where contigs were joined, based on the presence of 'gap(x)' in the CONTIG join lines from the master file. I didn't see any padding in bioperl's version.

I haven't committed any of these changes just yet as I'm still working on the WGS issue (I'm thinking about a module based on Bio::DB::GenBank aliasing some of the get methods at the moment). So, should we now change the _post_process sub to revert to this when catching CONTIG files on the backend, such as when someone requests a CONTIG file using the rettype of 'gb' instead of 'gbwithparts'?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From luciap at sas.upenn.edu Mon Mar 6 15:43:30 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Mon, 06 Mar 2006 15:43:30 -0500
Subject: [Bioperl-l] Bio::TreeIO functions
Message-ID: <1141677810.440c9ef205faf@128.91.55.38>

Hi

I am trying to collapse nodes below a certain bootstrap cutoff value, so that I work only with the most confident nodes in my trees.
After getting the nodes below my cutoff (70; with newick format the bootstrap value is actually the _creation_id), I thought that if I just called $tree->remove_Node($node) I would get rid of those nodes and update the ancestor relationships within the tree. However, when I call it, it actually deletes those nodes and all their children.
Does anyone have any idea how I can delete just certain nodes, so that the children are preserved and the relationships are collapsed to polytomies on the confident nodes?

thanks

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From jason.stajich at duke.edu Mon Mar 6 17:50:25 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 6 Mar 2006 17:50:25 -0500
Subject: [Bioperl-l] Bio::TreeIO functions
In-Reply-To: <1141677810.440c9ef205faf@128.91.55.38>
References: <1141677810.440c9ef205faf@128.91.55.38>
Message-ID:

On Mar 6, 2006, at 3:43 PM, Lucia Peixoto wrote:

> Hi
> I am trying to colapse nodes bellow a certain bootstrap cutoff value, so I just
> work only with the most confident nodes on my trees.
> After getting the nodes bellow my cutoff (70, and with newick format the
> bootstrap value is actually the _creation_id), I though that if I just call

The bootstrap value is not the _creation_id, but it will be the $node->id - you shouldn't be using methods that start with _

You can move it to the bootstrap with $node->bootstrap($id) if you want, but of course the newick format doesn't distinguish - you can set the flavor of your newick format bootstrap values for reading and writing when you init it with Bio::TreeIO. If you use a format like NHX it will distinguish bootstrap from node Id, although only ATV/Forester will reliably read this format.

Delete definitely removes a node completely. To collapse nodes (and so remodel the parent/child relationships) you want to use the remove_Descendent and add_Descendent methods.

Please feel free to submit a 'collapse' function if you end up writing something that works.

-jason

> $tree->remove_Node($node) I will just get rid of the nodes and update the
> ancestor relationships within the tree, however when I call it it actually
> deletes those nodes and all its children.
> Anyone has any idea how can I just delete certain nodes so that the
> children are
> preserved and the relationships are collpased to polytomies on the
> confident
> nodes?
>
> thanks
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From mblanche at berkeley.edu Mon Mar 6 19:02:27 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Mon, 06 Mar 2006 18:02:27 -0600
Subject: [Bioperl-l] Bio::DB::GFF and GadFly GFF3 issues
In-Reply-To: <200603061131.48014.lstein@cshl.edu>
Message-ID:

Lincoln--

I did what you suggested and still can't get a drawing of the exon/intron
mRNA structure using the following script:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                              -dsn     => "chr4_mod",
                            );
my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome
foreach my $gene (@genes){

    my $tg = $dmdb->segment(-name => $gene);

    if ($tg){

        my $panel = Bio::Graphics::Panel->new(
                        -length    => $tg->length,
                        -width     => 800,
                        -pad_left  => 10,
                        -pad_right => 10,
                    );

        my $full_length =
            Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length);

        $panel->add_track($full_length,
                          -glyph   => 'arrow',
                          -tick    => 2,
                          -fgcolor => 'black',
                          -double  => 1,
                         );

        my $tcs = $tg->features(-types      => 'processed_transcript',
                                -attributes => {Gene => $gene},
                                -iterator   => 1);

        while (my $tc = $tcs->next_seq){

            my $track = $panel->add_track( generic => $tc,
                                           -bgcolor => 'blue',
                                           -label   => 1,
                                           -bump    => 0,
                                           #-connector => 'solid',
                                         );
        }
        open FH, ">$gene.png" || die "Can't create file $gene.png\n";
        print FH $panel->png;
        $panel->finished;
        close FH;
    }
}

I get 2 files with the mRNA tracks labeled with the RNA id but without any
exon/intron structure.
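A side note on the script above, separate from the glyph question: in
`open FH, ">$gene.png" || die ...` the `||` binds more tightly than the list
operator `open`, so it is applied to the filename string rather than to the
result of `open`. That string is always true, which means the `die` can never
fire, even when the file cannot be created. The following is a minimal sketch
of the difference using only core Perl; the helper names and the deliberately
missing subdirectory are made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# A path inside a subdirectory that was never created, so open() must fail.
my $dir = tempdir(CLEANUP => 1);
my $bad = File::Spec->catfile($dir, 'no_such_subdir', 'out.png');

# Form used in the script: '||' applies to the filename string, which is
# always true, so die is never evaluated. Returns 1 if the check fired.
sub died_with_pipes {
    my ($path) = @_;
    my $ok = eval {
        open my $fh, ">$path" || die "Can't create file $path\n";
        1;
    };
    return $ok ? 0 : 1;
}

# Low-precedence 'or' applies to the return value of open() itself.
sub died_with_or {
    my ($path) = @_;
    my $ok = eval {
        open my $fh, '>', $path or die "Can't create file $path: $!\n";
        1;
    };
    return $ok ? 0 : 1;
}

print "'||' check fired: ", died_with_pipes($bad), "\n";
print "'or' check fired: ", died_with_or($bad), "\n";
```

The first call should report 0 (the failure passes silently) and the second 1.
Writing `open(FH, ">$gene.png") || die ...` with explicit parentheses would
also work.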
However, if I extract the exons first and draw them, as in:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                              -dsn     => "chr4_mod",
                            );
my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome
foreach my $gene (@genes){

    my $tg = $dmdb->segment(-name => $gene);

    if ($tg){
        my $panel = Bio::Graphics::Panel->new(
                        -length    => $tg->length,
                        -width     => 800,
                        -pad_left  => 10,
                        -pad_right => 10,
                    );

        my $full_length =
            Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length);

        $panel->add_track($full_length,
                          -glyph   => 'arrow',
                          -tick    => 2,
                          -fgcolor => 'black',
                          -double  => 1,
                         );

        my @transcripts = $tg->features(-types      => 'processed_transcript',
                                        -attributes => {Gene => $gene},
                                       );

        for my $tc (@transcripts){
            my @exons = $tc->features(-types      => 'exon',
                                      -attributes => {Parent => $tc->group});

            my @introns = $tc->features(-types      => 'intron',
                                        -attributes => {Parent => $tc->group});

            my $track = $panel->add_track( generic => \@exons,
                                           -bgcolor => 'blue',
                                           -label   => 1,
                                           -bump    => 0,
                                         );
        }
        open FH, ">$gene.png" || die "Can't create file $gene.png\n";
        print FH $panel->png;
        $panel->finished;
        close FH;
    }
}

I get the exons drawn correctly, but I can't add a line connecting them.
Moreover, since each exon is an individual feature, I can't get the mRNA id
displayed on each track...

I tried to pass the @introns array to the glyph in different ways to draw a
hat line between exons, without any success.

Am I going in the right direction, or is there some better workaround?

Many thanks

Marco

On 3/6/06 10:31 AM, "Lincoln Stein" wrote:

> Hi,
>
> Since I wrote the last message I have done some more testing and have
> determined that the flybase GFF3 files cannot be stored in Bio::DB::GFF due
> to limitations in the Bio::DB::GFF data model. The issue is that Bio::DB::GFF
> can only store one level of parentage, and not the two levels needed by
> flybase genes.
> > Here is a quick fix to preprocess the gff3 files so that they can be used by > Bio::DB::GFF: > > while (<>) { > my @fields = split "\t"; > next unless $fields[2] eq 'mRNA'; > s/Parent=([^;]+)/Gene=$1/; > } continue { > print; > } > > This turns the "Parent" field of mRNA lines into a "Gene" attribute. You can > then find all transcripts corresponding to a particular gene in much the way > you tried earlier: > > my $tcs = $tg->features(-types =>'processed_transcript', > -attributes => {Gene=> $gene}, > -iterator => 1); > > I am going back to work on Bio::DB::GFF3, which will fix this problem. > > Lincoln > > On Monday 06 March 2006 00:02, Marco Blanchette wrote: >> Dear all-- >> >> I am trying to forge my first bioperl weapons with the >> Bio::DB::GFF and Bio::Graphics modules. My goal is to display genes with >> their underlying mRNAs and later on add addition useful info (ie binding >> site for our preferred proteins). >> >> I loaded the GadFly gff3 annotation in a mysql database using >> bulk_load_gff.pl and I am trying to pass a Bio::SeqFeatureI to the >> Bio::Graphics::add_feature method. >> >> My understanding is that: >> my $tcs = $tg->features(-types =>'processed_transcript', >> -attributes => {Parent => $gene}, >> -iterator => 1); >> >> Produces a Bio::SeqIO object that can be iterate through the next_seq >> method to get a Bio::Seq object that could be used to extract a >> Bio::SeqFeatureI by using the get_SeqFeatures method. >> >> Somehow, my script does not produce the expected results. Could somebody >> put me on back on the right track. 
>> >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use Bio::DB::GFF; >> use Bio::Graphics; >> >> my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >> -dsn => "chr4", >> ); >> >> >> my @genes = ('CG2041'); ##a gene on the fourth chromosome >> >> foreach my $gene (@genes){ >> >> my $geneseg = $dmdb->segment(-name => $gene, -merge); >> >> if ($geneseg){ >> >> my @tgs = $geneseg->features(-types => 'gene'); >> >> for my $tg (@tgs){ >> >> my $length = $tg->length(); >> >> my $panel = Bio::Graphics::Panel->new(-length => $length, -width >> => 800); >> >> my $track = $panel->add_track( -glyph => 'generic', >> -label => 1); >> >> my $tcs = $tg->features(-types =>'processed_transcript', >> -attributes => {Parent => $gene}, >> -iterator => 1); >> >> while ( my $tc = $tcs->next_seq ){ >> $track->add_feature($tc->get_SeqFeatures); >> } >> >> print $panel->png; >> } >> } >> } >> >> Many thanks >> >> >> Marco Blanchette, Ph.D. >> >> mblanche at berkeley.edu >> >> Donald C. Rio's lab >> Department of Molecular and Cell Biology >> 16 Barker Hall >> University of California >> Berkeley, CA 94720-3204 >> >> Tel: (510) 642-1084 >> Cell: (510) 847-0996 >> Fax: (510) 642-6062 >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l Marco Blanchette, Ph.D. mblanche at berkeley.edu Donald C. Rio's lab Department of Molecular and Cell Biology 16 Barker Hall University of California Berkeley, CA 94720-3204 Tel: (510) 642-1084 Cell: (510) 847-0996 Fax: (510) 642-6062 From cuiw at mail.nih.gov Mon Mar 6 23:24:01 2006 From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F]) Date: Mon, 6 Mar 2006 23:24:01 -0500 Subject: [Bioperl-l] NCBI's seq_gene.md file References: <009d01c6412e$7d5fa8c0$2f01a8c0@GOLHARMOBILE1> Message-ID: Hello, Ryan, I wrote a script to load the .md into MySQL a couple of years ago. Hopefully it still works. 
Wenwu Cui, PhD
NCI/NIH

#!/usr/bin/perl
use strict;
use warnings;

# Make connection with MySQL database
use DBI;
my $database = 'hsgenome';
my $server   = 'localhost';   # your server IP
my $user     = 'root';        # your username
my $passwd   = 'mysql';       # your password
my $hsgenome = DBI->connect("dbi:mysql:$database:$server", $user, $passwd)
    or exit (1);

# prepare an SQL statement
# create genedb table => task1
my $task1 = qq/
create table genedb (
    taxid        int(5),
    chromosome   char(3),
    chrStart     int(10),
    chrEnd       int(10),
    orientation  char(2),
    contig       char(15),
    cnt_start    int,
    cnt_stop     int,
    cnt_orient   char(2),
    featureName  char(10),
    featureId    char(15),
    featureType  char(10),
    groupLabel   char(10),
    transcript   char(10),
    weight       char(2)
)
/;

$hsgenome->do ($task1);

# load gene_seq data to genedb
# datafile directory should be changed according to your path to your .md file
$hsgenome->do ("LOAD DATA local infile '/root/seq_gene.md' into table genedb ignore 1 lines")
    or die "could not load";

# Break connection with MySQL database
$hsgenome->disconnect;

exit;

________________________________

From: Ryan Golhar [mailto:golharam at umdnj.edu]
Sent: Mon 3/6/2006 9:58 AM
To: 'bioperl-l'
Subject: [Bioperl-l] NCBI's seq_gene.md file

There is a file NCBI has for every organism called seq_gene.md. It contains
a list of the all the Genes names, chromosome locations, exons, introns,
protein, strand, contig, etc. I can parse this easily, but was wondering if
there is a bioperl module for this?

Ryan

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From rvosa at sfu.ca Tue Mar 7 00:35:00 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Mon, 06 Mar 2006 21:35:00 -0800
Subject: [Bioperl-l] Which interfaces to use for phylogenetics?
Message-ID: <440D1B84.6020401@sfu.ca>

Hi all,

I am working on the next release of Bio::Phylo.
Not part of BioPerl - the reasons why make a longish story - maybe the BioPerl
gurus would like to discuss a timeline for merging and extending bioperl's
phylogenetics capabilities?

For the time being, though, I'd like to be as interoperable as possible with
BioPerl. Hence, the following works:

#######################################
use Bio::Phylo::Forest::Tree;
use Bio::TreeIO;

my $bp_tree = Bio::TreeIO->new(
    '-format' => 'newick',
    '-file'   => 'intree.dnd'
)->next_tree;

my $newtree = Bio::Phylo::Forest::Tree->new_from_bioperl( $bp_tree );
#######################################

In the opposite direction, Bio::Phylo::Forest::Tree objects and
Bio::Phylo::Forest::Node will implement the Bio::Tree::TreeI and
Bio::Tree::NodeI interfaces respectively (and push them in their @ISA), if
BioPerl is installed on the system.

I'd like to do the same for data matrices. Hence, I'll implement a
new_from_bioperl constructor for the Bio::Phylo::Matrices::Matrix and
Bio::Phylo::Matrices::Datum objects. However, from which BioPerl interfaces
should I inherit? Which objects are most handy as argument to the constructor
from BioPerl's perspective? Note that the Matrix object behaves either as a
sequence alignment or a categorical or continuous character state matrix.
Bio::Align::AlignI doesn't seem entirely appropriate, at least in that it is
entirely molecular-sequence-oriented. Is there a more generic matrix
interface? Maybe Bio::Matrix::MatrixI? Obviously, I'm looking for an
interface that will be readily accepted by the Bio::Tools::Run::*
phylogenetics modules.

On a related note - I'm writing a wrapper around MrBayes which I'd like to
contribute as Bio::Tools::Run::MrBayes. I posted the code as an attachment to
the list but I think it got blocked (okay, I can see why) so whom would I send
this to for consideration? (Or, how does one get a cvs commit bit?)

Thanks!

Rutger

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD.
candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From rvosa at sfu.ca Tue Mar 7 01:21:10 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Mon, 06 Mar 2006 22:21:10 -0800
Subject: [Bioperl-l] Bio::TreeIO parse from a string?
Message-ID: <440D2656.4000500@sfu.ca>

Hi,

is there a way to make Bio::TreeIO parse a string? Or from a handle?

E.g.

$treeio = Bio::TreeIO->new( -format => 'newick', -string => $newick );

or

open( my $handle, '<', \$newick ) or die $!;
$treeio = Bio::TreeIO->new( -format => 'newick', -handle => $handle );

Thanks!

Rutger

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From hlapp at gmx.net Tue Mar 7 02:08:30 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 6 Mar 2006 23:08:30 -0800
Subject: [Bioperl-l] Bio::TreeIO parse from a string?
In-Reply-To: <440D2656.4000500@sfu.ca>
References: <440D2656.4000500@sfu.ca>
Message-ID: <0472556364f6d0faedc2c3f574317fe5@gmx.net>

Like other IO interfaces in Bioperl, you can supply -fh with a GLOB as its
value. To read from a string, you'd use IO::String and then pass the
resulting handle via -fh.

-hilmar

On Mar 6, 2006, at 10:21 PM, Rutger Vos wrote:

> Hi,
>
> is there a way to make Bio::TreeIO parse a string? Or from a handle?
>
> E.g.
> > $treeio = Bio::TreeIO->new( -format => 'newick', -string => $newick ); > > or > > open( my $handle, '<', \$newick ) or die $!; > $treeio = Bio::TreeIO->new( -format => 'newick', -handle => $handle ); > > Thanks! > > Rutger > > -- > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD. candidate > Department of Biological Sciences > Simon Fraser University > 8888 University Drive > Burnaby, BC, V5A1S6 > Phone: 604-291-5625 > Fax: 604-291-3496 > Personal site: http://www.sfu.ca/~rvosa > FAB* lab: http://www.sfu.ca/~fabstar > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From osborne1 at optonline.net Tue Mar 7 08:45:25 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 07 Mar 2006 08:45:25 -0500 Subject: [Bioperl-l] Which interfaces to use for phylogenetics? In-Reply-To: <440D1B84.6020401@sfu.ca> Message-ID: Rutger, As an immediate solution just attach it to a message submitted to http://bugzilla.bioperl.org. Brian O. On 3/7/06 12:35 AM, "Rutger Vos" wrote: > to contribute as Bio::Tools::Run::MrBayes. I posted the code as an > attachment to the list but I think it got blocked (okay, I can see why) > so whom would I send this to for consideration? 
(Or, how does one get a From luciap at sas.upenn.edu Tue Mar 7 09:01:46 2006 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Tue, 07 Mar 2006 09:01:46 -0500 Subject: [Bioperl-l] Bio::TreeIO functions In-Reply-To: References: <1141677810.440c9ef205faf@128.91.55.38> Message-ID: <1141740106.440d924a21547@128.91.55.38> Hi so bad that there's no collapse function yet, If I come up with something usefull I'll add it On the other hand, the remove_nodes function, even if it removes the selected nodes and its children, alters the newick format and the end results can't be opened by any tree viewer, anyone has has a similar experience? I guess I should post this in bugzilla Lucia Quoting Jason Stajich : > > On Mar 6, 2006, at 3:43 PM, Lucia Peixoto wrote: > > > Hi > > I am trying to colapse nodes bellow a certain bootstrap cutoff > > value, so I just > > work only with the most confident nodes on my trees. > > After getting the nodes bellow my cutoff (70, and with newick > > format the > > bootstrap value is actually the _creation_id), I though that if I > > just call > > The bootstrap value is not the _creation_id, but it will be the $node- > >id - you shouldn't be using methods that start with _ > > You can move it to the bootstrap with $node->bootstrap($id) if you > want but of course the newick format doesn't distinguish - you can > set the flavor of your newick format bootstrap values for reading and > writing when you init it with Bio::TreeIO. If you use a format like > NHX it will distinguish bootstrap from node Id although only ATV/ > Forester will reliably read this format. > > delete definitely removes a node completely to collapse nodes (and > those remodel the parent/child relationship) you want to use the > remove_Descendent and add_Descendent methods. > > Please feel free to submit a 'collapse' function if you end up > writing something that works. 
>
> -jason
>
> > $tree->remove_Node($node) I will just get rid of the nodes and
> > update the
> > ancestor relationships within the tree, however when I call it it
> > actually
> > deletes those nodes and all its children.
> > Anyone has any idea how can I just delete certain nodes so that the
> > children are
> > preserved and the relationships are collpased to polytomies on the
> > confident
> > nodes?
> >
> > thanks
> >
> >
> > Lucia Peixoto
> > Department of Biology,SAS
> > University of Pennsylvania
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From lstein at cshl.edu Mon Mar 6 21:33:37 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 6 Mar 2006 21:33:37 -0500
Subject: [Bioperl-l] Bio::DB::GFF and GadFly GFF3 issues
In-Reply-To:
References:
Message-ID: <200603062133.37099.lstein@cshl.edu>

Hi Marco,

I'm awfully sorry about this, but it looks like Bio::DB::GFF was broken at
some point in the recent past and nobody picked up on it. The other thing
that happened is that the FlyBase folks have started representing their
transcripts with a single CDS rather than with multiple CDSs. This doesn't
affect you now, but it will if you try to use the processed_transcript glyph.

So, first of all, you'll have to CVS update bioperl again. Second, look at
the enclosed draw_gene.pl script, which will do what you want. The main trick
here is that I added type "intron" to the processed_transcript feature
aggregator, enabling you to get the introns. Ordinarily introns are not
included in the processed_transcript (because they're processed out!)
Lincoln On Monday 06 March 2006 19:02, Marco Blanchette wrote: > Lincoln-- > > I did what you suggested and still can't get a drawing of the exon/intron > mRNA structure using the following script: > > #!/usr/bin/perl > use strict; > use warnings; > use Bio::DB::GFF; > use Bio::Graphics; > use Bio::SeqFeature::Generic; > > my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > -dsn => "chr4_mod", > ); > my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome > foreach my $gene (@genes){ > > my $tg = $dmdb->segment(-name => $gene); > > if ($tg){ > > my $panel = Bio::Graphics::Panel->new( > -length => $tg->length, > -width => 800, > -pad_left => 10, > -pad_right => 10, > ); > > my $full_length = > Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length); > > $panel->add_track($full_length, > -glyph => 'arrow', > -tick => 2, > -fgcolor => 'black', > -double => 1, > ); > > my $tcs = $tg->features(-types =>'processed_transcript', > -attributes => {Gene=> $gene}, > -iterator => 1); > > while (my $tc = $tcs->next_seq){ > > my $track = $panel->add_track( generic => $tc, > -bgcolor => 'blue', > -label => 1, > -bump => 0, > #-connector => 'solid', > ); > } > open FH, ">$gene.png" || die "Can't create file $gene.png\n"; > print FH $panel->png; > $panel->finished; > close FH; > } > } > > I get 2 files with the mRNA tracts labeled with the RNA id but without any > exon/intron structure. 
> > However, If I extract the exons first and draw them as in: > > #!/usr/bin/perl > use strict; > use warnings; > use Bio::DB::GFF; > use Bio::Graphics; > use Bio::SeqFeature::Generic; > > my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > -dsn => "chr4_mod", > ); > my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome > foreach my $gene (@genes){ > > my $tg = $dmdb->segment(-name => $gene); > > if ($tg){ > my $panel = Bio::Graphics::Panel->new( > -length => $tg->length, > -width => 800, > -pad_left => 10, > -pad_right => 10, > ); > > my $full_length = > Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length); > > $panel->add_track($full_length, > -glyph => 'arrow', > -tick => 2, > -fgcolor => 'black', > -double => 1, > ); > > my @transcripts = $tg->features(-types =>'processed_transcript', > -attributes => {Gene => $gene}, > ); > > for my $tc (@transcripts){ > my @exons = $tc->features( -types => 'exon', > -attributes => {Parent => > $tc->group}); > > my @introns = $tc->features(-types => 'intron', > -attributes => {Parent => > $tc->group}); > > my $track = $panel->add_track( generic => \@exons, > -bgcolor => 'blue', > -label => 1, > -bump => 0, > ); > } > open FH, ">$gene.png" || die "Can't create file $gene.png\n"; > print FH $panel->png; > $panel->finished; > close FH; > } > } > > I get the exon drawn correctly but I can add line connecting them. > Moreover, since each exon are individual feature, I can't get the mRNA id > displayed on each tract... > > I tried to pass the @intron array to the glyph in different ways to draw > hat line between exon without any success. > > Am I going the right direction or their is some better workaround? > > Many thanks > > Marco > > On 3/6/06 10:31 AM, "Lincoln Stein" wrote: > > Hi, > > > > Since I wrote the last message I have done some more testing and have > > determined that the flybase GFF3 files cannot be stored in Bio::DB::GFF > > due to limitations in the Bio::DB::GFF data model. 
The issue is that > > Bio::DB::GFF can only store one level of parentage, and not the two > > levels needed by flybase genes. > > > > Here is a quick fix to preprocess the gff3 files so that they can be used > > by Bio::DB::GFF: > > > > while (<>) { > > my @fields = split "\t"; > > next unless $fields[2] eq 'mRNA'; > > s/Parent=([^;]+)/Gene=$1/; > > } continue { > > print; > > } > > > > This turns the "Parent" field of mRNA lines into a "Gene" attribute. You > > can then find all transcripts corresponding to a particular gene in much > > the way you tried earlier: > > > > my $tcs = $tg->features(-types =>'processed_transcript', > > -attributes => {Gene=> $gene}, > > -iterator => 1); > > > > I am going back to work on Bio::DB::GFF3, which will fix this problem. > > > > Lincoln > > > > On Monday 06 March 2006 00:02, Marco Blanchette wrote: > >> Dear all-- > >> > >> I am trying to forge my first bioperl weapons with the > >> Bio::DB::GFF and Bio::Graphics modules. My goal is to display genes with > >> their underlying mRNAs and later on add addition useful info (ie binding > >> site for our preferred proteins). > >> > >> I loaded the GadFly gff3 annotation in a mysql database using > >> bulk_load_gff.pl and I am trying to pass a Bio::SeqFeatureI to the > >> Bio::Graphics::add_feature method. > >> > >> My understanding is that: > >> my $tcs = $tg->features(-types =>'processed_transcript', > >> -attributes => {Parent => > >> $gene}, -iterator => 1); > >> > >> Produces a Bio::SeqIO object that can be iterate through the next_seq > >> method to get a Bio::Seq object that could be used to extract a > >> Bio::SeqFeatureI by using the get_SeqFeatures method. > >> > >> Somehow, my script does not produce the expected results. Could somebody > >> put me on back on the right track. 
> >> > >> > >> #!/usr/bin/perl > >> use strict; > >> use warnings; > >> use Bio::DB::GFF; > >> use Bio::Graphics; > >> > >> my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > >> -dsn => "chr4", > >> ); > >> > >> > >> my @genes = ('CG2041'); ##a gene on the fourth chromosome > >> > >> foreach my $gene (@genes){ > >> > >> my $geneseg = $dmdb->segment(-name => $gene, -merge); > >> > >> if ($geneseg){ > >> > >> my @tgs = $geneseg->features(-types => 'gene'); > >> > >> for my $tg (@tgs){ > >> > >> my $length = $tg->length(); > >> > >> my $panel = Bio::Graphics::Panel->new(-length => $length, -width > >> => 800); > >> > >> my $track = $panel->add_track( -glyph => 'generic', > >> -label => 1); > >> > >> my $tcs = $tg->features(-types =>'processed_transcript', > >> -attributes => {Parent => > >> $gene}, -iterator => 1); > >> > >> while ( my $tc = $tcs->next_seq ){ > >> $track->add_feature($tc->get_SeqFeatures); > >> } > >> > >> print $panel->png; > >> } > >> } > >> } > >> > >> Many thanks > >> > >> > >> Marco Blanchette, Ph.D. > >> > >> mblanche at berkeley.edu > >> > >> Donald C. Rio's lab > >> Department of Molecular and Cell Biology > >> 16 Barker Hall > >> University of California > >> Berkeley, CA 94720-3204 > >> > >> Tel: (510) 642-1084 > >> Cell: (510) 847-0996 > >> Fax: (510) 642-6062 > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Marco Blanchette, Ph.D. > > mblanche at berkeley.edu > > Donald C. 
Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CG7486.png
Type: image/png
Size: 3556 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060306/84ac0614/attachment-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: draw_gene.pl
Type: application/x-perl
Size: 1469 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060306/84ac0614/attachment-0001.bin

From jason.stajich at duke.edu Tue Mar 7 10:01:46 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 7 Mar 2006 10:01:46 -0500
Subject: [Bioperl-l] Bio::TreeIO functions
In-Reply-To: <1141740106.440d924a21547@128.91.55.38>
References: <1141677810.440c9ef205faf@128.91.55.38> <1141740106.440d924a21547@128.91.55.38>
Message-ID: <5434A62F-EF27-4FFE-828F-810740F8D093@duke.edu>

On Mar 7, 2006, at 9:01 AM, Lucia Peixoto wrote:

> Hi
> so bad that there's no collapse function yet, If I come up with
> something
> usefull I'll add it

Shouldn't be hard to do - give it a shot. For the node N you want to
collapse: get N's ancestor A, add the children of N as children of A instead,
remove the descendents of N, and then delete N.

I'll add a note to the wiki Project priority list so that it is tracked.
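The recipe above can be sketched without committing to any particular BioPerl
version. The example below is an illustration only: it uses plain hashes as a
stand-in for tree nodes, and the helpers new_node, add_child, remove_child,
collapse and to_string are made up here so that the steps are visible. With
real Bio::Tree nodes, the add_Descendent and remove_Descendent methods named
earlier in the thread would take the place of the hash manipulation.

```perl
use strict;
use warnings;

# Stand-in node: a hash with an id, a parent link, and a list of children.
sub new_node {
    my ($id) = @_;
    return { id => $id, parent => undef, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    push @{ $parent->{children} }, $child;
}

sub remove_child {
    my ($parent, $child) = @_;
    # References compare by address in numeric context.
    $parent->{children} = [ grep { $_ != $child } @{ $parent->{children} } ];
    $child->{parent} = undef;
}

# Collapse node N: hand N's children to N's ancestor A, then drop N from A.
# The root has no ancestor, so it is left alone (returns 0).
sub collapse {
    my ($n) = @_;
    my $anc = $n->{parent};
    return 0 unless defined $anc;
    my @kids = @{ $n->{children} };   # copy, because the list is edited below
    for my $kid (@kids) {
        remove_child($n, $kid);
        add_child($anc, $kid);
    }
    remove_child($anc, $n);
    return 1;
}

# Bare Newick-like string (no branch lengths), for inspection only.
sub to_string {
    my ($node) = @_;
    my @kids = @{ $node->{children} };
    return $node->{id} unless @kids;
    return '(' . join(',', map { to_string($_) } @kids) . ')' . $node->{id};
}

# Tiny example: the internal node's id holds its bootstrap value, 55.
my $root  = new_node('');
my $inner = new_node('55');
add_child($root, $inner);
add_child($inner, new_node($_)) for qw(A B);
add_child($root, new_node('C'));

my $cutoff = 70;
print to_string($root), "\n";
collapse($inner) if $inner->{id} < $cutoff;
print to_string($root), "\n";
```

After the call, A, B and C all hang directly off the root, which is the
polytomy asked for in the original question. Branch lengths are ignored in
this sketch; a real implementation would probably need to add N's branch
length to each child it moves.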
> > On the other hand, the remove_nodes function, even if it removes > the selected > nodes and its children, alters the newick format and the end > results can't be > opened by any tree viewer, anyone has has a similar experience? I've not had a similar experience, myself and others have used it to prune trees successfully. Are you using bioperl 1.5.1 or the latest code from CVS? > > I guess I should post this in bugzilla please do and include a simple code sample that demonstrates the problem. > > Lucia > > Quoting Jason Stajich : > >> >> On Mar 6, 2006, at 3:43 PM, Lucia Peixoto wrote: >> >>> Hi >>> I am trying to colapse nodes bellow a certain bootstrap cutoff >>> value, so I just >>> work only with the most confident nodes on my trees. >>> After getting the nodes bellow my cutoff (70, and with newick >>> format the >>> bootstrap value is actually the _creation_id), I though that if I >>> just call >> >> The bootstrap value is not the _creation_id, but it will be the >> $node- >>> id - you shouldn't be using methods that start with _ >> >> You can move it to the bootstrap with $node->bootstrap($id) if you >> want but of course the newick format doesn't distinguish - you can >> set the flavor of your newick format bootstrap values for reading and >> writing when you init it with Bio::TreeIO. If you use a format like >> NHX it will distinguish bootstrap from node Id although only ATV/ >> Forester will reliably read this format. >> >> delete definitely removes a node completely to collapse nodes (and >> those remodel the parent/child relationship) you want to use the >> remove_Descendent and add_Descendent methods. >> >> Please feel free to submit a 'collapse' function if you end up >> writing something that works. >> >> -jason >> >>> $tree->remove_Node($node) I will just get rid of the nodes and >>> update the >>> ancestor relationships within the tree, however when I call it it >>> actually >>> deletes those nodes and all its children. 
>>> Anyone has any idea how can I just delete certain nodes so that the
>>> children are
>>> preserved and the relationships are collpased to polytomies on the
>>> confident
>>> nodes?
>>>
>>> thanks
>>>
>>>
>>> Lucia Peixoto
>>> Department of Biology,SAS
>>> University of Pennsylvania
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu Tue Mar 7 10:21:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 7 Mar 2006 10:21:17 -0500
Subject: [Bioperl-l] Bio::TreeIO functions
In-Reply-To: <1389035724.20060307101058@princeton.edu>
References: <1141677810.440c9ef205faf@128.91.55.38> <1141740106.440d924a21547@128.91.55.38> <1389035724.20060307101058@princeton.edu>
Message-ID:

If you'd like some of those options turned off in the NEXUS trees, just say
so; they are easy to make customizable in write_tree, since there are so many
flavors. I only used the protml bootstrap flavor as the default output
because that was what I needed in nexus, but the newick writer is already
customizable so that you can request where you want the bootstrap values
(removing the need for your s/\[\d+?\]//g; pattern). This is a pretty easy
bit to add to the nexus writer, and I would imagine it is preferable to
having to hack the output again once you get it out of the scripts.

I just use newick for treeview and I've never had a problem, so I am confused
about what is causing the problems. I guess we'll just wait for bugs to be
submitted to bugzilla.
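For reference, the post-processing substitution mentioned above is easy to
try in isolation. The sketch below assumes a tree string in which support
values are written as bracketed integers; the sample string and the
strip_bracketed_support helper name are invented for the example and are not
taken from the attached script.

```perl
use strict;
use warnings;

# Remove bracketed integer annotations such as "[95]" from a tree string.
sub strip_bracketed_support {
    my ($tree) = @_;
    (my $clean = $tree) =~ s/\[\d+?\]//g;   # same pattern as quoted above
    return $clean;
}

# Made-up sample with one bracketed support value after a branch length.
my $tree = '((A:0.1,B:0.2):0.05[95],C:0.3);';
print strip_bracketed_support($tree), "\n";
```

Note that the pattern only matches runs of digits, so a fractional value such
as [0.95] would be left in place.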
-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From gbazykin at Princeton.EDU Tue Mar 7 10:10:58 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 7 Mar 2006 10:10:58 -0500
Subject: [Bioperl-l] Bio::TreeIO functions
In-Reply-To: <1141740106.440d924a21547@128.91.55.38>
References: <1141677810.440c9ef205faf@128.91.55.38> <1141740106.440d924a21547@128.91.55.38>
Message-ID: <1389035724.20060307101058@princeton.edu>

I had similar experience... The attached script worked for me to covert the bioperl-generated nexus tree into treeview-readable - check if it helps you as a quick solution.
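
The attachment itself was scrubbed from the archive, so what it did is not known beyond the s/\[\d+?\]//g pattern that Jason's reply attributes to it. A minimal sketch of that one substitution follows, assuming the bootstrap values appear as bracketed integers after the branch lengths. The tree string is made up for illustration.

```perl
use strict;
use warnings;

# Made-up tree string; the bracketed integers stand in for bootstrap values.
my $tree_string = '((A:0.1,B:0.2):0.05[95],(C:0.3,D:0.4):0.02[62]);';

# Copy the string, then delete every [digits] group from the copy.
(my $stripped = $tree_string) =~ s/\[\d+?\]//g;

print "$stripped\n";
```

Only bracketed integers are removed; any other bracketed comment is left as it is.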
Yegor Bazykin ------------------------------ Tuesday, March 7, 2006, 9:01:46 AM, you wrote: > Hi > so bad that there's no collapse function yet, If I come up with something > usefull I'll add it > On the other hand, the remove_nodes function, even if it removes the selected > nodes and its children, alters the newick format and the end results can't be > opened by any tree viewer, anyone has has a similar experience? > I guess I should post this in bugzilla > Lucia > Quoting Jason Stajich : >> >> On Mar 6, 2006, at 3:43 PM, Lucia Peixoto wrote: >> >> > Hi >> > I am trying to colapse nodes bellow a certain bootstrap cutoff >> > value, so I just >> > work only with the most confident nodes on my trees. >> > After getting the nodes bellow my cutoff (70, and with newick >> > format the >> > bootstrap value is actually the _creation_id), I though that if I >> > just call >> >> The bootstrap value is not the _creation_id, but it will be the $node- >> >id - you shouldn't be using methods that start with _ >> >> You can move it to the bootstrap with $node->bootstrap($id) if you >> want but of course the newick format doesn't distinguish - you can >> set the flavor of your newick format bootstrap values for reading and >> writing when you init it with Bio::TreeIO. If you use a format like >> NHX it will distinguish bootstrap from node Id although only ATV/ >> Forester will reliably read this format. >> >> delete definitely removes a node completely to collapse nodes (and >> those remodel the parent/child relationship) you want to use the >> remove_Descendent and add_Descendent methods. >> >> Please feel free to submit a 'collapse' function if you end up >> writing something that works. >> >> -jason >> >> > $tree->remove_Node($node) I will just get rid of the nodes and >> > update the >> > ancestor relationships within the tree, however when I call it it >> > actually >> > deletes those nodes and all its children. 
>> > Anyone has any idea how can I just delete certain nodes so that the >> > children are >> > preserved and the relationships are collpased to polytomies on the >> > confident >> > nodes? >> > >> > thanks >> > >> > >> > Lucia Peixoto >> > Department of Biology,SAS >> > University of Pennsylvania >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: nexus_bioperl2nexus_treeview.pl Type: application/octet-stream Size: 690 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060307/3685e310/attachment.obj From cjfields at uiuc.edu Tue Mar 7 10:49:49 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 7 Mar 2006 09:49:49 -0600 Subject: [Bioperl-l] WGS file problems with Bio::SeqIO::genbank Message-ID: <000601c641fe$c3edd480$15327e82@pyrimidine> I'm currently working on a WGS file module for Bio::DB and noticed that, although most WGS files (with the line WGS) are parsed normally, those that have additional scaffolds (WGS_SCAFLD) are passed over. These are with regular WGS files and not the strange beast that is the WGS RefSeq file. Example file AAAA000000 (O. 
sativa): >From GenBank (direct from web site), FEATURES section: ------------------------------------------------------------------ FEATURES Location/Qualifiers source 1..50231 /organism="Oryza sativa (indica cultivar-group)" /mol_type="genomic DNA" /cultivar="93-11" /db_xref="taxon:39946" WGS AAAA02000001-AAAA02050231 WGS_SCAFLD CM000126-CM000137 WGS_SCAFLD CH398081-CH401163 // ------------------------------------------------------------------ Passed through SeqIO: ------------------------------------------------------------------ FEATURES Location/Qualifiers source 1..50231 /db_xref="taxon:39946" /mol_type="genomic DNA" /cultivar="93-11" /organism="Oryza sativa (indica cultivar-group)" WGS AAAA02000001-AAAA02050231 // ------------------------------------------------------------------ I'll post as a bug in Bugzilla for now. I'm looking at Bio::SeqIO::genbank and may post a fix when/if I can work out how SeqIO parses. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From clarsen at vecna.com Tue Mar 7 11:22:28 2006 From: clarsen at vecna.com (Chris Larsen) Date: Tue, 7 Mar 2006 11:22:28 -0500 (EST) Subject: [Bioperl-l] New Microbial Bioinformatics Resource Center open Message-ID: <57721.192.168.3.51.1141748548.squirrel@mail.vecna.com> Hi all, I'd like to announce here the opening of a new public website for organismal bioinformatics. It's the BioHealthbase site: http://www.biohealthbase.org/GSearch/ This resource was created using bioperl support, and a GBrowse set of pages. (Can it be added to the list of GBrowse-using DBs?) Our Northop/Vecna/UTSW BRC was recently established under the NIAID BRC program, which targets specific organisms that might be used in BioTerror projects (thankfully, not open source). Our team is competent and diverse, coming from Cognia, Celera, TIGR, and elsewhere. A nearby BRC operates at TIGR in Rockville. 
We focus on Francisella (tularensis), Influenza (flu), Mycobacterium (TB), Giardia, Microsporidians, and others.

Check it out. We've just done release 1, and look for feedback (send direct to me). Many of the features would not have been possible without perl.

Thanks for all your dev,

-chris

----------------------------
Chris Larsen, Ph.D.
NIAID Bioinformatics Resource Center
Senior Scientist
Vecna Technologies, Inc.
5004 Lehigh Rd
College Park, MD 20740-3821
e: clarsen at vecna.com
ph: (240) 737-1625
f: (301) 699-3180

From cjfields at uiuc.edu Tue Mar 7 12:06:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Mar 2006 11:06:23 -0600
Subject: [Bioperl-l] New Microbial Bioinformatics Resource Center open
In-Reply-To: <57721.192.168.3.51.1141748548.squirrel@mail.vecna.com>
Message-ID: <000a01c64209$76c579f0$15327e82@pyrimidine>

Added to the wiki:

http://www.bioperl.org/wiki/Gbrowse#Sites_using_Gbrowse

Very nice site!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed Mar 8 17:48:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Mar 2006 16:48:27 -0600
Subject: [Bioperl-l] ncDNA
In-Reply-To: <235f7dbe0603080136x65e82c50j2488cdb23c10b8bb@mail.gmail.com>
Message-ID: <001101c64302$6a0a18c0$15327e82@pyrimidine>

It's a good idea to mail to the list as well; others may have better or alternative ways of doing this.

Jason mentions masking all regions you don't want (CDS, rRNA, tRNA) by changing them to N's, then splitting on multiple N's:

http://portal.open-bio.org/pipermail/bioperl-l/2005-January/018092.html

Using substr (as Jason suggests) should also speed things up.

Here's my solution using only bioperl objects; I use this for some work I do here. It works but isn't terribly fast with large sequences due to object creation. There is very likely a better way with larger sequences. You could probably use the first loop to extract seqfeatures and change the second part to simple perl to speed things up a bit or try straight regex matching combined with substr for maximum velocity. Be careful though; bioperl subseq() searches starting with 1 while I think perl substr starts at 0!
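
On the coordinate caveat just mentioned, here is a small pure-Perl sketch of the masking approach: overwrite each feature with N's using substr, then collect the stretches that are left. The sequence, the coordinate pairs and the name `intergenic_regions` are invented for illustration. Feature coordinates are assumed to be 1-based and inclusive on the forward strand, as in GenBank/EMBL feature tables, so the substr offset is start - 1.

```perl
use strict;
use warnings;

# Return the unmasked stretches of $seq as [start, end, sequence] triples.
# Each feature is an array ref [start, end], 1-based and inclusive.
sub intergenic_regions {
    my ($seq, @features) = @_;
    my $masked = $seq;
    for my $f (@features) {
        my ($start, $end) = @$f;
        my $len = $end - $start + 1;
        # substr is 0-based, so a 1-based start maps to offset start - 1
        substr($masked, $start - 1, $len) = 'N' x $len;
    }
    my @regions;
    while ($masked =~ /([^N]+)/g) {
        my $end   = pos($masked);            # offset after the match = 1-based end
        my $start = $end - length($1) + 1;   # 1-based start
        push @regions, [ $start, $end, $1 ];
    }
    return @regions;
}

my $seq = 'AAACCCCCGGGTTTTTAA';              # 18 bases
my @igr = intergenic_regions($seq, [ 4, 8 ], [ 12, 16 ]);
printf "%d..%d %s\n", @$_ for @igr;
```

Because features on either strand are masked using the same forward coordinates, this may also cover the negative-strand case raised in the original question. One caveat of the sketch: N's already present in the sequence are treated as masked, so a different mask character may be needed for draft genomes.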
------------------------------------------------------------
use strict;
use Bio::SeqIO;
use Bio::Seq;

my $file   = shift @ARGV;
my $seqio  = Bio::SeqIO->new(-file => $file, -format => 'genbank');
my $seqout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');

# the empty hash at index 0 is a placeholder: the first gene then has a
# "previous" entry with no 'end', so its upstream region starts at 1
my @cr  = ({});
my $seq = $seqio->next_seq;

# grab features, filter for those we want to keep
for my $feature ($seq->get_SeqFeatures) {
    next unless $feature->primary_tag =~ /CDS|rRNA|tRNA/;
    # get_tag_values may throw if the tag is absent, so check first
    next unless $feature->has_tag('gene');
    for my $value ($feature->get_tag_values('gene')) {
        push @cr, { 'gene'  => $value,
                    'start' => $feature->start,
                    'end'   => $feature->end };
    }
}

# extract and format subseqs
my $count = 0;
while ($cr[$count]) {
    if (defined $cr[$count]->{'gene'}) {
        # get end of previous gene; if no gene (undef), set to 1 (startpoint)
        my $start = $cr[$count-1]->{'end'} ? $cr[$count-1]->{'end'} + 1 : 1;
        # get start of second gene (endpoint)
        my $end = $cr[$count]->{'start'} - 1;
        my $igrname = $cr[$count-1]->{'gene'}
            ? $cr[$count-1]->{'gene'}."-".$cr[$count]->{'gene'}.' | '.$start."..".$end
            : $cr[$count]->{'gene'}.'|'.$start."..".$end;
        my $igr = $end - $start;
        if ($igr > 0) {    # filter out negative or zero values
            my $sub    = $seq->subseq($start, $end);
            my $subseq = Bio::Seq->new(-seq => $sub, -desc => $igrname);
            $seqout->write_seq($subseq);
        }
    }
    $count++;
}
------------------------------------------------------------

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

_____

From: perlmails at gmail.com [mailto:perlmails at gmail.com]
Sent: Wednesday, March 08, 2006 3:37 AM
To: cjfields at uiuc.edu
Subject: Re: ncDNA

Hi Chris,

Apologies for the personal email and the delay in replying.

I think there is a misunderstanding, since I want to extract non-coding sequences (intergenic) from the EMBL chromosome file. With the links you mentioned, annotated features could be extracted from EMBL FT entries. I am deleting the annotated parts of the EMBL file to get the inter-feature sequences.
In my script(below) I replace the annotated feature sequences from a fasta formatted file based on the (start-end) coordinates saved in a different file using 'substr'. Do you know, how to go about it? or suggest a better way? -PO You're not using bioperl. See: http://www.bioperl.org/wiki/HOWTO:Beginners then go to: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Chris On Feb 26, 2006, at 5:51 AM, perlmails at gmail.com wrote: > Dear Bioperl group, > > I have been working on extracting non-coding DNA (ncDNA) sequences > from an organimsm. > > I tried extracting the intergenic sequences from the sense-strand > after filtering the features (CDS, gene, mRNA, tRNA, rRNA etc) from > the EMBL feature table entries using the Bioperl and the additional > script (mentioned below). > > Now, I realised that there is a problem to extract the ncDNA sequences > from the negative-strand, Any ideas? > > To extract the ncDNAs from negative-strand, I thought of converting > the negative-strand co-ordinates to sense-strand co-ordinates and > adding these to the sense-strand cords. Then filter all the features > (select the ncDNAs after discarding the features from EMBL FT) to get > all the ncDNAs. > > Is there anything I am missing for using from the bioperl kit? 
>
> ##<<>
> use strict;
>
> my $EMBL_cord_file = "Organism.feature.cords "; # feature co-ordinates: start \t end
> my $RAW_file = "Organism.raw";
> my $ncDNA_file = "Organism.ncDNA";
>
> open(EMBLCORD, $EMBL_cord_file) or die "Canot open EMBL_cord_file";
> open(RAW, $RAW_file) or die "Canot open RAW_file";
> open(OUT, ">$ncDNA_file") or die;
>
> my @dna=<RAW>;
> my $dna = join('',@dna);
>
> while($dna){
> $dna=~s/\s//g;
> while(<EMBLCORD>){
> my @cords = split /\t/;
> my $start = $cords[0];
> my $end = $cords[1];
> my $replaceString = "\n>$start..$end";
> substr($dna, $start-1, $end-$start+1, $replaceString);
> }
> print OUT $dna,"\n";
> exit;
> }
> ##<<>
>
> Another thing is, since I am reading the whole file in a scalar the
> script does not complete the extraction of all ncDNAs from the
> sense-strand. Obviously, the features are parsed first before the
> flattening of the 266,000 nt sequence into a single string.
>
> Any help would be appreciated.
>
> -PO

From cjfields at uiuc.edu Wed Mar 8 18:56:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Mar 2006 17:56:30 -0600
Subject: [Bioperl-l] ncDNA
In-Reply-To: <001101c64302$6a0a18c0$15327e82@pyrimidine>
Message-ID: <001601c6430b$ec81f940$15327e82@pyrimidine>

You know, it works on smaller sequences, but I found a problem with some genome sequences which I didn't notice before (particularly M. tuberculosis). So use this one instead.

The problem is that any CDS that doesn't have a 'gene' tag will be overlooked. I think using locus_tag is better to catch all genes, and if gene is present switch it out. Some genome seqfeatures in GenBank files have gene = locus_tag and some don't.
Try this version, which pulls out everything and if 'gene' is present will use it instead of the less-useful 'locus_tag':

------------------------------------------
use strict;
use Bio::SeqIO;
use Bio::Seq;

my $starttime = time;
my $file   = shift @ARGV;
my $seqio  = Bio::SeqIO->new(-file => $file, -format => 'genbank');
my $seqout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');

# the empty hash at index 0 is a placeholder: the first gene then has a
# "previous" entry with no 'end', so its upstream region starts at 1
my @feat = ({});
my $seq  = $seqio->next_seq;

# grab features, filter for those we want to keep
for my $feature ($seq->get_SeqFeatures) {
    next unless $feature->primary_tag =~ /CDS|rRNA|tRNA/;
    # get_tag_values may throw if the tag is absent, so check first
    next unless $feature->has_tag('locus_tag');
    for my $value ($feature->get_tag_values('locus_tag')) {
        # prefer the first 'gene' value when present, else the locus_tag
        my ($name) = $feature->has_tag('gene')
            ? $feature->get_tag_values('gene')
            : ($value);
        push @feat, { 'gene'  => $name,
                      'start' => $feature->start,
                      'end'   => $feature->end };
    }
}

# extract and format subseqs
my $count = 0;
while ($feat[$count]) {
    if (defined $feat[$count]->{'gene'}) {
        # get end of previous gene; if no gene (undef), set to 1 (startpoint)
        my $start = $feat[$count-1]->{'end'} ? $feat[$count-1]->{'end'} + 1 : 1;
        # get start of second gene (endpoint)
        my $end = $feat[$count]->{'start'} - 1;
        my $igrname = $feat[$count-1]->{'gene'}
            ? $feat[$count-1]->{'gene'}."-".$feat[$count]->{'gene'}.' | '.$start."..".$end
            : $feat[$count]->{'gene'}.'|'.$start."..".$end;
        my $igr = $end - $start;
        if ($igr > 0) {    # filter out negative or zero values
            my $sub    = $seq->subseq($start, $end);
            my $subseq = Bio::Seq->new(-seq => $sub, -desc => $igrname);
            $seqout->write_seq($subseq);
        }
    }
    $count++;
}
------------------------------------------

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

From abafana at rediffmail.com Thu Mar 9 05:26:11 2006
From: abafana at rediffmail.com (amit p bafana)
Date: 9 Mar 2006 10:26:11 -0000
Subject: [Bioperl-l] codontree
Message-ID: <20060309102611.18995.qmail@webmail25.rediffmail.com>

On reading the documentation for codontree, I understand that all applications of the program have not been made available at the Bioweb website. Is is possible to download the program and use it?

with regards
Amit Bafana
Research Scholar
NEERI
India ?
Amit From j.abbott at imperial.ac.uk Thu Mar 9 09:16:03 2006 From: j.abbott at imperial.ac.uk (James Abbott) Date: Thu, 09 Mar 2006 14:16:03 +0000 Subject: [Bioperl-l] EMBL/genbank organism parsing Message-ID: <441038A3.4030802@imperial.ac.uk> Hi Folks, The current parsing of OS lines by Bio::SeqIO::embl.pm fails with many of the organisms currently found in the database, since the OS lines differ considerably from the specification in the EMBL User Manual, which appears to have been used as the basis for the current parser. In an attempt to improve matters, I have collected a set of examples which hopefully cover the majority of the different ways of writing an organism name, and managed to get embl.pm to 'correctly' parse these (correctly being open to debate with some of the more esoteric examples). I'm sure there are plenty of entries which still don't parse correctly, but it's a start. I'll post the patches to bugzilla once I get a few loose ends tidied up. In the interests of consistency, I have also obtained the same set of sequences from Genbank, and am trying to make both parsers behave the same way, however they currently behave in different ways with respect to parsing the common name. According to the EMBL spec, the common name is the English name for the organism given in brackets after the latin name, consequently calling the common_name method on an embl.pm parsed Bio::Species object returns 'human' for a Homo sapiens (human). The genbank parser, however, currently takes the entire SOURCE line, including the latin name, consequently calling the common_name method on a genbank.pm parsed species object returns 'Homo sapiens (human)'. This would appear to be the intended behavior, since this is considered the correct response by the tests. 
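
To make the two behaviours concrete, here is a minimal sketch of splitting an organism string of the form "Latin name (common name)" into its two parts. The helper name `split_organism` and its regular expression are assumptions for illustration only; this is not the code in Bio::SeqIO::embl or Bio::SeqIO::genbank, and real OS/SOURCE lines come in more forms than this handles.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Returns (name, common_name); common_name is undef when there is no
# trailing parenthesised part.
sub split_organism {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;                       # trim whitespace
    if ($text =~ /^(.+?)\s*\(([^()]+)\)$/) {       # "Name (common)"
        return ($1, $2);
    }
    return ($text, undef);
}

for my $os ('Homo sapiens (human)',
            'Escherichia coli',
            'Oryza sativa (indica cultivar-group)') {
    my ($name, $common) = split_organism($os);
    printf "%-40s name=%s common=%s\n",
        $os, $name, defined $common ? $common : '(none)';
}
```

The third input shows why a simple pattern is not enough: the parenthesised part of an organism name such as the one in the WGS example elsewhere in this digest would be reported as a common name, which is probably not what one wants.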
Is it considered better to maintain consistency between the EMBL and Genbank parsers and risk breaking any code which relies upon the current behavior of genbank->species->common_name(), or to have the two parsers behaving differently, but consistently with their existing behavior?

Cheers,
James

--
Dr. James Abbott
Bioinformatics Software Developer, Bioinformatics Support Service
Imperial College, London

From niels_klitgord at dfci.harvard.edu Thu Mar 9 10:23:44 2006
From: niels_klitgord at dfci.harvard.edu (Niels Klitgord)
Date: Thu, 09 Mar 2006 10:23:44 -0500
Subject: [Bioperl-l] not getting all exons back when using Bio::DB::GFF
Message-ID: <44104880.3060205@dfci.harvard.edu>

Hello,

Perhaps this is because I'm not using Bio::DB::GFF correctly. Perhaps I missed a previous post on this; if so, I apologize. But on occasion when I try to get all the exons of an orf from a segment I wind up missing 1.

I am using the wormbase WS150 release GFF, and am using the GFF.pm,v 1.102 perl module (maybe I need to upgrade?).

This is the code I was using:

#!/usr/local/bin/perl -w
use strict;
use Bio::DB::GFF;

my $GFFdb = new Bio::DB::GFF(-adaptor=>'dbi::mysqlopt',
                             -dsn=>'dbi:mysql:gff150;host=dome',
                             -user=>'niels')
    or die("can't open gffDB");

my $gene = 'C08C3.3';
my @seg = $GFFdb->segment(-name=>$gene, -class=>'CDS');
print "$gene CDS: ", $seg[0]->abs_start, "\t", $seg[0]->abs_stop, "\n";

my @all_exons = $seg[0]->features( 'exon:curated' );
foreach my $k (sort { $a->start <=> $b->start} @all_exons) {
    print "feature: ", $k->class, "\t", $k->type, "\t", $k->name, "\t",
          $k->abs_start, "\t", $k->abs_stop, "\n"
}

and get:

C08C3.3 CDS: 7783311 7777192
feature: CDS    exon:curated    C08C3.3 7782898 7782816
feature: CDS    exon:curated    C08C3.3 7782130 7782027
feature: CDS    exon:curated    C08C3.3 7778462 7778314
feature: CDS    exon:curated    C08C3.3 7777314 7777192

However, in the raw gff file we see (and also in the mysql):

CHROMOSOME_III  curated exon    7777192 7777314 .   -   .   CDS "C08C3.3"
CHROMOSOME_III  curated exon    7778314 7778462 .   -   .   CDS "C08C3.3"
CHROMOSOME_III  curated exon    7782027 7782130 .   -   .   CDS "C08C3.3"
CHROMOSOME_III  curated exon    7782816 7782898 .   -   .   CDS "C08C3.3"
CHROMOSOME_III  curated exon    7783168 7783311 .   -   .   CDS "C08C3.3"

Am I just using this wrong, or should the last entry be returned as well?

Much thanks in advance,
Niels

From staffa at niehs.nih.gov Thu Mar 9 11:17:32 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Thu, 9 Mar 2006 11:17:32 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08545@NIHCESMLBX6.nih.gov>

I did make a perl based script that works. And I thank all who tried to help -- it did set me on a path to success.

I have a first-grade education in Perl and, to me, object oriented programming concepts, the jargon etc. are like Greek. (I could go on about the complete nerdiness of it all.) It worked out better to create my own subroutines without using the shortcuts and tricks that make most perl scripts unintelligible, IMHO.

However, looking at the code in Bio::Tools::SeqWords gave me a hint on how to use pattern matching with the g qualifier in a loop. (Though I thought it weird that to count a certain pattern, they make a hash of every possible n-mer and count them all.) The module on restriction enzymes gave me code to change ambiguity symbols into regular expressions, something I could figure out, no doubt.

And here it is. (I really should have kept the use of .
for ANY character) #!/usr/bin/perl use strict; use Bio::DB::Fasta; use Bio::Tools::SeqWords; use Bio::Seq; my $pattern = @ARGV[0]; open (LIST,") { print $filenameline; chomp $filenameline; ($filename, $sequenceID) = split(/\s+/, $filenameline); print "filename=$filename sequenceID=$sequenceID\n"; my $db = Bio::DB::Fasta->new("/home/staffa/clients/colaneria/Mouse_chromosomes/$filename", -makeid => \&make_my_id); my $obj = $db->get_Seq_by_id("$sequenceID"); my $start = 0; my $windowsize = 10003; my $enzstr = $pattern; my $CG = "CG"; my $len = $obj->length; my $overlap = 10000; print "start=$start window=$windowsize string=$enzstr overlap=$overlap length=$len\n"; my @filenameparts = split(/_/,$filename); my ($chrNo,$fa) = split(/\./,$filenameparts[2]); my $outputfilename = "${chrNo}_$pattern.count"; print "out=$outputfilename\n"; open (OUTPUTFILE,">${outputfilename}.out"); while (1) { my $end = $start + $windowsize; last if($end > $len); my $subseq = $obj->subseq($start,$end); my $enzcount = &get_count($enzstr,$subseq); my $CGcount = &get_count($CG,$subseq); my $enzCGratio = 0; if ($CGcount > 0){$enzCGratio = $enzcount/$CGcount;} printf OUTPUTFILE "%d %d %d %d %g\n", $start,$end,$CGcount,$enzcount,$enzCGratio; $start += $overlap; } close OUTPUTFILE; } sub get_count { my ($str,$subseq) = @_; # print "look for $str \n"; my $pat = &expanded_string($str); # print "search pattern= $pat\n"; my $count=0; while($subseq =~ /($pat)/gim){ $count++; } # print "count=$count\n"; return $count; } sub make_my_id { my $line = shift; $line =~ /gi\|(\d+)/; $1; } sub expanded_string { my $str = @_[0]; $str =~ s/N/\[ACGT\]/g; $str =~ s/R/\[AG\]/g; $str =~ s/Y/\[CT\]/g; $str =~ s/S/\[GC\]/g; $str =~ s/W/\[AT\]/g; $str =~ s/M/\[AC\]/g; $str =~ s/K/\[TG\]/g; $str =~ s/B/\[CGT\]/g; $str =~ s/D/\[AGT\]/g; $str =~ s/H/\[ACT\]/g; $str =~ s/V/\[ACG\]/g; return $str; } Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support 
Services Contract (Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov )) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina -----Original Message----- From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] Sent: Tuesday, February 28, 2006 5:47 PM To: Staffa, Nick (NIH/NIEHS) [C] Cc: bioperl-l Subject: Re: [Bioperl-l] seq_word and pattern counts Staffa, Nick (NIH/NIEHS) [C] wrote: > The real problem is this: > We want to count sites in a long sequence where a restriction enzyme would cut. > This restriction enzyme, in the example I gave will recognize GGnnCC, > that is two G separated by two of any bases followed by two C. > The GCG program findpatterns will do this, but bioperl makes certain statistics easy. > I'm sure there is some module somewhere for this purpose. (Nick - please respond to me AND the bioperl-l at bioperl.org mailing list ie. "Reply All", so others can benefit from the Q&A - I've re-sent your past responses already). Perhaps this module? http://doc.bioperl.org/bioperl-live/Bio/Tools/RestrictionEnzyme.html With this code? my $enz = "GGNNCC"; my $re = new Bio::Tools::RestrictionEnzyme(-NAME =>"NicksResEnz--$enz", -MAKE =>'custom'); @fragments = $re->cut_seq($seqobj); print "$enz cuts ", $seqobj->display_id, " ", scalar(@fragments), " times.\n"; -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ Phone: +61 3 9905 9010 From cjfields at uiuc.edu Thu Mar 9 13:08:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Mar 2006 12:08:16 -0600 Subject: [Bioperl-l] WGS, WGS_SCAFLD support added for GenBank files Message-ID: <000a01c643a4$706802c0$15327e82@pyrimidine> Added WGS and WGS_SCAFLD support to Bio::SeqIO::genbank as well as tests and WGS sample file; the previous fix missed the WGS_SCAFLD line. 
I will also soon add support to Bio::DB::GenBank for downloading WGS and WGS_SCAFLD subfiles. Brian, I found a pretty decent speed improvement for contig building in Bio::DB::NCBIHelper; it basically fetches the contig whole from NCBI using return type of 'gbwithparts' so the work is done on their end and just switches the CONTIG line with the sequence; it took about 10 seconds vs. ~50 seconds using an unmodified NCBIHelper on my PC. I haven't committed it yet bc I noticed the resulting contig files differ; the bioperl contig build lacks any N's from the 'gaps()' in the CONTIG line while NCBI's version has the N filler. I didn't know if the difference was a bug or not. Should I go ahead and commit? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Thu Mar 9 14:26:40 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 09 Mar 2006 14:26:40 -0500 Subject: [Bioperl-l] WGS, WGS_SCAFLD support added for GenBank files In-Reply-To: <000a01c643a4$706802c0$15327e82@pyrimidine> Message-ID: Chris, > Bio::DB::NCBIHelper; it basically fetches the contig whole from NCBI using > return type of 'gbwithparts' so the work is done on their end and just I think it's reasonable to use eutils in this way, yes. It's no longer "pure Bioperl" but all of this stuff is depending on eutils anyway. The downside is that their API may change but it looked like you wrote some tests for this, yes? Just my opinion. I believe the lack of filling Ns is a bug on Bioperl's part due to the inability of the Bio::Location code to understand NCBI's gaps(). If there are Ns in the sequence we shouldn't just be deleting them, that's not good. Brian O On 3/9/06 1:08 PM, "Chris Fields" wrote: > Added WGS and WGS_SCAFLD support to Bio::SeqIO::genbank as well as tests and > WGS sample file; the previous fix missed the WGS_SCAFLD line. 
I will also > soon add support to Bio::DB::GenBank for downloading WGS and WGS_SCAFLD > subfiles. > > Brian, I found a pretty decent speed improvement for contig building in > Bio::DB::NCBIHelper; it basically fetches the contig whole from NCBI using > return type of 'gbwithparts' so the work is done on their end and just > switches the CONTIG line with the sequence; it took about 10 seconds vs. ~50 > seconds using an unmodified NCBIHelper on my PC. I haven't committed it yet > bc I noticed the resulting contig files differ; the bioperl contig build > lacks any N's from the 'gaps()' in the CONTIG line while NCBI's version has > the N filler. I didn't know if the difference was a bug or not. Should I > go ahead and commit? > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > From cjfields at uiuc.edu Thu Mar 9 16:04:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Mar 2006 15:04:47 -0600 Subject: [Bioperl-l] WGS, WGS_SCAFLD support added for GenBank files In-Reply-To: Message-ID: <000301c643bd$193f0c00$15327e82@pyrimidine> > I think it's reasonable to use eutils in this way, yes. It's no longer > "pure > Bioperl" but all of this stuff is depending on eutils anyway. The downside > is that their API may change but it looked like you wrote some tests for > this, yes? Just my opinion. I'll get some tests up and running for check and for using the 'gbwithparts' format. I actually found that using format=>'fasta' also gives the NCBI-built contig and, since it required less memory overhead for object creation, used that in &postprocess_data. > I believe the lack of filling Ns is a bug on Bioperl's part due to the > inability of the Bio::Location code to understand NCBI's gaps(). If there > are Ns in the sequence we shouldn't just be deleting them, that's not > good. 
> > Brian O There are a number of serious problems with bioperl's joining as well, something I've just noticed when directly comparing output from NCBI. It cuts off one base from the end of each joined sequence, and some of the joins aren't correct (normal when they should be revcomp). Basically any fix is now redundant in the light of using NCBI's contig building but I would still like to know what the problem is with bioperl's version. Did something change recently with these records to break this? I'll check things over and try to get this fix committed ASAP. Here's a few chunks of fasta data, first one from NCBI eutils contig build and second from Bioperl's postprocess_data (prior to my changes); after that is the start of the CONTIG line from the master file. I snipped out the 5' end and started close to where the gaps (N's) are and added a couple and arrows where the gaps should be in the second bioperl formatted sequence. In the bioperl-formatted contig the end of each joined sequence is missing one base ('T' in the first, 'C' in the second). The third sequence should be the complement of the sequence in the contig, but isn't. Could you try this out and see if you get the same thing? I added the bit of code that I used to fetch the contig from postprocess_data. >CH398085 Oryza sativa (indica cultivar-group) chromosome 1 scaffold000005 genomic scaffold, whole genome shotgun sequence (from NCBI) .... 
TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA GGATTAAGCTCAGGCCTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNCTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCCCCTTCAGTA AGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAGTTGAGCGCC TGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATTGTCAGGCCT TAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCTTTTATATCA TGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAGCGTTCGGGA AAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAATGACATATC CTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGGGGTGGAAAA ACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACCCGAGATGCA TAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGTGCCCATGGA GATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTGCAAATTGTG GCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGTACAGAGCCA GAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACCCCGGTCCCT GAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGACTCCAAGTCT AAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTAAGGAACGTG CCAAACTCAGAGATGATGACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNTACTTGTTGCAATAATCTTGCTCCGGAGTAAGTGGTTATAGGATGCA AGTACAATAACTAGTTGTAGACAAAGTCAATGACGATACGGAGAAGAATAAGCGCAATGT >CH398085 Oryza sativa (indica cultivar-group) chromosome 1 scaffold000005 genomic scaffold, whole genome shotgun sequence (bioperl's version) .... 
TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT
TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC
TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA
GGATTAAGCTCAGGCCTC <----no gap, missing base

CTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCC
CCTTCAGTAAGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAG
TTGAGCGCCTGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATT
GTCAGGCCTTAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCT
TTTATATCATGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAG
CGTTCGGGAAAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAA
TGACATATCCTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGG
GGTGGAAAAACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACC
CGAGATGCATAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGT
GCCCATGGAGATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTG
CAAATTGTGGCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGT
ACAGAGCCAGAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACC
CCGGTCCCTGAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGAC
TCCAAGTCTAAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTA
AGGAACGTGCCAAACTCAGAGATGATGACCC <---- no gap, missing base

GATGGTGGGTTAGCCTGCCTAGCTAGTTC <---- should be revcomp
GAAGCGGCACTCCTTTTAATTATTTGATATTAGATCATTTTTTAATATTTGTGTTTTTAC
AAGTACCGCGAGGTACAACCTCATGGACAGGAACAACGCTTTTTTGCAACATATATTTTA
TACGAAATCTATGCTTTCTGTAAAGTTAAAGCACACTAAATCTAAAGCTTAATATACAAC
CATGCCACATCATCACCCACTAGCAATAATTATATATTTAATCTCATACAAGCATACAAA

CONTIG      join(AAAA02001496.1:1..1819,gap(50),AAAA02001497.1:1..854,gap(50),
            complement(AAAA02001498.1:1..870),gap(50),AAAA02001499.1:1..945,
            gap(50),AAAA02001500.1:1..11304,gap(100),

This is what I changed in postprocess_data:

    # transform links to appropriate descriptions
    if ($data =~ /\nCONTIG\s+/) {
        $self->warn("CONTIG found. Retrieving contig sequence.".
                    "\nUse format type 'gbwithparts' or 'fasta' with contigs.");
        # get accession from LOCUS
        $data =~ /^LOCUS\s+(\S+)/;
        my $acc = $1;
        my $stream = Bio::DB::GenBank->new(-format => 'fasta');
        my $seq = $stream->get_Seq_by_acc($acc);
        my $contig = $seq->seq;
        # remove everything after and including CONTIG
        $data =~ s/(CONTIG[\s\S]+)$//i;
        # Bio::SeqIO::genbank will fix this line,
        # fills in the actual numbers
        $data .= "BASE COUNT 0 a 0 c 0 g 0 t \n";
        $data .= "ORIGIN \n";
        # Bio::SeqIO::genbank also formats this data correctly
        $data .= "$contig\n//";
    }

> On 3/9/06 1:08 PM, "Chris Fields" wrote:
>
> > Added WGS and WGS_SCAFLD support to Bio::SeqIO::genbank as well as tests and
> > WGS sample file; the previous fix missed the WGS_SCAFLD line. I will also
> > soon add support to Bio::DB::GenBank for downloading WGS and WGS_SCAFLD
> > subfiles.
> >
> > Brian, I found a pretty decent speed improvement for contig building in
> > Bio::DB::NCBIHelper; it basically fetches the contig whole from NCBI using
> > return type of 'gbwithparts' so the work is done on their end and just
> > switches the CONTIG line with the sequence; it took about 10 seconds vs. ~50
> > seconds using an unmodified NCBIHelper on my PC. I haven't committed it yet
> > bc I noticed the resulting contig files differ; the bioperl contig build
> > lacks any N's from the 'gaps()' in the CONTIG line while NCBI's version has
> > the N filler. I didn't know if the difference was a bug or not. Should I
> > go ahead and commit?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign

From lstein at cshl.edu Thu Mar 9 20:35:34 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 9 Mar 2006 20:35:34 -0500
Subject: [Bioperl-l] Initial benchmarking of Bio::DB::GFF3
Message-ID: <200603092035.34785.lstein@cshl.edu>

Hi All,

I've completed some early benchmarking on the latest iteration of the Bio::DB::GFF3 module. What distinguishes this module from the original Bio::DB::GFF, in addition to its ability to correctly handle the multiple levels of containment in GFF3, is that while there are relational tables for the feature location, name, attributes and type that are used for querying, the feature itself and all its subparts are instantiated as one Bio::SeqFeatureI object at load time, then serialized (using Storable or Data::Dumper) and stored into a relational table as a BLOB. Another change is that the "binning" scheme now uses integers rather than floats; this will avoid the precision problems that have plagued users of different MySQL versions.

Storing pre-built objects means that it will take longer to load the database, but less time to retrieve objects, because all the Bio::SeqFeature object creation is done up front. It also means that there are fewer objects in the database, because a gene, its transcripts, and all its exons are stored as a single object rather than as multiple objects that need to be aggregated together at fetch time.

Here are the benchmarking results:

DATA SET: 2,849 genes (along with associated data) from C. elegans chromosome I

LOAD TESTS:
  Bio::DB::GFF  (bp_bulk_load_gff.pl):   54.58s, 13M database
  Bio::DB::GFF3 (perl DBI loading):     245.06s, 11M database

RETRIEVE TESTS: (fetch 1000 random genes)
  Bio::DB::GFF:   16.81s
  Bio::DB::GFF3:   1.99s

So there's about an 8x speedup in retrieval, but a 4x slowdown in loading, which is pretty much what I expected. Unexpectedly, the storage size for the data is actually smaller for the Bio::DB::GFF3 database than for Bio::DB::GFF.
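As a rough illustration of the serialize-at-load idea described above, here is a minimal sketch using only core Perl. It is not the Bio::DB::GFF3 code: the nested hash is a hypothetical stand-in for a Bio::SeqFeatureI object, and the database side (the BLOB column and the query tables) is left out entirely. It only shows that one freeze() at load time and one thaw() at fetch time bring back a gene together with all of its subparts.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in for a gene feature with nested subparts;
# the real module would hold a Bio::SeqFeatureI object instead.
my $gene = {
    name  => 'gene0001',
    type  => 'gene',
    start => 1000,
    end   => 5000,
    parts => [
        {   type  => 'mRNA',
            start => 1000,
            end   => 5000,
            parts => [
                { type => 'exon', start => 1000, end => 1500 },
                { type => 'exon', start => 4000, end => 5000 },
            ],
        },
    ],
};

# Load time: serialize once. $blob is the byte string that would be
# written to the BLOB column.
my $blob = freeze($gene);

# Fetch time: a single thaw() rebuilds the gene and all its subparts,
# so no aggregation of separate rows is needed.
my $copy  = thaw($blob);
my @exons = grep { $_->{type} eq 'exon' } @{ $copy->{parts}[0]{parts} };

printf "%s: %d exons restored\n", $copy->{name}, scalar @exons;
```

The trade-off matches the numbers above: the object is built and frozen once per feature at load time, which costs more up front, and every later fetch pays only for a thaw().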
This looks pretty good to me. My plan now is to experiment with a variation of the scheme in which each subfeature is stored as a separate BioPerl object and then loaded in a lazy fashion as needed. This will mean that there will be as many as three database fetches to get a full gene, but it also allows one to ignore genes and just do queries for exons, UTRs, etc. Things that have split locations -- such as alignments -- will continue to be stored as a single object, however. Right now I'm still adjusting the names of the various modules so they are in my private CVS. I'll move everything to bioperl-live as soon as the names stabilize. Lincoln -- Lincoln Stein lstein at cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From osborne1 at optonline.net Thu Mar 9 21:38:46 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 09 Mar 2006 21:38:46 -0500 Subject: [Bioperl-l] WGS, WGS_SCAFLD support added for GenBank files In-Reply-To: <000301c643bd$193f0c00$15327e82@pyrimidine> Message-ID: Chris, Below... On 3/9/06 4:04 PM, "Chris Fields" wrote: >> I think it's reasonable to use eutils in this way, yes. It's no longer >> "pure >> Bioperl" but all of this stuff is depending on eutils anyway. The downside >> is that their API may change but it looked like you wrote some tests for >> this, yes? Just my opinion. > > I'll get some tests up and running for check and for using the 'gbwithparts' > format. I actually found that using format=>'fasta' also gives the > NCBI-built contig and, since it required less memory overhead for object > creation, used that in &postprocess_data. > >> I believe the lack of filling Ns is a bug on Bioperl's part due to the >> inability of the Bio::Location code to understand NCBI's gaps(). If there >> are Ns in the sequence we shouldn't just be deleting them, that's not >> good. 
>> >> Brian O > > There are a number of serious problems with bioperl's joining as well, > something I've just noticed when directly comparing output from NCBI. It > cuts off one base from the end of each joined sequence, and some of the > joins aren't correct (normal when they should be revcomp). Basically any > fix is now redundant in the light of using NCBI's contig building but I > would still like to know what the problem is with bioperl's version. Did > something change recently with these records to break this? I'll check > things over and try to get this fix committed ASAP. > > Here's a few chunks of fasta data, first one from NCBI eutils contig build > and second from Bioperl's postprocess_data (prior to my changes); after that > is the start of the CONTIG line from the master file. I snipped out the 5' > end and started close to where the gaps (N's) are and added a couple > and arrows where the gaps should be in the second bioperl formatted > sequence. In the bioperl-formatted contig the end of each joined sequence > is missing one base ('T' in the first, 'C' in the second). The third > sequence should be the complement of the sequence in the contig, but isn't. > Could you try this out and see if you get the same thing? I added the bit > of code that I used to fetch the contig from postprocess_data. > >> CH398085 Oryza sativa (indica cultivar-group) chromosome 1 scaffold000005 > genomic scaffold, whole genome shotgun sequence (from NCBI) > .... 
> TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT > TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC > TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA > GGATTAAGCTCAGGCCTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN > NNNNNNNNNCTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCCCCTTCAGTA > AGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAGTTGAGCGCC > TGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATTGTCAGGCCT > TAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCTTTTATATCA > TGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAGCGTTCGGGA > AAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAATGACATATC > CTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGGGGTGGAAAA > ACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACCCGAGATGCA > TAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGTGCCCATGGA > GATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTGCAAATTGTG > GCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGTACAGAGCCA > GAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACCCCGGTCCCT > GAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGACTCCAAGTCT > AAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTAAGGAACGTG > CCAAACTCAGAGATGATGACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN > NNNNNNNNNNNNNTACTTGTTGCAATAATCTTGCTCCGGAGTAAGTGGTTATAGGATGCA > AGTACAATAACTAGTTGTAGACAAAGTCAATGACGATACGGAGAAGAATAAGCGCAATGT > > > >> CH398085 Oryza sativa (indica cultivar-group) chromosome 1 scaffold000005 > genomic scaffold, whole genome shotgun sequence (bioperl's version) > .... 
> TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT > TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC > TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA > GGATTAAGCTCAGGCCTC <----no gap, missing base > > CTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCC > CCTTCAGTAAGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAG > TTGAGCGCCTGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATT > GTCAGGCCTTAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCT > TTTATATCATGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAG > CGTTCGGGAAAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAA > TGACATATCCTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGG > GGTGGAAAAACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACC > CGAGATGCATAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGT > GCCCATGGAGATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTG > CAAATTGTGGCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGT > ACAGAGCCAGAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACC > CCGGTCCCTGAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGAC > TCCAAGTCTAAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTA > AGGAACGTGCCAAACTCAGAGATGATGACCC <---- no gap, missing base > > GATGGTGGGTTAGCCTGCCTAGCTAGTTC <---- should be revcomp > GAAGCGGCACTCCTTTTAATTATTTGATATTAGATCATTTTTTAATATTTGTGTTTTTAC > AAGTACCGCGAGGTACAACCTCATGGACAGGAACAACGCTTTTTTGCAACATATATTTTA > TACGAAATCTATGCTTTCTGTAAAGTTAAAGCACACTAAATCTAAAGCTTAATATACAAC > CATGCCACATCATCACCCACTAGCAATAATTATATATTTAATCTCATACAAGCATACAAA Here's the sequence from NCBI: 1621 ttaggtggtt ttataacttt agactttggg aattttcata tcacctggac actatggaat 1681 tgttggatga tggtggaatt ggacatacac ctctcttcct ctttcaaaac ccctaaaacc 1741 tgttttcggt ggggtttggg tgcatgccag ttgtgggaag tagcaccccg ggcactataa 1801 ggattaagct caggcctct [gap 50 bp] Expand Ns 1870 c tgagtactgt ggttgtactc attcttgctc aatcttttcc cccttcagta 1921 agagaagatt tggagaagaa gtcttaggtg gagtcctggc ttatacccca gttgagcgcc 1981 tgtgaagatg gagccgtagg cccgctagtc cgctgctgtt tatttttgat tgtcaggcct 2041 taagtgcctt tgtaataatg 
taaatattat cgatataata aagatgtgtc ttttatatca 2101 tgtttgtgtg gtgtaccccg gcttttcctg ggacggggat taatacacta gcgttcggga 2161 aaaggcaatt ttcccggtcg cgacagaact tgtaattctc tagcactaga atgacatatc 2221 ctttggattg tgcaccaatg ccacgcgaaa acccatggtg ccaaaactag gggtggaaaa 2281 acctccgaga cctcctccga agaggcaggt gacaggtaag gcggaggaac ccgagatgca 2341 taaggaaaat ccagtgccgg aagtgccacc ggagattgca gtgccggagg tgcccatgga 2401 gattgtagtg ccgttgtccc aatggagatt acagtggcag aaccagaggt gcaaattgtg 2461 gcatcagtcg ggacatatat agaagaagta gtacgattgg aatgggacgg tacagagcca 2521 gaaatatttg aagacccttc tcctgcgaaa gaccccgagg tgcaagaaac cccggtccct 2581 gagaaggcca ctgacaattc taaggtgcct aaagtgctta tgagccacga ctccaagtct 2641 aaagatgaga acaatgagaa gttcatgggc taaccatctt cagagggggt aaggaacgtg 2701 ccaaactcag agatgatgac ccc [gap 50 bp] Expand Ns 2774 tacttgt tgcaataatc ttgctccgga gtaagtggtt ataggatgca 2821 agtacaataa ctagttgtag acaaagtcaa tgacgatacg gagaagaata agcgcaatgt 2881 cagaccagct tgttataatc cagtaacagt aagtaaactc cgtaccgttc gtttttttca 2941 ttcattttaa ttattgtccg ttgcaggctt gcagcagtca catgagtgcg tataaatgca 3001 ccgatttcaa gcccggtgct attaatcaat agattcttct tcactgtggt tcgacaaaca 3061 atgaaactag tataactata gtataactag gtgattcctc acgctttccc gtgctttgtt 3121 gtaaaattta ctaagaaatt ctcaatatgt tttttttaca atcaaactag gattacgaag It agrees with the 1st sequence, not the second sequence. Brian O. > > CONTIG > join(AAAA02001496.1:1..1819,gap(50),AAAA02001497.1:1..854,gap(50), > complement(AAAA02001498.1:1..870),gap(50),AAAA02001499.1:1..945, > gap(50),AAAA02001500.1:1..11304,gap(100), > > > This is what I changed in postprocess_data:. > > # transform links to appropriate descriptions > if ($data =~ /\nCONTIG\s+/) { > $self->warn("CONTIG found. Retrieving contig sequence.". 
> "\nUse format type 'gbwithparts' or 'fasta' with > contigs."); > # get accession from LOCUS > $data =~ /^LOCUS\s+(\S+)/; > my $acc = $1; > my $stream = Bio::DB::GenBank->new(-format => 'fasta'); > my $seq = $stream->get_Seq_by_acc($acc); > my $contig = $seq->seq; > # remove everything after and including CONTIG > $data =~ s/(CONTIG[\s\S]+)$//i; > # Bio::SeqIO::genbank will fix this line, > # fills in the actual numbers > $data .= "BASE COUNT 0 a 0 c 0 g 0 t \n"; > $data .= "ORIGIN \n"; > # Bio::SeqIO::genbank also formats this data correctly > $data .= "$contig\n//"; > } > > >> On 3/9/06 1:08 PM, "Chris Fields" wrote: >> >>> Added WGS and WGS_SCAFLD support to Bio::SeqIO::genbank as well as tests >> and >>> WGS sample file; the previous fix missed the WGS_SCAFLD line. I will >> also >>> soon add support to Bio::DB::GenBank for downloading WGS and WGS_SCAFLD >>> subfiles. >>> >>> Brian, I found a pretty decent speed improvement for contig building in >>> Bio::DB::NCBIHelper; it basically fetches the contig whole from NCBI >> using >>> return type of 'gbwithparts' so the work is done on their end and just >>> switches the CONTIG line with the sequence; it took about 10 seconds vs. >> ~50 >>> seconds using an unmodified NCBIHelper on my PC. I haven't committed it >> yet >>> bc I noticed the resulting contig files differ; the bioperl contig build >>> lacks any N's from the 'gaps()' in the CONTIG line while NCBI's version >> has >>> the N filler. I didn't know if the difference was a bug or not. Should >> I >>> go ahead and commit? >>> >>> Christopher Fields >>> Postdoctoral Researcher - Switzer Lab >>> Dept. of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry
> University of Illinois Urbana-Champaign
>

From hlapp at gmx.net Thu Mar 9 13:54:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 9 Mar 2006 10:54:36 -0800
Subject: [Bioperl-l] EMBL/genbank organism parsing
In-Reply-To: <441038A3.4030802@imperial.ac.uk>
References: <441038A3.4030802@imperial.ac.uk>
Message-ID: <57ed4482770fa0603bfab3627e123c12@gmx.net>

Yeah, the species parsing has bothered us for a long time.

My thoughts on this - I don't think tweaking individual parsers until they behave as desired on a then-current set of examples is going to put an end to this.

Either species parsing will have to be moved into its own set of 'drivers' with a fronting factory, like Bio::SpeciesIO or Bio::TaxonIO, or alternatively like Bio::Factory::TaxonFactoryI and Bio::Factory::EMBLTaxonFactory etc (similar in concept to Bio::Factory::LocationFactoryI and Bio::Factory::FTLocationFactory).

Or, quite radical in approach, we require the NCBI taxonomy database (or any other implementation of Bio::DB::Taxonomy, e.g. could be through BioSQL or what not) and otherwise disclaim responsibility for correctly parsing the species.

Even though a TaxonIO or TaxonFactory approach looks like the 'right' way to do it in terms of software design principles, I can't help but wonder why we really should spend much time on writing species line parsers when NCBI has done the job for us already to put all species into a compact (file-)database. If people really want to be 100% sure the parser gets the species right, why not download the NCBI taxonomy database, index it locally, and simply look up by taxonID (which is in the Organism line in EMBL and the feature table in GenBank). Although - there could be a speed issue due to the recursive lookup - one would probably want to cache each successful species resolution.

Sorry for not giving precise direction - ideally someone (you?) can take charge and spearhead overhauling this.
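The caching idea mentioned above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the %node table is a toy stand-in for a locally indexed taxonomy database (the IDs and parent links are made up, not real NCBI taxon IDs), and lineage() is a hypothetical helper, not a Bio::DB::Taxonomy method. The point is only that each taxon ID is resolved recursively once and answered from a cache afterwards.

```perl
use strict;
use warnings;

# Toy parent table standing in for a locally indexed taxonomy database.
# IDs and links are illustrative only, not real NCBI taxon IDs.
my %node = (
    1 => { name => 'root',         parent => undef },
    2 => { name => 'Hominidae',    parent => 1 },
    3 => { name => 'Homo',         parent => 2 },
    4 => { name => 'Homo sapiens', parent => 3 },
);

my %lineage_cache;    # taxon ID => array ref of names, root first
my $lookups = 0;      # counts how often the "database" is actually hit

# Hypothetical helper: resolve the full lineage for a taxon ID,
# walking up the parent links and caching every successful resolution.
sub lineage {
    my ($id) = @_;
    return $lineage_cache{$id} if exists $lineage_cache{$id};
    my $rec = $node{$id} or return;    # unknown ID: nothing to cache
    $lookups++;
    my $parent = defined $rec->{parent}
        ? ( lineage( $rec->{parent} ) || [] )
        : [];
    return $lineage_cache{$id} = [ @$parent, $rec->{name} ];
}

print join( '; ', @{ lineage(4) } ), "\n";
lineage(4);    # second call is answered from the cache
print "database lookups: $lookups\n";
```

Because every ancestor is cached along the way, a later lookup for a sibling species only has to resolve its own node before it reaches an already cached parent.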
-hilmar On Mar 9, 2006, at 6:16 AM, James Abbott wrote: > Hi Folks, > > The current parsing of OS lines by Bio::SeqIO::embl.pm fails with many > of the organisms currently found in the database, since the OS lines > differ considerably from the specification in the EMBL User Manual, > which appears to have been used as the basis for the current parser. In > an attempt to improve matters, I have collected a set of examples which > hopefully cover the majority of the different ways of writing an > organism name, and managed to get embl.pm to 'correctly' parse these > (correctly being open to debate with some of the more esoteric > examples). I'm sure there are plenty of entries which still don't parse > correctly, but it's a start. I'll post the patches to bugzilla once I > get a few loose ends tidied up. > > In the interests of consistency, I have also obtained the same set of > sequences from Genbank, and am trying to make both parsers behave the > same way, however they currently behave in different ways with respect > to parsing the common name. According to the EMBL spec, the common name > is the English name for the organism given in brackets after the latin > name, consequently calling the common_name method on an embl.pm parsed > Bio::Species object returns 'human' for a Homo sapiens (human). The > genbank parser, however, currently takes the entire SOURCE line, > including the latin name, consequently calling the common_name method > on > a genbank.pm parsed species object returns 'Homo sapiens (human)'. This > would appear to be the intended behavior, since this is considered the > correct response by the tests. > > Is it considered better to maintain consistency between the EMBL and > Genbank parsers and risk breaking any code which relies upon the > current > behavior of genbank->species->common_name(), or to have the two parsers > behaving differently, but consistently with their existing behavior? > > Cheers, > James > > -- > Dr. 
James Abbott > Bioinformatics Software Developer, Bioinformatics Support Service > Imperial College, London > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cjfields at uiuc.edu Fri Mar 10 08:48:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 10 Mar 2006 07:48:40 -0600 Subject: [Bioperl-l] WGS, WGS_SCAFLD support added for GenBank files In-Reply-To: References: Message-ID: <317E7325-7CDC-48AD-A3E2-10144A0B7ACA@uiuc.edu> The second was built using bioperl, so postprocess_data isn't working as expected. I committed a change to NCBIHelper in CVS yesterday to fix this by retrieving the sequence directly from NCBI using format 'fasta.' Chris On Mar 9, 2006, at 8:38 PM, Brian Osborne wrote: > Chris, > > Below... > .... >> >>> CH398085 Oryza sativa (indica cultivar-group) chromosome 1 >>> scaffold000005 >> genomic scaffold, whole genome shotgun sequence (from NCBI) >> .... 
>> TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT >> TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC >> TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA >> GGATTAAGCTCAGGCCTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN >> NNNNNNNNNCTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCCCCTTCAGTA >> AGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAGTTGAGCGCC >> TGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATTGTCAGGCCT >> TAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCTTTTATATCA >> TGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAGCGTTCGGGA >> AAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAATGACATATC >> CTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGGGGTGGAAAA >> ACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACCCGAGATGCA >> TAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGTGCCCATGGA >> GATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTGCAAATTGTG >> GCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGTACAGAGCCA >> GAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACCCCGGTCCCT >> GAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGACTCCAAGTCT >> AAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTAAGGAACGTG >> CCAAACTCAGAGATGATGACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN >> NNNNNNNNNNNNNTACTTGTTGCAATAATCTTGCTCCGGAGTAAGTGGTTATAGGATGCA >> AGTACAATAACTAGTTGTAGACAAAGTCAATGACGATACGGAGAAGAATAAGCGCAATGT >> >> >> >>> CH398085 Oryza sativa (indica cultivar-group) chromosome 1 >>> scaffold000005 >> genomic scaffold, whole genome shotgun sequence (bioperl's version) >> .... 
>> TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT >> TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC >> TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA >> GGATTAAGCTCAGGCCTC <----no gap, missing base >> >> CTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCC >> CCTTCAGTAAGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAG >> TTGAGCGCCTGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATT >> GTCAGGCCTTAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCT >> TTTATATCATGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAG >> CGTTCGGGAAAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAA >> TGACATATCCTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGG >> GGTGGAAAAACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACC >> CGAGATGCATAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGT >> GCCCATGGAGATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTG >> CAAATTGTGGCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGT >> ACAGAGCCAGAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACC >> CCGGTCCCTGAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGAC >> TCCAAGTCTAAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTA >> AGGAACGTGCCAAACTCAGAGATGATGACCC <---- no gap, missing base >> >> GATGGTGGGTTAGCCTGCCTAGCTAGTTC <---- should be revcomp >> GAAGCGGCACTCCTTTTAATTATTTGATATTAGATCATTTTTTAATATTTGTGTTTTTAC >> AAGTACCGCGAGGTACAACCTCATGGACAGGAACAACGCTTTTTTGCAACATATATTTTA >> TACGAAATCTATGCTTTCTGTAAAGTTAAAGCACACTAAATCTAAAGCTTAATATACAAC >> CATGCCACATCATCACCCACTAGCAATAATTATATATTTAATCTCATACAAGCATACAAA > > Here's the sequence from NCBI: > > 1621 ttaggtggtt ttataacttt agactttggg aattttcata tcacctggac > actatggaat > 1681 tgttggatga tggtggaatt ggacatacac ctctcttcct ctttcaaaac > ccctaaaacc > 1741 tgttttcggt ggggtttggg tgcatgccag ttgtgggaag tagcaccccg > ggcactataa > 1801 ggattaagct caggcctct > [gap 50 bp] Expand Ns > 1870 c tgagtactgt ggttgtactc attcttgctc aatcttttcc > cccttcagta > 1921 agagaagatt tggagaagaa gtcttaggtg gagtcctggc ttatacccca > gttgagcgcc > 1981 tgtgaagatg gagccgtagg cccgctagtc 
cgctgctgtt tatttttgat > tgtcaggcct > 2041 taagtgcctt tgtaataatg taaatattat cgatataata aagatgtgtc > ttttatatca > 2101 tgtttgtgtg gtgtaccccg gcttttcctg ggacggggat taatacacta > gcgttcggga > 2161 aaaggcaatt ttcccggtcg cgacagaact tgtaattctc tagcactaga > atgacatatc > 2221 ctttggattg tgcaccaatg ccacgcgaaa acccatggtg ccaaaactag > gggtggaaaa > 2281 acctccgaga cctcctccga agaggcaggt gacaggtaag gcggaggaac > ccgagatgca > 2341 taaggaaaat ccagtgccgg aagtgccacc ggagattgca gtgccggagg > tgcccatgga > 2401 gattgtagtg ccgttgtccc aatggagatt acagtggcag aaccagaggt > gcaaattgtg > 2461 gcatcagtcg ggacatatat agaagaagta gtacgattgg aatgggacgg > tacagagcca > 2521 gaaatatttg aagacccttc tcctgcgaaa gaccccgagg tgcaagaaac > cccggtccct > 2581 gagaaggcca ctgacaattc taaggtgcct aaagtgctta tgagccacga > ctccaagtct > 2641 aaagatgaga acaatgagaa gttcatgggc taaccatctt cagagggggt > aaggaacgtg > 2701 ccaaactcag agatgatgac ccc > [gap 50 bp] Expand Ns > 2774 tacttgt tgcaataatc ttgctccgga gtaagtggtt > ataggatgca > 2821 agtacaataa ctagttgtag acaaagtcaa tgacgatacg gagaagaata > agcgcaatgt > 2881 cagaccagct tgttataatc cagtaacagt aagtaaactc cgtaccgttc > gtttttttca > 2941 ttcattttaa ttattgtccg ttgcaggctt gcagcagtca catgagtgcg > tataaatgca > 3001 ccgatttcaa gcccggtgct attaatcaat agattcttct tcactgtggt > tcgacaaaca > 3061 atgaaactag tataactata gtataactag gtgattcctc acgctttccc > gtgctttgtt > 3121 gtaaaattta ctaagaaatt ctcaatatgt tttttttaca atcaaactag > gattacgaag > > It agrees with the 1st sequence, not the second sequence. > > Brian O. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sanjib at bic.boseinst.ernet.in Fri Mar 10 03:15:41 2006 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Fri, 10 Mar 2006 13:45:41 +0530 Subject: [Bioperl-l] help on blastcl3 Message-ID: <20060310081541.M82964@bic.boseinst.ernet.in> Hi I am very new using blastcl3. 
When I use

./blastcl3 -p blastn -d nr -i nuc -o out.blast

I get the following result in out.blast:

BLASTN 2.2.13 [Nov-27-2005]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|145773|gb|K01298.1|ECODNAK Escherichia coli heat shock protein 70 precursor (dnaK) gene, complete cds
         (1917 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           1,047,083 sequences; -311,112,946 total letters

Searching... please wait.. done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

THEN THE RESULTS.

Why is the total letters value negative (1,047,083 sequences; -311,112,946 total letters)? The negative value makes it very hard to parse the BLAST output using bioperl. Moreover, when I submitted the query directly on your webpage I got 3,778,900 sequences; 16,763,624,885 total letters. Why is there this difference? Do we miss out sequences when we run blastcl3? What has to be done so that the negative value does not appear in the BLAST output?

Thanking you

--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone : +91-33-2334 6626, 2816, 2358 4766
Fax : +91-33-2334 3886

From cjfields at uiuc.edu Fri Mar 10 11:17:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Mar 2006 10:17:14 -0600
Subject: [Bioperl-l] help on blastcl3
In-Reply-To: <20060310081541.M82964@bic.boseinst.ernet.in>
Message-ID: <000601c6445e$18084e90$15327e82@pyrimidine>

This isn't relevant to bioperl. Try NCBI's BLAST help email:

blast-help at ncbi.nlm.nih.gov

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sanjib Kumar Gupta > Sent: Friday, March 10, 2006 2:16 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] help on blastcl3 > > > Hi > > I am very new using blastcl3. > When I use > ./blastcl3 -p blastn -d nr -i nuc -o out.blast > > I getting the following result in out.blast > > BLASTN 2.2.13 [Nov-27-2005] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. > > Query= gi|145773|gb|K01298.1|ECODNAK Escherichia coli heat shock > protein 70 precursor (dnaK) gene, complete cds > (1917 letters) > > Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, > GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) > 1,047,083 sequences; -311,112,946 total letters > > Searching... please wait.. done > > Score E > Sequences producing significant alignments: (bits) > Value > THEN THE RESULTS. > > Why is the value negitive in total letters (1,047,083 sequences; - > 311,112,946 > total letters)? It is very hard to parse the blastoutput using bioperl for > the > negetive value. Morever when i submitted the query directly on your > webpage I > get 3,778,900 sequences; 16,763,624,885 total letters. > Why is this difference do we missout sequence when we run blastcl3? What > has > to be done so that negetive doesnot come on the blastoutput. 
>
> Thanking you
>
> --
> Sanjib Kumar Gupta
> Bioinformatics Centre
> Bose Institute
> Kolkata 700054, INDIA
> Phone : +91-33-2334 6626, 2816, 2358 4766
> Fax : +91-33-2334 3886
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From MEC at stowers-institute.org Fri Mar 10 13:17:14 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 10 Mar 2006 12:17:14 -0600
Subject: [Bioperl-l] decoding strand of hit in e-PCR output using Bio::Tools::EPCR
Message-ID:

I just committed changes to bioperl-live to support decoding the strand of the hit (as generated when invoked using the -direct option).

Jason, for coverage by 2 new tests in ./t/ePCR.t, I hand-edited the test data file ./t/data/genomic-seq.epcr to mock up the output as if it had been called using -direct. I'm not sure how this file was generated, so I didn't regenerate it myself. OK?

Malcolm Cook - mec at stowers-institute.org - 816-926-4449
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, MO USA

From MEC at stowers-institute.org Fri Mar 10 14:06:57 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 10 Mar 2006 13:06:57 -0600
Subject: [Bioperl-l] where to document dependency? AND new SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene
Message-ID:

H'lo

I just committed SeqIO modules for (reading) these two sequence formats.

Bio::SeqIO::strider uses Convert::Binary::C (to decode the binary header). Where should I document this new dependency?

Thanks,

Malcolm Cook - mec at stowers-institute.org - 816-926-4449
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, MO USA

From osborne1 at optonline.net Fri Mar 10 14:19:24 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 10 Mar 2006 14:19:24 -0500
Subject: [Bioperl-l] where to document dependency?
AND new SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene
In-Reply-To:
Message-ID:

Malcolm,

In Makefile.PL and INSTALL. Also, please add these 2 new formats to http://www.bioperl.org/wiki/HOWTO:SeqIO.

Thank you for the additions.

Brian O.


On 3/10/06 2:06 PM, "Cook, Malcolm" wrote:

> H'lo
>
> I just committed SeqIO modules for (reading) these two sequence formats.
>
> Bio::SeqIO::strider uses Convert::Binary::C (to decode the binary
> header). Where should I document this new dependency?
>
> Thanks,
>
> Malcolm Cook - mec at stowers-institute.org - 816-926-4449
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, MO USA
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From lstein at cshl.edu Fri Mar 10 14:38:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 10 Mar 2006 14:38:47 -0500
Subject: [Bioperl-l] not getting all exons back when using Bio::DB::GFF
In-Reply-To: <44104880.3060205@dfci.harvard.edu>
References: <44104880.3060205@dfci.harvard.edu>
Message-ID: <200603101438.49137.lstein@cshl.edu>

Hi Niels,

I just tried WS150 myself and I'm getting five exons when I run your example. Are you using bioperl 1.51 or an earlier version?

Lincoln

On Thursday 09 March 2006 10:23, Niels Klitgord wrote:
> Hello,
> Perhaps this is because I'm not using Bio::DB::GFF correctly. Perhaps
> I missed a previous post on this, if so I apologize. But on occasion
> when I try to get all the exons of an orf from a segment I wind up
> missing 1. I am using wormbase WS150 release GFF, and am using
> GFF.pm,v 1.102 perl module (maybe I need to upgrade?).
> This is the code I was using:
>
> #!/usr/local/bin/perl -w
> use strict;
> use Bio::DB::GFF;
>
> my $GFFdb = new Bio::DB::GFF(-adaptor=>'dbi::mysqlopt',
>                              -dsn=>'dbi:mysql:gff150;host=dome',
>                              -user=>'niels')or die("can't open gffDB");
>
> my $gene = 'C08C3.3';
>
> my @seg = $GFFdb->segment(-name=>$gene, -class=>'CDS');
>
> print "$gene CDS: ", $seg[0]->abs_start, "\t", $seg[0]->abs_stop, "\n";
>
> my @all_exons = $seg[0]->features( 'exon:curated' );
> foreach my $k (sort { $a->start <=> $b->start} @all_exons) {
>     print "feature: ", $k->class, "\t", $k->type, "\t", $k->name, "\t",
>         $k->abs_start, "\t", $k->abs_stop, "\n"
> }
>
> and get:
> C08C3.3 CDS: 7783311 7777192
> feature: CDS exon:curated C08C3.3 7782898 7782816
> feature: CDS exon:curated C08C3.3 7782130 7782027
> feature: CDS exon:curated C08C3.3 7778462 7778314
> feature: CDS exon:curated C08C3.3 7777314 7777192
>
> However in the raw gff file we see (and also in the mysql ):
> CHROMOSOME_III curated exon 7777192 7777314 . - . CDS "C08C3.3"
> CHROMOSOME_III curated exon 7778314 7778462 . - . CDS "C08C3.3"
> CHROMOSOME_III curated exon 7782027 7782130 . - . CDS "C08C3.3"
> CHROMOSOME_III curated exon 7782816 7782898 . - . CDS "C08C3.3"
>
> CHROMOSOME_III curated exon 7783168 7783311 . - . CDS "C08C3.3"
>
> Am I just using this wrong, or should the last entry be returned as well?
> Much thanks in advance,
> Niels
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu (516 367-5008)

From cjfields at uiuc.edu Fri Mar 10 14:44:49 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Mar 2006 13:44:49 -0600
Subject: [Bioperl-l] where to document dependency?
AND new SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene In-Reply-To: Message-ID: <000001c6447b$17b28740$15327e82@pyrimidine> Probably should add any dependencies to the wiki as well: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Brian Osborne > Sent: Friday, March 10, 2006 1:19 PM > To: Cook, Malcolm; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] where to document dependency? AND new SeqIO > formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene > > Malcolm, > > In Makefile.PL and INSTALL. Also, please add these 2 new formats to > http://www.bioperl.org/wiki/HOWTO:SeqIO. > > Thank you for the additions. > > Brian O. > > > On 3/10/06 2:06 PM, "Cook, Malcolm" wrote: > > > H'lo > > > > I just committed SeqIO modules for (reading) these two sequence formats. > > > > Bio::SeqIO::strider uses Convert::Binary::C (to decode the minary > > header). Where should I document this new dependency? 
> >
> > Thanks,
> >
> > Malcolm Cook - mec at stowers-institute.org - 816-926-4449
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, MO USA
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason.stajich at duke.edu Fri Mar 10 14:49:05 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Mar 2006 14:49:05 -0500
Subject: [Bioperl-l] EMBL/genbank organism parsing
In-Reply-To: <441038A3.4030802@imperial.ac.uk>
References: <441038A3.4030802@imperial.ac.uk>
Message-ID: <8D390CCB-A1C0-42F4-8A75-B31815F19DE9@duke.edu>

James -

Wonderful, thanks for stepping in.

One thing: this may be a good time to note that species data can be better represented in the taxonomy objects, so we could ditch Bio::Species and move to Bio::Taxonomy::Node (a sexy name, I know). There is a little about this on the wiki in the project priority list http://bioperl.org/wiki/Project_priority_list - I *think* the fields in the Taxonomy::Node object should be sufficient to separate out the field you are talking about.

As to whether or not to break common_name behavior, I don't have any opinion right now, but perhaps those who use this data from a file can speak better to it.

I encourage you to add some text on the wiki pages about whatever you plan so that we can document what has happened - feel free to just create a new page for this project and it can be linked in appropriately.
-jason On Mar 9, 2006, at 9:16 AM, James Abbott wrote: > Hi Folks, > > The current parsing of OS lines by Bio::SeqIO::embl.pm fails with many > of the organisms currently found in the database, since the OS lines > differ considerably from the specification in the EMBL User Manual, > which appears to have been used as the basis for the current > parser. In > an attempt to improve matters, I have collected a set of examples > which > hopefully cover the majority of the different ways of writing an > organism name, and managed to get embl.pm to 'correctly' parse these > (correctly being open to debate with some of the more esoteric > examples). I'm sure there are plenty of entries which still don't > parse > correctly, but it's a start. I'll post the patches to bugzilla once I > get a few loose ends tidied up. > > In the interests of consistency, I have also obtained the same set of > sequences from Genbank, and am trying to make both parsers behave the > same way, however they currently behave in different ways with respect > to parsing the common name. According to the EMBL spec, the common > name > is the English name for the organism given in brackets after the latin > name, consequently calling the common_name method on an embl.pm parsed > Bio::Species object returns 'human' for a Homo sapiens (human). The > genbank parser, however, currently takes the entire SOURCE line, > including the latin name, consequently calling the common_name > method on > a genbank.pm parsed species object returns 'Homo sapiens (human)'. > This > would appear to be the intended behavior, since this is considered the > correct response by the tests. > > Is it considered better to maintain consistency between the EMBL and > Genbank parsers and risk breaking any code which relies upon the > current > behavior of genbank->species->common_name(), or to have the two > parsers > behaving differently, but consistently with their existing behavior? > > Cheers, > James > > -- > Dr. 
James Abbott
> Bioinformatics Software Developer, Bioinformatics Support Service
> Imperial College, London
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From hubert.prielinger at gmx.at Fri Mar 10 14:40:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 10 Mar 2006 13:40:11 -0600
Subject: [Bioperl-l] parsing blast results : how to get the length of the proteinseq
Message-ID: <4411D61B.2090706@gmx.at>

hi,
in a blast result file the length of the entire protein is written. How is it possible to parse it, given that $hsp->length is the length of the matched part?

regards
Hubert

From jason.stajich at duke.edu Fri Mar 10 15:49:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Mar 2006 15:49:29 -0500
Subject: [Bioperl-l] parsing blast results : how to get the length of the proteinseq
In-Reply-To: <4411D61B.2090706@gmx.at>
References: <4411D61B.2090706@gmx.at>
Message-ID:

$hit->length

See the SearchIO HOWTO; it tells you all of that.

-jason

On Mar 10, 2006, at 2:40 PM, Hubert Prielinger wrote:

> hi,
> in a blast result file there is written the length of the entire
> protein, how is it possible to parse it, because $hsp->length is the
> length of the matched part
>
> regards
> Hubert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From MEC at stowers-institute.org Fri Mar 10 17:38:28 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 10 Mar 2006 16:38:28 -0600
Subject: [Bioperl-l] where to document dependency? AND new SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene
Message-ID:

Getting closer...

So, I added it to the ./Makefile.PL and ./INSTALL with the cvs comment 'added dependency on Convert::Binary::C needed by Bio::SeqIO::strider'

But, re the wiki, it looks to me like the contents of the wiki page are (nearly) identical to the ./INSTALL. Is one autogenerated from the other, or do they both get edited?

Also, the only place I can think to add the dependency in the wiki content is to the list of modules installed by Bundle::CPAN. Am I missing something, or should I be considering adding Convert::Binary::C to Bundle::CPAN as well?

Thanks,

Malcolm

>-----Original Message-----
>From: Chris Fields [mailto:cjfields at uiuc.edu]
>Sent: Friday, March 10, 2006 1:45 PM
>To: 'Brian Osborne'; Cook, Malcolm; bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] where to document dependency? AND new
>SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene
>
>Probably should add any dependencies to the wiki as well:
>
>http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Friday, March 10, 2006 1:19 PM
>> To: Cook, Malcolm; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] where to document dependency? AND new SeqIO
>> formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene
>>
>> Malcolm,
>>
>> In Makefile.PL and INSTALL. Also, please add these 2 new formats to
>> http://www.bioperl.org/wiki/HOWTO:SeqIO.
>>
>> Thank you for the additions.
>>
>> Brian O.
>>
>>
>> On 3/10/06 2:06 PM, "Cook, Malcolm"
> wrote:
>>
>> > H'lo
>> >
>> > I just committed SeqIO modules for (reading) these two
>sequence formats.
>> >
>> > Bio::SeqIO::strider uses Convert::Binary::C (to decode the binary
>> > header). Where should I document this new dependency?
>> > >> > Thanks, >> > >> > Malcolm Cook - mec at stowers-institute.org - 816-926-4449 >> > Database Applications Manager - Bioinformatics >> > Stowers Institute for Medical Research - Kansas City, MO USA >> > >> > >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Fri Mar 10 17:59:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 10 Mar 2006 16:59:23 -0600 Subject: [Bioperl-l] where to document dependency? AND new SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene In-Reply-To: Message-ID: <000001c64496$45e81a60$15327e82@pyrimidine> > -----Original Message----- > From: Cook, Malcolm [mailto:MEC at stowers-institute.org] > Sent: Friday, March 10, 2006 4:38 PM > To: Chris Fields; Brian Osborne; bioperl-l at lists.open-bio.org > Subject: RE: [Bioperl-l] where to document dependency? AND new SeqIO > formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene > > Getting closer... > > So, I added it to the ./Makefile.PL and ./INSTALL with the cvs comment > 'added dependency on Convert::Binary::C needed by Bio::SeqIO::strider' > > But, re the wiki, it looks to me like the contents of the wiki page are > (nearly) identical to the ./INSTALL. Is one autogenerated from the > other, or do the both get editted? No, at least not at the moment. I suppose we could get it into POD and use pod2wiki. > Also, the only place I can think to add the dependency in the wiki > content is to the list of modules installed by Bundle::CPAN. Am I > missing something, or should I be considering adding Convert::Binary::C > to Bundle::CPAN as well? That's the place, though I think you mean Bundle::Bioperl. 
I'm not sure what you should do about including it with Bundle::Bioperl. Looks like Chris Dagdigian is the maintainer for that; his email listed on CPAN is dag at sonsorol.org, though I wouldn't be surprised if it's out of date.

>
> Thanks,
>
> Malcolm
>

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From mblanche at berkeley.edu Fri Mar 10 20:05:22 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Fri, 10 Mar 2006 17:05:22 -0800
Subject: [Bioperl-l] Multiple gene segment problem
Message-ID:

Dear all--

One more unusual behavior from the latest GadFly gff3 database loaded into mySQL...

I had been following Lincoln Stein's advice on how to populate a Bio::DB::GFF mySQL database with GadFly. Everything seemed to work nicely, but while calling for gene segments based on CG ids, I hit a CG with more than one assignment, as in

#!/usr/bin/perl
use strict;
use Bio::DB::GFF;

my $agg1 = Bio::DB::GFF::Aggregator->new(
    -method    => 'pre_mRNA',
    -sub_parts => ['exon','five_prime_UTR','three_prime_UTR'],
);

my $dmdb = Bio::DB::GFF->new(
    -adaptor    => 'dbi::mysql',
    -dsn        => 'dbi:mysql:database=dmel_421;host=riolab.net',
    -user       => 'guest',
    -aggregators=> [$agg1],
);

my @hits = ('CG2086', 'CG17894', 'CG32912');

for my $gene (@hits){
    my $tg = $dmdb->segment(-name => $gene);
    print "$gene is ", $tg->length, " nt long\n";
}

When I tried to get the segment for gene CG32912, I get the following:

------------- EXCEPTION -------------
MSG: multiple segment exception
STACK Bio::DB::GFF::_multiple_return_args /Library/Perl/5.8.6/Bio/DB/GFF.pm:953
STACK Bio::DB::GFF::segment /Library/Perl/5.8.6/Bio/DB/GFF.pm:938
STACK toplevel test.pl:19
--------------------------------------

Any clue??? Any fix???
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C.
Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From neetisomaiya at gmail.com Sat Mar 11 01:47:44 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Sat, 11 Mar 2006 12:17:44 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast
In-Reply-To: <000301c63886$fa95eb20$15327e82@pyrimidine>
References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine>
Message-ID: <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com>

Hi,
I am running standalone blast and I want to use a particular e value, gap open and extension cost, and matrix.
I have tried every way I could think of:

1)
@params = ('program' => 'blastn', 'database' => 'human.rna.fna',
           _READMETHOD => "Blast", 'e' => '0.0001', 'Matrix' => 'BLOSUM80' );
my $factory = new Bio::Tools::Run::StandAloneBlast(@params);

2)
@params = ('program' => 'blastn', 'database' => 'human.rna.fna',
           _READMETHOD => "Blast", 'e' => '0.0001', 'M' => 'BLOSUM80' );
my $factory = new Bio::Tools::Run::StandAloneBlast(@params);

3)
my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn',
                                                    'database' => 'human.rna.fna',
                                                    _READMETHOD => "Blast"
                                                    );
$factory->e(0.0001);
$factory->G(-11);
$factory->E(-1);
$factory->M('BLOSUM80');

4)
my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn',
                                                    'database' => 'human.rna.fna',
                                                    _READMETHOD => "Blast"
                                                    );
$factory->e(0.0001);
$factory->MATRIX('BLOSUM80');
$factory->GAP(-11);
$factory->EXTENSION(-1);

But still, the blast results show just the e value change, and no changes from the defaults in the matrix used or in the gap opening and extension penalties.

What should I do?

Please help.


On 2/23/06, Chris Fields wrote:
>
> Have you tried this to see if it works? The blast report itself should
> tell
> you if everything is set correctly.
Use 'perldoc > Bio::Tools::Run::StandAlone::Blast', which explains everything. I don't > know if the example script works but the test script StandAloneBlast.t (in > /t) should; that will give you plenty of examples for setting parameters. > > And please, don't spam the bioperl-l list with repeated emails (four at > last > count over 2 1/2 hours). > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of neeti somaiya > > Sent: Thursday, February 23, 2006 4:13 AM > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] urgent help required - syntax for using > > paramatersdifferent from default in standalone blast > > > > Hi, > > > > I am running standalone blast and I wanna use a particular e value, gap > > open > > and extension cost and matrix. Is the following the correct syntax for > the > > same : > > > > my $Seq_in = Bio::SeqIO->new (-file => > > $file, -format => 'fasta'); > > my $query = $Seq_in->next_seq(); > > my $factory = > > Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn', > > 'database' => ' > > human.rna.fna', > > _READMETHOD => "Blast" > > ); > > $factory->e(0.0001); > > $factory->G(-11); > > $factory->E(-1); > > $factory->M('BLOSUM80'); > > > > my $blast_report = > > $factory->blastall($query); > > my $result = $blast_report->next_result; > > > > -- > > -Neeti > > Even my blood says, B positive > > > > -- > > -Neeti > > Even my blood says, B positive > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- -Neeti Even my blood says, B positive From osborne1 at optonline.net Sat Mar 11 11:02:24 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sat, 11 Mar 2006 11:02:24 -0500 
Subject: [Bioperl-l] _READMETHOD default
Message-ID:

bioperl-l,

I don't pay much attention to StandAloneBlast.pm but I was surprised to see this in the latest version:

$DEFAULTREADMETHOD = 'BLAST';

Shouldn't that be 'SearchIO'?

Brian O.

From cjfields at uiuc.edu Sat Mar 11 14:11:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 11 Mar 2006 13:11:31 -0600
Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast
In-Reply-To: <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com>
References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine> <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com>
Message-ID: <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu>

What version of bioperl are you using? What OS? Always a good idea to give these details. Although I believe there is a major reworking of the Bio::Tools::Run BLAST modules planned I'm pretty sure that everything still works, at least in the latest developer version. Lots of people are running it so I'm guessing something is wrong with the logic here otherwise we would have heard about this a while ago.

I'm not an expert about this module either but I believe Brian's right about _READMETHOD. When using SearchIO directly this is set using "-readmethod => 'SearchIO'" for text, 'blastxml' for xml, and 'blasttable' for tabular. According to the POD for StandAloneBlast SearchIO::blast is default; have you tried not using that flag (removing it)? Does the matrix you're using exist in the /data directory?

Chris

On Mar 11, 2006, at 12:47 AM, neeti somaiya wrote:

> Hi,
> I am running standalone blast and I wanna use a particular e value,
> gap open
> and extension cost and matrix.
> I have tried using all possible ways: > > 1) > @params = ('program' => 'blastn','database' => 'human.rna.fna', > _READMETHOD > => "Blast",'e' => '0.0001', 'Matrix' => 'BLOSUM80' ); > my $factory = new Bio::Tools::Run::StandAloneBlast(@params); > > 2) > @params = ('program' => 'blastn','database' => 'human.rna.fna', > _READMETHOD > => "Blast",'e' => '0.0001', 'M' => 'BLOSUM80' ); > my $factory = new Bio::Tools::Run::StandAloneBlast(@params); > > 3) > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastn', > 'database' => ' > human.rna.fna', > _READMETHOD => > "Blast" > ); > $factory->e(0.0001); > $factory->G(-11); > $factory->E(-1); > $factory->M('BLOSUM80'); > > 4) > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastn', > 'database' => ' > human.rna.fna', > _READMETHOD => > "Blast" > ); > $factory->e(0.0001); > $factory->MATRIX('BLOSUM80'); > $factory->GAP(-11); > $factory->EXTENSION(-1); > > But, still, the blast results show just the e value change, but no > changes > from default in the matrix used or gap opening and extension > penalities. > > What should I do? > > Please help. > > > On 2/23/06, Chris Fields wrote: >> >> Have you tried this to see if it works? The blast report itself >> should >> tell >> you if everything is set correctly. Use 'perldoc >> Bio::Tools::Run::StandAlone::Blast', which explains everything. I >> don't >> know if the example script works but the test script >> StandAloneBlast.t (in >> /t) should; that will give you plenty of examples for setting >> parameters. >> >> And please, don't spam the bioperl-l list with repeated emails >> (four at >> last >> count over 2 1/2 hours). >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry >> University of Illinois Urbana-Champaign >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of neeti somaiya >>> Sent: Thursday, February 23, 2006 4:13 AM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] urgent help required - syntax for using >>> paramatersdifferent from default in standalone blast >>> >>> Hi, >>> >>> I am running standalone blast and I wanna use a particular e >>> value, gap >>> open >>> and extension cost and matrix. Is the following the correct >>> syntax for >> the >>> same : >>> >>> my $Seq_in = Bio::SeqIO->new (- >>> file => >>> $file, -format => 'fasta'); >>> my $query = $Seq_in->next_seq(); >>> my $factory = >>> Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn', >>> 'database' => ' >>> human.rna.fna', >>> _READMETHOD => >>> "Blast" >>> ); >>> $factory->e(0.0001); >>> $factory->G(-11); >>> $factory->E(-1); >>> $factory->M('BLOSUM80'); >>> >>> my $blast_report = >>> $factory->blastall($query); >>> my $result = $blast_report- >>> >next_result; >>> >>> -- >>> -Neeti >>> Even my blood says, B positive >>> >>> -- >>> -Neeti >>> Even my blood says, B positive >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > -Neeti > Even my blood says, B positive > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From rvosa at sfu.ca Sat Mar 11 16:12:12 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sat, 11 Mar 2006 13:12:12 -0800
Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast
In-Reply-To: <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu>
References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine> <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com> <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu>
Message-ID: <44133D2B.1080803@sfu.ca>

As a general rule I recommend not modifying/invoking things that start with an underscore. I.e. don't touch _READMETHOD directly, but use -readmethod => 'something' in the constructor, or the $ob->readmethod('something') mutator.

And I agree about the spamming. There's a lot of people on this list who will help you if they can - but they're all volunteers who I'm sure have their own 'urgent' matters to attend to.

Chris Fields wrote:

>What version of bioperl are you using? What OS? Always a good idea
>to give these details. Although I believe there is a major reworking
>of the Bio::Tools::Run BLAST modules planned I'm pretty sure that
>everything still works, at least in the latest developer version.
>Lots of people are running it so I'm guessing something is wrong with
>the logic here otherwise we would have heard about this a while ago.
>
>I'm not an expert about this module either but I believe Brian's
>right about _READMETHOD. When using SearchIO directly this is set
>using "-readmethod => 'SearchIO'" for text, 'blastxml' for xml, and
>'blasttable' for tabular. According to the POD for StandAloneBlast
>SearchIO::blast is default; have you tried not using that flag
>(removing it)? Does the matrix you're using exist in the /data
>directory?
> >Chris > >On Mar 11, 2006, at 12:47 AM, neeti somaiya wrote: > > > >>Hi, >>I am running standalone blast and I wanna use a particular e value, >>gap open >>and extension cost and matrix. >>I have tried using all possible ways: >> >>1) >>@params = ('program' => 'blastn','database' => 'human.rna.fna', >>_READMETHOD >>=> "Blast",'e' => '0.0001', 'Matrix' => 'BLOSUM80' ); >>my $factory = new Bio::Tools::Run::StandAloneBlast(@params); >> >>2) >>@params = ('program' => 'blastn','database' => 'human.rna.fna', >>_READMETHOD >>=> "Blast",'e' => '0.0001', 'M' => 'BLOSUM80' ); >>my $factory = new Bio::Tools::Run::StandAloneBlast(@params); >> >>3) >>my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => >>'blastn', >> 'database' => ' >>human.rna.fna', >> _READMETHOD => >>"Blast" >> ); >>$factory->e(0.0001); >>$factory->G(-11); >>$factory->E(-1); >>$factory->M('BLOSUM80'); >> >>4) >>my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => >>'blastn', >> 'database' => ' >>human.rna.fna', >> _READMETHOD => >>"Blast" >> ); >>$factory->e(0.0001); >>$factory->MATRIX('BLOSUM80'); >>$factory->GAP(-11); >>$factory->EXTENSION(-1); >> >>But, still, the blast results show just the e value change, but no >>changes >>from default in the matrix used or gap opening and extension >>penalities. >> >>What should I do? >> >>Please help. >> >> >>On 2/23/06, Chris Fields wrote: >> >> >>>Have you tried this to see if it works? The blast report itself >>>should >>>tell >>>you if everything is set correctly. Use 'perldoc >>>Bio::Tools::Run::StandAlone::Blast', which explains everything. I >>>don't >>>know if the example script works but the test script >>>StandAloneBlast.t (in >>>/t) should; that will give you plenty of examples for setting >>>parameters. >>> >>>And please, don't spam the bioperl-l list with repeated emails >>>(four at >>>last >>>count over 2 1/2 hours). >>> >>>Christopher Fields >>>Postdoctoral Researcher - Switzer Lab >>>Dept. 
of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>>>-----Original Message----- >>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>bounces at lists.open-bio.org] On Behalf Of neeti somaiya >>>>Sent: Thursday, February 23, 2006 4:13 AM >>>>To: bioperl-l at lists.open-bio.org >>>>Subject: [Bioperl-l] urgent help required - syntax for using >>>>paramatersdifferent from default in standalone blast >>>> >>>>Hi, >>>> >>>>I am running standalone blast and I wanna use a particular e >>>>value, gap >>>>open >>>>and extension cost and matrix. Is the following the correct >>>>syntax for >>>> >>>> >>>the >>> >>> >>>>same : >>>> >>>> my $Seq_in = Bio::SeqIO->new (- >>>>file => >>>>$file, -format => 'fasta'); >>>> my $query = $Seq_in->next_seq(); >>>> my $factory = >>>>Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn', >>>> 'database' => ' >>>>human.rna.fna', >>>> _READMETHOD => >>>>"Blast" >>>> ); >>>> $factory->e(0.0001); >>>> $factory->G(-11); >>>> $factory->E(-1); >>>> $factory->M('BLOSUM80'); >>>> >>>> my $blast_report = >>>>$factory->blastall($query); >>>> my $result = $blast_report- >>>> >>>> >>>>>next_result; >>>>> >>>>> >>>>-- >>>>-Neeti >>>>Even my blood says, B positive >>>> >>>>-- >>>>-Neeti >>>>Even my blood says, B positive >>>> >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l at lists.open-bio.org >>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >>-- >>-Neeti >>Even my blood says, B positive >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > >Christopher Fields >Postdoctoral Researcher >Lab of Dr. 
Robert Switzer
>Dept of Biochemistry
>University of Illinois Urbana-Champaign

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From jason.stajich at duke.edu Sat Mar 11 17:07:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 11 Mar 2006 17:07:00 -0500
Subject: [Bioperl-l] _READMETHOD default
In-Reply-To:
References:
Message-ID:

It's a synonym.

if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i ) {
    $blast_obj = Bio::SearchIO->new(-file=> $outfile, .... );
}

On Mar 11, 2006, at 11:02 AM, Brian Osborne wrote:

> bioperl-l,
>
> I don't pay much attention to StandAloneBlast.pm but I was
> surprised to see
> this in the latest version:
>
> $DEFAULTREADMETHOD = 'BLAST';
> Shouldn't that be 'SearchIO'?
>
> Brian O.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From torsten.seemann at infotech.monash.edu.au Sat Mar 11 17:02:52 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 12 Mar 2006 09:02:52 +1100
Subject: [Bioperl-l] _READMETHOD default
In-Reply-To:
References:
Message-ID: <4413490C.8040506@infotech.monash.edu.au>

> I don't pay much attention to StandAloneBlast.pm but I was surprised to see
> this in the latest version:
> $DEFAULTREADMETHOD = 'BLAST';
> Shouldn't that be 'SearchIO'?

My understanding of StandAloneBlast.pm is that

  _READMETHOD =~ /BLAST/i    # uses SearchIO
  _READMETHOD =~ /BPLite/i   # uses BPlite / BPpsilite
  _READMETHOD =~ ????????    # prints warning, then returns undef

i.e. "blastxml" is NOT explicitly handled, and therefore a warning will be
printed and an undef blast report returned.

Considering the recent discussion about switching RemoteBlast to 'blastxml',
this should probably be changed. I am very busy this week (still recovering
from primary server death) but will get to it soon. (Brian, I'm happy for you
to patch it in, if you are willing; it's line 829.)

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/

From torsten.seemann at infotech.monash.edu.au Sat Mar 11 17:17:51 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 12 Mar 2006 09:17:51 +1100
Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast
In-Reply-To: <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu>
References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine> <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com> <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu>
Message-ID: <44134C8F.3040303@infotech.monash.edu.au>

Chris Fields wrote:
> What version of bioperl are you using? What OS? Always a good idea
> to give these details.

Neeti, please let us also know the version of "blastall" you are running,
e.g. NCBI 2.2.13?

> Although I believe there is a major reworking
> of the Bio::Tools::Run BLAST modules planned

Yes there is, but Roger Hall fell ill during our initial discussions so we are
a little behind on that one.

> I'm pretty sure that
> everything still works, at least in the latest developer version.
> Lots of people are running it so I'm guessing something is wrong with
> the logic here otherwise we would have heard about this a while ago.

Agreed - at least in the bioperl-live version.

> I'm not an expert about this module either but I believe Brian's
> right about _READMETHOD. When using SearchIO directly this is set
> using "-readmethod => 'SearchIO'" for text, 'blastxml' for xml, and
> 'blasttable' for tabular.

Actually, 'blastxml' is NOT supported in bioperl-live version, and it doesn't
appear to have ever been supported! It won't be hard to patch in.

Also, the current StandAloneBlast does not handle these newer "blastall"
parameters (need to add to @BLASTALL_PARAMS)

  -R  PSI-TBLASTN checkpoint file [File In] Optional
  -n  MegaBlast search [T/F]
  -L  Location on query sequence [String] Optional
  -A  Multiple Hits window size, default if zero (blastn/megablast 0, all
      others 40 [Integer]
  -w  Frame shift penalty (OOF algorithm for blastx) [Integer]
  -t  Length of the largest intron allowed in a translated nucleotide sequence
      when linking multiple distinct alignments. (0 invokes default behavior; a
      negative value disables linking.) [Integer]
  -B  Number of concatenated queries, for blastn and tblastn [Integer] Optional
  -V  Force use of old engine [T/F] Optional
  -C  Use composition-based statistics for tblastn:
  -s  Compute locally optimal Smith-Waterman alignments (This option is only

> According to the POD for StandAloneBlast
> SearchIO::blast is default; have you tried not using that flag
> (removing it)?

Yes, _READMETHOD='BLAST' (which uses Bio::SearchIO for parsing) is the default,
and does not need to be set in Neeti's examples.

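The gap described above can be checked mechanically. What follows is only a
minimal sketch, not code taken from StandAloneBlast.pm: it compares the flags
listed above against @BLASTALL_PARAMS as Chris Fields quotes it later in this
thread. The helper name unsupported_flags is hypothetical, and the quoted list
may well differ between BioPerl versions (it already contains L, for example).

```perl
use strict;
use warnings;

# Parameter letters as quoted from StandAloneBlast.pm later in this thread;
# other BioPerl versions may carry a different list (assumption).
my @BLASTALL_PARAMS = qw( p d i e m o F G E X I q r v b f g Q
                          D a O J M W z K L Y S T l U y Z );

# Hypothetical helper: return the flags that are absent from the list.
# Matching is case-sensitive, because -b and -B are different blastall flags.
sub unsupported_flags {
    my ($known, @flags) = @_;
    my %is_known = map { $_ => 1 } @$known;
    return grep { !$is_known{$_} } @flags;
}

# The newer blastall flags listed in the message above.
my @newer   = qw( R n L A w t B V C s );
my @missing = unsupported_flags(\@BLASTALL_PARAMS, @newer);
print "Not in \@BLASTALL_PARAMS: @missing\n";
```

Keeping the comparison case-sensitive matters here, since several of the newer
flags differ from existing ones only by case.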
-- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ Phone: +61 3 9905 9010 From cjfields at uiuc.edu Sat Mar 11 20:26:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 11 Mar 2006 19:26:14 -0600 Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast In-Reply-To: <44134C8F.3040303@infotech.monash.edu.au> References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine> <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com> <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu> <44134C8F.3040303@infotech.monash.edu.au> Message-ID: <0213A9E1-0488-4192-ABE3-8B56AAD31D95@uiuc.edu> Ah. I thought that RemoteBlast and StandAloneBlast had similar settings. My bad. Would be nice to maybe move some of the common features to another module that both can inherit from (like parsing/ saving output, map parameters for each like SearchIO, etc). Chris On Mar 11, 2006, at 4:17 PM, Torsten Seemann wrote: >> Although I believe there is a major reworking >> of the Bio::Tools::Run BLAST modules planned > > Yes there is, but Roger Hall fell ill during our initial > discussions so we are > a little behind on that one. > > I'm pretty sure that >> everything still works, at least in the latest developer version. >> Lots of people are running it so I'm guessing something is wrong with >> the logic here otherwise we would have heard about this a while ago. > > Agreed - at least in the bioperl-live version. > >> I'm not an expert about this module either but I believe Brian's >> right about _READMETHOD. When using SearchIO directly this is set >> using "-readmethod => 'SearchIO'" for text, 'blastxml' for xml, and >> 'blasttable' for tabular. > > Actually, 'blastxml' is NOT supported in bioperl-live version, and > it doesn't > appear to have ever been supported! It won't be hard to patch in. 
> > Also, the current StandAloneBlast does not handle these newer > "blastall" > parameters (need to add to @BLASTALL_PARAMS) > > -R PSI-TBLASTN checkpoint file [File In] Optional > -n MegaBlast search [T/F] > -L Location on query sequence [String] Optional > -A Multiple Hits window size, default if zero (blastn/megablast > 0, all > others 40 [Integer] > -w Frame shift penalty (OOF algorithm for blastx) [Integer] > -t Length of the largest intron allowed in a translated > nucleotide sequence > when linking multiple distinct alignments. (0 invokes default > behavior; a > negative value disables linking.) [Integer] > -B Number of concatenated queries, for blastn and tblastn > [Integer] Optional > -V Force use of old engine [T/F] Optional > -C Use composition-based statistics for tblastn: > -s Compute locally optimal Smith-Waterman alignments (This > option is only > > According to the POD for StandAloneBlast >> SearchIO::blast is default; have you tried not using that flag >> (removing it)? > > Yes, _READMETHOD='BLAST' (which uses Bio::SearchIO for parsing) is > the default, > and does not need to be set in Neeti's examples. > > -- > Torsten Seemann > Victorian Bioinformatics Consortium, Monash University, Australia > http://www.vicbioinformatics.com/ > Phone: +61 3 9905 9010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Sun Mar 12 11:27:31 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Sun, 12 Mar 2006 16:27:31 +0000 Subject: [Bioperl-l] Multiple gene segment problem In-Reply-To: References: Message-ID: <200603121627.31934.lstein@cshl.edu> There may be several genes with the same name. Call segment() in a list context to get them all. 
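A minimal sketch of what "list context" means for such a call. The segment()
sub below is a hypothetical stub, not Bio::DB::GFF itself; it only mimics the
behaviour reported in this thread (an exception in scalar context when a name
matches more than one segment). Against a real database the call would be
$dmdb->segment(-name => $gene), assigned to an array instead of a scalar.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database in which one name has two segments.
my %segments_by_name = (
    CG2086  => [ 'segment_A' ],
    CG32912 => [ 'segment_B', 'segment_C' ],
);

# Stub mimicking the reported behaviour: all matches in list context,
# an exception in scalar context if the name is ambiguous.
sub segment {
    my (%args) = @_;
    my @found = @{ $segments_by_name{ $args{-name} } || [] };
    return @found if wantarray;
    die "multiple segment exception\n" if @found > 1;
    return $found[0];
}

for my $gene (qw(CG2086 CG32912)) {
    # Assigning to an array puts the call in list context.
    my @segs = segment(-name => $gene);
    printf "%s matches %d segment(s)\n", $gene, scalar @segs;
}
```

Each element of the returned list can then be handled in turn, for instance
by looping over it and reporting the length of every segment.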
Lincoln

On Saturday 11 March 2006 01:05, Marco Blanchette wrote:
> Dear all--
>
> One more unusual behavior from the latest GadFly gff3 database loaded into
> mySQL... I had been following Lincoln Stein advice as to how populate a
> Bio::DB::GFF mySQL database with GadFly. Everything seemed to work nicely
> but somehow, while calling for gene segment based on CG ids, I hit a CG
> with more than one assignment as in
>
> #!/usr/bin/perl
>
> use strict;
> use Bio::DB::GFF;
>
> my $agg1 = Bio::DB::GFF::Aggregator->new(
>     -method    => 'pre_mRNA',
>     -sub_parts => ['exon','five_prime_UTR','three_prime_UTR'],
> );
>
> my $dmdb = Bio::DB::GFF->new(
>     -adaptor     => 'dbi::mysql',
>     -dsn         => 'dbi:mysql:database=dmel_421;host=riolab.net',
>     -user        => 'guest',
>     -aggregators => [$agg1],
> );
>
> my @hits = ('CG2086', 'CG17894', 'CG32912');
>
> for my $gene (@hits){
>     my $tg = $dmdb->segment(-name => $gene);
>     print "$gene is ", $tg->length, " nt long\n";
> }
>
> When I tried to get the segment for gene CG32912, I get the following:
>
> ------------- EXCEPTION -------------
> MSG: multiple segment exception
> STACK Bio::DB::GFF::_multiple_return_args
> /Library/Perl/5.8.6/Bio/DB/GFF.pm:953
> STACK Bio::DB::GFF::segment /Library/Perl/5.8.6/Bio/DB/GFF.pm:938
> STACK toplevel test.pl:19
>
> --------------------------------------
>
> Any clue??? Any fix???
>
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

--
Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu (516 367-5008) From neetisomaiya at gmail.com Mon Mar 13 02:39:07 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 13 Mar 2006 13:09:07 +0530 Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast In-Reply-To: <0213A9E1-0488-4192-ABE3-8B56AAD31D95@uiuc.edu> References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine> <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com> <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu> <44134C8F.3040303@infotech.monash.edu.au> <0213A9E1-0488-4192-ABE3-8B56AAD31D95@uiuc.edu> Message-ID: <764978cf0603122339x194abfacvf7e6d4d1ece13393@mail.gmail.com> Hi, I am using blast 2.2.13 on a linux machine. I tried this: @params = ('program' => 'blastn','database' => 'human.rna.fna','e' => '0.0001', 'Matrix' => 'BLOSUM80', '_READMETHOD' => 'SearchIO' ); But, this din't help. I checked the data folder, it has BLOSUM80. Even a simple blastall command like blastall -i tryn.fasta -p blastn -d human.rna.fna -o blastoutput -M BLOSUM80 doesnt show BLOSUM80 in the results. What should I do? Any other approach to change the parameters? Why is it that e value is taken, but not any other parameter? On 3/12/06, Chris Fields wrote: > > Ah. I thought that RemoteBlast and StandAloneBlast had similar > settings. My bad. Would be nice to maybe move some of the common > features to another module that both can inherit from (like parsing/ > saving output, map parameters for each like SearchIO, etc). 
> > Chris > > On Mar 11, 2006, at 4:17 PM, Torsten Seemann wrote: > > >> Although I believe there is a major reworking > >> of the Bio::Tools::Run BLAST modules planned > > > > Yes there is, but Roger Hall fell ill during our initial > > discussions so we are > > a little behind on that one. > > > > I'm pretty sure that > >> everything still works, at least in the latest developer version. > >> Lots of people are running it so I'm guessing something is wrong with > >> the logic here otherwise we would have heard about this a while ago. > > > > Agreed - at least in the bioperl-live version. > > > >> I'm not an expert about this module either but I believe Brian's > >> right about _READMETHOD. When using SearchIO directly this is set > >> using "-readmethod => 'SearchIO'" for text, 'blastxml' for xml, and > >> 'blasttable' for tabular. > > > > Actually, 'blastxml' is NOT supported in bioperl-live version, and > > it doesn't > > appear to have ever been supported! It won't be hard to patch in. > > > > Also, the current StandAloneBlast does not handle these newer > > "blastall" > > parameters (need to add to @BLASTALL_PARAMS) > > > > -R PSI-TBLASTN checkpoint file [File In] Optional > > -n MegaBlast search [T/F] > > -L Location on query sequence [String] Optional > > -A Multiple Hits window size, default if zero (blastn/megablast > > 0, all > > others 40 [Integer] > > -w Frame shift penalty (OOF algorithm for blastx) [Integer] > > -t Length of the largest intron allowed in a translated > > nucleotide sequence > > when linking multiple distinct alignments. (0 invokes default > > behavior; a > > negative value disables linking.) 
[Integer] > > -B Number of concatenated queries, for blastn and tblastn > > [Integer] Optional > > -V Force use of old engine [T/F] Optional > > -C Use composition-based statistics for tblastn: > > -s Compute locally optimal Smith-Waterman alignments (This > > option is only > > > > According to the POD for StandAloneBlast > >> SearchIO::blast is default; have you tried not using that flag > >> (removing it)? > > > > Yes, _READMETHOD='BLAST' (which uses Bio::SearchIO for parsing) is > > the default, > > and does not need to be set in Neeti's examples. > > > > -- > > Torsten Seemann > > Victorian Bioinformatics Consortium, Monash University, Australia > > http://www.vicbioinformatics.com/ > > Phone: +61 3 9905 9010 > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- -Neeti Even my blood says, B positive From d.gatherer at vir.gla.ac.uk Mon Mar 13 04:36:13 2006 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Mon, 13 Mar 2006 09:36:13 +0000 Subject: [Bioperl-l] translating a GenBank file Message-ID: <6.2.3.4.1.20060313091842.02af08a8@lenzie.gla.ac.uk> Dear BioPerlers I have a general strategy question for the following situation. I want to take GenBank files of viral genomes (~100-200kb only), and produce a translation around the sequence in a format like: TAAACCTGTCTTTCAGACCTTGTTGGACATCCCGTACAATCAAGATGTTCCTGTATGTTG S R C S C M L TTTGCAGTCTGGCGGTTTGCTTTCGAGGACTATTAAGCCTTTCTCTGCAATCGTCTCCAA F A V W R F A F E D Y M A F L C N R L Q ATCTCTGCCCTGGAGTGATTTCAACGCCTTACACGTTGACCTGTCCGTCTAATACATCCT I S A L E M where the translation is above the DNA for forward strand and below for complementary strand ORFs. 
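The layout described above can be sketched for the forward strand as follows.
This is only an illustration under stated assumptions: the sub names are
hypothetical, the codon table is hand-rolled to keep the sketch free of
dependencies (BioPerl sequence objects provide a translate method that would
normally be used instead), and complementary-strand ORFs, which would need a
reverse complement first, are not handled.

```perl
use strict;
use warnings;

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
my $BASES = 'TCAG';
my $AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Translate one codon; anything other than A/C/G/T gives 'X'.
sub translate_codon {
    my ($codon) = @_;
    my $index = 0;
    for my $base (split //, uc $codon) {
        my $i = index($BASES, $base);
        return 'X' if $i < 0;
        $index = $index * 4 + $i;
    }
    return substr($AMINO, $index, 1);
}

# Translate a forward-strand stretch, reading whole codons from its start.
sub translate_dna {
    my ($dna) = @_;
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length $dna; $pos += 3) {
        $protein .= translate_codon(substr($dna, $pos, 3));
    }
    return $protein;
}

# Build a line of residues, each placed at the first base of its codon.
# $start is the 0-based offset of the ORF within $dna.
sub residue_line {
    my ($dna, $start) = @_;
    my $line    = ' ' x length $dna;
    my $protein = translate_dna(substr($dna, $start));
    for my $k (0 .. length($protein) - 1) {
        substr($line, $start + 3 * $k, 1) = substr($protein, $k, 1);
    }
    return $line;
}

# First sequence line from the example above; the ORF starts at offset 39.
my $dna = 'TAAACCTGTCTTTCAGACCTTGTTGGACATCCCGTACAATCAAGATGTTCCTGTATGTTG';
print "$dna\n", residue_line($dna, 39), "\n";
```

Printing the residue line before or after the DNA line is then just a matter
of ordering, which is where the forward/complementary distinction would be
applied.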
I initially attempted this using EMBOSS, where there are a couple of utilities called "showseq" and "prettyseq" that will take a range of start and stop points and produce a translation of the type above. However, it turns out that they are not quite up to the job for translating whole genomes because showseq throws an exception when the ORFs are overlapping (a deliberate feature), and both showseq and prettyseq seem to have trouble with a combination of forward and reverse translations on the same sequence (not officially confirmed as a bug yet, but certainly not a feature). So, before I start trying to hack EMBOSS, is there a better way to do it in BioPerl? It occurs to me that the above format is not a "standard", although it is seen quite commonly in publications etc, which may be the major difficulty. All suggestions gratefully appreciated Derek _________________________ Derek Gatherer Ph.D. Cert.Ed. Computer Officer Institute of Virology Church Street Glasgow G11 5JR Tel: 0141-330-6268 Fax: 0141-337-2236 From cjfields at uiuc.edu Mon Mar 13 08:51:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 13 Mar 2006 07:51:30 -0600 Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast In-Reply-To: <764978cf0603122339x194abfacvf7e6d4d1ece13393@mail.gmail.com> References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine> <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com> <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu> <44134C8F.3040303@infotech.monash.edu.au> <0213A9E1-0488-4192-ABE3-8B56AAD31D95@uiuc.edu> <764978cf0603122339x194abfacvf7e6d4d1ece13393@mail.gmail.com> Message-ID: <960536D8-CBEF-4178-A743-601C5FE1C34F@uiuc.edu> On Mar 13, 2006, at 1:39 AM, neeti somaiya wrote: > Hi, > > I am using blast 2.2.13 on a linux machine. 
> > I tried this: > > @params = ('program' => 'blastn','database' => 'human.rna.fna','e' => > '0.0001', 'Matrix' => 'BLOSUM80', '_READMETHOD' => 'SearchIO' ); > > But, this din't help. > > I checked the data folder, it has BLOSUM80. > > Even a simple blastall command like > blastall -i tryn.fasta -p blastn -d human.rna.fna -o blastoutput -M > BLOSUM80 > doesnt show BLOSUM80 in the results. This should tell you something here and is likely the source of the problem. Do you have the .ncbirc file set properly? From BLAST install: 2) Create a .ncbirc file. In order for Standalone BLAST to operate, you have will need to have a .ncbirc file that contains the following lines: [NCBI] Data="path/data/" Where "path/data/" is the path to the location of the Standalone BLAST "data" subdirectory. For Example: Data=/root/blast/data The data subdirectory should automatically appear in the directory where the downloaded file was extracted. Please note that in many cases it may be necessary to delimit the entire path including the machine name and or the net work you are located on. Your systems administrator can help you if you do not know the entire path to the data subdirectory. Make sure that your .ncbirc file is either in the directory that you call the Standalone BLAST program from or in your root directory. I believe this can be placed in your home directory though I can't test that out. Are your databases, matrices, etc all in the data directory? > What should I do? > Don't panic! > Any other approach to change the parameters? > Why is it that e value is taken, but not any other parameter? These are the ones I found in StandAloneBlast, just by looking (hint hint): @BLASTALL_PARAMS = qw( p d i e m o F G E X I q r v b f g Q D a O J M W z K L Y S T l U y Z); Chris > > > > On 3/12/06, Chris Fields wrote: >> >> Ah. I thought that RemoteBlast and StandAloneBlast had similar >> settings. My bad. 
Would be nice to maybe move some of the common >> features to another module that both can inherit from (like parsing/ >> saving output, map parameters for each like SearchIO, etc). >> >> Chris >> >> On Mar 11, 2006, at 4:17 PM, Torsten Seemann wrote: >> >>>> Although I believe there is a major reworking >>>> of the Bio::Tools::Run BLAST modules planned >>> >>> Yes there is, but Roger Hall fell ill during our initial >>> discussions so we are >>> a little behind on that one. >>> >>> I'm pretty sure that >>>> everything still works, at least in the latest developer version. >>>> Lots of people are running it so I'm guessing something is wrong >>>> with >>>> the logic here otherwise we would have heard about this a while >>>> ago. >>> >>> Agreed - at least in the bioperl-live version. >>> >>>> I'm not an expert about this module either but I believe Brian's >>>> right about _READMETHOD. When using SearchIO directly this is set >>>> using "-readmethod => 'SearchIO'" for text, 'blastxml' for xml, and >>>> 'blasttable' for tabular. >>> >>> Actually, 'blastxml' is NOT supported in bioperl-live version, and >>> it doesn't >>> appear to have ever been supported! It won't be hard to patch in. >>> >>> Also, the current StandAloneBlast does not handle these newer >>> "blastall" >>> parameters (need to add to @BLASTALL_PARAMS) >>> >>> -R PSI-TBLASTN checkpoint file [File In] Optional >>> -n MegaBlast search [T/F] >>> -L Location on query sequence [String] Optional >>> -A Multiple Hits window size, default if zero (blastn/megablast >>> 0, all >>> others 40 [Integer] >>> -w Frame shift penalty (OOF algorithm for blastx) [Integer] >>> -t Length of the largest intron allowed in a translated >>> nucleotide sequence >>> when linking multiple distinct alignments. (0 invokes default >>> behavior; a >>> negative value disables linking.) 
[Integer] >>> -B Number of concatenated queries, for blastn and tblastn >>> [Integer] Optional >>> -V Force use of old engine [T/F] Optional >>> -C Use composition-based statistics for tblastn: >>> -s Compute locally optimal Smith-Waterman alignments (This >>> option is only >>> >>> According to the POD for StandAloneBlast >>>> SearchIO::blast is default; have you tried not using that flag >>>> (removing it)? >>> >>> Yes, _READMETHOD='BLAST' (which uses Bio::SearchIO for parsing) is >>> the default, >>> and does not need to be set in Neeti's examples. >>> >>> -- >>> Torsten Seemann >>> Victorian Bioinformatics Consortium, Monash University, Australia >>> http://www.vicbioinformatics.com/ >>> Phone: +61 3 9905 9010 >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> > > > -- > -Neeti > Even my blood says, B positive > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From green at eva.mpg.de Mon Mar 13 16:22:26 2006 From: green at eva.mpg.de (Ed Green) Date: Mon, 13 Mar 2006 22:22:26 +0100 Subject: [Bioperl-l] genbank2gff3.pl on new human RefSeq Message-ID: <4415E292.2000101@eva.mpg.de> I am trying to get gff3 format annotation of the new RefSeq human genome build/annotation (v36.1). There are a few genbank contigs that seem to make Bio::SeqFeature::Tools::Unflattener get confused. For example, NT_029998.7 is a problem. I don't know if this warrants a bugzilla report, but it'd be nice to have this work. 
Ed Green From neetisomaiya at gmail.com Tue Mar 14 02:58:55 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 14 Mar 2006 13:28:55 +0530 Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast In-Reply-To: <960536D8-CBEF-4178-A743-601C5FE1C34F@uiuc.edu> References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine> <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com> <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu> <44134C8F.3040303@infotech.monash.edu.au> <0213A9E1-0488-4192-ABE3-8B56AAD31D95@uiuc.edu> <764978cf0603122339x194abfacvf7e6d4d1ece13393@mail.gmail.com> <960536D8-CBEF-4178-A743-601C5FE1C34F@uiuc.edu> Message-ID: <764978cf0603132358t7d12dbf9qc98bc55c1891121f@mail.gmail.com> On 3/13/06, Chris Fields wrote: > > > On Mar 13, 2006, at 1:39 AM, neeti somaiya wrote: > > Hi, > > I am using blast 2.2.13 on a linux machine. > > I tried this: > > @params = ('program' => 'blastn','database' => 'human.rna.fna','e' => > '0.0001', 'Matrix' => 'BLOSUM80', '_READMETHOD' => 'SearchIO' ); > > But, this din't help. > > I checked the data folder, it has BLOSUM80. > > Even a simple blastall command like > blastall -i tryn.fasta -p blastn -d human.rna.fna -o blastoutput -M > BLOSUM80 > doesnt show BLOSUM80 in the results. > > > This should tell you something here and is likely the source of the > problem. Do you have the .ncbirc file set properly? > > From BLAST install: > > 2) Create a .ncbirc file. In order for Standalone BLAST to operate, you > have will need to have a .ncbirc file that contains the following lines: > > [NCBI] > Data="path/data/" > > Where "path/data/" is the path to the location of the Standalone BLAST > "data" subdirectory. For Example: > > Data=/root/blast/data > > The data subdirectory should automatically appear in the directory where > the downloaded file was extracted. 
Please note that in many cases it may > be necessary to delimit the entire path including the machine name and > or the net work you are located on. Your systems administrator can help > you if you do not know the entire path to the data subdirectory. > > Make sure that your .ncbirc file is either in the directory that you > call the Standalone BLAST program from or in your root directory. > > I believe this can be placed in your home directory though I can't test > that out. Are your databases, matrices, etc all in the data directory? > all databases and matrices are there in the data directory. What should I do? > > > Don't panic! > > > Any other approach to change the parameters? > Why is it that e value is taken, but not any other parameter? > > > These are the ones I found in StandAloneBlast, just by looking (hint > hint): > > @BLASTALL_PARAMS = qw( p d i e m o F G E X I q r v b f g Q > D a O J M W z K L Y S T l U y Z); > > Chris > > > > > > On 3/12/06, Chris Fields wrote: > > > Ah. I thought that RemoteBlast and StandAloneBlast had similar > settings. My bad. Would be nice to maybe move some of the common > features to another module that both can inherit from (like parsing/ > saving output, map parameters for each like SearchIO, etc). > > Chris > > On Mar 11, 2006, at 4:17 PM, Torsten Seemann wrote: > > Although I believe there is a major reworking > of the Bio::Tools::Run BLAST modules planned > > > Yes there is, but Roger Hall fell ill during our initial > discussions so we are > a little behind on that one. > > I'm pretty sure that > > everything still works, at least in the latest developer version. > Lots of people are running it so I'm guessing something is wrong with > the logic here otherwise we would have heard about this a while ago. > > > Agreed - at least in the bioperl-live version. > > I'm not an expert about this module either but I believe Brian's > right about _READMETHOD. 
When using SearchIO directly this is set > using "-readmethod => 'SearchIO'" for text, 'blastxml' for xml, and > 'blasttable' for tabular. > > > Actually, 'blastxml' is NOT supported in bioperl-live version, and > it doesn't > appear to have ever been supported! It won't be hard to patch in. > > Also, the current StandAloneBlast does not handle these newer > "blastall" > parameters (need to add to @BLASTALL_PARAMS) > > -R PSI-TBLASTN checkpoint file [File In] Optional > -n MegaBlast search [T/F] > -L Location on query sequence [String] Optional > -A Multiple Hits window size, default if zero (blastn/megablast > 0, all > others 40 [Integer] > -w Frame shift penalty (OOF algorithm for blastx) [Integer] > -t Length of the largest intron allowed in a translated > nucleotide sequence > when linking multiple distinct alignments. (0 invokes default > behavior; a > negative value disables linking.) [Integer] > -B Number of concatenated queries, for blastn and tblastn > [Integer] Optional > -V Force use of old engine [T/F] Optional > -C Use composition-based statistics for tblastn: > -s Compute locally optimal Smith-Waterman alignments (This > option is only > > According to the POD for StandAloneBlast > > SearchIO::blast is default; have you tried not using that flag > (removing it)? > > > Yes, _READMETHOD='BLAST' (which uses Bio::SearchIO for parsing) is > the default, > and does not need to be set in Neeti's examples. > > -- > Torsten Seemann > Victorian Bioinformatics Consortium, Monash University, Australia > http://www.vicbioinformatics.com/ > Phone: +61 3 9905 9010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > > > -- > -Neeti > Even my blood says, B positive > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- -Neeti Even my blood says, B positive From neetisomaiya at gmail.com Tue Mar 14 02:58:14 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 14 Mar 2006 13:28:14 +0530 Subject: [Bioperl-l] urgent help required - syntax for using paramatersdifferent from default in standalone blast In-Reply-To: <960536D8-CBEF-4178-A743-601C5FE1C34F@uiuc.edu> References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com> <000301c63886$fa95eb20$15327e82@pyrimidine> <764978cf0603102247n46f081d0ydcc0d565664ea31b@mail.gmail.com> <6FCF7845-8A4A-43B1-A550-B11F7B8B9F76@uiuc.edu> <44134C8F.3040303@infotech.monash.edu.au> <0213A9E1-0488-4192-ABE3-8B56AAD31D95@uiuc.edu> <764978cf0603122339x194abfacvf7e6d4d1ece13393@mail.gmail.com> <960536D8-CBEF-4178-A743-601C5FE1C34F@uiuc.edu> Message-ID: <764978cf0603132358l70edd382y5383af90f47e1380@mail.gmail.com> On 3/13/06, Chris Fields wrote: > > > On Mar 13, 2006, at 1:39 AM, neeti somaiya wrote: > > Hi, > > I am using blast 2.2.13 on a linux machine. > > I tried this: > > @params = ('program' => 'blastn','database' => 'human.rna.fna','e' => > '0.0001', 'Matrix' => 'BLOSUM80', '_READMETHOD' => 'SearchIO' ); > > But, this din't help. > > I checked the data folder, it has BLOSUM80. > > Even a simple blastall command like > blastall -i tryn.fasta -p blastn -d human.rna.fna -o blastoutput -M > BLOSUM80 > doesnt show BLOSUM80 in the results. > > > This should tell you something here and is likely the source of the > problem. Do you have the .ncbirc file set properly? 
> > From BLAST install: > > 2) Create a .ncbirc file. In order for Standalone BLAST to operate, you > have will need to have a .ncbirc file that contains the following lines: > > [NCBI] > Data="path/data/" > > Where "path/data/" is the path to the location of the Standalone BLAST > "data" subdirectory. For Example: > > Data=/root/blast/data > > The data subdirectory should automatically appear in the directory where > the downloaded file was extracted. Please note that in many cases it may > be necessary to delimit the entire path including the machine name and > or the net work you are located on. Your systems administrator can help > you if you do not know the entire path to the data subdirectory. > > Make sure that your .ncbirc file is either in the directory that you > call the Standalone BLAST program from or in your root directory. > > I believe this can be placed in your home directory though I can't test > that out. Are your databases, matrices, etc all in the data directory? > the .ncbirc file is thr and all is properly set in it. blast runs with the default and with changed e value, only the problem is with changed matrix and gap opening and extension parameters. What should I do? > > > Don't panic! > > > Any other approach to change the parameters? > Why is it that e value is taken, but not any other parameter? > > > These are the ones I found in StandAloneBlast, just by looking (hint > hint): > > @BLASTALL_PARAMS = qw( p d i e m o F G E X I q r v b f g Q > D a O J M W z K L Y S T l U y Z); > > Chris > > > > > > On 3/12/06, Chris Fields wrote: > > > Ah. I thought that RemoteBlast and StandAloneBlast had similar > settings. My bad. Would be nice to maybe move some of the common > features to another module that both can inherit from (like parsing/ > saving output, map parameters for each like SearchIO, etc). 
> > Chris > > On Mar 11, 2006, at 4:17 PM, Torsten Seemann wrote: > > Although I believe there is a major reworking > of the Bio::Tools::Run BLAST modules planned > > > Yes there is, but Roger Hall fell ill during our initial > discussions so we are > a little behind on that one. > > I'm pretty sure that > > everything still works, at least in the latest developer version. > Lots of people are running it so I'm guessing something is wrong with > the logic here otherwise we would have heard about this a while ago. > > > Agreed - at least in the bioperl-live version. > > I'm not an expert about this module either but I believe Brian's > right about _READMETHOD. When using SearchIO directly this is set > using "-readmethod => 'SearchIO'" for text, 'blastxml' for xml, and > 'blasttable' for tabular. > > > Actually, 'blastxml' is NOT supported in bioperl-live version, and > it doesn't > appear to have ever been supported! It won't be hard to patch in. > > Also, the current StandAloneBlast does not handle these newer > "blastall" > parameters (need to add to @BLASTALL_PARAMS) > > -R PSI-TBLASTN checkpoint file [File In] Optional > -n MegaBlast search [T/F] > -L Location on query sequence [String] Optional > -A Multiple Hits window size, default if zero (blastn/megablast > 0, all > others 40 [Integer] > -w Frame shift penalty (OOF algorithm for blastx) [Integer] > -t Length of the largest intron allowed in a translated > nucleotide sequence > when linking multiple distinct alignments. (0 invokes default > behavior; a > negative value disables linking.) [Integer] > -B Number of concatenated queries, for blastn and tblastn > [Integer] Optional > -V Force use of old engine [T/F] Optional > -C Use composition-based statistics for tblastn: > -s Compute locally optimal Smith-Waterman alignments (This > option is only > > According to the POD for StandAloneBlast > > SearchIO::blast is default; have you tried not using that flag > (removing it)? 
> > > Yes, _READMETHOD='BLAST' (which uses Bio::SearchIO for parsing) is > the default, > and does not need to be set in Neeti's examples. > > -- > Torsten Seemann > Victorian Bioinformatics Consortium, Monash University, Australia > http://www.vicbioinformatics.com/ > Phone: +61 3 9905 9010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > > > -- > -Neeti > Even my blood says, B positive > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- -Neeti Even my blood says, B positive From j.abbott at imperial.ac.uk Tue Mar 14 06:28:34 2006 From: j.abbott at imperial.ac.uk (James Abbott) Date: Tue, 14 Mar 2006 11:28:34 +0000 Subject: [Bioperl-l] EMBL/genbank organism parsing Message-ID: <4416A8E2.8090909@imperial.ac.uk> Hi Hilmar/Jason, Thanks for the comments. Please excuse the breach of netiquette by replying to you both in one message, but given the overlaps it's the easiest way.... Jason Stajich wrote: > I *think* the fields in the Taxonomy::Node object should be suffient > to separate out the field you are talking about. I've had a look at Taxonomy::Node, and it looks like it will indeed hold the necessary fields. There are some distinctions below species level such as serovars and pathovars which I though may need special handling, but NCBI taxonomy seems happy to treat these as separate species. Well....they provide distinct nodes with the rank of 'species' for each one, which probably means they consider them separate species... 
Hilmar Lapp wrote: > I don't think tweaking individual parsers until they behave as desired > on a then-current set examples is going to put an end to this I agree with this completely. I haven't looked so closely at Genbank, but the EMBL User Manual dictates a 'standard' which does not appear to be enforced, to the extent that certain OS lines are little more than free text. This situation looks even worse in Uniprot, where there can be multiple bracketed names following the latin name, which may represent synonyms, strains or common-names, but with little contextual information to allow you to determine what the data is. I think, certainly for Uniprot, and probably for EMBL/Genbank, there is little chance in reliably parsing organism names. Hilmar Lapp wrote: > Or, quite radical in approach, we require the NCBI taxonomy database > (or any other implementation of Bio::DB::Taxonomy, e.g. could be > through BioSQL or what not) and otherwise disclaim responsibility for > correctly parsing the species. This seems perhaps the most pragmatic option, although I'd be worried about not providing any means of getting at species information in situations where access to a taxonomy database is not available for whatever reason (laziness included!), and the probable loss of speed associated with carrying out these queries. I guess there are numerous approaches to get round this. Two which immediately spring to mind: 1) a hybrid system which retains the parsing of OS lines as best of possible (accessed via Bio::Seq->species), but with the addition of a set of Bio::Seq->taxonomy methods to query taxonomy if more reliable data is required. Pro's - doesn't break existing API. Con's - I can see considerable user confusion by providing essentially the same data through different routes. 2) Carryout minimal parsing of OS lines populating only genus/species/binomial fields (i.e the bits we can probably reliably parse), and throw a warning if accessors to unpopulated fields are called. 
Add a new method to Bio::Seq to repopulate the Taxonomy::Node object on demand via a taxonomy query if more detailed info is required. Pros - Adds only one extra method Cons - breaks existing API if calls made to common_name etc. prior to fully populating the Taxonomy::Node object. I'm sure there are plenty of other ways...including just biting the bullet and enforcing the use of a taxonomy database, but that seems a little draconian when many entries will be easily parseable. > ideally someone (you?) can take charge and spearhead overhauling this. Doh...walked into that one... :-) I'll give it a go and see where we end up...I'm not hugely familiar with bioperl's internals, but I'm sure there are plenty of folk to holler if I do something stupid. I'll do some more digging, and as Jason suggested, create a page on the wiki with my thoughts and see what people think of it. Cheers, James -- Dr. James Abbott Bioinformatics Software Developer, Bioinformatics Support Service Imperial College, London From cjfields at uiuc.edu Tue Mar 14 09:52:21 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 14 Mar 2006 08:52:21 -0600 Subject: [Bioperl-l] urgent help required - syntax for usingparamatersdifferent from default in standalone blast In-Reply-To: <764978cf0603132358l70edd382y5383af90f47e1380@mail.gmail.com> Message-ID: <001601c64776$e8d9d1e0$15327e82@pyrimidine> > > I tried this: > > > > @params = ('program' => 'blastn','database' => 'human.rna.fna','e' => > > '0.0001', 'Matrix' => 'BLOSUM80', '_READMETHOD' => 'SearchIO' ); Okay. Something I completely missed here since I assumed you knew what you were doing. There is a serious error in your logic here. Why do you need an AMINO ACID substitution matrix like BLOSUM80 for a NUCLEOTIDE database search? You're using BLASTN! And, again, you are using _READMETHOD where you don't need it. The parser is set to SearchIO by default. Remove it. .... > > But, this din't help. > > I checked the data folder, it has BLOSUM80. 
> > > > > > Even a simple blastall command like > > blastall -i tryn.fasta -p blastn -d human.rna.fna -o blastoutput -M > > BLOSUM80 > > doesnt show BLOSUM80 in the results. That's right, it won't. I get this: Matrix: blastn matrix:1 -3 Again, this should tell you something. The problem here is NOT StandAloneBlast but your logic. > the .ncbirc file is thr and all is properly set in it. > blast runs with the default and with changed e value, only the problem is > with changed matrix and gap opening and extension parameters. You aren't setting the gap opening penalty OR extension parameters in the parameter list you gave above. If you're referring to your earlier correspondence, the following is from the latest release (2.2.13), which could explain a lot: Megablast, blastall and bl2seq have until now allowed users to select arbitrary gap existence and extension penalties for a blastn type search. This has been convenient for users but has led to the unfortunate situation that searches with some parameter sets were significantly overestimating the statistical significance of matches. To address this problem the proper statistical parameters for a number of reward/penalty/gap existence/gap extension values have been calculated. The parameters that might cause an issue here are -r (match reward), -q (mismatch penalty), -G (gap existence cost), and -E (gap extension cost). If you do not change these, then nothing will change for you. > What should I do? > > > > > > Don't panic! This was not a joke. Think about what you are trying to do logically. If you have problems getting a result using command line args (not using Bioperl), why would you expect it to work with Bioperl? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign From y.itan at ucl.ac.uk Tue Mar 14 11:01:11 2006 From: y.itan at ucl.ac.uk (Yuval Itan) Date: Tue, 14 Mar 2006 16:01:11 +0000 Subject: [Bioperl-l] Finding all human paralogues Message-ID: <200603141601.11779.y.itan@ucl.ac.uk> Hello, I need to find the number of duplications for each human gene using Bioperl. I would be grateful for your advice about my following problem: I have retrieved one sequence for a running test: $human_genes->[$i]->get_longest_peptide_Member()->sequence and would like to Blastp it against the whole human database. I have installed the Blast rpm on my computer, but I get this error message: linux:/home/Yuval # perl test7 -------------------- WARNING --------------------- MSG: cannot find path to blastall --------------------------------------------------- Can't call method "next_result" on an undefined value at test7 line 50. Can you suggest how to solve that? Also, I would like to have the Blastp function inside my own programme. Is there any available source code for that? Thank you very much for any help. 
In case it might help, here is my whole short programme:

#!/usr/local/bin/perl -w

use lib "/home/Yuval/ensembl/modules";
use lib "/home/Yuval/bioperl-live";
use lib "/home/Yuval/ensembl-compara/modules";
use Bio::EnsEMBL::Compara::DBSQL::DBAdaptor;
use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::Utils::Slice qw(split_Slices);
use Bio::EnsEMBL::Registry;
use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;

my $host = 'ensembldb.ensembl.org';
my $user = 'anonymous';
my $dbname = 'ensembl_compara_37';

my $comparadb= new Bio::EnsEMBL::Compara::DBSQL::DBAdaptor(
    -host => $host,
    -user => $user,
    -dbname => $dbname);

use strict;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Utils::Exception qw(throw);
use Bio::SimpleAlign;
use Bio::AlignIO;
use Bio::LocatableSeq;
use Getopt::Long;

my $human_genes = $comparadb->get_MemberAdaptor->fetch_by_source_taxon(
    'ENSEMBLGENE', 9606); # getting all human genes

my $i = 0;
#print $human_genes->[$i]->get_longest_peptide_Member()->sequence, "\n\n";
#print $human_genes->[$i+1]->get_longest_peptide_Member()->sequence, "\n\n";

my @params = (program => 'blastp', database => $human_genes, _READMETHOD => 'SearchIO' );
my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
my $seq_obj = Bio::Seq->new(-id =>"test query",
    -seq =>$human_genes->[$i]->get_longest_peptide_Member()->sequence);
my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;
print $result_obj->num_hits;

From torsten.seemann at infotech.monash.edu.au Tue Mar 14 20:52:53 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 15 Mar 2006 12:52:53 +1100
Subject: [Bioperl-l] Finding all human paralogues
In-Reply-To: <200603141601.11779.y.itan@ucl.ac.uk>
References: <200603141601.11779.y.itan@ucl.ac.uk>
Message-ID: <1142387573.29365.11.camel@chauvel.csse.monash.edu.au>

Yuval,

> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> 
--------------------------------------------------- > Can you suggest how to solve that? Have you installed the blast binaries? Are they in your $PATH? Have you created a $HOME/.ncbirc file? Please read http://www.ncbi.nlm.nih.gov/blast/docs/ and http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/StandAloneBlast.html > Also, I would like to have the Blastp function inside my own programme. Is > there any available source code for that? BLAST is part of the NCBI toolkit: http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/ > my $human_genes = $comparadb->get_MemberAdaptor->fetch_by_source_taxon( > 'ENSEMBLGENE', 9606); # getting all human genes > my @params = (program => 'blastp', database => $human_genes, _READMETHOD => > 'SearchIO' ); 'database' has to be the name of the database of sequences to blast against. This has to exist as a set of blast index files on your disk. The 'formatdb' program can be used to create this. You appear to be passing some Perl object instead? Please read http://www.ncbi.nlm.nih.gov/blast/docs/ > my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > my $seq_obj = Bio::Seq->new(-id =>"test query", -seq > =>$human_genes->[$i]->get_longest_peptide_Member()->sequence); > my $report_obj = $blast_obj->blastall($seq_obj); die "unable to blastall" if not defined $report_obj; > my $result_obj = $report_obj->next_result; die "no result found" if not defined $result_obj; -- Torsten Seemann Victorian Bioinformatics Consortium From y.itan at ucl.ac.uk Wed Mar 15 13:10:13 2006 From: y.itan at ucl.ac.uk (Yuval Itan) Date: Wed, 15 Mar 2006 18:10:13 +0000 Subject: [Bioperl-l] Finding all human paralogues In-Reply-To: <1142387573.29365.11.camel@chauvel.csse.monash.edu.au> References: <200603141601.11779.y.itan@ucl.ac.uk> <1142387573.29365.11.camel@chauvel.csse.monash.edu.au> Message-ID: <200603151810.14102.y.itan@ucl.ac.uk> Many thanks Torsten, that greatly helps! 
Yuval On Wednesday 15 March 2006 01:52, you wrote: > Yuval, > > > -------------------- WARNING --------------------- > > MSG: cannot find path to blastall > > --------------------------------------------------- > > Can you suggest how to solve that? > > Have you installed the blast binaries? > Are they in your $PATH? > Have you created a $HOME/.ncbirc file? > > Please read http://www.ncbi.nlm.nih.gov/blast/docs/ > and http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/StandAloneBlast.html > > > Also, I would like to have the Blastp function inside my own programme. > > Is there any available source code for that? > > BLAST is part of the NCBI toolkit: > http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/ > > > my $human_genes = $comparadb->get_MemberAdaptor->fetch_by_source_taxon( > > 'ENSEMBLGENE', 9606); # getting all human genes > > my @params = (program => 'blastp', database => $human_genes, _READMETHOD > > => 'SearchIO' ); > > 'database' has to be the name of the database of sequences to blast > against. This has to exist as a set of blast index files on your disk. > The 'formatdb' program can be used to create this. You appear to be > passing some Perl object instead? 
> > Please read http://www.ncbi.nlm.nih.gov/blast/docs/ > > > my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > > my $seq_obj = Bio::Seq->new(-id =>"test query", -seq > > =>$human_genes->[$i]->get_longest_peptide_Member()->sequence); > > my $report_obj = $blast_obj->blastall($seq_obj); > > die "unable to blastall" if not defined $report_obj; > > > my $result_obj = $report_obj->next_result; > > die "no result found" if not defined $result_obj; From haralds_listen at gmx.de Thu Mar 16 08:25:52 2006 From: haralds_listen at gmx.de (Harald) Date: Thu, 16 Mar 2006 14:25:52 +0100 Subject: [Bioperl-l] small mistake found in documentation of bioperl-1.5.0-RC1 In-Reply-To: <200603061131.48014.lstein@cshl.edu> References: <200603061131.48014.lstein@cshl.edu> Message-ID: <44196760.4020600@gmx.de> Hi, I think that I have found a small mistake in the documentation of bioperl-1.5.0-RC1. In the documentation of the method *column_from_residue_number *of Bio::SimpleAlign, it gives the example: Seq1/91-97 AC..DEF.GH Seq2/24-30 ACGG.RTY.. Seq3/43-51 AC.DDEFGHI column_from_residue_number( "Seq1", 94 ) returns 5. column_from_residue_number( "Seq2", 25 ) returns 2. column_from_residue_number( "Seq3", 50 ) returns 9. If I do not make a serious mistake, the first example is wrong and should say: "...returns 6" Regards, Harald From osborne1 at optonline.net Thu Mar 16 10:20:19 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 16 Mar 2006 10:20:19 -0500 Subject: [Bioperl-l] small mistake found in documentation of bioperl-1.5.0-RC1 In-Reply-To: <44196760.4020600@gmx.de> Message-ID: Harald, Yes, you're right, thanks for the correction. Fixed. Brian O. On 3/16/06 8:25 AM, "Harald" wrote: > Hi, > > I think that I have found a small mistake in the documentation of > bioperl-1.5.0-RC1. > In the documentation of the method *column_from_residue_number *of > Bio::SimpleAlign, it gives the example: > > > > Seq1/91-97 AC..DEF.GH > Seq2/24-30 ACGG.RTY.. 
> Seq3/43-51 AC.DDEFGHI > > column_from_residue_number( "Seq1", 94 ) returns 5. > column_from_residue_number( "Seq2", 25 ) returns 2. > column_from_residue_number( "Seq3", 50 ) returns 9. > > If I do not make a serious mistake, the first example is wrong and should say: > "...returns 6" > > Regards, > Harald > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From koski at cenix-bioscience.com Thu Mar 16 10:14:24 2006 From: koski at cenix-bioscience.com (Liisa Koski) Date: Thu, 16 Mar 2006 16:14:24 +0100 Subject: [Bioperl-l] Parsing entrezgene with Bio::SeqIO Message-ID: <200603161614.24820.koski@cenix-bioscience.com> Hi, I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz). I'm using bioperl-1.5.1. I want to extract the KEGG annotations. See code below. use Bio::SeqIO; use Bio::ASN1::EntrezGene; my $seqio = Bio::SeqIO->new(-format => 'entrezgene', -file => 'Homo_sapiens'); while (my $gene = $seqio->next_seq){ print "\n",$gene->id, "\t", $gene->accession_number, "\n"; my $ann = $gene->annotation(); foreach my $key ( $ann->get_all_annotation_keys() ) { my @values = $ann->get_Annotations($key); foreach my $value ( @values ) { print $key, "\t", "=", "\t", $value->as_text,"\n"; } } } Unfortunately the only KEGG annotation I see in the results looks like: dblink = Direct database link to in database KEGG (Notice the space between 'to in') Anyone have any ideas how to get the KEGG annotation results? Note: I also tried parsing the file ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz but I got the below error: ./entrez_gene_seqio.pl Homo_sapiens.ags Data Error: none conforming data found on line 1 in Homo_sapiens.ags! 
first 20 (or till end of input) characters including the non-conforming data: 00 at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm line 138 Thanks, Liisa From mingyi.liu at gpc-biotech.com Thu Mar 16 10:59:32 2006 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Thu, 16 Mar 2006 10:59:32 -0500 Subject: [Bioperl-l] Parsing entrezgene with Bio::SeqIO In-Reply-To: <200603161614.24820.koski@cenix-bioscience.com> References: <200603161614.24820.koski@cenix-bioscience.com> Message-ID: <44198B64.4060704@gpc-biotech.com> Liisa Koski wrote: > Unfortunately the only KEGG annotation I see in the results looks like: > dblink = Direct database link to in database KEGG > (Notice the space between 'to in') > > Anyone have any ideas how to get the KEGG annotation results? Stefan's the person maintaining the SeqIO:entrezgene module, so he'd be able to answer this part of your question. > > Note: I also tried parsing the file > ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz > but I got the below error: > > ./entrez_gene_seqio.pl Homo_sapiens.ags > Data Error: none conforming data found on line 1 in Homo_sapiens.ags! > first 20 (or till end of input) characters including the non-conforming data: > 00 > at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm > line 138 > The error was thrown by my Bio::ASN1::EntrezGene module because it expects a text file, while you fed it with a binary file. To use gzipped ASN binary file from NCBI, download the NCBI gene2xml (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml), then use this syntax to run my parser on the binary files: my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "); # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene). 
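For the SeqIO route, a minimal sketch of the same idea might look like the following. It is a variation rather than the exact syntax quoted above: instead of handing Bio::SeqIO a "command |" string through -file, it opens the gene2xml pipe explicitly (list form, so no shell quoting is involved) and passes the handle through -fh. The helper names gene2xml_command and open_entrezgene are made up for illustration, and the sketch assumes gene2xml is on the PATH and accepts the flags shown in the message.

```perl
use strict;
use warnings;

# Hypothetical helper: the gene2xml invocation from the message above,
# returned as a list so it can be given to a list-form pipe open.
sub gene2xml_command {
    my ($ags_file) = @_;
    die "no input file given\n"
        unless defined $ags_file && length $ags_file;
    return ('gene2xml', '-i', $ags_file, '-c', '-x', '-b');
}

# Hypothetical wrapper: returns a Bio::SeqIO stream that reads the text
# ASN.1 which gene2xml writes to its standard output.
# BioPerl is only loaded when this function is actually called.
sub open_entrezgene {
    my ($ags_file) = @_;
    require Bio::SeqIO;
    open(my $fh, '-|', gene2xml_command($ags_file))
        or die "cannot start gene2xml: $!\n";
    return Bio::SeqIO->new(-format => 'entrezgene', -fh => $fh);
}

# Usage (needs BioPerl, Bio::ASN1::EntrezGene and gene2xml installed):
#   my $seqio = open_entrezgene('Homo_sapiens.ags.gz');
#   while (my $gene = $seqio->next_seq) {
#       print $gene->id, "\n";
#   }
```

Keeping the command in one small function means the flags live in a single place if a different gene2xml version needs other options.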
Best,
Mingyi

From skirov at utk.edu Thu Mar 16 11:29:10 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Thu, 16 Mar 2006 11:29:10 -0500
Subject: [Bioperl-l] Parsing entrezgene with Bio::SeqIO
In-Reply-To: <200603161614.24820.koski@cenix-bioscience.com>
References: <200603161614.24820.koski@cenix-bioscience.com>
Message-ID: <44199256.5050709@utk.edu>

Do this:

my @dblinks = $ann->get_Annotations('dblink');
foreach my $dblink (@dblinks) {
    next unless ($dblink->database eq 'KEGG');
    print $dblink->primary_id, "\t", $dblink->url, "\n";
}

This works for me, hopefully it will for you too. Let me know if something is not right.
Stefan

Liisa Koski wrote:

>Hi,
>I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from
>ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz).
>
>I'm using bioperl-1.5.1.
>
>I want to extract the KEGG annotations.
>See code below.
>
>use Bio::SeqIO;
>use Bio::ASN1::EntrezGene;
>
>my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
>                            -file => 'Homo_sapiens');
>while (my $gene = $seqio->next_seq){
>    print "\n",$gene->id, "\t", $gene->accession_number, "\n";
>    my $ann = $gene->annotation();
>    foreach my $key ( $ann->get_all_annotation_keys() ) {
>        my @values = $ann->get_Annotations($key);
>        foreach my $value ( @values ) {
>            print $key, "\t", "=", "\t", $value->as_text,"\n";
>        }
>    }
>}
>
>Unfortunately the only KEGG annotation I see in the results looks like:
>dblink = Direct database link to in database KEGG
>(Notice the space between 'to in')
>
>Anyone have any ideas how to get the KEGG annotation results?
>
>Note: I also tried parsing the file
>ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
>but I got the below error:
>
>./entrez_gene_seqio.pl Homo_sapiens.ags
>Data Error: none conforming data found on line 1 in Homo_sapiens.ags!
>first 20 (or till end of input) characters including the non-conforming data: >00 > at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm >line 138 > > >Thanks, >Liisa > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From y.itan at ucl.ac.uk Thu Mar 16 15:19:36 2006 From: y.itan at ucl.ac.uk (Yuval Itan) Date: Fri, 17 Mar 2006 07:19:36 +1100 Subject: [Bioperl-l] Finding all human paralogues In-Reply-To: <1142387573.29365.11.camel@chauvel.csse.monash.edu.au> References: <200603141601.11779.y.itan@ucl.ac.uk> <1142387573.29365.11.camel@chauvel.csse.monash.edu.au> Message-ID: <200603161738.19580.y.itan@ucl.ac.uk> Dear Torsten, I have followed your suggestions, and would be truly grateful for one more hint. I have updated the $PATH to include the directory of my Blast binaries. I have also made a .ncbirc file which includes the path to the fasta file I have downloaded (all human peptides): [NCBI] Data="/home/Yuval/Applications/blast/data/" But when I am trying to run one sequence against the human peptides fasta file, I get the following error message: linux:/home/Yuval # perl test5 [blastall] WARNING: test: Unable to open Homo_sapiens.NCBI354.feb.pep.fa.pin ------------- EXCEPTION ------------- MSG: blastall call crashed: 256 /home/Yuval/Applications/blast/bin/blastall -p blastp -d /Homo_sapiens.NCBI354.feb.pep.fa -i /tmp/nvkaUCJjDq -o /tmp/klP53IHZKj STACK Bio::Tools::Run::StandAloneBlast::_runblast /home/Yuval/bioperl-live/Bio/Tools/Run/StandAloneBlast.pm:633 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /home/Yuval/bioperl-live/Bio/Tools/Run/StandAloneBlast.pm:602 STACK Bio::Tools::Run::StandAloneBlast::blastall /home/Yuval/bioperl-live/Bio/Tools/Run/StandAloneBlast.pm:489 STACK toplevel test5:72 -------------------------------------- I would be grateful for any hint. 
These are the relevant lines of the code: my $i = 0; #for the first sequence in human database my $seq1 = $human_genes->[$i]->get_longest_peptide_Member()->sequence; my @params = (program => 'blastp', database => 'Homo_sapiens.NCBI354.feb.pep.fa', _READMETHOD => 'SearchIO' ); my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); my $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>$seq1); my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = $report_obj->next_result; print $result_obj->num_hits; Many thanks for everything. Yuval On Wednesday 15 March 2006 01:52, you wrote: > Yuval, > > > -------------------- WARNING --------------------- > > MSG: cannot find path to blastall > > --------------------------------------------------- > > Can you suggest how to solve that? > > Have you installed the blast binaries? > Are they in your $PATH? > Have you created a $HOME/.ncbirc file? > > Please read http://www.ncbi.nlm.nih.gov/blast/docs/ > and http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/StandAloneBlast.html > > > Also, I would like to have the Blastp function inside my own programme. > > Is there any available source code for that? > > BLAST is part of the NCBI toolkit: > http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/ > > > my $human_genes = $comparadb->get_MemberAdaptor->fetch_by_source_taxon( > > 'ENSEMBLGENE', 9606); # getting all human genes > > my @params = (program => 'blastp', database => $human_genes, _READMETHOD > > => 'SearchIO' ); > > 'database' has to be the name of the database of sequences to blast > against. This has to exist as a set of blast index files on your disk. > The 'formatdb' program can be used to create this. You appear to be > passing some Perl object instead? 
>
> Please read http://www.ncbi.nlm.nih.gov/blast/docs/
>
> > my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
> > my $seq_obj = Bio::Seq->new(-id =>"test query", -seq
> > =>$human_genes->[$i]->get_longest_peptide_Member()->sequence);
> > my $report_obj = $blast_obj->blastall($seq_obj);
>
> die "unable to blastall" if not defined $report_obj;
>
> > my $result_obj = $report_obj->next_result;
>
> die "no result found" if not defined $result_obj;

From torsten.seemann at infotech.monash.edu.au Thu Mar 16 15:37:35 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 17 Mar 2006 07:37:35 +1100
Subject: [Bioperl-l] Finding all human paralogues
In-Reply-To: <200603161738.19580.y.itan@ucl.ac.uk>
References: <200603141601.11779.y.itan@ucl.ac.uk> <1142387573.29365.11.camel@chauvel.csse.monash.edu.au> <200603161738.19580.y.itan@ucl.ac.uk>
Message-ID: <1142541455.6306.12.camel@chauvel.csse.monash.edu.au>

Yuval,

> I have followed your suggestions, and would be truly grateful for one more
> hint.
> I have updated the $PATH to include the directory of my Blast binaries. I have
> also made a .ncbirc file which includes the path to the fasta file I have
> downloaded (all human peptides):
> [NCBI]
> Data="/home/Yuval/Applications/blast/data/"

This directory "is the path to the location of the Standalone BLAST
'data' subdirectory", i.e. the one that comes with the Blast binaries and
has the BLOSUMxx and PAMxx files. It is NOT (normally) the directory to
put your fasta files in.

> But when I am trying to run one sequence against the human peptides fasta
> file, I get the following error message:
>
> linux:/home/Yuval # perl test5
> [blastall] WARNING: test: Unable to open Homo_sapiens.NCBI354.feb.pep.fa.pin

It can't find the blast indices for Homo_sapiens.NCBI354.feb.pep.fa
The .pin file is one of the index files created by the 'formatdb' program.
> ------------- EXCEPTION -------------
> MSG: blastall call crashed: 256 /home/Yuval/Applications/blast/bin/blastall -p
> blastp -d /Homo_sapiens.NCBI354.feb.pep.fa -i /tmp/nvkaUCJjDq
> -o /tmp/klP53IHZKj

The "-d /Homo_sapiens.NCBI354.feb.pep.fa" means it is looking in your
computer's root directory "/" for the .pin (and other index) files.

> I would be grateful for any hint. These are the relevant lines of the code:
>
> my $i = 0; #for the first sequence in human database
> my $seq1 = $human_genes->[$i]->get_longest_peptide_Member()->sequence;
> my @params = (program => 'blastp', database =>
> 'Homo_sapiens.NCBI354.feb.pep.fa', _READMETHOD => 'SearchIO' );

For some reason it is looking in "/" for your index files.
Do you have the environment variable $BLASTDB or $BLASTDATADIR set to "/" ?

Three solutions:
1. Set "database => /full/path/to/Homo.pep.fa"
2. Set BLASTDB to /full/path/to
3. Set BLASTDATADIR to /full/path/to

Options 2. and 3. can be done in your shell/environment or set in a Perl
BEGIN block, i.e. BEGIN { $ENV{BLASTDB} = '/......'; }

Also, please remove the _READMETHOD, you don't need it. SearchIO is the
default and is used by nearly everybody.

> my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
> my $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>$seq1);
> my $report_obj = $blast_obj->blastall($seq_obj);
> my $result_obj = $report_obj->next_result;
> print $result_obj->num_hits;

Rest looks ok.
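As a rough illustration of solution 1, the sketch below checks that a formatdb index exists before building the blast factory with a full database path. It is only a sketch under assumptions: the helper names find_blast_db and make_blastp_factory are invented here, and testing for a single ".pin" file is a simplification (it covers a protein database formatted as one volume, not multi-volume databases).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full database path if a formatdb
# protein index (.pin) sits next to it, or undef if it does not.
sub find_blast_db {
    my ($dir, $name) = @_;
    my $path = File::Spec->catfile($dir, $name);
    return -e "$path.pin" ? $path : undef;
}

# Hypothetical wrapper around solution 1: hand blastall the full path so
# it does not depend on BLASTDB or on the current working directory.
# BioPerl is only loaded when this function is actually called.
sub make_blastp_factory {
    my ($dir, $name) = @_;
    my $db = find_blast_db($dir, $name)
        or die "no BLAST index for $name in $dir; run formatdb first\n";
    require Bio::Tools::Run::StandAloneBlast;
    return Bio::Tools::Run::StandAloneBlast->new(
        program  => 'blastp',
        database => $db,
    );
}

# Usage (directory name is a placeholder, not taken from the thread):
#   my $factory = make_blastp_factory('/full/path/to',
#                                     'Homo_sapiens.NCBI354.feb.pep.fa');
#   my $report  = $factory->blastall($seq_obj);
```

Failing early with a clear message is arguably easier to debug than the "blastall call crashed: 256" exception shown above.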
-- 
Torsten Seemann
Victorian Bioinformatics Consortium

From eweaver at gmail.com Wed Mar 15 11:21:00 2006
From: eweaver at gmail.com (Evan)
Date: Wed, 15 Mar 2006 11:21:00 -0500
Subject: [Bioperl-l] update to bp_remote_blast.pl
Message-ID:

#!/usr/local/bin/perl -w
# BioPerl module for remote_blast.pl
#
# Revived by Evan Weaver for bioperl-1.5.1
# 3/14/2006
#
# Copyright Jason Stajich, Evan Weaver
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs after the code

use strict;
use vars qw($USAGE);

use Bio::Tools::Run::RemoteBlast;
use Bio::SeqIO;
use Getopt::Long;

$USAGE = "remote_blast.pl [-h] [-p prog] [-d db] [-e expect] [-mod Blast] [-f seqformat] -z=\"entrez query\" -v 1 -t output_format -i seqfile\n";

my ($prog, $db, $expect,$method) = ( 'blastp', 'nr', '10', 'Blast');

my ($sequencefile,$sequenceformat,$help, $entrez, $outputformat, $verbose) = (undef, 'fasta',undef, undef, undef, 1);

&GetOptions('prog|p=s'               => \$prog,
            'db|d=s'                 => \$db,
            'expect|e=s'             => \$expect,
            'blsmod|module|method=s' => \$method,
            'input|i=s'              => \$sequencefile,
            'format|f=s'             => \$sequenceformat,
            'help|h'                 => \$help,
            'entrez|z=s'             => \$entrez,
            'output_format|t=s'      => \$outputformat,
            'verbose|v=s'            => \$verbose
            );

if( $help ) {
    exec('perldoc', $0);
    die;
}

if( !defined $prog ) {
    die($USAGE . "\n\tMust specify a valid program name ([t]blast[pxn])\n");
}
if( !defined $db ) {
    die($USAGE . "\n\tMust specify a db to search\n");
}
if( !defined $sequencefile ) {
    die($USAGE . "\n\tMust specify an input file\n");
}

my $blastfactory = new Bio::Tools::Run::RemoteBlast ('-prog' => $prog,
                                                     '-data' => $db,
                                                     '-expect' => $expect,
                                                     'readmethod' => $method,
                                                     );

if ($entrez) {
    if ($verbose) {
        print "Entrez query (submission side): $entrez\n";
    }
    #$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{ FORMAT_ENTREZ_QUERY } = $entrez;
    $Bio::Tools::Run::RemoteBlast::HEADER{ ENTREZ_QUERY } = $entrez;
}
if ($outputformat) {
    print "Don't use output format type; it doesn't work.\n";
    $Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{ FORMAT_TYPE } = $outputformat;
}

# submit_blast can only currently handle fasta format files so I'll
# preprocess outside of the module but I'd rather be sure here

my $input;
if( $sequenceformat !~ /fasta/ ) {
    my @seqs;
    my $seqio = new Bio::SeqIO('-format' => $sequenceformat,
                               '-file' => $sequencefile );
    while( my $seq = $seqio->next_seq() ) {
        push @seqs, $seq;
    }
    $input = \@seqs;
} else {
    $input = $sequencefile;
}

my $r = $blastfactory->submit_blast($input);
#my $r = $factory->submit_blast('amino.fa');

print STDERR "waiting...\n" if( $verbose > 0 );
while ( my @rids = $blastfactory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $blastfactory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $blastfactory->remove_rid($rid);
            }
            print STDERR " checking $rid\n" if ( $verbose > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            #save the output
            my $filename = $result->query_name()."\.out";
            $blastfactory->save_output($filename);
            $blastfactory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $verbose > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
    print STDERR scalar(@rids) . " left\n";
}

__END__

#
# BioPerl module for remote_blast.pl
#
# Cared for by Jason Stajich
#
# Copyright Jason Stajich
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

remote_blast.pl - script for submitting jobs to a remote blast server
(ncbi blast queue at this time)

=head1 SYNOPSIS

  % remote_blast.pl -p blastp -d ecoli -e 1e-5 -i myseqs.fa

=head1 DESCRIPTION

This module will run a remote blast on a set of sequences by
submitting them to the NCBI blast queue and printing the output of the
request.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l at bioperl.org - General discussion
  http://bioperl.org/MailList.shtml - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR - Jason Stajich

Email jason-at-bioperl-dot-org

=cut

From anorman at stanford.edu Thu Mar 16 18:53:42 2006
From: anorman at stanford.edu (Andrew Norman)
Date: Thu, 16 Mar 2006 15:53:42 -0800
Subject: [Bioperl-l] can't locate object method "next_result" (probably a simple problem)
Message-ID: <002301c64954$db22cf90$9a4241ab@anorman>

Hi Everyone

I'm brand new to bioperl and object oriented programming in general. I've
been trying some of the example scripts in the HOWTO and I ran into the
following error (I'm using bioperl v.
1.2.3, windows XP) while trying to get some blast results to print: "can't locate object method "next_result" via package "Bio::Seq" at blast_tetra.pl" Here's the script: use Bio::Seq; use Bio::SeqIO; use Bio::Tools::Run::StandAloneBlast; open OUTPUT, ">>blast_output.txt"; my $file_obj = Bio::SeqIO->new(-file => "blast_input.fasta", -format => "fasta" ); my @params = (program => 'blastx', database => 'Tetraodon_nigroviridis.TETRAODON7.mar.pep.fa', _READMETHOD => 'SearchIO' ); my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); while (my $seq_obj = $file_obj->next_seq){ my $report_obj = $blast_obj->btblastall($seq_obj); my $result_obj = $report_obj->next_result; print OUTPUT $result_obj->num_hits,"\t",$result_obj->hits, "\n"; } close OUTPUT; I'd appreciate any help anyone has to offer. Thanks! andy From koski at cenix-bioscience.com Fri Mar 17 04:10:57 2006 From: koski at cenix-bioscience.com (Liisa Koski) Date: Fri, 17 Mar 2006 10:10:57 +0100 Subject: [Bioperl-l] Parsing entrezgene with Bio::SeqIO In-Reply-To: <44199256.5050709@utk.edu> References: <200603161614.24820.koski@cenix-bioscience.com> <44199256.5050709@utk.edu> Message-ID: <200603171010.57673.koski@cenix-bioscience.com> Thanks Stefan, Unfortunately I only parse out the URL and not the primary_id or any comments. 
print "\n\nDBlinks for geneid: ",$gene->id, "\t", "acc: ", $gene->accession_number,"\n"; my @dblinks= $ann->get_Annotations('dblink'); foreach my $dblink (@dblinks) { next unless ($dblink->database eq "KEGG"); print "primary_id:", "\t",$dblink->primary_id,"\n"; print "url:", "\t", $dblink->url, "\n"; print "as_text:", "\t", $dblink->as_text, "\n"; print "optional_id:","\t",$dblink->optional_id,"\n" ; print "comment:", "\t", $dblink->comment, "\n" ; print "object_id:", "\t", $dblink->object_id, "\n"; print "namespase:", "\t", $dblink->namespace, "\n" ; print "authority:", "\t", $dblink->authority, "\n" ; print "\nhash_tree\n"; my $hash_ref = $dblink->hash_tree; for my $key (keys %{$hash_ref}) { print $key,": ",$hash_ref->{$key},"\n"; } } Output: ------------------------------------ DBlinks for geneid: ABAT acc: 18 Use of uninitialized value in print at ./entrez_gene_seqio.pl line 42. primary_id: url: http://www.genome.jp/dbget-bin/www_bget?hsa:18 Use of uninitialized value in concatenation (.) or string at /netshare/home/koski/perl_modules/Bio/Annotation/DBLink.pm line 146. as_text: Direct database link to in database KEGG Use of uninitialized value in print at ./entrez_gene_seqio.pl line 48. optional_id: Use of uninitialized value in print at ./entrez_gene_seqio.pl line 50. comment: Use of uninitialized value in print at ./entrez_gene_seqio.pl line 52. object_id: namespase: KEGG Use of uninitialized value in print at ./entrez_gene_seqio.pl line 56. authority: printing hash_tree database: KEGG Use of uninitialized value in print at ./entrez_gene_seqio.pl line 62. 
primary_id:
-----------------------------------------

I see that on the gene page for ABAT (acc: 18) there are KEGG pathways:

KEGG pathway: Alanine and aspartate metabolism 00252
KEGG pathway: Butanoate metabolism 00650
KEGG pathway: Glutamate metabolism 00251
KEGG pathway: Propanoate metabolism 00640
KEGG pathway: Valine, leucine and isoleucine degradation 00280
KEGG pathway: beta-Alanine metabolism 00410

Is it possible to pull out these pathway names?

Thanks,
Liisa

On Thursday 16 March 2006 17:29, Stefan Kirov wrote:
> Do this:
> my @dblinks=$ann->get_Annotations('dblink');
> foreach my $dblink (@dblinks) {
> next unless ($dblink->database eq 'KEGG');
> print $dblink->primary_id,"\t",$dblink->url,"\n";
> }
> This works for me, hopefully it will for you too. Let me know if
> something is not right.
> Stefan
>
> Liisa Koski wrote:
> >Hi,
> >I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from
> >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz).
> >
> >I'm using bioperl-1.5.1.
> >
> >I want to extract the KEGG annotations.
> >See code below.
> >
> >use Bio::SeqIO;
> >use Bio::ASN1::EntrezGene;
> >
> >my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
> > -file => 'Homo_sapiens');
> >while (my $gene = $seqio->next_seq){
> > print "\n",$gene->id, "\t", $gene->accession_number, "\n";
> > my $ann = $gene->annotation();
> > foreach my $key ( $ann->get_all_annotation_keys() ) {
> > my @values = $ann->get_Annotations($key);
> > foreach my $value ( @values ) {
> > print $key, "\t", "=", "\t", $value->as_text,"\n";
> > }
> > }
> >}
> >
> >Unfortunately the only KEGG annotation I see in the results looks like:
> >dblink = Direct database link to in database KEGG
> >(Notice the space between 'to in')
> >
> >Anyone have any ideas how to get the KEGG annotation results?
> >
> >Note: I also tried parsing the file
> >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.
> >gz but I got the below error: > > > >./entrez_gene_seqio.pl Homo_sapiens.ags > >Data Error: none conforming data found on line 1 in Homo_sapiens.ags! > >first 20 (or till end of input) characters including the non-conforming > > data: 00 > > at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm > >line 138 > > > > > >Thanks, > >Liisa > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Liisa Koski Bioinformatics Software Engineer Cenix BioScience GmbH Tatzberg 47 01307 Dresden Germany Phone: +49(351)4173-149 From haralds_listen at gmx.de Fri Mar 17 05:43:13 2006 From: haralds_listen at gmx.de (Harald) Date: Fri, 17 Mar 2006 11:43:13 +0100 Subject: [Bioperl-l] can't locate object method "next_result" (probably a simple problem) In-Reply-To: <002301c64954$db22cf90$9a4241ab@anorman> References: <002301c64954$db22cf90$9a4241ab@anorman> Message-ID: <441A92C1.3090204@gmx.de> Hi Andrew. At first: I am also a newbie, so beware of my suggestions ;-) Then I would suggest to install a newer version of bioperl. Bioperl is a work in progress and you might miss some useful features or get some bugs if you stick with bioperl 1.2.3. There is a simple way via CPAN (just follow the instructions on www.bioperl.org) to update to 1.4. (But I failed to update to 1.5 under windows, so I installed linux) I think your problem is because there is no wrapper for btbblast. (at least under 1.5) Try blastall. Does it work? 
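
A quick way to check whether an object really has a given method is can().
The sketch below uses ToyFactory, a made-up stand-in class (not part of
BioPerl), so it runs without a BLAST installation. Note that can() only
reports methods that are actually defined; a name that is handled only
through AUTOLOAD is normally reported as not defined.

```perl
use strict;
use warnings;

# ToyFactory is a hypothetical stand-in for a BLAST factory object.
# It defines blastall but not btblastall.
package ToyFactory;
sub new      { my ($class) = @_; return bless {}, $class; }
sub blastall { return 'report'; }

package main;

my $factory = ToyFactory->new;

# can() returns a code reference for a defined method, false otherwise.
for my $method (qw(blastall btblastall)) {
    my $status = $factory->can($method) ? 'defined' : 'not defined';
    print "$method: $status\n";
}
```

With a real factory object the same loop could be run before the search,
to catch a misspelled method name early.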
Regards Harald From torsten.seemann at infotech.monash.edu.au Fri Mar 17 06:37:39 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 17 Mar 2006 22:37:39 +1100 Subject: [Bioperl-l] can't locate object method "next_result" (probably a simple problem) In-Reply-To: <002301c64954$db22cf90$9a4241ab@anorman> References: <002301c64954$db22cf90$9a4241ab@anorman> Message-ID: <441A9F83.9070008@infotech.monash.edu.au> Andrew, > I'm brand new to bioperl and object oriented programming in general. I've been trying some of the example scripts in the HOWTO and I ran into the following error (I'm using bioperl v. 1.2.3, windows XP) while trying to get some blast results to print: > "can't locate object method "next_result" via package "Bio::Seq" at blast_tetra.pl" StandAloneBlast has never supported "btblastall". > my $file_obj = Bio::SeqIO->new(-file => "blast_input.fasta", -format => "fasta" ); > my @params = (program => 'blastx', database => 'Tetraodon_nigroviridis.TETRAODON7.mar.pep.fa', _READMETHOD => 'SearchIO' ); > my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > while (my $seq_obj = $file_obj->next_seq){ I'm not even sure why you even got past this line without a Perl error? (perhaps try "use strict;" at the top of your script) > my $report_obj = $blast_obj->btblastall($seq_obj); If it has the same command line parameters as standard "blastall" you could hack it to work by renaming your btblastall.exe to blastall.exe ... It may also work if you can hack into the module internals, but this is not recommended. Either way, could you please email me a URL out "btblastall /?" output so I/we can consider it for inclusion in the future rewrite of the StandAloneBlast modules? Thanks. 
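
One way a misspelled method can get through without a Perl error is an
AUTOLOAD accessor that accepts any method name. The sketch below is a toy
class (ToyRunner, hypothetical, and not StandAloneBlast's actual code) that
keys parameters on the first letter of the method name. Under that
assumption, btblastall is treated as "set parameter b" and simply hands its
argument back, which would explain a sequence object turning up where a
report was expected.

```perl
use strict;
use warnings;

# ToyRunner is a hypothetical class for illustration only.
package ToyRunner;
our $AUTOLOAD;
my %OK_FIELD = map { $_ => 1 } qw(b e p d);   # allowed one-letter parameters

sub new { my ($class) = @_; return bless {}, $class; }

sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                 # strip the package name
    return if $name eq 'DESTROY';      # do not treat destruction as a parameter
    my $key = substr($name, 0, 1);     # 'btblastall' becomes 'b'
    die "Unallowed parameter: $key\n" unless $OK_FIELD{$key};
    $self->{$key} = shift if @_;       # store the argument under 'b'
    return $self->{$key};              # and return it unchanged
}

package main;

my $runner = ToyRunner->new;
my $seq    = { id => 'test_query' };   # stand-in for a sequence object
my $report = $runner->btblastall($seq);

print "same object returned\n" if $report == $seq;
```

Calling a name whose first letter is not in the allowed list (for example
zzz) dies in this toy class, so only some misspellings slip through.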
-- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From osborne1 at optonline.net Fri Mar 17 08:58:55 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 17 Mar 2006 08:58:55 -0500 Subject: [Bioperl-l] can't locate object method "next_result" (probably a simple problem) In-Reply-To: <002301c64954$db22cf90$9a4241ab@anorman> Message-ID: Andy, Change "btblastall" to "blastall", let's see what happens. A question: does your search work from the commandline? >blastall -p blastx -d Tetraodon_nigroviridis.TETRAODON7.mar.pep.fa -i blast_input.fasta Something like that... Brian O. On 3/16/06 6:53 PM, "Andrew Norman" wrote: > Hi Everyone > > I'm brand new to bioperl and object oriented programming in general. I've > been trying some of the example scripts in the HOWTO and I ran into the > following error (I'm using bioperl v. 1.2.3, windows XP) while trying to get > some blast results to print: > > "can't locate object method "next_result" via package "Bio::Seq" at > blast_tetra.pl" > > Here's the script: > > use Bio::Seq; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > > > open OUTPUT, ">>blast_output.txt"; > > my $file_obj = Bio::SeqIO->new(-file => "blast_input.fasta", -format => > "fasta" ); > my @params = (program => 'blastx', database => > 'Tetraodon_nigroviridis.TETRAODON7.mar.pep.fa', _READMETHOD => 'SearchIO' ); > my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > > > while (my $seq_obj = $file_obj->next_seq){ > > my $report_obj = $blast_obj->btblastall($seq_obj); > > my $result_obj = $report_obj->next_result; > print OUTPUT $result_obj->num_hits,"\t",$result_obj->hits, "\n"; > > > > } > > close OUTPUT; > > > I'd appreciate any help anyone has to offer. Thanks! 
>
> andy
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From osborne1 at optonline.net Fri Mar 17 09:51:40 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Mar 2006 09:51:40 -0500
Subject: [Bioperl-l] update to bp_remote_blast.pl
In-Reply-To:
Message-ID:

Evan,

Added update.

Brian O.

On 3/15/06 11:21 AM, "Evan" wrote:

> #!/usr/local/bin/perl -w
> # BioPerl module for remote_blast.pl
> ...

From cjfields at uiuc.edu Fri Mar 17 10:44:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Mar 2006 09:44:09 -0600
Subject: [Bioperl-l] can't locate object method "next_result" (probably a simple problem)
In-Reply-To:
Message-ID: <000001c649d9$a18a15b0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Friday, March 17, 2006 7:59 AM
> To: Andrew Norman; bioperl-l
> Subject: Re: [Bioperl-l] can't locate object method "next_result"
> (probably a simple problem)
>
> Andy,
>
> Change "btblastall" to "blastall", let's see what happens.
>
> A question: does your search work from the commandline?
>
> >blastall -p blastx -d Tetraodon_nigroviridis.TETRAODON7.mar.pep.fa -i
> blast_input.fasta
>
> Something like that...
>
> Brian O.

You also might want to try passing any text output from that run through
SearchIO to see if it parses w/o breaking or freezing. You're using an old
version of bioperl, so it may not parse.

-------------------------------
use Bio::SearchIO;

my $file = shift @ARGV;
my $v = 0 ;
my $searchin = new Bio::SearchIO(-verbose => $v,
                                 -format => 'blast',
                                 -file => $file);
while( my $result = $searchin->next_result ) {
    print $result->query_description,"\n";
    while( my $hit = $result->next_hit ) {
        print " ", $hit->expect,"\n";
    }
}
-------------------------------

> On 3/16/06 6:53 PM, "Andrew Norman" wrote:
>
> > Hi Everyone
> >
> > I'm brand new to bioperl and object oriented programming in general.
> I've
> > been trying some of the example scripts in the HOWTO and I ran into the
> > following error (I'm using bioperl v. 1.2.3, windows XP) while trying to
> get
> > some blast results to print:
> >
> > "can't locate object method "next_result" via package "Bio::Seq" at
> > blast_tetra.pl"
> >
> > Here's the script:
> >
> > use Bio::Seq;
> > use Bio::SeqIO;
> > use Bio::Tools::Run::StandAloneBlast;
> >
> >
> >
> > open OUTPUT, ">>blast_output.txt";
> >
> > my $file_obj = Bio::SeqIO->new(-file => "blast_input.fasta", -format =>
> > "fasta" );
> > my @params = (program => 'blastx', database =>
> > 'Tetraodon_nigroviridis.TETRAODON7.mar.pep.fa', _READMETHOD =>
> 'SearchIO' );
> > my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
> >
> >
> > while (my $seq_obj = $file_obj->next_seq){
> >
> > my $report_obj = $blast_obj->btblastall($seq_obj);

I agree with Torsten here. This shouldn't work but does in a weird way.
The error indicates that, instead of getting a result you're getting a
Bio::Seq object. That makes sense if you consider that StandAloneBlast
uses AUTOLOAD.
The name of the function is stored in $AUTOLOAD and sort of illustrates the
problems with using AUTOLOADed subs. Here's the AUTOLOAD sub in
StandAloneBlast with my comments:

-------------------------------
sub AUTOLOAD {
    my $self = shift;
    # $AUTOLOAD should be Bio::Tools::Run::StandAloneBlast::btblastall
    my $attr = $AUTOLOAD;
    $attr =~ s/.*:://;
    # $attr is now 'btblastall'
    my $attr_letter = $BLASTTYPE eq 'ncbi' ? substr($attr, 0, 1) : $attr;
    # $attr_letter is now 'b'
    # actual key is first letter of $attr unless first attribute
    # letter is underscore (as in _READMETHOD), the $attr is a BLAST
    # parameter and should be truncated to its first letter only
    $attr = ($attr_letter eq '_') ? $attr : $attr_letter;
    # $attr now is 'b'
    $self->throw("Unallowed parameter: $attr !") unless $OK_FIELD{$attr};
    # passes this try at catching weird sub names (@BLASTALL_PARAMS has 'b')
    # $self->throw("Unallowed parameter: $attr !") unless $ok_field{$attr_letter};
    $self->{$attr_letter} = shift if @_;
    # places seq object in $self->{$attr_letter}
    return $self->{$attr_letter};
    # oops, returns seq object
}
-------------------------------

So, don't use 'btblastall'. Not sure about using AUTOLOAD here but it seems
to be used in Bioperl-run tools (which StandAloneBlast is really)

> > my $result_obj = $report_obj->next_result;
> > print OUTPUT $result_obj->num_hits,"\t",$result_obj->hits, "\n";
> >
> >
> >
> > }
> >
> > close OUTPUT;
> >
> >
> > I'd appreciate any help anyone has to offer. Thanks!
> >
> > andy
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign From shawnh at stanford.edu Fri Mar 17 12:02:10 2006 From: shawnh at stanford.edu (Shawn Hoon) Date: Fri, 17 Mar 2006 09:02:10 -0800 Subject: [Bioperl-l] can't locate object method "next_result" (probably a simple problem) In-Reply-To: <441A9F83.9070008@infotech.monash.edu.au> References: <002301c64954$db22cf90$9a4241ab@anorman> <441A9F83.9070008@infotech.monash.edu.au> Message-ID: On Mar 17, 2006, at 3:37 AM, Torsten Seemann wrote: > Andrew, > >> I'm brand new to bioperl and object oriented programming in >> general. I've been trying some of the example scripts in the >> HOWTO and I ran into the following error (I'm using bioperl v. >> 1.2.3, windows XP) while trying to get some blast results to print: >> "can't locate object method "next_result" via package "Bio::Seq" >> at blast_tetra.pl" > > StandAloneBlast has never supported "btblastall". > >> my $file_obj = Bio::SeqIO->new(-file => "blast_input.fasta", - >> format => "fasta" ); >> my @params = (program => 'blastx', database => >> 'Tetraodon_nigroviridis.TETRAODON7.mar.pep.fa', _READMETHOD => >> 'SearchIO' ); >> my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); >> while (my $seq_obj = $file_obj->next_seq){ > > I'm not even sure why you even got past this line without a Perl > error? > (perhaps try "use strict;" at the top of your script) > i think StandAloneBlast was interpreting btblastall as a -b parameter to feed into to blastall. >> my $report_obj = $blast_obj->btblastall($seq_obj); > > If it has the same command line parameters as standard "blastall" > you could > hack it to work by renaming your btblastall.exe to blastall.exe ... > > It may also work if you can hack into the module internals, but > this is not > recommended. > > Either way, could you please email me a URL out "btblastall /?" > output so I/we > can consider it for inclusion in the future rewrite of the > StandAloneBlast > modules? Thanks. 
> > -- > Torsten Seemann > Victorian Bioinformatics Consortium, Monash University, Australia > http://www.vicbioinformatics.com/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Mar 17 12:27:39 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 17 Mar 2006 11:27:39 -0600 Subject: [Bioperl-l] can't locate object method "next_result" (probablya simple problem) In-Reply-To: Message-ID: <000601c649e8$17767f80$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Shawn Hoon > Sent: Friday, March 17, 2006 11:02 AM > To: Torsten Seemann > Cc: Andrew Norman; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] can't locate object method "next_result" > (probablya simple problem) > > > On Mar 17, 2006, at 3:37 AM, Torsten Seemann wrote: > > > Andrew, > > > >> I'm brand new to bioperl and object oriented programming in > >> general. I've been trying some of the example scripts in the > >> HOWTO and I ran into the following error (I'm using bioperl v. > >> 1.2.3, windows XP) while trying to get some blast results to print: > >> "can't locate object method "next_result" via package "Bio::Seq" > >> at blast_tetra.pl" > > > > StandAloneBlast has never supported "btblastall". > > > >> my $file_obj = Bio::SeqIO->new(-file => "blast_input.fasta", - > >> format => "fasta" ); > >> my @params = (program => 'blastx', database => > >> 'Tetraodon_nigroviridis.TETRAODON7.mar.pep.fa', _READMETHOD => > >> 'SearchIO' ); > >> my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > >> while (my $seq_obj = $file_obj->next_seq){ > > > > I'm not even sure why you even got past this line without a Perl > > error? 
> > (perhaps try "use strict;" at the top of your script) > > > > i think StandAloneBlast was interpreting btblastall as a -b parameter > to feed into to blastall. > ... Yep. AUTOLOADed it. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From saldroubi at yahoo.com Fri Mar 17 15:32:44 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Fri, 17 Mar 2006 12:32:44 -0800 (PST) Subject: [Bioperl-l] Correlation coefficient? Message-ID: <20060317203244.71901.qmail@web34302.mail.mud.yahoo.com> Hello everyone, I need to determine the correlation coefficient between two data sets. Is this implemented in bioperl or some perl module I can use? This would save me time from writing it myself. Thank you. Sincerely, Sam Al-Droubi, M.S. saldroubi at yahoo.com From saldroubi at gmail.com Fri Mar 17 13:30:29 2006 From: saldroubi at gmail.com (Sam Al-Droubi) Date: Fri, 17 Mar 2006 13:30:29 -0500 Subject: [Bioperl-l] Correlation coefficient? Message-ID: Hello everyone, I need to determine the correlation coefficient between two data sets. Is this implemented in bioperl or some perl module I can use? This would save me time from writing it myself. Thank you. -- Sincerely, Sam Al-Droubi, M.S. From torsten.seemann at infotech.monash.edu.au Fri Mar 17 16:13:52 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sat, 18 Mar 2006 08:13:52 +1100 Subject: [Bioperl-l] Finding all human paralogues In-Reply-To: <200603171541.04313.y.itan@ucl.ac.uk> References: <200603141601.11779.y.itan@ucl.ac.uk> <200603161738.19580.y.itan@ucl.ac.uk> <1142541455.6306.12.camel@chauvel.csse.monash.edu.au> <200603171541.04313.y.itan@ucl.ac.uk> Message-ID: <441B2690.8030801@infotech.monash.edu.au> > Many thanks for your good help. I have followed your advice, and updated the > path to the fasta file, both in BLASTDB, BLASTDATADIR and in the code. But > even now it doesn't work. 
The directory of my fasta file
> is: /home/Yuval/FastaDBs and the one of the Blast binaries
> is: /home/Yuval/Applications/blast/bin
>
> And I keep getting the same error message:
>
> [blastall] WARNING: test: Unable to open Homo_sapiens.NCBI354.feb.pep.fa.pin
>
> ------------- EXCEPTION -------------
> MSG: blastall call crashed: 256 /home/Yuval/Applications/blast/bin/blastall
> -pblastp -d /home/Yuval/FastaDBs/Homo_sapiens.NCBI354.feb.pep.fa

(I assume there was a space between -p and blastp)

> -i /tmp/98fwqIYOMi -o /tmp/vlRcHscPcH

This doesn't look like a BioPerl problem anymore. It says it can't find the
blast index files for Homo_sapiens.NCBI354.feb.pep.fa. But the "-d xxxx"
line is trying to load them from the right place.

I assume you have run formatdb to create the indices, and they are in
/home/Yuval/FastaDBs/, and that there were no errors?

% cd /home/Yuval/FastaDBs/
% formatdb -i Homo_sapiens.NCBI354.feb.pep.fa -p T -o T
% cat formatdb.log
% ls -lsa Homo_sapiens.NCBI354.feb.pep.fa.*

and that this last "ls" lists about seven index files called
Homo_sapiens.NCBI354.feb.pep.fa.pin, .psq, .p?? etc.

And they are READABLE by the user who is running Blast (hopefully "Yuval")?

If so, can you run blast on the command line?

% blastall -i SOME_PROTEIN.FA -d /home/Yuval/FastaDBs/Homo_sapiens.NCBI354.feb.pep.fa -p blastp

BioPerl will probably never run your blast if you can't get that working.
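
The index-file check above can also be done from Perl before calling
blastall. This is a minimal sketch, assuming a protein database that
formatdb wrote as the three basic index files (.phr, .pin, .psq); large
databases split into volumes are named differently and are not handled
here. missing_indices is a hypothetical helper, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper: return the basic protein index files that are
# missing or unreadable for the database path given to -d.
sub missing_indices {
    my ($db) = @_;
    return grep { !( -f $_ && -r $_ ) } map { "$db.$_" } qw(phr pin psq);
}

my $db      = '/home/Yuval/FastaDBs/Homo_sapiens.NCBI354.feb.pep.fa';
my @missing = missing_indices($db);

if (@missing) {
    print "Not found or not readable:\n";
    print "  $_\n" for @missing;
}
else {
    print "All basic index files are present for $db\n";
}
```

Run it as the same user that runs Blast, since the readability test
depends on who is asking.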
-- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ Phone: +61 3 9905 9010 From y.itan at ucl.ac.uk Fri Mar 17 10:41:04 2006 From: y.itan at ucl.ac.uk (Yuval Itan) Date: Fri, 17 Mar 2006 15:41:04 +0000 Subject: [Bioperl-l] Finding all human paralogues In-Reply-To: <1142541455.6306.12.camel@chauvel.csse.monash.edu.au> References: <200603141601.11779.y.itan@ucl.ac.uk> <200603161738.19580.y.itan@ucl.ac.uk> <1142541455.6306.12.camel@chauvel.csse.monash.edu.au> Message-ID: <200603171541.04313.y.itan@ucl.ac.uk> Dear Torsten, Many thanks for your good help. I have followed your advice, and updated the path to the fasta file, both in BLASTDB, BLASTDATADIR and in the code. But even now it doesn't work. The directory of my fasta file is: /home/Yuval/FastaDBs and the one of the Blast binaries is: /home/Yuval/Applications/blast/bin And I keep getting the same error message: [blastall] WARNING: test: Unable to open Homo_sapiens.NCBI354.feb.pep.fa.pin ------------- EXCEPTION ------------- MSG: blastall call crashed: 256 /home/Yuval/Applications/blast/bin/blastall -pblastp -d /home/Yuval/FastaDBs/Homo_sapiens.NCBI354.feb.pep.fa -i /tmp/98fwqIYOMi -o /tmp/vlRcHscPcH STACK Bio::Tools::Run::StandAloneBlast::_runblast /home/Yuval/bioperl-live/Bio/Tools/Run/StandAloneBlast.pm:633 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /home/Yuval/bioperl-live/Bio/Tools/Run/StandAloneBlast.pm:602 STACK Bio::Tools::Run::StandAloneBlast::blastall /home/Yuval/bioperl-live/Bio/Tools/Run/StandAloneBlast.pm:489 STACK toplevel test6:74 -------------------------------------- The programme runs well until the last line (the 4th here) which causes the crash: my @params = (program => 'blastp', database => '/home/Yuval/FastaDBs/Homo_sapiens.NCBI354.feb.pep.fa'); my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); my $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>$seq1); my $report_obj = 
$blast_obj->blastall($seq_obj); #my $result_obj = $report_obj->next_result; #print $result_obj->num_hits; Many thanks for any clue. I truly appreciate your time and effort. Yuval On Thursday 16 March 2006 20:37, you wrote: > Yuval, > > > I have followed your suggestions, and would be truly grateful for one > > more hint. > > I have updated the $PATH to include the directory of my Blast binaries. I > > have also made a .ncbirc file which includes the path to the fasta file I > > have downloaded (all human peptides): > > [NCBI] > > Data="/home/Yuval/Applications/blast/data/" > > This directory is "is the path to the location of the Standalone BLAST > 'data' subdirectory" ie. the one that comes with the Blast binaries and > has the BLOSUMxx and PAMxx files. It is NOT (normallly) the directory to > put your fasta files in. > > > But when I am trying to run one sequence against the human peptides fasta > > file, I get the following error message: > > > > linux:/home/Yuval # perl test5 > > [blastall] WARNING: test: Unable to open > > Homo_sapiens.NCBI354.feb.pep.fa.pin > > It can't find the blast indices for Homo_sapiens.NCBI354.feb.pep.fa > The .pin file is one of the index files created by the 'formatdb' > program. > > > ------------- EXCEPTION ------------- > > MSG: blastall call crashed: 256 > > /home/Yuval/Applications/blast/bin/blastall -p blastp -d > > /Homo_sapiens.NCBI354.feb.pep.fa -i /tmp/nvkaUCJjDq -o /tmp/klP53IHZKj > > The "-d /Homo_sapiens.NCBI354.feb.pep.fa" means it is looking in your > computer's root directory "/" for the .pin (and other index) files. > > > I would be grateful for any hint. These are the relevant lines of the > > code: > > > > my $i = 0; #for the first sequence in human database > > my $seq1 = $human_genes->[$i]->get_longest_peptide_Member()->sequence; > > my @params = (program => 'blastp', database => > > 'Homo_sapiens.NCBI354.feb.pep.fa', _READMETHOD => 'SearchIO' ); > > For some reason it is looking in "/" for your index files. 
Do you have > environmental variable $BLASTDB or $BLASTDATADIR set to "/" ? > > Three solutions: > 1. Set "database => /full/path/to/Homo.pep.fa" > 2. Set BLASTDB to /full/path/to > 3. Set BLASTDATADIR to /full/path/to > > Options 2. and 3. can be done in your shell/environment or set in a > Perl BEGIN block. ie. BEGIN { $ENV{BLASTDB} = '/......'; } > > Also, please remove the _READMETHOD, you don't need it. SearchIO is the > defauly and used by nearly everybody. > > > my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > > my $seq_obj = Bio::Seq->new(-id =>"test query", -seq =>$seq1); > > my $report_obj = $blast_obj->blastall($seq_obj); > > my $result_obj = $report_obj->next_result; > > print $result_obj->num_hits; > > Rest looks ok. From cuiw at mail.nih.gov Fri Mar 17 16:44:51 2006 From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F]) Date: Fri, 17 Mar 2006 16:44:51 -0500 Subject: [Bioperl-l] Correlation coefficient? Message-ID: Statistics::Basic::Correlation; -----Original Message----- From: Sam Al-Droubi [mailto:saldroubi at gmail.com] Sent: Friday, March 17, 2006 1:30 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Correlation coefficient? Hello everyone, I need to determine the correlation coefficient between two data sets. Is this implemented in bioperl or some perl module I can use? This would save me time from writing it myself. Thank you. -- Sincerely, Sam Al-Droubi, M.S. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Fri Mar 17 16:59:49 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 17 Mar 2006 16:59:49 -0500 Subject: [Bioperl-l] Correlation coefficient? 
In-Reply-To: <20060317203244.71901.qmail@web34302.mail.mud.yahoo.com> References: <20060317203244.71901.qmail@web34302.mail.mud.yahoo.com> Message-ID: there are a lot of other perl modules besides bioperl, have a look around at cpan. (www.cpan.org or search.cpan.org) see Statistics::Basic I think. http://search.cpan.org/~jettero/Statistics-Basic-0.42/ I also use R for my stats work so you can shell out and run a quick R program too. On Mar 17, 2006, at 3:32 PM, Sam Al-Droubi wrote: > Hello everyone, > > I need to determine the correlation coefficient between two data > sets. Is this implemented in bioperl or some perl module I can > use? This would save me time from writing it myself. > > Thank you. > > > > Sincerely, > Sam Al-Droubi, M.S. > saldroubi at yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From skirov at utk.edu Fri Mar 17 17:32:26 2006 From: skirov at utk.edu (Stefan Kirov) Date: Fri, 17 Mar 2006 17:32:26 -0500 Subject: [Bioperl-l] Parsing entrezgene with Bio::SeqIO In-Reply-To: <200603161614.24820.koski@cenix-bioscience.com> References: <200603161614.24820.koski@cenix-bioscience.com> Message-ID: <441B38FA.6000707@utk.edu> OK, done now. Update to bioperl-live and use optional_id to get the text. Let me know how it goes. Stefan Liisa Koski wrote: >Hi, >I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz). > >I'm using bioperl-1.5.1. > >I want to extract the KEGG annotations. >See code below. 
> >use Bio::SeqIO; >use Bio::ASN1::EntrezGene; > >my $seqio = Bio::SeqIO->new(-format => 'entrezgene', > -file => 'Homo_sapiens'); >while (my $gene = $seqio->next_seq){ > print "\n",$gene->id, "\t", $gene->accession_number, "\n"; > my $ann = $gene->annotation(); > foreach my $key ( $ann->get_all_annotation_keys() ) { > my @values = $ann->get_Annotations($key); > foreach my $value ( @values ) { > print $key, "\t", "=", "\t", $value->as_text,"\n"; > } > } >} > >Unfortunately the only KEGG annotation I see in the results looks like: >dblink = Direct database link to in database KEGG >(Notice the space between 'to in') > >Anyone have any ideas how to get the KEGG annotation results? > >Note: I also tried parsing the file >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz >but I got the below error: > >./entrez_gene_seqio.pl Homo_sapiens.ags >Data Error: none conforming data found on line 1 in Homo_sapiens.ags! >first 20 (or till end of input) characters including the non-conforming data: >00 > at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm >line 138 > > >Thanks, >Liisa > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > - From wgallin at ualberta.ca Fri Mar 17 17:48:33 2006 From: wgallin at ualberta.ca (Warren Gallin) Date: Fri, 17 Mar 2006 15:48:33 -0700 Subject: [Bioperl-l] Sporadic Failure to retrieve Sequence Objects from GenPept Message-ID: <897D05A2-F383-427E-BEF2-74A5DFD43D80@ualberta.ca> Hi, I am getting a sporadic error running a script. 
The code fragment that is failing is

foreach $gi (@ginumbers){
    $find_gi = $gi;
    $find_gi =~ s/gi(\d+)/"$1"/;
    $seq_object = $database -> get_Seq_by_id("$find_gi");
    $description = $seq_object -> desc; #line 54
    $species_object = $seq_object -> species;
    $species_name = $species_object -> binomial;

It is returning a message like this, although the gi number that it fails on varies from run to run.

-------------------- WARNING ---------------------
MSG: id ("73998177") does not exist
---------------------------------------------------
Can't call method "desc" on an undefined value at gi_to_name.pl line 54.

As far as I can tell, every once in a while the retrieval of $seq_object from the database (GenPept in this case) is failing. The gi numbers exist, and if I just run the script again it doesn't fail. This leads me to believe that there may be some kind of internal wait cycle that returns undefined if the $seq_object retrieval has not completed in a certain time.

In any case, is there some way to tweak a script to prevent this kind of problem?
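One possible way to make such a loop more tolerant, sketched in plain Perl only: strip the "gi" prefix without adding anything to the number, and retry a lookup that comes back undefined instead of calling methods on it. The helper names gi_to_id and fetch_with_retry are hypothetical, and the fetcher here is a stand-in closure; in the real script it would presumably wrap the $database->get_Seq_by_id call shown above.

```perl
use strict;
use warnings;

# Strip an optional "gi" prefix and return the bare digits,
# or undef if the input does not look like a gi number.
sub gi_to_id {
    my ($gi) = @_;
    return $gi =~ /^(?:gi)?(\d+)$/ ? $1 : undef;
}

# Call $fetch->($id) up to $tries times, sleeping $pause seconds between
# attempts, and return the first defined result. Exceptions thrown by the
# fetcher are trapped and treated like an undefined result.
sub fetch_with_retry {
    my ($fetch, $id, $tries, $pause) = @_;
    for my $attempt (1 .. $tries) {
        my $obj = eval { $fetch->($id) };
        return $obj if defined $obj;
        sleep $pause if $pause && $attempt < $tries;
    }
    return;
}

# Stand-in fetcher that fails on its first call, to exercise the retry path.
my $calls = 0;
my $flaky = sub { my ($id) = @_; return ++$calls < 2 ? undef : "record $id" };

my $id  = gi_to_id('gi73998177');
my $rec = fetch_with_retry($flaky, $id, 3, 0);
print defined $rec ? "$rec after $calls calls\n" : "gave up on $id\n";
```

The caller still has to check the return value: if every attempt fails, the result is undefined and the record should be skipped or reported rather than dereferenced.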
Thanks, Warren Gallin From skirov at utk.edu Fri Mar 17 15:30:28 2006 From: skirov at utk.edu (Stefan Kirov) Date: Fri, 17 Mar 2006 15:30:28 -0500 Subject: [Bioperl-l] Parsing entrezgene with Bio::SeqIO In-Reply-To: <200603171010.57673.koski@cenix-bioscience.com> References: <200603161614.24820.koski@cenix-bioscience.com> <44199256.5050709@utk.edu> <200603171010.57673.koski@cenix-bioscience.com> Message-ID: <441B1C64.7020007@utk.edu> Liisa, This program: #!/usr/bin/perl use Bio::SeqIO; use Data::Dumper; my $file=shift; my $previd; my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene', -debug=>'on',-service_record=>'yes');#, -locuslink=>'convert'); while (1) { my ($seq,$struct,$uncapt,%transvar); eval { ($gene,$struct,$uncapt)=$eio->next_seq; }; if ($@) {print $@,"\n"; print $previd,"\n"; undef $previd; }; last unless ($gene); my $gid= $gene->accession_number; print "\n\nDBlinks for geneid: ",$gene->id, "\t", "acc: ", $gene->accession_number,"\n"; my $ann=$gene->annotation; my @dblinks= $ann->get_Annotations('dblink'); foreach my $dblink (@dblinks) { next unless ($dblink->database eq "KEGG"); print "primary_id:", "\t",$dblink->primary_id,"\n"; print "url:", "\t", $dblink->url, "\n"; print "as_text:", "\t", $dblink->as_text, "\n"; print "optional_id:","\t",$dblink->optional_id,"\n" ; print "comment:", "\t", $dblink->comment, "\n" ; print "object_id:", "\t", $dblink->object_id, "\n"; print "namespase:", "\t", $dblink->namespace, "\n" ; print "authority:", "\t", $dblink->authority, "\n" ; print "\nhash_tree\n"; my $hash_ref = $dblink->hash_tree; for my $key (keys %{$hash_ref}) { print $key,": ",$hash_ref->{$key},"\n"; } } } prints out this: DBlinks for geneid: A1BG acc: 1 primary_id: hsa:1 url: http://www.genome.jp/dbget-bin/www_bget?hsa:1 as_text: Direct database link to hsa:1 in database KEGG optional_id: comment: object_id: hsa:1 namespase: KEGG authority: hash_tree database: KEGG primary_id: hsa:1 DBlinks for geneid: A2M acc: 2 primary_id: hsa:2 url: 
http://www.genome.jp/dbget-bin/www_bget?hsa:2 as_text: Direct database link to hsa:2 in database KEGG optional_id: comment: object_id: hsa:2 namespase: KEGG authority: hash_tree database: KEGG primary_id: hsa:2 By the way I encourage you to use Data::Dumper if you want to print an object- it is much nicer. Just do print Dumper($obj); The text you refer to is not captured (but I have no ider why you don't see the primary id). You can recover it from the uncaptured data, but it is kind of dirty. I will try to make the parser capture this as well. Stefan Liisa Koski wrote: >Thanks Stefan, >Unfortunately I only parse out the URL and not the primary_id or any comments. > > > print "\n\nDBlinks for geneid: ",$gene->id, "\t", > "acc: ", $gene->accession_number,"\n"; > my @dblinks= $ann->get_Annotations('dblink'); > foreach my $dblink (@dblinks) { > next unless ($dblink->database eq "KEGG"); > print "primary_id:", "\t",$dblink->primary_id,"\n"; > print "url:", "\t", $dblink->url, "\n"; > print "as_text:", "\t", $dblink->as_text, "\n"; > print "optional_id:","\t",$dblink->optional_id,"\n" ; > print "comment:", "\t", $dblink->comment, "\n" ; > print "object_id:", "\t", $dblink->object_id, "\n"; > print "namespase:", "\t", $dblink->namespace, "\n" ; > print "authority:", "\t", $dblink->authority, "\n" ; > > print "\nhash_tree\n"; > my $hash_ref = $dblink->hash_tree; > for my $key (keys %{$hash_ref}) { > print $key,": ",$hash_ref->{$key},"\n"; > } > } > >Output: >------------------------------------ >DBlinks for geneid: ABAT acc: 18 >Use of uninitialized value in print at ./entrez_gene_seqio.pl line 42. >primary_id: >url: http://www.genome.jp/dbget-bin/www_bget?hsa:18 >Use of uninitialized value in concatenation (.) or string >at /netshare/home/koski/perl_modules/Bio/Annotation/DBLink.pm line 146. >as_text: Direct database link to in database KEGG >Use of uninitialized value in print at ./entrez_gene_seqio.pl line 48. 
>optional_id: >Use of uninitialized value in print at ./entrez_gene_seqio.pl line 50. >comment: >Use of uninitialized value in print at ./entrez_gene_seqio.pl line 52. >object_id: >namespase: KEGG >Use of uninitialized value in print at ./entrez_gene_seqio.pl line 56. >authority: > >printing hash_tree >database: KEGG >Use of uninitialized value in print at ./entrez_gene_seqio.pl line 62. >primary_id: >----------------------------------------- > >I see that on the gene page for ABAT (acc: 18) there are KEGG pathways: >KEGG pathway: Alanine and aspartate metabolism 00252 >KEGG pathway: Butanoate metabolism 00650 >KEGG pathway: Glutamate metabolism 00251 >KEGG pathway: Propanoate metabolism 00640 >KEGG pathway: Valine, leucine and isoleucine degradation 00280 >KEGG pathway: beta-Alanine metabolism 00410 > >Is it possible to pull out these pathway names? > >Thanks, >Liisa > > >On Thursday 16 March 2006 17:29, Stefan Kirov wrote: > > >>Do this: >>my @dblinks=$ann->get_Annotations('dblink'); >>foreach my $link (@dblinks) { >> next unless ($dblink->database eq 'KEGG"); >> print $dblink->primary_id,"\t",$dblink->url,"\n"; >>} >>This works for me, hopefully it will for you too. Let me know if >>something is not right. >>Stefan >> >>Liisa Koski wrote: >> >> >>>Hi, >>>I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from >>>ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz). >>> >>>I'm using bioperl-1.5.1. >>> >>>I want to extract the KEGG annotations. >>>See code below. 
>>> >>>use Bio::SeqIO; >>>use Bio::ASN1::EntrezGene; >>> >>>my $seqio = Bio::SeqIO->new(-format => 'entrezgene', >>> -file => 'Homo_sapiens'); >>>while (my $gene = $seqio->next_seq){ >>> print "\n",$gene->id, "\t", $gene->accession_number, "\n"; >>> my $ann = $gene->annotation(); >>> foreach my $key ( $ann->get_all_annotation_keys() ) { >>> my @values = $ann->get_Annotations($key); >>> foreach my $value ( @values ) { >>> print $key, "\t", "=", "\t", $value->as_text,"\n"; >>> } >>> } >>>} >>> >>>Unfortunately the only KEGG annotation I see in the results looks like: >>>dblink = Direct database link to in database KEGG >>>(Notice the space between 'to in') >>> >>>Anyone have any ideas how to get the KEGG annotation results? >>> >>>Note: I also tried parsing the file >>>ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags. >>>gz but I got the below error: >>> >>>./entrez_gene_seqio.pl Homo_sapiens.ags >>>Data Error: none conforming data found on line 1 in Homo_sapiens.ags! >>>first 20 (or till end of input) characters including the non-conforming >>>data: 00 >>>at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm >>>line 138 >>> >>> >>>Thanks, >>>Liisa >>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l at lists.open-bio.org >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > > From torsten.seemann at infotech.monash.edu.au Fri Mar 17 22:29:35 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sat, 18 Mar 2006 14:29:35 +1100 Subject: [Bioperl-l] Sporadic Failure to retrieve Sequence Objects from GenPept In-Reply-To: <897D05A2-F383-427E-BEF2-74A5DFD43D80@ualberta.ca> References: <897D05A2-F383-427E-BEF2-74A5DFD43D80@ualberta.ca> Message-ID: <441B7E9F.9010104@infotech.monash.edu.au> Warren, > I am getting a sporadic error running a script. 
The code fragment
> that is failing is
> foreach $gi (@ginumbers){
> $find_gi = $gi;

This line is the problem:

> $find_gi =~ s/gi(\d+)/"$1"/;

You are converting gi73998177 to "73998177", i.e. you are adding double quotes to it. Then you try to look up an id called "73998177" (with quotes!) rather than the integer alone:

> $seq_object = $database -> get_Seq_by_id("$find_gi");
> $description = $seq_object -> desc; #line 54
> $species_object = $seq_object -> species;
> $species_name = $species_object -> binomial;
>
> It is returning a message like this, although the gi number that it
> fails on varies from run to run.
> -------------------- WARNING ---------------------
> MSG: id ("73998177") does not exist

The confusing part here is that these double quotes are yours, not part of BioPerl's error message :-)

> ---------------------------------------------------
> Can't call method "desc" on an undefined value at gi_to_name.pl line 54.

# defensive programming, add this line
die "could not get_Seq_by_id($find_gi)" if not defined $seq_object;

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010

From r93626012 at ntu.edu.tw Sun Mar 19 09:39:46 2006
From: r93626012 at ntu.edu.tw (r93626012 at ntu.edu.tw)
Date: Sun, 19 Mar 2006 22:39:46 +0800
Subject: [Bioperl-l] Hi! everyone
Message-ID: <20060319223946.kl3c0eds2scsw00g@wmail9.cc.ntu.edu.tw>

Hi! I am a graduate student of National Taiwan University. My name is scott. I had some problems when using a bioperl module. I wanted to get a sequence from the "swissprot" database and blast it against the nr database of NCBI automatically with a bioperl module.
This is my program: #!/usr/bin/perl -w use strict; use Bio::Perl; my $seq_object = get_sequence('swissport',"ROA1_HUMAN"); my $blast_result = blast_sequence($seq_object); write_blast(">roa1.blast",$blast_result); exit; then i could get the sequence successfully, however, it could not produce any results of blast. The error message as follow: ------------- EXCEPTION ------------- MSG: WebDBSeqI Request Error: 501 Protocol scheme 'http' is not supported Content-Type: text/plain Client-Date: Sun, 19 Mar 2006 06:25:06 GMT Client-Warning: Internal response 501 Protocol scheme 'http' is not supported STACK Bio::DB::WebDBSeqI::_stream_request /usr/lib/perl5/5.8.3/Bio/DB/WebDBSeqI.pm:728 STACK Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/5.8.3/Bio/DB/WebDBSeqI.pm:460 STACK Bio::DB::WebDBSeqI::get_Stream_by_id /usr/lib/perl5/5.8.3/Bio/DB/WebDBSeqI.pm:287 STACK Bio::DB::WebDBSeqI::get_Seq_by_id /usr/lib/perl5/5.8.3/Bio/DB/WebDBSeqI.pm:153 STACK Bio::Perl::get_sequence /usr/lib/perl5/5.8.3/Bio/Perl.pm:511 STACK toplevel test2.txt:6 -------------------------------------- -------------------- WARNING --------------------- MSG: id (ROA1_HUMAN) does not exist --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/RemoteBlast.pm line 392. -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4 Content-Length: 178 Content-Type: application/x-www-form-urlencoded DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3Eblast-sequence-temp-id+%0A&EXPECT=1e-10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastp An Error Occurred

501 Protocol scheme 'http' is not supported --------------------------------------------------- Submitted Blast for [blast-sequence-temp-id] i had copy the "http" module and "HTML" module from other sever, but it still tell me that it could not find anything in the folder. Perhaps you can tell me what thing wrong or just i am missing something. Thank you for your reading and i will very glad of get your answers. From cjfields at uiuc.edu Sun Mar 19 19:10:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 19 Mar 2006 18:10:12 -0600 Subject: [Bioperl-l] Hi! everyone In-Reply-To: <20060319223946.kl3c0eds2scsw00g@wmail9.cc.ntu.edu.tw> References: <20060319223946.kl3c0eds2scsw00g@wmail9.cc.ntu.edu.tw> Message-ID: <5132CD57-514B-4DD7-9DFD-261B66E9CE55@uiuc.edu> On Mar 19, 2006, at 8:39 AM, r93626012 at ntu.edu.tw wrote: > Hi!I am a graduate student of National Taiwan University. My name > is scott. I had some probelms when i using bioperl module. I wanted > to get a sequence from "swissprot" database and blast with nr > database of NCBI automatically by bioperl module. This is my program: > > #!/usr/bin/perl -w > use strict; > use Bio::Perl; > > my $seq_object = get_sequence('swissport',"ROA1_HUMAN"); 'swissport' should be 'swissprot', though I get this script to work when spelled either way (well, I get some BLAST results back). This error was, I believe, fixed in a recent version of Bioperl; I would upgrade from CVS. However, the major problem I see is that the formatting from Bioperl is completely messed up now; it looks like the entire BLAST report is globbed together at the 'Query=' line. I'm sure this is an issue related to the recent blast changes at NCBI. I'll look into this tomorrow. Chris > > my $blast_result = blast_sequence($seq_object); > > write_blast(">roa1.blast",$blast_result); > > exit; > > then i could get the sequence successfully, however, it could not > produce any results of blast. 
The error message as follow: > > ------------- EXCEPTION ------------- > MSG: WebDBSeqI Request Error: > 501 Protocol scheme 'http' is not supported > Content-Type: text/plain > Client-Date: Sun, 19 Mar 2006 06:25:06 GMT > Client-Warning: Internal response > > 501 Protocol scheme 'http' is not supported > > STACK Bio::DB::WebDBSeqI::_stream_request /usr/lib/perl5/5.8.3/Bio/ > DB/WebDBSeqI.pm:728 > STACK Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/5.8.3/Bio/ > DB/WebDBSeqI.pm:460 > STACK Bio::DB::WebDBSeqI::get_Stream_by_id /usr/lib/perl5/5.8.3/Bio/ > DB/WebDBSeqI.pm:287 > STACK Bio::DB::WebDBSeqI::get_Seq_by_id /usr/lib/perl5/5.8.3/Bio/DB/ > WebDBSeqI.pm:153 > STACK Bio::Perl::get_sequence /usr/lib/perl5/5.8.3/Bio/Perl.pm:511 > STACK toplevel test2.txt:6 > > -------------------------------------- > > -------------------- WARNING --------------------- > MSG: id (ROA1_HUMAN) does not exist > --------------------------------------------------- > Use of uninitialized value in concatenation (.) or string at /usr/ > lib/perl5/site_perl/5.8.3/Bio/Tools/Run/RemoteBlast.pm line 392. > > -------------------- WARNING --------------------- > MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi > User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4 > Content-Length: 178 > Content-Type: application/x-www-form-urlencoded > > DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3Eblast- > sequence-temp-id+% > 0A&EXPECT=1e-10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L > &CDD_SEARCH=off&PROGRAM=blastp > > > An Error Occurred > >

> 501 Protocol scheme 'http' is not supported > > > > --------------------------------------------------- > Submitted Blast for [blast-sequence-temp-id] i had copy the "http" > module and "HTML" module from other sever, but it still tell me > that it could not find anything in the folder. Perhaps you can tell > me what thing wrong or just i am missing something. Thank you for > your reading and i will very glad of get your answers. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Mar 20 13:36:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 20 Mar 2006 12:36:37 -0600 Subject: [Bioperl-l] Hi! everyone In-Reply-To: <5132CD57-514B-4DD7-9DFD-261B66E9CE55@uiuc.edu> Message-ID: <000601c64c4d$39e32ca0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Sunday, March 19, 2006 6:10 PM > To: r93626012 at ntu.edu.tw > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] Hi! everyone > > > On Mar 19, 2006, at 8:39 AM, r93626012 at ntu.edu.tw wrote: > > > Hi!I am a graduate student of National Taiwan University. My name > > is scott. I had some probelms when i using bioperl module. I wanted > > to get a sequence from "swissprot" database and blast with nr > > database of NCBI automatically by bioperl module. This is my program: > > > > #!/usr/bin/perl -w > > use strict; > > use Bio::Perl; > > > > my $seq_object = get_sequence('swissport',"ROA1_HUMAN"); > > 'swissport' should be 'swissprot', though I get this script to work > when spelled either way (well, I get some BLAST results back). 
This > error was, I believe, fixed in a recent version of Bioperl; I would > upgrade from CVS. However, the major problem I see is that the > formatting from Bioperl is completely messed up now; it looks like > the entire BLAST report is globbed together at the 'Query=' line. > I'm sure this is an issue related to the recent blast changes at > NCBI. I'll look into this tomorrow. > > Chris > To follow up on this (for the mail-list in case anybody comes across this in the future): I get this to work without problems on Windows (which is where problems usually show up). I'll play around with it a bit more with Mac OS X to see if this may be an OS issue, but I haven't updated bioperl from CVS on my wife's IBook in quite a while so it could be an old version of bioperl mucking up the works. If this script isn't working for your bioperl version, try installing bioperl from CVS. My WinXP is running off bioperl from the latest CVS and it works fine. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From haralds_listen at gmx.de Mon Mar 20 15:14:21 2006 From: haralds_listen at gmx.de (Harald) Date: Mon, 20 Mar 2006 21:14:21 +0100 Subject: [Bioperl-l] how to "tile" the HSPs of a hit-object ? Message-ID: <441F0D1D.2060203@gmx.de> Hi all. I want to use Bioperl for doing some psi-blast postprocessing (under linux with bioperl 1.5 and perl 5.8.7). For doing so I would like to get for every hit-sequence its alignment with the query. So I dont want all those overlapping HSPs, but the one and only alignment with best score. I am reading in the documentation for that for some time and think, that "tiling" is what I want to do to each hit-object. As far as I have understood the documentation, calling Bio::Search::SearchUtils::tile_hsps($hit); (or calling $hit->ambiguous_aln(), which will call the aforementioned) should be suficient so that $hit will become tiled. Right? 
But if I run the following program, the ranges of the hsp-objects will still overlap :-( - no matter if I use tile_hsp($hit) or $hit->ambiguous_aln(). ================================ use strict; use Bio::Tools::Run::StandAloneBlast; my $report = new Bio::SearchIO('-file'=>'out.txt', '-fomat'=>'psiblast'); my $result = $report->next_result; my $iterat = $result->next_iteration; while( my $hit = $iterat->next_hit ) { $hit->overlap(0); # Bio::Search::SearchUtils::tile_hsps($hit); $hit->ambiguous_aln(); # while( my $hsp = $hit->next_hsp ) { my @q_range = $hit->range('query'); my @h_range = $hit->range('hit'); $, = " "; print @q_range,"\n"; print @h_range,"\n\n"; } print "-" x 5, "\n"; } ================================ Can anyone tell me where my problem lies? Regards and thanks in advance, Harald From y.itan at ucl.ac.uk Mon Mar 20 15:36:24 2006 From: y.itan at ucl.ac.uk (Yuval Itan) Date: Tue, 21 Mar 2006 07:36:24 +1100 Subject: [Bioperl-l] Finding all human paralogues In-Reply-To: <441B2690.8030801@infotech.monash.edu.au> References: <200603141601.11779.y.itan@ucl.ac.uk> <200603171541.04313.y.itan@ucl.ac.uk> <441B2690.8030801@infotech.monash.edu.au> Message-ID: <200603201506.38554.y.itan@ucl.ac.uk> Dear Torsten, I have solved the problem, I deeply thank you for your patience and great help. All the best, Yuval On Friday 17 March 2006 21:13, you wrote: > > Many thanks for your good help. I have followed your advice, and updated > > the path to the fasta file, both in BLASTDB, BLASTDATADIR and in the > > code. But even now it doesn't work. 
The directory of my fasta file > > is: /home/Yuval/FastaDBs and the one of the Blast binaries > > is: /home/Yuval/Applications/blast/bin > > > > And I keep getting the same error message: > > > > [blastall] WARNING: test: Unable to open > > Homo_sapiens.NCBI354.feb.pep.fa.pin > > > > ------------- EXCEPTION ------------- > > MSG: blastall call crashed: 256 > > /home/Yuval/Applications/blast/bin/blastall -pblastp -d > > /home/Yuval/FastaDBs/Homo_sapiens.NCBI354.feb.pep.fa > > (I assume there was a space between -p and blastp) > > > -i /tmp/98fwqIYOMi -o /tmp/vlRcHscPcH > > This doesn't look like a BioPerl problem anymore. It says it can't find the > blast index files for Homo_sapiens.NCBI354.feb.pep.fa. But the "-d xxxx" > line is trying to load them from the right place. I assume you have run > formatdb to create the indices, and they are in /home/Yuval/FastaDBs/, and > that there were no errors ? > > % cd /home/Yuval/FastaDBs/ > % formatdb -i Homo_sapiens.NCBI354.feb.pep.fa -p T -o T > % cat formatdb.log > % ls -lsa Homo_sapiens.NCBI354.feb.pep.fa.* > > and that this last "ls" lists about seven index files called > Homo_sapiens.NCBI354.feb.pep.fa.pin, .psq, .p?? etc > And they are READABLE by the user who is running Blast (hopefully "Yuval")? > > If so, can you run blast on the command line? > > % blastall -i SOME_PROTEIN.FA -d > /home/Yuval/FastaDBs/Homo_sapiens.NCBI354.feb.pep -p blastp > > BioPerl will probably never run your blast if you can't get that working. From MEC at stowers-institute.org Mon Mar 20 18:07:06 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Mon, 20 Mar 2006 17:07:06 -0600 Subject: [Bioperl-l] where to document dependency? AND new SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene Message-ID: Chris Dagdigian, Convert::Binary::C is now used by new Bio::SeqIO::strider module. I've added this to bioperl's ./MakeFile.PL and ./INSTALL. Methinks it could be added to Bundle::BioPerl. Would you like to do that? 
--Malcolm Cook >-----Original Message----- >From: Chris Fields [mailto:cjfields at uiuc.edu] >Sent: Friday, March 10, 2006 4:59 PM >To: Cook, Malcolm; 'Brian Osborne'; bioperl-l at lists.open-bio.org >Subject: RE: [Bioperl-l] where to document dependency? AND new >SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene > > >> -----Original Message----- >> From: Cook, Malcolm [mailto:MEC at stowers-institute.org] >> Sent: Friday, March 10, 2006 4:38 PM >> To: Chris Fields; Brian Osborne; bioperl-l at lists.open-bio.org >> Subject: RE: [Bioperl-l] where to document dependency? AND new SeqIO >> formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene >> >> Getting closer... >> >> So, I added it to the ./Makefile.PL and ./INSTALL with the >cvs comment >> 'added dependency on Convert::Binary::C needed by >Bio::SeqIO::strider' >> >> But, re the wiki, it looks to me like the contents of the >wiki page are >> (nearly) identical to the ./INSTALL. Is one autogenerated from the >> other, or do the both get editted? > >No, at least not at the moment. I suppose we could get it >into POD and use >pod2wiki. > >> Also, the only place I can think to add the dependency in the wiki >> content is to the list of modules installed by Bundle::CPAN. Am I >> missing something, or should I be considering adding >Convert::Binary::C >> to Bundle::CPAN as well? > >That's the place, though I think you mean Bundle::Bioperl. > >I'm not sure what you should do about including it with >Bundle::Bioperl. >Looks like Chris Dagdigian is the maintainer for that; his >email listed on >CPAN is dag at sonsorol.org, though I wouldn't be surprised if >it's out of >date. > >> >> Thanks, >> >> Malcolm >> > > > >Christopher Fields >Postdoctoral Researcher - Switzer Lab >Dept. 
of Biochemistry >University of Illinois Urbana-Champaign > > > > From Chris at bioteam.net Mon Mar 20 18:24:32 2006 From: Chris at bioteam.net (Chris Dagdigian) Date: Mon, 20 Mar 2006 18:24:32 -0500 Subject: [Bioperl-l] where to document dependency? AND new SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene In-Reply-To: References: Message-ID: Sure! Any other dependencies that should be added or removed? -Chris On Mar 20, 2006, at 6:07 PM, Cook, Malcolm wrote: > Chris Dagdigian, > > Convert::Binary::C is now used by new Bio::SeqIO::strider module. > I've > added this to bioperl's ./MakeFile.PL and ./INSTALL. Methinks it > could > be added to Bundle::BioPerl. Would you like to do that? > > --Malcolm Cook > >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at uiuc.edu] >> Sent: Friday, March 10, 2006 4:59 PM >> To: Cook, Malcolm; 'Brian Osborne'; bioperl-l at lists.open-bio.org >> Subject: RE: [Bioperl-l] where to document dependency? AND new >> SeqIO formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene >> >> >>> -----Original Message----- >>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org] >>> Sent: Friday, March 10, 2006 4:38 PM >>> To: Chris Fields; Brian Osborne; bioperl-l at lists.open-bio.org >>> Subject: RE: [Bioperl-l] where to document dependency? AND new SeqIO >>> formats: Bio::SeqIO::strider and Bio::SeqIO::lasergene >>> >>> Getting closer... >>> >>> So, I added it to the ./Makefile.PL and ./INSTALL with the >> cvs comment >>> 'added dependency on Convert::Binary::C needed by >> Bio::SeqIO::strider' >>> >>> But, re the wiki, it looks to me like the contents of the >> wiki page are >>> (nearly) identical to the ./INSTALL. Is one autogenerated from the >>> other, or do the both get editted? >> >> No, at least not at the moment. I suppose we could get it >> into POD and use >> pod2wiki. 
>>
>>> Also, the only place I can think to add the dependency in the wiki
>>> content is to the list of modules installed by Bundle::CPAN. Am I
>>> missing something, or should I be considering adding Convert::Binary::C
>>> to Bundle::CPAN as well?
>>
>> That's the place, though I think you mean Bundle::Bioperl.
>>
>> I'm not sure what you should do about including it with Bundle::Bioperl.
>> Looks like Chris Dagdigian is the maintainer for that; his email listed on
>> CPAN is dag at sonsorol.org, though I wouldn't be surprised if it's out of
>> date.
>>
>>> Thanks,
>>>
>>> Malcolm
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign

From Steve_Chervitz at affymetrix.com Mon Mar 20 19:06:12 2006
From: Steve_Chervitz at affymetrix.com (Steve_Chervitz)
Date: Mon, 20 Mar 2006 16:06:12 -0800
Subject: [Bioperl-l] how to "tile" the HSPs of a hit-object ?
In-Reply-To: <441F0D1D.2060203@gmx.de>
References: <441F0D1D.2060203@gmx.de>
Message-ID:

Harald,

The SearchUtils::tile_hsps function does not do what you want it to do. It does not modify the contained HSP objects and tile them together (as perhaps its name suggests it might). Its goal is to collect summary statistics over all of the HSPs within a hit object, so that when you call a method like $hit->length(), you get the sum of all HSP lengths, factoring out regions of overlap between HSPs. As such, it's intended for internal use by the GenericHit object.

I just checked in a new version of SearchUtils.pm in which tile_hsps() now returns the tiled HSP data it constructs (i.e., the merge-able contiguous stretches along query and subject). It still does not modify the contained HSPs (I wouldn't feel comfortable doing that, as it would invalidate the scoring data), but it might be sufficient for your needs.
You can now do something like this:

    my ($qcontigs, $scontigs) = Bio::Search::SearchUtils::tile_hsps($hit);

    if (ref $qcontigs) {
        print STDERR "Query contigs:\n";
        foreach (@{$qcontigs}) {
            print "contig start is $_->{'start'}\n";
            print "contig stop is $_->{'stop'}\n";
            print "contig identical residues count is $_->{'iden'}\n";
            print "contig conserved residues count is $_->{'cons'}\n";
        }
    }

You can get my new version of SearchUtils.pm via the bioperl CVS (I have also attached it to this message for convenience).

Cheers,
Steve

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SearchUtils.pm
Type: text/x-perl-script
Size: 24089 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060320/dfc1353e/attachment.bin
-------------- next part --------------

On Mar 20, 2006, at 12:14 PM, Harald wrote:

> Hi all.
>
> I want to use Bioperl for doing some psi-blast postprocessing (under
> linux with bioperl 1.5 and perl 5.8.7).
>
> For doing so I would like to get for every hit-sequence its alignment
> with the query. So I don't want all those overlapping HSPs, but the one
> and only alignment with best score.
>
> I have been reading the documentation on this for some time and think
> that "tiling" is what I want to do to each hit-object.
>
> As far as I have understood the documentation, calling
> Bio::Search::SearchUtils::tile_hsps($hit); (or calling
> $hit->ambiguous_aln(), which will call the aforementioned)
> should be sufficient so that $hit will become tiled. Right?
>
> But if I run the following program, the ranges of the hsp-objects will
> still overlap :-( - no matter if I use tile_hsps($hit) or
> $hit->ambiguous_aln().
>
> ================================
> use strict;
> use Bio::Tools::Run::StandAloneBlast;
>
> my $report = new Bio::SearchIO('-file'=>'out.txt',
>                                '-fomat'=>'psiblast');
> my $result = $report->next_result;
> my $iterat = $result->next_iteration;
>
> while( my $hit = $iterat->next_hit )
> {
>     $hit->overlap(0);
>     # Bio::Search::SearchUtils::tile_hsps($hit);
>     $hit->ambiguous_aln(); #
>
>     while( my $hsp = $hit->next_hsp )
>     {
>         my @q_range = $hit->range('query');
>         my @h_range = $hit->range('hit');
>
>         $, = " ";
>         print @q_range,"\n";
>         print @h_range,"\n\n";
>     }
>     print "-" x 5, "\n";
> }
> ================================
>
> Can anyone tell me where my problem lies?
>
> Regards and thanks in advance,
> Harald

From Steve_Chervitz at affymetrix.com Mon Mar 20 19:35:18 2006
From: Steve_Chervitz at affymetrix.com (Steve_Chervitz)
Date: Mon, 20 Mar 2006 16:35:18 -0800
Subject: [Bioperl-l] Missing methods in SearchUtils and GenericHit
Message-ID: <9143C7D4-045C-4E7D-B6AD-427124D89E59@affymetrix.com>

My local copies of Bio::Search::SearchUtils and Bio::Search::Hit::GenericHit have some extra methods relative to what's in the CVS head. Looking at CVS logs, it doesn't appear they were removed, so it's likely they were new things I never committed to CVS:

Methods in Bio::Search::Hit::GenericHit:
    num_unknown_residues
    frac_aligned_query
    percent_conserved
    percent_identical

Methods in Bio::Search::SearchUtils:
    mol_type

Looks like I may have added them to provide additional data output via Bio::SearchIO::Writer::HitTableWriter (which is also modified on my local copy).

It's been a while since I worked on these. Can anyone remember adding/removing/modifying these methods? If not, I'll go ahead and commit them. They look like they could be generally useful.
Steve From haralds_listen at gmx.de Tue Mar 21 10:40:34 2006 From: haralds_listen at gmx.de (Harald) Date: Tue, 21 Mar 2006 16:40:34 +0100 Subject: [Bioperl-l] how to "tile" the HSPs of a hit-object ? In-Reply-To: References: <441F0D1D.2060203@gmx.de> Message-ID: <44201E72.6020308@gmx.de> Hi Steve. Thanks a lot for your help. But where is this contig-object documented? Besides, a small question about blast: Does the computation of the e-value (of a hit) take all HSPs into account or only a "best alignment"? Regards and thanks a lot, Harald From cjfields at uiuc.edu Tue Mar 21 11:35:03 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 21 Mar 2006 10:35:03 -0600 Subject: [Bioperl-l] Hi! everyone In-Reply-To: <20060322002243.3yl4fsc7400sckog@wmail9.cc.ntu.edu.tw> Message-ID: <000f01c64d05$67cc1100$15327e82@pyrimidine> > -----Original Message----- > From: r93626012 at ntu.edu.tw [mailto:r93626012 at ntu.edu.tw] > Sent: Tuesday, March 21, 2006 10:23 AM > To: Chris Fields > Subject: RE: [Bioperl-l] Hi! everyone > > Quoting Chris Fields : > > Sorry! I forget to tell you that i ran the program on Suse Linux, and > my bioperl version is 1.4. Is it still has to update or it should work > without problems on this version? thank you very much! Please respond to the list as well so that anyone following this thread might find a resolution to this issue. In short, yes, update to the newest version of bioperl (1.5.1). Although it is a developer version many changes have been added (including critical ones to SearchIO, RemoteBlast, and SeqIO). I believe that bioperl 1.4 is about two years old now; I'm not sure when the push will begin for the next stable release. Anyway, try v 1.5.1 first since it should be more stable than a CVS bleeding edge version. If that doesn't help try installing from CVS. BTW, haven't tried this script on OS X yet with the latest CVS. We have a bit of a snowstorm slowing us down up here. Should be able to get to that tonight or tomorrow. 
Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields > >> Sent: Sunday, March 19, 2006 6:10 PM > >> To: r93626012 at ntu.edu.tw > >> Cc: bioperl-l at bioperl.org > >> Subject: Re: [Bioperl-l] Hi! everyone > >> > >> > >> On Mar 19, 2006, at 8:39 AM, r93626012 at ntu.edu.tw wrote: > >> > >> > Hi!I am a graduate student of National Taiwan University. My name > >> > is scott. I had some probelms when i using bioperl module. I wanted > >> > to get a sequence from "swissprot" database and blast with nr > >> > database of NCBI automatically by bioperl module. This is my program: > >> > > >> > #!/usr/bin/perl -w > >> > use strict; > >> > use Bio::Perl; > >> > > >> > my $seq_object = get_sequence('swissport',"ROA1_HUMAN"); > >> > >> 'swissport' should be 'swissprot', though I get this script to work > >> when spelled either way (well, I get some BLAST results back). This > >> error was, I believe, fixed in a recent version of Bioperl; I would > >> upgrade from CVS. However, the major problem I see is that the > >> formatting from Bioperl is completely messed up now; it looks like > >> the entire BLAST report is globbed together at the 'Query=' line. > >> I'm sure this is an issue related to the recent blast changes at > >> NCBI. I'll look into this tomorrow. > >> > >> Chris > >> > > > > To follow up on this (for the mail-list in case anybody comes across > this in > > the future): > > > > I get this to work without problems on Windows (which is where problems > > usually show up). I'll play around with it a bit more with Mac OS X to > see > > if this may be an OS issue, but I haven't updated bioperl from CVS on my > > wife's IBook in quite a while so it could be an old version of bioperl > > mucking up the works. 
> > > > If this script isn't working for your bioperl version, try installing > > bioperl from CVS. My WinXP is running off bioperl from the latest CVS > and > > it works fine. > > > > Chris > > > > > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > From muratem at eng.uah.edu Tue Mar 21 12:09:05 2006 From: muratem at eng.uah.edu (Mike Muratet) Date: Tue, 21 Mar 2006 11:09:05 -0600 (CST) Subject: [Bioperl-l] Bio::Registry & Bio::DB::Flat Message-ID: Greetings I am trying to get index/fetch some fasta files with Bio::Registry and Bio::DB::Flat. I created a seqdatabase.ini in $HOME/.bioinformatics containing: VERSION=1.00 [fungal_genomes] protocol=flat location=/opt/mmuratet/fungal_genomes dbname=fungal_genomes I used the release script bp_bioflat_index.pl to index the files in /opt/mmuratet/fungal_genomes and there is a binary key_acc file and a config file whose first few lines are: index BerkeleyDB/1 format URN:LSID:open-bio.org:fasta/dna fileid_11 /opt/mmuratet/fungal_genomes/fusarium_verticillioides_2.fasta 42688995 I couldn't get my own script to work, so I tried the bp_biogetseq.pl script: bp_biogetseq.pl --dbname fungal_genomes --format fasta \ --namespace acc 'Aspergillus nidulans supercontig 1.5' -------------------- WARNING --------------------- MSG: Couldn't call new_from_registry on [Bio::DB::Flat] ------------- EXCEPTION ------------- MSG: you must specify an indexing scheme STACK Bio::DB::Flat::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Flat.pm:163 STACK Bio::DB::Flat::new_from_registry /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Flat.pm:255 STACK (eval) /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Registry.pm:183 STACK Bio::DB::Registry::_load_registry /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Registry.pm:182 STACK Bio::DB::Registry::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Registry.pm:95 STACK toplevel /usr/bin/bp_biogetseq.pl:28 -------------------------------------- 
--------------------------------------------------- Could not find sequence with identifier [Aspergillus nidulans supercontig 1.5] Can anyone see anything I've done wrong? It all seems in agreement with the documentation. I found a few old threads but I can't see where it ever got resolved. I have set the environment variables OBDA_INDEX=bdb and OBDA_LOCATION=/opt/mmuratet/fungal_genomes. Thanks Mike From khoueiry at ibdm.univ-mrs.fr Tue Mar 21 13:21:21 2006 From: khoueiry at ibdm.univ-mrs.fr (khoueiry) Date: Tue, 21 Mar 2006 19:21:21 +0100 Subject: [Bioperl-l] Bio::Registry & Bio::DB::Flat In-Reply-To: References: Message-ID: <1142965281.22801.19.camel@localhost> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060321/40c65cf2/attachment.pl -------------- next part -------------- A non-text attachment was scrubbed... Name: indexingFastaFile.pl Type: application/x-perl Size: 2813 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060321/40c65cf2/attachment.bin From Steve_Chervitz at affymetrix.com Tue Mar 21 14:15:50 2006 From: Steve_Chervitz at affymetrix.com (Steve_Chervitz) Date: Tue, 21 Mar 2006 11:15:50 -0800 Subject: [Bioperl-l] how to "tile" the HSPs of a hit-object ? In-Reply-To: <44201E72.6020308@gmx.de> References: <441F0D1D.2060203@gmx.de> <44201E72.6020308@gmx.de> Message-ID: <51BCEF1E-3417-4976-8DD0-FF8B543B7F5E@affymetrix.com> Harald, On Mar 21, 2006, at 7:40 AM, Harald wrote: > Hi Steve. > > Thanks a lot for your help. But where is this contig-object > documented? See the 'Returns' section of the method POD in my new version of SearchUtils.pm. A contig is just a simple hash reference with 6 or so keys. I don't think we need a full-blown object for it yet. Maybe we could provide access to it via the GenericHit object, so you don't have to call SearchUtils directly, and documentation would be easier to find. 
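
To make the "simple hash reference" point concrete, here is a minimal sketch of working with such contigs in plain Perl. It is an illustration only, not code from SearchUtils.pm: the data values are invented, the helper names covered_length and sum_key are hypothetical, only the four keys shown in the earlier example (start, stop, iden, cons) are used, and the start/stop coordinates are assumed to be inclusive.

```perl
use strict;
use warnings;

# Hypothetical contig data shaped like the hash references described above.
# The values are made up; a real list would come from tile_hsps().
my @qcontigs = (
    { start => 10,  stop => 59,  iden => 40, cons => 45 },
    { start => 100, stop => 149, iden => 35, cons => 42 },
);

# Sum the residues covered by all contigs
# (assumes inclusive start/stop coordinates).
sub covered_length {
    my ($contigs) = @_;
    my $total = 0;
    $total += $_->{stop} - $_->{start} + 1 for @{$contigs};
    return $total;
}

# Sum one of the per-contig counters, e.g. 'iden' or 'cons'.
sub sum_key {
    my ($contigs, $key) = @_;
    my $total = 0;
    $total += $_->{$key} for @{$contigs};
    return $total;
}

my $len  = covered_length(\@qcontigs);
my $iden = sum_key(\@qcontigs, 'iden');
printf "covered length: %d\n", $len;
printf "fraction identical: %.2f\n", $iden / $len;
```

Because each contig is an ordinary hash, summaries like these need nothing beyond core Perl; whether the real contigs use inclusive coordinates should be checked against the 'Returns' section of the method POD.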
> Besides, a small question about blast: Does the computation of the
> e-value (of a hit) take all HSPs into account or only a "best
> alignment"?

The hit's e-value is for the top HSP, whose e-value could be based on either a single alignment or a group of alignments. If you see something like 'Expect(2)' in an NCBI blast report, this implies that sum statistics were applied to two HSPs to compute the expect score. Similarly for P(n) in WU-blast. This is only relevant for ungapped blast.

You can get the Expect(n) or P(n) number from the hit object or any HSP object by calling the n() method. If you call n() on a hit from a gapped blast, you'll get the total number of HSPs.

Steve

From muratem at eng.uah.edu Tue Mar 21 16:39:48 2006
From: muratem at eng.uah.edu (Mike Muratet)
Date: Tue, 21 Mar 2006 15:39:48 -0600 (CST)
Subject: [Bioperl-l] Bio::Registry & Bio::DB::Flat
In-Reply-To: <1142965281.22801.19.camel@localhost>
References: <1142965281.22801.19.camel@localhost>
Message-ID:

On Tue, 21 Mar 2006, khoueiry wrote:

> hi Mike,
>
> If I understood your question correctly, you want to index a fasta file
> so that you can fetch from it. For that purpose, I already wrote a perl
> script (see attachment). It is a very simple script; run it without any
> parameters to get a description of how to use it. The script will create
> an index file that you can fetch sequences from:
>
> e.g., once you create your index file (myfile.idx) with the attached
> script, you can fetch any sequence (id) by:
>
> my $indexFile = Bio::Index::Fasta->new('myfile.idx');
> my $seq = $indexFile->fetch($id);
>
> Hope this helps
>
> Pierre

Pierre

Thanks for the script, I appreciate it. I was (am) trying to get the OBDA Bio::Registry to work for the platform independence it provides.

I have had good success with biosql & load_seqdatabase, too, which you might want to try. You generally have to write a little module from the SeqProcessor class to handle FASTA IDs, but it's really nice to have things in MySQL.
Cheers

Mike

From muratem at eng.uah.edu Tue Mar 21 17:03:30 2006
From: muratem at eng.uah.edu (Mike Muratet)
Date: Tue, 21 Mar 2006 16:03:30 -0600 (CST)
Subject: [Bioperl-l] Bio::Registry & Bio::DB::Flat
In-Reply-To:
References:
Message-ID:

On Tue, 21 Mar 2006, Brian Osborne wrote:

> Mike,
>
> Are you sure that the "location" and "dbname" are right? Usually these will
> be different - from the OBDA HOWTO:
>
> Once the flat database is created you can configure your seqdatabase.ini
> file. Let's say that you have used the bioflat_index.pl script to create
> the flat database and a new directory called ppp has been created in your
> /home/sally/bioinf/ directory (and the ppp/ directory contains the
> config.dat file).
>
> This refers to this Registry entry:
>
> protocol=flat
> location=/home/sally/bioinf
> dbname=ppp
>
> Perhaps the location should be "/opt/muratet"? You may have to change
> OBDA_LOCATION as well.
>
> Brian O.

Brian

Thanks, I think I've got it sorted out now. I changed things around to be closer to the situation in the HOWTO. I changed the root to /opt/mmuratet and the dbname to harry. I reran bp_bioflat_index.pl with the new parameters and changed seqdatabase.ini to:

VERSION=1.00

[fungal_genomes]
protocol=flat
location=/opt/mmuratet
dbname=harry

It created a new directory, harry, under the root /opt/mmuratet, which has the appropriate entries. I set OBDA_LOCATION=/opt/mmuratet. Then I changed the command, bp_biogetseq.pl --dbname fungal_genomes etc etc, and it works.

Thanks for the help.

Mike

> On 3/21/06 12:09 PM, "Mike Muratet" wrote:
>
>> Greetings
>>
>> I am trying to get index/fetch some fasta files with Bio::Registry and
>> Bio::DB::Flat.
I created a seqdatabase.ini in $HOME/.bioinformatics >> containing: >> >> VERSION=1.00 >> >> [fungal_genomes] >> protocol=flat >> location=/opt/mmuratet/fungal_genomes >> dbname=fungal_genomes >> >> I used the release script bp_bioflat_index.pl to index the files in >> /opt/mmuratet/fungal_genomes and there is a binary key_acc file and a >> config file whose first few lines are: >> >> index BerkeleyDB/1 >> format URN:LSID:open-bio.org:fasta/dna >> fileid_11 >> /opt/mmuratet/fungal_genomes/fusarium_verticillioides_2.fasta 42688995 >> >> I couldn't get my own script to work, so I tried the bp_biogetseq.pl >> script: >> >> bp_biogetseq.pl --dbname fungal_genomes --format fasta \ >> --namespace acc 'Aspergillus nidulans supercontig 1.5' >> >> -------------------- WARNING --------------------- >> MSG: Couldn't call new_from_registry on [Bio::DB::Flat] >> >> ------------- EXCEPTION ------------- >> MSG: you must specify an indexing scheme >> STACK Bio::DB::Flat::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Flat.pm:163 >> STACK Bio::DB::Flat::new_from_registry >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Flat.pm:255 >> STACK (eval) /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Registry.pm:183 >> STACK Bio::DB::Registry::_load_registry >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Registry.pm:182 >> STACK Bio::DB::Registry::new >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Registry.pm:95 >> STACK toplevel /usr/bin/bp_biogetseq.pl:28 >> >> -------------------------------------- >> >> --------------------------------------------------- >> Could not find sequence with identifier [Aspergillus nidulans supercontig >> 1.5] >> >> Can anyone see anything I've done wrong? It all seems in agreement with >> the documentation. I found a few old threads but I can't see where it ever >> got resolved. I have set the environment variables OBDA_INDEX=bdb and >> OBDA_LOCATION=/opt/mmuratet/fungal_genomes. 
>>
>> Thanks
>>
>> Mike

From lstein at cshl.edu Tue Mar 21 18:27:30 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 21 Mar 2006 18:27:30 -0500
Subject: [Bioperl-l] Request for comments: Bio::DB::GFF3 namespace
Message-ID: <200603211827.30785.lstein@cshl.edu>

Hi All,

I'm pretty much ready to check in the replacement for the Bio::DB::GFF database. What I ended up writing has only a remote relationship to gff3 files -- it is more like a general storage engine for Bio::SeqFeatureI objects. So I don't want to call the thing Bio::DB::GFF, but want to place it somewhere else in the namespace hierarchy.

Here are my current classes and what they do; the names are placeholders so don't get hung up on what they're called now:

Bio::SeqFeature::Store
    - implements the Bio::SeqFeature::CollectionI interface. You can
      store any Bio::SeqFeatureI into a database (mysql, berkeleydb,
      in-memory) and fetch it out using a variety of queries.

Bio::SeqFeature::Store::DBI
    - base class for DBI databases

Bio::SeqFeature::Store::DBI::mysql
    - mysql storage implementation

Bio::SeqFeature::Store::DBI::mysql::Iterator
    - helper class for the mysql adaptor. Implements
      (part of) the Bio::SeqIO interface

Bio::SeqFeature::Store::memory
    - in-memory storage implementation

Bio::SeqFeature::Store::bdb
    - Berkeley DB storage implementation

Bio::SeqFeature::Store::Cacher
    - An in-memory cache for the store; uses an LRU cache
      to avoid going to the database for frequently used
      objects. Also implements a scheme that lets features
      share the same in-memory subfeatures (like shared
      exons).

Bio::SeqFeature::LazyFeature
    - A Bio::SeqFeatureI class that stores its subfeatures in an
      underlying Bio::SeqFeature::Store and fetches them
      in a lazy fashion as needed.

Bio::SeqFeature::LazyTableFeature
    - A Bio::SeqFeatureI class that stores its subfeatures in
      an underlying Bio::SeqFeature::Store, AND stores
      the parent/child relationship data in the Store as
      well. Fetches subfeatures as needed in a lazy fashion.

A utility script, currently called gff3_load.pl, parses a gff3 file, creates the proper objects, and stores them in the Store. Eventually some of this functionality will be moved into Bio::Tools::GFF.

So where should I put these files? The Bio::DB namespace? A new Bio::Collection namespace? A Bio::SeqFeature::Collection namespace?

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu (516 367-5008)

From hlapp at gmx.net Tue Mar 21 18:56:39 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Mar 2006 15:56:39 -0800
Subject: [Bioperl-l] Request for comments: Bio::DB::GFF3 namespace
In-Reply-To: <200603211827.30785.lstein@cshl.edu>
References: <200603211827.30785.lstein@cshl.edu>
Message-ID:

I would suggest putting it under the Bio::DB namespace (e.g., Bio::DB::SeqFeature); that keeps all the database access interfaces/implementations in one place.

Alternatively, go along with the Bio::SeqFeature::CollectionI name as the base interface, i.e., implementations would then go under the Bio::SeqFeature::Collection namespace. If this is the critical interface being implemented (it sounds like it is?), the naming would be in line with the pattern used elsewhere (Bio::LocationI and Bio::Location::*, Bio::SeqFeatureI and Bio::SeqFeature::*, Bio::SeqI and Bio::Seq::*).

My few cents ...

-hilmar

On 3/21/06, Lincoln Stein wrote:
> Hi All,
>
> I'm pretty much ready to check in the replacement for the Bio::DB::GFF
> database. What I ended up writing has only a remote relationship to gff3
> files -- it is more like a general storage engine for Bio::SeqFeatureI
> objects.
So I don't want to call the thing Bio::DB::GFF, but want to place it > somewhere else in the namespace hierarchy. > > Here are my current classes and what they do; the names are placeholders so > don't get hung up on what they're called now: > > Bio::SeqFeature::Store > - implements the Bio::SeqFeature::CollectionI interface. You can > store any Bio::SeqFeatureI into a database (mysql, berkeleydb, in-memory) > and fetch it out using a variety of queries. > > Bio::SeqFeature::Store::DBI > - base class for DBI databases > > Bio::SeqFeature::Store::DBI::mysql > - mysql storage implementation > > Bio::SeqFeature::Store::DBI::mysql::Iterator > - helper class for the mysql adaptor. Implements > (part of) the Bio::SeqIO interface > > Bio::SeqFeature::Store::memory > - in-memory storage implementation > > Bio::SeqFeature::Store::bdb > - Berkeley DB storage implementation > > Bio::SeqFeature::Store::Cacher > - An in-memory cache for the store; uses an LRU cache > to avoid going to the database for frequently used > objects. Also implements scheme that lets features > share the same in-memory subfeatures (like shared > exons). > > Bio::SeqFeature::LazyFeature > - A Bio::SeqFeatureI class that stores its subfeatures in an > underlying Bio::SeqFeature::Store and fetches them > in a lazy fashion as needed. > > Bio::SeqFeature::LazyTableFeature > - A Bio::SeqFeatureI class that stores its subfeatures in > an underlying Bio::SeqFeature::Store, AND stores > the parent/child relationship data in the Store as > well. Fetches subfeatures as needed in a lazy fashion. > > A utility script, currently called gff3_load.pl, parses a gff3 file, creates > the proper objects, and stores them in the Store. Eventually some of this > functionality will be moved into Bio::Tools::GFF. > > So where should I put these files? The Bio::DB namespace? A new > Bio::Collection namespace? A Bio::SeqFeature::Collection namespace? > > Lincoln > > > -- > Lincoln D. 
Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu (516 367-5008)

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From sdavis2 at mail.nih.gov Wed Mar 22 08:28:29 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 22 Mar 2006 08:28:29 -0500
Subject: [Bioperl-l] Random access to subsequences using OBDA or bio::db::flat
Message-ID:

I have a pretty straightforward question. Is it possible to use Bio::DB::Flat indexed file access to get subsequences quickly (like chr1:100000-100100), as with Bio::Index::Fasta, or do I need to use Bio::Index::Fasta for that kind of access? I didn't see any specific mention of it in the docs, so I doubt it is possible, but just wanted to make sure.

Thanks,
Sean

From ni_psis at hotmail.com Wed Mar 22 11:42:47 2006
From: ni_psis at hotmail.com (ni_psis)
Date: Wed, 22 Mar 2006 08:42:47 -0800 (PST)
Subject: [Bioperl-l] convert xyz coordinates to distance
Message-ID: <3535971.post@talk.nabble.com>

Hi,

I am new to perl, but I would like to write a perl script that can convert xyz coordinates from a pdb file to distances using the formula:

distance = SQRT[(X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2]

e.g.: the atoms of HELIX 1 with all the atoms of HELIX 2 and 3

the atom[0] of 1 with the atom[0] of 2
the atom[0] of 1 with the atom[1] of 2
the atom[0] of 1 with the atom[2] of 2
the atom[0] of 1 with the atom[3] of 2
. . . . . .
the atom[1] of 1 with the atom[0] of 2 the atom[1] of 1 with the atom[1] of 2 the atom[1] of 1 with the atom[2] of 2 the atom[1] of 1 with the atom[3] of 2 I want a good idea of how to do it Please HELP ME thanks The pdb file format look like this HEADER PHEROMONE 20-DEC-95 2ERL HELIX 1 H1 ASP 1 GLN 9 HELIX 2 H2 GLU 12 LEU 18 HELIX 3 H3 GLU 23 ASN 35 ATOM 1 N ASP 1 -1.115 8.537 7.075 ATOM 2 CA ASP 1 -1.925 7.470 6.547 ATOM 3 C ASP 1 -2.009 6.333 7.522 ATOM 4 O ASP 1 -1.467 6.394 8.624 ATOM 5 CB ASP 1 -1.526 6.993 5.163 ATOM 6 CG ASP 1 -0.189 6.261 5.135 ATOM 7 OD1 ASP 1 0.337 6.017 6.212 ATOM 8 OD2 ASP 1 0.306 6.000 3.954 ATOM 9 1H ASP 1 -1.462 8.819 7.844 ATOM 10 2H ASP 1 -0.287 8.240 7.214 ATOM 11 3H ASP 1 -1.095 9.210 6.493 ATOM 12 HA ASP 1 -2.833 7.829 6.464 ATOM 13 1HB ASP 1 -2.216 6.400 4.825 ATOM 14 2HB ASP 1 -1.478 7.758 4.569 ATOM 15 N ALA 2 -2.745 5.280 7.165 ATOM 16 CA ALA 2 -2.945 4.152 7.987 ATOM 17 C ALA 2 -1.606 3.448 8.305 ATOM 18 O ALA 2 -1.440 3.010 9.454 ATOM 19 CB ALA 2 -3.966 3.256 7.436 ATOM 20 H ALA 2 -3.119 5.289 6.390 ATOM 21 HA ALA 2 -3.296 4.485 8.839 ATOM 22 1HB ALA 2 -4.765 3.756 7.251 ATOM 23 2HB ALA 2 -3.640 2.861 6.624 ATOM 24 3HB ALA 2 -4.162 2.564 8.072 ATOM 25 N CYS 3 -0.777 3.267 7.329 ATOM 26 CA CYS 3 0.570 2.624 7.511 ATOM 27 C CYS 3 1.328 3.308 8.626 ATOM 28 O CYS 3 1.802 2.679 9.562 ATOM 29 CB CYS 3 1.351 2.667 6.209 ATOM 30 SG CYS 3 2.981 1.901 6.318 ATOM 31 H CYS 3 -1.006 3.526 6.542 ATOM 32 HA CYS 3 0.434 1.686 7.759 ATOM 33 1HB CYS 3 0.838 2.216 5.521 ATOM 34 2HB CYS 3 1.457 3.593 5.938 ATOM 35 N GLU 4 1.508 4.617 8.489 ATOM 36 CA GLU 4 2.255 5.404 9.498 ATOM 37 C GLU 4 1.610 5.225 10.885 ATOM 38 O GLU 4 2.334 5.109 11.842 ATOM 39 CB GLU 4 2.244 6.852 9.036 ATOM 40 CG GLU 4 2.997 7.867 9.820 ATOM 41 CD GLU 4 2.736 9.285 9.326 ATOM 42 OE1 GLU 4 1.708 9.575 8.697 ATOM 43 OE2 GLU 4 3.543 10.133 9.731 ATOM 44 H GLU 4 1.184 5.016 7.799 ATOM 45 HA GLU 4 3.181 5.086 9.532 ATOM 46 1HB GLU 4 2.585 6.872 8.128 ATOM 47 2HB GLU 4 
1.319 7.140 8.998 ATOM 48 1HG GLU 4 2.741 7.803 10.754 ATOM 49 2HG GLU 4 3.947 7.678 9.758 ATOM 50 N GLN 5 0.296 5.263 10.912 ATOM 51 CA GLN 5 -0.378 5.203 12.200 ATOM 52 C GLN 5 -0.239 3.845 12.878 ATOM 53 O GLN 5 -0.077 3.785 14.080 ATOM 54 CB GLN 5 -1.796 5.653 12.089 ATOM 55 CG GLN 5 -2.013 7.148 11.907 ATOM 56 CD GLN 5 -1.903 7.874 13.238 ATOM 57 OE1 GLN 5 -0.839 8.347 13.647 ATOM 58 NE2 GLN 5 -3.009 7.883 13.980 ATOM 59 H GLN 5 -0.156 5.321 10.182 ATOM 60 HA GLN 5 0.075 5.850 12.780 ATOM 61 1HB GLN 5 -2.203 5.194 11.338 ATOM 62 2HB GLN 5 -2.267 5.373 12.890 ATOM 63 1HG GLN 5 -1.351 7.498 11.292 ATOM 64 2HG GLN 5 -2.891 7.303 11.525 ATOM 65 1HE2 GLN 5 -2.995 8.225 14.769 ATOM 66 2HE2 GLN 5 -3.737 7.546 13.670 ATOM 67 N ALA 6 -0.230 2.753 12.099 ATOM 68 CA ALA 6 0.008 1.434 12.610 ATOM 69 C ALA 6 1.381 1.256 13.230 ATOM 70 O ALA 6 1.566 0.690 14.248 ATOM 71 CB ALA 6 -0.241 0.379 11.529 ATOM 72 H ALA 6 -0.372 2.847 11.256 ATOM 73 HA ALA 6 -0.650 1.279 13.320 ATOM 74 1HB ALA 6 -1.110 0.513 11.145 ATOM 75 2HB ALA 6 0.428 0.458 10.845 ATOM 76 3HB ALA 6 -0.196 -0.497 11.921 ATOM 77 N ALA 7 2.408 1.829 12.524 ATOM 78 CA ALA 7 3.765 1.850 13.007 ATOM 79 C ALA 7 3.841 2.536 14.354 ATOM 80 O ALA 7 4.398 2.044 15.330 ATOM 81 CB ALA 7 4.697 2.539 12.023 ATOM 82 H ALA 7 2.235 2.189 11.763 ATOM 83 HA ALA 7 4.066 0.924 13.117 ATOM 84 1HB ALA 7 4.652 2.095 11.173 ATOM 85 2HB ALA 7 4.432 3.456 11.919 ATOM 86 3HB ALA 7 5.597 2.503 12.355 ATOM 87 N ILE 8 3.226 3.717 14.440 ATOM 88 CA ILE 8 3.309 4.495 15.692 ATOM 89 C ILE 8 2.560 3.810 16.813 ATOM 90 O ILE 8 2.928 3.990 17.984 ATOM 91 CB ILE 8 2.843 5.950 15.428 ATOM 92 CG1 ILE 8 3.857 6.706 14.481 ATOM 93 CG2 ILE 8 2.577 6.751 16.654 ATOM 94 CD1 ILE 8 3.229 8.004 13.927 ATOM 95 H ILE 8 2.783 4.024 13.769 ATOM 96 HA ILE 8 4.254 4.532 15.951 ATOM 97 HB ILE 8 1.994 5.890 14.941 ATOM 98 1HG1 ILE 8 4.662 6.922 14.976 ATOM 99 2HG1 ILE 8 4.104 6.126 13.743 ATOM 100 1HG2 ILE 8 3.375 6.798 17.185 ATOM 101 2HG2 ILE 8 2.307 7.638 16.404 
ATOM 102 3HG2 ILE 8 1.878 6.335 17.163 ATOM 103 1HD1 ILE 8 2.439 7.789 13.425 ATOM 104 2HD1 ILE 8 2.997 8.584 14.656 ATOM 105 3HD1 ILE 8 3.861 8.446 13.355 ATOM 106 N GLN 9 1.568 3.015 16.550 ATOM 107 CA GLN 9 0.911 2.196 17.602 ATOM 108 C GLN 9 1.655 0.958 17.980 ATOM 109 O GLN 9 1.263 0.292 18.947 ATOM 110 CB GLN 9 -0.515 1.736 17.086 ATOM 111 CG GLN 9 -1.483 2.809 16.995 ATOM 112 CD GLN 9 -1.712 3.600 18.204 ATOM 113 OE1 GLN 9 -1.704 3.103 19.341 ATOM 114 NE2 GLN 9 -1.737 4.933 18.099 ATOM 115 H GLN 9 1.282 2.960 15.740 ATOM 116 HA GLN 9 0.793 2.749 18.402 ATOM 117 1HB GLN 9 -0.414 1.332 16.210 ATOM 118 2HB GLN 9 -0.859 1.056 17.685 ATOM 119 1HG GLN 9 -1.198 3.412 16.290 ATOM 120 2HG GLN 9 -2.331 2.426 16.721 ATOM 121 1HE2 GLN 9 -1.747 5.420 18.808 ATOM 122 2HE2 GLN 9 -1.744 5.303 17.323 ATOM 149 N GLU 12 3.089 -4.518 14.660 ATOM 150 CA GLU 12 3.774 -4.990 13.456 ATOM 151 C GLU 12 2.912 -5.712 12.482 ATOM 152 O GLU 12 3.083 -5.643 11.264 ATOM 153 CB GLU 12 5.001 -5.826 13.875 ATOM 154 CG GLU 12 5.800 -6.264 12.646 ATOM 155 CD GLU 12 7.105 -7.002 13.014 ATOM 156 OE1 GLU 12 7.603 -7.032 14.115 ATOM 157 OE2 GLU 12 7.612 -7.651 12.023 ATOM 158 H GLU 12 3.340 -4.793 15.435 ATOM 159 HA GLU 12 4.117 -4.196 12.996 ATOM 160 1HB GLU 12 5.568 -5.299 14.460 ATOM 161 2HB GLU 12 4.707 -6.609 14.366 ATOM 162 1HG GLU 12 5.249 -6.848 12.102 ATOM 163 2HG GLU 12 6.018 -5.482 12.115 ATOM 164 N SER 13 1.920 -6.484 13.028 ATOM 165 CA SER 13 1.076 -7.239 12.160 ATOM 166 C SER 13 0.324 -6.363 11.167 ATOM 167 O SER 13 0.007 -6.792 10.066 ATOM 168 CB SER 13 0.128 -8.087 12.980 ATOM 169 OG SER 13 -0.739 -7.320 13.698 ATOM 170 H SER 13 1.808 -6.514 13.880 ATOM 171 HA SER 13 1.647 -7.846 11.645 ATOM 172 1HB SER 13 -0.374 -8.670 12.388 ATOM 173 2HB SER 13 0.640 -8.646 13.585 ATOM 174 HG SER 13 -1.237 -7.805 14.133 ATOM 175 N ALA 14 0.024 -5.155 11.540 ATOM 176 CA ALA 14 -0.691 -4.208 10.662 ATOM 177 C ALA 14 0.172 -3.600 9.632 ATOM 178 O ALA 14 -0.368 -3.032 8.645 ATOM 179 CB ALA 14 -1.353 
-3.146 11.567 ATOM 180 H ALA 14 0.249 -4.898 12.329 ATOM 181 HA ALA 14 -1.404 -4.701 10.205 ATOM 182 1HB ALA 14 -1.901 -3.582 12.223 ATOM 183 2HB ALA 14 -0.672 -2.633 12.008 ATOM 184 3HB ALA 14 -1.897 -2.564 11.031 ATOM 185 N CYS 15 1.477 -3.696 9.725 ATOM 186 CA CYS 15 2.308 -3.191 8.605 ATOM 187 C CYS 15 1.975 -3.899 7.294 ATOM 188 O CYS 15 2.030 -3.304 6.232 ATOM 189 CB CYS 15 3.808 -3.398 8.859 ATOM 190 SG CYS 15 4.517 -2.546 10.292 ATOM 191 H CYS 15 1.843 -4.044 10.421 ATOM 192 HA CYS 15 2.138 -2.232 8.496 ATOM 193 1HB CYS 15 3.968 -4.349 8.965 ATOM 194 2HB CYS 15 4.291 -3.110 8.069 ATOM 195 N GLU 16 1.667 -5.187 7.314 ATOM 196 CA GLU 16 1.430 -5.931 6.110 ATOM 197 C GLU 16 0.156 -5.451 5.422 ATOM 198 O GLU 16 0.100 -5.353 4.192 ATOM 199 CB GLU 16 1.428 -7.442 6.289 ATOM 200 CG GLU 16 1.044 -8.157 4.985 ATOM 201 CD GLU 16 1.407 -9.599 4.948 ATOM 202 OE1 GLU 16 1.736 -10.214 5.969 ATOM 203 OE2 GLU 16 1.368 -10.134 3.797 ATOM 204 H GLU 16 1.607 -5.590 8.071 ATOM 205 HA GLU 16 2.171 -5.723 5.503 ATOM 206 1HB GLU 16 2.310 -7.734 6.568 ATOM 207 2HB GLU 16 0.798 -7.683 6.986 ATOM 208 1HG GLU 16 0.086 -8.075 4.856 ATOM 209 2HG GLU 16 1.480 -7.708 4.244 ATOM 210 N SER 17 -0.869 -5.247 6.225 ATOM 211 CA SER 17 -2.172 -4.980 5.581 ATOM 212 C SER 17 -2.344 -3.505 5.280 ATOM 213 O SER 17 -3.155 -3.183 4.446 ATOM 214 CB SER 17 -3.275 -5.410 6.507 ATOM 215 OG SER 17 -3.218 -4.787 7.747 ATOM 216 H SER 17 -0.783 -5.269 7.080 ATOM 217 HA SER 17 -2.232 -5.490 4.747 ATOM 218 1HB SER 17 -4.129 -5.211 6.094 ATOM 219 2HB SER 17 -3.224 -6.370 6.634 ATOM 220 HG SER 17 -3.829 -5.063 8.220 ATOM 221 N LEU 18 -1.632 -2.581 5.945 ATOM 222 CA LEU 18 -1.851 -1.181 5.784 ATOM 223 C LEU 18 -0.765 -0.443 5.006 ATOM 224 O LEU 18 -0.917 0.722 4.696 ATOM 225 CB LEU 18 -2.125 -0.476 7.077 ATOM 226 CG LEU 18 -3.325 -1.012 7.842 ATOM 227 CD1 LEU 18 -3.376 -0.270 9.218 ATOM 228 CD2 LEU 18 -4.628 -0.783 7.166 ATOM 229 H LEU 18 -1.020 -2.844 6.490 ATOM 230 HA LEU 18 -2.670 -1.096 5.252 ATOM 231 1HB LEU 
18 -1.339 -0.547 7.642 ATOM 232 2HB LEU 18 -2.269 0.465 6.892 ATOM 233 HG LEU 18 -3.206 -1.972 7.997 ATOM 234 1HD1 LEU 18 -2.544 -0.397 9.681 ATOM 235 2HD1 LEU 18 -3.520 0.668 9.070 ATOM 236 3HD1 LEU 18 -4.094 -0.625 9.746 ATOM 237 1HD2 LEU 18 -4.623 -1.212 6.307 ATOM 238 2HD2 LEU 18 -5.335 -1.148 7.702 ATOM 239 3HD2 LEU 18 -4.767 0.160 7.051 ATOM 295 N GLU 23 9.333 -0.276 0.185 ATOM 296 CA GLU 23 9.826 0.653 1.171 ATOM 297 C GLU 23 8.840 0.959 2.297 ATOM 298 O GLU 23 9.259 1.093 3.424 ATOM 299 CB GLU 23 10.325 1.921 0.540 ATOM 300 CG GLU 23 10.916 2.847 1.683 ATOM 301 CD GLU 23 12.317 3.194 1.151 ATOM 302 OE1 GLU 23 12.312 3.576 -0.032 ATOM 303 OE2 GLU 23 13.279 2.843 1.795 ATOM 304 H GLU 23 9.278 -0.029 -0.637 ATOM 305 HA GLU 23 10.602 0.227 1.590 ATOM 306 1HB GLU 23 11.013 1.719 -0.113 ATOM 307 2HB GLU 23 9.597 2.376 0.087 ATOM 308 1HG GLU 23 10.378 3.645 1.801 ATOM 309 2HG GLU 23 10.969 2.371 2.526 ATOM 310 N ASP 24 7.545 1.080 2.021 ATOM 311 CA ASP 24 6.578 1.456 2.996 ATOM 312 C ASP 24 6.494 0.434 4.133 ATOM 313 O ASP 24 6.495 0.773 5.304 ATOM 314 CB AASP 24 5.191 1.484 2.253 ATOM 315 CB BASP 24 5.265 1.879 2.478 ATOM 316 CG AASP 24 4.141 2.327 2.967 ATOM 317 CG BASP 24 5.248 3.162 1.679 ATOM 318 OD1AASP 24 4.571 3.306 3.637 ATOM 319 OD1BASP 24 6.269 3.566 1.111 ATOM 320 OD2AASP 24 2.942 2.035 2.732 ATOM 321 OD2BASP 24 4.151 3.739 1.602 ATOM 322 H ASP 24 7.279 0.924 1.218 ATOM 323 HA ASP 24 6.788 2.344 3.354 ATOM 324 1HB AASP 24 5.321 1.835 1.358 ATOM 325 1HB BASP 24 4.915 1.168 1.920 ATOM 326 2HB AASP 24 4.861 0.576 2.169 ATOM 327 2HB BASP 24 4.660 1.981 3.230 ATOM 328 N ARG 25 6.468 -0.850 3.785 ATOM 329 CA ARG 25 6.481 -1.917 4.730 ATOM 330 C ARG 25 7.768 -1.982 5.503 ATOM 331 O ARG 25 7.746 -2.193 6.697 ATOM 332 CB ARG 25 6.233 -3.282 4.080 ATOM 333 CG ARG 25 4.939 -3.786 4.243 ATOM 334 CD ARG 25 4.593 -5.053 3.521 ATOM 335 NE ARG 25 3.283 -4.845 2.866 ATOM 336 CZ ARG 25 2.745 -5.680 2.027 ATOM 337 NH1 ARG 25 3.329 -6.802 1.738 ATOM 338 NH2 ARG 25 1.606 
-5.294 1.451 ATOM 339 H ARG 25 6.442 -1.046 2.948 ATOM 340 HA ARG 25 5.757 -1.757 5.370 ATOM 341 1HB ARG 25 6.419 -3.209 3.131 ATOM 342 2HB ARG 25 6.862 -3.919 4.453 ATOM 343 1HG ARG 25 4.793 -3.933 5.191 ATOM 344 2HG ARG 25 4.314 -3.101 3.959 ATOM 345 1HD ARG 25 5.271 -5.258 2.858 ATOM 346 2HD ARG 25 4.541 -5.793 4.146 ATOM 347 HE ARG 25 2.850 -4.125 3.052 ATOM 348 1HH1 ARG 25 4.082 -7.002 2.104 ATOM 349 2HH1 ARG 25 2.964 -7.346 1.182 ATOM 350 1HH2 ARG 25 1.266 -4.528 1.643 ATOM 351 2HH2 ARG 25 1.212 -5.811 0.889 ATOM 352 N THR 26 8.905 -1.802 4.817 ATOM 353 CA THR 26 10.154 -1.810 5.523 ATOM 354 C THR 26 10.210 -0.678 6.527 ATOM 355 O THR 26 10.682 -0.883 7.638 ATOM 356 CB THR 26 11.365 -1.691 4.530 ATOM 357 OG1 THR 26 11.250 -2.794 3.577 ATOM 358 CG2 THR 26 12.671 -1.765 5.262 ATOM 359 H THR 26 8.887 -1.684 3.966 ATOM 360 HA THR 26 10.232 -2.658 6.007 ATOM 361 HB THR 26 11.309 -0.839 4.049 ATOM 362 HG1 THR 26 10.524 -2.754 3.199 ATOM 363 1HG2 THR 26 12.728 -2.600 5.732 ATOM 364 2HG2 THR 26 13.394 -1.701 4.634 ATOM 365 3HG2 THR 26 12.727 -1.040 5.889 ATOM 366 N GLY 27 9.745 0.504 6.136 ATOM 367 CA GLY 27 9.780 1.635 7.013 ATOM 368 C GLY 27 8.911 1.477 8.245 ATOM 369 O GLY 27 9.232 2.015 9.304 ATOM 370 H GLY 27 9.419 0.591 5.345 ATOM 371 1HA GLY 27 10.696 1.786 7.294 ATOM 372 2HA GLY 27 9.489 2.421 6.524 ATOM 373 N CYS 28 7.821 0.743 8.138 ATOM 374 CA CYS 28 6.903 0.489 9.277 ATOM 375 C CYS 28 7.656 -0.419 10.284 ATOM 376 O CYS 28 7.686 -0.110 11.456 ATOM 377 CB CYS 28 5.665 -0.223 8.696 ATOM 378 SG CYS 28 4.315 -0.568 9.820 ATOM 379 H CYS 28 7.634 0.396 7.374 ATOM 380 HA CYS 28 6.644 1.333 9.703 ATOM 381 1HB CYS 28 5.320 0.321 7.971 ATOM 382 2HB CYS 28 5.957 -1.064 8.311 ATOM 383 N TYR 29 8.295 -1.482 9.790 ATOM 384 CA TYR 29 9.088 -2.358 10.716 ATOM 385 C TYR 29 10.247 -1.595 11.290 ATOM 386 O TYR 29 10.548 -1.718 12.474 ATOM 387 CB TYR 29 9.614 -3.574 9.912 ATOM 388 CG TYR 29 10.520 -4.518 10.686 ATOM 389 CD1 TYR 29 10.093 -5.166 11.837 ATOM 390 CD2 TYR 29 
11.796 -4.811 10.254 ATOM 391 CE1 TYR 29 10.877 -6.030 12.516 ATOM 392 CE2 TYR 29 12.587 -5.705 10.948 ATOM 393 CZ TYR 29 12.135 -6.301 12.099 ATOM 394 OH TYR 29 12.930 -7.218 12.733 ATOM 395 H TYR 29 8.253 -1.663 8.950 ATOM 396 HA TYR 29 8.510 -2.672 11.443 ATOM 397 1HB TYR 29 8.853 -4.078 9.584 ATOM 398 2HB TYR 29 10.101 -3.245 9.141 ATOM 399 HD1 TYR 29 9.234 -4.998 12.152 ATOM 400 HD2 TYR 29 12.127 -4.402 9.487 ATOM 401 HE1 TYR 29 10.548 -6.443 13.281 ATOM 402 HE2 TYR 29 13.438 -5.906 10.631 ATOM 403 HH TYR 29 12.530 -7.527 13.378 ATOM 404 N MET 30 10.949 -0.787 10.499 ATOM 405 CA MET 30 12.097 -0.042 11.035 ATOM 406 C MET 30 11.669 0.919 12.090 ATOM 407 O MET 30 12.377 1.090 13.089 ATOM 408 CB AMET 30 12.832 0.699 9.921 ATOM 409 CB BMET 30 12.811 0.650 9.888 ATOM 410 CG AMET 30 13.669 -0.195 9.045 ATOM 411 CG BMET 30 14.005 1.475 10.250 ATOM 412 SD AMET 30 14.471 0.753 7.727 ATOM 413 SD BMET 30 14.944 1.947 8.787 ATOM 414 CE AMET 30 14.905 2.248 8.635 ATOM 415 CE BMET 30 15.397 0.346 8.138 ATOM 416 H MET 30 10.730 -0.700 9.672 ATOM 417 HA MET 30 12.717 -0.684 11.439 ATOM 418 1HB AMET 30 12.181 1.158 9.368 ATOM 419 1HB BMET 30 13.093 -0.028 9.253 ATOM 420 2HB AMET 30 13.406 1.372 10.319 ATOM 421 2HB BMET 30 12.173 1.223 9.435 ATOM 422 1HG AMET 30 14.346 -0.634 9.584 ATOM 423 1HG BMET 30 13.715 2.273 10.718 ATOM 424 2HG AMET 30 13.106 -0.881 8.652 ATOM 425 2HG BMET 30 14.576 0.968 10.849 ATOM 426 1HE AMET 30 14.110 2.647 8.993 ATOM 427 1HE BMET 30 14.603 -0.156 7.942 ATOM 428 2HE AMET 30 15.503 2.024 9.352 ATOM 429 2HE BMET 30 15.910 0.460 7.334 ATOM 430 3HE AMET 30 15.336 2.869 8.042 ATOM 431 3HE BMET 30 15.923 -0.126 8.788 ATOM 432 N TYR 31 10.526 1.565 11.956 ATOM 433 CA TYR 31 10.065 2.484 13.000 ATOM 434 C TYR 31 9.814 1.710 14.299 ATOM 435 O TYR 31 10.250 2.118 15.373 ATOM 436 CB TYR 31 8.756 3.200 12.579 ATOM 437 CG TYR 31 8.267 3.992 13.773 ATOM 438 CD1 TYR 31 8.851 5.234 14.116 ATOM 439 CD2 TYR 31 7.329 3.440 14.678 ATOM 440 CE1 TYR 31 8.499 5.910 15.275 
ATOM 441 CE2 TYR 31 6.981 4.150 15.769 ATOM 442 CZ TYR 31 7.538 5.382 16.044 ATOM 443 OH TYR 31 7.191 6.079 17.208 ATOM 444 H TYR 31 10.048 1.445 11.251 ATOM 445 HA TYR 31 10.759 3.158 13.160 ATOM 446 1HB TYR 31 8.923 3.792 11.829 ATOM 447 2HB TYR 31 8.088 2.548 12.313 ATOM 448 HD1 TYR 31 9.488 5.607 13.550 ATOM 449 HD2 TYR 31 6.958 2.602 14.522 ATOM 450 HE1 TYR 31 8.918 6.706 15.512 ATOM 451 HE2 TYR 31 6.349 3.799 16.353 ATOM 452 HH TYR 31 6.601 5.671 17.606 ATOM 453 N ILE 32 9.123 0.578 14.183 ATOM 454 CA ILE 32 8.789 -0.212 15.347 ATOM 455 C ILE 32 10.053 -0.625 16.093 ATOM 456 O ILE 32 10.184 -0.529 17.301 ATOM 457 CB ILE 32 7.879 -1.387 14.953 ATOM 458 CG1 ILE 32 6.521 -0.859 14.500 ATOM 459 CG2 ILE 32 7.713 -2.356 16.108 ATOM 460 CD1 ILE 32 5.561 -1.922 13.957 ATOM 461 H ILE 32 8.871 0.313 13.405 ATOM 462 HA ILE 32 8.273 0.364 15.949 ATOM 463 HB ILE 32 8.293 -1.864 14.203 ATOM 464 1HG1 ILE 32 6.097 -0.416 15.252 ATOM 465 2HG1 ILE 32 6.664 -0.191 13.811 ATOM 466 1HG2 ILE 32 8.573 -2.695 16.368 ATOM 467 2HG2 ILE 32 7.311 -1.901 16.852 ATOM 468 3HG2 ILE 32 7.151 -3.085 15.837 ATOM 469 1HD1 ILE 32 5.390 -2.577 14.638 ATOM 470 2HD1 ILE 32 4.735 -1.506 13.698 ATOM 471 3HD1 ILE 32 5.956 -2.350 13.194 ATOM 472 N TYR 33 11.046 -1.139 15.317 ATOM 473 CA TYR 33 12.292 -1.565 15.897 ATOM 474 C TYR 33 13.029 -0.407 16.556 ATOM 475 O TYR 33 13.826 -0.566 17.499 ATOM 476 CB TYR 33 13.154 -2.243 14.790 ATOM 477 CG TYR 33 14.066 -3.275 15.385 ATOM 478 CD1 TYR 33 15.158 -2.997 16.140 ATOM 479 CD2 TYR 33 13.705 -4.619 15.244 ATOM 480 CE1 TYR 33 15.930 -4.045 16.679 ATOM 481 CE2 TYR 33 14.414 -5.660 15.777 ATOM 482 CZ TYR 33 15.554 -5.350 16.468 ATOM 483 OH TYR 33 16.237 -6.383 17.086 ATOM 484 H TYR 33 10.927 -1.209 14.468 ATOM 485 HA TYR 33 12.096 -2.235 16.585 ATOM 486 1HB TYR 33 12.572 -2.662 14.138 ATOM 487 2HB TYR 33 13.682 -1.569 14.334 ATOM 488 HD1 TYR 33 15.398 -2.113 16.302 ATOM 489 HD2 TYR 33 12.937 -4.817 14.758 ATOM 490 HE1 TYR 33 16.692 -3.854 17.177 ATOM 491 HE2 
TYR 33 14.134 -6.541 15.674 ATOM 492 HH TYR 33 16.848 -6.075 17.537 ATOM 493 N SER 34 12.844 0.772 16.083 ATOM 494 CA SER 34 13.557 1.963 16.575 ATOM 495 C SER 34 12.950 2.566 17.788 ATOM 496 O SER 34 13.608 3.309 18.555 ATOM 497 CB ASER 34 13.553 3.076 15.438 ATOM 498 CB BSER 34 13.638 2.928 15.354 ATOM 499 OG ASER 34 12.413 3.945 15.615 ATOM 500 OG BSER 34 14.493 2.380 14.346 ATOM 501 H SER 34 12.274 0.871 15.446 ATOM 502 HA SER 34 14.485 1.717 16.774 ATOM 503 1HB ASER 34 14.372 3.595 15.484 ATOM 504 1HB BSER 34 12.751 3.066 14.989 ATOM 505 2HB ASER 34 13.511 2.653 14.566 ATOM 506 2HB BSER 34 13.984 3.788 15.641 ATOM 507 HG ASER 34 11.724 3.501 15.592 ATOM 508 HG BSER 34 15.247 2.278 14.652 ATOM 509 N ASN 35 11.615 2.428 18.062 ATOM 510 CA ASN 35 10.925 3.271 18.975 ATOM 511 C ASN 35 9.854 2.614 19.812 ATOM 512 O ASN 35 9.237 3.328 20.640 ATOM 513 CB ASN 35 10.506 4.597 18.417 ATOM 514 CG ASN 35 10.539 5.778 19.332 ATOM 515 OD1 ASN 35 11.373 5.994 20.212 ATOM 516 ND2 ASN 35 9.625 6.723 19.078 ATOM 517 H ASN 35 11.175 1.804 17.664 ATOM 518 HA ASN 35 11.613 3.507 19.631 ATOM 519 1HB ASN 35 11.075 4.794 17.657 ATOM 520 2HB ASN 35 9.601 4.507 18.081 ATOM 521 1HD2 ASN 35 9.631 7.458 19.526 ATOM 522 2HD2 ASN 35 9.033 6.596 18.468 ########## -- View this message in context: http://www.nabble.com/convert-xyz-coordinates-to-distance-t1325102.html#a3535971 Sent from the Perl - Bioperl-L forum at Nabble.com. From boris.steipe at utoronto.ca Wed Mar 22 15:28:41 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Wed, 22 Mar 2006 15:28:41 -0500 Subject: [Bioperl-l] convert xyz coordinates to distance In-Reply-To: <3535971.post@talk.nabble.com> References: <3535971.post@talk.nabble.com> Message-ID: <0D545175-AFD3-4CF7-BFB5-FF96DB650FEA@utoronto.ca> Here's a bit of cut-and-paste code: this routine works with two strings, each containing an ATOM record. No checking is done, the logic which records to select is up to you. 
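For records laid out like the ones posted in this thread (whitespace-separated fields, no chain ID column), the "which records to select" part could be sketched roughly as below. This is only an illustration under those assumptions: parse_atoms, atoms_in_range and atom_dist are hypothetical helper names, not BioPerl functions, and the residue ranges 1-9 and 12-18 are taken from the HELIX 1 and HELIX 2 records of the original question. Real fixed-column PDB files can have fused fields, in which case column-based extraction with substr is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: parse whitespace-separated ATOM lines of the form
#   ATOM serial name resname resnum x y z
# (an assumption based on the records posted in this thread).
sub parse_atoms {
    my @atoms;
    for my $line (@_) {
        next unless $line =~ /^ATOM\s/;
        my @f = split ' ', $line;
        next unless @f >= 8;
        push @atoms, {
            name    => $f[2],
            resname => $f[3],
            resnum  => $f[4],
            xyz     => [ @f[-3, -2, -1] ],   # last three fields are x, y, z
        };
    }
    return @atoms;
}

# Atoms whose residue number lies in [$first, $last], e.g. one HELIX range.
sub atoms_in_range {
    my ($atoms, $first, $last) = @_;
    return grep { $_->{resnum} >= $first && $_->{resnum} <= $last } @$atoms;
}

# Euclidean distance between two parsed atoms.
sub atom_dist {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->{xyz}[$_] - $q->{xyz}[$_]) ** 2 for 0 .. 2;
    return sqrt $sum;
}

# Three records copied from the posted data, just to exercise the helpers.
my @records = (
    'ATOM 50 N GLN 5 0.296 5.263 10.912',
    'ATOM 149 N GLU 12 3.089 -4.518 14.660',
    'ATOM 150 CA GLU 12 3.774 -4.990 13.456',
);

my @atoms  = parse_atoms(@records);
my @helix1 = atoms_in_range(\@atoms, 1, 9);     # HELIX 1: ASP 1 .. GLN 9
my @helix2 = atoms_in_range(\@atoms, 12, 18);   # HELIX 2: GLU 12 .. LEU 18

# Every atom of helix 1 against every atom of helix 2.
for my $p (@helix1) {
    for my $q (@helix2) {
        printf "%s%d:%s  %s%d:%s  %.3f\n",
            $p->{resname}, $p->{resnum}, $p->{name},
            $q->{resname}, $q->{resnum}, $q->{name},
            atom_dist($p, $q);
    }
}
```

In practice the records would be read from the PDB file rather than from a literal list, and the same nested loop can be repeated for helix 1 against helix 3.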
Use as, for example:

    my $distance = PDB_dist($coord_1, $coord_2);

    # Distance between two atoms, given their ATOM records as strings.
    # Coordinates are taken from the fixed PDB columns: x at offset 30,
    # y at offset 38, z at offset 46, each 8 characters wide.
    sub PDB_dist {
        my ($s1, $s2) = @_;
        my $dx = substr($s1,30,8) - substr($s2,30,8);
        my $dy = substr($s1,38,8) - substr($s2,38,8);
        my $dz = substr($s1,46,8) - substr($s2,46,8);
        return sqrt( $dx*$dx + $dy*$dy + $dz*$dz );
    }

HTH
Boris

On 22 Mar 2006, at 11:42, ni_psis wrote:

>
> Hi,
>
> I am new to perl, but I would like to write a perl script that can
> convert xyz coordinates from a pdb file to distances using the formula:
> distance = SQRT[(X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2]
>
> eg: The atoms of HELIX 1 with all the atoms of HELIX 2 and 3
>
> the atom[0] of 1 with the atom[0] of 2
> the atom[0] of 1 with the atom[1] of 2
> the atom[0] of 1 with the atom[2] of 2
> the atom[0] of 1 with the atom[3] of 2
> .
> .
> .
> .
> .
> .
> the atom[1] of 1 with the atom[0] of 2
> the atom[1] of 1 with the atom[1] of 2
> the atom[1] of 1 with the atom[2] of 2
> the atom[1] of 1 with the atom[3] of 2
>
>
> I want a good idea of how to do it
> Please HELP ME
> thanks
>
>
> The pdb file format looks like this
>
> HEADER PHEROMONE 20-DEC-95 2ERL
> HELIX 1 H1 ASP 1 GLN 9
> HELIX 2 H2 GLU 12 LEU 18
> HELIX 3 H3 GLU 23 ASN 35
> ATOM 1 N ASP 1 -1.115 8.537 7.075
[snip !]
>
> --
> View this message in context:
> http://www.nabble.com/convert-xyz-coordinates-to-distance-t1325102.html#a3535971
> Sent from the Perl - Bioperl-L forum at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From smarkel at scitegic.com Wed Mar 22 17:28:44 2006
From: smarkel at scitegic.com (Scott Markel)
Date: Wed, 22 Mar 2006 14:28:44 -0800
Subject: [Bioperl-l] question about revision 1.17 of Bio::Index::AbstractSeq.pm
Message-ID: <4421CF9C.2040500@scitegic.com>

We're upgrading our Pipeline Pilot Sequence Analysis Collection,
which runs on Windows and Linux, to use BioPerl 1.5.1.
In tracking down a regression change, we've discovered the
following.

There's a line in Bio::Index::AbstractSeq that was commented
out in revision 1.17. It's the only change. The CVS log
message for this change is "Nathan's additions".

The line is in the fetch() subroutine.

    $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug

We find that we need this line to be uncommented in order for
the first character of the ID to be included.

Anyone have any history on why this line was commented out?
We looked at DB_File and SDBM_File changes and didn't see
any changes that impact the file pointer.

Scott

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From osborne1 at optonline.net Wed Mar 22 18:24:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 22 Mar 2006 18:24:27 -0500
Subject: [Bioperl-l] question about revision 1.17 of Bio::Index::AbstractSeq.pm
In-Reply-To: <4421CF9C.2040500@scitegic.com>
Message-ID:

Scott,

Yes, I believe I committed this on behalf of Nathan Haigh. Is this
right, Nathan?

Brian O.

On 3/22/06 5:28 PM, "Scott Markel" wrote:

> We're upgrading our Pipeline Pilot Sequence Analysis Collection,
> which runs on Windows and Linux, to use BioPerl 1.5.1. In
> tracking down a regression change, we've discovered the
> following.
>
> There's a line in Bio::Index::AbstractSeq that was commented
> out in revision 1.17. It's the only change. The CVS log
> message for this change is "Nathan's additions".
>
> The line is in the fetch() subroutine.
>
> $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
>
> We find that we need this line to be uncommented in order for
> the first character of the ID to be included.
>
> Anyone have any history on why this line was commented out?
> We looked at DB_File and SDBM_File changes and didn't see
> any changes that impact the file pointer.
>
> Scott

From jyotikshah at gmail.com Wed Mar 22 16:24:35 2006
From: jyotikshah at gmail.com (Jyoti Shah)
Date: Wed, 22 Mar 2006 15:24:35 -0600
Subject: [Bioperl-l] cannot find path to wublast
Message-ID: <769931430603221324q354b1c74o48190de3ff3bf2ef@mail.gmail.com>

Hi,

I have a question regarding the usage of standaloneblast to parse
wublast results. I am comparatively new to bioperl and any help would
be greatly appreciated :-)

I have previously used standaloneblast to parse NCBI blast results and
it seems to work fine, but when I tried a similar approach with
wublast, I got the following error message:

-------------------- WARNING ---------------------
MSG: cannot find path to wublast
---------------------------------------------------
Can't call method "next_result" on an undefined value at ./wu-blast.pl line 20.

The script I have used for this is as follows:

    #!/usr/bin/perl
    use lib "/home/admin/bioperl-1.4/";
    $db='/unigene/human/Hs.seq.uniq';
    use Bio::Tools::Run::StandAloneBlast;
    BEGIN { $ENV{PATH} = '/home/wustl/wu-blast'; }
    BEGIN { $ENV{BLASTDATADIR} = '/home/databases/'; }
    use Bio::Seq;
    use Bio::LocatableSeq;
    use Bio::SimpleAlign;
    use Bio::AlignIO;
    use Bio::SearchIO;

    @params = ('program' => 'wublastn',
               'database' => $db);
    $seqn = "AAGGCCGTGACCC";
    $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
    $input = Bio::Seq->new(-id =>"test_query", -seqn=> $seqn);
    my $blast_report = $factory->wublast($input);
    my $result = $blast_report->next_result;
    while(my $hit = $result->next_hit()) {
        $name=$hit->name();
        $desc=$hit->description();
        print "Name $name\tDesc $desc\n";
    }

From cjfields at uiuc.edu Wed Mar 22 20:58:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Mar 2006 19:58:02 -0600
Subject: [Bioperl-l] question about revision 1.17 of Bio::Index::AbstractSeq.pm
In-Reply-To: <4421CF9C.2040500@scitegic.com>
References:
<4421CF9C.2040500@scitegic.com>
Message-ID: <23C3985D-540E-4624-ADD8-8653FCA50DDD@uiuc.edu>

Does this have anything to do with older versions of ActivePerl on
Windows? I remember seeing something once quite a while back on a
DB_File bug that was win32-related so I did some searching, but I
believe that has been fixed in more recent ActivePerl distributions.
Nathan's fix was probably commented out for newer versions of ActivePerl.

The original bug fix looks like it was intended as a workaround for an
older DB_File issue in early versions of ActivePerl 5.6 (rev. 1.15,
Jason's original fix which adds this line, reads "workaround for
problems with Windows and indexing files - important that latest
DB_File is installed"). That was over four years ago and dates to
around the time ActivePerl was having some issues with DB_File.
ActivePerl DB_File should be working fine now. Is it possible that
your ActivePerl version is an older v. 5.6 and has an older, possibly
buggy DB_File?

Chris

On Mar 22, 2006, at 4:28 PM, Scott Markel wrote:

> We're upgrading our Pipeline Pilot Sequence Analysis Collection,
> which runs on Windows and Linux, to use BioPerl 1.5.1. In
> tracking down a regression change, we've discovered the
> following.
>
> There's a line in Bio::Index::AbstractSeq that was commented
> out in revision 1.17. It's the only change. The CVS log
> message for this change is "Nathan's additions".
>
> The line is in the fetch() subroutine.
>
> $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
>
> We find that we need this line to be uncommented in order for
> the first character of the ID to be included.
>
> Anyone have any history on why this line was commented out?
> We looked at DB_File and SDBM_File changes and didn't see
> any changes that impact the file pointer.
>
> Scott
>
> --
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect email: smarkel at scitegic.com
> SciTegic Inc.
mobile: +1 858 205 3653
> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253
> San Diego, CA 92123 fax: +1 858 279 8804
> USA web: http://www.scitegic.com
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Wed Mar 22 22:19:06 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 23 Mar 2006 14:19:06 +1100
Subject: [Bioperl-l] cannot find path to wublast
In-Reply-To: <769931430603221324q354b1c74o48190de3ff3bf2ef@mail.gmail.com>
References: <769931430603221324q354b1c74o48190de3ff3bf2ef@mail.gmail.com>
Message-ID: <442213AA.9030800@infotech.monash.edu.au>

> -------------------- WARNING ---------------------
> MSG: cannot find path to wublast
> ---------------------------------------------------

> BEGIN { $ENV{PATH} = '/home/wustl/wu-blast'; }
> @params = ('program' => 'wublastn',

Just a quick possibility - do you actually have an executable file
called "wublastn", and is it in /home/wustl/wu-blast? In my wu-blast
installation, it's called just "blastn" (which itself is a symlink to
the real multipurpose exe called "blasta").

> my $blast_report = $factory->wublast($input);

From rahall2 at ualr.edu Thu Mar 23 12:28:14 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Thu, 23 Mar 2006 11:28:14 -0600
Subject: [Bioperl-l] Thursday
Message-ID: <005101c64e9f$2b27e800$4701a8c0@LIBERAL2>

Folks,

I went to the docs this morning because I felt "funny" and weak. I've
been feeling a little weird this week, and had a flutter or two in my
chest. I'm feeling better. Don't know what the issue might be at this
point.

I'm working on a couple of things at home, but call or email if you
need anything. Call soon, or email later please.
Thanks,

Roger

From smarkel at scitegic.com Thu Mar 23 20:08:55 2006
From: smarkel at scitegic.com (Scott Markel)
Date: Thu, 23 Mar 2006 17:08:55 -0800
Subject: [Bioperl-l] question about revision 1.17 of Bio::Index::AbstractSeq.pm
In-Reply-To: <23C3985D-540E-4624-ADD8-8653FCA50DDD@uiuc.edu>
References: <4421CF9C.2040500@scitegic.com> <23C3985D-540E-4624-ADD8-8653FCA50DDD@uiuc.edu>
Message-ID: <442346A7.9040607@scitegic.com>

Chris,

Thanks for the detailed response.

We're using a version we compiled, not ActivePerl.

    > perl.exe -v

    This is perl, v5.8.7 built for MSWin32-x86-multi-thread

We'll go check the DB_File versions again. Hopefully we'll
see something obvious.

Scott

Chris Fields wrote:
> Does this have anything to do with older versions of ActivePerl on
> Windows? I remember seeing something once quite a while back on a
> DB_File bug that was win32-related so I did some searching, but I
> believe that has been fixed in more recent ActivePerl distributions.
> Nathan's fix was probably commented out for newer versions of ActivePerl.
>
> The original bug fix looks like it was intended as a workaround for an
> older DB_File issue in early versions of ActivePerl 5.6 (rev. 1.15,
> Jason's original fix whic adds this line, reads "workaround for
> problems with Windows and indexing files - important that latest
> DB_File is installed"). That was over four years ago and dates to
> around the time ActivePerl was having some issues with DB_File.
> ActivePerl DB_File should be working fine now. Is it possible that
> your ActivePerl version is an older v. 5.6 and has an older, possibly
> buggy DB_File?
>
> Chris
>
> On Mar 22, 2006, at 4:28 PM, Scott Markel wrote:
>
>> We're upgrading our Pipeline Pilot Sequence Analysis Collection,
>> which runs on Windows and Linux, to use BioPerl 1.5.1. In
>> tracking down a regression change, we've discovered the
>> following.
>>
>> There's a line in Bio::Index::AbstractSeq that was commented
>> out in revision 1.17.
It's the only change. The CVS log
>> message for this change is "Nathan's additions".
>>
>> The line is in the fetch() subroutine.
>>
>> $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
>>
>> We find that we need this line to be uncommented in order for
>> the first character of the ID to be included.
>>
>> Anyone have any history on why this line was commented out?
>> We looked at DB_File and SDBM_File changes and didn't see
>> any changes that impact the file pointer.
>>
>> Scott
>>
>> --
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect email: smarkel at scitegic.com
>> SciTegic Inc. mobile: +1 858 205 3653
>> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253
>> San Diego, CA 92123 fax: +1 858 279 8804
>> USA web: http://www.scitegic.com
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>
>

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From sonmitra4u at yahoo.co.in Fri Mar 24 07:06:35 2006
From: sonmitra4u at yahoo.co.in (sonmitra mondal)
Date: Fri, 24 Mar 2006 12:06:35 +0000 (GMT)
Subject: [Bioperl-l] error
Message-ID: <20060324120635.58599.qmail@web8514.mail.in.yahoo.com>

While running the script on Unix I am facing one problem: every time,
during execution, it shows one error message: Invalid RID.
Please help me.

With regards
Sonmitra
student of M.Sc. Bioinformatics

__________________________________________________________
Yahoo! India Matrimony: Find your partner now.
Go to http://yahoo.shaadi.com

From cjfields at uiuc.edu Fri Mar 24 10:00:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 24 Mar 2006 09:00:23 -0600
Subject: [Bioperl-l] error
In-Reply-To: <20060324120635.58599.qmail@web8514.mail.in.yahoo.com>
References: <20060324120635.58599.qmail@web8514.mail.in.yahoo.com>
Message-ID: <54211ACA-3724-4EA6-9480-69C1DFDDB9C1@uiuc.edu>

Which script? I'm guessing one from the Beginner's HOWTO or one using
Bio::Perl. An RID is usually from an NCBI BLAST run. We need more
information (OS, perl version, bioperl version, script, input, etc.) to
actually help you; otherwise we're just shooting in the dark.

Chris

On Mar 24, 2006, at 6:06 AM, sonmitra mondal wrote:

> while running the script in unix i am facing one
> problem , everytime during execution it's showing 1
> error message : Invalid RID .
> Please help me
>
> With regards
> Sonmitra
> student of M.Sc. Bioinformatics
>
>
>
> __________________________________________________________
> Yahoo! India Matrimony: Find your partner now. Go to
> http://yahoo.shaadi.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From rahall2 at ualr.edu Fri Mar 24 14:50:34 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 24 Mar 2006 13:50:34 -0600
Subject: [Bioperl-l] Quantum computing
Message-ID: <002301c64f7c$36de86d0$cc16a790@LIBERAL2>

http://www.news-gazette.com/news/local/2006/03/02/quantum_bits_help_yield_some_computer_esp/

From dwaner at scitegic.com Fri Mar 24 18:52:57 2006
From: dwaner at scitegic.com (David Waner)
Date: Fri, 24 Mar 2006 15:52:57 -0800
Subject: [Bioperl-l] genbank species parsing (genbank.pm,v 1.121)
Message-ID: <830D8D4719112B418ABBC3A0EBA95812019428B3@webmail.scitegic.com>

The genbank reader in BioPerl 1.5.1 parses the species name of plant
hybrids like "Musa x paradisiaca" as species = "x", subspecies
"paradisiaca". It would be more useful (and result in more accurate
round tripping) if this were parsed as species = "x paradisiaca", no
subspecies. Perhaps this common special case should be handled in the
genbank.pm module.

- David Waner

Example excerpts:

Original genbank file:

SOURCE      Musa x paradisiaca
  ORGANISM  Musa x paradisiaca

Output from round-trip through BioPerl:

SOURCE      Musa x paradisiaca paradisiaca
  ORGANISM  Musa x

Test case:

LOCUS       MSZ85965                 634 bp    DNA     linear   STS 28-FEB-2002
DEFINITION  Musa x paradisiaca DNA for sequence tagged microsatellite site
            (STMS), sequence tagged site.
ACCESSION   Z85965
VERSION     Z85965.1  GI:2266701
KEYWORDS    STS; microsatellite; STMS.
SOURCE      Musa x paradisiaca
  ORGANISM  Musa x paradisiaca
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Zingiberales; Musaceae;
            Musa.
REFERENCE   1
  AUTHORS   Lagoda,P.J.L.
  TITLE     Banana STMS markers
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 634)
  AUTHORS   Lagoda,P.J.L.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-JAN-1997) Lagoda P.J.L., CIRAD BIOTROP, AGETROP
            laboratory, Avenue du val de Montferrand, BP5035, 34032
            Montpellier Cedex, France
FEATURES             Location/Qualifiers
     source          1..634
                     /organism="Musa x paradisiaca"
                     /mol_type="genomic DNA"
                     /cultivar="Gobusik"
                     /db_xref="taxon:89151"
                     /clone="pMaCIR561"
                     /cell_line="Madang"
                     /clone_lib="Pst 1"
     primer_bind     79..103
                     /note="Upper Primer AGMI145"
     repeat_region   165..188
                     /note="(TC)16 repeat"
     repeat_region   247..261
                     /note="(TC)6 repeat"
     primer_bind     279..297
                     /note="Lower Primer AGMI146"
ORIGIN
        1 ctgcaggtaa ctggccgagt tgaacagtac caaccctgtt gtcacgaggc acataatgac
       61 tagagtaccc tccatccaag ctattacttg tttttatctt gaagacattt cagtctatnc
      121 aatcataagc atgattgaac cctctcattc gtgaaccgct accctctctc tctctctctc
      181 tctctctcca gcnacccttt nttngctctg tctaactact ctgtccctct cttggctctt
      241 gcacactcct ctctctctct ccccagtaat tgaacncctc ctgtcttttn tgtccttgct
      301 ccctcttctt tccagtcntc atnttatctc tnnctgcana anattgcacc atttccttac
      361 ttcttagtan tttcagattt ttaaatattt tccaatattg caccaaaatc ttggctgtct
      421 tattggtcca actagtaatc tgaggcttag taaagtcatt gttcagtttg agcttgataa
      481 ttatggttcg aatgcttaaa gactagtaaa tctacgggaa gggttacaan accccataaa
      541 attctagctt atactgnaat aaaaaaacnt cttccaacnt aacanccttt ccantatctc
      601 tcgggttttt caaaaggatt aaggnnggtg ttcc
//

From dwaner at scitegic.com Fri Mar 24 19:19:04 2006
From: dwaner at scitegic.com (David Waner)
Date: Fri, 24 Mar 2006 16:19:04 -0800
Subject: [Bioperl-l] Species name validation problem
Message-ID: <830D8D4719112B418ABBC3A0EBA95812019428B4@webmail.scitegic.com>

I have found that Bio::Seq->new() throws exceptions on some "species"
names containing special characters, or consisting of a single letter,
e.g:

SwissProt: POLN_ONNVG O'nyong-nyong virus
SwissProt: FIBP_ADE1H Human adenovirus 15/H9
SwissProt: POLG_FMDVZ Foot-and-mouth disease virus (strain A22/550 Azerbaijan 65)
SwissProt: RIR1_BHV1C Bovine herpesvirus 1.1
SwissProt: SODF_METJ Methylomonas J
GenBank: AJ416726
Stylosanthes aff. calcicola

It seems that the regex in validate_species_name() is too restrictive,
but I can't find a way to turn off validation without editing bioperl
modules. There has been some recent discussion of this issue on the
mailing list (see below). Does anyone know if or when a
-validate_species option to Bio::Seq->new() will be added? Or should I
just propose the code change?

Thanks,
David Waner

> Stefan Kirov skirov at utk.edu
> Wed Sep 21 08:46:05 EDT 2005
>
> --------------------------------------------------------------------------------
>
> Thanks for the great answer Hilmar!
> I would prefer to have some kind of a check if the user wishes so. For
> example Entrezgene file contains some HTML tags in some entries species
> names which is good to know.
> I will put an option -validate_species in the constructor to turn the
> check on and off. Maybe a species filter can be of some use as well.
> though you can just select the correct file from the NCBI site....
> Thanks again!
> Stefan
>

From cjfields at uiuc.edu Fri Mar 24 21:53:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 24 Mar 2006 20:53:35 -0600
Subject: [Bioperl-l] Quantum computing
In-Reply-To: <002301c64f7c$36de86d0$cc16a790@LIBERAL2>
References: <002301c64f7c$36de86d0$cc16a790@LIBERAL2>
Message-ID:

That's my home-town boys! Now if they could only figure out why we
didn't make the sweet 16....

On Mar 24, 2006, at 1:50 PM, Roger Hall wrote:

> http://www.news-gazette.com/news/local/2006/03/02/quantum_bits_help_yield_some_computer_esp/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Sat Mar 25 00:42:18 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 24 Mar 2006 21:42:18 -0800 Subject: [Bioperl-l] Species name validation problem In-Reply-To: <830D8D4719112B418ABBC3A0EBA95812019428B4@webmail.scitegic.com> References: <830D8D4719112B418ABBC3A0EBA95812019428B4@webmail.scitegic.com> Message-ID: The option would be in Bio::Species, not Bio::Seq. You can circumvent the name validation by passing an array ref to $species->classification() and anything that evaluates to true as the second argument. This is for instance what the genbank parser does (which doesn't mean that it is always correct); supposedly the swissprot parser ought to do the same. -hilmar On 3/24/06, David Waner wrote: > I have found that Bio::Seq->new() throws exceptions on some "species" > names containing special characters, or consisting of a single letter, > e.g: > > SwissProt: POLN_ONNVG O'nyong-nyong virus > SwissProt: FIBP_ADE1H Human adenovirus 15/H9 > SwissProt: POLG_FMDVZ Foot-and-mouth disease virus (strain > A22/550 Azerbaijan 65) > SwissProt: RIR1_BHV1C Bovine herpesvirus 1.1 > SwissProt: SODF_METJ Methylomonas J > GenBank: AJ416726 Stylosanthes aff. calcicola > > It seems that the regex in validate_species_name() is too restrictive, > but I can't find a way to turn off validation without editing bioperl > modules. There has been some recent discussion of this issue on the > mailing list (see below). Does anyone know if or when a > -validate_species option to Bio::Seq->new() will be added? Or should I > just propose the code change? > > Thanks, > David Waner > > > > Stefan Kirov skirov at utk.edu > > Wed Sep 21 08:46:05 EDT 2005 > > > > > ------------------------------------------------------------------------ > -------- > > > > Thanks for the great answer Hilmar! > > I would prefer to have some kind of a check if the user wishes so. 
For > > > example Entrezgene file contains some HTML tags in some entries > species > > names which is good to know. > > I will put an option -validate_species in the constructor to turn the > > check on and off. Maybe a species filter can be of some use as well. > > though you can just select the correct file from the NCBI site.... > > Thanks again! > > Stefan > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From dag at sonsorol.org Sat Mar 25 18:50:57 2006 From: dag at sonsorol.org (Chris Dagdigian) Date: Sat, 25 Mar 2006 18:50:57 -0500 Subject: [Bioperl-l] Important news for developers on open-bio machines Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org> Hi, apologies for the massive cross-post. I'll keep it short! This message is a last-ditch attempt to contact people with developer accounts on pub.open-bio.org who may have not received the individual mails we've been sending via the obf-developers at lists.open-bio.org mailing list. We suspect that there are a number of devs out there for whom we don't have up to date email addresses. All open-bio services have been migrated to new hardware and a new datacenter. Part of this migration process involved moving all developer accounts and all source-code repositories to a new server. The developer migration was completed a few minutes ago. An unavoidable side effect of the move is that all developers are now locked out of their accounts until they contact us for a password reset. If you are a developer and this news comes as a surprise to you, it means we don't have your contact info. 
Your best way to get up to speed on the history and technical details behind the migration is to point your browser here: http://lists.open-bio.org/mailman/private/obf-developers/2006-March/ thread.html ... and read the various messages we've posted this month. Included in the first message is the information on how to request an account reset. Regards, Chris Dagdigian open-bio.org From pterry2 at unlnotes.unl.edu Sun Mar 26 14:47:14 2006 From: pterry2 at unlnotes.unl.edu (Philip M Terry) Date: Sun, 26 Mar 2006 13:47:14 -0600 Subject: [Bioperl-l] bptutorial.pl question, and appropriate bioperl version question Message-ID: Hello, Would anyone be able to help/comment on the following: version of bioperl: Distribution B/BI/BIRNEY/bioperl-1.4.tar.gz (installed from CPAN) platform: Power Mac G5 OS X 10.4.5 What trying to do: Practice/learn to run bptutorial.pl from ~/.cpan/build/bioperl-1.4, trying run_remoteblast Code that gives the error: philip-terrys-power-mac-g5:~/.cpan/build/bioperl-1.4 mterry$ perl -w bptutorial.pl 22 Beginning run_remoteblast example... submitted Blast job retrieving results... #line 3322 of bptutorial.pl -------------------- WARNING --------------------- MSG: Possible error (1) while parsing BLAST report! --------------------------------------------------- Use of uninitialized value in substitution (s///) at Bio/Tools/BPlite.pm line 337, line 30. Use of uninitialized value in substitution (s///) at Bio/Tools/BPlite.pm line 338, line 30. Use of uninitialized value in substitution (s///) at Bio/Tools/BPlite.pm line 339, line 30. Use of uninitialized value in pattern match (m//) at Bio/Tools/BPlite.pm line 341, line 30. philip-terrys-power-mac-g5:~/.cpan/build/bioperl-1.4 mterry$ The variable $def never gets initialized at line 326 of BPlite.pm module. I note that Bio::Tools::BPlite module will not to be supported in future versions of bioperl. So should I go from bioperl ver 1.4 to ver 1.5.1 before proceeding to practice with bptutorial.pl? 
If so, do I need to delete/uninstall ver 1.4 first before installing ver 1.5.1? If so, how should it be done? Thanks, Philip M. Terry, Ph.D. University of Nebraska-Lincoln From mseewald at gmail.com Sun Mar 26 07:26:29 2006 From: mseewald at gmail.com (Michael Seewald) Date: Sun, 26 Mar 2006 14:26:29 +0200 Subject: [Bioperl-l] MeSH term retrieval Message-ID: Dear Bioperl-Users, dear Heikki, I am trying to retrieve MeSH term descriptions using Bio::DB::MeSH (bioperl v1.5.0). Sometimes it works (first example below), sometimes it does not (second example). Is there anything wrong with the query? Thanks & bets wishes, Michael use Bio::DB::MeSH; my $mesh = new Bio::DB::MeSH(); # works, compare: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=&db=MeSH&cmd=search&term=Eisenmenger%20Complex my $term = 'Eisenmenger Complex'; print "Term: $term\n"; print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; # does not work, compare: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=&db=MeSH&cmd=search&term=Sinus%20Thrombosis,%20Intracranial my $term = 'Sinus Thrombosis, Intracranial'; print "Term: $term\n"; print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; -- Dr. Michael Seewald Bioinformatics Bayer HealthCare AG From hlapp at gmx.net Sun Mar 26 20:06:06 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 26 Mar 2006 17:06:06 -0800 Subject: [Bioperl-l] MeSH term retrieval In-Reply-To: References: Message-ID: <8fbf6550d8754858c36ab235b57d6929@gmx.net> There's no psychics on this list AFAIK - if you don't post the error message or further qualify 'does not work' then I'm afraid you won't get much help ... On Mar 26, 2006, at 4:26 AM, Michael Seewald wrote: > Dear Bioperl-Users, dear Heikki, > > I am trying to retrieve MeSH term descriptions using Bio::DB::MeSH > (bioperl > v1.5.0). Sometimes it works (first example below), sometimes it does > not > (second example). Is there anything wrong with the query? 
> > Thanks & bets wishes, > Michael > > > use Bio::DB::MeSH; > > my $mesh = new Bio::DB::MeSH(); > > # works, compare: > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > holding=&db=MeSH&cmd=search&term=Eisenmenger%20Complex > my $term = 'Eisenmenger Complex'; > print "Term: $term\n"; > print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; > > # does not work, compare: > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > holding=&db=MeSH&cmd=search&term=Sinus%20Thrombosis,%20Intracranial > my $term = 'Sinus Thrombosis, Intracranial'; > print "Term: $term\n"; > print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; > > > -- > Dr. Michael Seewald > Bioinformatics > Bayer HealthCare AG > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cjfields at uiuc.edu Sun Mar 26 20:57:08 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 26 Mar 2006 19:57:08 -0600 Subject: [Bioperl-l] bptutorial.pl question, and appropriate bioperl version question In-Reply-To: References: Message-ID: Try updating to at least v.1.5.1. If that doesn't help, try updating from CVS. Not sure if this has been reported as a bug or not but there have been many changes since 1.4 was released, many specifically in Blast parsing. I'm not sure Bio::Perl is still using BPlite for Blast parsing but wouldn't be surprised if it has changed to SearchIO::blast. 
Chris On Mar 26, 2006, at 1:47 PM, Philip M Terry wrote: > > Hello, > > Would anyone be able to help/comment on the following: > > version of bioperl: Distribution B/BI/BIRNEY/bioperl-1.4.tar.gz > (installed from CPAN) > platform: Power Mac G5 OS X 10.4.5 > What trying to do: Practice/learn to run bptutorial.pl from > ~/.cpan/build/bioperl-1.4, trying run_remoteblast > Code that gives the error: > philip-terrys-power-mac-g5:~/.cpan/build/bioperl-1.4 mterry$ perl -w > bptutorial.pl 22 > > Beginning run_remoteblast example... > submitted Blast job > retrieving results... #line 3322 of bptutorial.pl > > -------------------- WARNING --------------------- > MSG: Possible error (1) while parsing BLAST report! > --------------------------------------------------- > Use of uninitialized value in substitution (s///) at Bio/Tools/ > BPlite.pm > line 337, line 30. > Use of uninitialized value in substitution (s///) at Bio/Tools/ > BPlite.pm > line 338, line 30. > Use of uninitialized value in substitution (s///) at Bio/Tools/ > BPlite.pm > line 339, line 30. > Use of uninitialized value in pattern match (m//) at Bio/Tools/ > BPlite.pm > line 341, line 30. > philip-terrys-power-mac-g5:~/.cpan/build/bioperl-1.4 mterry$ > > The variable $def never gets initialized at line 326 of BPlite.pm > module. > > I note that Bio::Tools::BPlite module will not to be supported in > future > versions of bioperl. > > So should I go from bioperl ver 1.4 to ver 1.5.1 before proceeding to > practice with bptutorial.pl? > > If so, do I need to delete/uninstall ver 1.4 first before > installing ver > 1.5.1? If so, how should it be done? > > Thanks, > Philip M. Terry, Ph.D. > University of Nebraska-Lincoln > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From saldroubi at gmail.com Sun Mar 26 19:12:49 2006 From: saldroubi at gmail.com (Sam Al-Droubi) Date: Sun, 26 Mar 2006 19:12:49 -0500 Subject: [Bioperl-l] bptutorial.pl question, and appropriate bioperl version question In-Reply-To: References: Message-ID: Philip, I am relatively new to bioperl and I started with 1.5.0 and then I upgraded to 1.5.1 because of a bug. I suggest you do the same. As far as I know, you don't need to uninstall the current version just install the new version and I think everything will work fine. On 3/26/06, Philip M Terry wrote: > > > Hello, > > Would anyone be able to help/comment on the following: > > version of bioperl: Distribution B/BI/BIRNEY/bioperl-1.4.tar.gz > (installed from CPAN) > platform: Power Mac G5 OS X 10.4.5 > What trying to do: Practice/learn to run bptutorial.pl from > ~/.cpan/build/bioperl-1.4, trying run_remoteblast > Code that gives the error: > philip-terrys-power-mac-g5:~/.cpan/build/bioperl-1.4 mterry$ perl -w > bptutorial.pl 22 > > Beginning run_remoteblast example... > submitted Blast job > retrieving results... #line 3322 of bptutorial.pl > > -------------------- WARNING --------------------- > MSG: Possible error (1) while parsing BLAST report! > --------------------------------------------------- > Use of uninitialized value in substitution (s///) at Bio/Tools/BPlite.pm > line 337, line 30. > Use of uninitialized value in substitution (s///) at Bio/Tools/BPlite.pm > line 338, line 30. > Use of uninitialized value in substitution (s///) at Bio/Tools/BPlite.pm > line 339, line 30. > Use of uninitialized value in pattern match (m//) at Bio/Tools/BPlite.pm > line 341, line 30. > philip-terrys-power-mac-g5:~/.cpan/build/bioperl-1.4 mterry$ > > The variable $def never gets initialized at line 326 of BPlite.pm module. > > I note that Bio::Tools::BPlite module will not to be supported in future > versions of bioperl. 
> > So should I go from bioperl ver 1.4 to ver 1.5.1 before proceeding to > practice with bptutorial.pl? > > If so, do I need to delete/uninstall ver 1.4 first before installing ver > 1.5.1? If so, how should it be done? > > Thanks, > Philip M. Terry, Ph.D. > University of Nebraska-Lincoln > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Sincerely, Sam Al-Droubi, M.S. From alex at bioinfo2.sastra.edu Mon Mar 27 04:44:40 2006 From: alex at bioinfo2.sastra.edu (Alex Stanley) Date: Mon, 27 Mar 2006 15:14:40 +0530 Subject: [Bioperl-l] help needed Message-ID: dear sir, i worked genscan with the code from ur publish, now presently iam doing project in my college can i know whether u have any code for UTR or SPLICE SITES to predict there regions, plz help me in this regard, Thank you sir Yours B.Alex From mseewald at gmail.com Mon Mar 27 06:58:01 2006 From: mseewald at gmail.com (Michael Seewald) Date: Mon, 27 Mar 2006 13:58:01 +0200 Subject: [Bioperl-l] MeSH term retrieval In-Reply-To: <8fbf6550d8754858c36ab235b57d6929@gmx.net> References: <8fbf6550d8754858c36ab235b57d6929@gmx.net> Message-ID: Right, but it's all I could find out. The output on my system is: $ ./test.pl Term: Eisenmenger Complex Desc: Defect of the interventricular septum with severe pulmonary hypertension, hypertrophy of the right ventricle, and latent or overt cyanosis. Term: Sinus Thrombosis, Intracranial Desc: Using Data::Dumper, I found out, that the first mesh object has a key value pair for "description", the object from the second query does not. This may be pointing at a parsing problem. Any hints welcome.. Could anyone run the script on their system and see whether it works in a different setting? 
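The Data::Dumper check described above can be sketched in plain Perl. This is only an illustration under stated assumptions: the two objects are hand-built stand-ins (the `Toy::Term` class, the `name` key and the shortened description text are invented here), and it assumes the real objects are hash-based so that their keys can be inspected this way.

```perl
use strict;
use warnings;
use Data::Dumper;

# Stable key order makes two dumps easy to compare by eye.
$Data::Dumper::Sortkeys = 1;

# Hypothetical stand-ins for the two objects returned by the queries.
# 'Toy::Term' and the 'name' key are invented for this illustration.
my $with_desc = bless {
    name        => 'Eisenmenger Complex',
    description => 'Defect of the interventricular septum ...',
}, 'Toy::Term';
my $without_desc = bless {
    name => 'Sinus Thrombosis, Intracranial',
}, 'Toy::Term';

# True only if the key exists and holds a non-empty value.
sub has_field {
    my ($obj, $field) = @_;
    return ( exists $obj->{$field}
          && defined $obj->{$field}
          && length $obj->{$field} ) ? 1 : 0;
}

for my $term ($with_desc, $without_desc) {
    print Dumper($term);
    printf "%s: description %s\n", $term->{name},
        has_field($term, 'description') ? 'present' : 'missing';
}
```

Reaching into an object's hash is fine for debugging, but it bypasses the accessor methods, so it should not end up in production code.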
Best wishes, Michael On 3/27/06, Hilmar Lapp wrote: > > There's no psychics on this list AFAIK - if you don't post the error > message or further qualify 'does not work' then I'm afraid you won't > get much help ... > > On Mar 26, 2006, at 4:26 AM, Michael Seewald wrote: > > > Dear Bioperl-Users, dear Heikki, > > > > I am trying to retrieve MeSH term descriptions using Bio::DB::MeSH > > (bioperl > > v1.5.0). Sometimes it works (first example below), sometimes it does > > not > > (second example). Is there anything wrong with the query? > > > > Thanks & bets wishes, > > Michael > > > > > > use Bio::DB::MeSH; > > > > my $mesh = new Bio::DB::MeSH(); > > > > # works, compare: > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > > holding=&db=MeSH&cmd=search&term=Eisenmenger%20Complex > > my $term = 'Eisenmenger Complex'; > > print "Term: $term\n"; > > print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; > > > > # does not work, compare: > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > > holding=&db=MeSH&cmd=search&term=Sinus%20Thrombosis,%20Intracranial > > my $term = 'Sinus Thrombosis, Intracranial'; > > print "Term: $term\n"; > > print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; > > > > > > -- > > Dr. Michael Seewald > > Bioinformatics > > Bayer HealthCare AG > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > -- Dr. 
Michael Seewald Bioinformatics Bayer HealthCare AG From katrin-mueller4 at gmx.de Mon Mar 27 11:14:03 2006 From: katrin-mueller4 at gmx.de (Katrin) Date: Mon, 27 Mar 2006 08:14:03 -0800 (PST) Subject: [Bioperl-l] clustalw.exe Message-ID: <3612399.post@talk.nabble.com> hello, I am a new Perl/Bioperl-User and first I must excuse me for my really bad english, but I hope everybody will understand me. I have the following problem: In my Perl-skript is the following system call: $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If I call this Script with the Shell (cmd.exe) everything works correctly. But if I call this script with PHP I get the following error message: Error: unknown option /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried also system and qx. And I tested the environment variables: I wrote a bat-file with the definition of all environment-variables and the system call, but this did not work, too. The same problem is in php. The PHP-Scipt is called from html and I worked under WindowsXP with xampp. I hope, somebody can help me. greetings Katrin -- View this message in context: http://www.nabble.com/clustalw.exe-t1350142.html#a3612399 Sent from the Perl - Bioperl-L forum at Nabble.com. From t-nakazato at muj.biglobe.ne.jp Mon Mar 27 11:37:18 2006 From: t-nakazato at muj.biglobe.ne.jp (t-nakazato at muj.biglobe.ne.jp) Date: Tue, 28 Mar 2006 01:37:18 +0900 (JST) Subject: [Bioperl-l] MeSH term retrieval References: Message-ID: <20060328013718.BCNQC0A82741.C7DE0C8A@wpop.biglobe.ne.jp> Hi. I tried your script with some MeSH Terms. I can retrieve correct result with "Insulin", but I can't get with "Insulin, Isophane". You can't get the result because second example contain "," and Bioperl don't treat "," correctly, I think. Takeru ----- Takeru Nakazato t-nakazato at muj.biglobe.ne.jp > Right, but it's all I could find out. 
The output on my system is: > > $ ./test.pl > Term: Eisenmenger Complex > Desc: Defect of the interventricular septum with severe pulmonary > hypertension, hypertrophy of the right ventricle, and latent or overt > cyanosis. > Term: Sinus Thrombosis, Intracranial > Desc: > > Using Data::Dumper, I found out, that the first mesh object has a key value > pair for "description", the object from the second query does not. This may > be pointing at a parsing problem. Any hints welcome.. Could anyone run the > script on their system and see whether it works in a different setting? > > Best wishes, > Michael > > > On 3/27/06, Hilmar Lapp wrote: > > > > There's no psychics on this list AFAIK - if you don't post the error > > message or further qualify 'does not work' then I'm afraid you won't > > get much help ... > > > > On Mar 26, 2006, at 4:26 AM, Michael Seewald wrote: > > > > > Dear Bioperl-Users, dear Heikki, > > > > > > I am trying to retrieve MeSH term descriptions using Bio::DB::MeSH > > > (bioperl > > > v1.5.0). Sometimes it works (first example below), sometimes it does > > > not > > > (second example). Is there anything wrong with the query? > > > > > > Thanks & bets wishes, > > > Michael > > > > > > > > > use Bio::DB::MeSH; > > > > > > my $mesh = new Bio::DB::MeSH(); > > > > > > # works, compare: > > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > > > holding=&db=MeSH&cmd=search&term=Eisenmenger%20Complex > > > my $term = 'Eisenmenger Complex'; > > > print "Term: $term\n"; > > > print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; > > > > > > # does not work, compare: > > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > > > holding=&db=MeSH&cmd=search&term=Sinus%20Thrombosis,%20Intracranial > > > my $term = 'Sinus Thrombosis, Intracranial'; > > > print "Term: $term\n"; > > > print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; > > > > > > > > > -- > > > Dr. 
Michael Seewald
> > > Bioinformatics
> > > Bayer HealthCare AG
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
>
> --
> Dr. Michael Seewald
> Bioinformatics
> Bayer HealthCare AG
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dwaner at scitegic.com Mon Mar 27 13:24:12 2006
From: dwaner at scitegic.com (David Waner)
Date: Mon, 27 Mar 2006 10:24:12 -0800
Subject: [Bioperl-l] Species name validation problem
Message-ID: <830D8D4719112B418ABBC3A0EBA95812019428B5@webmail.scitegic.com>

Yes, I meant to type Bio::Species, not Bio::Seq. Sorry for the confusion.

My problem is that I am not calling $species->classification() directly; I am calling Bio::Species->new(), which in turn calls classification(), which calls validate_species_name(), which then throws an exception on some species names. As far as I can see, there is no way to turn off this (over-aggressive) validation in the Species constructor.

I guess that instead of this:

    $species = Bio::Species->new(-classification => \@classificationArray);

I could do this:

    $species = Bio::Species->new();
    $species->classification(\@classificationArray, 'no validation');

but it would make a nicer interface to have a validation option in the Species constructor.
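A constructor option of the kind asked for above might look roughly like the following sketch. Everything in it is an assumption made for illustration: `Toy::Species` is a hypothetical stand-in class, not Bio::Species; the `-validate` flag is the proposed option, not an existing one; and the lowercase-letters-only rule is a deliberately strict toy check, not the real regex in validate_species_name().

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only; this is NOT Bio::Species.
package Toy::Species;

sub new {
    my ($class, %args) = @_;
    my $self = bless { classification => [] }, $class;

    # Validation stays on unless the caller passes -validate => 0.
    my $validate = exists $args{-validate} ? $args{-validate} : 1;

    if (my $names = $args{-classification}) {
        # Forward to the setter; a true second argument skips the check.
        $self->classification($names, !$validate);
    }
    return $self;
}

# Two-argument setter in the style described in the thread: an array ref
# plus an optional true value that bypasses name validation.
sub classification {
    my ($self, $names, $force) = @_;
    if ($names) {
        # Toy rule (an assumption): species name must be lowercase letters.
        if (!$force && $names->[0] !~ /^[a-z]+$/) {
            die "Invalid species name '$names->[0]'\n";
        }
        $self->{classification} = [ @$names ];
    }
    return @{ $self->{classification} };
}

package main;

# Assumed ordering: most specific rank first.
my @lineage = ('aff. calcicola', 'Stylosanthes');

# With validation (the default) the constructor dies, so wrap it in eval.
my $checked = eval { Toy::Species->new(-classification => \@lineage) };
print "validated:   ", (defined $checked ? "accepted\n" : "rejected: $@");

# With -validate => 0 the same name is stored as given.
my $relaxed = Toy::Species->new(-classification => \@lineage, -validate => 0);
print "unvalidated: accepted '",
    join(' ', reverse $relaxed->classification), "'\n";
```

Defaulting the flag to on keeps existing callers' behaviour unchanged, while a parser that trusts its input can opt out in a single call.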
- David -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Friday, March 24, 2006 9:42 PM To: David Waner Cc: Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Species name validation problem The option would be in Bio::Species, not Bio::Seq. You can circumvent the name validation by passing an array ref to $species->classification() and anything that evaluates to true as the second argument. This is for instance what the genbank parser does (which doesn't mean that it is always correct); supposedly the swissprot parser ought to do the same. -hilmar On 3/24/06, David Waner wrote: > I have found that Bio::Seq->new() throws exceptions on some "species" > names containing special characters, or consisting of a single letter, > e.g: > > SwissProt: POLN_ONNVG O'nyong-nyong virus > SwissProt: FIBP_ADE1H Human adenovirus 15/H9 > SwissProt: POLG_FMDVZ Foot-and-mouth disease virus (strain > A22/550 Azerbaijan 65) > SwissProt: RIR1_BHV1C Bovine herpesvirus 1.1 > SwissProt: SODF_METJ Methylomonas J > GenBank: AJ416726 Stylosanthes aff. calcicola > > It seems that the regex in validate_species_name() is too restrictive, > but I can't find a way to turn off validation without editing bioperl > modules. There has been some recent discussion of this issue on the > mailing list (see below). Does anyone know if or when a > -validate_species option to Bio::Seq->new() will be added? Or should I > just propose the code change? > > Thanks, > David Waner > > > > Stefan Kirov skirov at utk.edu > > Wed Sep 21 08:46:05 EDT 2005 > > > > > ---------------------------------------------------------------------- > -- > -------- > > > > Thanks for the great answer Hilmar! > > I would prefer to have some kind of a check if the user wishes so. > > For > > > example Entrezgene file contains some HTML tags in some entries > species > > names which is good to know. > > I will put an option -validate_species in the constructor to turn > > the check on and off. 
Maybe a species filter can be of some use as
> > well. though you can just select the correct file from the NCBI
> > site.... Thanks again! Stefan
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From hlapp at gmx.net Mon Mar 27 13:29:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 27 Mar 2006 10:29:40 -0800
Subject: [Bioperl-l] Species name validation problem
In-Reply-To: <830D8D4719112B418ABBC3A0EBA95812019428B5@webmail.scitegic.com>
References: <830D8D4719112B418ABBC3A0EBA95812019428B5@webmail.scitegic.com>
Message-ID: <026223d58812d8369430f5e794cf63f2@gmx.net>

I agree. Can you file this on bugzilla as a feature request, basically copying and pasting your email below?

On Mar 27, 2006, at 10:24 AM, David Waner wrote:

> Yes, I meant to type Bio::Species, not Bio::Seq. Sorry for the
> confusion.
>
> My problem is that I am not calling $species->classification()
> directly;
> I am calling Bio::Species->new(), which in turn calls classification()
> which calls validate_species_name(), which then throws an exception on
> some species names. As far as I can see, there is no way to turn off
> this (over-aggressive) validation in the Species constructor.
> > I guess that instead of this: > > $species = Bio::Species->new(-classification => > \@classificationArray); > > I could do this: > > $species = Bio::Species->new(); > $species->classification(\@classificationArray, 'no > validation'); > > but it would make a nicer interface to have a validation option in the > Species constructor. > > - David > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Friday, March 24, 2006 9:42 PM > To: David Waner > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Species name validation problem > > > The option would be in Bio::Species, not Bio::Seq. You can circumvent > the name validation by passing an array ref to > $species->classification() and anything that evaluates to true as the > second argument. This is for instance what the genbank parser does > (which doesn't mean that it is always correct); supposedly the > swissprot > parser ought to do the same. > > -hilmar > > On 3/24/06, David Waner wrote: >> I have found that Bio::Seq->new() throws exceptions on some "species" >> names containing special characters, or consisting of a single letter, >> e.g: >> >> SwissProt: POLN_ONNVG O'nyong-nyong virus >> SwissProt: FIBP_ADE1H Human adenovirus 15/H9 >> SwissProt: POLG_FMDVZ Foot-and-mouth disease virus (strain >> A22/550 Azerbaijan 65) >> SwissProt: RIR1_BHV1C Bovine herpesvirus 1.1 >> SwissProt: SODF_METJ Methylomonas J >> GenBank: AJ416726 Stylosanthes aff. calcicola >> >> It seems that the regex in validate_species_name() is too restrictive, > >> but I can't find a way to turn off validation without editing bioperl >> modules. There has been some recent discussion of this issue on the >> mailing list (see below). Does anyone know if or when a >> -validate_species option to Bio::Seq->new() will be added? Or should I > >> just propose the code change? 
>> >> Thanks, >> David Waner >> >> >>> Stefan Kirov skirov at utk.edu >>> Wed Sep 21 08:46:05 EDT 2005 >>> >>> >> ---------------------------------------------------------------------- >> -- >> -------- >>> >>> Thanks for the great answer Hilmar! >>> I would prefer to have some kind of a check if the user wishes so. >>> For >> >>> example Entrezgene file contains some HTML tags in some entries >> species >>> names which is good to know. >>> I will put an option -validate_species in the constructor to turn >>> the check on and off. Maybe a species filter can be of some use as >>> well. though you can just select the correct file from the NCBI >>> site.... Thanks again! Stefan >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Click on the link below to report this email as spam > https://www.mailcontrol.com/sr/6RxreR3!4EAT093Sa0o+kL74sPfAD2rj2Jp! > eGk8r > RtXfcIn+KX87A70BrDI0qIcMansH9FDdvd7u5Zc1G6CuaLdquPg4xnr+tcULmTIZgnhNIFU > k > MNJWsODXSRTEtZF6To1umzAv! > mlBBYJW4WXOZWaK8xzZrmj3Eao8o3D4YNM7jMpLnqnc7LtK > 9D9H+YhmDk7r9DMVd5h6cTMU3rPx7Z43oVxeMeC > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From jyotikshah at gmail.com Mon Mar 27 13:55:48 2006
From: jyotikshah at gmail.com (Jyoti Shah)
Date: Mon, 27 Mar 2006 12:55:48 -0600
Subject: [Bioperl-l] cannot find path to wublast
In-Reply-To: <442213AA.9030800@infotech.monash.edu.au>
References: <769931430603221324q354b1c74o48190de3ff3bf2ef@mail.gmail.com> <442213AA.9030800@infotech.monash.edu.au>
Message-ID: <769931430603271055j73474f15o257f25f575159469@mail.gmail.com>

Thanks, you were right! There was no program called wublastn in my wu-blast directory :o. To use the StandAloneBlast module, I had to rename the "wu-blastall" program to "blastall", and then it worked great. The following changes did the magic for me :-)

    BEGIN { $ENV{PATH} = '/home/wustl/wu-blast'; }
    BEGIN { $ENV{BLASTDATADIR} = '/home/databases/'; }
    @params = ('program' => 'blastn', 'database' => $db);
    $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
    $input = Bio::Seq->new(-id =>"test_query", -seqn=> $seqn);
    my $blast_report = $factory->blastall($input);

Thanks
Jyoti

On 3/22/06, Torsten Seemann wrote:
>
> > -------------------- WARNING ---------------------
> > MSG: cannot find path to wublast
> > ---------------------------------------------------
>
> > BEGIN { $ENV{PATH} = '/home/wustl/wu-blast'; }
>
> > @params = ('program' => 'wublastn',
>
> Just a quick possibility - do you actually have an executable file called
> "wublastn", and is it in /home/wustl/wu-blast ?
>
> Because in my wu-blast installation, it's called just "blastn" (which
> itself is a symlink to the real multipurpose exe called "blasta").
>
> > my $blast_report = $factory->wublast($input);
>

From cjfields at uiuc.edu Mon Mar 27 14:36:52 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 27 Mar 2006 13:36:52 -0600
Subject: [Bioperl-l] error
In-Reply-To: <20060327093202.11345.qmail@web8503.mail.in.yahoo.com>
Message-ID: <000801c651d5$cc3d4850$15327e82@pyrimidine>

Please reply to the mailing list as well. Don't use attachments either, especially Word-like doc files b/c they might not get through. Copy and paste any text into the email.

As for your attached script, I get everything to work fine; below is a sampling of the returned info. Your bioperl version is WAY too old (v1.0.1 is from June 2002; there have been many updates since then, esp. to BLAST and RemoteBlast), so that's likely the issue. Update to the latest from CVS (my recommendation if you use RemoteBlast).

Oh, and a bit of advice: don't use 'package Bio::Perl' in a script like this:

    #Code for :Running Remote Blast Program
    package Bio::Perl;
    use Bio::Perl;
    use Bio::Tools::Run::RemoteBlast;
    ...

It didn't do any harm here AFAIK but it will likely cause havoc in your code down the line. You normally don't use this declaration unless you are coding a perl module (class), so by using this statement here you essentially change the default package prefix from main to Bio::Perl here (not good). Any declared variables, methods, etc. are therefore in Bio::Perl's namespace and not main which could interfere with vars, methods, etc. declared as class members/methods in Bio::Perl. That apparently didn't happen though, but I would highly recommend not doing it again in the future, JIC.
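To make the namespace point concrete, here is a minimal sketch in plain Perl. `Demo::Module` is a made-up package name standing in for Bio::Perl so that the snippet runs without BioPerl installed; the sub and variable names are likewise invented for the illustration.

```perl
use strict;
use warnings;

# No package statement yet, so this sub is compiled into 'main'.
sub where_am_i { return __PACKAGE__ }

package Demo::Module;    # hypothetical stand-in for 'package Bio::Perl;'

# From here on, unqualified subs and 'our' variables belong to
# Demo::Module, not to main.
sub where_am_i { return __PACKAGE__ }
our $verbose = 1;

package main;            # switch back explicitly

print main::where_am_i(), "\n";            # prints "main"
print Demo::Module::where_am_i(), "\n";    # prints "Demo::Module"

# The variable was created in Demo::Module's symbol table only.
print "main has verbose?         ",
    (exists $main::{verbose} ? "yes" : "no"), "\n";
print "Demo::Module has verbose? ",
    (exists $Demo::Module::{verbose} ? "yes" : "no"), "\n";
```

A script that starts with a module's package name puts every sub it defines into that module's namespace in the same way, which is how a script-level helper could silently replace a module function of the same name.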
Here's the output: -------------------------------------------------- result db is Non-redundant SwissProt sequences sp|P49895|IOD1_HUMAN Type I iodothyronine deiodinase (Type-I 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 290 9e-79 sp|P24389|IOD1_RAT Type I iodothyronine deiodinase (Type-I 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 240 1e-63 sp|Q61153|IOD1_MOUSE Type I iodothyronine deiodinase (Type-I 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 236 2e-62 sp|P49894|IOD1_CANFA Type I iodothyronine deiodinase (Type-I 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 232 3e-61 sp|Q95N00|IOD1_SUNMU Type I iodothyronine deiodinase (Type-I 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 214 8e-56 ...... sp|Q9Z1Y9|IOD2_MOUSE Type II iodothyronine deiodinase (Type-II 5'deiodinase) (DIOII) (Type 2 DI) (5DII) 134 2e-31 sp|P70551|IOD2_RAT Type II iodothyronine deiodinase (Type-II 5'deiodinase) (DIOII) (Type 2 DI) (5DII) 134 2e-31 sp|Q6QN12|IOD2_PIG Type II iodothyronine deiodinase (Type-II 5'deiodinase) (DIOII) (Type 2 DI) (5DII) 132 4e-31 sp|Q92813|IOD2_HUMAN Type II iodothyronine deiodinase (Type-II 5'deiodinase) (DIOII) (Type 2 DI) (5DII) 131 1e-30 sp|Q5I3B2|IOD2_BOVIN Type II iodothyronine deiodinase (Type-II 5'deiodinase) (DIOII) (Type 2 DI) (5DII) 130 2e-30 sp|P49898|IOD3_RANCA Type III iodothyronine deiodinase (Type-III 5'deiodinase) (DIOIII) (Type 3 DI) (5DIII) 130 3e-30 sp|Q9IAX2|IOD2_CHICK Type II iodothyronine deiodinase (Type-II 5'deiodinase) (DIOII) (Type 2 DI) (5DII) 128 1e-29 sp|P49896|IOD2_RANCA Type II iodothyronine deiodinase (Type-II 5'deiodinase) (DIOII) (Type 2 DI) (5DII) 125 7e-29 sp|P79747|IOD2_FUNHE Type II iodothyronine deiodinase (Type-II 5'deiodinase) (DIOII) (Type 2 DI) (5DII) 120 2e-27 -------------------------------------------------- Good luck! Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: sonmitra mondal [mailto:sonmitra4u at yahoo.co.in] > Sent: Monday, March 27, 2006 3:32 AM > To: Chris Fields > Subject: Re: [Bioperl-l] error > > Respected Sir, > I am sending all the details as asked by you . > > Script : attached file > I am writing & saving the script using vi editor of > Linux . > O.S. : Linux > Perl Version: v5.8.0 > BioPerl : perl-bioperl1-1.0-1.i386.rpm > I/P file : seqs.fasta > : the details of the i/p file is > given in the attached file . > > I am just hanging on this single error :Invalid RID > for more than 1 n 1/2 week . Please help me as soon as > possible for you . > > with regards, > Sonmitra > > --- Chris Fields wrote: > > > Which script? I'm guessing one from the Beginner's > > HOWTO or one > > using Bio::Perl. An RID is usually from a NCBI > > BLAST run. > > > > We need more information (OS, perl version, bioperl > > version, script, > > input, etc) to actually help you; otherwise we're > > just shooting in > > the dark. > > > > Chris > > > > On Mar 24, 2006, at 6:06 AM, sonmitra mondal wrote: > > > > > while running the script in unix i am facing one > > > problem , everytime during execution it's showing > > 1 > > > error message : Invalid RID . > > > Please help me > > > > > > With regards > > > Sonmitra > > > student of M.Sc. Bioinformatics > > > > > > > > > > > > > > > __________________________________________________________ > > > Yahoo! India Matrimony: Find your partner now. Go > > to http:// > > > yahoo.shaadi.com > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. 
Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > __________________________________________________________ > Yahoo! India Matrimony: Find your partner now. Go to > http://yahoo.shaadi.com From jyotikshah at gmail.com Mon Mar 27 10:23:26 2006 From: jyotikshah at gmail.com (Jyoti Shah) Date: Mon, 27 Mar 2006 09:23:26 -0600 Subject: [Bioperl-l] cannot find path to wublast In-Reply-To: <442213AA.9030800@infotech.monash.edu.au> References: <769931430603221324q354b1c74o48190de3ff3bf2ef@mail.gmail.com> <442213AA.9030800@infotech.monash.edu.au> Message-ID: <769931430603270723u45fe5a29n6a933e90699fce8@mail.gmail.com> Thanks you were right!! There was no program such as wublastn in my wu-blast directory :o. To use the "Standaloneblast" module, I had to rename the "wu-blastall" program to "blastall" and it worked great.The following changes did the magic for me :-) BEGIN { $ENV{PATH} = '/home/wustl/wu-blast'; } BEGIN { $ENV{BLASTDATADIR} = '/home/databases/'; } @params = ('program' => 'blastn', 'database' => $db); $factory = Bio::Tools::Run::StandAloneBlast->new(@params); $input = Bio::Seq->new(-id =>"test_query", -seqn=> $seqn); my $blast_report = $factory->blastall($input); Thanks Jyoti On 3/22/06, Torsten Seemann wrote: > > > -------------------- WARNING --------------------- > > MSG: cannot find path to wublast > > --------------------------------------------------- > > > BEGIN { $ENV{PATH} = '/home/wustl/wu-blast'; } > > > @params = ('program' => 'wublastn', > > Just a quick possibility - do you actuall have an executable file called > "wublastn", and is it in /home/wustl/wu-blast ? > > Because in my wu-blast installation, it's called just "blastn" (which > itself is a symlink to the real multipurpose exe called "blasta"). 
> > > my $blast_report = $factory->wublast($input); > From osborne1 at optonline.net Mon Mar 27 16:47:59 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 27 Mar 2006 16:47:59 -0500 Subject: [Bioperl-l] MeSH term retrieval In-Reply-To: Message-ID: Michael, Your query was correct, the problem was that that particular entry has a long, multi-line description which Bio::DB::MeSH couldn't handle. This is fixed now in bioperl-live. Either install a brand new Bioperl or just copy the latest Bio/DB/MeSH.pm. Brian O. On 3/26/06 7:26 AM, "Michael Seewald" wrote: > Dear Bioperl-Users, dear Heikki, > > I am trying to retrieve MeSH term descriptions using Bio::DB::MeSH (bioperl > v1.5.0). Sometimes it works (first example below), sometimes it does not > (second example). Is there anything wrong with the query? > > Thanks & bets wishes, > Michael > > > use Bio::DB::MeSH; > > my $mesh = new Bio::DB::MeSH(); > > # works, compare: > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=&db=MeSH&cmd=search&term > =Eisenmenger%20Complex > my $term = 'Eisenmenger Complex'; > print "Term: $term\n"; > print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; > > # does not work, compare: > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=&db=MeSH&cmd=search&term > =Sinus%20Thrombosis,%20Intracranial > my $term = 'Sinus Thrombosis, Intracranial'; > print "Term: $term\n"; > print "Desc: ",$mesh->get_exact_term($term)->description,"\n"; > > > -- > Dr. Michael Seewald > Bioinformatics > Bayer HealthCare AG > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mseewald at gmail.com Tue Mar 28 00:34:35 2006 From: mseewald at gmail.com (Michael Seewald) Date: Tue, 28 Mar 2006 07:34:35 +0200 Subject: [Bioperl-l] MeSH term retrieval In-Reply-To: References: Message-ID: Hello Brian, Thanks! 
Best wishes, Michael On 3/27/06, Brian Osborne wrote: > > Michael, > > Your query was correct, the problem was that that particular entry has a > long, multi-line description which Bio::DB::MeSH couldn't handle. This is > fixed now in bioperl-live. Either install a brand new Bioperl or just copy > the latest Bio/DB/MeSH.pm. > > Brian O. > -- Dr. Michael Seewald Bioinformatics Bayer HealthCare AG From katrin-mueller4 at gmx.de Tue Mar 28 07:21:58 2006 From: katrin-mueller4 at gmx.de (Katrin) Date: Tue, 28 Mar 2006 04:21:58 -0800 (PST) Subject: [Bioperl-l] clustalw.exe In-Reply-To: <3612399.post@talk.nabble.com> References: <3612399.post@talk.nabble.com> Message-ID: <3628579.post@talk.nabble.com> hello, I solved my problem. I must set the environment variables SESSION and USERNAME. So I wrote a bat-file. In this file I defined the environment variables and then the systemcall. In my first trial the code in the bat-file was equal, but first I called the file shell.bat and this is a predefined file in WIndowsXP and so it did not work. Greetings Katrin -- View this message in context: http://www.nabble.com/clustalw.exe-t1350142.html#a3628579 Sent from the Perl - Bioperl-L forum at Nabble.com. From cjfields at uiuc.edu Tue Mar 28 09:48:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 28 Mar 2006 08:48:34 -0600 Subject: [Bioperl-l] error In-Reply-To: <20060328062742.11390.qmail@web8508.mail.in.yahoo.com> Message-ID: <000801c65276$b0368760$15327e82@pyrimidine> Sonmitra, I tried the script from your attachment last time and it worked fine (I had no problems, got results from a BLAST hit), and I'm running bioperl updated from CVS. The CVS repository has recently changed, so I'll point out the browsable version for now. Note that this is bleeding-edge code, though it's probably more stable than most CVS code. 
Go here: http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/ At the bottom of the web page is a link to download the contents of the repository as a tarball. Click this link to download bioperl-live, decompress everything, then change into the bioperl-live folder. Type the following, waiting between each until it is finished. perl Makefile.PL make make test make install You will get a number of prompts after 'perl Makefile.PL' asking about script installation, etc (not a bad idea to install the scripts). 'make test' is optional and will probably not pass all tests (rarely does), but it's a good idea to run it to make sure most things work properly. If nothing works with 'make test' you're in trouble. Okay, so here's where I rant a bit: Most of your questions are answered in the FAQ, the installation pages (and links), and the HOWTO's (particularly the Beginner's HOWTO). Googling the mail-list also helps. All of this is available through the Bioperl wiki, which is a rich (and updated) source of information: http://www.bioperl.org/wiki/Main_Page http://www.bioperl.org/wiki/FAQ http://www.bioperl.org/wiki/Installing_BioPerl http://www.bioperl.org/wiki/HOWTOs http://www.bioperl.org/wiki/Mailing_lists Make sure to read my previous response to you very carefully. There were a few things I mentioned to you that are very important. Also, when I mentioned responding to the mailing list, I meant it (note that I added the mailing list to this response). I will not respond to you again unless it's directly through or copied to the list in some way. Remember, Bioperl is essentially a volunteer effort. We really don't mind helping out with user problems when we know the answer or can help out in some way (or fix the problem/bug if there is one), but there needs to be a record of the responses here so that people can search through the archives to find answers to questions, problems, etc. Responding to every 'bioperl-module x doesn't work, why?' 
query quickly gets redundant when the potential answer is in the mail-list archives, wiki, etc. Again, if someone knows the answer, we will generally respond, but don't expect us to drop everything we're doing in order to help you out. Everybody here has their own priorities, agendas, and so on; don't be surprised if you don't get an immediate answer to your questions. Also, your chances of getting a response to a question are much greater by emailing the list than through me directly (you might get an answer from somebody using bioperl on Linux, for instance). This is a community effort. I would say, by far, a large percentage of the questions posted here get some response. My 2c Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: sonmitra mondal [mailto:sonmitra4u at yahoo.co.in] > Sent: Tuesday, March 28, 2006 12:28 AM > To: Chris Fields > Subject: RE: [Bioperl-l] error > > I am sending you my script & input as asked by you . > Tell me one thing more precisely that which bioperl > version i will install . > > > #Code for :Running Remote Blast Program > package Bio::Perl; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > my (@params, $remote_blast_object, $blast_file, $r, > $rc, $database); > my $sleep_time = 2; > $database = 'swissprot'; > @params = (-prog=>'blastp', -data=>'swissprot', > -expect=>'1e-10'); > $remote_blast_object = > Bio::Tools::Run::RemoteBlast->new(@params); > $blast_file = Bio::Root::IO->catfile("seqs.fasta"); > $r = $remote_blast_object->submit_blast( $blast_file); > while ( my @rids = $remote_blast_object->each_rid ) { > foreach my $rid ( @rids ) { > my$rc = $remote_blast_object->retrieve_blast($rid); > if(!ref($rc) ) { # $rc not a reference => error or job > not yet finished > if( $rc < 0 ) { > $remote_blast_object->remove_rid($rid); > print "Error return code for BlastID code $rid ... 
> \n";} > sleep $sleep_time; if ($sleep_time < 120) {$sleep_time > *= 2;} > } else { > > $sleep_time = 2; > $remote_blast_object->remove_rid($rid); > my $count = 0; > while( my $res = $rc->next_result ) { > $count++; > print "result db is ", $res->database_name(), "\n"; > while( my $hit = $res->next_hit()) { > print $hit->name(),"\t",$hit->description()," \t"; > while( my $hsp = $hit->next_hsp ) { > print "\t",$hsp->bits,"\t",$hsp->evalue; > } > print "\n"; > } > } > } > } > } > > > > Input file : > Seqs.fasta > >seq1 > MGLPQPGLWLKRLWVLLEVAVHVVVGKVLLILFPDRVKRNILAM > > GEKTGMTRNPHFSHDNWIPTFFSTQYFWFVLKVRWQRLEDTTELGGLAPNCPVVRLSG > > QRCNIWEFMQGNRPLVLNFGSCTUPSFMFKFDQFKRLIEDFSSIADFLVIYIEEAHASG > > >seq2 > MGLPQPGLWLKRLWVLLEVAVHVVVGKVLLILFPDRVKRNILAM > > GEKTGNRPLVLNFGSCTUPSFMFKFDQFKRLIEDFSSIADFLVIYIEEAHASDGWAFK > > NNMDIRNHQNLQDRLQAAHLLLARSPQCPVVVDTMQNQSSQLYAALPERLYIIQEGRI > LYKGKSGPWNYNPEEVRAVLEKLHS > >seq3 > MGLPQPGLWLKRLWVLLEVAVHVVVGKVLLILFPDRVKRNILAM > > GEKTGMTRNPHFSHDNWIPTFFSTQYFWFVLKVRWQRLEDTTELGGLAPNCPVVRLSG > > QRCNIWEFMQDGWAFKNNMDIRNHQNLQDRLQAAHLLLARSPQCPVVVDTMQNQSSQL > > YAALPERLYIIQEGRILYKGKSGPWNYNPEEVRAVLEKLHS > > > > Or even if I give a file with single sequence that > time also it's not working & generating the same error > .:Invaid RID > Another file : roa1.fasta > >ROA1_HUMAN Heterogeneous nuclear ribonucleoprotein A1 > (Helix-destabilizing prot > ein) (Single-strand RNA-binding protein) (hnRNP core > protein A1). > SKSESPKEPEQLRKLFIGGLSFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVT > YATVEEVDAAMNARPHKVDGRVVEPKRAVSREDSQRPGAHLTVKKIFVGGIKEDTEEHHL > RDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDSVDKIVIQKYHTVNGHNCEVRKAL > SKQEMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSG > DGYNGFGNDGGYGGGGPGYSGGSRGYGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGG > SGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSS > SSSSYGSGRRF > > > > > > --- Chris Fields wrote: > > > Please reply to the mailing list as well. 
Don't use > > attachments either, > > especially Word-like doc files b/c they might not > > get through. Copy and > > paste any text into the email. > > > > As for your attached script, I get everything to > > work fine; below is a > > sampling of the returned info. Your bioperl version > > is WAY too old (v1.0.1 > > is from June 2002; there have been many updates > > since then, esp. to BLAST > > and RemoteBlast), so that's likely the issue. > > Update to the latest from CVS > > (my recommendation if you use RemoteBlast). > > > > Oh, and a bit of advice: don't use 'package > > Bio::Perl' in a script like > > this: > > > > #Code for :Running Remote Blast Program > > package Bio::Perl; > > use Bio::Perl; > > use Bio::Tools::Run::RemoteBlast; > > ... > > > > It didn't do any harm here AFAIK but it will likely > > cause havoc in your code > > down the line. You normally don't use this > > declaration unless you are > > coding a perl module (class), so by using this > > statement here you > > essentially change the default package prefix from > > main to Bio::Perl here > > (not good). Any declared variables, methods, etc. > > are therefore in > > Bio::Perl's namespace and not main which could > > interfere with vars, methods, > > etc. declared as class members/methods in Bio::Perl. > > That apparently didn't > > happen though, but I would highly recommend not > > doing it again in the > > future, JIC. 
> > > > Here's the output: > > -------------------------------------------------- > > result db is Non-redundant SwissProt sequences > > sp|P49895|IOD1_HUMAN Type I iodothyronine > > deiodinase (Type-I > > 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 290 > > 9e-79 > > > > sp|P24389|IOD1_RAT Type I iodothyronine > > deiodinase (Type-I > > 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 240 > > 1e-63 > > > > sp|Q61153|IOD1_MOUSE Type I iodothyronine > > deiodinase (Type-I > > 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 236 > > 2e-62 > > > > sp|P49894|IOD1_CANFA Type I iodothyronine > > deiodinase (Type-I > > 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 232 > > 3e-61 > > > > sp|Q95N00|IOD1_SUNMU Type I iodothyronine > > deiodinase (Type-I > > 5'deiodinase) (DIOI) (Type 1 DI) (5DI) 214 > > 8e-56 > > > > ...... > > > > sp|Q9Z1Y9|IOD2_MOUSE Type II iodothyronine > > deiodinase (Type-II > > 5'deiodinase) (DIOII) (Type 2 DI) (5DII) > > 134 > > 2e-31 > > sp|P70551|IOD2_RAT Type II iodothyronine > > deiodinase (Type-II > > 5'deiodinase) (DIOII) (Type 2 DI) (5DII) > > 134 > > 2e-31 > > sp|Q6QN12|IOD2_PIG Type II iodothyronine > > deiodinase (Type-II > > 5'deiodinase) (DIOII) (Type 2 DI) (5DII) > > 132 > > 4e-31 > > sp|Q92813|IOD2_HUMAN Type II iodothyronine > > deiodinase (Type-II > > 5'deiodinase) (DIOII) (Type 2 DI) (5DII) > > 131 > > 1e-30 > > sp|Q5I3B2|IOD2_BOVIN Type II iodothyronine > > deiodinase (Type-II > > 5'deiodinase) (DIOII) (Type 2 DI) (5DII) > > 130 > > 2e-30 > > sp|P49898|IOD3_RANCA Type III iodothyronine > > deiodinase (Type-III > > 5'deiodinase) (DIOIII) (Type 3 DI) (5DIII) > > 130 > > 3e-30 > > sp|Q9IAX2|IOD2_CHICK Type II iodothyronine > > deiodinase (Type-II > > 5'deiodinase) (DIOII) (Type 2 DI) (5DII) > > 128 > > 1e-29 > > sp|P49896|IOD2_RANCA Type II iodothyronine > > deiodinase (Type-II > > 5'deiodinase) (DIOII) (Type 2 DI) (5DII) > > 125 > > 7e-29 > > sp|P79747|IOD2_FUNHE Type II iodothyronine > > deiodinase (Type-II > > 5'deiodinase) (DIOII) (Type 2 DI) (5DII) > > 120 > > 
2e-27 > > > > -------------------------------------------------- > > > > Good luck! > > > > Chris > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > -----Original Message----- > > > From: sonmitra mondal > > [mailto:sonmitra4u at yahoo.co.in] > > > Sent: Monday, March 27, 2006 3:32 AM > > > To: Chris Fields > > > Subject: Re: [Bioperl-l] error > > > > > > Respected Sir, > > > I am sending all the details as asked by you . > > > > > > Script : attached file > > > I am writing & saving the script using vi editor > > of > > > Linux . > > > O.S. : Linux > > > Perl Version: v5.8.0 > > > BioPerl : perl-bioperl1-1.0-1.i386.rpm > > > I/P file : seqs.fasta > > > : the details of the i/p file is > > > given in the attached file . > > > > > > I am just hanging on this single error :Invalid > > RID > > > for more than 1 n 1/2 week . Please help me as > > soon as > > > possible for you . > > > > > > with regards, > > > Sonmitra > > > > > > --- Chris Fields wrote: > > > > > > > Which script? I'm guessing one from the > > Beginner's > > > > HOWTO or one > > > > using Bio::Perl. An RID is usually from a NCBI > > > > BLAST run. > > > > > > > > We need more information (OS, perl version, > > bioperl > > > > version, script, > > > > input, etc) to actually help you; otherwise > > we're > > > > just shooting in > > > > the dark. > > > > > > > > Chris > > > > > > > > On Mar 24, 2006, at 6:06 AM, sonmitra mondal > > wrote: > > > > > > > > > while running the script in unix i am facing > > one > > > > > problem , everytime during execution it's > > showing > > > > 1 > > > > > error message : Invalid RID . > > > > > Please help me > > > > > > > > > > With regards > > > > > Sonmitra > > > > > student of M.Sc. Bioinformatics > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > __________________________________________________________ > > > > > Yahoo! 
India Matrimony: Find your partner now. > > Go > > > === message truncated === > > > > > __________________________________________________________ > Yahoo! India Matrimony: Find your partner now. Go to > http://yahoo.shaadi.com From cjfields at uiuc.edu Tue Mar 28 09:37:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 28 Mar 2006 08:37:19 -0600 Subject: [Bioperl-l] clustalw.exe In-Reply-To: <3628579.post@talk.nabble.com> Message-ID: <000701c65275$1e352b10$15327e82@pyrimidine> Good to know you got it fixed. Thanks for posting the response here JIC somebody runs into the same issue. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Katrin > Sent: Tuesday, March 28, 2006 6:22 AM > To: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] clustalw.exe > > > hello, I solved my problem. I must set the environment variables SESSION > and > USERNAME. So I wrote a bat-file. In this file I defined the environment > variables and then the systemcall. In my first trial the code in the > bat-file was equal, but first I called the file shell.bat and this is a > predefined file in WIndowsXP and so it did not work. Greetings Katrin > -- > View this message in context: http://www.nabble.com/clustalw.exe- > t1350142.html#a3628579 > Sent from the Perl - Bioperl-L forum at Nabble.com. 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From darin.london at duke.edu Tue Mar 28 09:42:45 2006
From: darin.london at duke.edu (Darin London)
Date: Tue, 28 Mar 2006 09:42:45 -0500
Subject: [Bioperl-l] Announcing BOSC 2006
Message-ID: <44294B65.4050207@duke.edu>

MEETING ANNOUNCEMENT & CALL FOR SPEAKERS

The 7th annual Bioinformatics Open Source Conference (BOSC 2006) is organized by the not-for-profit Open Bioinformatics Foundation. The meeting will take place August 4-5 in Fortaleza, Brasil, and is one of several Special Interest Group (SIG) meetings occurring in conjunction with the 14th International Conference on Intelligent Systems for Molecular Biology.

Please consult the official BOSC 2006 website at http://www.open-bio.org/wiki/BOSC_2006 for details and information.

In addition, a BOSC weblog has been set up to make it easier to disseminate all BOSC-related announcements:
http://wiki.open-bio.org/boscblog/

And if you have an iCal-compatible calendar, there is an EventDB calendar set up with all BOSC-related deadlines:
http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the official ISMB 2006 website:
http://ismb2006.cbi.cnptia.embrapa.br/

Thank you, and we look forward to seeing you all,
The BOSC Organizing Committee.

From saldroubi at gmail.com Tue Mar 28 14:19:01 2006
From: saldroubi at gmail.com (Sam Al-Droubi)
Date: Tue, 28 Mar 2006 14:19:01 -0500
Subject: [Bioperl-l] Correlation coefficient?
In-Reply-To:
References:
Message-ID:

Thank you. This did the trick.

On 3/17/06, Cui, Wenwu (NIH/NCI) [F] wrote:
>
> Statistics::Basic::Correlation;
>
> -----Original Message-----
> From: Sam Al-Droubi [mailto:saldroubi at gmail.com]
> Sent: Friday, March 17, 2006 1:30 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Correlation coefficient?
>
> Hello everyone,
>
> I need to determine the correlation coefficient between two data sets.
> Is this implemented in bioperl or some perl module I can use? This
> would save me time from writing it myself.
>
> Thank you.
>
> --
> Sincerely,
> Sam Al-Droubi, M.S.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Sincerely,
Sam Al-Droubi, M.S.

From mblanche at berkeley.edu Wed Mar 29 18:46:02 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Wed, 29 Mar 2006 15:46:02 -0800
Subject: [Bioperl-l] Bio::DB::GFF3 nightmare
Message-ID:

Dear all--

I have been trying to display the exon/intron structure of the mRNAs for a given gene from the D. melanogaster GadFly GFF3 annotation 4.2.1, loaded into MySQL using bp_bulk_load_gff.pl. I keep getting mRNAs from other genes that fall within the segment of the queried gene. For example:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::GFF;

my $agg1 = Bio::DB::GFF::Aggregator->new( -method    => 'pre_mRNA',
                                          -sub_parts => ['exon','five_prime_UTR','three_prime_UTR'],
                                        );

my $dmdb = Bio::DB::GFF->new( -adaptor     => 'dbi::mysql',
                              -dsn         => 'dbi:mysql:database=dmel_421;host=riolab.net',
                              -user        => 'guest',
                              -aggregators => [$agg1],
                            );

my @genes = qw (CG17800);

for my $gene (@genes){
    my $tg = $dmdb->segment(-name => $gene);
    my @transcripts = $tg->features(-type => 'pre_mRNA',
                                    );
    for my $tc (@transcripts){
        my %atts = $tc->attributes;
        print "$_ => $atts{$_}\n" foreach (keys %atts);
    }
}

This script generates the output:

Parent => CG30501-RA
Name => Dscam:23
Parent => CG17800-RE
Parent => CG30500-RA
Name => Dscam:23
Parent => CG17800-RE
Name => Dscam:23
Parent => CG17800-RE
Name => Dscam:23
Parent => CG17800-RE
Name => Dscam:23
Parent => CG17800-RE

where neither CG30501-RA nor CG30500-RA comes from the gene CG17800.
If I pass @transcripts to a Bio::Graphics::Panel object, I get, of course, all the different mRNAs, even the ones that don't belong to the CG17800 gene. I just can't figure out how to restrict the $tg->features() call to the queried gene (i.e. CG17800).

Many thanks
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From mcraige at genetics.emory.edu Wed Mar 29 17:41:29 2006
From: mcraige at genetics.emory.edu (Michael Craige)
Date: Wed, 29 Mar 2006 17:41:29 -0500
Subject: [Bioperl-l] Question: How to manipulate files
Message-ID:

I am attempting to develop a script to open a DNA file containing 15 FASTA sequences, delete the first 7 sequences, and close the file, leaving the remaining 8 sequences intact.

Can someone help me with a Perl script or point me to some doc that can help? Here is a sample: the first sequence in the file, with its header, is shown below. All the headers are the same except for the number, "001" to "015".

>10kb_NN_Analysis.txt.nmrc_001
NTNTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNN
AANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNN

I am trying to get the script to find the first sequence, ".nmrc_001", and then delete the file's content through the end of record ".nmrc_007", without affecting the record with header ".nmrc_008".

Does something already exist to do this?
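For readers without BioPerl at hand, the task described above can also be sketched in core Perl. This is only an illustration, assuming every header ends in an underscore followed by the record number, as in the sample; drop_leading_records is a hypothetical helper name, not an existing BioPerl or CPAN function, and the input below is made-up data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return only those FASTA records
# whose header ends in "_<number>" with <number> greater than $last_dropped.
sub drop_leading_records {
    my ($fasta_text, $last_dropped) = @_;
    my $kept = '';
    my $keep = 0;
    for my $line (split /^/m, $fasta_text) {   # split into lines, keeping newlines
        if ($line =~ /^>/) {
            # A header line decides the fate of the whole record below it.
            my ($num) = $line =~ /_(\d+)\s*$/;
            $keep = defined($num) && $num > $last_dropped;
        }
        $kept .= $line if $keep;
    }
    return $kept;
}

# Tiny made-up input in the style of the headers quoted above.
my $example = join '',
    ">10kb_NN_Analysis.txt.nmrc_007\n", "NTNTT\n",
    ">10kb_NN_Analysis.txt.nmrc_008\n", "AANNN\n", "NNNNN\n";

# Drops record 007 and prints record 008 with both of its sequence lines.
print drop_leading_records($example, 7);
```

Writing the result to a new file and renaming it over the old one afterwards is safer than editing the FASTA file in place.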
Michael Craige
Emory University

From mblanche at berkeley.edu Wed Mar 29 20:20:22 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Wed, 29 Mar 2006 17:20:22 -0800
Subject: [Bioperl-l] Question: How to manipulate files
In-Reply-To:
Message-ID:

Michael--

Something like:

#!/usr/bin/perl

use Bio::SeqIO;

my $file = shift;
my $seqio_o = Bio::SeqIO->new(-file => $file);

while (my $seq_o = $seqio_o->next_seq){
    # pull the trailing record number (001 .. 015) out of the header
    my ($id) = $seq_o->display_id =~ /_(\d*)$/;
    # skip records 001 to 007, print everything from 008 onwards
    print ">", $seq_o->display_id, "\n", $seq_o->seq, "\n" if $id > 7;
}

If you redirect the standard output, this script will do what you are trying to achieve. Just call:

$ perl theScript.pl myfile.fasta > myNewFile.fasta

On 3/29/06 14:41, "Michael Craige" wrote:

> I am attempting to develop a script to open a DNA file contain 15 FASTA
> sequences and then delete the first 7 sequences and close the file leaving
> the remainder 8 sequences intact.
>
> Can someone help me with a Perl script or point me to some doc that can
> help? Here is a sample, the first sequence in the file header is show below.
> All the header is the same except for the number "001 to 015"
>
>
>> 10kb_NN_Analysis.txt.nmrc_001
> NTNTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNN
> AANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> NNNNNNNNNNNNNNNNNNNNNNNN
>
> I trying to get the script to find the first sequences ".nmrc_001" and then
> delete files content to the end of file ".nmrc_007" without affect the
> header with ".nmrc_008"
>
> Is there something already exist to do this?
>
>
> Michael Craige
> Emory University
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C.
Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From osborne1 at optonline.net Wed Mar 29 21:16:45 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 29 Mar 2006 21:16:45 -0500
Subject: [Bioperl-l] Question: How to manipulate files
In-Reply-To:
Message-ID:

Michael,

Operations like these are easy using SeqIO - see the Beginners HOWTO or the SeqIO HOWTO:

http://www.bioperl.org/wiki/HOWTOs

The script could look something like:

use Bio::SeqIO;

my $count = 0;

my $in  = Bio::SeqIO->new(-file => "file.fa", -format => "fasta");
my $out = Bio::SeqIO->new(-file => ">newfile.fa", -format => "fasta");

while (my $seq = $in->next_seq) {
    $count++;
    next if $count < 8;
    $out->write_seq($seq);
}

Then you can delete the old and rename the new...

Brian O.

On 3/29/06 5:41 PM, "Michael Craige" wrote:

> I am attempting to develop a script to open a DNA file contain 15 FASTA
> sequences and then delete the first 7 sequences and close the file leaving
> the remainder 8 sequences intact.
>
> Can someone help me with a Perl script or point me to some doc that can
> help? Here is a sample, the first sequence in the file header is show below.
> All the header is the same except for the number "001 to 015"
>
>
>> 10kb_NN_Analysis.txt.nmrc_001
> NTNTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNN
> AANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> NNNNNNNNNNNNNNNNNNNNNNNN
>
> I trying to get the script to find the first sequences ".nmrc_001" and then
> delete files content to the end of file ".nmrc_007" without affect the
> header with ".nmrc_008"
>
> Is there something already exist to do this?
>
>
> Michael Craige
> Emory University
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From mblanche at berkeley.edu Wed Mar 29 21:19:57 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Wed, 29 Mar 2006 18:19:57 -0800
Subject: [Bioperl-l] Bio::DB::GFF still a nightmare...
Message-ID:

Dear all--

There's definitely something I don't get with the Bio::DB::GFF3 module... When I run the following script, I get the drawing I want, but contaminated with pieces of overlapping genes (see attached CG17800.png_v1). My understanding is that the aggregated pre-mRNAs contain the attribute "Gene->CG_ID" (see the output). So I uncommented line 25, '-attributes => {Gene => $gene},', in order to get only the transcripts from the queried gene. Now, as the output, I only get an "intron line" from the beginning to the end of the gene for all transcripts (see attached CG17800.png_v2)...

Can someone help me understand what I am doing wrong...
The script:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $agg1 = Bio::DB::GFF::Aggregator->new( -method      => 'pre_mRNA',
                                          -main_method => 'mRNA',
                                          -sub_parts   => ['exon','five_prime_UTR','three_prime_UTR'],
                                        );

my $dmdb = Bio::DB::GFF->new( -adaptor     => 'dbi::mysql',
                              -dsn         => 'dbi:mysql:database=dmel_421;host=riolab.net',
                              -user        => 'guest',
                              -aggregators => [$agg1],
                            );

my @genes = qw (CG17800);

for my $gene (@genes){
    my $tg = $dmdb->segment(-name => $gene);
    my @transcripts = $tg->features(-type => 'pre_mRNA',
                                    #-attributes => {Gene => $gene},
                                    );
    for my $tc (@transcripts){
        my %atts = $tc->attributes;
        print "$_ => $atts{$_}\n" foreach (keys %atts);
        print "\n";
    }

    my $panel = Bio::Graphics::Panel->new( -length    => $tg->length,
                                           -width     => 800,
                                           -pad_left  => 10,
                                           -pad_right => 10,
                                         );
    $panel->add_track(processed_transcript => \@transcripts,
                      -label        => 1,
                      -implied_utrs => 1,
                     );

    # 'or' binds more loosely than '||', so die actually fires if open fails
    open FH, ">$gene.png" or die "Can't create file $gene.png\n";
    print "saving $gene.png\n";
    print FH $panel->png;
    $panel->finished;
    close FH;
}

Marco Blanchette, Ph.D.

mblanche at berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CG17800.png_v2
Type: application/octet-stream
Size: 1403 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060329/acaad022/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CG17800.png_v1
Type: application/octet-stream
Size: 2927 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060329/acaad022/attachment-0001.obj

From aws at sanger.ac.uk Thu Mar 30 07:43:47 2006
From: aws at sanger.ac.uk (Adam Spargo)
Date: Thu, 30 Mar 2006 13:43:47 +0100 (BST)
Subject: [Bioperl-l] TraceSearch
Message-ID:

Hi,

We would like to announce the launch of a new free service which gives public access to the Wellcome Trust Sanger Institute Trace Archive via sequence similarity. The archive contains records of all publicly available DNA sequencing reads. The search engine, available at:

http://trace.ensembl.org/cgi-bin/tracesearch

allows users to identify any sequences in the archive with significant similarity to their query sequence. Users are able to search the whole archive in a few seconds, or alternatively to limit the search by species, sequencing centre or trace type. We use a version of the SSAHA algorithm to distribute an index over a cluster of machines so that we can continue to scale the service as the archive grows.

Full Story:

http://www.sanger.ac.uk/Info/Press/

We welcome any feedback and suggestions for improvements to this service.

Please forward this email to colleagues and collaborators who may be interested.

Thanks,

On behalf of the TraceSearch development team.

--
Dr Adam Spargo
High Performance Assembly Group     email: aws at sanger.ac.uk
Wellcome Trust Sanger Institute     Tel: +44 (0)1223 834244 x7728
Hinxton, Cambridge CB10 1SA         Fax: +44 (0)1223 494919

From lstein at cshl.edu Thu Mar 30 15:11:01 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 30 Mar 2006 15:11:01 -0500
Subject: [Bioperl-l] Bio::DB::GFF still a nightmare...
In-Reply-To:
References:
Message-ID: <200603301511.01994.lstein@cshl.edu>

Hi Marco,

There is no Bio::DB::GFF3 module, and there is no attachment! Send out the data file and the script you are using and I'll comment on it.

Lincoln

On Wednesday 29 March 2006 21:19, Marco Blanchette wrote:
> Dear all--
>
> There's definitely something I don't get with the Bio::DB::GFF3 module...
> When I run the following script, I get the drawing I want but contaminated
> with pieces of overlapping genes (see attached CG17800.png_v1). My
> understanding is that the aggregate pre-mRNAs contain the attribute
> "Gene->CG_ID" (see the output). So when I uncomment line 25 '-attributes =>
> {Gene => $gene},' in order to get only the transcript from the queried gene

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu Thu Mar 30 15:41:39 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 30 Mar 2006 15:41:39 -0500
Subject: [Bioperl-l] Bio::DB::GFF3 nightmare
In-Reply-To:
References:
Message-ID: <200603301541.40102.lstein@cshl.edu>

The Bio::DB::GFF module does not correctly support GFF3 format because it only handles one level of containment rather than two. You will have to filter out the overlapping features. Alternatively, wait another two weeks or so and the new Bio::DB::SeqFeature module will support GFF3 correctly. If you like, I can send you something that will work, but you will have to rewrite your scripts when the API is modified.

Lincoln

On Wednesday 29 March 2006 18:46, Marco Blanchette wrote:
> Dear all--
>
> I have been trying to display exon/intron structure of mRNAs for a given
> gene the D. melanogaster GadFly GFF3 annotation 4.2.1 loaded into mySQL
> using bp_bulk_load_gff.pl. I keep getting mRNAs from other genes that
> fall within the segment of the queried gene.
> For example:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> my $agg1 = Bio::DB::GFF::Aggregator->new( -method => 'pre_mRNA',
>                                           -sub_parts =>
>                                           ['exon','five_prime_UTR','three_prime_UTR'],
>                                         );
>
> my $dmdb = Bio::DB::GFF ->new( -adaptor => 'dbi::mysql',
>                                -dsn =>
>                                'dbi:mysql:database=dmel_421;host=riolab.net',
>                                -user => 'guest',
>                                -aggregators=> [$agg1],
>                              );
>
> my @genes = qw (CG17800);
>
> for my $gene (@genes){
>     my $tg = $dmdb->segment(-name => $gene);
>
>     my @transcripts = $tg->features(-type => 'pre_mRNA',
>                                    );
>
>     for my $tc (@transcripts){
>         my %atts = $tc->attributes;
>         print "$_ => $atts{$_}\n" foreach (keys %atts);
>     }
> }
>
> This script generates the output:
> Parent => CG30501-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Parent => CG30500-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
>
> Where neither CG30501-RA nor CG30500-RA comes from the gene CG17800.
> If I pass @transcripts to a Bio::Graphics::Panel object, I get, of course,
> all the different mRNAs even the ones that don't belong to the CG17800 gene.
>
> I just can't figure out how to restrict the $tg->features() call to the
> queried gene (ie CG17800)
>
> Many thanks
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

--
Lincoln D.
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From jayhorton at gmail.com Thu Mar 30 14:38:57 2006
From: jayhorton at gmail.com (CJay Horton)
Date: Thu, 30 Mar 2006 13:38:57 -0600
Subject: [Bioperl-l] Error: The extension 'Bio::SeqIO::staden::read' is not properly installed in path
Message-ID:

Hello Everyone,

I am using perl v5.8.0 and bioperl-1.4 on rh linux. When I run this script:

#!/bin/perl -w
use Bio::SeqIO;

$seqio_obj = Bio::SeqIO->new(-file   => "H12_SLC3R_Shelley_Sheridan.ab1",
                             -format => "fasta" );
$seq_obj = $seqio_obj->next_seq;
while ($seq_obj = $seqio_obj->next_seq){
    # print the sequence
    print $seq_obj->seq,"\n";
}
##########################################

I receive the following error:

$ perl seqio.pl
The extension 'Bio::SeqIO::staden::read' is not properly installed in path:
'/usr/lib/perl5/site_perl/5.8.0'
If this is a CPAN/distributed module, you may need to reinstall it on your system.
To allow Inline to compile the module in a temporary cache, simply remove the
Inline config option 'VERSION=' from the Bio::SeqIO::staden::read module.
at seqio.pl line 0
INIT failed--call queue aborted, line 1.
#####################################

The only error I recall seeing during installation was during the perl Makefile.PL for bioperl-ext and it was:

$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Bio::Ext::Align
Found Staden io_lib "libread" in /usr/local/lib ...
Automatically using the Read.h found in /usr/local/include/io_lib ...
Writing Makefile for Bio::SeqIO::staden::read
Writing Makefile for Bio
One or more DATA sections were not processed by Inline.
#######################################

I manually moved the .h files into /usr/local/include/io_lib as instructed by the bioperl-ext readme and installed any modules that were required for each component to run (inline:MakeMaker, etc.)

Any direction would be greatly appreciated!

From golharam at umdnj.edu Thu Mar 30 16:44:30 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 30 Mar 2006 16:44:30 -0500
Subject: [Bioperl-l] What happens to STDOUT?
Message-ID: <012d01c65443$2056cfa0$e6028a0a@GOLHARMOBILE1>

I'm using a simple script to reformat a genbank file to a fasta file. Within the script, I have it print out some information. That information never appears in the console unless I print it to STDERR. What happened to stdout? Here's the script:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

foreach my $gbkfile(`ls *.gbk`) {
    chomp $gbkfile;
    $gbkfile =~ m/chr(\w+)/;
    my $chr = $1;
    my $fastafile = $gbkfile;
    $fastafile =~ s/gbk/fa/;
    print "$gbkfile...";
    my $seqin = Bio::SeqIO->new(-file => "<$gbkfile", -format => 'genbank' );
    my $seqout = Bio::SeqIO->new(-file => ">$fastafile", -format => 'fasta');
    while (my $seq = $seqin->next_seq) {
        print $seqout->write_seq($seq);
    }
    print "\n";
}

Ryan

From osborne1 at optonline.net Thu Mar 30 17:44:52 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 30 Mar 2006 17:44:52 -0500
Subject: [Bioperl-l] Error: The extension 'Bio::SeqIO::staden::read' is not properly installed in path
In-Reply-To:
Message-ID:

CJay,

Did you try the modification that Inline suggested? This fix has worked for me before...

Brian O.

On 3/30/06 2:38 PM, "CJay Horton" wrote:

> To allow Inline to compile the module in a temporary cache, simply remove the
> Inline config option 'VERSION=' from the Bio::SeqIO::staden::read module.

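On Ryan Golhar's STDOUT question above, one possible contributing factor, offered as an assumption rather than something confirmed in this thread: Perl normally line-buffers STDOUT when it is attached to a terminal, while STDERR is unbuffered. A progress message printed without a trailing newline, such as print "$gbkfile...", can therefore sit in the buffer until the next newline is printed. The sketch below turns on autoflush with $| so such messages show up immediately. The helper fasta_name_for and the file names are hypothetical and exist only for illustration; no BioPerl calls are made.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the delayed console output is a buffering effect.
# A true value in $| makes the currently selected handle
# (STDOUT by default) flush after every print.
$| = 1;

# Hypothetical helper: derive the FASTA file name from a GenBank one.
# Only a trailing ".gbk" is rewritten; other names pass through unchanged.
sub fasta_name_for {
    my ($gbkfile) = @_;
    (my $fastafile = $gbkfile) =~ s/\.gbk$/.fa/;
    return $fastafile;
}

# Hypothetical input names; a real script would use glob('*.gbk').
for my $gbkfile (qw(chr1.gbk chrX.gbk)) {
    print "$gbkfile...";                  # flushed at once with autoflush on
    my $fastafile = fasta_name_for($gbkfile);
    # ... the Bio::SeqIO conversion would run here ...
    print " wrote $fastafile\n";
}
```

If autoflush is not wanted for the whole script, ending each progress message with "\n" should have a similar effect on a terminal, since line-buffered output is flushed at every newline.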
From torsten.seemann at infotech.monash.edu.au Thu Mar 30 17:37:30 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 31 Mar 2006 09:37:30 +1100
Subject: [Bioperl-l] Request for comments: Bio::DB::GFF3 namespace
In-Reply-To: <200603211827.30785.lstein@cshl.edu>
References: <200603211827.30785.lstein@cshl.edu>
Message-ID: <442C5DAA.5040603@infotech.monash.edu.au>

Lincoln,

> I'm pretty much ready to check in the replacement for the Bio::DB::GFF
> database. What I ended up writing has only a remote relationship to gff3
> files -- it is more like a general storage engine for Bio::SeqFeatureI
> objects. So I don't want to call the thing Bio::DB::GFF, but want to place it
> somewhere else in the namespace hierarchy.

> Bio::SeqFeature::Store
> - implements the Bio::SeqFeature::CollectionI interface. You can
> store any Bio::SeqFeatureI into a database (mysql, berkeleydb, in-memory)
> and fetch it out using a variety of queries.

> A utility script, currently called gff3_load.pl, parses a gff3 file, creates
> the proper objects, and stores them in the Store. Eventually some of this
> functionality will be moved into Bio::Tools::GFF.

Is the focus still only on bulk loading + searching? ie. mainly read-only activities like with gbrowse?

Or is dynamic inserting + writing also well supported? ie. could be used as a persistent Bio::SeqFeatureI object store.

How do you see this module in relationship to the BioSQL project?

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/

From torsten.seemann at infotech.monash.edu.au Thu Mar 30 17:45:24 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 31 Mar 2006 09:45:24 +1100
Subject: [Bioperl-l] What happens to STDOUT?
In-Reply-To: <012d01c65443$2056cfa0$e6028a0a@GOLHARMOBILE1>
References: <012d01c65443$2056cfa0$e6028a0a@GOLHARMOBILE1>
Message-ID: <442C5F84.7070609@infotech.monash.edu.au>

Ryan,

> I'm using a simple script to reformat a genbank file to a fasta file.
> Within the script, I have it print out some information. That
> information never appears in the console unless I print it to STDERR.
> What happened to stdout? Here's the script:

> while (my $seq = $seqin->next_seq) {
> print $seqout->write_seq($seq);

This last line shouldn't have a "print" unless you want to print out the return value from write_seq(). The write_seq() function writes the sequence into your $fastafile. Perhaps this is the cause of the STDOUT issue?

> my $seqin = Bio::SeqIO->new(-file => "<$gbkfile", -format => 'genbank' );

It's a good idea to check if the open failed, eg:

my $seqin = Bio::SeqIO->new(-file => "<$gbkfile", -format => 'genbank')
    or die "could not open $gbkfile";

> foreach my $gbkfile(`ls *.gbk`) {

This could also be written more portably as (<*.gbk>) or (glob('*.gbk')).

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/

From lstein at cshl.edu Thu Mar 30 18:40:14 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 30 Mar 2006 18:40:14 -0500
Subject: [Bioperl-l] Bio::DB::GFF3 nightmare
In-Reply-To:
References:
Message-ID: <200603301840.15477.lstein@cshl.edu>

If you insist on using the Bio::DB::GFF database with GFF3 data, you must understand that the segment method returns a segment of the genome corresponding to the start and end positions of the indicated feature. When you call features() it returns everything that overlaps the region.

The API to use to get a single feature is get_feature_by_name():

  my $transcript = $dmdb->get_feature_by_name('CG30501-RA');
  my @exons = $transcript->get_SeqFeatures;

Or, if you want all transcripts in the gene CG17800:

  my @transcripts = $dmdb->get_feature_by_attribute(Gene=>'CG17800');
  for my $transcript (@transcripts) {
      my @exons = $transcript->get_SeqFeatures;
  }

The latter depends on your having changed the gene Parent attribute into a Gene attribute as recommended in my previous email.

Lincoln

On Wednesday 29 March 2006 18:46, Marco Blanchette wrote:
> Dear all--
>
> I have been trying to display exon/intron structure of mRNAs for a given
> gene the D. melanogaster GadFly GFF3 annotation 4.2.1 loaded into mySQL
> using bp_bulk_load_gff.pl. I keep getting mRNAs from other genes that
> fall within the segment of the queried gene. For example:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> my $agg1 = Bio::DB::GFF::Aggregator->new( -method => 'pre_mRNA',
>                                           -sub_parts =>
>                                           ['exon','five_prime_UTR','three_prime_UTR'],
>                                         );
>
> my $dmdb = Bio::DB::GFF ->new( -adaptor => 'dbi::mysql',
>                                -dsn =>
>                                'dbi:mysql:database=dmel_421;host=riolab.net',
>                                -user => 'guest',
>                                -aggregators=> [$agg1],
>                              );
>
> my @genes = qw (CG17800);
>
> for my $gene (@genes){
>     my $tg = $dmdb->segment(-name => $gene);
>
>     my @transcripts = $tg->features(-type => 'pre_mRNA',
>                                    );
>
>     for my $tc (@transcripts){
>         my %atts = $tc->attributes;
>         print "$_ => $atts{$_}\n" foreach (keys %atts);
>     }
> }
>
> This script generates the output:
> Parent => CG30501-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Parent => CG30500-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
>
> Where neither CG30501-RA nor CG30500-RA comes from the gene CG17800.
> If I pass @transcripts to a Bio::Graphics::Panel object, I get, of course,
> all the different mRNAs even the ones that don't belong to the CG17800 gene.
>
> I just can't figure out how to restrict the $tg->features() call to the
> queried gene (ie CG17800)
>
> Many thanks
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Thu Mar 30 18:43:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 30 Mar 2006 17:43:54 -0600
Subject: [Bioperl-l] What happens to STDOUT?
In-Reply-To: <442C5F84.7070609@infotech.monash.edu.au>
Message-ID: <000001c65453$ce7e72d0$15327e82@pyrimidine>

You can redirect to STDOUT by a glob (setting the filehandle to \*STDOUT). Note that this doesn't use '-file', but '-fh'. It's in the SeqIO HOWTO:

# create one SeqIO object to read in, and another to write out
my $seqin = Bio::SeqIO->new(-fh => \*STDIN,
                            -format => $informat);
my $outseq = Bio::SeqIO->new(-fh => \*STDOUT,
                             -format => $outformat);

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> Sent: Thursday, March 30, 2006 4:45 PM
> To: golharam at umdnj.edu
> Cc: 'bioperl-l'
> Subject: Re: [Bioperl-l] What happens to STDOUT?
>
> Ryan,
>
> > I'm using a simple script to reformat a genbank file to a fasta file.
> > Within the script, I have it print out some information. That
> > information never appears in the console unless I print it to STDERR.
> > What happened to stdout? Here's the script:
> > while (my $seq = $seqin->next_seq) {
> > print $seqout->write_seq($seq);
>
> This last line shouldn't have a "print" unless you want to print out the
> return value from write_seq(). The write_seq() function writes the sequence
> into your $fastafile. Perhaps this is the cause of the STDOUT issue?
>
> > my $seqin = Bio::SeqIO->new(-file => "<$gbkfile", -format => 'genbank' );
>
> It's a good idea to check if the open failed, eg:
>
> my $seqin = Bio::SeqIO->new(-file => "<$gbkfile", -format => 'genbank')
>     or die "could not open $gbkfile";
>
> > foreach my $gbkfile(`ls *.gbk`) {
>
> This could also be written more portably as (<*.gbk>) or (glob('*.gbk')).
>
> --
> Torsten Seemann
> Victorian Bioinformatics Consortium, Monash University, Australia
> http://www.vicbioinformatics.com/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From mblanche at berkeley.edu Thu Mar 30 19:59:55 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Thu, 30 Mar 2006 16:59:55 -0800
Subject: [Bioperl-l] Bio::DB::GFF still a nightmare...
In-Reply-To: <200603301511.01994.lstein@cshl.edu>
Message-ID:

Lincoln--

Sorry for the confusion, I meant to say Bio::DB::GFF. As for the data file, I have been using a modified version of the GadFly gff3 v4.2.1 annotation where the Parent tag was changed to Gene using the following script you provided me with some weeks ago:

while (<>) {
    my @fields = split "\t";
    next unless $fields[2] eq 'mRNA';
    s/Parent=([^;]+)/Gene=$1/;
} continue {
    print;
}

The data file is publicly available using the guest user in the database dmel_421 at riolab.net.

The script I am testing is:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $agg1 = Bio::DB::GFF::Aggregator->new( -method      => 'pre_mRNA',
                                          -main_method => 'mRNA',
                                          -sub_parts   => ['exon','five_prime_UTR','three_prime_UTR'],
                                        );

my $dmdb = Bio::DB::GFF->new( -adaptor     => 'dbi::mysql',
                              -dsn         => 'dbi:mysql:database=dmel_421;host=riolab.net',
                              -user        => 'guest',
                              -aggregators => [$agg1],
                            );

my @genes = qw (CG17800);

for my $gene (@genes){
    my $tg = $dmdb->segment(-name => $gene);

    my @transcripts = $tg->features(-type => 'pre_mRNA',
                                    #-attributes => {Gene => $gene},
                                   );

    for my $tc (@transcripts){
        my %atts = $tc->attributes;
        print "$_ => $atts{$_}\n" foreach (keys %atts);
        print "\n";
    }

    my $panel = Bio::Graphics::Panel->new( -length    => $tg->length,
                                           -width     => 800,
                                           -pad_left  => 10,
                                           -pad_right => 10,
                                         );
    $panel->add_track(processed_transcript => \@transcripts,
                      -label        => 1,
                      -implied_utrs => 1,
                     );

    # 'or' rather than '||': with '||' the die was attached to the
    # file name string and could never fire when open itself failed
    open FH, ">$gene.png" or die "Can't create file $gene.png\n";
    print "saving $gene.png\n";
    print FH $panel->png;
    $panel->finished;
    close FH;
}

If you run this script as is, you will get parts of CG30500 and CG30501 together with all CG17800 mRNAs. My understanding of the module was that by selecting the Gene=>CG_ID attribute in the features method, I would trap only the transcripts from CG17800 (rerun the script with line 25, '#-attributes => {Gene => $gene},', uncommented). By doing that the script traps the right transcripts (as seen in the STDOUT) but loses the 'pre-mRNA' aggregate... As you suggest, should I stop trying and wait for the release of the Bio::DB::GFF3 module?

Again, real sorry for the confusion.

Marco

On 3/30/06 12:11 PM, "Lincoln Stein" wrote:

> Hi Marco,
>
> There is no Bio::DB::GFF3 module, and there is no attachment! Send out the
> data file and the script you are using and I'll comment on it.
>
> Lincoln
>
> On Wednesday 29 March 2006 21:19, Marco Blanchette wrote:
>> Dear all--
>>
>> There's definitely something I don't get with the Bio::DB::GFF3 module...
>> When I run the following script, I get the drawing I want but contaminated
>> with pieces of overlapping genes (see attached CG17800.png_v1). My
>> understanding is that the aggregate pre-mRNAs contain the attribute
>> "Gene->CG_ID" (see the output). So when I uncomment line 25 '-attributes =>
>> {Gene => $gene},' in order to get only the transcript from the queried gene

Marco Blanchette, Ph.D.

mblanche at berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062

From smarkel at scitegic.com Thu Mar 30 20:17:50 2006
From: smarkel at scitegic.com (Scott Markel)
Date: Thu, 30 Mar 2006 17:17:50 -0800
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
Message-ID: <442C833E.5020701@scitegic.com>

In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the following.

Annotation tags used by Bio::SeqIO::FTHelper were strings and are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper subroutine of Bio::SeqIO::genbank the following code still assumes that tags are strings.

    foreach my $tag ( keys %{$fth->field} ) {
        foreach my $value ( @{$fth->field->{$tag}} ) {
            $value =~ s/\"/\"\"/g;

If the tag value was a zero, an empty string is written.

We think that

    $value = $value->{"value"};

should be added before the s/// call.

Here's our test case. Note that the qualifier value for "foo" is changed to an empty string.

Input file
====================================
LOCUS       MY_LOCUS 10 aa linear UNK
DEFINITION  my description.
ACCESSION 12345 FEATURES Location/Qualifiers misc_feature 1..10 /foo="0" ORIGIN 1 atggagaact // ==================================== Perl code ==================================== use strict; use warnings; use Bio::SeqIO; my $inputFilename = "input.gbff"; my $outputFilename = "output.gbff"; my $in = Bio::SeqIO->new(-file => $inputFilename, -format => "genbank"); my $out = Bio::SeqIO->new(-file => ">$outputFilename", -format => "genbank"); my $sequence = $in->next_seq(); $out->write_seq($sequence); ==================================== Output file ==================================== LOCUS MY_LOCUS 10 aa linear linear DEFINITION my description. ACCESSION 12345 KEYWORDS . FEATURES Location/Qualifiers misc_feature 1..10 /foo="" ORIGIN 1 atggagaact // ==================================== I'll add this to bugzilla, but first I want to make sure I'm not missing something obvious. Scott -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From cjfields at uiuc.edu Thu Mar 30 22:50:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 30 Mar 2006 21:50:40 -0600 Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers In-Reply-To: <442C833E.5020701@scitegic.com> Message-ID: <000101c65476$47c2f130$15327e82@pyrimidine> I tried this on WinXP (I'm using bioperl-live) and got a warning: -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Running using debugging shows that no feature key was found in _read_FTHelper_GenBank. So I'm getting an error, but on input not output. 
In fact, turning on -verbose in the SeqIO input object gives the below extra output, whereas turning -verbose on only in the output object just gives the warning above. ==================================== C:\Perl\Scripts\gb_test>test.pl no feature key! -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover STACK Bio::SeqIO::genbank::next_seq C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 sequence length is 10 ==================================== The sequence came back w/o any features in the feature table, which is what I would expect from this error: ==================================== LOCUS MY_LOCUS 10 aa linear linear DEFINITION my description. ACCESSION 12345 KEYWORDS . FEATURES Location/Qualifiers ORIGIN 1 atggagaact // ==================================== Adding the extra line before the s/// didn't help any (warning still pops up, no change in output). Anybody out there with any ideas? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Scott Markel > Sent: Thursday, March 30, 2006 7:18 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers > > In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the > following. > > Annotation tags used by Bio::SeqIO::FTHelper were strings and > are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper > subroutine of Bio::SeqIO::genbank the following code still > assumes that tags are strings. > > foreach my $tag ( keys %{$fth->field} ) { > foreach my $value ( @{$fth->field->{$tag}} ) { > $value =~ s/\"/\"\"/g; > > If the tag value was a zero, an empty string is written. 
> > We think that > > $value = $value->{"value"}; > > should be added before the s/// call. > > Here's our test case. Note that the qualifier value for "foo" > is changed to an empty string. > > Input file > > ==================================== > LOCUS MY_LOCUS 10 aa linear UNK > DEFINITION my description. > ACCESSION 12345 > FEATURES Location/Qualifiers > misc_feature 1..10 > /foo="0" > ORIGIN > 1 atggagaact > // > ==================================== > > Perl code > ==================================== > use strict; > use warnings; > > use Bio::SeqIO; > > my $inputFilename = "input.gbff"; > my $outputFilename = "output.gbff"; > > my $in = Bio::SeqIO->new(-file => $inputFilename, > -format => "genbank"); > my $out = Bio::SeqIO->new(-file => ">$outputFilename", > -format => "genbank"); > > my $sequence = $in->next_seq(); > $out->write_seq($sequence); > ==================================== > > Output file > ==================================== > LOCUS MY_LOCUS 10 aa linear linear > DEFINITION my description. > ACCESSION 12345 > KEYWORDS . > FEATURES Location/Qualifiers > misc_feature 1..10 > /foo="" > ORIGIN > 1 atggagaact > // > ==================================== > > I'll add this to bugzilla, but first I want to make sure > I'm not missing something obvious. > > Scott > > -- > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at scitegic.com > SciTegic Inc. mobile: +1 858 205 3653 > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253
> San Diego, CA 92123 fax: +1 858 279 8804
> USA web: http://www.scitegic.com
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From kent at soe.ucsc.edu Fri Mar 31 01:09:19 2006
From: kent at soe.ucsc.edu (Jim Kent)
Date: Thu, 30 Mar 2006 22:09:19 -0800
Subject: [Bioperl-l] [Genome] TraceSearch
In-Reply-To:
References:
Message-ID: <5BD549EE-45B5-400D-B46D-6C24750EC733@soe.ucsc.edu>

Very nice. It takes a very long time to search the NCBI trace archives. I've been tinkering a little on a tool that might let you do more sensitive searches in reasonable time, but it would require a LOT of RAM for the trace archives! How much RAM are you using for your SSAHA servers?

On Mar 30, 2006, at 4:43 AM, Adam Spargo wrote:

> Hi,
> We would like to announce the launch of a new free service which gives
> public access to the Wellcome Trust Sanger Institute Trace Archive via
> sequence similarity. The archive contains records of all publicly
> available DNA sequencing reads. The search engine, available at:
>
> http://trace.ensembl.org/cgi-bin/tracesearch
>
> allows users to identify any sequences in the archive with significant
> similarity to their query sequence. Users are able to search the whole
> archive in a few seconds, or alternatively to limit the search by
> species, sequencing centre or trace type. We use a version of the SSAHA
> algorithm to distribute an index over a cluster of machines so that we can
> continue to scale the service as the archive grows.
>
> Full Story:
>
> http://www.sanger.ac.uk/Info/Press/
>
> We welcome any feedback and suggestions for improvements to this
> service.
>
> Please forward this email to colleagues and collaborators who may be
> interested.
>
> Thanks,
>
> On behalf of the TraceSearch development team.
>
> --
> Dr Adam Spargo
> High Performance Assembly Group email: aws at sanger.ac.uk
> Wellcome Trust Sanger Institute Tel: +44 (0)1223 834244 x7728
> Hinxton, Cambridge CB10 1SA Fax: +44 (0)1223 494919
>
>
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From MarcL at DEVGEN.com Fri Mar 31 02:59:20 2006
From: MarcL at DEVGEN.com (Marc Logghe)
Date: Fri, 31 Mar 2006 09:59:20 +0200
Subject: [Bioperl-l] What happens to STDOUT?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746C8B@ANTARESIA.be.devgen.com>

> You can redirect to STDOUT by a glob (setting the filehandle
> to \*STDOUT).
> Note that this doesn't use '-file', but '-fh'. It's in the
> SeqIO HOWTO:
>
> # create one SeqIO object to read in, and another to write out
> my $seqin = Bio::SeqIO->new(-fh => \*STDIN,
> -format => $informat);

A little off topic, but might be useful as well. Sometimes you don't know whether the input sequences will come from stdin or from a file. Therefore, I often found myself writing things like:

my $seqin = Bio::SeqIO->new(-format => $informat,
                            $file ? (-file => $file) : (-fh => \*STDIN));

This can be replaced by:

my $seqin = Bio::SeqIO->new(-format => $informat, $file ? -fh => \*ARGV);

In that way, you have the same magic as the diamond/pulp fiction operator <>

Cheers,
Marc

From cjfields at uiuc.edu Fri Mar 31 08:40:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 31 Mar 2006 07:40:21 -0600
Subject: [Bioperl-l] Issue with Bio::SearchIO::psl (was: Bioperl bug 1977)
In-Reply-To: <9AFA96E5-79BE-4BE7-8B33-F6C28012B929@cshl.edu>
References: <001901c65446$bfb0b270$15327e82@pyrimidine> <9AFA96E5-79BE-4BE7-8B33-F6C28012B929@cshl.edu>
Message-ID: <6E8F6494-F5C9-4763-93D3-7A9B3F238821@uiuc.edu>

I'll try it with Mac OS X this weekend to confirm; I'm running v. 10.4.5 with perl 5.8.6. I noticed that there are no tests for psl in SearchIO.t which should have caught this error.

I'll double check that in case I'm mistaken. If not, I'll add a few to see what happens... maybe we'll get some responses back? I'll also forward this to the mail list to see if anybody else has had this issue. Chris On Mar 31, 2006, at 3:53 AM, Albert Vernon Smith wrote: > Running your same code, on the same file, I get: > > Output: > ------- > /usr/local/blat/db/hg17/hg17.2bit:chr5 > 100.00 > /usr/local/blat/db/hg17/hg17.2bit:chr21 > 90.00 > /usr/local/blat/db/hg17/hg17.2bit:chr5 > 85.00 > /usr/local/blat/db/hg17/hg17.2bit:chr13 > 80.00 > /usr/local/blat/db/hg17/hg17.2bit:chr7 > 80.00 > Use of uninitialized value in pattern match (m//) at /Users/albert/ > Documents/CSHL/cvswork/bioperl-live/Bio/SearchIO/psl.pm line 173, > line 10. > -------- > > This is current CVS, and I see the problem on Mac OS X, as well as > on Linux. > > As it stands the code for Bio::Search::psl *should* be fine (as I > run it in my head :-), and the error message is kinda weird. The > last line is #10, so there should be a value for the line, unless > things are trying to cycle back over it again for some reason. 
> > -albert > > > On 30.3.2006, at 22:10, Chris Fields wrote: > >> I'm running off bioperl-live from CVS (updated yesterday) and I get >> everything to work on this end (no errors) using your file, >> although I'm >> just printing names and HSP scores out, like this: >> >> -------------------------------------- >> >> my $parser = Bio::SearchIO->new(-verbose => $v, >> -file => 'psl.out', >> -format => 'psl'); >> >> while (my $result = $parser->next_result) { >> while (my $hit = $result->next_hit) { >> print $hit->name,"\n"; >> while (my $hsp = $hit->next_hsp) { >> print " ",$hsp->score,"\n"; >> } >> } >> } >> >> -------------------------------------- >> Output: >> -------------------------------------- >> /usr/local/blat/db/hg17/hg17.2bit:chr5 >> 100.00 >> /usr/local/blat/db/hg17/hg17.2bit:chr21 >> 90.00 >> /usr/local/blat/db/hg17/hg17.2bit:chr5 >> 85.00 >> /usr/local/blat/db/hg17/hg17.2bit:chr13 >> 80.00 >> /usr/local/blat/db/hg17/hg17.2bit:chr7 >> 80.00 >> -------------------------------------- >> >> Is this a recent update of Bioperl? There were several updates in >> CVS to >> Bio::SearchIO::psl for various bugfixes over the last year, >> including one >> that postdates the 1.5.1 release. I would recommend trying the >> CVS version >> (copy it over the your old version if possible or just install >> bioperl-live >> from CVS). If this doesn't work could you send your script? It >> may be a >> specific method that's acting up. >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >> >>> -----Original Message----- >>> From: Albert Vernon Smith [mailto:smithav at cshl.edu] >>> Sent: Thursday, March 30, 2006 1:49 PM >>> To: Chris Fields >>> Subject: Re: Bioperl bug 1977 >>> >>> [Message never went out before. Was stuck in outbox.] >>> >>> I've attached an output which causes issues. While parsing this >>> output gives me an issue, I'm actually doing something slightly >>> different. 
I have a webBlat server, and am getting output via >>> LWP::UserAgent, and I take the psl returned from my query and pass >>> that in memory (with IO::String) to the parser. When I do that, I >>> get a complaint which references the last line. Still, parsing this >>> as a file should be the same thing. >>> >>> -albert >> >> > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From MarcL at DEVGEN.com Fri Mar 31 08:59:07 2006 From: MarcL at DEVGEN.com (Marc Logghe) Date: Fri, 31 Mar 2006 15:59:07 +0200 Subject: [Bioperl-l] What happens to STDOUT? Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746C92@ANTARESIA.be.devgen.com> > my $seqin = Bio::SeqIO->new(-format => $informat, $file ? -fh > => \*ARGV); In that way, you have the same magic as the > diamond/pulp fiction operator <> > Of course, this should be (some copy/paste remnants): my $seqin = Bio::SeqIO->new(-format => $informat, -fh => \*ARGV); ML From MarcL at DEVGEN.com Fri Mar 31 09:02:41 2006 From: MarcL at DEVGEN.com (Marc Logghe) Date: Fri, 31 Mar 2006 16:02:41 +0200 Subject: [Bioperl-l] What happens to STDOUT? Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746C93@ANTARESIA.be.devgen.com> > I suspect you mean: > my $seqin = Bio::SeqIO->new(-format => $informat, -fh => \*ARGV); Yeah, corrected myself a split minute ago. Sorry for that. > > But does this work? perldoc perlvar says it "may not": > Note that currently ARGV only has its magical effect within > the <> operator; elsewhere it is just a plain filehandle > corresponding to the last file opened by <>. In particular, > passing \*ARGV as a parameter to a function that expects a > filehandle may not cause your function to automatically read > the contents of all the files in @ARGV. Well, I can only say that up to now it used to work. > > > In that way, you have the same magic as the diamond/pulp fiction > > operator <> > Pulp fiction operator? Never heard it called that before. 
If I recall correctly, it must have been Damian Conway who used that. Cheers, ML From roy at colibase.bham.ac.uk Fri Mar 31 08:59:18 2006 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Fri, 31 Mar 2006 14:59:18 +0100 Subject: [Bioperl-l] What happens to STDOUT? In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746C8B@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746C8B@ANTARESIA.be.devgen.com> Message-ID: <442D35B6.8080101@colibase.bham.ac.uk> > A little off topic, but might be useful as well. > Sometimes you don't know whether the input sequences will come from > stdin or from a file. Therefore, I often found myself writing things like: > my $seqin = Bio::SeqIO->new(-format => $informat, $file ? (-file => > $file) : (-fh => \*STDIN)); > This can be replaced by: > my $seqin = Bio::SeqIO->new(-format => $informat, $file ? -fh => > \*ARGV); I suspect you mean: my $seqin = Bio::SeqIO->new(-format => $informat, -fh => \*ARGV); But does this work? perldoc perlvar says it "may not": Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>. In particular, passing \*ARGV as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in @ARGV. > In that way, you have the same magic as the diamond/pulp fiction > operator <> Pulp fiction operator? Never heard it called that before. Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K.
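Since perlvar only documents the ARGV magic for the <> operator itself, one conservative alternative is to open one stream per file named on the command line and fall back to STDIN when none are given. The following is a minimal sketch under that assumption. The helper name seqio_sources is hypothetical (it is not part of BioPerl) and it only builds argument lists; the Bio::SeqIO call appears in a comment so the snippet runs without BioPerl installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return one argument list per
# input source, suitable for splicing into Bio::SeqIO->new(...).
sub seqio_sources {
    my @files = @_;
    # No file names given: read from standard input.
    return ( [ -fh => \*STDIN ] ) unless @files;
    # Otherwise one (-file => name) pair per file, in command-line order.
    return map { [ -file => $_ ] } @files;
}

for my $source ( seqio_sources(@ARGV) ) {
    # With BioPerl installed, each source could be opened as:
    #   my $seqin = Bio::SeqIO->new(-format => $informat, @$source);
    print "would open: $source->[0]\n";
}
```

This trades the one-line convenience of \*ARGV for behaviour that does not depend on how the callee reads from the handle.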
http://xbase.bham.ac.uk From MarcL at DEVGEN.com Fri Mar 31 09:45:04 2006 From: MarcL at DEVGEN.com (Marc Logghe) Date: Fri, 31 Mar 2006 16:45:04 +0200 Subject: [Bioperl-l] Batch mode in Bio::DB::GenBank Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746C94@ANTARESIA.be.devgen.com> Hi, It seems that in the current (CVS of last night) Bio::DB::GenBank implementation it is not at all possible to set the mode to 'batch' instead of the default 'single'. Devel::StackTrace revealed that the mode is hardcoded in the Bio::DB::WebDBSeqI::get_Stream_by_id method. Is that intended ? The problem is that with single mode, the request is always done with a GET. In most cases (at least in my hands) when you pass a batch of 500 id's the request fails because of the url getting too long. All goes well when the method is overridden whereby the mode option is hardcoded to 'batch' so that a POST is done. I think there are at least 2 possibilities: 1) change single to batch in Bio::DB::WebDBSeqI::get_Stream_by_id 2) allow the possibility to pass the mode option when get_Stream_by_id is called using the Bio::DB::GenBank object Any comments/preferences before I actually commit some edits ? Regards, Marc From cjfields at uiuc.edu Fri Mar 31 11:56:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 31 Mar 2006 10:56:12 -0600 Subject: [Bioperl-l] Batch mode in Bio::DB::GenBank In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746C94@ANTARESIA.be.devgen.com> Message-ID: <000601c654e4$071b63b0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Marc Logghe > Sent: Friday, March 31, 2006 8:45 AM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Batch mode in Bio::DB::GenBank > > Hi, > It seems that in the current (CVS of last night) Bio::DB::GenBank > implementation it is not at all possible to set the mode to 'batch' > instead of the default 'single'. 
Devel::StackTrace revealed that the > mode is hardcoded in the Bio::DB::WebDBSeqI::get_Stream_by_id method. > Is that intended ? > The problem is that with single mode, the request is always done with a > GET. In most cases (at least in my hands) when you pass a batch of 500 > id's the request fails because of the url getting too long. All goes > well when the method is overridden whereby the mode option is hardcoded > to 'batch' so that a POST is done. You're right about the 500 seq limit. If it's particularly busy (during peak hours) it's less, around 200-400. I have been grabbing them 400 at a time using a loop, which works but batch would be better. I remember asking about this a few years ago and, according to Lincoln, we use the approved batch method retrieval. However, now you point it out, I just don't see it here (no epost). NCBIHelper has, for some reason, this: %CGILOCATION = ( 'batch' => ['post' => '/entrez/eutils/efetch.fcgi'], 'query' => ['get' => '/entrez/eutils/efetch.fcgi'], 'single' => ['get' => '/entrez/eutils/efetch.fcgi'], 'version'=> ['get' => '/entrez/eutils/efetch.fcgi'], 'gi' => ['get' => '/entrez/eutils/efetch.fcgi'], ); Which has batch set to efetch, not epost. > I think there are at least 2 possibilities: > 1) change single to batch in Bio::DB::WebDBSeqI::get_Stream_by_id > 2) allow the possibility to pass the mode option when get_Stream_by_id > is called using the Bio::DB::GenBank object I would say the second is the most flexible, though I'm not exactly sure why we hardcode in 'single' for sequence streams. It may have something to do with the way single sequences are retrieved; looks like get_Seq_by_acc in WebDBSeqI calls get_Stream_by_acc with one sequence instead of an array ref; I guess get_Stream_by_id is the same. Anyway, I'm for it as long as some tests are added for batch retrieval and everything passes. > Any comments/preferences before I actually commit some edits ? 
> Regards, > Marc > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From golharam at umdnj.edu Fri Mar 31 12:00:15 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 31 Mar 2006 12:00:15 -0500 Subject: [Bioperl-l] What happens to STDOUT? In-Reply-To: <442D35B6.8080101@colibase.bham.ac.uk> Message-ID: <016e01c654e4$951732c0$e6028a0a@GOLHARMOBILE1> Thanks all for your responses, but I think there is a bit of a misunderstanding. I meant to have my script show: while (my $seq = $seqin->next_seq) { print $seq->accession; ... } In fact, ANY print statement (printing anything, not just related to bioperl) after opening a file with bioperl gets lost somewhere. So again, what does bioperl do to STDOUT? I have to force all my output to STDERR to get anything on the console. Ryan From smarkel at scitegic.com Fri Mar 31 12:31:05 2006 From: smarkel at scitegic.com (Scott Markel) Date: Fri, 31 Mar 2006 09:31:05 -0800 Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers In-Reply-To: <000101c65476$47c2f130$15327e82@pyrimidine> References: <000101c65476$47c2f130$15327e82@pyrimidine> Message-ID: <442D6759.3090208@scitegic.com> Chris, Looks like I made my test case too simple. In our application, which calls BioPerl, I'm creating the feature with the zero- valued qualifier. It's not being read in from a file, so my only issue is with writing GenBank files. The real feature is one for a primer binding site. The qualifier contains the number of mismatches. The one line change of $value = $value->{"value"} definitely fixes our problem and causes no regression failures in our application. 
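The zero-becomes-empty-string symptom can be reproduced without BioPerl. The sketch below is an illustration only: DemoValueOld and DemoValueFixed are hypothetical stand-ins, not BioPerl classes, and their stringification overloads are modelled on the two overload statements quoted elsewhere in this thread for Bio::Annotation::SimpleValue. The helper qualifier_line is likewise made up; it only imitates the quote-doubling step of the GenBank writer.

```perl
use strict;
use warnings;

# Hypothetical stand-in: stringifies with "value || ''", so 0 becomes ''.
package DemoValueOld;
use overload '""' => sub { $_[0]->value || '' }, fallback => 1;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]->{value} }

# Hypothetical stand-in: only an undefined value becomes ''.
package DemoValueFixed;
use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' },
    fallback => 1;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]->{value} }

package main;

# Made-up helper imitating how a qualifier is written out.
sub qualifier_line {
    my ($tag, $value) = @_;
    my $text = "$value";    # force stringification through the overload
    $text =~ s/"/""/g;      # double any embedded quotes
    return qq{/$tag="$text"};
}

print qualifier_line('foo', DemoValueOld->new(0)),   "\n";   # /foo=""
print qualifier_line('foo', DemoValueFixed->new(0)), "\n";   # /foo="0"
```

The only difference between the two classes is the test inside the overload: "||" treats 0 as false, while defined() does not.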
Scott Chris Fields wrote: > I tried this on WinXP (I'm using bioperl-live) and got a warning: > > -------------------- WARNING --------------------- > MSG: Unexpected error in feature table for Skipping feature, attempting to > recover > --------------------------------------------------- > > Running using debugging shows that no feature key was found in > _read_FTHelper_GenBank. So I'm getting an error, but on input not output. > In fact, turning on -verbose in the SeqIO input object gives the below extra > output, whereas turning -verbose on only in the output object just gives the > warning above. > > ==================================== > C:\Perl\Scripts\gb_test>test.pl > no feature key! > > -------------------- WARNING --------------------- > MSG: Unexpected error in feature table for Skipping feature, attempting to > recover > STACK Bio::SeqIO::genbank::next_seq > C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 > STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 > sequence length is 10 > ==================================== > > The sequence came back w/o any features in the feature table, which is what > I would expect from this error: > ==================================== > LOCUS MY_LOCUS 10 aa linear linear > DEFINITION my description. > ACCESSION 12345 > KEYWORDS . > FEATURES Location/Qualifiers > ORIGIN > 1 atggagaact > // > ==================================== > > Adding the extra line before the s/// didn't help any (warning still pops > up, no change in output). Anybody out there with any ideas? > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > >>-----Original Message----- >>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>bounces at lists.open-bio.org] On Behalf Of Scott Markel >>Sent: Thursday, March 30, 2006 7:18 PM >>To: bioperl-l at lists.open-bio.org >>Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers >> >>In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the >>following. >> >>Annotation tags used by Bio::SeqIO::FTHelper were strings and >>are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper >>subroutine of Bio::SeqIO::genbank the following code still >>assumes that tags are strings. >> >> foreach my $tag ( keys %{$fth->field} ) { >> foreach my $value ( @{$fth->field->{$tag}} ) { >> $value =~ s/\"/\"\"/g; >> >>If the tag value was a zero, an empty string is written. >> >>We think that >> >> $value = $value->{"value"}; >> >>should be added before the s/// call. >> >>Here's our test case. Note that the qualifier value for "foo" >>is changed to an empty string. >> >>Input file >> >>==================================== >>LOCUS MY_LOCUS 10 aa linear UNK >>DEFINITION my description. >>ACCESSION 12345 >>FEATURES Location/Qualifiers >> misc_feature 1..10 >> /foo="0" >>ORIGIN >> 1 atggagaact >>// >>==================================== >> >>Perl code >>==================================== >>use strict; >>use warnings; >> >>use Bio::SeqIO; >> >>my $inputFilename = "input.gbff"; >>my $outputFilename = "output.gbff"; >> >>my $in = Bio::SeqIO->new(-file => $inputFilename, >> -format => "genbank"); >>my $out = Bio::SeqIO->new(-file => ">$outputFilename", >> -format => "genbank"); >> >>my $sequence = $in->next_seq(); >>$out->write_seq($sequence); >>==================================== >> >>Output file >>==================================== >>LOCUS MY_LOCUS 10 aa linear linear >>DEFINITION my description. >>ACCESSION 12345 >>KEYWORDS . 
>>FEATURES Location/Qualifiers >> misc_feature 1..10 >> /foo="" >>ORIGIN >> 1 atggagaact >>// >>==================================== >> >>I'll add this to bugzilla, but first I want to make sure >>I'm not missing something obvious. >> >>Scott -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From hlapp at gmx.net Fri Mar 31 12:43:15 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 31 Mar 2006 09:43:15 -0800 Subject: [Bioperl-l] Batch mode in Bio::DB::GenBank In-Reply-To: <000601c654e4$071b63b0$15327e82@pyrimidine> References: <000601c654e4$071b63b0$15327e82@pyrimidine> Message-ID: <5b51ade5e93409471171b216f0c1d37a@gmx.net> There used to be get_Stream_by_batch() which apparently is now deprecated and forwards to get_Stream_by_id(), which therefore I assume is supposed to do the Right Thing depending on its arguments. I don't know where this is going wrong. -hilmar On Mar 31, 2006, at 8:56 AM, Chris Fields wrote: > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Marc Logghe >> Sent: Friday, March 31, 2006 8:45 AM >> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] Batch mode in Bio::DB::GenBank >> >> Hi, >> It seems that in the current (CVS of last night) Bio::DB::GenBank >> implementation it is not at all possible to set the mode to 'batch' >> instead of the default 'single'. Devel::StackTrace revealed that the >> mode is hardcoded in the Bio::DB::WebDBSeqI::get_Stream_by_id method. >> Is that intended ? >> The problem is that with single mode, the request is always done with >> a >> GET. In most cases (at least in my hands) when you pass a batch of 500 >> id's the request fails because of the url getting too long. 
All goes >> well when the method is overridden whereby the mode option is >> hardcoded >> to 'batch' so that a POST is done. > > You're right about the 500 seq limit. If it's particularly busy > (during > peak hours) it's less, around 200-400. I have been grabbing them 400 > at a > time using a loop, which works but batch would be better. > > I remember asking about this a few years ago and, according to > Lincoln, we > use the approved batch method retrieval. However, now you point it > out, I > just don't see it here (no epost). NCBIHelper has, for some reason, > this: > > %CGILOCATION = ( > 'batch' => ['post' => '/entrez/eutils/efetch.fcgi'], > 'query' => ['get' => '/entrez/eutils/efetch.fcgi'], > 'single' => ['get' => '/entrez/eutils/efetch.fcgi'], > 'version'=> ['get' => '/entrez/eutils/efetch.fcgi'], > 'gi' => ['get' => '/entrez/eutils/efetch.fcgi'], > ); > > Which has batch set to efetch, not epost. > >> I think there are at least 2 possibilities: >> 1) change single to batch in Bio::DB::WebDBSeqI::get_Stream_by_id >> 2) allow the possibility to pass the mode option when get_Stream_by_id >> is called using the Bio::DB::GenBank object > > I would say the second is the most flexible, though I'm not exactly > sure why > we hardcode in 'single' for sequence streams. It may have something > to do > with the way single sequences are retrieved; looks like get_Seq_by_acc > in > WebDBSeqI calls get_Stream_by_acc with one sequence instead of an > array ref; > I guess get_Stream_by_id is the same. > > Anyway, I'm for it as long as some tests are added for batch retrieval > and > everything passes. > >> Any comments/preferences before I actually commit some edits ? >> Regards, >> Marc >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From cjfields at uiuc.edu Fri Mar 31 13:04:25 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 31 Mar 2006 12:04:25 -0600 Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers In-Reply-To: <442D6759.3090208@scitegic.com> Message-ID: <000701c654ed$8bd67d70$15327e82@pyrimidine> Okay. I'm committing this change; it passes all tests for SeqIO.t and genbank.t. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Scott Markel [mailto:smarkel at scitegic.com] > Sent: Friday, March 31, 2006 11:31 AM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers > > Chris, > > Looks like I made my test case too simple. In our application, > which calls BioPerl, I'm creating the feature with the zero- > valued qualifier. It's not being read in from a file, so > my only issue is with writing GenBank files. The real feature > is one for a primer binding site. The qualifier contains the > number of mismatches. The one line change of > > $value = $value->{"value"} > > definitely fixes our problem and causes no regression > failures in our application. 
> > Scott > > Chris Fields wrote: > > > I tried this on WinXP (I'm using bioperl-live) and got a warning: > > > > -------------------- WARNING --------------------- > > MSG: Unexpected error in feature table for Skipping feature, attempting > to > > recover > > --------------------------------------------------- > > > > Running using debugging shows that no feature key was found in > > _read_FTHelper_GenBank. So I'm getting an error, but on input not > output. > > In fact, turning on -verbose in the SeqIO input object gives the below > extra > > output, whereas turning -verbose on only in the output object just gives > the > > warning above. > > > > ==================================== > > C:\Perl\Scripts\gb_test>test.pl > > no feature key! > > > > -------------------- WARNING --------------------- > > MSG: Unexpected error in feature table for Skipping feature, attempting > to > > recover > > STACK Bio::SeqIO::genbank::next_seq > > C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 > > STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 > > sequence length is 10 > > ==================================== > > > > The sequence came back w/o any features in the feature table, which is > what > > I would expect from this error: > > ==================================== > > LOCUS MY_LOCUS 10 aa linear linear > > DEFINITION my description. > > ACCESSION 12345 > > KEYWORDS . > > FEATURES Location/Qualifiers > > ORIGIN > > 1 atggagaact > > // > > ==================================== > > > > Adding the extra line before the s/// didn't help any (warning still > pops > > up, no change in output). Anybody out there with any ideas? > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. 
of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > >>-----Original Message----- > >>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>bounces at lists.open-bio.org] On Behalf Of Scott Markel > >>Sent: Thursday, March 30, 2006 7:18 PM > >>To: bioperl-l at lists.open-bio.org > >>Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers > >> > >>In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the > >>following. > >> > >>Annotation tags used by Bio::SeqIO::FTHelper were strings and > >>are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper > >>subroutine of Bio::SeqIO::genbank the following code still > >>assumes that tags are strings. > >> > >> foreach my $tag ( keys %{$fth->field} ) { > >> foreach my $value ( @{$fth->field->{$tag}} ) { > >> $value =~ s/\"/\"\"/g; > >> > >>If the tag value was a zero, an empty string is written. > >> > >>We think that > >> > >> $value = $value->{"value"}; > >> > >>should be added before the s/// call. > >> > >>Here's our test case. Note that the qualifier value for "foo" > >>is changed to an empty string. > >> > >>Input file > >> > >>==================================== > >>LOCUS MY_LOCUS 10 aa linear UNK > >>DEFINITION my description. 
> >>ACCESSION 12345 > >>FEATURES Location/Qualifiers > >> misc_feature 1..10 > >> /foo="0" > >>ORIGIN > >> 1 atggagaact > >>// > >>==================================== > >> > >>Perl code > >>==================================== > >>use strict; > >>use warnings; > >> > >>use Bio::SeqIO; > >> > >>my $inputFilename = "input.gbff"; > >>my $outputFilename = "output.gbff"; > >> > >>my $in = Bio::SeqIO->new(-file => $inputFilename, > >> -format => "genbank"); > >>my $out = Bio::SeqIO->new(-file => ">$outputFilename", > >> -format => "genbank"); > >> > >>my $sequence = $in->next_seq(); > >>$out->write_seq($sequence); > >>==================================== > >> > >>Output file > >>==================================== > >>LOCUS MY_LOCUS 10 aa linear linear > >>DEFINITION my description. > >>ACCESSION 12345 > >>KEYWORDS . > >>FEATURES Location/Qualifiers > >> misc_feature 1..10 > >> /foo="" > >>ORIGIN > >> 1 atggagaact > >>// > >>==================================== > >> > >>I'll add this to bugzilla, but first I want to make sure > >>I'm not missing something obvious. > >> > >>Scott > > -- > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at scitegic.com > SciTegic Inc. mobile: +1 858 205 3653 > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 > San Diego, CA 92123 fax: +1 858 279 8804 > USA web: http://www.scitegic.com From hlapp at gmx.net Fri Mar 31 13:22:30 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 31 Mar 2006 10:22:30 -0800 Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers In-Reply-To: <442D6759.3090208@scitegic.com> References: <000101c65476$47c2f130$15327e82@pyrimidine> <442D6759.3090208@scitegic.com> Message-ID: <565c0146aaaa3890042a92efe786bb2f@gmx.net> Scott, your fix assumes that $value in reality is not a scalar but a hash ref and that it has a key "value". 
Apparently in your test environment this is all indeed true, but there is no guarantee that this will still be true tomorrow when you next update from CVS (or install a new version). It seems to me that making feature tag values Bio::AnnotationI objects and the stringification overload is what is interfering here. More specifically, the broken overload in Bio::Annotation::SimpleValue use overload '""' => sub { $_[0]->value || ''}; will lead exactly to the behavior you see (b/c $_[0]->value evaluates to false if the value is '0'). You say you build and populate the feature dynamically - are you using Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is slated to get this behavior reverted, i.e., will return to using scalars for tag values. (Or so I recall ...) To fix the problem for you now, I suggest you either fix the overload statement above to be use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' }; I suppose this should in fact be committed to the repository - does anybody see any damage from this change? Or, if you do want to mess with the GenBank format writer, protect the conversion to string and use the object access method: if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) { # convert SimpleValue object to represented (string) value $value = $value->value; } Hth, -hilmar On Mar 31, 2006, at 9:31 AM, Scott Markel wrote: > Chris, > > Looks like I made my test case too simple. In our application, > which calls BioPerl, I'm creating the feature with the zero- > valued qualifier. It's not being read in from a file, so > my only issue is with writing GenBank files. The real feature > is one for a primer binding site. The qualifier contains the > number of mismatches. The one line change of > > $value = $value->{"value"} > > definitely fixes our problem and causes no regression > failures in our application. 
> > Scott > > Chris Fields wrote: > >> I tried this on WinXP (I'm using bioperl-live) and got a warning: >> >> -------------------- WARNING --------------------- >> MSG: Unexpected error in feature table for Skipping feature, >> attempting to >> recover >> --------------------------------------------------- >> >> Running using debugging shows that no feature key was found in >> _read_FTHelper_GenBank. So I'm getting an error, but on input not >> output. >> In fact, turning on -verbose in the SeqIO input object gives the >> below extra >> output, whereas turning -verbose on only in the output object just >> gives the >> warning above. >> >> ==================================== >> C:\Perl\Scripts\gb_test>test.pl >> no feature key! >> >> -------------------- WARNING --------------------- >> MSG: Unexpected error in feature table for Skipping feature, >> attempting to >> recover >> STACK Bio::SeqIO::genbank::next_seq >> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 >> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 >> sequence length is 10 >> ==================================== >> >> The sequence came back w/o any features in the feature table, which >> is what >> I would expect from this error: >> ==================================== >> LOCUS MY_LOCUS 10 aa linear linear >> DEFINITION my description. >> ACCESSION 12345 >> KEYWORDS . >> FEATURES Location/Qualifiers >> ORIGIN >> 1 atggagaact >> // >> ==================================== >> >> Adding the extra line before the s/// didn't help any (warning still >> pops >> up, no change in output). Anybody out there with any ideas? >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Scott Markel >>> Sent: Thursday, March 30, 2006 7:18 PM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers >>> >>> In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the >>> following. >>> >>> Annotation tags used by Bio::SeqIO::FTHelper were strings and >>> are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper >>> subroutine of Bio::SeqIO::genbank the following code still >>> assumes that tags are strings. >>> >>> foreach my $tag ( keys %{$fth->field} ) { >>> foreach my $value ( @{$fth->field->{$tag}} ) { >>> $value =~ s/\"/\"\"/g; >>> >>> If the tag value was a zero, an empty string is written. >>> >>> We think that >>> >>> $value = $value->{"value"}; >>> >>> should be added before the s/// call. >>> >>> Here's our test case. Note that the qualifier value for "foo" >>> is changed to an empty string. >>> >>> Input file >>> >>> ==================================== >>> LOCUS MY_LOCUS 10 aa linear UNK >>> DEFINITION my description. 
>>> ACCESSION 12345 >>> FEATURES Location/Qualifiers >>> misc_feature 1..10 >>> /foo="0" >>> ORIGIN >>> 1 atggagaact >>> // >>> ==================================== >>> >>> Perl code >>> ==================================== >>> use strict; >>> use warnings; >>> >>> use Bio::SeqIO; >>> >>> my $inputFilename = "input.gbff"; >>> my $outputFilename = "output.gbff"; >>> >>> my $in = Bio::SeqIO->new(-file => $inputFilename, >>> -format => "genbank"); >>> my $out = Bio::SeqIO->new(-file => ">$outputFilename", >>> -format => "genbank"); >>> >>> my $sequence = $in->next_seq(); >>> $out->write_seq($sequence); >>> ==================================== >>> >>> Output file >>> ==================================== >>> LOCUS MY_LOCUS 10 aa linear >>> linear >>> DEFINITION my description. >>> ACCESSION 12345 >>> KEYWORDS . >>> FEATURES Location/Qualifiers >>> misc_feature 1..10 >>> /foo="" >>> ORIGIN >>> 1 atggagaact >>> // >>> ==================================== >>> >>> I'll add this to bugzilla, but first I want to make sure >>> I'm not missing something obvious. >>> >>> Scott > > -- > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at scitegic.com > SciTegic Inc. mobile: +1 858 205 3653 > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253 > San Diego, CA 92123 fax: +1 858 279 8804 > USA web: http://www.scitegic.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From smarkel at scitegic.com Fri Mar 31 14:17:19 2006 From: smarkel at scitegic.com (Scott Markel) Date: Fri, 31 Mar 2006 11:17:19 -0800 Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers In-Reply-To: <565c0146aaaa3890042a92efe786bb2f@gmx.net> References: <000101c65476$47c2f130$15327e82@pyrimidine> <442D6759.3090208@scitegic.com> <565c0146aaaa3890042a92efe786bb2f@gmx.net> Message-ID: <442D803F.6020102@scitegic.com> Hilmar, Thanks for the detailed reply. The dynamic features we create are added using Bio::SeqFeature::Generic. I'm happy to use any of your suggestions. If the overload statement in Bio::Annotation::SimpleValue could be changed in the repository, that's probably the cleanest solution from my point of view. The GenBank format writer is just an instance of SimpleValue's use. Since I don't use bioperl-live in our product environment, I typically handle issues like this either in our code or try to be as faithful as I can to the direction bioperl-live is heading. In the latter case, I usually make the same change(s) in our copy of the released version. Scott Hilmar Lapp wrote: > Scott, > > your fix assumes that $value in reality is not a scalar but a hash ref > and that it has a key "value". > > Apparently in your test environment this is all indeed true, but there > is no guarantee that this will still be true tomorrow when you next > update from CVS (or install a new version). > > It seems to me that making feature tag values Bio::AnnotationI objects > and the stringification overload is what is interfering here. 
More > specifically, the broken overload in Bio::Annotation::SimpleValue > > use overload '""' => sub { $_[0]->value || ''}; > > will lead exactly to the behavior you see (b/c $_[0]->value evaluates to > false if the value is '0'). > > You say you build and populate the feature dynamically - are you using > Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is slated > to get this behavior reverted, i.e., will return to using scalars for > tag values. (Or so I recall ...) > > To fix the problem for you now, I suggest you either fix the overload > statement above to be > > use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' }; > > I suppose this should in fact be committed to the repository - does > anybody see any damage from this change? > > Or, if you do want to mess with the GenBank format writer, protect the > conversion to string and use the object access method: > > if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) { > # convert SimpleValue object to represented (string) value > $value = $value->value; > } > > Hth, > > -hilmar > > On Mar 31, 2006, at 9:31 AM, Scott Markel wrote: > >> Chris, >> >> Looks like I made my test case too simple. In our application, >> which calls BioPerl, I'm creating the feature with the zero- >> valued qualifier. It's not being read in from a file, so >> my only issue is with writing GenBank files. The real feature >> is one for a primer binding site. The qualifier contains the >> number of mismatches. The one line change of >> >> $value = $value->{"value"} >> >> definitely fixes our problem and causes no regression >> failures in our application. 
>> >> Scott >> >> Chris Fields wrote: >> >>> I tried this on WinXP (I'm using bioperl-live) and got a warning: >>> >>> -------------------- WARNING --------------------- >>> MSG: Unexpected error in feature table for Skipping feature, >>> attempting to >>> recover >>> --------------------------------------------------- >>> >>> Running using debugging shows that no feature key was found in >>> _read_FTHelper_GenBank. So I'm getting an error, but on input not >>> output. >>> In fact, turning on -verbose in the SeqIO input object gives the >>> below extra >>> output, whereas turning -verbose on only in the output object just >>> gives the >>> warning above. >>> >>> ==================================== >>> C:\Perl\Scripts\gb_test>test.pl >>> no feature key! >>> >>> -------------------- WARNING --------------------- >>> MSG: Unexpected error in feature table for Skipping feature, >>> attempting to >>> recover >>> STACK Bio::SeqIO::genbank::next_seq >>> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 >>> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 >>> sequence length is 10 >>> ==================================== >>> >>> The sequence came back w/o any features in the feature table, which >>> is what >>> I would expect from this error: >>> ==================================== >>> LOCUS MY_LOCUS 10 aa linear linear >>> DEFINITION my description. >>> ACCESSION 12345 >>> KEYWORDS . >>> FEATURES Location/Qualifiers >>> ORIGIN >>> 1 atggagaact >>> // >>> ==================================== >>> >>> Adding the extra line before the s/// didn't help any (warning still >>> pops >>> up, no change in output). Anybody out there with any ideas? >>> >>> Christopher Fields >>> Postdoctoral Researcher - Switzer Lab >>> Dept. 
of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Scott Markel >>>> Sent: Thursday, March 30, 2006 7:18 PM >>>> To: bioperl-l at lists.open-bio.org >>>> Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers >>>> >>>> In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the >>>> following. >>>> >>>> Annotation tags used by Bio::SeqIO::FTHelper were strings and >>>> are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper >>>> subroutine of Bio::SeqIO::genbank the following code still >>>> assumes that tags are strings. >>>> >>>> foreach my $tag ( keys %{$fth->field} ) { >>>> foreach my $value ( @{$fth->field->{$tag}} ) { >>>> $value =~ s/\"/\"\"/g; >>>> >>>> If the tag value was a zero, an empty string is written. >>>> >>>> We think that >>>> >>>> $value = $value->{"value"}; >>>> >>>> should be added before the s/// call. >>>> >>>> Here's our test case. Note that the qualifier value for "foo" >>>> is changed to an empty string. >>>> >>>> Input file >>>> >>>> ==================================== >>>> LOCUS MY_LOCUS 10 aa linear UNK >>>> DEFINITION my description. 
>>>> ACCESSION   12345
>>>> FEATURES             Location/Qualifiers
>>>>      misc_feature    1..10
>>>>                      /foo="0"
>>>> ORIGIN
>>>>         1 atggagaact
>>>> //
>>>> ====================================
>>>>
>>>> Perl code
>>>> ====================================
>>>> use strict;
>>>> use warnings;
>>>>
>>>> use Bio::SeqIO;
>>>>
>>>> my $inputFilename = "input.gbff";
>>>> my $outputFilename = "output.gbff";
>>>>
>>>> my $in = Bio::SeqIO->new(-file => $inputFilename,
>>>>                          -format => "genbank");
>>>> my $out = Bio::SeqIO->new(-file => ">$outputFilename",
>>>>                           -format => "genbank");
>>>>
>>>> my $sequence = $in->next_seq();
>>>> $out->write_seq($sequence);
>>>> ====================================
>>>>
>>>> Output file
>>>> ====================================
>>>> LOCUS       MY_LOCUS                  10 aa            linear   linear
>>>> DEFINITION  my description.
>>>> ACCESSION   12345
>>>> KEYWORDS    .
>>>> FEATURES             Location/Qualifiers
>>>>      misc_feature    1..10
>>>>                      /foo=""
>>>> ORIGIN
>>>>         1 atggagaact
>>>> //
>>>> ====================================
>>>>
>>>> I'll add this to bugzilla, but first I want to make sure
>>>> I'm not missing something obvious.
>>>>
>>>> Scott

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From cjfields at uiuc.edu Fri Mar 31 14:27:49 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 31 Mar 2006 13:27:49 -0600
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To: <565c0146aaaa3890042a92efe786bb2f@gmx.net>
Message-ID: <000801c654f9$3246f9e0$15327e82@pyrimidine>

The Bio::Annotation::SimpleValue fix actually sounds like the more reasonable fix. The problem is that it essentially reverts the last CVS commit to Bio::Annotation::SimpleValue.
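To make the difference between the overload variants discussed in this thread concrete, here is a minimal sketch that does not need BioPerl. The packages Toy::OrEmpty and Toy::DefinedCheck and the helper render_qualifier are hypothetical stand-ins invented for illustration; they are not part of BioPerl. Only the two overload expressions are taken from the messages above, and the quote doubling mirrors the s/// call in the writer loop Scott posted.

```perl
use strict;
use warnings;

# Hypothetical stand-in using the "|| ''" overload quoted in the thread.
package Toy::OrEmpty;
use overload '""' => sub { $_[0]->value || '' }, fallback => 1;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]{value} }

# Hypothetical stand-in using the defined() check suggested as a fix.
package Toy::DefinedCheck;
use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' },
             fallback => 1;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]{value} }

package main;

# Assumed helper: stringify a tag value, double embedded quotes, and
# format it the way a GenBank qualifier line is written.
sub render_qualifier {
    my ($tag, $value) = @_;
    my $text = "$value";          # triggers the '""' overload
    $text =~ s/"/""/g;
    return qq{/$tag="$text"};
}

print render_qualifier(foo => Toy::OrEmpty->new(0)),          "\n";  # 0 is lost
print render_qualifier(foo => Toy::DefinedCheck->new(0)),     "\n";  # 0 is kept
print render_qualifier(foo => Toy::DefinedCheck->new(undef)), "\n";  # empty, no warning
```

With the "|| ''" form a value of 0 is false in Perl, so it stringifies to the empty string; the defined() form only substitutes the empty string when there is no value at all.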
The bioperl-live version of this line is now:

use overload '""' => sub { $_[0]->value};

with the commit message by Heikki:

I'll try out your suggested fix here (WinXP) and on OS X to see what happens and whether it passes genbank.t and SeqIO.t, but I may need to do a bit of checking on why the last change was made. Scott's fix, which I have already committed, doesn't break anything as of now, so it shouldn't interfere; if I get Hilmar's fix to work, though, I'll probably roll back my last CVS commit to SeqIO::genbank regardless.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Friday, March 31, 2006 12:23 PM
> To: Scott Markel
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers
>
> Scott,
>
> your fix assumes that $value in reality is not a scalar but a hash ref
> and that it has a key "value".
>
> Apparently in your test environment this is all indeed true, but there
> is no guarantee that this will still be true tomorrow when you next
> update from CVS (or install a new version).
>
> It seems to me that making feature tag values Bio::AnnotationI objects
> and the stringification overload is what is interfering here. More
> specifically, the broken overload in Bio::Annotation::SimpleValue
>
> use overload '""' => sub { $_[0]->value || ''};
>
> will lead exactly to the behavior you see (b/c $_[0]->value evaluates
> to false if the value is '0').
>
> You say you build and populate the feature dynamically - are you using
> Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is slated
> to get this behavior reverted, i.e., will return to using scalars for
> tag values. (Or so I recall ...)
> > To fix the problem for you now, I suggest you either fix the overload > statement above to be > > use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' > }; > > I suppose this should in fact be committed to the repository - does > anybody see any damage from this change? > > Or, if you do want to mess with the GenBank format writer, protect the > conversion to string and use the object access method: > > if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) { > # convert SimpleValue object to represented (string) value > $value = $value->value; > } > > Hth, > > -hilmar > > On Mar 31, 2006, at 9:31 AM, Scott Markel wrote: > > > Chris, > > > > Looks like I made my test case too simple. In our application, > > which calls BioPerl, I'm creating the feature with the zero- > > valued qualifier. It's not being read in from a file, so > > my only issue is with writing GenBank files. The real feature > > is one for a primer binding site. The qualifier contains the > > number of mismatches. The one line change of > > > > $value = $value->{"value"} > > > > definitely fixes our problem and causes no regression > > failures in our application. > > > > Scott > > > > Chris Fields wrote: > > > >> I tried this on WinXP (I'm using bioperl-live) and got a warning: > >> > >> -------------------- WARNING --------------------- > >> MSG: Unexpected error in feature table for Skipping feature, > >> attempting to > >> recover > >> --------------------------------------------------- > >> > >> Running using debugging shows that no feature key was found in > >> _read_FTHelper_GenBank. So I'm getting an error, but on input not > >> output. > >> In fact, turning on -verbose in the SeqIO input object gives the > >> below extra > >> output, whereas turning -verbose on only in the output object just > >> gives the > >> warning above. > >> > >> ==================================== > >> C:\Perl\Scripts\gb_test>test.pl > >> no feature key! 
> >> > >> -------------------- WARNING --------------------- > >> MSG: Unexpected error in feature table for Skipping feature, > >> attempting to > >> recover > >> STACK Bio::SeqIO::genbank::next_seq > >> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 > >> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 > >> sequence length is 10 > >> ==================================== > >> > >> The sequence came back w/o any features in the feature table, which > >> is what > >> I would expect from this error: > >> ==================================== > >> LOCUS MY_LOCUS 10 aa linear linear > >> DEFINITION my description. > >> ACCESSION 12345 > >> KEYWORDS . > >> FEATURES Location/Qualifiers > >> ORIGIN > >> 1 atggagaact > >> // > >> ==================================== > >> > >> Adding the extra line before the s/// didn't help any (warning still > >> pops > >> up, no change in output). Anybody out there with any ideas? > >> > >> Christopher Fields > >> Postdoctoral Researcher - Switzer Lab > >> Dept. of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of Scott Markel > >>> Sent: Thursday, March 30, 2006 7:18 PM > >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers > >>> > >>> In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the > >>> following. > >>> > >>> Annotation tags used by Bio::SeqIO::FTHelper were strings and > >>> are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper > >>> subroutine of Bio::SeqIO::genbank the following code still > >>> assumes that tags are strings. > >>> > >>> foreach my $tag ( keys %{$fth->field} ) { > >>> foreach my $value ( @{$fth->field->{$tag}} ) { > >>> $value =~ s/\"/\"\"/g; > >>> > >>> If the tag value was a zero, an empty string is written. 
> >>> > >>> We think that > >>> > >>> $value = $value->{"value"}; > >>> > >>> should be added before the s/// call. > >>> > >>> Here's our test case. Note that the qualifier value for "foo" > >>> is changed to an empty string. > >>> > >>> Input file > >>> > >>> ==================================== > >>> LOCUS MY_LOCUS 10 aa linear UNK > >>> DEFINITION my description. > >>> ACCESSION 12345 > >>> FEATURES Location/Qualifiers > >>> misc_feature 1..10 > >>> /foo="0" > >>> ORIGIN > >>> 1 atggagaact > >>> // > >>> ==================================== > >>> > >>> Perl code > >>> ==================================== > >>> use strict; > >>> use warnings; > >>> > >>> use Bio::SeqIO; > >>> > >>> my $inputFilename = "input.gbff"; > >>> my $outputFilename = "output.gbff"; > >>> > >>> my $in = Bio::SeqIO->new(-file => $inputFilename, > >>> -format => "genbank"); > >>> my $out = Bio::SeqIO->new(-file => ">$outputFilename", > >>> -format => "genbank"); > >>> > >>> my $sequence = $in->next_seq(); > >>> $out->write_seq($sequence); > >>> ==================================== > >>> > >>> Output file > >>> ==================================== > >>> LOCUS MY_LOCUS 10 aa linear > >>> linear > >>> DEFINITION my description. > >>> ACCESSION 12345 > >>> KEYWORDS . > >>> FEATURES Location/Qualifiers > >>> misc_feature 1..10 > >>> /foo="" > >>> ORIGIN > >>> 1 atggagaact > >>> // > >>> ==================================== > >>> > >>> I'll add this to bugzilla, but first I want to make sure > >>> I'm not missing something obvious. > >>> > >>> Scott > > > > -- > > Scott Markel, Ph.D. > > Principal Bioinformatics Architect email: smarkel at scitegic.com > > SciTegic Inc. mobile: +1 858 205 3653 > > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253
> > San Diego, CA 92123                 fax: +1 858 279 8804
> > USA                                 web: http://www.scitegic.com
> >
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

From cjfields at uiuc.edu Fri Mar 31 14:35:41 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 31 Mar 2006 13:35:41 -0600
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To: <565c0146aaaa3890042a92efe786bb2f@gmx.net>
Message-ID: <000901c654fa$4bb443a0$15327e82@pyrimidine>

Sorry about that; stupid Outlook sent my mail before I had a chance to finish it up.

The Bio::Annotation::SimpleValue fix sounds best, but the problem is that CVS shows a fix on this line by Heikki after 1.5.1 was released:

fix to allow 0 values despite operator overload (Paul Mooney)

which changed the overload to:

use overload '""' => sub { $_[0]->value};

I'll try out your fix here to see if it breaks anything (I can't see why it would), but I may need to dig through the archives a little to see why this latest change was made. If everything works and passes the tests, I'll roll back the commit I made to Bio::SeqIO::genbank earlier today.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Friday, March 31, 2006 12:23 PM > To: Scott Markel > Cc: bioperl-l at lists.open-bio.org; Chris Fields > Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers > > Scott, > > your fix assumes that $value in reality is not a scalar but a hash ref > and that it has a key "value". > > Apparently in your test environment this is all indeed true, but there > is no guarantee that this will still be true tomorrow when you next > update from CVS (or install a new version). > > It seems to me that making feature tag values Bio::AnnotationI objects > and the stringification overload is what is interfering here. More > specifically, the broken overload in Bio::Annotation::SimpleValue > > use overload '""' => sub { $_[0]->value || ''}; > > will lead exactly to the behavior you see (b/c $_[0]->value evaluates > to false if the value is '0'). > > You say you build and populate the feature dynamically - are you using > Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is slated > to get this behavior reverted, i.e., will return to using scalars for > tag values. (Or so I recall ...) > > To fix the problem for you now, I suggest you either fix the overload > statement above to be > > use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' > }; > > I suppose this should in fact be committed to the repository - does > anybody see any damage from this change? 
> > Or, if you do want to mess with the GenBank format writer, protect the > conversion to string and use the object access method: > > if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) { > # convert SimpleValue object to represented (string) value > $value = $value->value; > } > > Hth, > > -hilmar > > On Mar 31, 2006, at 9:31 AM, Scott Markel wrote: > > > Chris, > > > > Looks like I made my test case too simple. In our application, > > which calls BioPerl, I'm creating the feature with the zero- > > valued qualifier. It's not being read in from a file, so > > my only issue is with writing GenBank files. The real feature > > is one for a primer binding site. The qualifier contains the > > number of mismatches. The one line change of > > > > $value = $value->{"value"} > > > > definitely fixes our problem and causes no regression > > failures in our application. > > > > Scott > > > > Chris Fields wrote: > > > >> I tried this on WinXP (I'm using bioperl-live) and got a warning: > >> > >> -------------------- WARNING --------------------- > >> MSG: Unexpected error in feature table for Skipping feature, > >> attempting to > >> recover > >> --------------------------------------------------- > >> > >> Running using debugging shows that no feature key was found in > >> _read_FTHelper_GenBank. So I'm getting an error, but on input not > >> output. > >> In fact, turning on -verbose in the SeqIO input object gives the > >> below extra > >> output, whereas turning -verbose on only in the output object just > >> gives the > >> warning above. > >> > >> ==================================== > >> C:\Perl\Scripts\gb_test>test.pl > >> no feature key! 
> >> > >> -------------------- WARNING --------------------- > >> MSG: Unexpected error in feature table for Skipping feature, > >> attempting to > >> recover > >> STACK Bio::SeqIO::genbank::next_seq > >> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 > >> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 > >> sequence length is 10 > >> ==================================== > >> > >> The sequence came back w/o any features in the feature table, which > >> is what > >> I would expect from this error: > >> ==================================== > >> LOCUS MY_LOCUS 10 aa linear linear > >> DEFINITION my description. > >> ACCESSION 12345 > >> KEYWORDS . > >> FEATURES Location/Qualifiers > >> ORIGIN > >> 1 atggagaact > >> // > >> ==================================== > >> > >> Adding the extra line before the s/// didn't help any (warning still > >> pops > >> up, no change in output). Anybody out there with any ideas? > >> > >> Christopher Fields > >> Postdoctoral Researcher - Switzer Lab > >> Dept. of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of Scott Markel > >>> Sent: Thursday, March 30, 2006 7:18 PM > >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers > >>> > >>> In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the > >>> following. > >>> > >>> Annotation tags used by Bio::SeqIO::FTHelper were strings and > >>> are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper > >>> subroutine of Bio::SeqIO::genbank the following code still > >>> assumes that tags are strings. > >>> > >>> foreach my $tag ( keys %{$fth->field} ) { > >>> foreach my $value ( @{$fth->field->{$tag}} ) { > >>> $value =~ s/\"/\"\"/g; > >>> > >>> If the tag value was a zero, an empty string is written. 
> >>> > >>> We think that > >>> > >>> $value = $value->{"value"}; > >>> > >>> should be added before the s/// call. > >>> > >>> Here's our test case. Note that the qualifier value for "foo" > >>> is changed to an empty string. > >>> > >>> Input file > >>> > >>> ==================================== > >>> LOCUS MY_LOCUS 10 aa linear UNK > >>> DEFINITION my description. > >>> ACCESSION 12345 > >>> FEATURES Location/Qualifiers > >>> misc_feature 1..10 > >>> /foo="0" > >>> ORIGIN > >>> 1 atggagaact > >>> // > >>> ==================================== > >>> > >>> Perl code > >>> ==================================== > >>> use strict; > >>> use warnings; > >>> > >>> use Bio::SeqIO; > >>> > >>> my $inputFilename = "input.gbff"; > >>> my $outputFilename = "output.gbff"; > >>> > >>> my $in = Bio::SeqIO->new(-file => $inputFilename, > >>> -format => "genbank"); > >>> my $out = Bio::SeqIO->new(-file => ">$outputFilename", > >>> -format => "genbank"); > >>> > >>> my $sequence = $in->next_seq(); > >>> $out->write_seq($sequence); > >>> ==================================== > >>> > >>> Output file > >>> ==================================== > >>> LOCUS MY_LOCUS 10 aa linear > >>> linear > >>> DEFINITION my description. > >>> ACCESSION 12345 > >>> KEYWORDS . > >>> FEATURES Location/Qualifiers > >>> misc_feature 1..10 > >>> /foo="" > >>> ORIGIN > >>> 1 atggagaact > >>> // > >>> ==================================== > >>> > >>> I'll add this to bugzilla, but first I want to make sure > >>> I'm not missing something obvious. > >>> > >>> Scott > > > > -- > > Scott Markel, Ph.D. > > Principal Bioinformatics Architect email: smarkel at scitegic.com > > SciTegic Inc. mobile: +1 858 205 3653 > > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253
> > San Diego, CA 92123                 fax: +1 858 279 8804
> > USA                                 web: http://www.scitegic.com
> >
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

From cjfields at uiuc.edu Fri Mar 31 15:27:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 31 Mar 2006 14:27:18 -0600
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To:
Message-ID: <000101c65501$81db02a0$15327e82@pyrimidine>

Well, I'm running off bioperl-live now using WinXP and the latest ActivePerl (5.8.8.817) and, although all tests pass (SeqIO and genbank), I keep getting the same error and erroneous output using Scott's (albeit very simple) sequence example. The problem seems to pop up on the input end, not the output end (the 'no feature key' message in the output below only shows up when -verbose is turned on for the input SeqIO object).

Questions:

1) Is the example Scott gave (below) valid GenBank format? I don't know, but it looks okay.
2) If so, should it work? Yes, no question.
3) And if it is supposed to work, why isn't it working here? I don't know, but none of the mentioned fixes gets rid of the error for me. Scott gets it to work, but my guess is that it is b/c he's using a Linux/UNIX flavor. Can't wait 'til I get my MacTel (4 more months....)

I'm personally not too worried about it at the moment, as everything else I have passed through SeqIO has worked w/o a problem, even on WinXP. It's just a bit frustrating to see something fail here that seems to work elsewhere.
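For reference, the writer-side guard among the fixes mentioned above can be sketched in plain Perl as follows. This is only an illustration under stated assumptions: Toy::SimpleValue is a hypothetical stand-in for Bio::Annotation::SimpleValue, and qualifier_lines is an invented helper, not the actual _print_GenBank_FTHelper code. It also uses Scalar::Util's blessed() rather than the ref() test quoted in the thread, so that a plain (unblessed) reference cannot trigger a method call on something that is not an object.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for an annotation object that wraps one value.
package Toy::SimpleValue;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]{value} }

package main;

# Assumed helper: takes { tag => [ values ] } and returns qualifier lines.
# Values may be plain scalars or Toy::SimpleValue objects.
sub qualifier_lines {
    my ($fields) = @_;
    my @lines;
    for my $tag (sort keys %$fields) {
        for my $value (@{ $fields->{$tag} }) {
            my $text = $value;    # copy, so the caller's data is untouched
            if (blessed($text) && $text->isa('Toy::SimpleValue')) {
                # use the accessor instead of reaching into the hash
                $text = $text->value;
            }
            $text = '' unless defined $text;
            $text =~ s/"/""/g;    # double embedded quotes
            push @lines, qq{/$tag="$text"};
        }
    }
    return @lines;
}

my @lines = qualifier_lines({
    mismatches => [ Toy::SimpleValue->new(0) ],
    note       => [ 'a "quoted" word', 0 ],
});
print "$_\n" for @lines;
```

Going through the accessor rather than $value->{"value"} keeps the loop working whether the tag values are objects or plain strings, which is the concern raised about relying on the object's internal hash key.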
So here's what I did:

input.gbff
====================================
LOCUS       MY_LOCUS                  10 aa            linear   UNK
DEFINITION  my description.
ACCESSION   12345
FEATURES             Location/Qualifiers
     misc_feature    1..10
                     /foo="0"
ORIGIN
        1 atggagaact
//
====================================

Run through this:
====================================
use Bio::SeqIO;

my $inputFilename = "input.gbff";
my $outputFilename = "output.gbff";

my $in = Bio::SeqIO->new(-verbose => 1,
                         -file => $inputFilename,
                         -format => "genbank");

my $out = Bio::SeqIO->new(-verbose => 0,
                          -file => ">$outputFilename",
                          -format => "genbank");

my $sequence = $in->next_seq();
$out->write_seq($sequence);
====================================

Gets this error:

====================================
C:\Perl\Scripts\gb_test>test.pl
no feature key!

-------------------- WARNING ---------------------
MSG: Unexpected error in feature table for Skipping feature, attempting to
recover
STACK Bio::SeqIO::genbank::next_seq
C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
sequence length is 10
====================================

And this output, which is missing the feature, somewhat expected judging from the error (output.gbff)

====================================
LOCUS       MY_LOCUS                  10 aa            linear   linear
DEFINITION  my description.
ACCESSION   12345
KEYWORDS    .
FEATURES             Location/Qualifiers
ORIGIN
        1 atggagaact
//
====================================

I wouldn't be a bit surprised if it is a WinXP-specific issue, so I'll give it a try this weekend on Mac OS X using the latest CVS to see what happens.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar > Lapp > Sent: Friday, March 31, 2006 1:51 PM > To: Chris Fields > Cc: Scott Markel; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers > > The only problem with Heikki's version of the line is that if the > value is undefined you get a (ugly) warning from perl stating that you > printed an undefined value. Since normally tags should always have a > value (even if an empty string) this is a rather theoretical issue. > > -hilmar > > On 3/31/06, Chris Fields wrote: > > Sorry about that; stupid Outlook sent my mail before I had a chance to > > finish it up. > > > > The Bio::Annotation::Simple fix sounds best, but the problem is that CVS > > shows a fix on this line by Heikki after 1.5.1 was released: > > > > fix to allow 0 values despite operator overload (Paul Mooney) > > > > which changed the overload to: > > > > use overload '""' => sub { $_[0]->value}; > > > > I'll try out your fix here to see if it breaks anything (can't see why > it > > would), but I may need to dig through the archives a little to see why > this > > latest change was made. If everything works and passes tests I'll roll > back > > the commit I made to Bio::SeqIO::genbank earlier today. > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. 
of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > > Sent: Friday, March 31, 2006 12:23 PM > > > To: Scott Markel > > > Cc: bioperl-l at lists.open-bio.org; Chris Fields > > > Subject: Re: [Bioperl-l] possible bug printing GenBank feature > qualfiers > > > > > > Scott, > > > > > > your fix assumes that $value in reality is not a scalar but a hash ref > > > and that it has a key "value". > > > > > > Apparently in your test environment this is all indeed true, but there > > > is no guarantee that this will still be true tomorrow when you next > > > update from CVS (or install a new version). > > > > > > It seems to me that making feature tag values Bio::AnnotationI objects > > > and the stringification overload is what is interfering here. More > > > specifically, the broken overload in Bio::Annotation::SimpleValue > > > > > > use overload '""' => sub { $_[0]->value || ''}; > > > > > > will lead exactly to the behavior you see (b/c $_[0]->value evaluates > > > to false if the value is '0'). > > > > > > You say you build and populate the feature dynamically - are you using > > > Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is > slated > > > to get this behavior reverted, i.e., will return to using scalars for > > > tag values. (Or so I recall ...) > > > > > > To fix the problem for you now, I suggest you either fix the overload > > > statement above to be > > > > > > use overload '""' => sub { defined($_[0]->value) ? $_[0]->value > : '' > > > }; > > > > > > I suppose this should in fact be committed to the repository - does > > > anybody see any damage from this change? 
> > > > > > Or, if you do want to mess with the GenBank format writer, protect the > > > conversion to string and use the object access method: > > > > > > if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) > { > > > # convert SimpleValue object to represented (string) > value > > > $value = $value->value; > > > } > > > > > > Hth, > > > > > > -hilmar > > > > > > On Mar 31, 2006, at 9:31 AM, Scott Markel wrote: > > > > > > > Chris, > > > > > > > > Looks like I made my test case too simple. In our application, > > > > which calls BioPerl, I'm creating the feature with the zero- > > > > valued qualifier. It's not being read in from a file, so > > > > my only issue is with writing GenBank files. The real feature > > > > is one for a primer binding site. The qualifier contains the > > > > number of mismatches. The one line change of > > > > > > > > $value = $value->{"value"} > > > > > > > > definitely fixes our problem and causes no regression > > > > failures in our application. > > > > > > > > Scott > > > > > > > > Chris Fields wrote: > > > > > > > >> I tried this on WinXP (I'm using bioperl-live) and got a warning: > > > >> > > > >> -------------------- WARNING --------------------- > > > >> MSG: Unexpected error in feature table for Skipping feature, > > > >> attempting to > > > >> recover > > > >> --------------------------------------------------- > > > >> > > > >> Running using debugging shows that no feature key was found in > > > >> _read_FTHelper_GenBank. So I'm getting an error, but on input not > > > >> output. > > > >> In fact, turning on -verbose in the SeqIO input object gives the > > > >> below extra > > > >> output, whereas turning -verbose on only in the output object just > > > >> gives the > > > >> warning above. > > > >> > > > >> ==================================== > > > >> C:\Perl\Scripts\gb_test>test.pl > > > >> no feature key! 
> > > >> > > > >> -------------------- WARNING --------------------- > > > >> MSG: Unexpected error in feature table for Skipping feature, > > > >> attempting to > > > >> recover > > > >> STACK Bio::SeqIO::genbank::next_seq > > > >> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 > > > >> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 > > > >> sequence length is 10 > > > >> ==================================== > > > >> > > > >> The sequence came back w/o any features in the feature table, which > > > >> is what > > > >> I would expect from this error: > > > >> ==================================== > > > >> LOCUS MY_LOCUS 10 aa linear > linear > > > >> DEFINITION my description. > > > >> ACCESSION 12345 > > > >> KEYWORDS . > > > >> FEATURES Location/Qualifiers > > > >> ORIGIN > > > >> 1 atggagaact > > > >> // > > > >> ==================================== > > > >> > > > >> Adding the extra line before the s/// didn't help any (warning > still > > > >> pops > > > >> up, no change in output). Anybody out there with any ideas? > > > >> > > > >> Christopher Fields > > > >> Postdoctoral Researcher - Switzer Lab > > > >> Dept. of Biochemistry > > > >> University of Illinois Urbana-Champaign > > > >> > > > >> > > > >> From smarkel at scitegic.com Fri Mar 31 15:48:24 2006 From: smarkel at scitegic.com (Scott Markel) Date: Fri, 31 Mar 2006 12:48:24 -0800 Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers In-Reply-To: <000101c65501$81db02a0$15327e82@pyrimidine> References: <000101c65501$81db02a0$15327e82@pyrimidine> Message-ID: <442D9598.30404@scitegic.com> Chris, I get clean runs with both a Windows build of Perl This is perl, v5.8.7 built for MSWin32-x86-multi-thread and a cygwin build This is perl, v5.8.6 built for cygwin-thread-multi-64int My test input file should be okay. I started with a valid file from NCBI and trimmed. misc_feature is allowed to have qualifiers. 
To be DDBJ/EMBL/GenBank Feature Table compliant, we could change "foo" to "note", but I don't think BioPerl is doing any checks for valid qualifier names. Scott Chris Fields wrote: > Well, I'm running off bioperl-live now using WinXP and latest ActivePerl > (5.8.8.817) and, although all tests pass (SeqIO and genbank), I keep getting > the same error and erroneous output using Scott's (albeit very simple) > sequence example. The problem seems to pop up on the input end, not output > (the 'no feature key' in the below output only shows up when -verbose is > turned on with the input SeqIO object). > > Questions: > > 1) Is the below example Scott gave valid GenBank format? I don't know, but > it looks okay. > 2) If so, should it work? Yes, no question. > 3) And if it is supposed to, why isn't it working here? Don't know, but any > of the mentioned fixes don't do anything (get rid of the error) for me. > Scott gets it to work but my guess is that it is b/c he's using a Linux/UNIX > flavor. Can't wait 'til I get my MacTel (4 more months....) > > I'm personally not too worried about it at the moment as anything I passed > through SeqIO has worked w/o a problem, even on WinXP. It's just a bit > frustrating to see something fail here that seems to work elsewhere. > > So here's what I did: > > input.gbff > ==================================== > LOCUS MY_LOCUS 10 aa linear UNK > DEFINITION my description. 
> ACCESSION 12345 > FEATURES Location/Qualifiers > misc_feature 1..10 > /foo="0" > ORIGIN > 1 atggagaact > // > ==================================== > > Run through this: > ==================================== > use Bio::SeqIO; > > my $inputFilename = "input.gbff"; > my $outputFilename = "output.gbff"; > > my $in = Bio::SeqIO->new(-verbose => 1, > -file => $inputFilename, > -format => "genbank"); > > my $out = Bio::SeqIO->new(-verbose => 0, > -file => ">$outputFilename", > -format => "genbank"); > > my $sequence = $in->next_seq(); > $out->write_seq($sequence); > ==================================== > > Gets this error: > > ==================================== > C:\Perl\Scripts\gb_test>test.pl > no feature key! > > -------------------- WARNING --------------------- > MSG: Unexpected error in feature table for Skipping feature, attempting to > recover > STACK Bio::SeqIO::genbank::next_seq > C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 > STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 > sequence length is 10 > ==================================== > > And this output, which is missing the feature, somewhat expected judging > from the error (output.gbff) > > ==================================== > LOCUS MY_LOCUS 10 aa linear linear > DEFINITION my description. > ACCESSION 12345 > KEYWORDS . > FEATURES Location/Qualifiers > ORIGIN > 1 atggagaact > // > ==================================== > > I wouldn't be a bit surprised if it is a WinXP-specific issue, so I'll give > it a try this weekend on Mac OS X using the latest CVS to see what happens. > > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > >>-----Original Message----- >>From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar >>Lapp >>Sent: Friday, March 31, 2006 1:51 PM >>To: Chris Fields >>Cc: Scott Markel; bioperl-l at lists.open-bio.org >>Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers >> >>The only problem with Heikki's version of the line is that if the >>value is undefined you get a (ugly) warning from perl stating that you >>printed an undefined value. Since normally tags should always have a >>value (even if an empty string) this is a rather theoretical issue. >> >> -hilmar >> >>On 3/31/06, Chris Fields wrote: >> >>>Sorry about that; stupid Outlook sent my mail before I had a chance to >>>finish it up. >>> >>>The Bio::Annotation::Simple fix sounds best, but the problem is that CVS >>>shows a fix on this line by Heikki after 1.5.1 was released: >>> >>> fix to allow 0 values despite operator overload (Paul Mooney) >>> >>>which changed the overload to: >>> >>> use overload '""' => sub { $_[0]->value}; >>> >>>I'll try out your fix here to see if it breaks anything (can't see why >> >>it >> >>>would), but I may need to dig through the archives a little to see why >> >>this >> >>>latest change was made. If everything works and passes tests I'll roll >> >>back >> >>>the commit I made to Bio::SeqIO::genbank earlier today. >>> >>>Christopher Fields >>>Postdoctoral Researcher - Switzer Lab >>>Dept. 
of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>>>-----Original Message----- >>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >>>>Sent: Friday, March 31, 2006 12:23 PM >>>>To: Scott Markel >>>>Cc: bioperl-l at lists.open-bio.org; Chris Fields >>>>Subject: Re: [Bioperl-l] possible bug printing GenBank feature >> >>qualfiers >> >>>>Scott, >>>> >>>>your fix assumes that $value in reality is not a scalar but a hash ref >>>>and that it has a key "value". >>>> >>>>Apparently in your test environment this is all indeed true, but there >>>>is no guarantee that this will still be true tomorrow when you next >>>>update from CVS (or install a new version). >>>> >>>>It seems to me that making feature tag values Bio::AnnotationI objects >>>>and the stringification overload is what is interfering here. More >>>>specifically, the broken overload in Bio::Annotation::SimpleValue >>>> >>>> use overload '""' => sub { $_[0]->value || ''}; >>>> >>>>will lead exactly to the behavior you see (b/c $_[0]->value evaluates >>>>to false if the value is '0'). >>>> >>>>You say you build and populate the feature dynamically - are you using >>>>Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is >> >>slated >> >>>>to get this behavior reverted, i.e., will return to using scalars for >>>>tag values. (Or so I recall ...) >>>> >>>>To fix the problem for you now, I suggest you either fix the overload >>>>statement above to be >>>> >>>> use overload '""' => sub { defined($_[0]->value) ? $_[0]->value >> >>: '' >> >>>>}; >>>> >>>>I suppose this should in fact be committed to the repository - does >>>>anybody see any damage from this change? 
>>>> >>>>Or, if you do want to mess with the GenBank format writer, protect the >>>>conversion to string and use the object access method: >>>> >>>> if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) >> >>{ >> >>>> # convert SimpleValue object to represented (string) >> >>value >> >>>> $value = $value->value; >>>> } >>>> >>>>Hth, >>>> >>>> -hilmar >>>> >>>>On Mar 31, 2006, at 9:31 AM, Scott Markel wrote: >>>> >>>> >>>>>Chris, >>>>> >>>>>Looks like I made my test case too simple. In our application, >>>>>which calls BioPerl, I'm creating the feature with the zero- >>>>>valued qualifier. It's not being read in from a file, so >>>>>my only issue is with writing GenBank files. The real feature >>>>>is one for a primer binding site. The qualifier contains the >>>>>number of mismatches. The one line change of >>>>> >>>>> $value = $value->{"value"} >>>>> >>>>>definitely fixes our problem and causes no regression >>>>>failures in our application. >>>>> >>>>>Scott >>>>> >>>>>Chris Fields wrote: >>>>> >>>>> >>>>>>I tried this on WinXP (I'm using bioperl-live) and got a warning: >>>>>> >>>>>>-------------------- WARNING --------------------- >>>>>>MSG: Unexpected error in feature table for Skipping feature, >>>>>>attempting to >>>>>>recover >>>>>>--------------------------------------------------- >>>>>> >>>>>>Running using debugging shows that no feature key was found in >>>>>>_read_FTHelper_GenBank. So I'm getting an error, but on input not >>>>>>output. >>>>>>In fact, turning on -verbose in the SeqIO input object gives the >>>>>>below extra >>>>>>output, whereas turning -verbose on only in the output object just >>>>>>gives the >>>>>>warning above. >>>>>> >>>>>>==================================== >>>>>>C:\Perl\Scripts\gb_test>test.pl >>>>>>no feature key! 
>>>>>> >>>>>>-------------------- WARNING --------------------- >>>>>>MSG: Unexpected error in feature table for Skipping feature, >>>>>>attempting to >>>>>>recover >>>>>>STACK Bio::SeqIO::genbank::next_seq >>>>>>C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 >>>>>>STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 >>>>>>sequence length is 10 >>>>>>==================================== >>>>>> >>>>>>The sequence came back w/o any features in the feature table, which >>>>>>is what >>>>>>I would expect from this error: >>>>>>==================================== >>>>>>LOCUS MY_LOCUS 10 aa linear >> >>linear >> >>>>>>DEFINITION my description. >>>>>>ACCESSION 12345 >>>>>>KEYWORDS . >>>>>>FEATURES Location/Qualifiers >>>>>>ORIGIN >>>>>> 1 atggagaact >>>>>>// >>>>>>==================================== >>>>>> >>>>>>Adding the extra line before the s/// didn't help any (warning >> >>still >> >>>>>>pops >>>>>>up, no change in output). Anybody out there with any ideas? >>>>>> >>>>>>Christopher Fields >>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>Dept. of Biochemistry >>>>>>University of Illinois Urbana-Champaign >>>>>> >>>>>> >>>>>> -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From osborne1 at optonline.net Fri Mar 31 16:33:36 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 31 Mar 2006 16:33:36 -0500 Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers In-Reply-To: <000101c65501$81db02a0$15327e82@pyrimidine> Message-ID: Chris, Not OS-specific, I also see "no feature key!" on Mac OS. Brian O. On 3/31/06 3:27 PM, "Chris Fields" wrote: > I wouldn't be a bit surprised if it is a WinXP-specific issue, so I'll give > it a try this weekend on Mac OS X using the latest CVS to see what happens. 
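
Editor's sketch of the stringification issue discussed in this thread. It assumes two hypothetical stand-in classes, `BrokenValue` and `FixedValue`, which are not BioPerl code and only mimic the two overload variants quoted for Bio::Annotation::SimpleValue. The helper `format_qualifier` is likewise an illustration of the writer's quote-doubling step, not the body of `_print_GenBank_FTHelper`.

```perl
use strict;
use warnings;

# Hypothetical stand-in using the overload that treats '0' as false.
package BrokenValue;
use overload '""' => sub { $_[0]->value || '' }, fallback => 1;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]->{value} }

# Hypothetical stand-in using the defined() test suggested in the thread.
package FixedValue;
use overload '""' => sub { defined $_[0]->value ? $_[0]->value : '' },
    fallback => 1;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]->{value} }

package main;

# Illustrative only: stringify the tag value, then double embedded quotes,
# as the s/// in the GenBank writer does.
sub format_qualifier {
    my ($tag, $value) = @_;
    $value = "$value";        # triggers the overloaded '""' operator
    $value =~ s/\"/\"\"/g;
    return qq{/$tag="$value"};
}

print format_qualifier('foo', BrokenValue->new(0)),    "\n";  # /foo=""
print format_qualifier('foo', FixedValue->new(0)),     "\n";  # /foo="0"
print format_qualifier('foo', FixedValue->new(undef)), "\n";  # /foo=""
```

With the `|| ''` form a value of 0 is lost because Perl treats both `0` and `'0'` as false. The `defined` test keeps the zero and still returns an empty string, without an "uninitialized" warning, when no value is set.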
From hlapp at gmx.net Fri Mar 31 14:51:20 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 31 Mar 2006 11:51:20 -0800
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To: <000901c654fa$4bb443a0$15327e82@pyrimidine>
References: <565c0146aaaa3890042a92efe786bb2f@gmx.net> <000901c654fa$4bb443a0$15327e82@pyrimidine>
Message-ID:

The only problem with Heikki's version of the line is that if the
value is undefined you get an (ugly) warning from perl stating that you
printed an undefined value. Since normally tags should always have a
value (even if an empty string) this is a rather theoretical issue.

 -hilmar

On 3/31/06, Chris Fields wrote:
> Sorry about that; stupid Outlook sent my mail before I had a chance to
> finish it up.
>
> The Bio::Annotation::Simple fix sounds best, but the problem is that CVS
> shows a fix on this line by Heikki after 1.5.1 was released:
>
>     fix to allow 0 values despite operator overload (Paul Mooney)
>
> which changed the overload to:
>
>     use overload '""' => sub { $_[0]->value};
>
> I'll try out your fix here to see if it breaks anything (can't see why it
> would), but I may need to dig through the archives a little to see why this
> latest change was made. If everything works and passes tests I'll roll back
> the commit I made to Bio::SeqIO::genbank earlier today.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> > Sent: Friday, March 31, 2006 12:23 PM
> > To: Scott Markel
> > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers
> >
> > Scott,
> >
> > your fix assumes that $value in reality is not a scalar but a hash ref
> > and that it has a key "value".
> > > > Apparently in your test environment this is all indeed true, but there > > is no guarantee that this will still be true tomorrow when you next > > update from CVS (or install a new version). > > > > It seems to me that making feature tag values Bio::AnnotationI objects > > and the stringification overload is what is interfering here. More > > specifically, the broken overload in Bio::Annotation::SimpleValue > > > > use overload '""' => sub { $_[0]->value || ''}; > > > > will lead exactly to the behavior you see (b/c $_[0]->value evaluates > > to false if the value is '0'). > > > > You say you build and populate the feature dynamically - are you using > > Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is slated > > to get this behavior reverted, i.e., will return to using scalars for > > tag values. (Or so I recall ...) > > > > To fix the problem for you now, I suggest you either fix the overload > > statement above to be > > > > use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' > > }; > > > > I suppose this should in fact be committed to the repository - does > > anybody see any damage from this change? > > > > Or, if you do want to mess with the GenBank format writer, protect the > > conversion to string and use the object access method: > > > > if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) { > > # convert SimpleValue object to represented (string) value > > $value = $value->value; > > } > > > > Hth, > > > > -hilmar > > > > On Mar 31, 2006, at 9:31 AM, Scott Markel wrote: > > > > > Chris, > > > > > > Looks like I made my test case too simple. In our application, > > > which calls BioPerl, I'm creating the feature with the zero- > > > valued qualifier. It's not being read in from a file, so > > > my only issue is with writing GenBank files. The real feature > > > is one for a primer binding site. The qualifier contains the > > > number of mismatches. 
The one line change of > > > > > > $value = $value->{"value"} > > > > > > definitely fixes our problem and causes no regression > > > failures in our application. > > > > > > Scott > > > > > > Chris Fields wrote: > > > > > >> I tried this on WinXP (I'm using bioperl-live) and got a warning: > > >> > > >> -------------------- WARNING --------------------- > > >> MSG: Unexpected error in feature table for Skipping feature, > > >> attempting to > > >> recover > > >> --------------------------------------------------- > > >> > > >> Running using debugging shows that no feature key was found in > > >> _read_FTHelper_GenBank. So I'm getting an error, but on input not > > >> output. > > >> In fact, turning on -verbose in the SeqIO input object gives the > > >> below extra > > >> output, whereas turning -verbose on only in the output object just > > >> gives the > > >> warning above. > > >> > > >> ==================================== > > >> C:\Perl\Scripts\gb_test>test.pl > > >> no feature key! > > >> > > >> -------------------- WARNING --------------------- > > >> MSG: Unexpected error in feature table for Skipping feature, > > >> attempting to > > >> recover > > >> STACK Bio::SeqIO::genbank::next_seq > > >> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583 > > >> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18 > > >> sequence length is 10 > > >> ==================================== > > >> > > >> The sequence came back w/o any features in the feature table, which > > >> is what > > >> I would expect from this error: > > >> ==================================== > > >> LOCUS MY_LOCUS 10 aa linear linear > > >> DEFINITION my description. > > >> ACCESSION 12345 > > >> KEYWORDS . > > >> FEATURES Location/Qualifiers > > >> ORIGIN > > >> 1 atggagaact > > >> // > > >> ==================================== > > >> > > >> Adding the extra line before the s/// didn't help any (warning still > > >> pops > > >> up, no change in output). Anybody out there with any ideas? 
> > >> > > >> Christopher Fields > > >> Postdoctoral Researcher - Switzer Lab > > >> Dept. of Biochemistry > > >> University of Illinois Urbana-Champaign > > >> > > >> > > >> > > >>> -----Original Message----- > > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>> bounces at lists.open-bio.org] On Behalf Of Scott Markel > > >>> Sent: Thursday, March 30, 2006 7:18 PM > > >>> To: bioperl-l at lists.open-bio.org > > >>> Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers > > >>> > > >>> In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the > > >>> following. > > >>> > > >>> Annotation tags used by Bio::SeqIO::FTHelper were strings and > > >>> are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper > > >>> subroutine of Bio::SeqIO::genbank the following code still > > >>> assumes that tags are strings. > > >>> > > >>> foreach my $tag ( keys %{$fth->field} ) { > > >>> foreach my $value ( @{$fth->field->{$tag}} ) { > > >>> $value =~ s/\"/\"\"/g; > > >>> > > >>> If the tag value was a zero, an empty string is written. > > >>> > > >>> We think that > > >>> > > >>> $value = $value->{"value"}; > > >>> > > >>> should be added before the s/// call. > > >>> > > >>> Here's our test case. Note that the qualifier value for "foo" > > >>> is changed to an empty string. > > >>> > > >>> Input file > > >>> > > >>> ==================================== > > >>> LOCUS MY_LOCUS 10 aa linear UNK > > >>> DEFINITION my description. 
> > >>> ACCESSION 12345 > > >>> FEATURES Location/Qualifiers > > >>> misc_feature 1..10 > > >>> /foo="0" > > >>> ORIGIN > > >>> 1 atggagaact > > >>> // > > >>> ==================================== > > >>> > > >>> Perl code > > >>> ==================================== > > >>> use strict; > > >>> use warnings; > > >>> > > >>> use Bio::SeqIO; > > >>> > > >>> my $inputFilename = "input.gbff"; > > >>> my $outputFilename = "output.gbff"; > > >>> > > >>> my $in = Bio::SeqIO->new(-file => $inputFilename, > > >>> -format => "genbank"); > > >>> my $out = Bio::SeqIO->new(-file => ">$outputFilename", > > >>> -format => "genbank"); > > >>> > > >>> my $sequence = $in->next_seq(); > > >>> $out->write_seq($sequence); > > >>> ==================================== > > >>> > > >>> Output file > > >>> ==================================== > > >>> LOCUS MY_LOCUS 10 aa linear > > >>> linear > > >>> DEFINITION my description. > > >>> ACCESSION 12345 > > >>> KEYWORDS . > > >>> FEATURES Location/Qualifiers > > >>> misc_feature 1..10 > > >>> /foo="" > > >>> ORIGIN > > >>> 1 atggagaact > > >>> // > > >>> ==================================== > > >>> > > >>> I'll add this to bugzilla, but first I want to make sure > > >>> I'm not missing something obvious. > > >>> > > >>> Scott > > > > > > -- > > > Scott Markel, Ph.D. > > > Principal Bioinformatics Architect email: smarkel at scitegic.com > > > SciTegic Inc. mobile: +1 858 205 3653 > > > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253
> > > San Diego, CA 92123               fax: +1 858 279 8804
> > > USA                               web: http://www.scitegic.com
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> > --
> > ----------------------------------------------------------
> > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > ----------------------------------------------------------
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From hlapp at gmx.net Fri Mar 31 18:02:03 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 31 Mar 2006 15:02:03 -0800
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To:
References: <000101c65501$81db02a0$15327e82@pyrimidine>
Message-ID:

Note that GenBank requires a 'source' feature. The GenBank parser uses
it to get the NCBI taxon ID (as that is the feature where it will be
given as a db_xref tag). I thought the parser wouldn't mandate the
feature but maybe at some point it assumes that it's there. Need to
check, just a speculation.

On 3/31/06, Brian Osborne wrote:
> Chris,
>
> Not OS-specific, I also see "no feature key!" on Mac OS.
>
> Brian O.
>
>
> On 3/31/06 3:27 PM, "Chris Fields" wrote:
>
> > I wouldn't be a bit surprised if it is a WinXP-specific issue, so I'll give
> > it a try this weekend on Mac OS X using the latest CVS to see what happens.
>
>
>

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From torsten.seemann at infotech.monash.edu.au Fri Mar 31 19:15:25 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sat, 01 Apr 2006 11:15:25 +1100
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To:
References: <565c0146aaaa3890042a92efe786bb2f@gmx.net> <000901c654fa$4bb443a0$15327e82@pyrimidine>
Message-ID: <442DC61D.8050408@infotech.monash.edu.au>

Hilmar Lapp wrote:
> The only problem with Heikki's version of the line is that if the
> value is undefined you get an (ugly) warning from perl stating that you
> printed an undefined value. Since normally tags should always have a
> value (even if an empty string) this is a rather theoretical issue.

I thought it would be worth mentioning here that tags like /pseudo,
which don't have a value, quoted or otherwise, can be set using the
magic string value "_no_value" when using the older tag system, e.g.

    $feature->add_tag_value('pseudo', '_no_value');

Perhaps the 'undef' value could be used to replace '_no_value' (cf. NULL
in SQL).

I'm not sure how this works with the new Bio::Annotation* approach,
which uses typed objects rather than strings. And I'm also not sure if
the "_no_value" business works with the GFF and other modules.

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/

From cjfields at uiuc.edu Fri Mar 31 21:53:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 31 Mar 2006 20:53:26 -0600
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To:
References:
Message-ID:

Yeah, I get the exact same error output on my wife's iBook (I just
updated from CVS, which has Scott's suggested fix). So it's not
OS-specific.
Chris On Mar 31, 2006, at 3:33 PM, Brian Osborne wrote: > Chris, > > Not OS-specific, I also see "no feature key!" on Mac OS. > > Brian O. > > > On 3/31/06 3:27 PM, "Chris Fields" wrote: > >> I wouldn't be a bit surprised if it is a WinXP-specific issue, so >> I'll give >> it a try this weekend on Mac OS X using the latest CVS to see what >> happens. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Mar 31 22:16:22 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 31 Mar 2006 21:16:22 -0600 Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers In-Reply-To: References: <000101c65501$81db02a0$15327e82@pyrimidine> Message-ID: <83DA4181-11E2-4CB9-BED4-A9C2E58812B6@uiuc.edu> On Mar 31, 2006, at 5:02 PM, Hilmar Lapp wrote: > Note that GenBank requires a 'source' feature. The GenBank parser uses > it to get the NBCI taxon ID (as that is the feature where it will be > given as a db_xref tag). I thought the parser wouldn't mandate the > feature but maybe at some point it assumes that it's there. Need to > check, just a speculation. Right, but if this were the case shouldn't converting from a simple format (one lacking most annotation and feature information, including a source, like fasta) to genbank raise all sorts of warnings? I get this conversion to work without errors or warnings, though there doesn't seem to be any attempt by SeqIO to check the fasta header line for accessions, etc. (a separate issue which I think Jason mentioned a fix for; I guess it hasn't been implemented yet). No source line is in the output sequence either. Chris > On 3/31/06, Brian Osborne wrote: >> Chris, >> >> Not OS-specific, I also see "no feature key!" on Mac OS. >> >> Brian O. 
>>
>>
>> On 3/31/06 3:27 PM, "Chris Fields" wrote:
>>
>>> I wouldn't be a bit surprised if it is a WinXP-specific issue, so
>>> I'll give
>>> it a try this weekend on Mac OS X using the latest CVS to see
>>> what happens.
>>
>>
>>
>
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Fri Mar 31 23:26:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 31 Mar 2006 22:26:53 -0600
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To:
References: <000101c65501$81db02a0$15327e82@pyrimidine>
Message-ID: <62CBAEDB-A87B-4D0B-9266-D295CF5401D4@uiuc.edu>

Hilmar,

I figured it out. The reason the test case wasn't parsing correctly
was the spacing in the feature table. This works:

=========================
LOCUS       MY_LOCUS 10 aa linear UNK
DEFINITION  my description.
ACCESSION   12345
FEATURES             Location/Qualifiers
     misc_feature    1..10
                     /foo="0"
ORIGIN
        1 atggagaact
//
=========================

Removing or adding a space before 'misc_feature', so that the line
does not start with exactly 5 spaces, causes the warning and the
feature is misparsed. The spacing after the feature key is not as
important. Should it be that inflexible?

As for the fix, Scott's addition didn't change anything unless
Heikki's last CVS fix to Bio::Annotation::SimpleValue, which changes
the overloaded operator, is also reverted. That makes sense when
reading Heikki's commit message:

"fix to allow 0 values despite operator overload (Paul Mooney)"

Since the problem seems to be solved and both fixes (Scott's and
Heikki's) are redundant and essentially get the same results, one of
them should be rolled back.
I'll go ahead and roll the Bio::SeqIO::genbank commit back sometime in
the next few days, unless you think there is a better way to go about
this.

On Mar 31, 2006, at 5:02 PM, Hilmar Lapp wrote:

> Note that GenBank requires a 'source' feature. The GenBank parser uses
> it to get the NCBI taxon ID (as that is the feature where it will be
> given as a db_xref tag). I thought the parser wouldn't mandate the
> feature but maybe at some point it assumes that it's there. Need to
> check, just a speculation.
>
> On 3/31/06, Brian Osborne wrote:
>> Chris,
>>
>> Not OS-specific, I also see "no feature key!" on Mac OS.
>>
>> Brian O.
>>
>>
>> On 3/31/06 3:27 PM, "Chris Fields" wrote:
>>
>>> I wouldn't be a bit surprised if it is a WinXP-specific issue, so
>>> I'll give
>>> it a try this weekend on Mac OS X using the latest CVS to see
>>> what happens.
>>
>>
>>
>
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
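
Editor's sketch of the spacing rule reported above. In the GenBank flat-file layout a feature key starts in column 6 (after exactly five spaces) and locations and qualifiers start in column 22. The helper `classify_ft_line` is hypothetical; it is not the regular expression used inside Bio::SeqIO::genbank and only illustrates why one extra or missing leading space stops a line from being recognised as a feature key.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT BioPerl's parser: classify one feature-table
# line purely by its leading whitespace.
sub classify_ft_line {
    my ($line) = @_;
    return 'key'       if $line =~ /^ {5}\S+\s+\S/;  # key starts in column 6
    return 'qualifier' if $line =~ /^ {21}\//;       # qualifier in column 22
    return 'unknown';
}

# Sample lines: correct key, one space too few, one too many, a qualifier.
my @lines = (
    '     misc_feature    1..10',
    '    misc_feature    1..10',
    '      misc_feature    1..10',
    '                     /foo="0"',
);

printf "%-9s |%s|\n", classify_ft_line($_), $_ for @lines;
```

Under these assumptions the first line is classified as a key, the second and third as unknown, and the fourth as a qualifier. A more forgiving reader could accept a range of leading spaces, at the cost of no longer matching the fixed-column layout.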