From cjfields at uiuc.edu Sat Apr 1 01:54:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 1 Apr 2006 00:54:45 -0600
Subject: [Bioperl-l] Issue with Bio::SearchIO::psl (was: Bioperl bug 1977)
In-Reply-To: <6E8F6494-F5C9-4763-93D3-7A9B3F238821@uiuc.edu>
References: <001901c65446$bfb0b270$15327e82@pyrimidine> <9AFA96E5-79BE-4BE7-8B33-F6C28012B929@cshl.edu> <6E8F6494-F5C9-4763-93D3-7A9B3F238821@uiuc.edu>
Message-ID: <778E9E29-FA55-4288-8C3B-E386AC6A508A@uiuc.edu>

Albert,

I had no problems with this on Mac OS X (I'm running 10.4.5, perl 5.8.6). Several fixes were made in the last 6 months that could potentially fix this issue, including two that involve newlines (which is what seems to be hanging things up here). Have you tried a full bioperl-live install to see if that fixes it?

computer:~/searchio_test cjfields$ perl psl.pl
/usr/local/blat/db/hg17/hg17.2bit:chr5
 100.00
/usr/local/blat/db/hg17/hg17.2bit:chr21
 90.00
/usr/local/blat/db/hg17/hg17.2bit:chr5
 85.00
/usr/local/blat/db/hg17/hg17.2bit:chr13
 80.00
/usr/local/blat/db/hg17/hg17.2bit:chr7
 80.00

Chris

On Mar 31, 2006, at 7:40 AM, Chris Fields wrote:

> I'll try it with Mac OS X this weekend to confirm; I'm running v.
> 10.4.5 with perl 5.8.6.
>
> I noticed that there are no tests for psl in SearchIO.t, which should
> have caught this error. I'll double-check that in case I'm
> mistaken. If not, I'll add a few to see what happens... maybe we'll
> get some responses back?
>
> I'll also forward this to the mail list to see if anybody else has
> had this issue.
>
> Chris
>
> On Mar 31, 2006, at 3:53 AM, Albert Vernon Smith wrote:
>
>> Running your same code, on the same file, I get:
>>
>> Output:
>> -------
>> /usr/local/blat/db/hg17/hg17.2bit:chr5
>>  100.00
>> /usr/local/blat/db/hg17/hg17.2bit:chr21
>>  90.00
>> /usr/local/blat/db/hg17/hg17.2bit:chr5
>>  85.00
>> /usr/local/blat/db/hg17/hg17.2bit:chr13
>>  80.00
>> /usr/local/blat/db/hg17/hg17.2bit:chr7
>>  80.00
>> Use of uninitialized value in pattern match (m//) at /Users/albert/Documents/CSHL/cvswork/bioperl-live/Bio/SearchIO/psl.pm line 173, line 10.
>> --------
>>
>> This is current CVS, and I see the problem on Mac OS X as well as
>> on Linux.
>>
>> As it stands, the code for Bio::SearchIO::psl *should* be fine (as I
>> run it in my head :-), and the error message is kinda weird. The
>> last line is #10, so there should be a value for the line, unless
>> things are trying to cycle back over it again for some reason.
>>
>> -albert
>>
>>
>> On 30.3.2006, at 22:10, Chris Fields wrote:
>>
>>> I'm running off bioperl-live from CVS (updated yesterday) and I get
>>> everything to work on this end (no errors) using your file, although
>>> I'm just printing names and HSP scores out, like this:
>>>
>>> --------------------------------------
>>>
>>> use Bio::SearchIO;
>>>
>>> my $parser = Bio::SearchIO->new(-verbose => $v,
>>>                                 -file    => 'psl.out',
>>>                                 -format  => 'psl');
>>>
>>> while (my $result = $parser->next_result) {
>>>     while (my $hit = $result->next_hit) {
>>>         print $hit->name,"\n";
>>>         while (my $hsp = $hit->next_hsp) {
>>>             print " ",$hsp->score,"\n";
>>>         }
>>>     }
>>> }
>>>
>>> --------------------------------------
>>> Output:
>>> --------------------------------------
>>> /usr/local/blat/db/hg17/hg17.2bit:chr5
>>>  100.00
>>> /usr/local/blat/db/hg17/hg17.2bit:chr21
>>>  90.00
>>> /usr/local/blat/db/hg17/hg17.2bit:chr5
>>>  85.00
>>> /usr/local/blat/db/hg17/hg17.2bit:chr13
>>>  80.00
>>> /usr/local/blat/db/hg17/hg17.2bit:chr7
>>>  80.00
>>> --------------------------------------
>>>
>>> Is this a recent update of Bioperl? There were several updates in CVS
>>> to Bio::SearchIO::psl for various bugfixes over the last year, including
>>> one that postdates the 1.5.1 release. I would recommend trying the CVS
>>> version (copy it over your old version if possible, or just install
>>> bioperl-live from CVS). If this doesn't work, could you send your
>>> script? It may be a specific method that's acting up.
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>> -----Original Message-----
>>>> From: Albert Vernon Smith [mailto:smithav at cshl.edu]
>>>> Sent: Thursday, March 30, 2006 1:49 PM
>>>> To: Chris Fields
>>>> Subject: Re: Bioperl bug 1977
>>>>
>>>> [Message never went out before. Was stuck in outbox.]
>>>>
>>>> I've attached an output which causes issues. While parsing this
>>>> output gives me an issue, I'm actually doing something slightly
>>>> different. I have a webBlat server, and am getting output via
>>>> LWP::UserAgent, and I take the psl returned from my query and pass
>>>> that in memory (with IO::String) to the parser. When I do that, I
>>>> get a complaint which references the last line. Still, parsing this
>>>> as a file should be the same thing.
>>>>
>>>> -albert
>>>
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From hlapp at gmx.net Sat Apr 1 04:34:19 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 1 Apr 2006 01:34:19 -0800
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To: <62CBAEDB-A87B-4D0B-9266-D295CF5401D4@uiuc.edu>
References: <000101c65501$81db02a0$15327e82@pyrimidine> <62CBAEDB-A87B-4D0B-9266-D295CF5401D4@uiuc.edu>
Message-ID: <1a4799cf28d062163ae67235dd0956a5@gmx.net>

On Mar 31, 2006, at 8:26 PM, Chris Fields wrote:

> Since the problem seems to be solved and both fixes (Scott's and
> Heikki's) are redundant and essentially get the same results, one of
> them should be rolled back.

Please don't leave the line

  $value = $value->{"value"};

Either remove it (because the problem needs to be 'fixed' either in all writers or at the root), or, if you want to leave this construct in, protect it with a conditional and use method-based access, as in the example I sent before.

-hilmar
--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From cjfields at uiuc.edu Sat Apr 1 10:47:52 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 1 Apr 2006 09:47:52 -0600
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To: <1a4799cf28d062163ae67235dd0956a5@gmx.net>
References: <000101c65501$81db02a0$15327e82@pyrimidine> <62CBAEDB-A87B-4D0B-9266-D295CF5401D4@uiuc.edu> <1a4799cf28d062163ae67235dd0956a5@gmx.net>
Message-ID: <85387BB7-8F38-41C8-A6D9-A5A7E6E08C33@uiuc.edu>

Removed Scott's line from Bio::SeqIO::genbank in CVS (reverted to the previous version); it's much simpler that way and we avoid an extra method call. Sorry about all the confusion here; I was worried that the parsing issue I was seeing with the feature flag had something to do with Scott's bug, when they were completely unrelated.
Chris

On Apr 1, 2006, at 3:34 AM, Hilmar Lapp wrote:

>
> On Mar 31, 2006, at 8:26 PM, Chris Fields wrote:
>
>> Since the problem seems to be solved and both fixes (Scott's and
>> Heikki's) are redundant and essentially get the same results, one of
>> them should be rolled back.
>
> Please don't leave the line
>
> $value = $value->{"value"};
>
> Either remove it (because the problem needs to either be 'fixed' in all
> writers, or at the root), or if you want to leave this construct in,
> protect it with a conditional and use the method-based access as I sent
> an example before.
>
> -hilmar
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jason.stajich at duke.edu Mon Apr 3 11:32:20 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 3 Apr 2006 11:32:20 -0400
Subject: [Bioperl-l] PAML 3.15 parsing
Message-ID: <5E25B9FC-EBCF-4664-847A-A85C59D80DAA@duke.edu>

PAML 3.15 parsing should work now; a small change in the output format between 3.14 and 3.15 made the parser choke. One day we'll get the parser to be more robust; as it stands, it is pretty fragile to any format change between versions.

I have some other outstanding commits dealing with branch-specific parameter parsers that I'll get to later in the month.
-jason
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From smarkel at scitegic.com Mon Apr 3 12:28:09 2006
From: smarkel at scitegic.com (Scott Markel)
Date: Mon, 03 Apr 2006 09:28:09 -0700
Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
In-Reply-To: <85387BB7-8F38-41C8-A6D9-A5A7E6E08C33@uiuc.edu>
References: <000101c65501$81db02a0$15327e82@pyrimidine> <62CBAEDB-A87B-4D0B-9266-D295CF5401D4@uiuc.edu> <1a4799cf28d062163ae67235dd0956a5@gmx.net> <85387BB7-8F38-41C8-A6D9-A5A7E6E08C33@uiuc.edu>
Message-ID: <44314D19.5080804@scitegic.com>

Chris,

Apologies again that my simplistic input file caused confusion.
Thanks for making the CVS changes.

Scott

Chris Fields wrote:
> Removed Scott's line from Bio::SeqIO::genbank in CVS (reverted back to
> the previous version); it's much simpler that way and we avoid an extra
> method call. Sorry about all the confusion here; I was worried that
> the parsing issue I was seeing about the feature flag had something to
> do with Scott's bug when they were completely unrelated.
>
> Chris
>
> On Apr 1, 2006, at 3:34 AM, Hilmar Lapp wrote:
>
>>
>> On Mar 31, 2006, at 8:26 PM, Chris Fields wrote:
>>
>>> Since the problem seems to be solved and both fixes (Scott's and
>>> Heikki's) are redundant and essentially get the same results, one of
>>> them should be rolled back.
>>
>>
>> Please don't leave the line
>>
>> $value = $value->{"value"};
>>
>> Either remove it (because the problem needs to either be 'fixed' in all
>> writers, or at the root), or if you want to leave this construct in,
>> protect it with a conditional and use the method-based access as I sent
>> an example before.
>>
>> -hilmar
>> --
>> ----------------------------------------------------------
>> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
>> ----------------------------------------------------------
>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect   email:  smarkel at scitegic.com
SciTegic Inc.                        mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401     voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                  fax:    +1 858 279 8804
USA                                  web:    http://www.scitegic.com

From cjfields at uiuc.edu Tue Apr 4 10:15:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Apr 2006 09:15:00 -0500
Subject: [Bioperl-l] biosql-schema CVS problems
Message-ID: <000001c657f2$2924d660$15327e82@pyrimidine>

I'm having problems checking out biosql-schema from CVS using my developer account on the new CVS server (dev.open-bio.org). I'm currently using TortoiseCVS, and I am able to check out all other packages without problems (core, run, db, etc.), so I know everything is set up correctly and works.
When I try 'biosql-schema' or 'schema' I get this:

In C:\Perl\src\bioperl: "C:\Program Files\TortoiseCVS\cvs.exe" "-q" "--lf" "checkout" "-P" "schema"
CVSROOT=:ext:cjfields at dev.open-bio.org:/home/repository/bioperl

cvs checkout: failed to create lock directory for `/home/repository/bioperl/biosql-schema' (/home/repository/bioperl/biosql-schema/#cvs.lock): Permission denied
cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/biosql-schema'
cvs [checkout aborted]: read lock failed - giving up
cvs.exe checkout: in directory .:
cvs.exe checkout: cannot open CVS/Entries for reading: No such file or directory

Error, CVS operation failed

This didn't happen with the old CVS server. Any ideas?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From jason.stajich at duke.edu Tue Apr 4 11:01:44 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 4 Apr 2006 11:01:44 -0400
Subject: [Bioperl-l] biosql-schema CVS problems
In-Reply-To: <000001c657f2$2924d660$15327e82@pyrimidine>
References: <000001c657f2$2924d660$15327e82@pyrimidine>
Message-ID: <1EF7D89D-864C-4E67-8997-C9F717FA9456@duke.edu>

It is a symlink to /home/repository/biosql because this is a shared project, but you need to be in the biosql group to check it out with read+write permissions.

An email to the helpdesk will get this straightened out.

We'll have to think more about how we want to handle granting read/write permissions to the schema in the future.

-jason

On Apr 4, 2006, at 10:15 AM, Chris Fields wrote:

> I'm having problems checking out biosql-schema from CVS using my developer
> account on the new CVS server (dev.open-bio.org). I'm currently using
> TortoiseCVS and I am able to checkout all other packages w/o problems (core,
> run, db, etc), so I know everything is set up correctly and works.
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu Tue Apr 4 11:08:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Apr 2006 10:08:32 -0500
Subject: [Bioperl-l] biosql-schema CVS problems
In-Reply-To: <1EF7D89D-864C-4E67-8997-C9F717FA9456@duke.edu>
Message-ID: <000701c657f9$a3245ba0$15327e82@pyrimidine>

No problem (I can just download the tarball; I don't need r/w permissions for the schema). I didn't have an issue when checking out from the old CVS server, but it's been a while, so I may be wrong. Looks like Chris D. figured it out, though. Thanks!

Chris (F.)

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Tuesday, April 04, 2006 10:02 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] biosql-schema CVS problems
>
> It is a symlink to /home/repository/biosql because this is a shared
> project, but you need to be in the biosql group though to checkout
> with read+write permissions.
>
> An email to the helpdesk will get this straightened out.
>
> We'll have to think more about how we want to handle this in the
> future in terms of granting permissions to the schema for r/w.
>
>
> -jason
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12

From cjfields at uiuc.edu Tue Apr 4 18:03:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Apr 2006 17:03:58 -0500
Subject: [Bioperl-l] FW: Issue with Bio::SearchIO::psl (was: Bioperl bug 1977)
Message-ID: <001701c65833$ad255840$15327e82@pyrimidine>

Of course I forgot to forward this...

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

Original message:

I'm going to go back on the mail list, since it's probably not the script, though it still sounds like an input file issue via IO::String. Jason may have some ideas on this as well.

On the surface, it looks like the end_document method isn't called for some reason, so the parser keeps reading, gets undef, and chokes. However, I am unable to replicate that here (everything ends as expected). It doesn't hurt to add your fix as a failsafe, so I'll go ahead and commit it (it passes all tests), but I would like you to try a few things first.

Did you try 'make test' when you installed bioperl from CVS? I had previously thought that the psl tests weren't included, but they are actually in UCSCParsers.t. This error ("Use of uninitialized value in pattern match (m//)") should pop up if you run 'make test'. Or, even simpler, run this from the bioperl-live directory (where you ran 'perl Makefile.PL' and 'make test'):

perl -I. -w t/UCSCParsers.t

The only other thing I can think of is that, in your original test run of the script, your error line indicated that you had a local installation of bioperl, and that version of Bio::SearchIO::psl was the one causing the error:

Use of uninitialized value in pattern match (m//) at /Users/albert/Documents/CSHL/cvswork/bioperl-live/Bio/SearchIO/psl.pm line 173, line 10.

Is it possible that you have two versions of bioperl installed, one in the /Library/Perl folder and the one being used above? If you uninstalled and reinstalled using 'perl Makefile.PL', 'make', 'make install', it will install into your system perl by default (though I believe you have to use 'sudo' for that). If PERL5LIB is set to your local bioperl copy, perl will use that one first. Personally, I don't think that's it, though it's worth a shot.

I tried old versions of SearchIO::psl from CVS and they all seemed to work just fine, so I'm a bit mystified: you get the same error on Linux and Mac OS X, but I don't get it on either WinXP or Mac OS X (both using a CVS copy of bioperl with PERL5LIB).

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: Albert Vernon Smith [mailto:smithav at cshl.edu]
> Sent: Tuesday, April 04, 2006 1:27 PM
> To: Chris Fields
> Subject: Re: [Bioperl-l] Issue with Bio::SearchIO::psl (was: Bioperl bug 1977)
>
> it is exactly your script where i get this (as well as my own
> variations). plus data for the test.
>
> -a

From asnascimento at if.sc.usp.br Tue Apr 4 11:28:26 2006
From: asnascimento at if.sc.usp.br (Alessandro S. Nascimento)
Date: Tue, 04 Apr 2006 12:28:26 -0300
Subject: [Bioperl-l] problems with blast parser
Message-ID: <4432909A.9030107@if.sc.usp.br>

Hi all,

I'm trying to parse a standalone BLAST (blastpgp) result file and filter some sequences by length and identity.
The script used to work, but this time, after running for several minutes at 99.9% CPU, it dies with a "killed" message and no further information. The BLAST file is very large. Does anyone have any clue?

Thanks in advance,
Alessandro

From smithav at cshl.edu Tue Apr 4 11:15:42 2006
From: smithav at cshl.edu (Albert Vernon Smith)
Date: Tue, 4 Apr 2006 15:15:42 +0000
Subject: [Bioperl-l] Issue with Bio::SearchIO::psl (was: Bioperl bug 1977)
In-Reply-To: <778E9E29-FA55-4288-8C3B-E386AC6A508A@uiuc.edu>
References: <001901c65446$bfb0b270$15327e82@pyrimidine> <9AFA96E5-79BE-4BE7-8B33-F6C28012B929@cshl.edu> <6E8F6494-F5C9-4763-93D3-7A9B3F238821@uiuc.edu> <778E9E29-FA55-4288-8C3B-E386AC6A508A@uiuc.edu>
Message-ID: <930B08C3-209D-4915-B86D-82CA2CF0E42B@cshl.edu>

I deleted my entire bioperl installation and reinstalled it to be certain that this is a current issue. Having done that, I am still seeing the issue. I also updated IO::Handle to the most recent version, on the off chance that it was something coming from there.

-albert

On 1.4.2006, at 06:54, Chris Fields wrote:

> Albert,
>
> I had no problems with this on Mac OS X (I'm running from 10.4.5,
> perl 5.8.6). I noticed that several fixes were made in the last 6
> months that would potentially fix this issue, including two which
> involve newlines (that is what seems to be hanging things up
> here). Have you tried a full bioperl-live install to see if that
> fixes it?
>
> computer:~/searchio_test cjfields$ perl psl.pl
> /usr/local/blat/db/hg17/hg17.2bit:chr5
>  100.00
> /usr/local/blat/db/hg17/hg17.2bit:chr21
>  90.00
> /usr/local/blat/db/hg17/hg17.2bit:chr5
>  85.00
> /usr/local/blat/db/hg17/hg17.2bit:chr13
>  80.00
> /usr/local/blat/db/hg17/hg17.2bit:chr7
>  80.00
>
>
> Chris

From cjfields at uiuc.edu Wed Apr 5 15:24:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Apr 2006 14:24:14 -0500
Subject: [Bioperl-l] mailing list summaries
Message-ID: <001201c658e6$86c779e0$15327e82@pyrimidine>

I posted this on the news blog already, but I want to know what people think.

Jason has posted a list of priorities on the wiki, including a regular summary of the mailing list traffic. I have decided to take this up and see how it goes. Basically, I plan on starting a weekly, possibly biweekly, summary of mailing list traffic. These will be in the same vein as the Perl5 and Perl6 summaries and will be posted on the blog here and sent to the bioperl-l mailing list. Barring another natural disaster here, these should start next week (covering mailing list traffic from April 1 on).

The summaries will mainly cover traffic from bioperl-l (the main mailing list), but will also include biosql-l, since it's fairly low traffic, and bug/module updates from bioperl-guts-l.

Let me know if there are any requests/questions/gripes/etc.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From y.itan at ucl.ac.uk Wed Apr 5 18:00:54 2006
From: y.itan at ucl.ac.uk (Yuval Itan)
Date: Thu, 06 Apr 2006 08:00:54 +1000
Subject: [Bioperl-l] Getting sequences by ID
Message-ID: <200604051803.25547.y.itan@ucl.ac.uk>

Hi Torsten,

I would be grateful for some advice from you regarding Bioperl; I have been fiddling around trying to write the Perl script for this from scratch.
I have a large FASTA file of about 20,000 genes, and another file containing a list of about 2,000 gene IDs (no sequences), all of which are in the large file. I need to create a FASTA file that includes only the genes with these specific 200 IDs. I was wondering if there is a method in Bioperl that will allow me to do the following pseudocode:

For each $ID from 200_IDs_set_file
{
    $my_seq = get_sequence_by_ID(from large_fasta_file, $ID)
    write $my_seq into file
}

Many thanks for any hint!

Yuval

From torsten.seemann at infotech.monash.edu.au Wed Apr 5 18:14:01 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 06 Apr 2006 08:14:01 +1000
Subject: [Bioperl-l] Getting sequences by ID
In-Reply-To: <200604051803.25547.y.itan@ucl.ac.uk>
References: <200604051803.25547.y.itan@ucl.ac.uk>
Message-ID: <1144275241.22967.7.camel@chauvel.csse.monash.edu.au>

On Wed, 2006-04-05 at 18:03 +0100, Yuval Itan wrote:
> I would be grateful for an advice from you regarding Bioperl, after I was
> fiddling around trying to write the Perl script for that from scratch.
> I have a large fasta file of about 20,000 genes, and another file which is a
> list of about 2,000 gene IDs (no sequences), all included in the large file.
> I need to create a fasta file which will include only the genes with these
> specific 200 IDs. I was wondering if there is a method in Bioperl that will
> allow me to do the following pseudocode:
>
> For each $ID from 200_IDs_set_file
> {
> $my_seq = get_sequence_by_ID(from large_fasta_file, $ID)
> write $my_seq into file
> }

There are many possibilities involving combinations of pure Perl and BioPerl modules, and some involving no Perl at all, using commands like 'formatdb' and 'fastacmd -s' instead. There are probably EMBOSS solutions too.

Using your pseudocode, you could use Bio::Index::Fasta to index your 20,000 genes, then loop over each ID, retrieve the Seq via the index, and write it out using Bio::SeqIO.
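To make the index-then-fetch idea concrete, here is a minimal pure-Perl sketch of what an index such as Bio::Index::Fasta buys you: one pass over the FASTA file records the byte offset of every header, and each later lookup seeks straight to that record instead of rescanning all 20,000 genes. This is not Bio::Index::Fasta's implementation; the subroutine names build_fasta_index and fetch_fasta_record are hypothetical, the index lives only in memory, and it assumes the ID is the first whitespace-delimited word after '>'.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a FASTA file once and record, for every record,
# the byte offset of its '>' header line. Assumes the ID is the first
# whitespace-delimited word after '>'.
sub build_fasta_index {
    my ($fasta_path) = @_;
    open(my $fh, '<', $fasta_path) or die "Cannot open $fasta_path: $!";
    my %offset;
    my $pos = tell($fh);              # offset of the line about to be read
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            $offset{$1} = $pos;       # remember where this record starts
        }
        $pos = tell($fh);
    }
    close($fh);
    return \%offset;
}

# Hypothetical helper: seek directly to one record and return it (header
# plus sequence lines) as a string, or undef if the ID was not indexed.
sub fetch_fasta_record {
    my ($fasta_path, $index, $id) = @_;
    return unless exists $index->{$id};
    open(my $fh, '<', $fasta_path) or die "Cannot open $fasta_path: $!";
    seek($fh, $index->{$id}, 0) or die "seek failed: $!";
    my $record = <$fh>;               # the header line itself
    while (my $line = <$fh>) {
        last if $line =~ /^>/;        # stop at the next record
        $record .= $line;
    }
    close($fh);
    return $record;
}
```

A caller would build the index once, then call fetch_fasta_record for each wanted ID and print each returned record to the output file, so thousands of IDs cost thousands of seeks rather than thousands of full scans. Bio::Index::Fasta stores its index in the file named by -filename, so it can be reused across runs; this in-memory hash has to be rebuilt each time.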
Perhaps look at it from another perspective:

use Bio::SeqIO;

# put all the IDs we want into a hash (read from file, one ID per line;
# the file names here are placeholders)
open(my $id_fh, '<', 'ids.txt') or die "Cannot open ids.txt: $!";
my %want_id = map { chomp; ($_ => 1) } <$id_fh>;

my $in  = Bio::SeqIO->new(-file => 'large_fasta_file.fa', -format => 'fasta');
my $out = Bio::SeqIO->new(-file => '>wanted.fa',          -format => 'fasta');
while (my $seq = $in->next_seq) {
    # write this $seq out only if its ID is one we want
    $out->write_seq($seq) if $want_id{$seq->id};
}

--
Torsten Seemann
Victorian Bioinformatics Consortium

From cjfields at uiuc.edu Thu Apr 6 00:28:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Apr 2006 23:28:12 -0500
Subject: [Bioperl-l] Issue with Bio::SearchIO::psl (was: Bioperl bug 1977)
In-Reply-To: <930B08C3-209D-4915-B86D-82CA2CF0E42B@cshl.edu>
References: <001901c65446$bfb0b270$15327e82@pyrimidine> <9AFA96E5-79BE-4BE7-8B33-F6C28012B929@cshl.edu> <6E8F6494-F5C9-4763-93D3-7A9B3F238821@uiuc.edu> <778E9E29-FA55-4288-8C3B-E386AC6A508A@uiuc.edu> <930B08C3-209D-4915-B86D-82CA2CF0E42B@cshl.edu>
Message-ID:

As it turns out, the script you sent back to me (the one you said was mine) gave me the answer: it's the -w flag in the shebang line. You had modified my script a bit by adding -w (I used 'use warnings', since -w gives a ton of debugging info on Windows). I see the error popping up now on Mac (I haven't tried WinXP yet but will tomorrow). I'll give it a closer look tomorrow.

I don't think the fix you suggested is the right one for now, since it doesn't address the real problem (the parser doesn't 'see' the end of the report), but keep it in if it works for you. Another workaround, without modifying Bio::SearchIO::psl, is to do this:

my $result = $parser->next_result;
while (my $hit = $result->next_hit) {
    print $hit->name,"\n";
    while (my $hsp = $hit->next_hsp) {
        print " ",$hsp->score,"\n";
    }
}

which calls next_result only once. I don't get the error with this.

Chris

On Apr 4, 2006, at 10:15 AM, Albert Vernon Smith wrote:

> I deleted all of my bioperl installation, and reinstalled to be
> certain that this is a current issue. Having done that, I am still
> seeing the issue.
> I also updated IO::Handle to the most recent
> version, on the off chance that it was something coming from there.
>
> -albert
>
>
> On 1.4.2006, at 06:54, Chris Fields wrote:
>
>> Albert,
>>
>> I had no problems with this on Mac OS X (I'm running from 10.4.5,
>> perl 5.8.6). I noticed that several fixes were made in the last 6
>> months that would potentially fix this issue, including two which
>> involve newlines (that is what seems to be hanging things up
>> here). Have you tried a full bioperl-live install to see if that
>> fixes it?
>>
>> computer:~/searchio_test cjfields$ perl psl.pl
>> /usr/local/blat/db/hg17/hg17.2bit:chr5
>>  100.00
>> /usr/local/blat/db/hg17/hg17.2bit:chr21
>>  90.00
>> /usr/local/blat/db/hg17/hg17.2bit:chr5
>>  85.00
>> /usr/local/blat/db/hg17/hg17.2bit:chr13
>>  80.00
>> /usr/local/blat/db/hg17/hg17.2bit:chr7
>>  80.00
>>
>>
>> Chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ch01ph14 at uohyd.ernet.in Thu Apr 6 01:13:02 2006
From: ch01ph14 at uohyd.ernet.in (ch01ph14 at uohyd.ernet.in)
Date: Thu, 6 Apr 2006 10:43:02 +0530 (IST)
Subject: [Bioperl-l] MeltDNA: A software to prediction of DNA duplex stability and thermodynamics.
Message-ID: <56986.172.16.56.11.1144300382.squirrel@172.16.1.11>

Announcement
------------
Dear All,

We are releasing the first version of MeltDNA. It can be downloaded from:

http://meltdna.sourceforge.net

Overview of the software
-------------------------

Prediction of DNA duplex stability and thermodynamics is invaluable for many molecular biology applications involving sequence-dependent hybridization reactions.
Sequence-dependent stability of duplex DNA plays a major role in fundamental processes of the living cell, such as replication, transcription, and recombination. Many techniques in molecular biology depend on the oligonucleotide melting temperature (Tm), and several formulas have been developed to estimate Tm. We have developed a Perl-based tool to predict DNA duplex hybridization and melting thermodynamics.

This tool considers all the factors that determine thermodynamic values, such as base composition, nearest-neighbour (NN) parameters, salt dependency (Na+, Mg++), internal mismatches, dangling ends, loops, etc. To improve NN-model-based calculations, we introduced fusion matrices. MeltDNA provides better prediction accuracy for DNA duplex hybridization, especially through its ability to handle structural features in the DNA duplex such as loops and loop size. It has wide applicability in primer selection for PCR, probe selection for microarrays, DNA secondary structure stability prediction, DNA computing, etc.

Comments to help improve the software quality are always welcome.

Best Regards,
Sunil

From sdavis2 at mail.nih.gov Thu Apr 6 06:45:37 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 06 Apr 2006 06:45:37 -0400
Subject: [Bioperl-l] MeltDNA: A software to prediction of DNA duplex stability and thermodynamics.
In-Reply-To: <56986.172.16.56.11.1144300382.squirrel@172.16.1.11>
Message-ID:

Sunil,

I just looked quickly but did not see a publication describing the methods in detail or comparing them with other methods. Is such a document available?

Sean

On 4/6/06 1:13 AM, "ch01ph14 at uohyd.ernet.in" wrote:

> Announcement
> ------------
> Dear All,
>
> We are releasing the first version of MeltDNA. This can be downloaded from.
>
> http://meltdna.sourceforge.net
>
> Overview of the software
> -------------------------
>
> Prediction of DNA duplex stability and thermodynamics is invaluable for
> many molecular biology applications involving sequence dependent
> hybridization reactions.
Sequence dependent stability of duplex DNA plays > a major role in fundamental processes of the living cell, such as > replication, transcription, and recombination. Many techniques in > molecular biology depend on the oligonucleotide melting temperature (Tm), > and several formulas have been developed to estimate Tm. We have developed > a Perl based tool to predict DNA duplex hybridization & melting > thermo-dynamics. > This tool consider all factors those determine thermodynamic values > like-Base composition,NN parameters, salt dependency (Na+, Mg++), > internal mismatch, dangling ends ,loops etc. For improve NN(nearest > neighbour) model based calculations we introduced fusion matrices. > MeltDNA provides better prediction accuracy for DNA duplex hybridization > especially its ability to handle the structural features in DNA duplex > like loops, loop size etc. It has got wide applicability in primer > selection in PCR, probe selection in Microarrays, DNA secondary > structure stability prediction, DNA computing etc. > > Comments are always welcome to improve the software quality. > > Best Regards, > Sunil > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Thu Apr 6 10:25:09 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Thu, 6 Apr 2006 10:25:09 -0400 Subject: [Bioperl-l] cut_seq in Bio::Tools::RestrictionEnzyme Message-ID: <7930EE6CD7CA354D93B444D0433C061101D0862A@NIHCESMLBX6.nih.gov> Please help me to understand the working of cut_seq in http://doc.bioperl.org/bioperl-live/Bio/Tools/RestrictionEnzyme.html#POD7 explicitly to extract the information contained in the return variable --- some sort of reference to something. Documentation says Title : cut_seq Usage : $re->cut_seq(); Purpose : Conceptually cut or "digest" a DNA sequence with the given enzyme. 
Example : $string = $re->cut_seq(); Returns : List of strings containing the resulting fragments. Argument : Reference to a Bio::PrimarySeq.pm-derived object. Eventually I want the length of the fragments. Thank you. Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From georg.otto at tuebingen.mpg.de Thu Apr 6 10:39:43 2006 From: georg.otto at tuebingen.mpg.de (Georg Otto) Date: Thu, 06 Apr 2006 16:39:43 +0200 Subject: [Bioperl-l] StandAloneBlast on pre-formatted GenBank nt database Message-ID: Hi, I apologize if this is a FAQ, but I couldn't find a solution using google. I downloaded the pre-formatted nt blast database from GenBank which is split in five smaller databases with file extensions like this: nt.0[0-4].* It also comes with an alias file nt.nal blastall works well like this: blastall -p blastn -d nt But unfortunately my Perl script, which starts a blast search using Bio::Tools::Run::StandAloneBlast and the same parameters as the blastall command above gives a warning and an exception: [blastall] WARNING: Could not find index files for database /Library/WebServer/Documents/blast/db/Volumes/Seahorse_HD2/blastdb/nt [blastall] WARNING: BI867164: Could not find index files for database /Library/WebServer/Documents/blast/db/Volumes/Seahorse_HD2/blastdb/nt ------------- EXCEPTION ------------- MSG: blastall call crashed: 256 /usr/local/bin/blastall -p blastn -d "/Library/WebServer/Documents/blast/db/Volumes/Seahorse_HD2/blastdb/nt" -i /tmp/Wb5ThHwjEu -o /tmp/zvWomE5U3U STACK Bio::Tools::Run::StandAloneBlast::_runblast /Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:732 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast 
/Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:680 STACK Bio::Tools::Run::StandAloneBlast::blastall /Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:536 STACK toplevel /Users/rglab/Georg/bin/run_blast.pl:53 -------------------------------------- The perl script works well with other databases, that are not split, so my guess is that the blast search as it is called from bioperl can not deal with the split database while blastall can. Does anybody have an idea how to solve this? Best, Georg From cjfields at uiuc.edu Thu Apr 6 11:56:58 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Apr 2006 10:56:58 -0500 Subject: [Bioperl-l] problems with blast parser In-Reply-To: <4432909A.9030107@if.sc.usp.br> Message-ID: <001701c65992$bcb873e0$15327e82@pyrimidine> Alessandro, We need to know a few things first: 1) What version of Bioperl? 2) BLAST version? 3) What OS? 4) Perl version? 5) Exactly how large is your file? It would also be nice to see at least a chunk of your script to rule out a logic error there. If you want you can also submit your script by filing this as a bug in Bugzilla and attaching your script. http://www.bioperl.org/wiki/Bugs If you have an older version of Bioperl (such as 1.4) consider upgrading to 1.5.1 or CVS. Lots of fixes have been incorporated since 1.4, including to SearchIO. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Alessandro S. Nascimento > Sent: Tuesday, April 04, 2006 10:28 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] problems with blast parser > > Hi all > > I'm trying to parse a blast standalone (blaspgp) result file and filter > some sequences using length and identity. 
The script used to work but > this time after several minutes working in 99.9% of my processor I have > a "killed"message with no more information. The blast file is very > large. Does anyone have any clue ? > > Thanks in advance > > Alessandro > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Thu Apr 6 11:34:57 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 06 Apr 2006 11:34:57 -0400 Subject: [Bioperl-l] Getting sequences by ID In-Reply-To: <1144275241.22967.7.camel@chauvel.csse.monash.edu.au> Message-ID: <049c01c6598f$a9ca4a40$e6028a0a@GOLHARMOBILE1> Here's how I'm doing it with bioperl, but with large genbank files (such as chromosomes) it takes a while:

my $inseq = Bio::SeqIO->new(...);
while (my $seqobj = $inseq->next_seq) {
    next if ($seqobj->accession ne $id);
    # process the sequence here
}

-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Torsten Seemann Sent: Wednesday, April 05, 2006 6:14 PM To: Yuval Itan Cc: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] Getting sequences by ID On Wed, 2006-04-05 at 18:03 +0100, Yuval Itan wrote: > I would be grateful for an advice from you regarding Bioperl, after I > was > fiddling around trying to write the Perl script for that from scratch. > I have a large fasta file of about 20,000 genes, and another file which is a > list of about 2,000 gene IDs (no sequences), all included in the large file. > I need to create a fasta file which will include only the genes with these > specific 200 IDs.
I was wondering if there is a method in Bioperl that will > allow me to do the following pseudocode: > > For each $ID from 200_IDs_set_file > { > $my_seq = get_sequence_by_ID(from large_fasta_file, $ID) > write $my_seq into file > } There are many possibilities involving combinations of pure Perl and BioPerl modules, and some even involving no Perl, but rather using commands like 'formatdb' and 'fastacmd -s'. There are probably EMBOSS solutions too. Using your pseudo code, you could use Bio::Index::Fasta to index your 20,000 genes. Then loop over each ID, retrieve the Seq via the index, and write it out using Bio::SeqIO. Perhaps look at it from another perspective:

# put all the IDs we want into a hash (read from file)
my %want_id = .... ;
foreach $seq (use Bio::SeqIO to read large_fasta_file) {
  if $want_id{$seq->id} then
    use Bio::SeqIO to write this $seq out
  end
}

-- Torsten Seemann Victorian Bioinformatics Consortium _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Apr 6 12:48:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Apr 2006 11:48:56 -0500 Subject: [Bioperl-l] cut_seq in Bio::Tools::RestrictionEnzyme In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D0862A@NIHCESMLBX6.nih.gov> Message-ID: <002c01c6599a$008f8840$15327e82@pyrimidine> I believe Bio::Tools::RestrictionEnzyme is no longer maintained (deprecated). Use Bio::Restriction instead. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS) [C] > Sent: Thursday, April 06, 2006 9:25 AM > To: bioperl-l > Subject: [Bioperl-l] cut_seq in Bio::Tools::RestrictionEnzyme > > Please help me to understand the working of > cut_seq in > http://doc.bioperl.org/bioperl-live/Bio/Tools/RestrictionEnzyme.html#POD7 > explicitly to extract the information contained in the return variable --- > some sort of reference to something. > Documentation says > Title : cut_seq > Usage : $re->cut_seq(); > Purpose : Conceptually cut or "digest" a DNA sequence with the given > enzyme. > Example : $string = $re->cut_seq(); > Returns : List of strings containing the resulting fragments. > Argument : Reference to a Bio::PrimarySeq.pm-derived object. > > Eventually I want the length of the fragments. > > > Thank you. > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu Apr 6 13:30:17 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 6 Apr 2006 13:30:17 -0400 Subject: [Bioperl-l] problems with blast parser In-Reply-To: <001701c65992$bcb873e0$15327e82@pyrimidine> References: <001701c65992$bcb873e0$15327e82@pyrimidine> Message-ID: <7C06C583-E84D-420A-8997-CB0F5F213C60@duke.edu> I'm pretty sure for thousands of HSPs this can be an out of memory problem. 
I've explained workarounds before on the list, but they basically mean building a new listener object that creates simple hashes (or arrays) instead of full-blown HSP objects. Personally I use a hybrid approach depending on the dataset - SearchIO can be too slow and too memory intensive for the cases where I am just getting top hits or summary stats, but if I want the alignment strings, more stats, etc then I use SearchIO. The question is - do you really want to be parsing a huge file, can you get away with using tabular output (-m8 or -m9) from BLAST? If you are balking at re-running the blast something like blast2table is simple pure-perl to generate an -m 8 tabular output from BLAST report very efficiently. This is discussed on the bioperl BLAST wiki page I believe. -jason On Apr 6, 2006, at 11:56 AM, Chris Fields wrote: > Alessandro, > > We need to know a few things first: > > 1) What version of Bioperl? > 2) BLAST version? > 3) What OS? > 4) Perl version? > 5) Exactly how large is your file? > > It would also be nice to see at least a chunk of your script to > rule out a > logic error there. If you want you can also submit your script by > filing > this as a bug in Bugzilla and attaching your script. > > http://www.bioperl.org/wiki/Bugs > > If you have an older version of Bioperl (such as 1.4) consider > upgrading to > 1.5.1 or CVS. Lots of fixes have been incorporated since 1.4, > including to > SearchIO. > > Chris > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Alessandro S. 
Nascimento >> Sent: Tuesday, April 04, 2006 10:28 AM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] problems with blast parser >> >> Hi all >> >> I'm trying to parse a blast standalone (blaspgp) result file and >> filter >> some sequences using length and identity. The script used to work but >> this time after several minutes working in 99.9% of my processor I >> have >> a "killed"message with no more information. The blast file is very >> large. Does anyone have any clue ? >> >> Thanks in advance >> >> Alessandro >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Thu Apr 6 13:42:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Apr 2006 12:42:16 -0500 Subject: [Bioperl-l] problems with blast parser In-Reply-To: <7C06C583-E84D-420A-8997-CB0F5F213C60@duke.edu> Message-ID: <002d01c659a1$72602c70$15327e82@pyrimidine> I didn't think of that, but makes sense considering he mentioned the file is huge and the process is killed off. I agree with Jason, that tabular output is probably the best way to go here. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at duke.edu] > Sent: Thursday, April 06, 2006 12:30 PM > To: Chris Fields; Alessandro S. Nascimento > Cc: BioPerl list > Subject: Re: [Bioperl-l] problems with blast parser > > I'm pretty sure for thousands of HSPs this can be an out of memory > problem. 
I've explained workarounds before on the list, but they > basically mean building a new listener object that creates simple > hashes (or arrays) instead of full-blown HSP objects. Personally I > use a hybrid approach depending on the dataset - SearchIO can be too > slow and too memory intensive for the cases where I am just getting > top hits or summary stats, but if I want the alignment strings, more > stats, etc then I use SearchIO. > > > The question is - do you really want to be parsing a huge file, can > you get away with using tabular output (-m8 or -m9) from BLAST? If > you are balking at re-running the blast something like blast2table is > simple pure-perl to generate an -m 8 tabular output from BLAST report > very efficiently. This is discussed on the bioperl BLAST wiki page I > believe. > > > -jason > On Apr 6, 2006, at 11:56 AM, Chris Fields wrote: > > > Alessandro, > > > > We need to know a few things first: > > > > 1) What version of Bioperl? > > 2) BLAST version? > > 3) What OS? > > 4) Perl version? > > 5) Exactly how large is your file? > > > > It would also be nice to see at least a chunk of your script to > > rule out a > > logic error there. If you want you can also submit your script by > > filing > > this as a bug in Bugzilla and attaching your script. > > > > http://www.bioperl.org/wiki/Bugs > > > > If you have an older version of Bioperl (such as 1.4) consider > > upgrading to > > 1.5.1 or CVS. Lots of fixes have been incorporated since 1.4, > > including to > > SearchIO. > > > > Chris > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Alessandro S. 
Nascimento > >> Sent: Tuesday, April 04, 2006 10:28 AM > >> To: bioperl-l at lists.open-bio.org > >> Subject: [Bioperl-l] problems with blast parser > >> > >> Hi all > >> > >> I'm trying to parse a blast standalone (blaspgp) result file and > >> filter > >> some sequences using length and identity. The script used to work but > >> this time after several minutes working in 99.9% of my processor I > >> have > >> a "killed"message with no more information. The blast file is very > >> large. Does anyone have any clue ? > >> > >> Thanks in advance > >> > >> Alessandro > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 From cjfields at uiuc.edu Thu Apr 6 20:40:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 6 Apr 2006 19:40:56 -0500 Subject: [Bioperl-l] Issue with Bio::SearchIO::psl (was: Bioperl bug 1977) In-Reply-To: <1F76C2A9-13CE-4AA5-A6D8-ABBD307538C5@cshl.edu> References: <001901c65446$bfb0b270$15327e82@pyrimidine> <9AFA96E5-79BE-4BE7-8B33-F6C28012B929@cshl.edu> <6E8F6494-F5C9-4763-93D3-7A9B3F238821@uiuc.edu> <778E9E29-FA55-4288-8C3B-E386AC6A508A@uiuc.edu> <930B08C3-209D-4915-B86D-82CA2CF0E42B@cshl.edu> <1F76C2A9-13CE-4AA5-A6D8-ABBD307538C5@cshl.edu> Message-ID: <501A40D5-CEF9-4619-8178-FFCAC3DDD991@uiuc.edu> I just committed a fix to Bio::SearchIO::psl. It may take about 20 minutes from now to update on CVS. This seems to work with or w/o the -w shebang flag on Mac OS X. Let me know if you have more problems with it. Chris On Apr 6, 2006, at 3:19 AM, Albert Vernon Smith wrote: > OK. Neither you nor I put our shebang lines or the 'use' lines. 
> I'm always putting -w on the shebang line out of habit, and trying > to remove associated messages. Makes things cleaner in the long > run. I had always thought that -w was the same as 'use warnings', > but I've now learned it is not. I don't get the message with 'use > warnings', so we are on the same page now. > > Thanks, > -albert > > > On 6.4.2006, at 04:28, Chris Fields wrote: > >> As it turns out, the script you sent me back (the one which you >> said was mine) gave me the answer. It's the -w flag in the >> shebang line. You had modified my script a bit by adding >> -w (I used 'use warnings' since -w gives a ton of debugging info >> on Windows). I see the error popping up now on Mac (I haven't >> tried WinXP yet but will tomorrow). >> >> I'll give it a closer look tomorrow. I don't think the fix you >> suggested is the right one for now since it doesn't address the >> real problem (the parser doesn't 'see' the end of the report), but >> keep it in if it works for you. Another workaround, w/o modifying >> Bio::SearchIO::psl, is to do this: >> >> my $result = $parser->next_result; >> while (my $hit = $result->next_hit) { >> print $hit->name,"\n"; >> while (my $hsp = $hit->next_hsp) { >> print " ",$hsp->score,"\n"; >> } >> } >> >> which calls next_result only once. I don't get the error with this. > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Thu Apr 6 21:35:08 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 07 Apr 2006 11:35:08 +1000 Subject: [Bioperl-l] MeltDNA: A software to prediction of DNA duplex stability and thermodynamics. In-Reply-To: References: Message-ID: <4435C1CC.50009@infotech.monash.edu.au> Sean, > I just looked quickly but did not see a publication describing the methods > in detail or comparison with other methods. Is such a document available?
I tracked this down: MeltDNA: A Tool for DNA Hybridization and Melting Thermodynamics Prediction, Abhishek Tiwari and Vipin Wadhwa. Bioinformatics India Journal (Issue July-Sep 2005) The journal web site only has listings up to the April-June 2005 issue, though. http://www.bioinformaticscentre.org/journal/first_journal.asp -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From sonmitra at gmail.com Fri Apr 7 04:04:20 2006 From: sonmitra at gmail.com (Sonmitra Mondal) Date: Fri, 7 Apr 2006 13:34:20 +0530 Subject: [Bioperl-l] Problem in getting sbjctseq using remote Blast Message-ID: I am having a problem running a RemoteBlast program to get hsp->sbjctseq. The rest of the program runs properly, but when we try to fetch that sequence, the following error is generated: -------------------- WARNING --------------------- MSG: An Error Occurred
502 Bad Gateway --------------------------------------------------- Error return code for BlastID code 1144396008-25960-52288961349.BLASTQ4 ... I am using the following code:

use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;

use Bio::SearchIO;

open (fh,">result.txt");
my (@params, $remote_blast_object, $blast_file, $r, $rc, $database);
my $sleep_time = 2;
$database = 'swissprot';
@params = (-prog=>'blastp', -data=>'swissprot', -expect=>'1e-10');
$remote_blast_object = Bio::Tools::Run::RemoteBlast->new(@params);
$blast_file = Bio::Root::IO->catfile("roa1.fasta");
$r = $remote_blast_object->submit_blast( $blast_file);
while ( my @rids = $remote_blast_object->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $remote_blast_object->retrieve_blast($rid);
    if(!ref($rc) ) {   # $rc not a reference => error or job not yet finished
      if( $rc < 0 ) {
        $remote_blast_object->remove_rid($rid);
        print "Error return code for BlastID code $rid ... \n";
      }
      sleep $sleep_time;
      if ($sleep_time < 120) {$sleep_time *= 2;}
    } else {
      $sleep_time = 2;
      $remote_blast_object->remove_rid($rid);
      my $count = 0;
      my $index=1;
      while( my $res = $rc->next_result ) {
        $count++;
        #print "result db is ", $res->database_name(), "\n";

        while( my $hit = $res->next_hit()) {

          print fh $hit->name();
          #print $hit->name(),"\t",$hit->description()," \t";
          while( my $hsp = $hit->next_hsp ) {
            #print "\t",$hsp->bits,
            print $hsp->evalue, "\t",$hsp->score;
            print "\n",$hsp->sbjctseq;
            #print fh "\n", $hsp->match;
            #print fh "\n", $hsp->positive;
          }
          print fh "\n";
        }
        close(fh);
      }
    }
  }
}
-----------------------------------------------------------------------------------------------------------------
I am also facing another problem: when I change the program parameter to 'tblastn', there's no output. From avilella at gmail.com Fri Apr 7 05:39:36 2006 From: avilella at gmail.com (Albert Vilella) Date: Fri, 07 Apr 2006 10:39:36 +0100 Subject: [Bioperl-l] trap warnings in eval?
Message-ID: <1144402776.7427.14.camel@localhost> Hi all, Short question: What is the preferred way to trap a bioperl warning inside an eval block? Thanks in advance, Bests, Albert. From cjfields at uiuc.edu Fri Apr 7 08:59:51 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Apr 2006 07:59:51 -0500 Subject: [Bioperl-l] trap warnings in eval? In-Reply-To: <1144402776.7427.14.camel@localhost> References: <1144402776.7427.14.camel@localhost> Message-ID: There are a few examples here: http://www.bioperl.org/wiki/Advanced_BioPerl#Throwing_Exceptions Some of the scripts and tests in the bioperl core distribution also have examples of how this works. bioperl-live/examples/root various root tests in t/ Chris On Apr 7, 2006, at 4:39 AM, Albert Vilella wrote: > Hi all, > > Short question: > > What is the preferred way to trap a bioperl warning inside an eval > block? > > Thanks in advance, > > Bests, > > Albert. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From heikki at sanbi.ac.za Fri Apr 7 09:12:51 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 7 Apr 2006 15:12:51 +0200 Subject: [Bioperl-l] trap warnings in eval? In-Reply-To: <1144402776.7427.14.camel@localhost> References: <1144402776.7427.14.camel@localhost> Message-ID: <200604071512.51938.heikki@sanbi.ac.za> Albert, Change the verbosity of the object to >= 2, e.g. $obj->verbose(2), and Bio::Root::RootI::warn() will call throw() instead, which eval will then trap. -Heikki On Friday 07 April 2006 11:39, Albert Vilella wrote: > Hi all, > > Short question: > > What is the preferred way to trap a bioperl warning inside an eval > block? > > Thanks in advance, > > Bests, > > Albert.
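Heikki's verbose(2) suggestion is specific to BioPerl objects. A generic Perl alternative is to localize $SIG{__WARN__} inside the eval, so that any warning issued through Perl's built-in warn becomes a die that the eval traps. This is a minimal sketch under one assumption: the warning really passes through CORE::warn. Check whether your BioPerl version's Bio::Root::RootI::warn does that. The helper name run_trapping_warnings is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run a code block and promote any
# warning raised via Perl's warn into an exception, so the eval traps it.
# Returns undef on success, or the trapped warning/error text otherwise.
sub run_trapping_warnings {
    my ($code) = @_;
    my $ok = eval {
        # The handler is scoped to this eval block and restored on exit or die.
        local $SIG{__WARN__} = sub { die "trapped warning: $_[0]" };
        $code->();
        1;
    };
    return $ok ? undef : $@;
}

# Usage: a block that warns is reported here instead of printing to STDERR.
my $err = run_trapping_warnings(sub { warn "something odd\n" });
print defined $err ? "caught: $err" : "no warning\n";
```

With BioPerl you would put the parsing or analysis call inside the anonymous sub. Unlike verbose(2), this catches warnings from any code in the block, not just from one object.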
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Heikki Lehvaslaiho heikki at_sanbi _ac _za Associate Professor skype: heikki_lehvaslaiho SANBI, South African National Bioinformatics Institute University of Western Cape, South Africa Phone: +27 21 959 2096 FAX: +27 21 959 2512 From n.haigh at sheffield.ac.uk Thu Apr 6 10:28:45 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 6 Apr 2006 15:28:45 +0100 Subject: [Bioperl-l] nt frequencies at 4 fold degenerate sites Message-ID: <004c01c65986$69505580$9c5ea78f@bmbpc196> A colleague would like to calculate the nt frequencies at 4-fold degenerate sites of the coding regions in a GenBank file. Does anyone know if there might be a script that can do this, or something similar that I might be able to rip apart in order to accomplish this? Any pointers to particular modules that I might find useful for doing this would be good - I've been told not to spend too much time on this. Cheers Nathan ---------------------------------- Nathan S. Haigh Bioinformatics PostDoctoral Research Associate Room B2 211 Department of Animal and Plant Sciences University of Sheffield Western Bank Sheffield S10 2TN Tel: +44 (0)114 22 20112 Mob: +44 (0)7742 533 569 Fax: +44 (0)114 22 20002
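Once each coding region is available as an in-frame nucleotide string, the counting step Nathan asks about is short in plain Perl. Pulling the CDS strings out of the GenBank file, for example with Bio::SeqIO, is a separate step and is not shown. This sketch makes three assumptions:

- It uses the standard genetic code. Under that code the third codon position is fourfold degenerate when the first two bases are CT, GT, TC, CC, AC, GC, CG or GG.
- Each CDS starts in frame (codon_start = 1).
- The names fourfold_counts and frequencies are hypothetical helpers, not BioPerl methods.

```perl
use strict;
use warnings;

# First two codon bases whose third position is fourfold degenerate in the
# standard genetic code: Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN.
my %FOURFOLD_PREFIX = map { $_ => 1 } qw(CT GT TC CC AC GC CG GG);

# Hypothetical helper: count A/C/G/T at fourfold-degenerate third positions
# of an in-frame coding sequence. Trailing partial codons are ignored, and
# ambiguity codes (N, R, ...) at the third position are skipped.
sub fourfold_counts {
    my ($cds) = @_;
    $cds = uc $cds;
    my %count = map { $_ => 0 } qw(A C G T);
    for (my $i = 0; $i + 3 <= length $cds; $i += 3) {
        my $codon = substr($cds, $i, 3);
        next unless $FOURFOLD_PREFIX{ substr($codon, 0, 2) };
        my $third = substr($codon, 2, 1);
        $count{$third}++ if exists $count{$third};
    }
    return \%count;
}

# Hypothetical helper: turn counts into frequencies (all zero if no sites).
sub frequencies {
    my ($count) = @_;
    my $total = 0;
    $total += $_ for values %$count;
    return { map { $_ => ($total ? $count->{$_} / $total : 0) } keys %$count };
}

# Usage: codons CTG, GTG, GCC and GGT contribute G, G, C and T.
my $counts = fourfold_counts('ATGCTGGTGGCCAAAGGTTGA');
my $freqs  = frequencies($counts);
printf "%s\t%d\t%.2f\n", $_, $counts->{$_}, $freqs->{$_} for qw(A C G T);
```

To pool a whole GenBank file, add up the per-CDS counts before calling frequencies. Non-standard genetic codes (for example some mitochondrial tables) would need a different prefix set.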
From swansonj at email.arizona.edu Fri Apr 7 02:12:04 2006 From: swansonj at email.arizona.edu (Jordan Mark Swanson) Date: Thu, 06 Apr 2006 23:12:04 -0700 Subject: [Bioperl-l] Minor Bio::Tools::Run::Alignment::Muscle update Message-ID: <443602B4.9020802@email.arizona.edu> The muscle alignment program accepts a parameter for the gap opening penalty. To allow the Muscle bioperl-run module to accept this argument, one can add GAPOPEN to the @MUSCLE_PARAMS array on line 114 of Bio/Tools/Run/Alignment/Muscle.pm. -- Jordan Swanson swansonj at email.arizona.edu Genetics Graduate Interdisciplinary Program University of Arizona From y.itan at ucl.ac.uk Thu Apr 6 11:59:06 2006 From: y.itan at ucl.ac.uk (Yuval Itan) Date: Thu, 6 Apr 2006 16:59:06 +0100 Subject: [Bioperl-l] Getting sequences by ID In-Reply-To: References: Message-ID: <200604061659.06709.y.itan@ucl.ac.uk> Thanks a lot for your help guys. Problem solved, although I did it in a very inelegant way mixing Bioperl and too many IOs...
:

use Bio::SeqIO;

my $seqio_obj = Bio::SeqIO->new(-file => "/home/Yuval/unproc_pseudo.build35.cdna", -format => "fasta");
# file to write into
my $seq_out = Bio::SeqIO->new(-file => ">/home/Yuval/unproc_pseudo_truncated.build35.cdna", -format => "fasta");
while (my $seq_obj = $seqio_obj->next_seq)   # reading ids from each gene in big file
{
    my $temp2 = $seq_obj->display_id;
    open(FILE, "/home/Yuval/Pseudo_human35like_trunctuated_IDs.txt")   # truncated IDs
        or die "Cannot open ID file: $!";
    while (my $temp = <FILE>)   # reading wanted ids
    {
        chomp $temp;
        next unless length $temp;   # skip blank lines: an empty pattern would reuse the last match
        if ($temp =~ /\Q$temp2\E/ || $temp2 =~ /\Q$temp\E/)   # id match (IDs matched literally)
        {
            $seq_out->write_seq($seq_obj);   # writing the fasta truncated sequence
            last;   # write each sequence at most once
        }
    }
    close(FILE);
}

Cheers, Yuval On Thursday 06 April 2006 14:04, Brian Osborne wrote: > Yuval, > > See: > > http://www.bioperl.org/wiki/HOWTO:Beginners#Indexing_for_Fast_Retrieval > > Also see: > > http://www.bioperl.org/wiki/Bioperl_scripts > > > Brian O. > > On 4/5/06 6:00 PM, "Yuval Itan" wrote: > > Hi Torsten, > > > > I would be grateful for an advice from you regarding Bioperl, after I was > > fiddling around trying to write the Perl script for that from scratch. > > I have a large fasta file of about 20,000 genes, and another file which > > is a list of about 2,000 gene IDs (no sequences), all included in the > > large file. I need to create a fasta file which will include only the > > genes with these specific 200 IDs. I was wondering if there is a method > > in Bioperl that will allow me to do the following pseudocode: > > > > For each $ID from 200_IDs_set_file > > { > > $my_seq = get_sequence_by_ID(from large_fasta_file, $ID) > > write $my_seq into file > > } > > > > Many thanks for any hint!
> > > > Yuval > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Apr 7 11:48:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 7 Apr 2006 10:48:56 -0500 Subject: [Bioperl-l] Problem in getting sbjctseq using remote Blast In-Reply-To: Message-ID: <000f01c65a5a$c79e88d0$15327e82@pyrimidine> I only sporadically got the error you stated ("502 Bad Gateway"), which can happen if the remote BLAST server is extremely busy. The main error I saw was a message saying it did not find method 'sbjctseq': Can't locate object method "sbjctseq" via package "Bio::Search::HSP::GenericHSP" at C:\Perl\Scripts\blast test\test.pl line 41, line 1336. The reason you can't find it is that there is no 'sbjctseq' method for GenericHSP objects. This is left over from when Bio::Tools::BPLite was used for parsing remote BLAST reports; BPLite is now deprecated in favour of Bio::SearchIO. We would like to know where you found this in the documentation so it can be changed. Try this instead: $hsp->hit_string; You do have a few more problems here. Although this script works, using Bio::Perl in combination with Bio::Tools::Run::RemoteBlast and Bio::SearchIO is a bit redundant; Bio::Perl 'uses' methods from Bio::Tools::Run::RemoteBlast, which 'uses' methods from Bio::SearchIO. For your example you should probably not use Bio::Perl to retrieve your sequence since you're parsing remote BLAST output. Bio::SeqIO is more direct; use it instead.
Try this:

use Bio::Tools::Run::RemoteBlast;
use Bio::SeqIO;

my ( @params, $remote_blast_object, $blast_file, $r, $rc, $database);
my $sleep_time = 2;
open (fh, ">result.txt");
@params = (-verbose=>1, -prog=>'blastp', -data=>'swissprot', -expect=>'1e-10');
$remote_blast_object = Bio::Tools::Run::RemoteBlast->new(@params);
$blast_file = Bio::SeqIO->new(-file => '< roa1.fasta', -format => 'fasta');
$r = $remote_blast_object->submit_blast($blast_file->next_seq);
while ( my @rids = $remote_blast_object->each_rid ) {
......

One final thing: remote BLAST output breaks bioperl v 1.5.1 text parsing. You may need to upgrade your Bio::SearchIO::blast to the latest in CVS. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sonmitra Mondal > Sent: Friday, April 07, 2006 3:04 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Problem in getting sbjctseq using remote Blast > > I am having problem while running Remote blast program for getting > hsp->sbjctseq . Rest of the program is running properly , but when we > want to fetch that sequence following error is generating > .-------------------- WARNING --------------------- > MSG: > An Error Occurred > >

> 502 Bad Gateway
>
> ---------------------------------------------------
> Error return code for BlastID code 1144396008-25960-52288961349.BLASTQ4 ...
>
> I am using the following code :
> use Bio::Perl;
> use Bio::Tools::Run::RemoteBlast;
>
> use Bio::SearchIO;
>
> open (fh,">result.txt");
> my (@params, $remote_blast_object, $blast_file, $r, $rc, $database);
> my $sleep_time = 2;
> $database = 'swissprot';
> @params = (-prog=>'blastp', -data=>'swissprot', -expect=>'1e-10');
> $remote_blast_object = Bio::Tools::Run::RemoteBlast->new(@params);
> $blast_file = Bio::Root::IO->catfile("roa1.fasta");
> $r = $remote_blast_object->submit_blast( $blast_file);
> while ( my @rids = $remote_blast_object->each_rid ) {
> foreach my $rid ( @rids ) {
> my$rc = $remote_blast_object->retrieve_blast($rid);
> if(!ref($rc) ) { # $rc not a reference => error or job not yet finished
> if( $rc < 0 ) { $remote_blast_object->remove_rid($rid);
> print "Error return code for BlastID code $rid ... \n";}
> sleep $sleep_time; if ($sleep_time < 120) {$sleep_time *= 2;}
> } else {
> $sleep_time = 2;
> $remote_blast_object->remove_rid($rid);
> my $count = 0;
> my $index=1;
> while( my $res = $rc->next_result ) {
> $count++;
> #print "result db is ", $res->database_name(), "\n";
>
> while( my $hit = $res->next_hit()) {
>
> print fh $hit->name();
> #print $hit->name(),"\t",$hit->description()," \t";
> while( my $hsp = $hit->next_hsp ) {
> #print "\t",$hsp->bits,
> print $hsp->evalue, "\t",$hsp->score;
> print "\n",$hsp->sbjctseq;
> #print fh "\n", $hsp->match;
> #print fh "\n", $hsp->positive;
> }
> print fh "\n";
> }
> close(fh);
> }
> }
> }
> }
> --------------------------------------------------------------------------
> ---------------------------------------
>
> I am facing also another problem . When i am changing the parameter -
> program as 'tblastn' , there's no output .
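The polling loop in the quoted script sleeps for $sleep_time and only then doubles it, and only while it is still below 120. The wait therefore settles at 128 s rather than 120 s. The pure-Perl sketch below needs neither BioPerl nor network access and reproduces that schedule. backoff_schedule is a hypothetical helper written for illustration; it is not part of Bio::Tools::Run::RemoteBlast.

```perl
use strict;
use warnings;

# Hypothetical helper: list the delays the quoted loop would sleep for on
# successive polls, using the same rule as the script:
#   sleep $sleep_time; if ($sleep_time < 120) { $sleep_time *= 2 }
sub backoff_schedule {
    my ($start, $limit, $polls) = @_;
    my @delays;
    my $sleep_time = $start;
    for (1 .. $polls) {
        push @delays, $sleep_time;                  # delay slept on this poll
        $sleep_time *= 2 if $sleep_time < $limit;   # double until the limit is passed
    }
    return @delays;
}

print join(' ', backoff_schedule(2, 120, 9)), "\n";   # 2 4 8 16 32 64 128 128 128
```

If a hard 120 s ceiling is intended, the update could be written as $sleep_time = $sleep_time * 2 > 120 ? 120 : $sleep_time * 2; instead.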
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Fri Apr 7 11:44:30 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Fri, 7 Apr 2006 11:44:30 -0400 Subject: [Bioperl-l] Bio::Restriction::Enzyme exception due to Bio::PrimarySeq Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08631@NIHCESMLBX6.nih.gov> Well! What to do when hit with this sort of message? I am not working with any circular sequences. line 46 is my @frags=$ra->fragments($enz); ------------- EXCEPTION ------------- MSG: Abstract method "Bio::PrimarySeqI::is_circular" is not implemented by package Bio::PrimarySeq::Fasta. This is not your fault - author of Bio::PrimarySeq::Fasta should be blamed! STACK Bio::Root::RootI::throw_not_implemented /usr/lib/perl5/site_perl/5.8.5/Bio/Root/RootI.pm:523 STACK Bio::PrimarySeqI::is_circular /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeqI.pm:671 STACK Bio::Restriction::Analysis::_cuts /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:807 STACK Bio::Restriction::Analysis::cut /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:370 STACK Bio::Restriction::Analysis::fragments /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:443 STACK toplevel TestNew.pl:46 -------------------------------------- <> Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina -------------- next part -------------- A non-text attachment was scrubbed... 
Name: TestNew.pl Type: application/applefile Size: 1416 bytes Desc: TestNew.pl Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060407/513a2f44/attachment.bin From jason at bioperl.org Fri Apr 7 09:50:30 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 7 Apr 2006 09:50:30 -0400 Subject: [Bioperl-l] Interpretation of percentage_idendity In-Reply-To: <44367180.8040601@agrar.hu-berlin.de> References: <44367180.8040601@agrar.hu-berlin.de> Message-ID: <9266E3B0-CDAD-4716-B2E5-4DED31ADAC26@bioperl.org> These methods are really more for multiple sequence alignment than pairwise identities. Although I guess we don't have anywhere else that calculates percent ID for a pair of sequences in an alignment - would be nice for someone to add that. First off, percentage_identity is an alias for average_percentage_identity - this has to do with preserving the function names that existed before there were two methods. There are only really two implementations to concentrate on. average_percentage_identity overall_percentage_identity The documentation for Bio::SimpleAlign gives you some hints about how each works. overall_percentage_identity is just the percentage of alignment columns that are identical, so it is very conservative. Here is the pertinent documentation for average_percentage_identity Function: The function uses a fast method to calculate the average percentage identity of the alignment Notes : This method implemented by Kevin Howe calculates a figure that is designed to be similar to the average pairwise identity of the alignment (identical in the absence of gaps), without having to explicitly calculate pairwise identities proposed by Richard Durbin. Validated by Ewan Birney and Alex Bateman. If someone wants to put an excerpt of this on the SimpleAlign wiki page that would be awesome.
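The following pure-Perl sketch shows how much the choice of denominator matters. It is not BioPerl's code. It computes two simple measures: the percentage of columns identical across all sequences, in the spirit of the description of overall_percentage_identity above, and an explicit average of pairwise identities, the quantity that Kevin Howe's fast method is designed to approximate. The gap handling is an assumption: gap columns never count as identical, and pairwise identity is taken only over columns where neither sequence has a gap.

```perl
use strict;
use warnings;

# Percentage of columns in which every sequence has the same residue.
# Columns containing gaps ('-' or '.') never count as identical.
sub overall_identity {
    my @seqs = @_;
    my $len  = length $seqs[0];
    my $same = 0;
    for my $i (0 .. $len - 1) {
        my %chars;
        $chars{ uc substr($_, $i, 1) } = 1 for @seqs;
        my @col = keys %chars;
        $same++ if @col == 1 && $col[0] ne '-' && $col[0] ne '.';
    }
    return $len ? 100 * $same / $len : 0;
}

# Identity of one pair, counted only over columns where neither has a gap.
sub pairwise_identity {
    my ($s1, $s2) = @_;
    my ($same, $aligned) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($x, $y) = (uc substr($s1, $i, 1), uc substr($s2, $i, 1));
        next if $x =~ /[-.]/ || $y =~ /[-.]/;
        $aligned++;
        $same++ if $x eq $y;
    }
    return $aligned ? 100 * $same / $aligned : 0;
}

# Explicit mean over all pairs of sequences.
sub mean_pairwise_identity {
    my @seqs = @_;
    my ($sum, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            $sum += pairwise_identity($seqs[$i], $seqs[$j]);
            $pairs++;
        }
    }
    return $pairs ? $sum / $pairs : 0;
}

my @aln = ('ACGT-A', 'ACTTAA');
printf "overall:  %.1f%%\n", overall_identity(@aln);    # 66.7% (4 of 6 columns)
printf "pairwise: %.1f%%\n", pairwise_identity(@aln);   # 80.0% (4 of 5 ungapped columns)
```

For a two-sequence alignment, "matching residues over the total length" corresponds to the first number. Excluding gapped columns from the denominator gives the second.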
-jason On Apr 7, 2006, at 10:04 AM, Armin Schmitt wrote: > Dear Jason, > > I need some help with the interpretation of > the results from all three percentage_identity > variants offered in the Bioperl module AlignI.pm > > - percentage_identity > - average_percentage_identity > - overall_percentage_identity > > Please understand that I am not a Perl expert, > so I am not able to get the meaning from the > source code. > > By percentage identity for a 2 sequence alignment > I understand the proportion of matching amino acids > of the total length. > > But I suspect that this is different now? > > Thank you very much > > Armin Schmitt > > -- > Dr. Armin Schmitt > Züchtungsbiologie und molekulare Genetik > Institut für Nutztierwissenschaften > Humboldt-Universität zu Berlin > Invalidenstraße 42 > 10115 Berlin > > Breeding Biology and Molecular Genetics > Institute for Animal Sciences > Humboldt-Universität zu Berlin > Invalidenstraße 42 > 10115 Berlin > Germany > > Tel: +49 30 2093 9074 > Fax: +49 30 2093 6397 > http://www.agrar.hu-berlin.de/nutztier/zb/ > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From avilella at gmail.com Fri Apr 7 13:11:30 2006 From: avilella at gmail.com (Albert Vilella) Date: Fri, 07 Apr 2006 18:11:30 +0100 Subject: [Bioperl-l] Minor Bio::Tools::Run::Alignment::Muscle update In-Reply-To: <443602B4.9020802@email.arizona.edu> References: <443602B4.9020802@email.arizona.edu> Message-ID: <1144429890.6012.8.camel@localhost> added. haven't tested at all. Albert. On Thu, 2006-04-06 at 23:12 -0700, Jordan Mark Swanson wrote: > The muscle alignment program accepts a variable for a gap opening > penalty. In order to allow the Muscle bioperl-run module to accept this > argument, one can add > GAPOPEN to the @MUSCLE_PARAMS array on line 114 of > Bio/Tools/Run/Alignment/Muscle.pm .
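The reason a parameter has to be added to an array like @MUSCLE_PARAMS is that wrappers of this kind typically forward only the option names they have whitelisted. The self-contained sketch below shows that pattern. @ALLOWED_PARAMS and build_args are hypothetical names, the parameter list is illustrative, and the sketch assumes that each whitelisted name becomes a lower-case command-line flag. It is not the actual code of Bio::Tools::Run::Alignment::Muscle.

```perl
use strict;
use warnings;

# Hypothetical whitelist of option names the wrapper is willing to forward.
# Adding GAPOPEN here is what lets a gap-open value reach the command line.
my @ALLOWED_PARAMS = qw(MAXITERS MAXHOURS GAPOPEN);

# Turn user options into "-name value" pairs, in whitelist order;
# options that are not whitelisted are silently ignored.
sub build_args {
    my (%opts) = @_;
    my @args;
    for my $name (@ALLOWED_PARAMS) {
        next unless defined $opts{$name};
        push @args, '-' . lc($name), $opts{$name};
    }
    return @args;
}

print join(' ', build_args(GAPOPEN => -2.5, MAXITERS => 2, FOO => 1)), "\n";
# -maxiters 2 -gapopen -2.5
```

The FOO option is dropped without a warning. This also explains why, before the change, a GAPOPEN setting would have had no effect.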
> From osborne1 at optonline.net Fri Apr 7 15:26:43 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 07 Apr 2006 15:26:43 -0400 Subject: [Bioperl-l] Bio::Restriction::Enzyme exception due to Bio::PrimarySeq In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08631@NIHCESMLBX6.nih.gov> Message-ID: Nick, Minor bug in Bio::DB::Fasta, now fixed. Either install the latest BioPerl or copy the latest Bio/DB/Fasta.pm from CVS. Brian O. On 4/7/06 11:44 AM, "Staffa, Nick (NIH/NIEHS) [C]" wrote: > Well! > What to do when hit with this sort of message? > I am not working with any circular sequences. > > line 46 is > my @frags=$ra->fragments($enz); > > > ------------- EXCEPTION ------------- > MSG: Abstract method "Bio::PrimarySeqI::is_circular" is not implemented by > package Bio::PrimarySeq::Fasta. > This is not your fault - author of Bio::PrimarySeq::Fasta should be blamed! > > STACK Bio::Root::RootI::throw_not_implemented > /usr/lib/perl5/site_perl/5.8.5/Bio/Root/RootI.pm:523 > STACK Bio::PrimarySeqI::is_circular > /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeqI.pm:671 > STACK Bio::Restriction::Analysis::_cuts > /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:807 > STACK Bio::Restriction::Analysis::cut > /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:370 > STACK Bio::Restriction::Analysis::fragments > /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:443 > STACK toplevel TestNew.pl:46 > > -------------------------------------- > <> > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. 
Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mblanche at berkeley.edu Fri Apr 7 21:54:21 2006 From: mblanche at berkeley.edu (Marco Blanchette) Date: Fri, 07 Apr 2006 18:54:21 -0700 Subject: [Bioperl-l] Display of gff annotation Message-ID: Dear all-- I have a gff annotation in the form of:

##gff-version 2
# seqname source feature start end score strand frame attributes
AF260530 ( - exon 262 693 . + . note "1"
AF260530 ( - exon 4450 4630 . + . note "2"
AF260530 ( - exon 13432 13776 . + . note "3"
AF260530 ( - exon 15198 15359 . + . note "4.1"
AF260530 ( - exon 15537 15698 . + . note "4.2"
AF260530 ( - exon 16060 16221 . + . note "4.3"
AF260530 ( - exon 16682 16852 . + . note "4.4"
AF260530 ( - exon 16985 17146 . + . note "4.5"
AF260530 ( - exon 16985 17146 . + . note "5"

Where the value of the tag 'note' is the exon name. All exons with a decimal annotation are alternatively spliced exons. My goal would be to display the alternative exons in a different color (let's say red) from the constitutive exons (let's say blue).
I have this script:

#!/usr/bin/perl

use strict;
use Bio::Tools::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $gffio = Bio::Tools::GFF->new(-file => $ARGV[0],
                                 -gff_version => 2,
                                );

my $feat_o = Bio::SeqFeature::Generic->new(-name => 'aGene');

while (my $feature = $gffio->next_feature) {
    $feat_o->add_SeqFeature($feature,'EXPAND');
}

my $panel = Bio::Graphics::Panel->new(
    -length    => $feat_o->end,
    -width     => 1500,
    -pad_left  => 10,
    -pad_right => 10,
);

my $full_length = Bio::SeqFeature::Generic->new(-start=>1,
                                                -end=>$feat_o->end);

$panel->add_track($full_length,
                  -glyph   => 'arrow',
                  -tick    => 2,
                  -fgcolor => 'black',
                  -double  => 1,
);

$panel->add_track($feat_o,
                  -glyph=> 'segments');
print $panel->png;

That does almost what I need. My questions are: 1) Is there a better way to transform the GFF file into a SeqFeature object than the while loop that I am using? 2) How could I tell the Bio::Graphics::Panel object to color the exons with a '.' in red and the others in blue? Many thanks all Marco Marco Blanchette, Ph.D. mblanche at berkeley.edu Donald C. Rio's lab Department of Molecular and Cell Biology 16 Barker Hall University of California Berkeley, CA 94720-3204 Tel: (510) 642-1084 Cell: (510) 847-0996 Fax: (510) 642-6062 From mblanche at berkeley.edu Sat Apr 8 12:21:59 2006 From: mblanche at berkeley.edu (Marco Blanchette) Date: Sat, 08 Apr 2006 09:21:59 -0700 Subject: [Bioperl-l] Display of gff annotation In-Reply-To: <1144464582.28535.220.camel@localhost.localdomain> Message-ID: Many Thanks Scott, Everything works perfectly [although I had to change the $feature->attributes('note') to $feature->get_tag_values('note')]. My new goal now is to add, under the gene structure, binding-site hits in the form of a diamond glyph with their color representing the score of the binding site.
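Coloring each binding-site hit on a green-to-red scale between the lowest and highest scores amounts to a linear interpolation that returns a color string from the -bgcolor callback. The minimal pure-Perl sketch below assumes that Bio::Graphics accepts '#RRGGBB' color strings and that each hit's score is available through $feature->score. score_to_color is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: map a score onto a green (at $min) to red (at $max)
# ramp and return it as a '#RRGGBB' string.
sub score_to_color {
    my ($score, $min, $max) = @_;
    # Position of the score between $min and $max, clamped to [0, 1].
    # If all hits share one score ($max == $min), treat them as the minimum.
    my $f = $max > $min ? ($score - $min) / ($max - $min) : 0;
    $f = 0 if $f < 0;
    $f = 1 if $f > 1;
    my $red   = int(255 * $f + 0.5);   # round to the nearest integer
    my $green = 255 - $red;
    return sprintf '#%02X%02X00', $red, $green;
}

# Possible use in the track definition (assumes each hit has a score):
#   -bgcolor => sub { my $feature = shift; score_to_color($feature->score, $min, $max) },

print join(' ', map { score_to_color($_, 0, 10) } 0, 5, 10), "\n";
# #00FF00 #807F00 #FF0000
```

Going through green and red in this way yields a dull olive at the midpoint. An interpolation that passes through yellow, with red rising first and green falling second, would be a straightforward variation.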
I am trying to use:

my ($hits, $min, $max) = hits2feat ($hits_File);
$panel->add_track( diamond => $hits,
                   -height => 6,
                   ############
                   #-bgcolor => sub{ my $feature = shift;
                   #           # should be able to use
                   #           # value of -score to set
                   #           # the color value
                   #
                   ############
                   # }
                 );

Where the subroutine hits2feat() returns a Bio::SeqFeature::Generic object with the start, end and score of the feature set to the different binding-site hits and scores. In addition it returns the smallest and largest score values of the different hits. My goal would be to set, probably using a callback, the bgcolor according to the score value in a green-to-red gradient, where green represents the $min and red the $max value. Many thanks. Marco. On 4/7/06 7:49 PM, "Scott Cain" wrote: > Hi Marco, > > In answer to your questions: > > 1. You might find it easier to use Bio::DB::GFF with your GFF file; you > can use the 'memory' adaptor to have it use a file instead of a 'real' > database. It will give you Bio::DB::GFF::Feature objects which you can > use in the same way as your Bio::SeqFeature::Generic objects. > > 2. To color your exons differently, you can use a callback in the > bgcolor attribute: > > $panel->add_track($full_length, > -glyph => 'arrow', > -tick => 2, > -fgcolor => 'black', > -double => 1, > -bgcolor => sub { > my $feature = shift; > my ($note) = $feature->attributes('note'); > if ($note =~ /\./) { > return 'red'; > } > else { > return 'blue'; > } > } > ); > > My only comment is that your GFF is a little funny looking: the "( -" > in the source column might result in some strange behavior. > > Scott > > > On Fri, 2006-04-07 at 18:54 -0700, Marco Blanchette wrote: >> Dear all-- >> >> I have a gff annotation in the form of: >> >> ##gff-version 2 >> # seqname source feature start end score strand frame >> attributes >> AF260530 ( - exon 262 693 . + . note "1" >> AF260530 ( - exon 4450 4630 . + . note "2" >> AF260530 ( - exon 13432 13776 . + . note "3" >> AF260530 ( - exon 15198 15359 . + .
note "4.1" >> AF260530 ( - exon 15537 15698 . + . note "4.2" >> AF260530 ( - exon 16060 16221 . + . note "4.3" >> AF260530 ( - exon 16682 16852 . + . note "4.4" >> AF260530 ( - exon 16985 17146 . + . note "4.5" >> AF260530 ( - exon 16985 17146 . + . note "5" >> >> >> Where the value of the tag 'note' is the exon name. all exons with a decimal >> annotation are alternatively spliced exon. My goal would be to display the >> alternative exons in a different color (let say red) than the constitutive >> exons (let say blue). >> >> I have this script: >> #!/usr/bin/perl >> >> use strict; >> use Bio::Tools::GFF; >> use Bio::Graphics; >> use Bio::SeqFeature::Generic; >> >> >> my $gffio = Bio::Tools::GFF->new(-file => $ARGV[0], >> -gff_version => 2, >> ); >> >> my $feat_o = Bio::SeqFeature::Generic->new(-name => 'aGene'); >> >> while (my $feature = $gffio->next_feature) { >> $feat_o->add_SeqFeature($feature,'EXPAND'); >> } >> >> my $panel = Bio::Graphics::Panel->new( >> -length => $feat_o->end, >> -width => 1500, >> -pad_left => 10, >> -pad_right => 10, >> ); >> >> my $full_length = Bio::SeqFeature::Generic->new(-start=>1, >> -end=>$feat_o->end); >> >> $panel->add_track($full_length, >> -glyph => 'arrow', >> -tick => 2, >> -fgcolor => 'black', >> -double => 1, >> ); >> >> $panel->add_track($feat_o, >> -glyph=> 'segments'); >> print $panel->png >> >> >> That does almost what I need. My questions are: >> 1) is there a better way to transform the GFF file into a SeqFeature object >> than the while loop that I am using >> >> 2) How could I tell the Bio::Graphics::Panel object to color the exon with a >> '.' in red and the others in blue. >> >> Many thanks all >> >> Marco >> >> >> >> Marco Blanchette, Ph.D. >> >> mblanche at berkeley.edu >> >> Donald C. 
Rio's lab >> Department of Molecular and Cell Biology >> 16 Barker Hall >> University of California >> Berkeley, CA 94720-3204 >> >> Tel: (510) 642-1084 >> Cell: (510) 847-0996 >> Fax: (510) 642-6062 >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l Marco Blanchette, Ph.D. mblanche at berkeley.edu Donald C. Rio's lab Department of Molecular and Cell Biology 16 Barker Hall University of California Berkeley, CA 94720-3204 Tel: (510) 642-1084 Cell: (510) 847-0996 Fax: (510) 642-6062 From iain.m.wallace at gmail.com Sat Apr 8 12:29:08 2006 From: iain.m.wallace at gmail.com (Iain Wallace) Date: Sat, 8 Apr 2006 17:29:08 +0100 Subject: [Bioperl-l] SimpleAlign Question Message-ID: <8cff3eb80604080929o5e2badf8kf31e3be48e87ea62@mail.gmail.com> Hi all, I am trying to extract slices out of an alignment, but sometimes the slice I am trying to take contains a sequence that has no residues. I would still like to add this sequence; unfortunately I am getting this error ------------- EXCEPTION ------------- MSG: Got a sequence with no letters in it cannot guess alphabet [] STACK Bio::PrimarySeq::_guess_alphabet /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:837 STACK Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:279 STACK Bio::SimpleAlign::slice /usr/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm:794 Is there a simple way I can override this behaviour, which I think is just validating the sequence before it adds it to the alignment? The code I am using is below, followed by a test alignment. Thanks for any help Iain

--script
use strict;
use Bio::AlignIO;

#read in the initial alignment from the command line.
my $in = $ARGV[0];
my $alignio = Bio::AlignIO->new(-file=> $in);
my $out = Bio::AlignIO->new(-file=> ">test.fasta");
my $aln = $alignio->next_aln;
my $new_aln = new Bio::SimpleAlign;
$new_aln=$aln->slice(1,10);
$out->write_aln($new_aln);

--sample file
>1huma/1-7
APMGSDP---
>1b3aa/1-5
-PYSS-D---
>2eot/1-5
-----GPASV
>1rhpa/1-1
---------D
>1mgsa/1-6
----ASVATE
>1mi2a/1-6
----AVVASE
>1roda/1-4
------SAKE
>1il8a/1-3
-------AKE
>1sdf/1-6
----KPVSLS

From staffa at niehs.nih.gov Fri Apr 7 14:38:52 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Fri, 7 Apr 2006 14:38:52 -0400 Subject: [Bioperl-l] Bio::Restriction::Enzyme exception due to Bio::PrimarySeq Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08635@NIHCESMLBX6.nih.gov> Maybe this message won't have the suspicious header that got it bounced. Well! What to do when hit with this sort of message? I am not working with any circular sequences. line 46 is my @frags=$ra->fragments($enz); ------------- EXCEPTION ------------- MSG: Abstract method "Bio::PrimarySeqI::is_circular" is not implemented by package Bio::PrimarySeq::Fasta. This is not your fault - author of Bio::PrimarySeq::Fasta should be blamed! STACK Bio::Root::RootI::throw_not_implemented /usr/lib/perl5/site_perl/5.8.5/Bio/Root/RootI.pm:523 STACK Bio::PrimarySeqI::is_circular /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeqI.pm:671 STACK Bio::Restriction::Analysis::_cuts /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:807 STACK Bio::Restriction::Analysis::cut /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:370 STACK Bio::Restriction::Analysis::fragments /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:443 STACK toplevel TestNew.pl:46 -------------------------------------- Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L.
Field (field1 at niehs.nih.gov )) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina -------------- next part -------------- A non-text attachment was scrubbed... Name: TestNew.pl Type: application/octet-stream Size: 1284 bytes Desc: TestNew.pl Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060407/f1af0cfd/attachment.obj From e.rapsomaniki at mail.cryst.bbk.ac.uk Sat Apr 8 09:08:00 2006 From: e.rapsomaniki at mail.cryst.bbk.ac.uk (e.rapsomaniki at mail.cryst.bbk.ac.uk) Date: Sat, 8 Apr 2006 14:08:00 +0100 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails Message-ID: <1144501680.4437b5b02dff4@webmail.cryst.bbk.ac.uk> Hi I am trying to retrieve coding sequences associated with RefSeq proteins. My code (below) works for non-refseq proteins (e.g BAB26271) but not for refseq (no sequence features are retrieved although I checked the web-page and a coded_by feature should be there). Any suggestions? 
I am using bioperl 1.4. Here's my code:

use Bio::Seq;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::DB::RefSeq;
my $gb = new Bio::DB::GenBank;
my $gp = new Bio::DB::RefSeq;
my $prot_obj = $gp->get_Seq_by_acc("NP_001008293");
return unless defined($prot_obj);

# factory to turn strings into Bio::Location objects
my $loc_factory = new Bio::Factory::FTLocationFactory;
my $orf;

my @f=$prot_obj->top_SeqFeatures();
print "@f\n"; #returns nothing
foreach my $feat ( $prot_obj->top_SeqFeatures ) {
    print $feat->primary_tag, "\n";
    if ( $feat->primary_tag eq 'CDS' ) {

        my @coded_by = $feat->each_tag_value('coded_by');
        print @coded_by, "\n";
        my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
        #$nuc_acc=~ s/\..*//;
        my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
        return unless defined($nuc_obj);
        my $loc_object = $loc_factory->from_string($loc_str);
        # create a Feature object by using a Location
        my $feat_obj = new Bio::SeqFeature::Generic(-location =>$loc_object);
        # associate the Feature object with the nucleotide Seq object
        $nuc_obj->add_SeqFeature($feat_obj);
        my $cds_obj = $feat_obj->spliced_seq;
        $orf=$cds_obj->seq;
    }
}
print "$orf\n";

---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From cjfields at uiuc.edu Sun Apr 9 12:26:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 9 Apr 2006 11:26:38 -0500 Subject: [Bioperl-l] Bio::Restriction::Enzyme exception due toBio::PrimarySeq In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08635@NIHCESMLBX6.nih.gov> Message-ID: <000c01c65bf2$6094de20$15327e82@pyrimidine> Nick, I believe Brian fixed this in the last few days (after you posted this the first time). Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS) [C] > Sent: Friday, April 07, 2006 1:39 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Bio::Restriction::Enzyme exception due > toBio::PrimarySeq > > Maybe this message won't have the suspicious header that got it bounced. > > > Well! > What to do when hit with this sort of message? > I am not working with any circular sequences. > > line 46 is > my @frags=$ra->fragments($enz); > > > ------------- EXCEPTION ------------- > MSG: Abstract method "Bio::PrimarySeqI::is_circular" is not implemented by > package Bio::PrimarySeq::Fasta. > This is not your fault - author of Bio::PrimarySeq::Fasta should be > blamed! > > STACK Bio::Root::RootI::throw_not_implemented > /usr/lib/perl5/site_perl/5.8.5/Bio/Root/RootI.pm:523 > STACK Bio::PrimarySeqI::is_circular > /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeqI.pm:671 > STACK Bio::Restriction::Analysis::_cuts > /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:807 > STACK Bio::Restriction::Analysis::cut > /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:370 > STACK Bio::Restriction::Analysis::fragments > /usr/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:443 > STACK toplevel TestNew.pl:46 > > -------------------------------------- > <> > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. 
Field (field1 at niehs.nih.gov )) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > From cjfields at uiuc.edu Sun Apr 9 13:22:49 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 9 Apr 2006 12:22:49 -0500 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: <1144501680.4437b5b02dff4@webmail.cryst.bbk.ac.uk> Message-ID: <000d01c65bfa$3b31f840$15327e82@pyrimidine> That's because Bio::DB::RefSeq retrieves the data from EBI, not from NCBI. The EBI version has no seq. features but the NCBI version does (I believe NCBI adds them provisionally). Retrieving via Bio::DB::GenBank is the way to go, but you must add the -no_redirect flag to prevent Bio::DB::GenBank from redirecting to Bio::DB::RefSeq:

my $factory = Bio::DB::GenBank->new(-no_redirect => 1);
my $seq = $factory->get_Seq_by_acc('NM_001008292');

As a warning, RefSeqs are nonstandard GenBank, but I believe I have parsed them for seq features before w/o problems. From perldoc for Bio::DB::RefSeq: DESCRIPTION Allows the dynamic retrieval of sequence objects Bio::Seq from the RefSeq database using the dbfetch script at EBI: . In order to make changes transparent we have host type (currently only ebi) and location (defaults to ebi) separated out. This allows later additions of more servers in different geographical locations. The functionality of this module is inherited from Bio::DB::DBFetch which implements Bio::DB::WebDBSeqI. This module retrieves entries from EBI although it retrives database entries produced at NCBI. When read into bioperl objects, the parser for GenBank format it used. RefSeq is a NONSTANDARD GenBank file so be ready for surprises. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of > e.rapsomaniki at mail.cryst.bbk.ac.uk > Sent: Saturday, April 08, 2006 8:08 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails > > > Hi > > I am trying to retrieve coding sequences associated with RefSeq proteins. > My > code (below) works for non-refseq proteins (e.g BAB26271) but not for > refseq > (no sequence > features are retrieved although I checked the web-page and a coded_by > feature > should be there). Any suggestions? I am using bioperl 1.4 > > Here's my code: > use Bio::Seq; > use Bio::DB::GenPept; > use Bio::DB::GenBank; > use Bio::DB::RefSeq; > my $gb = new Bio::DB::GenBank; > my $gp = new Bio::DB::RefSeq; > my $prot_obj = $gp->get_Seq_by_acc("NP_001008293"); > return unless defined($prot_obj); > > # factory to turn strings into Bio::Location objects > my $loc_factory = new Bio::Factory::FTLocationFactory; > my $orf; > > my @f=$prot_obj->top_SeqFeatures(); > print "@f\n"; #returns nothing > foreach my $feat ( $prot_obj->top_SeqFeatures ) { > print $feat->primary_tag, "\n"; > if ( $feat->primary_tag eq 'CDS' ) { > > my @coded_by = $feat->each_tag_value('coded_by'); > print @coded_by, "\n"; > my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0]; > #$nuc_acc=~ s/\..*//; > my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc); > return unless defined($nuc_obj); > my $loc_object = $loc_factory->from_string($loc_str); > # create a Feature object by using a Location > my $feat_obj = new Bio::SeqFeature::Generic(-location > =>$loc_object); > # associate the Feature object with the nucleotide Seq object > $nuc_obj->add_SeqFeature($feat_obj); > my $cds_obj = $feat_obj->spliced_seq; > $orf=$cds_obj->seq; > } > } > print "$orf\n"; > > > ---------------------------------------------------------------- > This 
message was sent using IMP, the Internet Messaging Program. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Sun Apr 9 13:43:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 9 Apr 2006 12:43:32 -0500 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: <000d01c65bfa$3b31f840$15327e82@pyrimidine> Message-ID: <000001c65bfd$1ed2e8f0$15327e82@pyrimidine> Forgot to add: I'm not sure when -no_redirect was added, but if this doesn't work you might consider upgrading to 1.5.1 or CVS if possible; 1.5.1 is fairly stable for a developer's release, and CVS versions are also pretty stable. v. 1.4 is ~2 years old. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Sunday, April 09, 2006 12:23 PM > To: e.rapsomaniki at mail.cryst.bbk.ac.uk; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins > fails > > That's because Bio::DB::RefSeq retrieves the data from EBI, not from NCBI. > The EBI version has no seq. features but the NCBI version does (I believe > NCBI adds them provisionally). Retrieving via Bio::DB::GenBank is the way > to go, but you must add the -no_redirect flag to prevent Bio::DB::GenBank > from redirecting to Bio::DB::RefSeq: > > my $factory = Bio::DB::GenBank->new(-no_redirect => 1); > my $seq = $factory->get_Seq_by_acc('NM_001008292'); > > As a warning, RefSeqs are nonstandard GenBank, but I believe I have parsed > them for seq features before w/o problems.
From perldoc for > Bio::DB::RefSeq: > > DESCRIPTION > Allows the dynamic retrieval of sequence objects Bio::Seq from the > RefSeq database using the dbfetch script at EBI: > . > > In order to make changes transparent we have host type (currently only > ebi) and location (defaults to ebi) separated out. This allows later > additions of more servers in different geographical locations. > > The functionality of this module is inherited from Bio::DB::DBFetch > which implements Bio::DB::WebDBSeqI. > > This module retrieves entries from EBI although it retrives database > entries produced at NCBI. When read into bioperl objects, the parser > for > GenBank format it used. RefSeq is a NONSTANDARD GenBank file so be > ready > for surprises. > > Chris > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of > > e.rapsomaniki at mail.cryst.bbk.ac.uk > > Sent: Saturday, April 08, 2006 8:08 AM > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins > fails > > > > > > Hi > > > > I am trying to retrieve coding sequences associated with RefSeq > proteins. > > My > > code (below) works for non-refseq proteins (e.g BAB26271) but not for > > refseq > > (no sequence > > features are retrieved although I checked the web-page and a coded_by > > feature > > should be there). Any suggestions? 
I am using bioperl 1.4 > > > > Here's my code: > > use Bio::Seq; > > use Bio::DB::GenPept; > > use Bio::DB::GenBank; > > use Bio::DB::RefSeq; > > my $gb = new Bio::DB::GenBank; > > my $gp = new Bio::DB::RefSeq; > > my $prot_obj = $gp->get_Seq_by_acc("NP_001008293"); > > return unless defined($prot_obj); > > > > # factory to turn strings into Bio::Location objects > > my $loc_factory = new Bio::Factory::FTLocationFactory; > > my $orf; > > > > my @f=$prot_obj->top_SeqFeatures(); > > print "@f\n"; #returns nothing > > foreach my $feat ( $prot_obj->top_SeqFeatures ) { > > print $feat->primary_tag, "\n"; > > if ( $feat->primary_tag eq 'CDS' ) { > > > > my @coded_by = $feat->each_tag_value('coded_by'); > > print @coded_by, "\n"; > > my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0]; > > #$nuc_acc=~ s/\..*//; > > my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc); > > return unless defined($nuc_obj); > > my $loc_object = $loc_factory->from_string($loc_str); > > # create a Feature object by using a Location > > my $feat_obj = new Bio::SeqFeature::Generic(-location > > =>$loc_object); > > # associate the Feature object with the nucleotide Seq object > > $nuc_obj->add_SeqFeature($feat_obj); > > my $cds_obj = $feat_obj->spliced_seq; > > $orf=$cds_obj->seq; > > } > > } > > print "$orf\n"; > > > > > > ---------------------------------------------------------------- > > This message was sent using IMP, the Internet Messaging Program. 
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Sun Apr 9 16:37:36 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Sun, 9 Apr 2006 16:37:36 -0400 Subject: [Bioperl-l] GFF3 Validator Message-ID: <200604091637.37674.lstein@cshl.edu> Hi All, I just wanted to draw your attention to a GFF3 validator put up by the NIAID BRC program. It checks that the GFF3 file is well formed and that only SO terms are used in the feature type column. It doesn't check that parent/child relationships follow the spec, but it is a real good start. The documentation: http://iowg.brcdevel.org/gff3.html Standalone script: http://iowg.brcdevel.org/gff3validator/ Web-based script: http://www.tigr.org/tigr-scripts/prok_manatee/brc-central/gff_validation.cgi Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cain at cshl.edu Fri Apr 7 22:49:42 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 07 Apr 2006 22:49:42 -0400 Subject: [Bioperl-l] Display of gff annotation In-Reply-To: References: Message-ID: <1144464582.28535.220.camel@localhost.localdomain> Hi Marco, In answer to your questions: 1. You might find it easier to use Bio::DB::GFF with your GFF file; you can use the 'memory' adaptor to have it use a file instead of a 'real' database. It will give you Bio::DB::GFF::Feature objects which you can use in the same way as your Bio::SeqFeature::Generic objects. 2.
To color your exons differently, you can use a callback in the bgcolor attribute:

    $panel->add_track($full_length,
                      -glyph   => 'arrow',
                      -tick    => 2,
                      -fgcolor => 'black',
                      -double  => 1,
                      -bgcolor => sub {
                          my $feature = shift;
                          my ($note) = $feature->attributes('note');
                          if ($note =~ /\./) {
                              return 'red';
                          } else {
                              return 'blue';
                          }
                      }
                     );

My only comment is that your GFF is a little funny looking: the "( -" in the source column might result in some strange behavior. Scott On Fri, 2006-04-07 at 18:54 -0700, Marco Blanchette wrote: > Dear all-- > > I have a gff annotation in the form of: > > ##gff-version 2 > # seqname source feature start end score strand frame > attributes > AF260530 ( - exon 262 693 . + . note "1" > AF260530 ( - exon 4450 4630 . + . note "2" > AF260530 ( - exon 13432 13776 . + . note "3" > AF260530 ( - exon 15198 15359 . + . note "4.1" > AF260530 ( - exon 15537 15698 . + . note "4.2" > AF260530 ( - exon 16060 16221 . + . note "4.3" > AF260530 ( - exon 16682 16852 . + . note "4.4" > AF260530 ( - exon 16985 17146 . + . note "4.5" > AF260530 ( - exon 16985 17146 . + . note "5" > > > Where the value of the tag 'note' is the exon name. all exons with a decimal > annotation are alternatively spliced exon. My goal would be to display the > alternative exons in a different color (let say red) than the constitutive > exons (let say blue).
> > I have this script: > #!/usr/bin/perl > > use strict; > use Bio::Tools::GFF; > use Bio::Graphics; > use Bio::SeqFeature::Generic; > > > my $gffio = Bio::Tools::GFF->new(-file => $ARGV[0], > -gff_version => 2, > ); > > my $feat_o = Bio::SeqFeature::Generic->new(-name => 'aGene'); > > while (my $feature = $gffio->next_feature) { > $feat_o->add_SeqFeature($feature,'EXPAND'); > } > > my $panel = Bio::Graphics::Panel->new( > -length => $feat_o->end, > -width => 1500, > -pad_left => 10, > -pad_right => 10, > ); > > my $full_length = Bio::SeqFeature::Generic->new(-start=>1, > -end=>$feat_o->end); > > $panel->add_track($full_length, > -glyph => 'arrow', > -tick => 2, > -fgcolor => 'black', > -double => 1, > ); > > $panel->add_track($feat_o, > -glyph=> 'segments'); > print $panel->png > > > That does almost what I need. My questions are: > 1) is there a better way to transform the GFF file into a SeqFeature object > than the while loop that I am using > > 2) How could I tell the Bio::Graphics::Panel object to color the exon with a > '.' in red and the others in blue. > > Many thanks all > > Marco > > > > Marco Blanchette, Ph.D. > > mblanche at berkeley.edu > > Donald C. Rio's lab > Department of Molecular and Cell Biology > 16 Barker Hall > University of California > Berkeley, CA 94720-3204 > > Tel: (510) 642-1084 > Cell: (510) 847-0996 > Fax: (510) 642-6062 > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From kevinvicy at gmail.com Sun Apr 9 18:59:47 2006 From: kevinvicy at gmail.com (Kevin Victor) Date: Sun, 9 Apr 2006 17:59:47 -0500 Subject: [Bioperl-l] primer pair sequence search Message-ID: Hi, I am interested to know if there is any optimal way to search for primer pair sequences in a given set of longer sequences. I would like to do this in batch mode. This would be more like searching for substrings in a string where the substrings we are searching are several thousands and are known. Any suggestions would be very helpful. Thanks, Kevin From dalke at dalkescientific.com Mon Apr 10 01:33:31 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 9 Apr 2006 23:33:31 -0600 Subject: [Bioperl-l] GFF3 Validator In-Reply-To: <200604091637.37674.lstein@cshl.edu> References: <200604091637.37674.lstein@cshl.edu> Message-ID: <020b1642e641e08b241171a92f87bc96@dalkescientific.com> Lincoln: > I just wanted to draw your attention to a GFF3 validator put up by the > NIAID > BRC program. It checks that the GFF3 file is well formed and that only > SO > terms are used in the feature type column. It doesn't check that > parent/child > relationships follow the spec, but it is a real good start. > > The documentation: http://iowg.brcdevel.org/gff3.html > Standalone script: http://iowg.brcdevel.org/gff3validator/ > Web-based script: > http://www.tigr.org/tigr-scripts/prok_manatee/brc-central/gff_validation.cgi As part of the DAS2 project a couple weeks ago I wrote a basic GFF3 parser in Python. If anyone is interested, I've put a snapshot at http://www.dalkescientific.com/PyGFF3-0.5.tar.gz There are a few performance tricks which might be translatable to the bioperl or gff3validator codes. Mostly just tricks to optimize for the common case. I bring it up to point out I've implemented but not rigorously tested the validation code to ensure that features have no cycles.
Once I find all of the features in a set I do a topological sort on them. If no topological sort is possible, there's a cycle and hence an error. The code for toposort.py is very simple, with about 15 lines for the initialization and another 15 for the recursive part. It should be easy to convert just that part without having to go through the rest of the Python code. Please note that so far I have only parsed two gff files with this parser. It is incomplete and not tested. I wanted to get an idea of how hard it is to handle GFF-like complex features. Hard enough that I'm going to propose a change to DAS2 to make it easier. Andrew dalke at dalkescientific.com From akarger at CGR.Harvard.edu Mon Apr 10 13:06:32 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 10 Apr 2006 13:06:32 -0400 Subject: [Bioperl-l] Getting sequences by ID Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060410/a9cade68/attachment.pl From donald.jackson at bms.com Mon Apr 10 16:36:16 2006 From: donald.jackson at bms.com (Donald Jackson) Date: Mon, 10 Apr 2006 16:36:16 -0400 Subject: [Bioperl-l] primer pair sequence search In-Reply-To: References: Message-ID: <443AC1C0.6090908@bms.com> Take a look at NCBI's E-PCR program. It can test PCR primers against a library of genomic or mRNA sequences and identify all predicted products within a given size range. It has a web interface or can be run locally. Don Jackson Kevin Victor wrote: >Hi, I am interested to know if there is any optimal way to search for primer >pair sequences in a given set of longer sequences. I would like to do this >in batch mode. This would be more like searching for substrings in a string >where the substrings we are searching are several thousands and are known. >Any suggestions would be very helpful.
> >Thanks, >Kevin > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From osborne1 at optonline.net Mon Apr 10 23:32:59 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 10 Apr 2006 23:32:59 -0400 Subject: [Bioperl-l] SimpleAlign Question In-Reply-To: <8cff3eb80604080929o5e2badf8kf31e3be48e87ea62@mail.gmail.com> Message-ID: Iain, You can pass a third argument to slice(); it tells the method to keep sequences with no valid characters, e.g.: $aln2 = $aln1->slice(22,33,1); I'll document this a bit more... Brian O. On 4/8/06 12:29 PM, "Iain Wallace" wrote: > Hi all, > > I am trying to extract slices out of an alignment, but sometimes the > slice I am trying to take contains a sequence that has no residues. I > would still like to add this sequence, unfortunately I am getting this > error > > ------------- EXCEPTION ------------- > MSG: Got a sequence with no letters in it cannot guess alphabet [] > STACK Bio::PrimarySeq::_guess_alphabet > /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:837 > STACK Bio::PrimarySeq::seq > /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:279 > STACK Bio::SimpleAlign::slice > /usr/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm:794 > > Is there a simple way I can override this behaviour, which I think is > just validating the sequence before it adds it to the alignment. > > The code I am usinge is below followed by a test alignment, > > Thanks for any help > > Iain > --script > use strict; > > use Bio::AlignIO; > #read in the initial alignment from the command line.
> > my $in = $ARGV[0]; > > my $alignio = Bio::AlignIO->new(-file=> $in); > my $out = Bio::AlignIO->new(-file=> ">test.fasta"); > > my $aln = $alignio->next_aln; > my $new_aln = new Bio::SimpleAlign; > $new_aln=$aln->slice(1,10); > $out->write_aln($new_aln); > > --sample file > >> 1huma/1-7 > APMGSDP--- >> 1b3aa/1-5 > -PYSS-D--- >> 2eot/1-5 > -----GPASV >> 1rhpa/1-1 > ---------D >> 1mgsa/1-6 > ----ASVATE >> 1mi2a/1-6 > ----AVVASE >> 1roda/1-4 > ------SAKE >> 1il8a/1-3 > -------AKE >> 1sdf/1-6 > ----KPVSLS > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Apr 11 09:52:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 11 Apr 2006 08:52:45 -0500 Subject: [Bioperl-l] BioPerl Mailing List Summaries Message-ID: <000601c65d6f$361ed0a0$15327e82@pyrimidine> The first of a biweekly summary of BioPerl mailing list summaries has been posted to the wiki: http://www.bioperl.org/wiki/BioPerl_Mailing_List_Summaries_for_April_1-11 These will likely be archived on the wiki but may be moved to a more suitable location in the future (maybe to the news blog?). Any suggestions are welcome. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Tue Apr 11 10:51:26 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 11 Apr 2006 10:51:26 -0400 Subject: [Bioperl-l] BioPerl Mailing List Summaries In-Reply-To: <000601c65d6f$361ed0a0$15327e82@pyrimidine> Message-ID: Chris, That's very nice. It's always the case that there are active threads whose details aren't always clear to me; having them summarized is very useful. Thanks again, Brian O.
On 4/11/06 9:52 AM, "Chris Fields" wrote: > The first of a biweekly summary of BioPerl mailing list summaries has been > posted to the wiki: > > http://www.bioperl.org/wiki/BioPerl_Mailing_List_Summaries_for_April_1-11 > > These will likely be archived on the wiki but may be moved to a more > suitable location in the future (maybe to the news blog?). Any suggestions > are welcome. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.rapsomaniki at mail.cryst.bbk.ac.uk Tue Apr 11 03:30:38 2006 From: e.rapsomaniki at mail.cryst.bbk.ac.uk (Eleni Rapsomaniki) Date: Tue, 11 Apr 2006 08:30:38 +0100 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: <000001c65bfd$1ed2e8f0$15327e82@pyrimidine> References: <000001c65bfd$1ed2e8f0$15327e82@pyrimidine> Message-ID: <1144740638.443b5b1ee78c9@webmail.cryst.bbk.ac.uk> Hi again, Thank you for your reply regarding the retrieval of sequence features for Refseq proteins. I have updated to bioperl 1.5 but the re-direct still does not work and the following code: my $factory = Bio::DB::GenBank->new(-no_redirect => 1); my $seq = $factory->get_Seq_by_acc(' NM_001008292'); returns: MSG: [gb| NM_001008292] is not a normal sequence entry but a RefSeq entry. Redirecting the request. -------------------- WARNING --------------------- MSG: acc (gb| NM_001008292) does not exist --------------------------------------------------- I get the same error when I replace Bio::DB::GenBank with Bio::DB::RefSeq, and with many other refseq accessions. Could I be doing something wrong or maybe I should be using a different method for parsing refseq records? I suppose the alternative is to create objects from files after fetching them.. 
Many thanks Eleni Rapsomaniki ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From cjfields at uiuc.edu Tue Apr 11 18:06:57 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 11 Apr 2006 17:06:57 -0500 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: <1144740638.443b5b1ee78c9@webmail.cryst.bbk.ac.uk> Message-ID: <000801c65db4$400d55b0$15327e82@pyrimidine> I'll look into it tomorrow, but you might try updating to 1.5.1 or CVS to see if it helps. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Eleni Rapsomaniki > Sent: Tuesday, April 11, 2006 2:31 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins > fails > > Hi again, > > Thank you for your reply regarding the retrieval of sequence features for > Refseq proteins. I have updated to bioperl 1.5 but the re-direct still > does not > work and the following code: > > my $factory = Bio::DB::GenBank->new(-no_redirect => 1); > my $seq = $factory->get_Seq_by_acc(' NM_001008292'); > > returns: > MSG: [gb| NM_001008292] is not a normal sequence entry but a RefSeq entry. > Redirecting the request. > > -------------------- WARNING --------------------- > MSG: acc (gb| NM_001008292) does not exist > --------------------------------------------------- > > I get the same error when I replace Bio::DB::GenBank with Bio::DB::RefSeq, > and > with many other refseq accessions. > > Could I be doing something wrong or maybe I should be using a different > method > for parsing refseq records? I suppose the alternative is to create objects > from > files after fetching them.. 
> > Many thanks > Eleni Rapsomaniki > > > > ---------------------------------------------------------------- > This message was sent using IMP, the Internet Messaging Program. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Tue Apr 11 18:14:52 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 11 Apr 2006 18:14:52 -0400 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: <1144740638.443b5b1ee78c9@webmail.cryst.bbk.ac.uk> Message-ID: Eleni, I can retrieve NM_001008292, no problem: ~>perl -e 'use Bio::DB::GenBank; $db = Bio::DB::GenBank->new(-no_redirect => 1); $seq = $db->get_Seq_by_acc('NM_001008292');print $seq->seq;' CCGGCGTCCGTCCGCCGCGGGATGGCGGCTCTGAGGAGTTGGATGACTCGGAGCGTAACCTTTCTGTTCAGGTACG GACAGCGTTTTCCTGTCTCTGCTAACTCTAAGAAACGCTGTTTCTCAGAATTGATAAGACCATGGCACAAAACTAT GTTGACTGGATTCGGTGTGACACTGTGCGCCGTTCCTATTGCTCAGAAATCAGAGCCTCAGTCTCTCAGCAACGAA GCACTGATGAGGAGGGCTGTGTCTTTGGTAACGGACAGCACCTCTACCTTTCTGTCTCAAACCACGTACGCACTGA TCGAAGCCATCACCGAGTATACTAAGGCTGTCTACACGTTAGTGTCTCTGTACCGACAGTATACAAGCTTGCTGGG GAAGATGAATTCCCAGGAGGAGGATGAGGTGTGGCAGGTGATCATAGGAGCCAGAGTTGAGATGACTTCAAAGCAG CAGGAGTACTTGAAGCTGGAGACCACGTGGATGACAGCAGTCAGCCTTTCAGAGATGGCTGCGGAGGCTGCCTATC AGACTGGAGCAGATCAGGCCTCTATAACCGCCAGGAATCACATCCAGTTGGTGAAGTCGCAGGTGCAGGAGGTGCG CCAGCTCTCCCAGAAGGCGGAAACCAAGCTGGCAGAAGTGCAGACACAAGAGCTGCGCCAGAAAACACAGGAAGCG AGCGATGAGGCAGCGGACCAGGAAGAGGAGGCCTACCTACGAGAAGATTGAGGGCTCGAGCCCAGTGCCCTGTCCA TCCACTCTGTGGGGAAAGCAGGTGCATGACATCCACCCAGTGACGTCCCAACTGCAGAAGCTGACCGGTTCTGCCA TTGACAGTCAGACCAGAGCCTTTGCGAGCTGCGCCTGGCCCCTTCTCTCTGCTCGCAGCCTCTCTGGGCCTGGCTC TGCACTGTCCCTCACACAGCTTCTCTCTTGATCTTTTACCTCACTCCCAAAGCACTTCACCAACCGGGGCCAATGG AGGAGGGGCCTTTTCTGCCACACCCTTAAGTTCAATAGCTGTTTAACTCCAGTTTTTACTGTTACTCAGATGTTCA AGTATGAATTACTGCTTGCTCCTCCACAGGGAAGCTTGTCTGGTTTGTAACATTTCTTTGTGTTTATAATGTCCTT 
TCTCCCTGTGAGCACAGCTCAGCTAAGGCGTTACTCAGTGTGAACAGTTCCCCTGGTGCTCCCCACAGCACCTTCT CCAACACGTGCTTCTTGTTCGTTCCTTTTTTGAATTTCTCTGCTGTATCCAAAGGAAGAGAAGTTTGGTGTTTGCA TTAAAAAAAAAAAAAAAGTCAAAAAAAAAAAAAAAAAAAAA Could it be the ' ' before the string 'NM_001008292' that's causing the problem? Brian O. On 4/11/06 3:30 AM, "Eleni Rapsomaniki" wrote: > Hi again, > > Thank you for your reply regarding the retrieval of sequence features for > Refseq proteins. I have updated to bioperl 1.5 but the re-direct still does > not > work and the following code: > > my $factory = Bio::DB::GenBank->new(-no_redirect => 1); > my $seq = $factory->get_Seq_by_acc(' NM_001008292'); > > returns: > MSG: [gb| NM_001008292] is not a normal sequence entry but a RefSeq entry. > Redirecting the request. > > -------------------- WARNING --------------------- > MSG: acc (gb| NM_001008292) does not exist > --------------------------------------------------- > > I get the same error when I replace Bio::DB::GenBank with Bio::DB::RefSeq, and > with many other refseq accessions. > > Could I be doing something wrong or maybe I should be using a different method > for parsing refseq records? I suppose the alternative is to create objects > from > files after fetching them.. > > Many thanks > Eleni Rapsomaniki > > > > ---------------------------------------------------------------- > This message was sent using IMP, the Internet Messaging Program. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Apr 11 22:27:58 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 11 Apr 2006 21:27:58 -0500 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: References: Message-ID: I got this to work as well using Mac OS X. 
Looking at CVS, I noticed that 'no_redirect' was added about three months ago by Brian, so you definitely need to update to CVS (NOT v. 1.5.1). Chris On Apr 11, 2006, at 5:14 PM, Brian Osborne wrote: > Eleni, > > I can retrieve NM_001008292, no problem: > > ~>perl -e 'use Bio::DB::GenBank; $db = Bio::DB::GenBank->new(- > no_redirect > => 1); $seq = $db->get_Seq_by_acc('NM_001008292');print $seq->seq;' > CCGGCGTCCGTCCGCCGCGGGATGGCGGCTCTGAGGAGTTGGATGACTCGGAGCGTAACCTTTCTGTTCA > GGTACG > GACAGCGTTTTCCTGTCTCTGCTAACTCTAAGAAACGCTGTTTCTCAGAATTGATAAGACCATGGCACAA > AACTAT > GTTGACTGGATTCGGTGTGACACTGTGCGCCGTTCCTATTGCTCAGAAATCAGAGCCTCAGTCTCTCAGC > AACGAA > GCACTGATGAGGAGGGCTGTGTCTTTGGTAACGGACAGCACCTCTACCTTTCTGTCTCAAACCACGTACG > CACTGA > TCGAAGCCATCACCGAGTATACTAAGGCTGTCTACACGTTAGTGTCTCTGTACCGACAGTATACAAGCTT > GCTGGG > GAAGATGAATTCCCAGGAGGAGGATGAGGTGTGGCAGGTGATCATAGGAGCCAGAGTTGAGATGACTTCA > AAGCAG > CAGGAGTACTTGAAGCTGGAGACCACGTGGATGACAGCAGTCAGCCTTTCAGAGATGGCTGCGGAGGCTG > CCTATC > AGACTGGAGCAGATCAGGCCTCTATAACCGCCAGGAATCACATCCAGTTGGTGAAGTCGCAGGTGCAGGA > GGTGCG > CCAGCTCTCCCAGAAGGCGGAAACCAAGCTGGCAGAAGTGCAGACACAAGAGCTGCGCCAGAAAACACAG > GAAGCG > AGCGATGAGGCAGCGGACCAGGAAGAGGAGGCCTACCTACGAGAAGATTGAGGGCTCGAGCCCAGTGCCC > TGTCCA > TCCACTCTGTGGGGAAAGCAGGTGCATGACATCCACCCAGTGACGTCCCAACTGCAGAAGCTGACCGGTT > CTGCCA > TTGACAGTCAGACCAGAGCCTTTGCGAGCTGCGCCTGGCCCCTTCTCTCTGCTCGCAGCCTCTCTGGGCC > TGGCTC > TGCACTGTCCCTCACACAGCTTCTCTCTTGATCTTTTACCTCACTCCCAAAGCACTTCACCAACCGGGGC > CAATGG > AGGAGGGGCCTTTTCTGCCACACCCTTAAGTTCAATAGCTGTTTAACTCCAGTTTTTACTGTTACTCAGA > TGTTCA > AGTATGAATTACTGCTTGCTCCTCCACAGGGAAGCTTGTCTGGTTTGTAACATTTCTTTGTGTTTATAAT > GTCCTT > TCTCCCTGTGAGCACAGCTCAGCTAAGGCGTTACTCAGTGTGAACAGTTCCCCTGGTGCTCCCCACAGCA > CCTTCT > CCAACACGTGCTTCTTGTTCGTTCCTTTTTTGAATTTCTCTGCTGTATCCAAAGGAAGAGAAGTTTGGTG > TTTGCA > TTAAAAAAAAAAAAAAAGTCAAAAAAAAAAAAAAAAAAAAA > > Could it be the ' ' before the string 'NM_001008292' that's causing > the > problem? > > Brian O. 
> > > > On 4/11/06 3:30 AM, "Eleni Rapsomaniki" > > wrote: > >> Hi again, >> >> Thank you for your reply regarding the retrieval of sequence >> features for >> Refseq proteins. I have updated to bioperl 1.5 but the re-direct >> still does >> not >> work and the following code: >> >> my $factory = Bio::DB::GenBank->new(-no_redirect => 1); >> my $seq = $factory->get_Seq_by_acc(' NM_001008292'); >> >> returns: >> MSG: [gb| NM_001008292] is not a normal sequence entry but a >> RefSeq entry. >> Redirecting the request. >> >> -------------------- WARNING --------------------- >> MSG: acc (gb| NM_001008292) does not exist >> --------------------------------------------------- >> >> I get the same error when I replace Bio::DB::GenBank with >> Bio::DB::RefSeq, and >> with many other refseq accessions. >> >> Could I be doing something wrong or maybe I should be using a >> different method >> for parsing refseq records? I suppose the alternative is to create >> objects >> from >> files after fetching them.. >> >> Many thanks >> Eleni Rapsomaniki >> >> >> >> ---------------------------------------------------------------- >> This message was sent using IMP, the Internet Messaging Program. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Tue Apr 11 22:47:17 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 11 Apr 2006 22:47:17 -0400 Subject: [Bioperl-l] BioPerl Mailing List Summaries In-Reply-To: <000601c65d6f$361ed0a0$15327e82@pyrimidine> References: <000601c65d6f$361ed0a0$15327e82@pyrimidine> Message-ID: <4E4A73D2-3466-4A3E-A828-1C401897358E@gmx.net> Nice read indeed! On Apr 11, 2006, at 9:52 AM, Chris Fields wrote: > The first of a biweekly summary of BioPerl mailing list summaries > has been > posted to the wiki: > > http://www.bioperl.org/wiki/ > BioPerl_Mailing_List_Summaries_for_April_1-11 > > These will likely be archived on the wiki but may be moved to a more > suitable location in the future (maybe to the news blog?). Any > suggestions > are welcome. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From torsten.seemann at infotech.monash.edu.au Tue Apr 11 22:37:20 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 12 Apr 2006 12:37:20 +1000 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: <1144740638.443b5b1ee78c9@webmail.cryst.bbk.ac.uk> References: <000001c65bfd$1ed2e8f0$15327e82@pyrimidine> <1144740638.443b5b1ee78c9@webmail.cryst.bbk.ac.uk> Message-ID: <443C67E0.9020404@infotech.monash.edu.au> Eleni, > my $factory = Bio::DB::GenBank->new(-no_redirect => 1); > my $seq = $factory->get_Seq_by_acc(' NM_001008292'); You have a space character before the NM_ ? Try removing it. 
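A stray leading space like this is easy to pick up when accessions are read from a file or pasted by hand. The following is a minimal plain-Perl sketch, not code from this thread, that normalizes accession strings before they reach get_Seq_by_acc. The helper name clean_acc is hypothetical and not part of BioPerl, and the BioPerl call is left as a comment so the sketch runs without BioPerl or network access:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove leading and trailing
# whitespace, including a newline left over from reading a file.
sub clean_acc {
    my ($acc) = @_;
    return '' unless defined $acc;
    $acc =~ s/^\s+//;
    $acc =~ s/\s+$//;
    return $acc;
}

# Example inputs: a pasted accession with a leading space, one read from a
# file with its newline still attached, a clean one, and a blank entry.
my @raw  = (' NM_001008292', "NP_001008293\n", 'BAB26271', '   ');

# Trim every entry, then drop any that are empty after trimming.
my @accs = grep { length $_ } map { clean_acc($_) } @raw;

for my $acc (@accs) {
    print "[$acc]\n";
    # With BioPerl installed, the cleaned string would be passed on, e.g.:
    #   my $seq = $factory->get_Seq_by_acc($acc);
}
```

Trimming at the point where accessions are read in means every later query, whether to Bio::DB::GenBank or Bio::DB::RefSeq, sees the same clean string.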
> returns: > MSG: [gb| NM_001008292] is not a normal sequence entry but a RefSeq entry. > Redirecting the request. There's the space, right in the error message. Is that the problem? Brian's counterexample didn't have the space. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From e.rapsomaniki at mail.cryst.bbk.ac.uk Wed Apr 12 04:09:57 2006 From: e.rapsomaniki at mail.cryst.bbk.ac.uk (Eleni Rapsomaniki) Date: Wed, 12 Apr 2006 09:09:57 +0100 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: References: Message-ID: <1144829397.443cb5d5047ea@webmail.cryst.bbk.ac.uk> Hi again, Thank you for your replies. It will sound crazy, but the same code, i.e. my $factory = Bio::DB::GenBank->new(-no_redirect => 1); print my $seq = $factory->get_Seq_by_acc('NM_001008292'); used to work 3-4 days ago, then stopped working 2 days ago (despite updating to 1.5) and today (no more updates or changes!) it works again! Could be because I never logged off in between the bioperl updates.. I'll try to persuade our administrator to update to CVS (life is harder without admin rights..) Thank you so much for your help. Eleni ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From osborne1 at optonline.net Wed Apr 12 10:08:59 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 12 Apr 2006 10:08:59 -0400 Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails In-Reply-To: <1144829397.443cb5d5047ea@webmail.cryst.bbk.ac.uk> Message-ID: Eleni, If you have sufficient space you can install Bioperl in your home directory; the necessary steps are explained in the INSTALL file (http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL). Brian O. On 4/12/06 4:09 AM, "Eleni Rapsomaniki" wrote: > Hi again, > > Thank you for your replies. It will sound crazy but the same code,ie.
> my $factory = Bio::DB::GenBank->new(-no_redirect => 1); > print my $seq = $factory->get_Seq_by_acc('NM_001008292'); > > used to work 3-4 days ago, then stopped working 2 days ago (despite updating > to > 1.5) and today (no more updates or changes!) it works again! Could be because > I > never logged off in between the bioperl updates.. > I'll try to persuade our administrator to update to CVs (life is harder > without > admin rights..) > > Thank you so much for your help. > Eleni > > > > > ---------------------------------------------------------------- > This message was sent using IMP, the Internet Messaging Program. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maximilianh at gmail.com Wed Apr 12 10:24:00 2006 From: maximilianh at gmail.com (Maximilian Haeussler) Date: Wed, 12 Apr 2006 16:24:00 +0200 Subject: [Bioperl-l] primer pair sequence search In-Reply-To: References: Message-ID: <76f031ae0604120724v5b61d8a3n85b2d3a1e0eb0264@mail.gmail.com> if you want to test primers against LARGE sequences: have a look at UCSC's in silico PCR program (isPcr), accessible from the browser. They explain in their faq how to download and compile the source files if you want to use it locally. Max On 10/04/06, Kevin Victor wrote: > Hi, I am interested to know if there is any optimal way to search for primer > pair sequences in a given set of longer sequences. I would like to do this > in batch mode. This would be more like searching for substrings in a string > where the substrings we are searching are several thousands and are known. > Any suggestions would be very helpful. 
> > Thanks, > Kevin > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Maximilian Haeussler, CNRS Gif-sur-Yvette, Paris tel: +33 6 12 82 76 16 icq: 3825815 -- msn: maximilian.haeussler at hpi.uni-potsdam.de skype: maximilianhaeussler From cjm at fruitfly.org Thu Apr 13 23:42:55 2006 From: cjm at fruitfly.org (chris mungall) Date: Thu, 13 Apr 2006 20:42:55 -0700 Subject: [Bioperl-l] [SO-devel] GFF3 Validator In-Reply-To: References: <200604091637.37674.lstein@cshl.edu> Message-ID: <0C01EE11-79C5-4C0E-AC93-DC3124BACBA0@fruitfly.org> There's also a script in cvs/song/software/scripts/validate_features.pl - it focuses purely on validating the parent/child relationships w.r.t. SO. It uses Bio::Tools::GFF; it should probably be updated to use the latest bioperl code. On Apr 13, 2006, at 8:32 PM, Suzanna Lewis wrote: > That is nice. Thanks for the pointer. > > -S > > On Apr 9, 2006, at 1:37 PM, Lincoln Stein wrote: > >> Hi All, >> >> I just wanted to draw your attention to a GFF3 validator put up by >> the NIAID >> BRC program. It checks that the GFF3 file is well formed and that >> only SO >> terms are used in the feature type column. It doesn't check that >> parent/child >> relationships follow the spec, but it is a real good start. >> >> The documentation: http://iowg.brcdevel.org/gff3.html >> Standalone script: http://iowg.brcdevel.org/gff3validator/ >> Web-based script: >> http://www.tigr.org/tigr-scripts/prok_manatee/brc-central/gff_validation.cgi >> >> Lincoln >> >> -- >> Lincoln D.
Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu >> >> >> _______________________________________________ >> SOng-devel mailing list >> SOng-devel at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/song-devel > > > > _______________________________________________ > SOng-devel mailing list > SOng-devel at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/song-devel From kaimueller at uni-bonn.de Thu Apr 13 20:38:00 2006 From: kaimueller at uni-bonn.de (Kai Müller) Date: Thu, 13 Apr 2006 20:38:00 -0400 Subject: [Bioperl-l] Bio::AlignIO ignores questionmarks? Message-ID: <200604132038.01568.kaimueller@uni-bonn.de> hi, I'm very new to BioPerl and have a maybe silly question. when using Bio::AlignIO to load a set of sequences, the question marks are simply lost (they refer to missing characters as opposed to gap characters [-] or ambiguity [N]). I thought that 'missing_char()' might help, but it didn't (I probably used it the wrong way).
when $filename contains sequences with ????, the following snippet would produce an alignment with ???? lost and downstream nucleotide just shifted and the resulting length differences filled by '---' @ 3' end: my $aln_in = Bio::AlignIO->new(-file => "$filename", '-format' => 'fasta'); my $aln = $aln_in->next_aln(); $aln->gap_char('-'); $aln->missing_char('?'); my $testout = Bio::AlignIO->new(-fh => \*STDOUT , '-format' => 'clustalw'); $testout->write_aln($aln); Can somebody give me a hint here? thanks and all the best, Kai Müller From suzi at fruitfly.org Thu Apr 13 23:32:27 2006 From: suzi at fruitfly.org (Suzanna Lewis) Date: Thu, 13 Apr 2006 20:32:27 -0700 Subject: [Bioperl-l] [SO-devel] GFF3 Validator In-Reply-To: <200604091637.37674.lstein@cshl.edu> References: <200604091637.37674.lstein@cshl.edu> Message-ID: That is nice. Thanks for the pointer. -S On Apr 9, 2006, at 1:37 PM, Lincoln Stein wrote: > Hi All, > > I just wanted to draw your attention to a GFF3 validator put up by the > NIAID > BRC program. It checks that the GFF3 file is well formed and that only > SO > terms are used in the feature type column. It doesn't check that > parent/child > relationships follow the spec, but it is a real good start. > > The documentation: http://iowg.brcdevel.org/gff3.html > Standalone script: http://iowg.brcdevel.org/gff3validator/ > Web-based script: > http://www.tigr.org/tigr-scripts/prok_manatee/brc-central/gff_validation.cgi > > Lincoln > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From avilella at gmail.com Fri Apr 14 10:01:01 2006 From: avilella at gmail.com (Albert Vilella) Date: Fri, 14 Apr 2006 15:01:01 +0100 Subject: [Bioperl-l] Bio::AlignIO ignores questionmarks? In-Reply-To: <200604132038.01568.kaimueller@uni-bonn.de> References: <200604132038.01568.kaimueller@uni-bonn.de> Message-ID: <1145023261.12449.15.camel@localhost> It seems like missing_char is more for SimpleAlign than for AlignIO. So in case of fasta files with '?' chars, they will be ignored in line 113 of Bio::AlignIO::fasta.pm. So you can add '\?' in that line of fasta.pm. That will parse it correctly, although I am not sure whether fasta format should or shouldn't allow '?' chars in the file. Anyone? Cheers, Albert. On Thu, 2006-04-13 at 20:38 -0400, Kai Müller wrote: > hi, > > I'm very new to BioPerl and have a maybe silly question. > when using Bio::AlignIO to load a set of sequences, the questionmarks are > simply lost (they refer to missing characters as opposed to gap characters > [-] or ambiguity [N]). I thought that 'missing_char()' might help, but it > didn't (I probably used it the wrong way). > > when $filename contains sequences with ????, the following snippet would > produce an alignment with ????
lost and downstream nucleotide just shifted > and the resulting length differences filled by '---' @ 3' end: > > > my $aln_in = Bio::AlignIO->new(-file => "$filename", '-format' => 'fasta'); > my $aln = $aln_in->next_aln(); > $aln->gap_char('-'); > $aln->missing_char('?'); > > my $testout = Bio::AlignIO->new(-fh => \*STDOUT , '-format' => 'clustalw'); > $testout->write_aln($aln); > > > > Can somebody give me a hint here? > > thanks and all the best, > > Kai Müller From dnm_a at swbell.net Fri Apr 14 10:13:54 2006 From: dnm_a at swbell.net (David Messina) Date: Fri, 14 Apr 2006 09:13:54 -0500 Subject: [Bioperl-l] Bio::AlignIO ignores questionmarks? In-Reply-To: <200604132038.01568.kaimueller@uni-bonn.de> References: <200604132038.01568.kaimueller@uni-bonn.de> Message-ID: Hi Kai, I'm by no means an expert with this module, but I'll take a shot. Running your code through a debugger, I'm seeing that Bio::AlignIO::fasta is gobbling the question marks: line 66: $MATCHPATTERN = '^A-Za-z\.\-'; and then where $entry contains a line of sequence from the input file line 118: $entry =~ s/[$MATCHPATTERN]//g; As far as I can tell, a question mark is not a valid character for the FASTA format (see http://en.wikipedia.org/wiki/FASTA_format) -- perhaps that's the reason Bio::AlignIO::fasta doesn't permit them? And then by the time missing_char() is applied, the question marks are already gone. What happens if you read in your sequence with question marks in a format that explicitly permits question marks? Dave On Apr 13, 2006, at 7:38 PM, Kai Müller wrote: > hi, > > I'm very new to BioPerl and have a maybe silly question. > when using Bio::AlignIO to load a set of sequences, the > questionmarks are > simply lost (they refer to missing characters as opposed to gap > characters > [-] or ambiguity [N]).
I thought that 'missing_char()' might help, > but it > didn't (I probably used it the wrong way). > > when $filename contains sequences with ????, the following snippet > would > produce an alignment with ???? lost and downstream nucleotide just > shifted > and the resulting length differences filled by '---' @ 3' end: > > > my $aln_in = Bio::AlignIO->new(-file => "$filename", '-format' => > 'fasta'); > my $aln = $aln_in->next_aln(); > $aln->gap_char('-'); > $aln->missing_char('?'); > > my $testout = Bio::AlignIO->new(-fh => \*STDOUT , '-format' => > 'clustalw'); > $testout->write_aln($aln); > > > > Can somebody give me a hint here? > > thanks and all the best, > > Kai Müller From dmessina at wustl.edu Fri Apr 14 01:14:25 2006 From: dmessina at wustl.edu (David Messina) Date: Fri, 14 Apr 2006 00:14:25 -0500 Subject: [Bioperl-l] Bio::AlignIO ignores questionmarks? In-Reply-To: <200604132038.01568.kaimueller@uni-bonn.de> References: <200604132038.01568.kaimueller@uni-bonn.de> Message-ID: Hi Kai, I'm by no means an expert with this module, but I'll take a shot. Running your code through a debugger, I'm seeing that Bio::AlignIO::fasta is gobbling the question marks: line 66: $MATCHPATTERN = '^A-Za-z\.\-'; and then where $entry contains a line of sequence from the input file line 118: $entry =~ s/[$MATCHPATTERN]//g; As far as I can tell, a question mark is not a valid character for the FASTA format (see http://en.wikipedia.org/wiki/FASTA_format) -- perhaps that's the reason Bio::AlignIO::fasta doesn't permit them? And then by the time missing_char() is applied, the question marks are already gone. What happens if you read in your sequence with question marks in a format that explicitly permits question marks?
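The stripping David describes can be reproduced in a few lines of plain Perl, without BioPerl. This is a minimal sketch, not the module's actual code path: `$MATCHPATTERN` and the substitution are copied from his quote of Bio::AlignIO::fasta, the input row is seq1 from the bug report Chris cites later in the thread, and `$PATCHED` is a hypothetical variant that adds `\?` to the characters to keep, roughly what Albert suggests (not necessarily the change that was committed to CVS).

```perl
use strict;
use warnings;

# Character-class body as quoted from Bio::AlignIO::fasta (line 66 in the
# version discussed). Inside [...] the leading ^ negates the class, so the
# substitution deletes every character that is NOT a letter, '.' or '-'.
my $MATCHPATTERN = '^A-Za-z\.\-';

# Hypothetical patched pattern: '?' is added to the characters to keep.
my $PATCHED = '^A-Za-z\.\-\?';

my $raw = 'AAAAAAA?A?A';    # seq1 from the alignment in the bug report

(my $stripped = $raw) =~ s/[$MATCHPATTERN]//g;
(my $kept     = $raw) =~ s/[$PATCHED]//g;

# Padding the stripped row back to the alignment width with '-' gives the
# same shape as the pre-fix output shown later in the thread.
my $padded = $stripped . '-' x (length($raw) - length($stripped));

printf "%-9s %s (length %d)\n", 'input:',    $raw,      length $raw;
printf "%-9s %s (length %d)\n", 'stripped:', $stripped, length $stripped;
printf "%-9s %s\n",             'padded:',   $padded;
printf "%-9s %s (length %d)\n", 'patched:',  $kept,     length $kept;
```

The stripped row shrinks from 11 to 9 characters, so every column after the first '?' shifts left, and padding it back to width yields the trailing '--' Kai saw. With the patched pattern the row is unchanged, so the '?' characters are still present by the time missing_char() is called.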
Dave On Apr 13, 2006, at 7:38 PM, Kai Müller wrote: > hi, > > I'm very new to BioPerl and have a maybe silly question. > when using Bio::AlignIO to load a set of sequences, the > questionmarks are > simply lost (they refer to missing characters as opposed to gap > characters > [-] or ambiguity [N]). I thought that 'missing_char()' might help, > but it > didn't (I probably used it the wrong way). > > when $filename contains sequences with ????, the following snippet > would > produce an alignment with ???? lost and downstream nucleotide just > shifted > and the resulting length differences filled by '---' @ 3' end: > > > my $aln_in = Bio::AlignIO->new(-file => "$filename", '-format' => > 'fasta'); > my $aln = $aln_in->next_aln(); > $aln->gap_char('-'); > $aln->missing_char('?'); > > my $testout = Bio::AlignIO->new(-fh => \*STDOUT , '-format' => > 'clustalw'); > $testout->write_aln($aln); > > > > Can somebody give me a hint here? > > thanks and all the best, > > Kai Müller From madhurima_b at persistent.co.in Fri Apr 14 08:41:46 2006 From: madhurima_b at persistent.co.in (madhurima bhattacharjee) Date: Fri, 14 Apr 2006 18:11:46 +0530 Subject: [Bioperl-l] problem with needle Message-ID: <443F988A.5090400@persistent.co.in> Hello, I am new to BioPerl. I am using the Pise module to perform a needle alignment.
I am using the following code: use Bio::Tools::Run::AnalysisFactory::Pise; my $factory = new Bio::Tools::Run::AnalysisFactory::Pise(); my $needle = $factory->program('needle'); my $job = $needle->run(-sequencea => $ARGV[0], -seqall => $ARGV[1], -gapopen => 5, -gapextend => 1); if ($job->error) { print ".............error: ",$job->error_message,".............\n"; exit; } print STDERR "jobid: ", $job->jobid, "\n"; print $job->content('needletest.needle'); But it gives me the following error: .............error: Bio::Tools::Run::PiseJob _submit: Can't connect to bioweb.pasteur.fr:80 (connect: timeout)............. Can anyone please point me to the problem in the code? I am really stuck on this. Thanks and Regards, Madhurima. From cjfields at uiuc.edu Fri Apr 14 11:00:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 14 Apr 2006 10:00:55 -0500 Subject: [Bioperl-l] Bio::AlignIO ignores questionmarks? In-Reply-To: <1145023261.12449.15.camel@localhost> Message-ID: <000a01c65fd4$3b4dae90$15327e82@pyrimidine> This was fixed in CVS a week or two ago. It was reported as a bug before: http://bioperl.org/pipermail/bioperl-guts-l/2006-April/020836.html If you either install from CVS or replace your local Bio::AlignIO::fasta with the CVS version, it should work. I don't adding '?' necessarily think it harms anything; the intent is to maintain the nucleotide positions in the alignment. If you used this alignment (from the bug above): >seq1 AAAAAAA?A?A >seq2 AAAAAAA?AAA >seq3 AAAAAAAAAAA >seq4 AAAAAAAAAAA then you got this when passing through AlignIO before the fix: >seq1 AAAAAAAAA-- >seq2 AAAAAAAAAA- >seq3 AAAAAAAAAAA >seq4 AAAAAAAAAAA which is not good since nucleotide positions in the alignment are modified and gaps are introduced where there shouldn't be any. After the fix you get what you expect (the same as input). A few proprietary sequence analysis programs use (or formerly used) '?'
to mean 'N or gap' or 'N' and actively exported this symbol to other formats. I think older versions of Sequencher or Strider did this, but I can't be certain; I've slept since then. Anyway, the intent of the user here is to maintain the integrity of the alignment, so regardless of what '?' means the nucleotide positions in the alignment are retained. OT here, but another issue with Sequencher is that ':' is used for gaps as well, something that Bio::AlignIO::fasta doesn't address. I haven't tried this out to see what it does, but I wouldn't be surprised if it breaks again. My question is, how many alignment formats don't retain positional information? In other words, don't all alignment formats (FASTA, clustal, MSF) have sequences that have the same length with all gaps included (internal and at both ends) so that they retain position regardless of what symbols are used? Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Albert Vilella > Sent: Friday, April 14, 2006 9:01 AM > To: Kai Müller > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::AlignIO ignores questionmarks? > > It seems like missing_char is more for SimpleAlign than for AlignIO. So > in case of fasta files with '?' chars, they will be ignored in line 113 > of Bio::AlignIO::fasta.pm. > > So you can add '\?' in that line of fasta.pm. > > That will parse it correctly, although I am not sure whether fasta > format should or shouldn't allow '?' chars in the file. > > Anyone? > > Cheers, > > Albert. > > On Thu, 2006-04-13 at 20:38 -0400, Kai Müller wrote: > > hi, > > > > I'm very new to BioPerl and have a maybe silly question.
> > when using Bio::AlignIO to load a set of sequences, the questionmarks > are > > simply lost (they refer to missing characters as opposed to gap > characters > > [-] or ambiguity [N]). I thought that 'missing_char()' might help, but > it > > didn't (I probably used it the wrong way). > > > > when $filename contains sequences with ????, the following snippet would > > produce an alignment with ???? lost and downstream nucleotide just > shifted > > and the resulting length differences filled by '---' @ 3' end: > > > > > > my $aln_in = Bio::AlignIO->new(-file => "$filename", '-format' => > 'fasta'); > > my $aln = $aln_in->next_aln(); > > $aln->gap_char('-'); > > $aln->missing_char('?'); > > > > my $testout = Bio::AlignIO->new(-fh => \*STDOUT , '-format' => > 'clustalw'); > > $testout->write_aln($aln); > > > > > > > > Can somebody give me a hint here? > > > > thanks and all the best, > > > > Kai Müller From arareko at campus.iztacala.unam.mx Fri Apr 14 11:10:47 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Fri, 14 Apr 2006 10:10:47 -0500 Subject: [Bioperl-l] problem with needle In-Reply-To: <443F988A.5090400@persistent.co.in> References: <443F988A.5090400@persistent.co.in> Message-ID: <443FBB77.2080306@campus.iztacala.unam.mx> The error means that the bioweb.pasteur.fr server was probably down at that moment or it couldn't be looked up. Try again later :) Regards, Mauricio. madhurima bhattacharjee wrote: > Hello, > I am new to BioPerl. I am using the Pise module to perform a needle alignment.
> I am using the following code: > > use Bio::Tools::Run::AnalysisFactory::Pise; > my $factory = new Bio::Tools::Run::AnalysisFactory::Pise(); > my $needle = $factory->program('needle'); > my $job = $needle->run(-sequencea => $ARGV[0], > -seqall => $ARGV[1], > -gapopen => 5, > -gapextend => 1); > if ($job->error) { > print ".............error: ",$job->error_message,".............\n"; > exit; > } > print STDERR "jobid: ", $job->jobid, "\n"; > print $job->content('needletest.needle'); > > But it gives me the following error: > .............error: Bio::Tools::Run::PiseJob _submit: Can't connect to > bioweb.pasteur.fr:80 (connect: timeout)............. > > Can anyone please point me to the problem in the code? I am really stuck on this. > > Thanks and Regards, > Madhurima. > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Fri Apr 14 11:41:09 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 14 Apr 2006 10:41:09 -0500 Subject: [Bioperl-l] Bio::AlignIO ignores questionmarks? In-Reply-To: Message-ID: <000b01c65fd9$daa9a250$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of David Messina > Sent: Friday, April 14, 2006 12:14 AM > To: Kai Müller > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::AlignIO ignores questionmarks? > > Hi Kai, > > I'm by no means an expert with this module, but I'll take a shot.
> > Running your code through a debugger, I'm seeing that > Bio::AlignIO::fasta is gobbling the question marks: > > line 66: $MATCHPATTERN = '^A-Za-z\.\-'; > > and then where $entry contains a line of sequence from the input file > > line 118: $entry =~ s/[$MATCHPATTERN]//g; > > As far as I can tell, a question mark is not a valid character for > the FASTA format (see http://en.wikipedia.org/wiki/FASTA_format) -- > perhaps that's the reason Bio::AlignIO::fasta doesn't permit them? I wouldn't trust Wikipedia with that one. Check out the BioPerl page: http://www.bioperl.org/wiki/FASTA_sequence_format The problem is, there is no well-established universal rule for FASTA format. These are three valid FASTA input sequences for some programs: >xyz X > - > It's all dependent on how a program/web interface imports the sequence. You don't need a description line; just '>' will do. Some don't even require a sequence, though most filters will warn you. Even the rules for wrapping the sequence onto multiple lines differ (is it 60, 80, 100, or none?). I know that when I first started (early '90s), a quick and easy way to get sequences ready for BLAST searches, which required FASTA, was to copy-paste the sequence and add '>' and a CR on the line above, with no additional line breaks in the sequence (all on one line). Still works AFAIK... > And then by the time missing_char() is applied, the question marks > are already gone. > > What happens if you read in your sequence with question marks in a > format that explicitly permits question marks? > > Dave > > > On Apr 13, 2006, at 7:38 PM, Kai Müller wrote: > > > hi, > > > > I'm very new to BioPerl and have a maybe silly question. > > when using Bio::AlignIO to load a set of sequences, the > > questionmarks are > > simply lost (they refer to missing characters as opposed to gap > > characters > > [-] or ambiguity [N]). I thought that 'missing_char()' might help, > > but it > > didn't (I probably used it the wrong way).
> > > > when $filename contains sequences with ????, the following snippet > > would > > produce an alignment with ???? lost and downstream nucleotide just > > shifted > > and the resulting length differences filled by '---' @ 3' end: > > > > > > my $aln_in = Bio::AlignIO->new(-file => "$filename", '-format' => > > 'fasta'); > > my $aln = $aln_in->next_aln(); > > $aln->gap_char('-'); > > $aln->missing_char('?'); > > > > my $testout = Bio::AlignIO->new(-fh => \*STDOUT , '-format' => > > 'clustalw'); > > $testout->write_aln($aln); > > > > > > > > Can somebody give me a hint here? > > > > thanks and all the best, > > > > Kai Müller From cjfields at uiuc.edu Fri Apr 14 14:36:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 14 Apr 2006 13:36:40 -0500 Subject: [Bioperl-l] Bio::AlignIO ignores questionmarks? In-Reply-To: <000a01c65fd4$3b4dae90$15327e82@pyrimidine> Message-ID: <000001c65ff2$5ed883d0$15327e82@pyrimidine> > I don't adding '?' necessarily think it harms anything; the intent is to > maintain the nucleotide positions in the alignment. If you used this > alignment (from the bug above): .... Oi! Forgot to clean up my sentence there. Should be "I don't think adding '?' necessarily harms anything." Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Apr 18 10:09:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 18 Apr 2006 09:09:19 -0500 Subject: [Bioperl-l] FW: Bioperl PAML bug Message-ID: <000201c662f1$afc288c0$15327e82@pyrimidine> Jason, There is a new bug for PAML results parsing: http://bugzilla.open-bio.org/show_bug.cgi?id=1983 I'll try looking into it but I can't get around to it until late tomorrow or Thursday at the earliest; I'm forwarding this to you and the mail list in case you or others might have ideas. Looks like the results aren't parsed past the third model result in the test results file (in the bug report). Below is the email I sent to the bug submitter. Cheers! Chris P.S. Hope you're enjoying the break after your defense. Congrats again! Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, April 18, 2006 8:55 AM > To: 'nivsabath at hotmail.com' > Subject: Bioperl PAML bug > > From what I have read on the HOWTO PAML page, I believe you globbed > together two different scripts here; in other words, the submitted script > is NOT the same as on the web page. > > ===================================== > use strict; > use Bio::Tools::Run::Phylo::PAML::Codeml; > use Bio::Tools::Run::Alignment::Clustalw; > > # for projecting alignments from protein to R/DNA space > use Bio::Align::Utilities qw(aa_to_dna_aln); > > # for input of the sequence data > use Bio::SeqIO; > use Bio::AlignIO; > ... > ===================================== > > should be: > ===================================== > use strict; > use Bio::Tools::Phylo::PAML; > ... > ===================================== > > since you're only parsing PAML output (not running PAML and parsing the > results). 
> > Regardless, I still get errors here; it looks like it's not parsing past > the first three models, so it's a legit bug. Looks like the parser > doesn't catch the end of the model (calling the model method time_used > gets a "Use of uninitialized value in print" error). > > Also, looks like the POD and docs for using Bioperl and PAML are a bit > out-of-date and still refer to names of older methods (I ran into a number > of missing method calls while looking into this). I won't be able to get > to this today but I'll drop Jason a line to see what he thinks. > > Chris > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign From akarger at CGR.Harvard.edu Tue Apr 18 13:30:58 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 18 Apr 2006 13:30:58 -0400 Subject: [Bioperl-l] BioPerl Mailing List Summaries Message-ID: Nice! I'd suggest putting a link to (the News blog in general and) a Mailing_List_Summaries page in the Current Events section, for those who don't read the navbar. Mailing_List_Summaries can just be a list of links to biweekly summary pages (which can be on the wiki or the blog, as you prefer). - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University 617-496-0626 > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, April 11, 2006 9:53 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] BioPerl Mailing List Summaries > > The first of a biweekly series of BioPerl mailing list > summaries has been > posted to the wiki: > > http://www.bioperl.org/wiki/BioPerl_Mailing_List_Summaries_for_April_1-11 > > These will likely be archived on the wiki but may be moved to a more > suitable location in the future (maybe to the news blog?). > Any suggestions > are welcome. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept.
of Biochemistry > University of Illinois Urbana-Champaign > > > > From dwaner at scitegic.com Tue Apr 18 18:05:59 2006 From: dwaner at scitegic.com (David Waner) Date: Tue, 18 Apr 2006 15:05:59 -0700 Subject: [Bioperl-l] PSI-BLAST parsing fails on Windows Message-ID: <444562C7.7080506@scitegic.com> I have found that there is a small difference in output format between the linux and Windows versions of NCBI blastpgp that causes parsing of psi-blast results to fail on Windows. Specifically, the Windows version of blastpgp does not output the "Searching.done" lines between the results of each iteration. For example: --------------------------------------------------------------------------------- LINUX: Database: C:\BioPerl\REGRES~1\Output\R517_protDB 100 sequences; 33,683 total letters Searching.done Results from round 1 --------------------------------------------------------------------------------- WINDOWS: Database: C:\BioPerl\REGRES~1\Output\R517_protDB 100 sequences; 33,683 total letters Results from round 1 --------------------------------------------------------------------------------- As a fix I have changed line 474 of Bio::SearchIO::blast.pm from: elsif (/^Searching/) { to: elsif (/^Results from round/) { This seems to fix the problem without breaking anything, but I would like to see if anyone on the mailing list can run some additional test cases, or sees anything wrong with this proposed change before I submit it to bugzilla. Thanks. - David Additional information: This problem occurs with blast versions 2.2.10 and 2.2.13. I have the latest version of blast.pm (1.92) from the bioperl-live cvs. 
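David's proposed change swaps the pattern that marks the start of a new PSI-BLAST iteration. The sketch below, in plain Perl, illustrates only why that helps. The two report fragments are abbreviated from his Linux and Windows excerpts (the shortened database name and the exact whitespace are assumptions), and `count_matches` is a hypothetical helper. It does not reproduce the parsing state in Bio::SearchIO::blast, so it cannot show whether the change breaks other report types; that is what the additional test cases are for.

```perl
use strict;
use warnings;

# Abbreviated fragments modelled on David's excerpts; the shortened database
# name and the exact whitespace are assumptions, not verbatim blastpgp output.
my $linux = <<'END';
  Database: R517_protDB
           100 sequences; 33,683 total letters

Searching.done


Results from round 1
END

my $windows = <<'END';
  Database: R517_protDB
           100 sequences; 33,683 total letters


Results from round 1
END

# Hypothetical helper: count the lines of a report that match a pattern.
sub count_matches {
    my ($report, $re) = @_;
    return scalar grep { /$re/ } split /\n/, $report;
}

my %count;
for my $pair ([linux => $linux], [windows => $windows]) {
    my ($name, $report) = @$pair;
    $count{$name}{old} = count_matches($report, qr/^Searching/);
    $count{$name}{new} = count_matches($report, qr/^Results from round/);
    printf "%-8s /^Searching/: %d   /^Results from round/: %d\n",
        $name, $count{$name}{old}, $count{$name}{new};
}
```

On the Windows fragment `/^Searching/` never matches, while `/^Results from round/` matches once in both fragments, which is the platform independence the fix relies on.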
From cjfields at uiuc.edu Tue Apr 18 20:19:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 18 Apr 2006 19:19:48 -0500 Subject: [Bioperl-l] PSI-BLAST parsing fails on Windows In-Reply-To: <444562C7.7080506@scitegic.com> References: <444562C7.7080506@scitegic.com> Message-ID: <5F0C81D9-295D-4E71-A344-88DB667857B3@uiuc.edu> Could you submit this as a bug in Bugzilla so we can track it and attach the blast output files if they are small enough? If Bugzilla rejects the files b/c they are too large (I think the cutoff is 100-200 KB) you can attach them to an email and send them to me directly. I'll try the fix on Mac OS X and Windows, run tests, and commit the fix if everything passes. Chris On Apr 18, 2006, at 5:05 PM, David Waner wrote: > I have found that there is a small difference in output format between > the linux and Windows versions of NCBI blastpgp that causes parsing of > psi-blast results to fail on Windows. Specifically, the Windows > version > of blastpgp does not output the "Searching.done" lines between the > results of each iteration. For example: > ---------------------------------------------------------------------- > ----------- > LINUX: > > Database: C:\BioPerl\REGRES~1\Output\R517_protDB > 100 sequences; 33,683 total letters > > Searching.done > > > Results from round 1 > ---------------------------------------------------------------------- > ----------- > WINDOWS: > > Database: C:\BioPerl\REGRES~1\Output\R517_protDB > 100 sequences; 33,683 total letters > > > Results from round 1 > ---------------------------------------------------------------------- > ----------- > As a fix I have changed line 474 of Bio::SearchIO::blast.pm > from: > elsif (/^Searching/) { > to: > elsif (/^Results from round/) { > > This seems to fix the problem without breaking anything, but I would > like to see if anyone on the mailing list can run some additional test > cases, or sees anything wrong with this proposed change before I > submit > it to bugzilla. 
> > Thanks. > > - David > > > Additional information: > > This problem occurs with blast versions 2.2.10 and 2.2.13. > I have the latest version of blast.pm (1.92) from the bioperl-live > cvs. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Marc.Logghe at DEVGEN.com Fri Apr 21 04:56:02 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Fri, 21 Apr 2006 10:56:02 +0200 Subject: [Bioperl-l] Input for Bio::CodonUsage::IO Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D0B@ANTARESIA.be.devgen.com> Hi, I was wondering what format Bio::CodonUsage::IO expects as input for the -file option. I tried to pass it a *.cut file generated by EMBOSS' cutgextract that looks like this: #Species: Oryza sativa #Division: gbpln #Release: CUTG #CdsCount: 70050 #Coding GC 55.34% #1st letter GC 58.41% #2nd letter GC 46.34% #3rd letter GC 61.29% #Codon AA Fraction Frequency Number GCA A 0.185 17.382 431151 TGA * 0.435 1.228 30463 Looking into the _parse() method of Bio::CodonUsage::IO it appears that the table resembles this kind of format but is actually not exactly what it expects. My question is: what should it really look like? I could not find an example in t/data. Any clues? Thanks, Marc From prabubio at gmail.com Fri Apr 21 06:45:46 2006 From: prabubio at gmail.com (Prabu R) Date: Fri, 21 Apr 2006 16:15:46 +0530 Subject: [Bioperl-l] Genbank parsing using Bioperl Message-ID: Dear all! I am a novice bioperl user, trying to parse Genbank files with Bioperl modules to get some specific features and details. Anyone please tell me, whether we can retrieve a Gene, its Transcript ID and its Protein ID from the Genbank file. I mainly need to extract the one-to-one relationship between TranscriptID and Protein ID. I was trying this.
I was able to take these details if the gene is not alternatively spliced. If a gene contains multiple mRNA/CDS features, I am not able to build the relationship between Transcript and its Protein. Kindly help me to find out whether this is possible in Bioperl. Thanks in advance, R. Prabu From osborne1 at optonline.net Fri Apr 21 09:09:12 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 21 Apr 2006 09:09:12 -0400 Subject: [Bioperl-l] Input for Bio::CodonUsage::IO In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746D0B@ANTARESIA.be.devgen.com> Message-ID: Marc, It wants a file from the database CUTG. You can ftp them from this mirror: ftp://ftp.ebi.ac.uk/pub/databases/cutg Brian O. On 4/21/06 4:56 AM, "Marc Logghe" wrote: > Hi, > I was wondering what format Bio::CodonUsage::IO expects as input for the > -file option. > I tried to pass it a *.cut file generated by EMBOSS' cutgextract that > looks like this: > #Species: Oryza sativa > #Division: gbpln > #Release: CUTG > #CdsCount: 70050 > > #Coding GC 55.34% > #1st letter GC 58.41% > #2nd letter GC 46.34% > #3rd letter GC 61.29% > > #Codon AA Fraction Frequency Number > GCA A 0.185 17.382 431151 > > TGA * 0.435 1.228 30463 > > Looking into the _parse() method of Bio::CodonUsage::IO it appears that > the table resembles this kind of format but is actually not exactly what > it expects. My question is: what should it really look like? I could not > find an example in t/data. > Any clues? > Thanks, > Marc From Marc.Logghe at DEVGEN.com Fri Apr 21 09:26:07 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Fri, 21 Apr 2006 15:26:07 +0200 Subject: [Bioperl-l] Input for Bio::CodonUsage::IO Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D0E@ANTARESIA.be.devgen.com> Hi Brian, Thanks for the reply.
I might be overlooking something, but I downloaded this last week. The tarball contained *.codon and *.spsum files and did not look at all like a codon usage table (more like pseudo-FASTA). For that reason, I used EMBOSS cutgextract, which produced *.cut files from the CUTG *.codon files. I finally managed to parse these *.cut files. In order to do that I created a Bio::CodonUsage::IO::emboss module that only contains the private _parse() method. The setup I used is copied from Bio::SeqIO, meaning that now you can do: my $io = Bio::CodonUsage::IO->new( -file => shift, -format => 'emboss' ); In case no format option is given, it defaults to the Bio::CodonUsage::IO::default module that contains the _parse() method from the original Bio::CodonUsage::IO module. Actually, this should be changed to a name that makes more sense, but I did not know what this default format looks like and/or where it comes from. My guess is that it comes from http://www.kazusa.or.jp but the site seems to be broken, at least today. For now I will continue with this setup in-house, but in case you think it is useful to commit, just let me know. Cheers, Marc > -----Original Message----- > From: Brian Osborne [mailto:osborne1 at optonline.net] > Sent: Friday, April 21, 2006 3:09 PM > To: Marc Logghe; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Input for Bio::CodonUsage::IO > > Marc, > > It wants a file from the database CUTG. You can ftp them from > this mirror: > > ftp://ftp.ebi.ac.uk/pub/databases/cutg > > > Brian O. > > > On 4/21/06 4:56 AM, "Marc Logghe" wrote: > > > Hi, > > I was wondering what format Bio::CodonUsage::IO expects as > input for > > the -file option.
> > I tried to pass it a *.cut file generated by EMBOSS' > cutgextract that > > looks like this: > > #Species: Oryza sativa > > #Division: gbpln > > #Release: CUTG > > #CdsCount: 70050 > > > > #Coding GC 55.34% > > #1st letter GC 58.41% > > #2nd letter GC 46.34% > > #3rd letter GC 61.29% > > > > #Codon AA Fraction Frequency Number > > GCA A 0.185 17.382 431151 > > > > TGA * 0.435 1.228 30463 > > > > Looking into the _parse() method of Bio::CodonUsage::IO it appears > > that the table resembles this kind of format but is actually not > > exactly what it expects. My question is: how should it really look > > like ? I could not find an example in t/data. > > Any clues ? > > Thanks, > > Marc > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From sdavis2 at mail.nih.gov Fri Apr 21 09:21:20 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 21 Apr 2006 09:21:20 -0400 Subject: [Bioperl-l] Genbank parsing using Bioperl In-Reply-To: Message-ID: On 4/21/06 6:45 AM, "Prabu R" wrote: > Dear all! > > I am a novice bioperl user, trying to parse Genbank files with Bioperl > modules to get some specific features and details. > > Anyone please tell me, whether we can retrive a Gene, its Transcript ID and > its Protein ID from the Genbank file. > > I mainly need to extract with one to one relationship between TranscriptID > and Protein ID. > > I was trying this. I was able to take these details if the gene is not > alternatively spliced. > > If a gene contains multiple mRNA/CDS feature, I am not able to build the > relationship between Transcript and its Protein. > > Kindly help me to find out whether this is possible in Bioperl. See here: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation However, GenBank is only a repository, so I don't think every transcript will necessarily have a protein annotation.
You might want to look into using something like the RefSeq set (from NCBI) or Ensembl, both of which have very rich annotation associated with their transcripts/proteins. Sean From prabubio at gmail.com Fri Apr 21 09:24:56 2006 From: prabubio at gmail.com (Prabu R) Date: Fri, 21 Apr 2006 18:54:56 +0530 Subject: [Bioperl-l] Genbank parsing using Bioperl In-Reply-To: References: Message-ID: Dear All, Sorry, I made a small mistake in my earlier mail. I am not actually using GenBank releases, but the RefSeq genome build .gbk files from NCBI (ftp.ncbi.nih.gov/genomes/). Those files are GenBank-formatted and contain RefSeq IDs. Kindly help. R. Prabu ---------------------------- Dear all! I am a novice BioPerl user, trying to parse GenBank files with BioPerl modules to get some specific features and details. Can anyone please tell me whether we can retrieve a gene, its transcript ID and its protein ID from a GenBank file? I mainly need to extract the one-to-one relationship between transcript ID and protein ID. I have been trying this. I was able to get these details if the gene is not alternatively spliced. If a gene contains multiple mRNA/CDS features, I am not able to build the relationship between a transcript and its protein. Kindly help me to find out whether this is possible in BioPerl. Thanks in advance, R. Prabu From cjfields at uiuc.edu Fri Apr 21 11:26:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 21 Apr 2006 10:26:53 -0500 Subject: [Bioperl-l] Genbank parsing using Bioperl In-Reply-To: Message-ID: <000601c66558$04c8e540$15327e82@pyrimidine> I'm adding my 2c since I've got a bit of time on my hands. I'll add that I found most of these answers by looking through the mail list archives (now searchable through Gmane) and the BioPerl wiki.
I believe Sean pointed out the HOWTO on the BioPerl wiki: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_Sequences In theory, you should be able to tell from each CDS feature which gene or transcript feature it belongs to, and normally vice versa. I may be wrong (I work mainly with bacterial genome sequences), but I believe this depends entirely on how well the features are annotated (which can vary greatly between sequencing centers), so it can be a bit tricky depending on the source of the GenBank file. I would instead try a database that is well curated and has a consistent interface across different genome projects. In other words, something like what Sean suggested, such as Ensembl: http://www.ensembl.org/index.html You can use the Ensembl Perl API to retrieve data from Ensembl databases: http://www.ensembl.org/info/software/core/core_tutorial.html You could also have a look at Entrez Gene; Brian is working on modules (in CVS) for retrieving and parsing Entrez Gene's output: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene You'll need the Bio::ASN1 parser for Brian's modules: http://sourceforge.net/projects/egparser Both Ensembl and Entrez Gene are constantly updated with transcript/protein information and are likely what you are looking for. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Prabu R > Sent: Friday, April 21, 2006 8:25 AM > To: bioperl-l at lists.open-bio.org; Sean Davis > Subject: Re: [Bioperl-l] Genbank parsing using Bioperl > > Dear All, > > I feel sorry for making a small mistake in my earlier mail > > I am not actually using Genbank releases, But Refseq Genome build gbk > files > of NCBI (ftp.ncbi.nih.gov/genomes/) > > Those files are genbank formatted and contains Refseq IDs. > > Kindly help. > > R. Prabu > > ---------------------------- > Dear all! > > I am a novice bioperl user, trying to parse Genbank files with Bioperl > modules to get some specific features and details. > > Anyone please tell me, whether we can retrive a Gene, its Transcript ID > and > its Protein ID from the Genbank file. > > I mainly need to extract with one to one relationship between TranscriptID > and Protein ID. > > I was trying this. I was able to take these details if the gene is not > alternatively spliced. > > If a gene contains multiple mRNA/CDS feature, I am not able to build the > relationship between Transcript and its Protein. > > Kindly help me to find out whether this is possible in Bioperl. > > Thanks in advance, > > R. Prabu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bornmand at BATTELLE.ORG Fri Apr 21 12:49:41 2006 From: bornmand at BATTELLE.ORG (Bornman, Daniel M) Date: Fri, 21 Apr 2006 12:49:41 -0400 Subject: [Bioperl-l] StandAlone BLAST on Windows??? Message-ID: Dear BioPerl Listers, Does anyone know if the BioPerl StandAloneBlast module works on Windows? I have a very simple script that will not work correctly. 
My PATH is set to 'C:\BLASTDB\bin' My database, 'ecoli.nt' is in 'C:\BLASTDB\data' directory Here is the script: ## START ## BEGIN{ $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; } use strict; use Bio::Seq; use Bio::Tools::Run::StandAloneBlast; my @params = (program => 'blastn', database => 'ecoli.nt'); my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = $report_obj->next_result; print $result_obj->num_hits; ## END ## Here is the error message I get: [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin ------------- EXCEPTION ------------- MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z STACK Bio::Tools::Run::StandAloneBlast::_runblast C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 STACK Bio::Tools::Run::StandAloneBlast::blastall C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 STACK toplevel localblast.pl:12 -------------------------------------- Thank You, Daniel Bornman Researcher Battelle Memorial Institute 505 King Ave Columbus, OH 43201 614.424.3229 From cjfields at uiuc.edu Fri Apr 21 13:16:00 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 21 Apr 2006 12:16:00 -0500 Subject: [Bioperl-l] StandAlone BLAST on Windows??? In-Reply-To: Message-ID: <000e01c66567$42d72270$15327e82@pyrimidine> Do you have the database formatted and are your env variables set correctly in the ncbi.ini file? From this output, [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin it looks like blastall couldn't find the properly formatted database (env variable not set correctly) or that the database doesn't exist (need to format using formatdb). 
If I remember correctly, the directory containing your data in your ncbi.ini file (I placed mine in the C:\Windows dir) should look something like this: [NCBI] Data=C:\Research\blast\data You might also want to try leaving out the BLASTDIR if it is already set in ncbi.ini. If you can't get it working I'll look into it (unless Torsten has WinXP????). Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bornman, Daniel M > Sent: Friday, April 21, 2006 11:50 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] StandAlone BLAST on Windows??? > > Dear BioPerl Listers, > > Does anyone know if the BioPerl StandAloneBlast module works on Windows? > > I have a very simple script that will not work correctly. > > My PATH is set to 'C:\BLASTDB\bin' > My database, 'ecoli.nt' is in 'C:\BLASTDB\data' directory > > Here is the script: > > ## START ## > > BEGIN{ > $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; > } > use strict; > use Bio::Seq; > use Bio::Tools::Run::StandAloneBlast; > > my @params = (program => 'blastn', database => 'ecoli.nt'); my > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq > =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); > my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = > $report_obj->next_result; print $result_obj->num_hits; > > ## END ## > > > Here is the error message I get: > > [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn > -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o > C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 > 
STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 > STACK Bio::Tools::Run::StandAloneBlast::blastall > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 > STACK toplevel localblast.pl:12 > > -------------------------------------- > > > Thank You, > > Daniel Bornman > Researcher > Battelle Memorial Institute > 505 King Ave > Columbus, OH 43201 > 614.424.3229 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bornmand at BATTELLE.ORG Fri Apr 21 13:52:07 2006 From: bornmand at BATTELLE.ORG (Bornman, Daniel M) Date: Fri, 21 Apr 2006 13:52:07 -0400 Subject: [Bioperl-l] StandAlone BLAST on Windows??? Message-ID: All that you have asked about is in place. 1) My PATH variable is set to 'C:\BLASTDB\bin\' 2) the 'ncbi.ini' file is in the C:\WINDOWS directory The contents of this file are: [NCBI] Data=C:\BLASTDB\data 3) I formatted the ecoli.nt database with the following command in the DOS prompt... formatdb -i ecoli.nt -p F -o T ...this generated 7 files in the C:\BLASTDB\data\ directory. (I can generate a Blast output with the following command..."blastall -p blastn -d ecoli.nt -i test.txt -o test.out") 4) I also tried running the script after commenting out the "$ENV{BLASTDIR} = 'C:\\BLASTDB\\data';" section BUPKIS... HAS ANYONE HERE EVER TRIED AND HAD SUCCESS USING 'StandAloneBlast' ON WINDOWS? -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Friday, April 21, 2006 1:16 PM To: Bornman, Daniel M; bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] StandAlone BLAST on Windows??? Do you have the database formatted and are your env variables set correctly in the ncbi.ini file? 
From this output, [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin it looks like blastall couldn't find the properly formatted database (env variable not set correctly) or that the database doesn't exist (need to format using formatdb). If I remember correctly, the directory containing your data in your ncbi.ini file (I placed mine in the C:\Windows dir) should look something like this: [NCBI] Data=C:\Research\blast\data You might also want to try leaving out the BLASTDIR if it is already set in ncbi.ini. If you can't get it working I'll look into it (unless Torsten has WinXP????). Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bornman, Daniel M > Sent: Friday, April 21, 2006 11:50 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] StandAlone BLAST on Windows??? > > Dear BioPerl Listers, > > Does anyone know if the BioPerl StandAloneBlast module works on Windows? > > I have a very simple script that will not work correctly. 
> > My PATH is set to 'C:\BLASTDB\bin' > My database, 'ecoli.nt' is in 'C:\BLASTDB\data' directory > > Here is the script: > > ## START ## > > BEGIN{ > $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; > } > use strict; > use Bio::Seq; > use Bio::Tools::Run::StandAloneBlast; > > my @params = (program => 'blastn', database => 'ecoli.nt'); my > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq > =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); > my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = > $report_obj->next_result; print $result_obj->num_hits; > > ## END ## > > > Here is the error message I get: > > [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn > -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o > C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 > STACK Bio::Tools::Run::StandAloneBlast::blastall > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 > STACK toplevel localblast.pl:12 > > -------------------------------------- > > > Thank You, > > Daniel Bornman > Researcher > Battelle Memorial Institute > 505 King Ave > Columbus, OH 43201 > 614.424.3229 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bornmand at BATTELLE.ORG Fri Apr 21 14:10:15 2006 From: bornmand at BATTELLE.ORG (Bornman, Daniel M) Date: Fri, 21 Apr 2006 14:10:15 -0400 Subject: [Bioperl-l] StandAlone BLAST on Windows??? Message-ID: Barry's suggestion has apparently fixed the problem. 
My simple script only asks for the number of hits, which returns "35", and I am not getting an error message. Further testing will tell for sure, but it looks pretty good. Thanks Barry! -Dan -----Original Message----- From: Barry Moore [mailto:barry.moore at genetics.utah.edu] Sent: Friday, April 21, 2006 2:01 PM To: Bornman, Daniel M Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] StandAlone BLAST on Windows??? Change > $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; to > $ENV{BLASTDIR} = 'C:\\BLASTDB\\'; or set > > $ENV{BLASTDB} = 'C:\\BLASTDB\\data'; and when you do a blastp search don't forget to add > $ENV{BLASTMAT} = 'C:\\BLASTDB\\data'; BLASTDIR points to your BLAST executables, and assumes a data directory underneath it for databases and matrix files. If you have the standard setup, with the executables in one directory and the databases and matrix files in a subdirectory 'data', you can set BLASTDIR to take care of everything. If you've put the executables, databases, or matrix files in different locations, then you'll want to set BLASTDB to point directly to the database directory, and BLASTMAT to the matrix files (probably the same directory). I know the ncbi.ini file is supposed to take care of all of this, but I seem to remember having trouble with that approach in the past, and I just went back to setting ENV variables. Barry On Apr 21, 2006, at 10:49 AM, Bornman, Daniel M wrote: > Dear BioPerl Listers, > > Does anyone know if the BioPerl StandAloneBlast module works on > Windows? > > I have a very simple script that will not work correctly.
> > My PATH is set to 'C:\BLASTDB\bin' > My database, 'ecoli.nt' is in 'C:\BLASTDB\data' directory > > Here is the script: > > ## START ## > > BEGIN{ > $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; > } > use strict; > use Bio::Seq; > use Bio::Tools::Run::StandAloneBlast; > > my @params = (program => 'blastn', database => 'ecoli.nt'); my > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq > =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); > my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = > $report_obj->next_result; print $result_obj->num_hits; > > ## END ## > > > Here is the error message I get: > > [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn > -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o > C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 > STACK Bio::Tools::Run::StandAloneBlast::blastall > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 > STACK toplevel localblast.pl:12 > > -------------------------------------- > > > Thank You, > > Daniel Bornman > Researcher > Battelle Memorial Institute > 505 King Ave > Columbus, OH 43201 > 614.424.3229 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Fri Apr 21 14:00:38 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 21 Apr 2006 12:00:38 -0600 Subject: [Bioperl-l] StandAlone BLAST on Windows??? 
In-Reply-To: References: Message-ID: Change > $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; to > $ENV{BLASTDIR} = 'C:\\BLASTDB\\'; or set > > $ENV{BLASTDB} = 'C:\\BLASTDB\\data'; and when you do a blastp search don't forget to add > $ENV{BLASTMAT} = 'C:\\BLASTDB\\data'; BLASTDIR points to your BLAST executables, and assumes a data directory underneath it for databases and matrix files. If you have the standard setup, with the executables in one directory and the databases and matrix files in a subdirectory 'data', you can set BLASTDIR to take care of everything. If you've put the executables, databases, or matrix files in different locations, then you'll want to set BLASTDB to point directly to the database directory, and BLASTMAT to the matrix files (probably the same directory). I know the ncbi.ini file is supposed to take care of all of this, but I seem to remember having trouble with that approach in the past, and I just went back to setting ENV variables. Barry On Apr 21, 2006, at 10:49 AM, Bornman, Daniel M wrote: > Dear BioPerl Listers, > > Does anyone know if the BioPerl StandAloneBlast module works on > Windows? > > I have a very simple script that will not work correctly.
> > My PATH is set to 'C:\BLASTDB\bin' > My database, 'ecoli.nt' is in 'C:\BLASTDB\data' directory > > Here is the script: > > ## START ## > > BEGIN{ > $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; > } > use strict; > use Bio::Seq; > use Bio::Tools::Run::StandAloneBlast; > > my @params = (program => 'blastn', database => 'ecoli.nt'); my > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq > =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); > my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = > $report_obj->next_result; print $result_obj->num_hits; > > ## END ## > > > Here is the error message I get: > > [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn > -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o > C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 > STACK Bio::Tools::Run::StandAloneBlast::blastall > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 > STACK toplevel localblast.pl:12 > > -------------------------------------- > > > Thank You, > > Daniel Bornman > Researcher > Battelle Memorial Institute > 505 King Ave > Columbus, OH 43201 > 614.424.3229 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bornmand at BATTELLE.ORG Fri Apr 21 10:13:03 2006 From: bornmand at BATTELLE.ORG (Bornman, Daniel M) Date: Fri, 21 Apr 2006 10:13:03 -0400 Subject: [Bioperl-l] StandAloneBlast in Windows Message-ID: Dear BioPerl, Does anyone know how to use "Bio::Tools::Run::StandAloneBlast" in Windows? 
I am able to query my own standalone BLAST database directly from the command line, without Perl, so I know it's installed correctly, but I am unable to get my BioPerl script to work. Here is a very simple example program and the error message I get. My Path (environment variable) is set to 'C:\BLASTDB\bin' ## START ## BEGIN{ $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; } use strict; use Bio::Seq; use Bio::Tools::Run::StandAloneBlast; my @params = (program => 'blastn', database => 'ecoli.nt'); my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = $report_obj->next_result; print $result_obj->num_hits; ## END ## ## ERROR MESSAGE START ## [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin ------------- EXCEPTION ------------- MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z STACK Bio::Tools::Run::StandAloneBlast::_runblast C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 STACK Bio::Tools::Run::StandAloneBlast::blastall C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 STACK toplevel localblast.pl:12 -------------------------------------- ## ERROR MESSAGE END ## Any help would be greatly appreciated. Thanks, Dan Daniel Bornman Researcher Battelle Memorial Institute 505 King Ave Columbus, OH 43201 614.424.3229 From cjfields at uiuc.edu Fri Apr 21 14:23:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 21 Apr 2006 13:23:01 -0500 Subject: [Bioperl-l] StandAlone BLAST on Windows???
In-Reply-To: <44491EE4.7030908@atgc.org> Message-ID: <001b01c66570$9fb13090$15327e82@pyrimidine> > -----Original Message----- > From: Alexander Kozik [mailto:akozik at atgc.org] > Sent: Friday, April 21, 2006 1:05 PM > To: Bornman, Daniel M > Cc: Chris Fields; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] StandAlone BLAST on Windows??? > > It looks as a query file was located on > -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx > "Documents and Settings" Windows directory. > NCBI BLAST on windows does not understand white spaces in file names and > folders/directories. Removing of white space(s) from file names and > directories may solve the problem. Also, pay attention to the short > style "DOCUME~1" of these names in this particular report. > > Alexander Kozik > Bioinformatics Specialist > Genome and Biomedical Sciences Facility > 451 East Health Sciences Drive > University of California > Davis, CA 95616-8816 > Phone: (530) 754-9127 > email#1: akozik at atgc.org > email#2: akozik at gmail.com > web: http://www.atgc.org/ > > > Bornman, Daniel M wrote: > > All that you have asked about is in place. > > > > > > 1) My PATH variable is set to 'C:\BLASTDB\bin\' > > > > > > 2) the 'ncbi.ini' file is in the C:\WINDOWS directory > > The contents of this file are: > > > > [NCBI] > > Data=C:\BLASTDB\data > > > > > > 3) I formatted the ecoli.nt database with the following command in the > > DOS prompt... > > > > formatdb -i ecoli.nt -p F -o T > > > > ...this generated 7 files in the C:\BLASTDB\data\ directory. > > (I can generate a Blast output with the following command..."blastall -p > > blastn -d ecoli.nt -i test.txt -o test.out") > > > > > > > > 4) I also tried running the script after commenting out the > > "$ENV{BLASTDIR} = 'C:\\BLASTDB\\data';" section > > > > > > > > BUPKIS... > > > > > > > > HAS ANYONE HERE EVER TRIED AND HAD SUCCESS USING 'StandAloneBlast' ON > > WINDOWS? > > Really not the best way to go about asking for help. I'm looking into it. 
Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Apr 21 14:23:41 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 21 Apr 2006 13:23:41 -0500 Subject: [Bioperl-l] StandAloneBlast in Windows In-Reply-To: Message-ID: <001c01c66570$b78fe080$15327e82@pyrimidine> And don't spam the mail list with repeated requests. It shows a lack of respect. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bornman, Daniel M > Sent: Friday, April 21, 2006 9:13 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] StandAloneBlast in Windows > > > Dear BioPerl, > > > Does anyone know how to use "Bio::Tools::Run::StandAloneBlast" in > Windows? > > I am able to query my own stand alone blast database from the command > line directly without perl so I know its installed correctly but I am > unable to get my BioPerl script to work. Here is a very simple program > example and the error message I get. 
> My Path (Environmental Variable) is set to 'C:\BLASTDB\bin' > > > > ## START ## > BEGIN{ > $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; > } > > use strict; > use Bio::Seq; > use Bio::Tools::Run::StandAloneBlast; > > my @params = (program => 'blastn', database => 'ecoli.nt'); > my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq > =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); > my $report_obj = $blast_obj->blastall($seq_obj); > my $result_obj = $report_obj->next_result; > print $result_obj->num_hits; > ## END ## > > > ## ERROR MESSAGE START ## > > [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn > -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o > C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 > STACK Bio::Tools::Run::StandAloneBlast::blastall > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 > STACK toplevel localblast.pl:12 > > -------------------------------------- > > ## ERROR MESSAGE END ## > > > Any help would be greatly appreciated. > > Thanks, > Dan > > > > Daniel Bornman > Researcher > Battelle Memorial Institute > 505 King Ave > Columbus, OH 43201 > 614.424.3229 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From akozik at atgc.org Fri Apr 21 14:05:24 2006 From: akozik at atgc.org (Alexander Kozik) Date: Fri, 21 Apr 2006 11:05:24 -0700 Subject: [Bioperl-l] StandAlone BLAST on Windows??? 
In-Reply-To: References: Message-ID: <44491EE4.7030908@atgc.org> It looks as if the query file was located in the "Documents and Settings" Windows directory (-i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx). NCBI BLAST on Windows does not understand whitespace in file and directory names. Removing whitespace from file and directory names may solve the problem. Also, note the short-style names such as "DOCUME~1" in this particular report. Alexander Kozik Bioinformatics Specialist Genome and Biomedical Sciences Facility 451 East Health Sciences Drive University of California Davis, CA 95616-8816 Phone: (530) 754-9127 email#1: akozik at atgc.org email#2: akozik at gmail.com web: http://www.atgc.org/ Bornman, Daniel M wrote: > All that you have asked about is in place. > > > 1) My PATH variable is set to 'C:\BLASTDB\bin\' > > > 2) the 'ncbi.ini' file is in the C:\WINDOWS directory > The contents of this file are: > > [NCBI] > Data=C:\BLASTDB\data > > > 3) I formatted the ecoli.nt database with the following command in the > DOS prompt... > > formatdb -i ecoli.nt -p F -o T > > ...this generated 7 files in the C:\BLASTDB\data\ directory. > (I can generate a Blast output with the following command..."blastall -p > blastn -d ecoli.nt -i test.txt -o test.out") > > > > 4) I also tried running the script after commenting out the > "$ENV{BLASTDIR} = 'C:\\BLASTDB\\data';" section > > > > BUPKIS... > > > > HAS ANYONE HERE EVER TRIED AND HAD SUCCESS USING 'StandAloneBlast' ON > WINDOWS? > > > > > > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Friday, April 21, 2006 1:16 PM > To: Bornman, Daniel M; bioperl-l at lists.open-bio.org > Subject: RE: [Bioperl-l] StandAlone BLAST on Windows??? > > Do you have the database formatted and are your env variables set > correctly in the ncbi.ini file?
From this output, > > [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin > > it looks like blastall couldn't find the properly formatted database > (env variable not set correctly) or that the database doesn't exist > (need to format using formatdb). If I remember correctly, the directory > containing your data in your ncbi.ini file (I placed mine in the > C:\Windows dir) should look something like this: > > [NCBI] > Data=C:\Research\blast\data > > You might also want to try leaving out the BLASTDIR if it is already set > in ncbi.ini. If you can't get it working I'll look into it (unless > Torsten has WinXP????). > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Bornman, Daniel M >> Sent: Friday, April 21, 2006 11:50 AM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] StandAlone BLAST on Windows??? >> >> Dear BioPerl Listers, >> >> Does anyone know if the BioPerl StandAloneBlast module works on > Windows? >> I have a very simple script that will not work correctly. 
>> >> My PATH is set to 'C:\BLASTDB\bin' >> My database, 'ecoli.nt' is in 'C:\BLASTDB\data' directory >> >> Here is the script: >> >> ## START ## >> >> BEGIN{ >> $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; >> } >> use strict; >> use Bio::Seq; >> use Bio::Tools::Run::StandAloneBlast; >> >> my @params = (program => 'blastn', database => 'ecoli.nt'); my >> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); >> my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq >> =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); >> my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = >> $report_obj->next_result; print $result_obj->num_hits; >> >> ## END ## >> >> >> Here is the error message I get: >> >> [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin >> >> ------------- EXCEPTION ------------- >> MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn > >> -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o >> C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z >> >> STACK Bio::Tools::Run::StandAloneBlast::_runblast >> C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 >> STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast >> C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 >> STACK Bio::Tools::Run::StandAloneBlast::blastall >> C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 >> STACK toplevel localblast.pl:12 >> >> -------------------------------------- >> >> >> Thank You, >> >> Daniel Bornman >> Researcher >> Battelle Memorial Institute >> 505 King Ave >> Columbus, OH 43201 >> 614.424.3229 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bornmand at BATTELLE.ORG Fri Apr 21 11:57:48 2006 From: bornmand at BATTELLE.ORG (Bornman, Daniel 
M) Date: Fri, 21 Apr 2006 11:57:48 -0400 Subject: [Bioperl-l] StandAlone BLAST on Windows Message-ID: Does anyone know if the BioPerl StandAloneBlast module works on Windows? I have a very simple script that will not work correctly. My PATH is set to 'C:\BLASTDB\bin'. My database, 'ecoli.nt', is in the 'C:\BLASTDB\data' directory. Here is the script:

## START ##

BEGIN{
    $ENV{BLASTDIR} = 'C:\\BLASTDB\\data';
}
use strict;
use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;

my @params = (program => 'blastn', database => 'ecoli.nt');
my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq =>"TTTAAATATATTTTGAAGTATAGATTATATGTT");
my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;
print $result_obj->num_hits;

## END ##

Here is the error message I get:

[NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin

------------- EXCEPTION -------------
MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z

STACK Bio::Tools::Run::StandAloneBlast::_runblast C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732
STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680
STACK Bio::Tools::Run::StandAloneBlast::blastall C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536
STACK toplevel localblast.pl:12

--------------------------------------

From osborne1 at optonline.net Fri Apr 21 17:29:54 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 21 Apr 2006 17:29:54 -0400 Subject: [Bioperl-l] StandAlone BLAST on Windows??? In-Reply-To: Message-ID: Daniel, Yes, it runs on Windows, no problem. Some questions: did you run formatdb? Can you run blastall from the command line using these same files? Brian O.
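Brian's questions can partly be checked with a script. The command line in the exception passes -d "\ecoli.nt", which looks like an empty database directory joined to the database name. That would explain "Unable to open ecoli.nt.nin". As far as I recall, StandAloneBlast reads the database directory from the BLASTDATADIR environment variable, while BLASTDIR points at the executables. Check the module documentation for your BioPerl version before relying on that.

The core-Perl sketch below does three things. It reads Data= from the [NCBI] section of ncbi.ini. It lists which of the nucleotide index files written by formatdb (.nhr, .nin, .nsq) are missing. It also flags paths containing whitespace, which is the "Documents and Settings" problem raised in this thread. The helper names are hypothetical, and the script does not need BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the Data= directory from the [NCBI] section
# of ncbi.ini. Takes the file's text (not a path); returns undef if the
# section or the Data= entry is missing.
sub ncbi_ini_data_dir {
    my ($ini_text) = @_;
    my $in_ncbi = 0;
    for my $line (split /\r?\n/, $ini_text) {
        if ($line =~ /^\s*\[(.+?)\]\s*$/) {
            $in_ncbi = (uc($1) eq 'NCBI') ? 1 : 0;
            next;
        }
        return $1 if $in_ncbi && $line =~ /^\s*Data\s*=\s*(.*?)\s*$/i;
    }
    return undef;
}

# Hypothetical helper: return the core nucleotide index files that
# formatdb (-p F) should have written for $db but that do not exist.
sub missing_blast_index_files {
    my ($data_dir, $db) = @_;
    my @missing;
    for my $ext (qw(nhr nin nsq)) {
        my $file = File::Spec->catfile($data_dir, "$db.$ext");
        push @missing, $file unless -e $file;
    }
    return @missing;
}

# Hypothetical helper: true (1) if a path contains any whitespace.
sub has_whitespace {
    my ($path) = @_;
    return $path =~ /\s/ ? 1 : 0;
}

# Example with the values from this thread; adjust to your own setup.
my $data_dir = ncbi_ini_data_dir("[NCBI]\nData=C:\\BLASTDB\\data\n");
for my $file (missing_blast_index_files($data_dir, 'ecoli.nt')) {
    print "missing: $file\n";
}
print "temp dir contains whitespace\n" if has_whitespace(File::Spec->tmpdir);
```

Suppose all three files are present in the Data directory and blastall also works from the command prompt. In that case, the remaining suspect is the database path that the module builds, which is exactly what the -d value in the exception shows.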
On 4/21/06 12:49 PM, "Bornman, Daniel M" wrote: > Dear BioPerl Listers, > > Does anyone know if the BioPerl StandAloneBlast module works on Windows? > > I have a very simple script that will not work correctly. > > My PATH is set to 'C:\BLASTDB\bin' > My database, 'ecoli.nt' is in 'C:\BLASTDB\data' directory > > Here is the script: > > ## START ## > > BEGIN{ > $ENV{BLASTDIR} = 'C:\\BLASTDB\\data'; > } > use strict; > use Bio::Seq; > use Bio::Tools::Run::StandAloneBlast; > > my @params = (program => 'blastn', database => 'ecoli.nt'); my > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > my $seq_obj = Bio::Seq->new(-id =>"testquery", -seq > =>"TTTAAATATATTTTGAAGTATAGATTATATGTT"); > my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = > $report_obj->next_result; print $result_obj->num_hits; > > ## END ## > > > Here is the error message I get: > > [NULL_Caption] WARNING: testquery: Unable to open ecoli.nt.nin > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 C:\BLASTDB\bin\blastall.exe -p blastn > -d "\ecoli.nt" -i C:\DOCUME~1\Home\LOCALS~1\Temp\QZwYdo9eKx -o > C:\DOCUME~1\Home\LOCALS~1\Temp\ADP2OcZw8z > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:732 > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:680 > STACK Bio::Tools::Run::StandAloneBlast::blastall > C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:536 > STACK toplevel localblast.pl:12 > > -------------------------------------- > > > Thank You, > > Daniel Bornman > Researcher > Battelle Memorial Institute > 505 King Ave > Columbus, OH 43201 > 614.424.3229 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Fri Apr 21 18:00:11 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 21 Apr 
2006 18:00:11 -0400 Subject: [Bioperl-l] Input for Bio::CodonUsage::IO In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746D0E@ANTARESIA.be.devgen.com> Message-ID: Marc, I spoke too soon. Looking at IO.pm's _parse method shows clearly that it's meant to parse the same format it writes, something like what's shown below. As you noted, this doesn't look anything like the *codon files, which have a fasta-style header followed by 64 numbers; clearly these are the counts of codons in the sequence that's referenced. In addition, IO.pm wouldn't have worked anyway: it was missing a basic "use" statement (fixed now). Like you, I can't connect to Codon at Kazusa; when I can, I'll see if it can provide us with a file formatted something like the text below. Brian O.

WDW311031#4\AJ311031\complement(1717..2511)\795\CAC84661.1\Wheat dwarf virus - [unknown] 1 CDS's

AmAcid  Codon  Number  /1000  Fraction
Gly     GGG      0.00   0.00      0.00
Gly     GGA      0.00   0.00      0.00
Gly     GGT      0.00   0.00      0.00
Gly     GGC      0.00   0.00      0.00
Glu     GAG      0.00   0.00      0.00
Glu     GAA      0.00   0.00      0.00
Asp     GAT      0.00   0.00      0.00
Asp     GAC      0.00   0.00      0.00
Val     GTG      0.00   0.00      0.00
Val     GTA      0.00   0.00      0.00
Val     GTT      0.00   0.00      0.00
Val     GTC      0.00   0.00      0.00
Ala     GCG      0.00   0.00      0.00
Ala     GCA      0.00   0.00      0.00
Ala     GCT      0.00   0.00      0.00
Ala     GCC      0.00   0.00      0.00
Arg     AGG      0.00   0.00      0.00
Arg     AGA      0.00   0.00      0.00
Ser     AGT      0.00   0.00      0.00
Ser     AGC      0.00   0.00      0.00
Lys     AAG      0.00   0.00      0.00
Lys     AAA      0.00   0.00      0.00
Asn     AAT      0.00   0.00      0.00
Asn     AAC      0.00   0.00      0.00
Met     ATG      0.00   0.00      0.00
Ile     ATA      0.00   0.00      0.00
Ile     ATT      0.00   0.00      0.00
Ile     ATC      0.00   0.00      0.00
Thr     ACG      0.00   0.00      0.00
Thr     ACA      0.00   0.00      0.00
Thr     ACT      0.00   0.00      0.00
Thr     ACC      0.00   0.00      0.00
Trp     TGG      0.00   0.00      0.00
Ter     TGA      0.00   0.00      0.00
Cys     TGT      0.00   0.00      0.00
Cys     TGC      0.00   0.00      0.00
Ter     TAG      0.00   0.00      0.00
Ter     TAA      0.00   0.00      0.00
Tyr     TAT      0.00   0.00      0.00
Tyr     TAC      0.00   0.00      0.00
Leu     TTG      0.00   0.00      0.00
Leu     TTA      0.00   0.00      0.00
Phe     TTT      0.00   0.00      0.00
Phe     TTC      0.00   0.00      0.00
Ser     TCG      0.00   0.00      0.00
Ser     TCA      0.00   0.00      0.00
Ser     TCT      0.00   0.00      0.00
Ser     TCC      0.00   0.00      0.00
Arg     CGG      0.00   0.00      0.00
Arg     CGA      0.00   0.00      0.00
Arg     CGT      0.00   0.00      0.00
Arg     CGC      0.00   0.00      0.00
Gln     CAG      0.00   0.00      0.00
Gln     CAA      0.00   0.00      0.00
His     CAT      0.00   0.00      0.00
His     CAC      0.00   0.00      0.00
Leu     CTG      0.00   0.00      0.00
Leu     CTA      0.00   0.00      0.00
Leu     CTT      0.00   0.00      0.00
Leu     CTC      0.00   0.00      0.00
Pro     CCG      0.00   0.00      0.00
Pro     CCA      0.00   0.00      0.00
Pro     CCT      0.00   0.00      0.00
Pro     CCC      0.00   0.00      0.00

Coding GC 0%
1st letter GC 0%
2nd letter GC 0%
3rd letter GC 0%
Genetic code 1

On 4/21/06 9:26 AM, "Marc Logghe" wrote: > Hi Brian > Thanks for the reply. > I might be overlooking something but I dowloaded this last week. The > tarball contained *.codon and *.spsum files and did not look at all like > as a codon usage table (kind of pseudo fasta). For that reason, I used > EMBOSS cutgextract that produced *.cut files starting from the CUTG > *.codon files. > > I finally managed to parse this *.cut files. > In order to do that I created a Bio::CodonUsage::IO::emboss module that > only contains the private _parse() method. The setup I used is a copycat > from Bio::SeqIO. > Meaning, now you can do: > my $io = Bio::CodonUsage::IO->new( -file => shift, -format => 'emboss' > ); > > In case no format option is given it defaults to the > Bio::CodonUsage::IO::default module that contains the _parse() method > from the original Bio::CodonUsage::IO module. Actually, this should be > changed to a name that makes more sense but I did not know what this > default format looks like and/or where it comes from. My guess it is > coming from http://www.kazusa.or.jp but the site seems to be broken. At > least today. > Currently I continue with this setup in house, but in case you think it > is usefull to commit, just let me know.
> Cheers, > Marc > > > >> -----Original Message----- >> From: Brian Osborne [mailto:osborne1 at optonline.net] >> Sent: Friday, April 21, 2006 3:09 PM >> To: Marc Logghe; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Input for Bio::CodonUsage::IO >> >> Marc, >> >> It wants a file from the database CUTG. You can ftp them from >> this mirror: >> >> ftp://ftp.ebi.ac.uk/pub/databases/cutg >> >> >> Brian O. >> >> >> On 4/21/06 4:56 AM, "Marc Logghe" wrote: >> >>> Hi, >>> I was wondering what format Bio::CodonUsage::IO expects as >> input for >>> the -file option. >>> I tried to pass it a *.cut file generated by EMBOSS' >> cutgextract that >>> looks like this: >>> #Species: Oryza sativa >>> #Division: gbpln >>> #Release: CUTG >>> #CdsCount: 70050 >>> >>> #Coding GC 55.34% >>> #1st letter GC 58.41% >>> #2nd letter GC 46.34% >>> #3rd letter GC 61.29% >>> >>> #Codon AA Fraction Frequency Number >>> GCA A 0.185 17.382 431151 >>> >>> TGA * 0.435 1.228 30463 >>> >>> Looking into the _parse() method of Bio::CodonUsage::IO it appears >>> that the table resembles this kind of format but is actually not >>> exactly what it expects. My question is: how should it really look >>> like ? I could not find an example in t/data. >>> Any clues ? 
>>> Thanks, >>> Marc >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Fri Apr 21 18:06:04 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 21 Apr 2006 18:06:04 -0400 Subject: [Bioperl-l] Genbank parsing using Bioperl In-Reply-To: References: Message-ID: <4449574C.7050208@mail.nih.gov> Prabu R wrote: > Dear All, > > I feel sorry for making a small mistake in my earlier mail > > I am not actually using Genbank releases, But Refseq Genome build gbk > files of NCBI (ftp.ncbi.nih.gov/genomes/ ) > > Those files are genbank formatted and contains Refseq IDs. > > Kindly help. > > R. Prabu > > ---------------------------- > Dear all! > > I am a novice bioperl user, trying to parse Genbank files with Bioperl > modules to get some specific features and details. > > Anyone please tell me, whether we can retrive a Gene, its Transcript ID > and its Protein ID from the Genbank file. > > I mainly need to extract with one to one relationship between > TranscriptID and Protein ID. Just grab this file: ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz It contains the one-to-one mapping you are looking for in a tab-delimited format. Sean From osborne1 at optonline.net Fri Apr 21 18:07:13 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 21 Apr 2006 18:07:13 -0400 Subject: [Bioperl-l] Input for Bio::CodonUsage::IO In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746D0E@ANTARESIA.be.devgen.com> Message-ID: Marc, An example input file for Bio::CodonUsage::IO is t/data/MmCT. Brian O. On 4/21/06 9:26 AM, "Marc Logghe" wrote: > Hi Brian > Thanks for the reply. > I might be overlooking something but I dowloaded this last week. 
The > tarball contained *.codon and *.spsum files and did not look at all like > as a codon usage table (kind of pseudo fasta). For that reason, I used > EMBOSS cutgextract that produced *.cut files starting from the CUTG > *.codon files. > > I finally managed to parse this *.cut files. > In order to do that I created a Bio::CodonUsage::IO::emboss module that > only contains the private _parse() method. The setup I used is a copycat > from Bio::SeqIO. > Meaning, now you can do: > my $io = Bio::CodonUsage::IO->new( -file => shift, -format => 'emboss' > ); > > In case no format option is given it defaults to the > Bio::CodonUsage::IO::default module that contains the _parse() method > from the original Bio::CodonUsage::IO module. Actually, this should be > changed to a name that makes more sense but I did not know what this > default format looks like and/or where it comes from. My guess it is > coming from http://www.kazusa.or.jp but the site seems to be broken. At > least today. > Currently I continue with this setup in house, but in case you think it > is usefull to commit, just let me know. > Cheers, > Marc > > > >> -----Original Message----- >> From: Brian Osborne [mailto:osborne1 at optonline.net] >> Sent: Friday, April 21, 2006 3:09 PM >> To: Marc Logghe; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Input for Bio::CodonUsage::IO >> >> Marc, >> >> It wants a file from the database CUTG. You can ftp them from >> this mirror: >> >> ftp://ftp.ebi.ac.uk/pub/databases/cutg >> >> >> Brian O. >> >> >> On 4/21/06 4:56 AM, "Marc Logghe" wrote: >> >>> Hi, >>> I was wondering what format Bio::CodonUsage::IO expects as >> input for >>> the -file option. 
>>> I tried to pass it a *.cut file generated by EMBOSS' >> cutgextract that >>> looks like this: >>> #Species: Oryza sativa >>> #Division: gbpln >>> #Release: CUTG >>> #CdsCount: 70050 >>> >>> #Coding GC 55.34% >>> #1st letter GC 58.41% >>> #2nd letter GC 46.34% >>> #3rd letter GC 61.29% >>> >>> #Codon AA Fraction Frequency Number >>> GCA A 0.185 17.382 431151 >>> >>> TGA * 0.435 1.228 30463 >>> >>> Looking into the _parse() method of Bio::CodonUsage::IO it appears >>> that the table resembles this kind of format but is actually not >>> exactly what it expects. My question is: how should it really look >>> like ? I could not find an example in t/data. >>> Any clues ? >>> Thanks, >>> Marc >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Fri Apr 21 20:59:43 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sat, 22 Apr 2006 10:59:43 +1000 Subject: [Bioperl-l] Speed of bioperl mailing list distribution Message-ID: <44497FFF.7000801@infotech.monash.edu.au> Hi, I've noticed that we often get repeated posts on the Bioperl lists, as well as multiple duplicate responses that are hours apart in post time. I just thought I should mention that, from my point of view, a post to bioperl-l (or an appearance in bioperl-guts-l after a CVS commit) can take anywhere from 30 minutes to 10 hours to appear. I'm not sure where the delay is, but I don't think it is at my end (?). So a new poster may post multiple times under the assumption that the first ones didn't work because they didn't appear in a timely fashion (for their definition of timely :-).
And responders will respond because they haven't yet received the previous response, which already answered the post. Just thought I'd mention it - anyone else have this problem? -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From cjfields at uiuc.edu Fri Apr 21 22:05:54 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 21 Apr 2006 21:05:54 -0500 Subject: [Bioperl-l] Speed of bioperl mailing list distribution In-Reply-To: <44497FFF.7000801@infotech.monash.edu.au> Message-ID: <000301c665b1$497503a0$15327e82@pyrimidine> This problem was mentioned by Daniel Bornman earlier today (the one I sniped at for repeated posts). He sent three emails which didn't make it through until this afternoon. I originally thought it might be the changeover to the new server. However, personally, I haven't seen any problems today, though I have seen slowdowns before. A post to the list went through just fine this morning around the same time that Daniel sent an email to the list (my post to the list was at 10:27 AM and I received it through the list at 10:34 AM). It could be a server problem on his end. If you're running into the same problem it could be coincidence, but it is more likely something else. Bad timing? I also made CVS commits yesterday which showed up pretty quickly on bioperl-guts-l. I believe Chris D. mentioned at some point that mail to the list goes through virus checking and spam filters, which explains a slight delay, but not several hours. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > Sent: Friday, April 21, 2006 8:00 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Speed of bioperl mailing list distribution > > Hi, > > I've noticed that we often get repeated posts on the Bioperl lists, > as well as multiple duplicate responses that are hours apart in post time. > > I just thought I should mention that a post to bioperl-l (or an appearance > in > bioperl-guts-l after a CVS commit), from my point of view, can take up to > 10 > hours to appear, at best 30 minutes. I'm not sure where the delay is but I > don't think it is at my end (?). > > So a new poster may post multiple times under the assumption that the > first > ones didn't work because the didn't appear in a timely fashion (for their > definition of timely :-). And responders will respond because they haven't > yet > received the previous response which already answered the post. > > Just thought I'd mention it - any one else have this problem? > > -- > Torsten Seemann > Victorian Bioinformatics Consortium, Monash University, Australia > http://www.vicbioinformatics.com/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Sat Apr 22 09:11:53 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat, 22 Apr 2006 09:11:53 -0400 Subject: [Bioperl-l] Speed of bioperl mailing list distribution In-Reply-To: <000301c665b1$497503a0$15327e82@pyrimidine> References: <000301c665b1$497503a0$15327e82@pyrimidine> Message-ID: <64CE332B-6B2D-486E-A858-1610E3C460D5@duke.edu> (ChrisD will probably be offline for a few days with some server issues for his company.) One reason for the slowdown can be greylisting. 
See http://en.wikipedia.org/wiki/Greylisting - We have enabled greylisting, so mail to the list from new IP addresses takes longer the first time a message is sent, because the server requires the sender's SMTP server to reconnect to the OBF server a second time. This cuts down dramatically on spam sent from transient IPs or by spammers, and we feel that the short delay is better than the reams of spam we had to moderate in the past. The CVS commits come from an internal server, so they are already approved without a retry. Posts from non-subscribers must wait in purgatory until a list admin approves them via the mailman software. Currently a few other people and I monitor this and try to authorize posts as soon as possible. There have been a few problems on the new mailing list and web server, due mostly to it not having much memory installed. The longest delays have usually happened when the machine locked up and we had to do a remote reset of the box. You would also have noticed that the website was unreachable during that time. It appears that Chris installed the new memory yesterday, so we hope this will be less of an issue in the future, but we are still investigating whether there is a memory leak in some of the software or whether this was all due to the low-memory conditions. Usually, long mailing delays are a server problem and nothing in particular that a user has to worry about. -jason On Apr 21, 2006, at 10:05 PM, Chris Fields wrote: > This problem was mentioned by Daniel Bornman earlier today (the one > I sniped > at for repeated posts). He sent three emails which didn't make it > through > until this afternoon. I originally thought it might be the > changeover to > the new server. However, personally, I haven't seen any problems > today, > though I have seen slowdowns before.
A post to the list went > through just > fine this morning around the same time that Daniel sent an email to > the list > (my post to the list was at 10:27 AM and I received it through the > list at > 10:34 AM). It could be a server problem on his end. If your > running into > the same problem it could be coincidence, but it likely is > something else. > Bad timing? > > I also made CVS commits which showed up pretty quickly on bioperl- > guts-l > yesterday. > > I believe Chris D. mentioned at some point that the mail to the > list goes > through virus-checking and a spam-filters so that explains a slight > delay, > but not several hours. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann >> Sent: Friday, April 21, 2006 8:00 PM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Speed of bioperl mailing list distribution >> >> Hi, >> >> I've noticed that we often get repeated posts on the Bioperl lists, >> as well as multiple duplicate responses that are hours apart in >> post time. >> >> I just thought I should mention that a post to bioperl-l (or an >> appearance >> in >> bioperl-guts-l after a CVS commit), from my point of view, can >> take up to >> 10 >> hours to appear, at best 30 minutes. I'm not sure where the delay >> is but I >> don't think it is at my end (?). >> >> So a new poster may post multiple times under the assumption that the >> first >> ones didn't work because the didn't appear in a timely fashion >> (for their >> definition of timely :-). And responders will respond because they >> haven't >> yet >> received the previous response which already answered the post. >> >> Just thought I'd mention it - any one else have this problem? 
>> >> -- >> Torsten Seemann >> Victorian Bioinformatics Consortium, Monash University, Australia >> http://www.vicbioinformatics.com/ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From robertemurphy at hotmail.com Sun Apr 23 23:13:21 2006 From: robertemurphy at hotmail.com (Robert Murphy) Date: Sun, 23 Apr 2006 22:13:21 -0500 Subject: [Bioperl-l] BLAST and parsing question Message-ID:
#!/usr/bin/perl
use strict;

Hi, I have a sequence and db I made myself. I have to run blast on subsequences of length 20 and identify the ones that are unique to my sequence. Any suggestion on how to go about doing this? I can get blast to "blast" my sequence, at length 20, but I'm having lots of trouble with parsing. There are so many parameters to set up. Any suggestions on which ones I need to accomplish my goal? The current code I'm using gives me 1000's of OUT files. There has to be a better way than having to read all of the OUT files. Below is the code I'm using. Thank you in advance.
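You can avoid thousands of .out files by writing every window into one multi-FASTA query file and running BLAST once on it, for example blastall -p blastn -d virus-db -i windows.fa -o windows.out. Bio::SearchIO then returns one result per query from next_result. Any window whose result has no hits (num_hits == 0) is a candidate for being unique.

The core-Perl sketch below covers only the windowing step. sliding_windows and windows_to_fasta are hypothetical helper names. The sketch assumes a window length of 20, as described in the question; note that the counters in the posted script actually produce 25-mers (the first window is subseq(1, 25)). With queries this short, the default E-value cutoff can hide weak matches, so the blastall -e setting may need adjusting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every length-$k window of $seq as
# [start, subsequence] pairs. Starts are 1-based, like subseq($start, $end).
sub sliding_windows {
    my ($seq, $k) = @_;
    my @windows;
    for my $i (0 .. length($seq) - $k) {    # empty range if $seq is shorter than $k
        push @windows, [ $i + 1, substr($seq, $i, $k) ];
    }
    return @windows;
}

# Hypothetical helper: format windows as one multi-FASTA string. Each
# record ID carries the window's start so BLAST hits can be traced back.
sub windows_to_fasta {
    my ($id, @windows) = @_;
    return join '', map { ">${id}_$_->[0]\n$_->[1]\n" } @windows;
}

# Example: print all 20-mers of a toy sequence as a single FASTA query set.
my @windows = sliding_windows('ACGTACGTACGTACGTACGTAC', 20);
print windows_to_fasta('myseq', @windows);
```

Redirect the printed output to a file to get the single query file. The record IDs (myseq_1, myseq_2, ...) then identify each window's start position in the combined BLAST report.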
# installed bioperl locally, use local bioperl lib
#use lib "/home/murphyr/lib/";
use Bio::Seq;
use Bio::SeqIO::fasta;
use Bio::SearchIO;
use Bio::Tools::Blast;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Tools::SeqStats;
use Data::Dumper;

print "Program to blast subsequences \n";
print "Enter File name of Sequence in FASTA Format \n";
# read the file name from the terminal and drop the trailing newline
my $inputfile = <STDIN>;
chomp $inputfile;
print "$inputfile \n";
my $inseq1 = Bio::SeqIO ->new('-file' => "$inputfile", '-format'=> 'fasta');
my $outseq1 = new Bio::SeqIO (-fh => \*STDOUT, -format => 'fasta');
my $seq1 = $inseq1 -> next_seq();
my $seq_stats1 = Bio::Tools::SeqStats->new(-seq=>$seq1);
my $monomer_ref = $seq_stats1 -> count_monomers();
my @k = keys %$monomer_ref;
my @v = values %$monomer_ref;
print "@k\n";
print "@v\n";
my $mytotal = 0;
foreach my $value (@v) {
    $mytotal = $mytotal + $value;
}
# print "$seq1 -> seq()\n";
my $did = $seq1 -> display_id();
print "$did\n";
print "Here \n";
# print $outseq1 -> write_seq($seq1);
my $start_pos = 0;
my $end_pos = 24;
while ($end_pos != $mytotal) {
    $start_pos = $start_pos + 1;
    $end_pos = $end_pos + 1;
    my $subseq = $seq1 -> subseq($start_pos, $end_pos);
    print "$subseq \n";
    print "Now doing blast on $subseq \n";
    my @params = ('program' => 'blastn', 'database' => 'virus-db', 'outfile' => "$subseq.out");
    my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
    my $subinput = Bio::Seq->new('-id'=>"sub test query", '-seq'=>"$subseq");
    my $blast_report = $factory->blastall($subinput);
    # Get the report
    my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' => "$subseq.out");
    my $result = $searchio->next_result;
    # Get info about the entire report
    $result->database_name;
    my $algorithm_type = $result->algorithm;
    # get info about the first hit
    my $hit = $result->next_hit;
    my $hit_name = $hit->name ;
    # get info about the first hsp of the first hit
    my $hsp = $hit->next_hsp;
    my $hsp_start = $hsp->query->start;
    print "Now parsing the blast reports using SearchIO \n";
    #
    # Not being selective here ... print the whole she-bang! ...
    #
    print "Algorithm Type \n";
    print "$algorithm_type \n";
    while ( (my $khit, my $vhit) = each %{$hit}) {
        print "$khit => $vhit \n";
    }
    print Dumper(\%$hit);
    print "Hit Name \n";
    print "$hit_name\n";
    while ( (my $khit,my $vhit) = each %{$hsp}) {
        if ($khit eq "_evalue") {
            print "$khit => $vhit \n";
        }
    }
}   # end of the sliding-window while loop

From torsten.seemann at infotech.monash.edu.au Mon Apr 24 00:28:36 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 24 Apr 2006 14:28:36 +1000 Subject: [Bioperl-l] BLAST and parsing question In-Reply-To: <000b01c6674d$0ac56850$0501a8c0@bert> References: <000b01c6674d$0ac56850$0501a8c0@bert> Message-ID: <1145852916.793.18.camel@chauvel.csse.monash.edu.au> > Hi, I have a sequence and db I made myself. I have to run blast on > subsequences of length 20 and identify the ones that are unique to my > sequence. Any suggestion on how to go about doing this? So you want all length-20 subsequences (derived using a sliding window from some set of sequences) which do not appear in some other set of sequences (virus-db)? -- Torsten Seemann Victorian Bioinformatics Consortium From ste.ghi at libero.it Mon Apr 24 09:48:24 2006 From: ste.ghi at libero.it (Stefano Ghignone) Date: Mon, 24 Apr 2006 15:48:24 +0200 Subject: [Bioperl-l] invalid species name error handling Message-ID: Hi all. I have a problem with a script that extracts the species name from GenPept flat files, querying the database with a list of acc.no. When it encounters an invalid species name (e.g. Cryptococcus neoformans var. neoformans JEC21), it terminates with an exception error. I don't know how to handle this error and make the script move on to the next value. I suppose I have to use Bio::Root::Exception and Bio::Root::Root, but I don't know how to start. Who can help me with some hints?
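As far as I know, BioPerl's throw() ends in a die. It dies with a Bio::Root::Exception object when the Error module is loaded, and otherwise with a formatted message. Either way, a plain eval block is enough to catch the failure for one accession and continue with the next. Bio::Root::Exception only matters if you want to dispatch on exception classes.

The sketch below uses a hypothetical fetch_species() that dies on one accession, in place of real GenPept lookups, so it runs offline. In a real script the eval body would hold the per-accession fetch (e.g. get_Seq_by_acc) and the species call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the per-accession work (fetching a GenPept
# record and reading its species). It dies on 'BAD1' to mimic the
# "invalid species name" exception.
sub fetch_species {
    my ($acc) = @_;
    die "invalid species name for $acc\n" if $acc eq 'BAD1';
    return "species for $acc";
}

# Process every accession; a failure is reported and skipped instead of
# ending the whole run. Returns refs to (found species, failed accessions).
sub collect_species {
    my (@accessions) = @_;
    my (%species, @failed);
    for my $acc (@accessions) {
        my $name;
        my $ok = eval { $name = fetch_species($acc); 1 };
        unless ($ok) {
            my $err = $@ || 'unknown error';   # message string or exception object
            warn "Skipping $acc: $err";
            push @failed, $acc;
            next;
        }
        $species{$acc} = $name;
    }
    return (\%species, \@failed);
}

my ($found, $skipped) = collect_species(qw(AAA1 BAD1 CCC3));
print "$_\t$found->{$_}\n" for sort keys %$found;
print "skipped: @$skipped\n";
```

The check uses eval's return value (the trailing 1) instead of testing $@ alone. That way a failure is still detected even if $@ gets cleared before eval returns.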
stefano From arareko at campus.iztacala.unam.mx Mon Apr 24 10:17:51 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 24 Apr 2006 09:17:51 -0500 Subject: [Bioperl-l] invalid species name error handling In-Reply-To: References: Message-ID: <444CDE0F.5070309@campus.iztacala.unam.mx> Maybe I can help you. Can you post your script? Mauricio. Stefano Ghignone wrote: > Hi all. > I have a problem with a script that extracts the species name from GenPept flat files, querying the database with a list of acc.no. When it encounters an invalid species name (e.g. Cryptococcus neoformans var. neoformans JEC21), it exits definitely with an exception error. I don't know how to handle this error, and make the script pass to the next value. I suppose I have to use Bio::Root::Exception and Bio::Root::Root but I don't know how to start. > Who can help me with some hints? > stefano > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Mon Apr 24 13:58:46 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 24 Apr 2006 12:58:46 -0500 Subject: [Bioperl-l] RSS feed weirdness from the wiki Message-ID: <005201c667c8$bb8a21f0$15327e82@pyrimidine> Jason, I'm getting this error when trying the RSS feed from the "Recent Changes" page on the wiki using Firefox 1.5.0.2. It looks like Firefox doesn't like the extra empty line before the XML declaration, which is apparently a common problem (I Googled it). I think this may be a Firefox issue with RSS feeds, as I don't see it on (ick) IE on Windows. Haven't tried with Safari yet.
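The XML 1.0 specification allows the XML declaration, when present, only at the very start of a document; a byte-order mark is the only thing permitted before it. That is why a strict parser reports "xml declaration not at start of external entity" when a blank line comes first. A small core-Perl check and cleanup for a saved copy of the feed might look like the sketch below. The function names are hypothetical, and the sketch only inspects or repairs a downloaded file; it does not fix whatever generates the wiki's feed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true (1) if the text begins directly with an XML
# declaration, with nothing (not even whitespace) before it.
sub xml_decl_at_start {
    my ($xml) = @_;
    return $xml =~ /\A<\?xml\s/ ? 1 : 0;
}

# Hypothetical helper: remove whitespace (e.g. a stray blank line) that
# precedes an XML declaration; other text is returned unchanged.
sub strip_leading_whitespace {
    my ($xml) = @_;
    $xml =~ s/\A\s+(?=<\?xml\s)//;
    return $xml;
}

my $feed = "\n<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<rss version=\"2.0\"/>\n";
print xml_decl_at_start($feed) ? "ok\n" : "declaration not at start\n";
print xml_decl_at_start(strip_leading_whitespace($feed)) ? "ok\n" : "still broken\n";
```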
XML Parsing Error: xml declaration not at start of external entity Location: http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss Line Number 2, Column 1: ^ Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From arareko at campus.iztacala.unam.mx Mon Apr 24 14:21:43 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 24 Apr 2006 13:21:43 -0500 Subject: [Bioperl-l] RSS feed weirdness from the wiki In-Reply-To: <005201c667c8$bb8a21f0$15327e82@pyrimidine> References: <005201c667c8$bb8a21f0$15327e82@pyrimidine> Message-ID: <444D1737.3030707@campus.iztacala.unam.mx> It seems to me that the problem is the newline character at the start of the XML files. The <?xml ...?> declaration must be on the first line, not the second. Are these files auto-generated from some template which may contain the error? Mauricio. Chris Fields wrote: > Jason, > > I'm getting this error when trying the rss feed from the "Recent Changes" > page on the wiki using Firefox 1.5.0.2. It looks like FireFox doesn't like > the extra empty line before the xml declaration, which is a common problem > apparently from checking (Googled it). I think this may be a Firefox issue > with RSS feeds as I don't see it on (ick) IE on Windows. Haven't tried with > Safari yet. > > XML Parsing Error: xml declaration not at start of external entity > Location: > http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > Line Number 2, Column 1: > > > ^ > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept.
of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Mon Apr 24 15:06:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 24 Apr 2006 14:06:18 -0500 Subject: [Bioperl-l] RSS feed weirdness from the wiki In-Reply-To: <444D1737.3030707@campus.iztacala.unam.mx> Message-ID: <005601c667d2$2b3b56a0$15327e82@pyrimidine> That's the problem. When I removed the extra line from the saved XML and threw it into my newsreader (Sage in Firefox) everything looks fine. Trying to access the feed directly in Sage gets the same XML error message as before. Directly accessing the RSS feed from Firefox (not using Sage) also gives the same error. The fact that it works in IE doesn't surprise me too much; probably looser in how strict XML is validated. When I Googled this before, it comes up as a common error associated with WordPress and plugins. http://www.stepanoff.org/wordpress/2006/01/03/be-careful-with-new-plugins/ I'm guessing that some XML validators (including those with Firefox) are so strict that docs containing extra empty space before the XML declaration are declared invalid. Maybe the plugin generating the output is to blame? Chris > -----Original Message----- > From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx] > Sent: Monday, April 24, 2006 1:22 PM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org; 'Jason Stajich' > Subject: Re: [Bioperl-l] RSS feed weirdness from the wiki > > Seems to me like the problem is the new line character at the start of > the XML files. The tag must be in the first line, not at the > second.
Are these files auto-generated from some template which may > contain the error? > > Mauricio. > > Chris Fields wrote: > > Jason, > > > > I'm getting this error when trying the rss feed from the "Recent > Changes" > > page on the wiki using Firefox 1.5.0.2. It looks like FireFox doesn't > like > > the extra empty line before the xml declaration, which is a common > problem > > apparently from checking (Googled it). I think this may be a Firefox > issue > > with RSS feeds as I don't see it on (ick) IE on Windows. Haven't tried > with > > Safari yet. > > > > XML Parsing Error: xml declaration not at start of external entity > > Location: > > http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > > Line Number 2, Column 1: > > > > > > ^ > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM From hwang at uga.edu Mon Apr 24 15:02:02 2006 From: hwang at uga.edu (Haiming Wang) Date: Mon, 24 Apr 2006 15:02:02 -0400 Subject: [Bioperl-l] How to merge mulitple genbank records into one record Message-ID: <444D20AA.1050009@uga.edu> Hi, I am wondering if there is a script or tool can merge several genbank records into one record with all features' coordinates updated accordingly. For example, I have multiple Fugu scaffold_1 genbank files which are arbitrarily cut by 1000000 bps. I'd like to merge them into one big scaffold_1 genbank file. Thanks in advance! -Haiming p.s.
example data genbank record 1: LOCUS scaffold_1 1000000 bp DNA HTG 8-FEB-2006 DEFINITION Fugu rubripes scaffold scaffold_1 FUGU4 partial sequence 1..1000000 reannotated via EnsEMBL ACCESSION scaffold:FUGU4:scaffold_1:1:1000000:1 ...... // genbank record 2: LOCUS scaffold_1 1000000 bp DNA HTG 8-FEB-2006 DEFINITION Fugu rubripes scaffold scaffold_1 FUGU4 partial sequence1000001..2000000 reannotated via EnsEMBL ACCESSION scaffold:FUGU4:scaffold_1:1000001:2000000:1 ...... // From jason.stajich at duke.edu Mon Apr 24 15:10:58 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon, 24 Apr 2006 15:10:58 -0400 Subject: [Bioperl-l] RSS feed weirdness from the wiki In-Reply-To: <005201c667c8$bb8a21f0$15327e82@pyrimidine> References: <005201c667c8$bb8a21f0$15327e82@pyrimidine> Message-ID: hmm yes I see it too -- this seems to be a problem with the GeSHi syntax highlighting extension. Will let you know when I've got it fixed or will just disable syntax highlighting... -j On Apr 24, 2006, at 1:58 PM, Chris Fields wrote: > Jason, > > I'm getting this error when trying the rss feed from the "Recent > Changes" > page on the wiki using Firefox 1.5.0.2. It looks like FireFox > doesn't like > the extra empty line before the xml declaration, which is a common > problem > apparently from checking (Googled it). I think this may be a > Firefox issue > with RSS feeds as I don't see it on (ick) IE on Windows. Haven't > tried with > Safari yet. > > XML Parsing Error: xml declaration not at start of external entity > Location: > http://www.bioperl.org/w/index.php? > title=Special:Recentchanges&feed=rss > Line Number 2, Column 1: > > > ^ > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From heikki at sanbi.ac.za Mon Apr 24 14:53:25 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Mon, 24 Apr 2006 20:53:25 +0200 Subject: [Bioperl-l] invalid species name error handling In-Reply-To: References: Message-ID: <200604242053.25395.heikki@sanbi.ac.za> Stefano, If you want to ignore the error, you need to put the offending line(s) inside an eval block. The following example is from 'perldoc -f eval': eval { $answer = $a / $b; }; warn $@ if $@; Note the ';' after the curly bracket closing the eval block. -Heikki On Monday 24 April 2006 15:48, Stefano Ghignone wrote: > Hi all. > I have a problem with a script that extracts the species name from GenPept > flat files, querying the database with a list of acc.no. When it encounters > an invalid species name (e.g. Cryptococcus neoformans var. neoformans > JEC21), it exits definitely with an exception error. I don't know how to > handle this error, and make the script pass to the next value. I suppose I > have to use Bio::Root::Exception and Bio::Root::Root but I don't know how > to start. Who can help me with some hints?
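A minimal sketch of Heikki's eval suggestion applied to a loop over accessions like Stefano's. fetch_species() is a hypothetical stand-in that dies on one made-up accession to imitate the "invalid species name" exception; in a real script its body would be the BioPerl lookup that currently aborts the run. Each failure is reported with warn and the loop continues with next.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lookup that currently aborts the script.
# It dies for one made-up accession to imitate the BioPerl exception.
sub fetch_species {
    my ($acc) = @_;
    die "invalid species name for $acc\n" if $acc eq 'BAD0001';
    return "species for $acc";
}

# Wrap each lookup in eval so one bad record does not end the run.
sub collect_species {
    my @accessions = @_;
    my (%species, @failed);
    for my $acc (@accessions) {
        my $name = eval { fetch_species($acc) };
        if ($@) {                      # eval caught an exception
            warn "Skipping $acc: $@";
            push @failed, $acc;
            next;                      # move on to the next accession
        }
        $species{$acc} = $name;
    }
    return (\%species, \@failed);
}

my ($found, $skipped) = collect_species(qw(AAA0001 BAD0001 CCC0003));
print "$_\t$found->{$_}\n" for sort keys %$found;
print "skipped: @$skipped\n";
```

Collecting the skipped accessions, rather than only warning, makes it easy to retry or inspect the problem records afterwards.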
> stefano > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From osborne1 at optonline.net Mon Apr 24 16:35:01 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 24 Apr 2006 16:35:01 -0400 Subject: [Bioperl-l] How to merge mulitple genbank records into one record In-Reply-To: <444D20AA.1050009@uga.edu> Message-ID: Haiming, Do the locations of the features refer to the individual 1000000 bp sub-sequences or are they actually locations on the merged sequence, the "chromosome"? Brian O. On 4/24/06 3:02 PM, "Haiming Wang" wrote: > Hi, > > I am wondering if there is a script or tool can merge several genbank > records into one record with all features' coordinates updated > accordingly. For example, I have multiple Fugu scaffold_1 genbank files > which are arbitrarily cut by 1000000 bps. I'd like to merge them into > one big scaffold_1 genbank file. > > Thanks in advance! > > -Haiming > > p.s. example data > genbank record 1: > LOCUS scaffold_1 1000000 bp DNA HTG 8-FEB-2006 > DEFINITION Fugu rubripes scaffold scaffold_1 FUGU4 partial sequence > 1..1000000 reannotated via EnsEMBL > ACCESSION scaffold:FUGU4:scaffold_1:1:1000000:1 > ...... > // > > genbank record 2: > LOCUS scaffold_1 1000000 bp DNA HTG 8-FEB-2006 > DEFINITION Fugu rubripes scaffold scaffold_1 FUGU4 partial > sequence1000001..2000000 reannotated via EnsEMBL > ACCESSION scaffold:FUGU4:scaffold_1:1000001:2000000:1 > ...... 
> // > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hwang at uga.edu Mon Apr 24 16:50:10 2006 From: hwang at uga.edu (Haiming Wang) Date: Mon, 24 Apr 2006 16:50:10 -0400 Subject: [Bioperl-l] How to merge mulitple genbank records into one record In-Reply-To: References: Message-ID: <444D3A02.4080003@uga.edu> The locations of the features refer to the individual 1000000 bp sub-sequences. For example, in the second genbank record 'scaffold:FUGU4:scaffold_1:1000001:2000000:1', the location of a gene is 1760..4580. It is supposed to be 1001760..1004580 on the chromosome. Thanks -Haiming Brian Osborne wrote: > Haiming, > > Do the locations of the features refer to the individual 1000000 bp > sub-sequences or are they actually locations on the merged sequence, the > "chromosome"? > > Brian O. > > > On 4/24/06 3:02 PM, "Haiming Wang" wrote: > > >> Hi, >> >> I am wondering if there is a script or tool can merge several genbank >> records into one record with all features' coordinates updated >> accordingly. For example, I have multiple Fugu scaffold_1 genbank files >> which are arbitrarily cut by 1000000 bps. I'd like to merge them into >> one big scaffold_1 genbank file. >> >> Thanks in advance! >> >> -Haiming >> >> p.s. example data >> genbank record 1: >> LOCUS scaffold_1 1000000 bp DNA HTG 8-FEB-2006 >> DEFINITION Fugu rubripes scaffold scaffold_1 FUGU4 partial sequence >> 1..1000000 reannotated via EnsEMBL >> ACCESSION scaffold:FUGU4:scaffold_1:1:1000000:1 >> ...... >> // >> >> genbank record 2: >> LOCUS scaffold_1 1000000 bp DNA HTG 8-FEB-2006 >> DEFINITION Fugu rubripes scaffold scaffold_1 FUGU4 partial >> sequence1000001..2000000 reannotated via EnsEMBL >> ACCESSION scaffold:FUGU4:scaffold_1:1000001:2000000:1 >> ...... 
>> // >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > From hwang at uga.edu Mon Apr 24 20:59:40 2006 From: hwang at uga.edu (Haiming Wang) Date: Mon, 24 Apr 2006 20:59:40 -0400 Subject: [Bioperl-l] How to merge mulitple genbank records into one record In-Reply-To: <444D53FB.4090909@colibase.bham.ac.uk> References: <444D20AA.1050009@uga.edu> <444D53FB.4090909@colibase.bham.ac.uk> Message-ID: <444D747C.1000905@uga.edu> Hi Roy, Thanks a lot for pointing it out. This is exactly what I need and it works fine for me. Again, your help is highly appreciated! -Haiming Roy Chaudhuri wrote: >> The locations of the features refer to the individual 1000000 bp >> sub-sequences. For example, in the second genbank record >> 'scaffold:FUGU4:scaffold_1:1000001:2000000:1', the location of a gene >> is 1760..4580. It is supposed to be 1001760..1004580 on the >> chromosome. >> > > Hi Haiming, > > a couple of months ago I wrote a cat function for Bio::Seq objects that > should do what you want- > http://bioperl.org/pipermail/bioperl-l/2006-January/020675.html > > I think Heikki committed it to the Bio::SeqUtils version in cvs. The > function will automatically adjust the coordinates so should work for > your situation. The syntax is: > Bio::SeqUtils->cat($firstseq, @otherseqs) > > Hope this helps. > Roy. > > -- > Dr. Roy Chaudhuri > Bioinformatics Research Fellow > Division of Immunity and Infection > University of Birmingham, U.K. > > http://xbase.bham.ac.uk > > > From hwang at uga.edu Mon Apr 24 21:08:37 2006 From: hwang at uga.edu (Haiming Wang) Date: Mon, 24 Apr 2006 21:08:37 -0400 Subject: [Bioperl-l] How to merge mulitple genbank records into one record In-Reply-To: References: Message-ID: <444D7695.5070700@uga.edu> Hi Brian, Thanks for the quick reply. Dr. Roy Chaudhuri suggested using the cat function in Bio::SeqUtils to do the concatenation.
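To illustrate the coordinate arithmetic involved (plain Perl, not the Bio::SeqUtils cat implementation): a feature on a chunk that starts at merged position S shifts by S - 1, so Haiming's gene at 1760..4580 in the chunk starting at 1000001 lands at 1001760..1004580. The helper names are hypothetical, and reading the chunk start from the ACCESSION line assumes the Ensembl-style coord_system:version:region:start:end:strand layout shown in the example records.

```perl
use strict;
use warnings;

# Read the chunk's first base from an Ensembl-style slice name such as
# 'scaffold:FUGU4:scaffold_1:1000001:2000000:1' (assumed layout:
# coord_system:version:region:start:end:strand).
sub chunk_start_from_name {
    my ($name) = @_;
    my @f = split /:/, $name;
    die "unexpected name format: $name\n"
        unless @f == 6 && $f[3] =~ /^\d+$/;
    return $f[3];
}

# Shift a chunk-relative start..end onto the merged sequence.
# A chunk starting at merged position S is offset by S - 1.
sub to_merged_coords {
    my ($chunk_start, $start, $end) = @_;
    my $offset = $chunk_start - 1;
    return ($start + $offset, $end + $offset);
}

my $s = chunk_start_from_name('scaffold:FUGU4:scaffold_1:1000001:2000000:1');
my ($from, $to) = to_merged_coords($s, 1760, 4580);
print "$from..$to\n";    # prints 1001760..1004580
```

Features on the first chunk (start 1) are left unchanged, since the offset is zero.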
It works well for me. Appreciate your efforts in looking into the question. Cheers, Haiming Brian Osborne wrote: > Haiming, > > Do the locations of the features refer to the individual 1000000 bp > sub-sequences or are they actually locations on the merged sequence, the > "chromosome"? > > Brian O. > > > On 4/24/06 3:02 PM, "Haiming Wang" wrote: > > >> Hi, >> >> I am wondering if there is a script or tool can merge several genbank >> records into one record with all features' coordinates updated >> accordingly. For example, I have multiple Fugu scaffold_1 genbank files >> which are arbitrarily cut by 1000000 bps. I'd like to merge them into >> one big scaffold_1 genbank file. >> >> Thanks in advance! >> >> -Haiming >> >> p.s. example data >> genbank record 1: >> LOCUS scaffold_1 1000000 bp DNA HTG 8-FEB-2006 >> DEFINITION Fugu rubripes scaffold scaffold_1 FUGU4 partial sequence >> 1..1000000 reannotated via EnsEMBL >> ACCESSION scaffold:FUGU4:scaffold_1:1:1000000:1 >> ...... >> // >> >> genbank record 2: >> LOCUS scaffold_1 1000000 bp DNA HTG 8-FEB-2006 >> DEFINITION Fugu rubripes scaffold scaffold_1 FUGU4 partial >> sequence1000001..2000000 reannotated via EnsEMBL >> ACCESSION scaffold:FUGU4:scaffold_1:1000001:2000000:1 >> ...... >> // >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Tue Apr 25 00:06:33 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 25 Apr 2006 00:06:33 -0400 Subject: [Bioperl-l] RSS feed weirdness from the wiki In-Reply-To: References: <005201c667c8$bb8a21f0$15327e82@pyrimidine> Message-ID: Should be fixed now, trailing newline in one of the php scripts was the culprit. Thanks for pointing that out. 
-jason On Apr 24, 2006, at 3:10 PM, Jason Stajich wrote: > hmm yes I see it too -- this seems to be a problem with the GeSHi > syntax highlighting extension. Will let you know when I've got it > fixed or will just disable syntax highlighting... > > -j > On Apr 24, 2006, at 1:58 PM, Chris Fields wrote: > >> Jason, >> >> I'm getting this error when trying the rss feed from the "Recent >> Changes" >> page on the wiki using Firefox 1.5.0.2. It looks like FireFox >> doesn't like >> the extra empty line before the xml declaration, which is a common >> problem >> apparently from checking (Googled it). I think this may be a >> Firefox issue >> with RSS feeds as I don't see it on (ick) IE on Windows. Haven't >> tried with >> Safari yet. >> >> XML Parsing Error: xml declaration not at start of external entity >> Location: >> http://www.bioperl.org/w/index.php? >> title=Special:Recentchanges&feed=rss >> Line Number 2, Column 1: >> >> >> ^ >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From torsten.seemann at infotech.monash.edu.au Tue Apr 25 01:55:52 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 25 Apr 2006 15:55:52 +1000 Subject: [Bioperl-l] BLAST and parsing question In-Reply-To: References: Message-ID: <444DB9E8.9080802@infotech.monash.edu.au> Robert, >> So you want all length 20 subsequences (derived using a sliding window >> from some set of sequences) which are do not appear in some other set of >> sequences (virus-db) ? > Yes, that's basically it. Find out which 20 unit long subsequences of > my sequence are not found in my database. 
Well, using BLAST is probably not the most appropriate tool for this problem as it will find 'high scoring' matches, not exact matches. Perhaps simply using Perl's "index()" function, which tests if one string is in another string, would be simpler? You could even concatenate all your database sequences into one big sequence, inserting 20 "N" (if DNA) or "X" (if protein) between each (or any other char you don't have in your sequences). Then you could simply loop through your 20-length subsequences using the sliding window as before, and run "index()" for each against the one big database string. If index() returns -1, it wasn't found. Hope this helps, Torsten Seemann. From robertemurphy at hotmail.com Mon Apr 24 10:12:09 2006 From: robertemurphy at hotmail.com (Robert Murphy) Date: Mon, 24 Apr 2006 14:12:09 +0000 Subject: [Bioperl-l] BLAST and parsing question In-Reply-To: <1145852916.793.18.camel@chauvel.csse.monash.edu.au> Message-ID: Yes, that's basically it. Find out which 20 unit long subsequences of my sequence are not found in my database. -Robert So you want all length 20 subsequences (derived using a sliding window from some set of sequences) which are do not appear in some other set of sequences (virus-db) ? -- Torsten Seemann Victorian Bioinformatics Consortium From arareko at campus.iztacala.unam.mx Tue Apr 25 02:40:17 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Tue, 25 Apr 2006 01:40:17 -0500 Subject: [Bioperl-l] RSS feed weirdness from the wiki In-Reply-To: References: <005201c667c8$bb8a21f0$15327e82@pyrimidine> Message-ID: <444DC451.9010707@campus.iztacala.unam.mx> Yes, the XML files are now properly generated. Did you also disable the syntax highlighting plugin? (it doesn't work in the HOWTO:Trees examples). Mauricio. Jason Stajich wrote: > Should be fixed now, trailing newline in one of the php scripts was > the culprit. > > Thanks for pointing that out.
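A rough sketch of Torsten's index() suggestion. It assumes the query and database sequences are already plain strings (for example from $seq->seq on Bio::SeqIO objects) and that the query itself contains no 'N'. absent_kmers() is a hypothetical name. It joins the database entries with a run of k 'N' characters so that no window can match across two entries, then reports every window for which index() returns -1.

```perl
use strict;
use warnings;

# Return [position, window] pairs (1-based) for every k-length window of
# $query that does not occur in any database sequence.
sub absent_kmers {
    my ($query, $k, @db) = @_;
    # Separate entries with k copies of 'N' (use 'X' for protein) so that
    # no window can match across the junction of two entries.
    my $sep = 'N' x $k;
    my $big = join $sep, map { uc $_ } @db;
    my @absent;
    for my $i (0 .. length($query) - $k) {
        my $window = uc substr($query, $i, $k);
        push @absent, [ $i + 1, $window ] if index($big, $window) == -1;
    }
    return @absent;
}

my @db = ('ACGTACGTAC', 'TTTTGGGG');
print "$_->[0]\t$_->[1]\n" for absent_kmers('ACGTTTTT', 4, @db);
# prints "2<TAB>CGTT" and "3<TAB>GTTT"
```

Each index() call scans the whole joined string, so this approach favours simplicity over speed. For a large database, a hash of all database k-mers would trade memory for much faster lookups.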
> > -jason > On Apr 24, 2006, at 3:10 PM, Jason Stajich wrote: > >> hmm yes I see it too -- this seems to be a problem with the GeSHi >> syntax highlighting extension. Will let you know when I've got it >> fixed or will just disable syntax highlighting... >> >> -j >> On Apr 24, 2006, at 1:58 PM, Chris Fields wrote: >> >>> Jason, >>> >>> I'm getting this error when trying the rss feed from the "Recent >>> Changes" >>> page on the wiki using Firefox 1.5.0.2. It looks like FireFox >>> doesn't like >>> the extra empty line before the xml declaration, which is a common >>> problem >>> apparently from checking (Googled it). I think this may be a >>> Firefox issue >>> with RSS feeds as I don't see it on (ick) IE on Windows. Haven't >>> tried with >>> Safari yet. >>> >>> XML Parsing Error: xml declaration not at start of external entity >>> Location: >>> http://www.bioperl.org/w/index.php? >>> title=Special:Recentchanges&feed=rss >>> Line Number 2, Column 1: >>> >>> >>> ^ >>> >>> Christopher Fields >>> Postdoctoral Researcher - Switzer Lab >>> Dept. 
of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From Marc.Logghe at DEVGEN.com Tue Apr 25 02:58:56 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Tue, 25 Apr 2006 08:58:56 +0200 Subject: [Bioperl-l] BLAST and parsing question Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D18@ANTARESIA.be.devgen.com> Hi Robert, It should be possible to do it with blast also by fiddling with the blast options. You could set the word size to 20 as you are not interested in smaller hits for instance. Also, the hits are ranked so you can easily check the best hit (taking first line of tabular blast output) and check whether the hit length is 20 as well. Enlarging the word size and producing tabular output will definitely increase the speed. My $0.02 Regards, Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Robert Murphy > Sent: Monday, April 24, 2006 4:12 PM > To: torsten.seemann at infotech.monash.edu.au > Subject: Re: [Bioperl-l] BLAST and parsing question > > Yes, that's basically it. Find out which 20 unit long > subsequences of my sequence are not found in my database.
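A sketch of Marc's "check the best hit" idea. It assumes NCBI blastall tabular output (-m 8), whose first six tab-separated columns are query id, subject id, percent identity, alignment length, mismatches and gap openings. Because the hits for each query are ranked, only the first line per query is examined. The function name and sample lines are invented for illustration. Queries whose best hit is not exact, or that have no hit lines at all (and therefore do not appear in the result), are the candidate windows absent from the database.

```perl
use strict;
use warnings;

# For each query, look only at its first (best-ranked) line of tabular
# BLAST output and record 1 if that hit is an exact, full-length match of
# length $k, else 0. Queries with no hit lines are absent from the result.
sub best_hit_is_exact {
    my ($k, @lines) = @_;
    my %exact;
    for my $line (@lines) {
        chomp $line;
        next if $line !~ /\S/ || $line =~ /^#/;   # skip blank/comment lines
        my ($qid, $sid, $pident, $len, $mismatch, $gapopen) = split /\t/, $line;
        next if exists $exact{$qid};              # best hit already seen
        $exact{$qid} = ($pident == 100 && $len == $k
                        && $mismatch == 0 && $gapopen == 0) ? 1 : 0;
    }
    return \%exact;
}

# Made-up example lines in -m 8 column order.
my @tabular = (
    "win1\tvirusA\t100.00\t20\t0\t0\t1\t20\t501\t520\t2e-05\t40.1",
    "win1\tvirusB\t95.00\t20\t1\t0\t1\t20\t88\t107\t0.003\t32.2",
    "win2\tvirusC\t94.44\t18\t1\t0\t2\t19\t10\t27\t0.5\t28.2",
);
my $result = best_hit_is_exact(20, @tabular);
printf "%s\t%s\n", $_, $result->{$_} ? 'exact' : 'not exact'
    for sort keys %$result;
```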
> > -Robert > > > So you want all length 20 subsequences (derived using a > sliding window from some set of sequences) which are do not > appear in some other set of sequences (virus-db) ? > > -- > Torsten Seemann > Victorian Bioinformatics Consortium > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From reenayadav at gmail.com Tue Apr 25 04:48:45 2006 From: reenayadav at gmail.com (Reena Yadav) Date: Tue, 25 Apr 2006 14:18:45 +0530 Subject: [Bioperl-l] Installation problem Message-ID: <76f897dd0604250148v59357b9apeed2e2352e480c8c@mail.gmail.com> Linux: Bioperl was installed using root privileges. Tried using the first script, from: http://bioperl.org/Core/Latest/bptutorial.html#i_2_quick_getting_started_scripts Internet is connected and is accessible. I got the following reply. ------------- EXCEPTION ------------- MSG: WebDBSeqI Request Error: 501 Protocol scheme 'webproxyap1.inbg.astrazeneca.net' is not supported Content-Type: text/plain Client-Date: Tue, 25 Apr 2006 08:24:19 GMT Client-Warning: Internal response 501 Protocol scheme 'webproxyap1.inbg.astrazeneca.net' is not supported STACK Bio::DB::WebDBSeqI::_stream_request /usr/lib/perl5/site_perl/5.8.5/Bio/DB/WebDBSeqI.pm:728 STACK Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/site_perl/5.8.5/Bio/DB/WebDBSeqI.pm:460 STACK Bio::DB::WebDBSeqI::get_Stream_by_id /usr/lib/perl5/site_perl/5.8.5/Bio/DB/WebDBSeqI.pm:287 STACK Bio::DB::WebDBSeqI::get_Seq_by_id /usr/lib/perl5/site_perl/5.8.5/Bio/DB/WebDBSeqI.pm:153 STACK Bio::Perl::get_sequence /usr/lib/perl5/site_perl/5.8.5/Bio/Perl.pm:511 STACK toplevel bp.pl:7 -------------------------------------- -------------------- WARNING --------------------- MSG: id (ROA1_HUMAN) does not exist --------------------------------------------------- You have a non object [] passed to write_sequence. 
It maybe that you want to use new_sequence to make this string into a sequence object? at /usr/lib/perl5/site_perl/5.8.5/Bio/Perl.pm line 283 Bio::Perl::write_sequence('>roa1.fasta', 'fasta', 'undef') called at bp.pl line 9 Win: ActiveState Perl was downloaded and installed, then in ppm: ppm> rep add Bioperl http://bioperl.org/DIST ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms ppm> rep add Bribes http://www.Bribes.org/perl/ppm ppm> search Bioperl searching in active Repositories. no matches for 'Bioperl' ; see 'help search' Where am I going wrong? Please share your experiences. Reena Yadav. From jason.stajich at duke.edu Tue Apr 25 08:17:02 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 25 Apr 2006 08:17:02 -0400 Subject: [Bioperl-l] RSS feed weirdness from the wiki In-Reply-To: <444DC451.9010707@campus.iztacala.unam.mx> References: <005201c667c8$bb8a21f0$15327e82@pyrimidine> <444DC451.9010707@campus.iztacala.unam.mx> Message-ID: Apparently I had not quite tweaked the highlight script correctly - everything should be copacetic now. -j On Apr 25, 2006, at 2:40 AM, Mauricio Herrera Cuadra wrote: > Yes, the XML files are now properly generated. Did you also > disable the syntax highlighting plugin? (it doesn't work in the > HOWTO:Trees examples). > > Mauricio. > > Jason Stajich wrote: >> Should be fixed now, trailing newline in one of the php scripts >> was the culprit. >> Thanks for pointing that out. >> -jason >> On Apr 24, 2006, at 3:10 PM, Jason Stajich wrote: >>> hmm yes I see it too -- this seems to be a problem with the GeSHi >>> syntax highlighting extension. Will let you know when I've got it >>> fixed or will just disable syntax highlighting... >>> >>> -j >>> On Apr 24, 2006, at 1:58 PM, Chris Fields wrote: >>> >>>> Jason, >>>> >>>> I'm getting this error when trying the rss feed from the "Recent >>>> Changes" >>>> page on the wiki using Firefox 1.5.0.2.
It looks like FireFox >>>> doesn't like >>>> the extra empty line before the xml declaration, which is a common >>>> problem >>>> apparently from checking (Googled it). I think this may be a >>>> Firefox issue >>>> with RSS feeds as I don't see it on (ick) IE on Windows. Haven't >>>> tried with >>>> Safari yet. >>>> >>>> XML Parsing Error: xml declaration not at start of external entity >>>> Location: >>>> http://www.bioperl.org/w/index.php? >>>> title=Special:Recentchanges&feed=rss >>>> Line Number 2, Column 1: >>>> >>>> >>>> ^ >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher - Switzer Lab >>>> Dept. of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>> -- >>> Jason Stajich >>> Duke University >>> http://www.duke.edu/~jes12 >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Tue Apr 25 11:47:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 25 Apr 2006 10:47:29 -0500 Subject: [Bioperl-l] ListSummaries for April 12-25 Message-ID: <006f01c6687f$8f41fe80$15327e82@pyrimidine> The newest summary of the BioPerl mailing lists has been posted to the wiki: http://www.bioperl.org/wiki/ListSummary:April_12-25%2C2006 Post gripes, harassments, and faint praises here. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign From iamvela at yahoo.com Tue Apr 25 12:07:36 2006 From: iamvela at yahoo.com (Raghunath Verabelli) Date: Tue, 25 Apr 2006 09:07:36 -0700 (PDT) Subject: [Bioperl-l] blast program to run locally on windows In-Reply-To: <000501c638a7$c2802630$15327e82@pyrimidine> Message-ID: <20060425160736.48121.qmail@web36612.mail.mud.yahoo.com> Hi All, I am looking for a blast program ('blastp') that I can run locally on windows platform instead of using NCBI's blast over web. Do you know any such programs? I checked Washington University's blast program, but looks like they do not support it on windows platform: http://blast.wustl.edu/blast/README.html#platforms Any other alternatives? Please let me know if you are aware of any 'blastp' program equivalent to WU blast for Windows. Thanks in advance, Raghu __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From arareko at campus.iztacala.unam.mx Tue Apr 25 12:14:28 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Tue, 25 Apr 2006 11:14:28 -0500 Subject: [Bioperl-l] ListSummaries for April 12-25 In-Reply-To: <006f01c6687f$8f41fe80$15327e82@pyrimidine> References: <006f01c6687f$8f41fe80$15327e82@pyrimidine> Message-ID: <444E4AE4.1000004@campus.iztacala.unam.mx> Nice job Chris! :) Chris Fields wrote: > The newest summary of the BioPerl mailing lists has been posted to the wiki: > > http://www.bioperl.org/wiki/ListSummary:April_12-25%2C2006 > > Post gripes, harassments, and faint praises here. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept.
of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Tue Apr 25 12:14:46 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 25 Apr 2006 11:14:46 -0500 Subject: [Bioperl-l] blast program to run locally on windows In-Reply-To: <20060425160736.48121.qmail@web36612.mail.mud.yahoo.com> Message-ID: <007c01c66883$61f29490$15327e82@pyrimidine> Use the NCBI win32 binaries and either download your database from NCBI or build your own custom databases using formatdb: http://www.ncbi.nih.gov/BLAST/download.shtml More up-to-date help is available here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/ http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/pc_setup.html Instructions on how to format your own databases is here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html Chris > -----Original Message----- > From: Raghunath Verabelli [mailto:iamvela at yahoo.com] > Sent: Tuesday, April 25, 2006 11:08 AM > To: Chris Fields; 'Jason Stajich' > Cc: bioperl-l at lists.open-bio.org > Subject: blast program to run locally on windows > > Hi All, > > I am looking for a blast program ('blastp') that I can > run locally on windows platform instead of using > NCBI's blast over web. > > Do you know any such programs? > > I checked Washington University's blast program, but > looks like they do not support it on windows platform: > http://blast.wustl.edu/blast/README.html#platforms > > > Any other alternatives? > > Please let me know if you are aware of any 'blastp' > program equivalent to WU blast for Windows.
> > Thanks in advance, > Raghu > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com From hlapp at gmx.net Tue Apr 25 13:14:52 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Apr 2006 10:14:52 -0700 Subject: [Bioperl-l] ListSummaries for April 12-25 In-Reply-To: <006f01c6687f$8f41fe80$15327e82@pyrimidine> References: <006f01c6687f$8f41fe80$15327e82@pyrimidine> Message-ID: <60D44839-E52A-420C-95CE-EA4B9DB674BE@gmx.net> BTW, the OBO parser mentioned as having been committed by me is work by Sohel Merchant, and I haven't announced it yet b/c Sohel is going to do some more reshuffling of code when he gets his CVS account, so I don't really want anyone using it quite yet just in case one of these impending changes causes an API change. -hilmar On Apr 25, 2006, at 8:47 AM, Chris Fields wrote: > The newest summary of the BioPerl mailing lists has been posted to > the wiki: > > http://www.bioperl.org/wiki/ListSummary:April_12-25%2C2006 > > Post gripes, harassments, and faint praises here. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From roy at colibase.bham.ac.uk Tue Apr 25 12:45:31 2006 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Tue, 25 Apr 2006 17:45:31 +0100 Subject: [Bioperl-l] ListSummaries for April 12-25 In-Reply-To: <006f01c6687f$8f41fe80$15327e82@pyrimidine> References: <006f01c6687f$8f41fe80$15327e82@pyrimidine> Message-ID: <444E522B.3030103@colibase.bham.ac.uk> > Post gripes, harassments, and faint praises here.
Slight gripe: I replied to Haiming Wang by pressing the reply-all button, like a good little mailing-lister, so a copy should have gone to the list. I was sending from my home ISP, though, so I'm guessing the message is suffering from the greylisting problem that Jason mentioned. Still, plenty of non-faint praise: excellent job with the summary. Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K. http://xbase.bham.ac.uk From robertemurphy at hotmail.com Tue Apr 25 13:14:37 2006 From: robertemurphy at hotmail.com (Robert Murphy) Date: Tue, 25 Apr 2006 17:14:37 +0000 Subject: [Bioperl-l] BLAST and parsing question In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746D18@ANTARESIA.be.devgen.com> Message-ID: Question: is there a program that will search all my output files for just the ones that have exact matches? That might just solve my problem. Thanks! From iamvela at yahoo.com Tue Apr 25 17:54:33 2006 From: iamvela at yahoo.com (Raghunath Verabelli) Date: Tue, 25 Apr 2006 14:54:33 -0700 (PDT) Subject: [Bioperl-l] blast program to run locally on windows In-Reply-To: <007c01c66883$61f29490$15327e82@pyrimidine> Message-ID: <20060425215433.35436.qmail@web36613.mail.mud.yahoo.com> Thanks, Chris. I have installed everything, but I am unable to get BLAST hits from the database file. I get this message: BLASTP 2.2.13 [Nov-27-2005] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Query= target48 (115 letters) Database: temp 2,093,771 sequences; 757,160,754 total letters ***** No hits found ****** Database: temp Posted date: Nov 11, 2005 3:50 PM Number of letters in database: 757,160,754 Number of sequences in database: 2,093,771 I also see the following error messages: [NULL_Caption] WARNING: [000.000] target48: Unable to open BLOSUM62 [NULL_Caption] WARNING: [000.000] target48: BlastScoreBlkMatFill returned non-zero status [NULL_Caption] WARNING: [000.000] target48: SetUpBlastSearch failed. Any ideas how to resolve this issue? Thanks, Raghu --- Chris Fields wrote: > Use the NCBI win32 binaries and either download your > database from NCBI or > build your own custom databases using formatdb: > > http://www.ncbi.nih.gov/BLAST/download.shtml > > More up-to-date help is available here: > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/ > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/pc_setup.html > > Instructions on how to format your own databases are > here: > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html > > Chris > > > -----Original Message----- > > From: Raghunath Verabelli > [mailto:iamvela at yahoo.com] > > Sent: Tuesday, April 25, 2006 11:08 AM > > To: Chris Fields; 'Jason Stajich' > > Cc: bioperl-l at lists.open-bio.org > > Subject: blast program to run locally on windows > > > > Hi All, > > > > I am looking for a blast program ('blastp') that I > can > > run locally on windows platform instead of using > > NCBI's blast over web. > > > > Do you know any such programs? > > > > I checked Washington University's blast program, > but > > looks like they do not support it on windows > platform: > > http://blast.wustl.edu/blast/README.html#platforms > > > > > > Any other alternatives? > > > > Please let me know if you are aware of any > 'blastp' > > program equivalent to WU blast for Windows. 
> > > > Thanks in advance, > > Raghu From torsten.seemann at infotech.monash.edu.au Tue Apr 25 19:55:30 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 26 Apr 2006 09:55:30 +1000 Subject: [Bioperl-l] Installation problem In-Reply-To: <76f897dd0604250148v59357b9apeed2e2352e480c8c@mail.gmail.com> References: <76f897dd0604250148v59357b9apeed2e2352e480c8c@mail.gmail.com> Message-ID: <1146009330.6028.1.camel@chauvel.csse.monash.edu.au> > http://bioperl.org/Core/Latest/bptutorial.html#i_2_quick_getting_started_scripts > Internet is connected and is accessible. > I got the following reply. > ------------- EXCEPTION ------------- > MSG: WebDBSeqI Request Error: > 501 Protocol scheme 'webproxyap1.inbg.astrazeneca.net' is not supported Are you using a web proxy? If so, what is your $http_proxy environment variable set to? -- Torsten Seemann Victorian Bioinformatics Consortium From torsten.seemann at infotech.monash.edu.au Tue Apr 25 19:57:18 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 26 Apr 2006 09:57:18 +1000 Subject: [Bioperl-l] BLAST and parsing question In-Reply-To: References: Message-ID: <1146009438.6028.2.camel@chauvel.csse.monash.edu.au> On Tue, 2006-04-25 at 17:14 +0000, Robert Murphy wrote: > Question, is there a program that will search all my out files for just ones > that have exact matches? That might just solve my problem. The EMBOSS set of tools comes with a few options, perhaps 'wordmatch'?
http://bioweb.pasteur.fr/docs/EMBOSS/wordmatch.html -- Torsten Seemann Victorian Bioinformatics Consortium From Marc.Logghe at DEVGEN.com Wed Apr 26 03:07:34 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed, 26 Apr 2006 09:07:34 +0200 Subject: [Bioperl-l] BLAST and parsing question Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D1E@ANTARESIA.be.devgen.com> Have you already looked into MUMmer? I believe you don't even need a sliding-window approach. Given the complete query sequence, the application will simply report the stretches of exact matches; it is based on suffix trees. The disadvantage, however, is that there is currently no way to store the suffix-tree indexes, i.e. you cannot 'formatdb' your sequences into a 'MUMmer database'. The suffix-tree data structures for the 'database' are built on every run and held in memory. Check it out at http://mummer.sourceforge.net/. Cheers, Marc Marc Logghe, PhD Expert Scientist Bioinformatics deVGen NV Technologiepark 30 B - 9052 Ghent-Zwijnaarde Tel. +32 9 324 24 83 Fax. +32 9 324 24 25 Web: www.devgen.com
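For picking out exact matches from BLAST reports that already exist, a minimal BioPerl sketch follows. It assumes the reports are plain-text BLAST output readable by Bio::SearchIO's 'blast' format, and it treats an HSP as "exact" only when it is 100% identical over the full query length (an assumption; relax the length test if partial exact matches are acceptable). The helper names is_exact_hsp and report_exact_matches are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: an HSP is "exact" if every aligned position is
# identical and the query part of the alignment covers the whole query.
sub is_exact_hsp {
    my ($frac_identical, $aln_query_len, $query_len) = @_;
    return $frac_identical == 1 && $aln_query_len == $query_len;
}

# Hypothetical helper: print file/query/hit for hits with an exact HSP.
sub report_exact_matches {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded only when a report is actually parsed
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                next unless is_exact_hsp($hsp->frac_identical('total'),
                                         $hsp->length('query'),
                                         $result->query_length);
                print join("\t", $file, $result->query_name, $hit->name), "\n";
                last;    # one exact HSP is enough to report this hit
            }
        }
    }
}

# Usage: perl exact_matches.pl *.out
report_exact_matches($_) for @ARGV;
```

Keeping the test in a plain function makes the "exact" criterion easy to adjust without touching the parsing loop.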
> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Torsten Seemann > Sent: Wednesday, April 26, 2006 1:57 AM > To: Robert Murphy > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] BLAST and parsing question > > On Tue, 2006-04-25 at 17:14 +0000, Robert Murphy wrote: > > Question, is there a program that will search all my out files for > > just ones that have exact matches? That might just solve > my problem. > > The EMBOSS set of tools comes with a few options, perhaps > 'wordmatch' ? > http://bioweb.pasteur.fr/docs/EMBOSS/wordmatch.html > > -- > Torsten Seemann > Victorian Bioinformatics Consortium > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From robertemurphy at hotmail.com Wed Apr 26 11:33:53 2006 From: robertemurphy at hotmail.com (Robert Murphy) Date: Wed, 26 Apr 2006 10:33:53 -0500 Subject: [Bioperl-l] BLAST and parsing question In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746D1E@ANTARESIA.be.devgen.com> Message-ID: Thanks for all the great ideas. Now I just need to try them out! -Rob From mark_gosink at yahoo.com Wed Apr 26 20:15:29 2006 From: mark_gosink at yahoo.com (Mark Gosink) Date: Wed, 26 Apr 2006 20:15:29 -0400 Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff Message-ID: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin> I'd like to convert gene predictions from several different programs (genscan, glimmerhmm, fgenesh) into GFF format. I know BioPerl can parse the output of these and other predictors, and that it can export GFF, but I'm not clear on how to string the two together. Can anyone point me to some example code?
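A minimal sketch of stringing the two together, assuming each predictor's BioPerl parser exposes next_prediction() returning gene features with sub-features (as Bio::Tools::Genscan does), and that Bio::Tools::GFF's write_feature() handles the output. The dispatch table, parser_class_for and predictions_to_gff are hypothetical names; Bio::Tools::Glimmer in particular may need extra constructor options for GlimmerHMM output, so check each module's documentation before relying on this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical dispatch table: program name -> BioPerl parser class.
# Assumption: each class provides new(-file => ...) and next_prediction().
my %PARSER_FOR = (
    genscan    => 'Bio::Tools::Genscan',
    glimmerhmm => 'Bio::Tools::Glimmer',
    fgenesh    => 'Bio::Tools::Fgenesh',
);

sub parser_class_for {
    my ($program) = @_;
    my $class = $PARSER_FOR{ lc $program }
        or die "No parser registered for '$program'\n";
    return $class;
}

# Write each predicted gene, then its sub-features (exons etc.), as GFF.
sub predictions_to_gff {
    my ($program, $infile, $out_fh) = @_;
    my $class = parser_class_for($program);
    eval "require $class; 1" or die $@;    # load BioPerl only when needed
    require Bio::Tools::GFF;

    my $parser = $class->new(-file => $infile);
    my $gff    = Bio::Tools::GFF->new(-gff_version => 2, -fh => $out_fh);
    while (my $gene = $parser->next_prediction) {
        $gff->write_feature($gene);
        $gff->write_feature($_) for $gene->get_SeqFeatures;
    }
}

# Usage: perl pred2gff.pl genscan genscan.out > genscan.gff
predictions_to_gff(@ARGV, \*STDOUT) if @ARGV == 2;
```

The table keeps the per-program differences in one place, so adding another predictor is a one-line change if its parser follows the same interface.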
Thanks, Mark Mark Gosink Scripps Florida From richmond.todd at gmail.com Wed Apr 26 21:47:52 2006 From: richmond.todd at gmail.com (Todd Richmond) Date: Wed, 26 Apr 2006 20:47:52 -0500 Subject: [Bioperl-l] GI identifier missing when using Bio::Index::GenBank? Message-ID: <8f47ebe60604261847n340402e3h57e72bcf2dcc53a5@mail.gmail.com> I've got an application where I grab the daily updates from NCBI, pull out just the plant sequences and store them in a separate flat file. Then I use Bio::Index::GenBank to index the plant flat file so I can pull out my sequences of interest. I'm in the midst of converting my scripts to use bioperl-db/BioSQL so I can push those sequences into the database. The problem is that the NCBI GI identifier isn't returned when using the index file. When I run the following test script:
***
use strict;
use warnings;
use Bio::Index::GenBank;
use Bio::SeqIO;

my $Index_File_Name = 'nc0425.idx';
my $inx = Bio::Index::GenBank->new(-filename => $Index_File_Name);

# Write the retrieved record back out in GenBank format on STDOUT
my $seqio = Bio::SeqIO->new(-format => 'genbank', -fh => \*STDOUT);
my $seq = $inx->get_Seq_by_acc('CJ521890');
$seqio->write_seq($seq);
***
Diffing against the original GenBank record, the only difference is the GI identifier:
diff CJ521890_orig.out CJ521890_seqio.out
5c5
< VERSION CJ521890.1 GI:93266243
---
> VERSION CJ521890.1
Is this expected behaviour? If so, is there a workaround that will allow me to retrieve the GI from the index file so I can store it in the bioentry table? Thanks, Todd -- Todd Richmond richmond.todd at gmail.com From osborne1 at optonline.net Thu Apr 27 12:29:43 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 27 Apr 2006 12:29:43 -0400 Subject: [Bioperl-l] GI identifier missing when using Bio::Index::GenBank? In-Reply-To: <8f47ebe60604261847n340402e3h57e72bcf2dcc53a5@mail.gmail.com> Message-ID: Todd, Can't you go directly from the daily update to the database? Brian O.
On 4/26/06 9:47 PM, "Todd Richmond" wrote: > I've got an application where I grab the daily updates from NCBI, pull > out just the plant sequences and store them in a separate flat file. > Then I use Bio::Index::GenBank to index the plant flat file so I can > pull out my sequences of interest. I'm in the midst of converting my > scripts to using bioperl-db/biosql so I can push those sequences into > the database. The problem is that the NCBI GI identifier isn't > returned when using the index file. > > When I run the following test script: > *** > use Bio::Index::GenBank; > use Bio::SeqIO; > use strict; > my $Index_File_Name = 'nc0425.idx'; > my $inx = Bio::Index::GenBank->new('-filename' => $Index_File_Name); > > my $seqio = new Bio::SeqIO( '-format' => 'genbank' ); > my $seq = $inx->get_Seq_by_acc('CJ521890'); > $seqio->write_seq($seq); > *** > > Diffing to the original GenBank record, the only difference is the GI > identifier: > > diff CJ521890_orig.out CJ521890_seqio.out > 5c5 > < VERSION CJ521890.1 GI:93266243 > --- >> VERSION CJ521890.1 > > Is this expected behaviour? If so, is there a workaround that will > allow me to retrieve the GI from the index file so I can store it in > the bioentry table? > > Thanks, Todd > > > -- > Todd Richmond > richmond.todd at gmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Thu Apr 27 14:04:18 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 27 Apr 2006 14:04:18 -0400 Subject: [Bioperl-l] GI identifier missing when using Bio::Index::GenBank? In-Reply-To: <8f47ebe60604271056v7074c52atb1a86b2debb78280@mail.gmail.com> Message-ID: Todd, No, I don't think so, I think this is a bug. Can you put this into Bugzilla along with that Genbank file, CJ521890, that shows it? Then I'll take a closer look... Brian O. 
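Until the indexing behaviour is pinned down, one stopgap along the lines of the accession-to-GI hash Todd mentions later in the thread is to read the GI straight from the VERSION lines of the plant flat file. A minimal core-Perl sketch follows; accession_gi_map is a hypothetical helper, and it assumes standard VERSION lines of the form seen in Todd's diff (VERSION CJ521890.1 GI:93266243).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): map accession (version suffix
# stripped) to GI by scanning GenBank VERSION lines in a flat file.
sub accession_gi_map {
    my ($fh) = @_;
    my %gi_for;
    while (my $line = <$fh>) {
        next unless $line =~ /^VERSION\s+(\S+?)(?:\.\d+)?\s+GI:(\d+)/;
        $gi_for{$1} = $2;
    }
    return \%gi_for;
}

# Usage: perl gi_map.pl plants.gbk   (prints accession<TAB>GI)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $gi_for = accession_gi_map($fh);
    print "$_\t$gi_for->{$_}\n" for sort keys %$gi_for;
}
```

With the map in hand, something like $seq->primary_id($gi_for->{$seq->accession_number}) could restore the GI before storing the sequence. This assumes, as Bio::SeqIO::genbank does when parsing, that the GI is carried in primary_id; verify that against your BioPerl version.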
On 4/27/06 1:56 PM, "Todd Richmond" wrote: > I could, but I don't want to store all that information. For instance, > in the past two weeks, 387000 plant sequences have been added to > GenBank. I'm interested in storing complete information for the ~600 > sequences from that set that are related to the gene families I'm > interested in. > > I can certainly come up with a workaround myself by implementing a > hash of accession/gi numbers or modifiying the load script supplied by > bioperl to accept a list of accession numbers as a filter. I was just > wondering if I'm missing something obvious... > > Todd > > > On 4/27/06, Brian Osborne wrote: >> Todd, >> >> Can't you go directly from the daily update to the database? >> >> Brian O. >> >> >> On 4/26/06 9:47 PM, "Todd Richmond" wrote: >> >>> I've got an application where I grab the daily updates from NCBI, pull >>> out just the plant sequences and store them in a separate flat file. >>> Then I use Bio::Index::GenBank to index the plant flat file so I can >>> pull out my sequences of interest. I'm in the midst of converting my >>> scripts to using bioperl-db/biosql so I can push those sequences into >>> the database. The problem is that the NCBI GI identifier isn't >>> returned when using the index file. >>> >>> When I run the following test script: >>> *** >>> use Bio::Index::GenBank; >>> use Bio::SeqIO; >>> use strict; >>> my $Index_File_Name = 'nc0425.idx'; >>> my $inx = Bio::Index::GenBank->new('-filename' => $Index_File_Name); >>> >>> my $seqio = new Bio::SeqIO( '-format' => 'genbank' ); >>> my $seq = $inx->get_Seq_by_acc('CJ521890'); >>> $seqio->write_seq($seq); >>> *** >>> >>> Diffing to the original GenBank record, the only difference is the GI >>> identifier: >>> >>> diff CJ521890_orig.out CJ521890_seqio.out >>> 5c5 >>> < VERSION CJ521890.1 GI:93266243 >>> --- >>>> VERSION CJ521890.1 >>> >>> Is this expected behaviour? 
If so, is there a workaround that will >>> allow me to retrieve the GI from the index file so I can store it in >>> the bioentry table? >>> >>> Thanks, Todd >>> >>> >>> -- >>> Todd Richmond >>> richmond.todd at gmail.com >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > > -- > Todd Richmond > richmond.todd at gmail.com From richmond.todd at gmail.com Thu Apr 27 13:56:31 2006 From: richmond.todd at gmail.com (Todd Richmond) Date: Thu, 27 Apr 2006 12:56:31 -0500 Subject: [Bioperl-l] GI identifier missing when using Bio::Index::GenBank? In-Reply-To: References: <8f47ebe60604261847n340402e3h57e72bcf2dcc53a5@mail.gmail.com> Message-ID: <8f47ebe60604271056v7074c52atb1a86b2debb78280@mail.gmail.com> I could, but I don't want to store all that information. For instance, in the past two weeks, 387,000 plant sequences have been added to GenBank. I only want to store complete information for the ~600 sequences from that set that are related to the gene families I'm interested in. I can certainly come up with a workaround myself by implementing a hash of accession/GI numbers or modifying the load script supplied by bioperl to accept a list of accession numbers as a filter. I was just wondering if I'm missing something obvious... Todd On 4/27/06, Brian Osborne wrote: > Todd, > > Can't you go directly from the daily update to the database? > > Brian O. > > > On 4/26/06 9:47 PM, "Todd Richmond" wrote: > > > I've got an application where I grab the daily updates from NCBI, pull > > out just the plant sequences and store them in a separate flat file. > > Then I use Bio::Index::GenBank to index the plant flat file so I can > > pull out my sequences of interest. I'm in the midst of converting my > > scripts to using bioperl-db/biosql so I can push those sequences into > > the database.
The problem is that the NCBI GI identifier isn't > > returned when using the index file. > > > > When I run the following test script: > > *** > > use Bio::Index::GenBank; > > use Bio::SeqIO; > > use strict; > > my $Index_File_Name = 'nc0425.idx'; > > my $inx = Bio::Index::GenBank->new('-filename' => $Index_File_Name); > > > > my $seqio = new Bio::SeqIO( '-format' => 'genbank' ); > > my $seq = $inx->get_Seq_by_acc('CJ521890'); > > $seqio->write_seq($seq); > > *** > > > > Diffing to the original GenBank record, the only difference is the GI > > identifier: > > > > diff CJ521890_orig.out CJ521890_seqio.out > > 5c5 > > < VERSION CJ521890.1 GI:93266243 > > --- > >> VERSION CJ521890.1 > > > > Is this expected behaviour? If so, is there a workaround that will > > allow me to retrieve the GI from the index file so I can store it in > > the bioentry table? > > > > Thanks, Todd > > > > > > -- > > Todd Richmond > > richmond.todd at gmail.com > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- Todd Richmond richmond.todd at gmail.com From pjm at sanger.ac.uk Fri Apr 28 06:41:03 2006 From: pjm at sanger.ac.uk (Paul Mooney) Date: Fri, 28 Apr 2006 11:41:03 +0100 Subject: [Bioperl-l] Bio::Tools::GFF Message-ID: <1ae5268519a170870d2de3daa52ed163@sanger.ac.uk> Hi, I've modified one line in my local copy of Bio::Tools::GFF so it takes into account *all* of the special tags defined here: http://song.sourceforge.net/gff3.shtml
Current CVS version:
$tag= lcfirst($tag) unless ($tag =~/^ID|Name|Alias|Parent|Gap|Target$/);
My version:
$tag= lcfirst($tag) unless ($tag =~/^ID|Name|Alias|Parent|Gap|Target|Derives_from|Note|Dbxref|Ontology_term$/);
Could someone update CVS, please? Paul.
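One detail worth checking in both the CVS line and the patch as posted: in a Perl regex, alternation (|) binds more loosely than the anchors, so ^ applies only to ID and $ only to the last tag, and any tag that merely contains, say, Name or Note is also treated as reserved. The core-Perl sketch below uses hypothetical names ($loose, $strict, normalise_tag) to contrast the posted pattern with a grouped (?:...) version; it illustrates the regex behaviour only and is not the code that was committed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reserved GFF3 attribute tags listed in the patch.
my @reserved = qw(ID Name Alias Parent Gap Target
                  Derives_from Note Dbxref Ontology_term);

# As posted: '^' binds only to "ID" and '$' only to "Ontology_term",
# so a tag merely containing e.g. "Name" or "Note" also matches.
my $loose = qr/^ID|Name|Alias|Parent|Gap|Target|Derives_from|Note|Dbxref|Ontology_term$/;

# Grouped: both anchors apply to every alternative.
my $alternatives = join '|', map { quotemeta } @reserved;
my $strict = qr/^(?:$alternatives)$/;

# Lower-case the first letter of non-reserved tags, as the posted line does.
sub normalise_tag {
    my ($tag) = @_;
    return $tag =~ $strict ? $tag : lcfirst $tag;
}

for my $tag (qw(Name Gene_Name Color)) {
    printf "%-10s loose=%s strict=%s -> %s\n", $tag,
        ($tag =~ $loose  ? 'yes' : 'no'),
        ($tag =~ $strict ? 'yes' : 'no'),
        normalise_tag($tag);
}
```

Building the alternation from a list also keeps the set of reserved tags in one place if the GFF3 specification adds more.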
From cjfields at uiuc.edu Fri Apr 28 09:03:36 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 28 Apr 2006 08:03:36 -0500 Subject: [Bioperl-l] Bio::Tools::GFF In-Reply-To: <1ae5268519a170870d2de3daa52ed163@sanger.ac.uk> References: <1ae5268519a170870d2de3daa52ed163@sanger.ac.uk> Message-ID: <70C6614D-27AF-4B5D-A4FB-19981F97F05B@uiuc.edu> Lincoln is working on GFF3 integration into BioPerl at the moment, which will not use Bio::DB::GFF, so there's some reluctance to commit this unless Lincoln signals that it's fine. http://article.gmane.org/gmane.comp.lang.perl.bio.general/10691/match= http://www.bioperl.org/wiki/ListSummary:April_12-25%2C2006#Bioperl-guts_.28for_the_die-hards.29 If you look in CVS under Bio::DB, several new modules have been added (also listed in the ListSummary above). You could try those out, but I have no idea what working condition they are in or how stable the API is at the moment. You could email the GMOD lists as well (I get the GBrowse list, which Scott Cain and Lincoln regularly post to). Info here: https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse Chris On Apr 28, 2006, at 5:41 AM, Paul Mooney wrote: > Hi, > > I've modified one line in my local copy of Bio::Tools::GFF so it takes > into account *all* of the special tags defined here; > http://song.sourceforge.net/gff3.shtml > > Current CVS version; > $tag= lcfirst($tag) unless ($tag =~/^ID|Name|Alias|Parent|Gap|Target > $/); > > My version; > $tag= lcfirst($tag) unless ($tag > =~/ > ^ID|Name|Alias|Parent|Gap|Target|Derives_from|Note|Dbxref| > Ontology_term$ > /); > > Could someone update cvs please? > Paul. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cain at cshl.edu Fri Apr 28 09:14:35 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 28 Apr 2006 09:14:35 -0400 Subject: [Bioperl-l] Bio::Tools::GFF In-Reply-To: <70C6614D-27AF-4B5D-A4FB-19981F97F05B@uiuc.edu> References: <1ae5268519a170870d2de3daa52ed163@sanger.ac.uk> <70C6614D-27AF-4B5D-A4FB-19981F97F05B@uiuc.edu> Message-ID: <1146230076.2770.219.camel@localhost.localdomain> Hi Chris and Paul, I committed the change; Paul is right that the list was really out of date. Thanks for pointing it out. Scott On Fri, 2006-04-28 at 08:03 -0500, Chris Fields wrote: > Lincoln's working on GFF3 integration into BioPerl at the moment > which will not use Bio::DB::GFF, so there's a bit of reluctance at > the moment to commit this unless Lincoln signals that it's fine. > > http://article.gmane.org/gmane.comp.lang.perl.bio.general/10691/match= > > http://www.bioperl.org/wiki/ListSummary:April_12-25%2C2006#Bioperl-guts_.28for_the_die-hards.29 > > If you look in CVS in Bio::DB, several new modules have been added > (also in the ListSummary above). You could try those out but I have > no idea what kind of working condition they are in and how stable the > API is at the moment. > > You could email the GMOD lists as well (I get the GBrowse, which > Scott Cain and Lincoln regularly post to).
Info here: > > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > Chris > > On Apr 28, 2006, at 5:41 AM, Paul Mooney wrote: > > > Hi, > > > > I've modified one line in my local copy of Bio::Tools::GFF so it takes > > into account *all* of the special tags defined here; > > http://song.sourceforge.net/gff3.shtml > > > > Current CVS version; > > $tag= lcfirst($tag) unless ($tag =~/^ID|Name|Alias|Parent|Gap|Target > > $/); > > > > My version; > > $tag= lcfirst($tag) unless ($tag > > =~/ > > ^ID|Name|Alias|Parent|Gap|Target|Derives_from|Note|Dbxref| > > Ontology_term$ > > /); > > > > Could someone update cvs please? > > Paul. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory